AI & MACHINE LEARNING

LLM (Large Language Model)

Language model with billions to trillions of parameters, built on the Transformer architecture and trained on massive text corpora. Immediate ancestors: BERT (2018) and GPT-2 (2019). Milestones: GPT-3 (2020), instruction-tuned models (2022), multimodal models (2023+).

Extended definition

An LLM (large language model) is a language model with billions to trillions of parameters, typically built on the Transformer architecture and trained on a massive text corpus (hundreds of billions to trillions of tokens) with a self-supervised next-token prediction objective. Its immediate ancestors were BERT (2018, 110-340M parameters) and GPT-2 (2019, 1.5B parameters). The turning point was GPT-3 (Brown et al., 2020, 175B parameters), which demonstrated the emergent capability of in-context learning: performing a new task from a few examples placed in the prompt (few-shot prompting), without fine-tuning for each task. From 2022, instruction-tuned LLMs (such as ChatGPT, based on InstructGPT) established the conversational interface as the standard way to use these models. From 2023, multimodal models (GPT-4V, Gemini, Claude) integrated images and, in some cases, audio and video. Bommasani et al. (2021) proposed the term foundation models for the broader category, capturing the generic nature of these systems as infrastructure on which specific applications are built.
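The next-token prediction objective mentioned above can be illustrated with a toy calculation. This is only a minimal sketch under stated assumptions: the probabilities are invented for illustration, the function name `next_token_loss` is hypothetical, and a real LLM would compute these probabilities with a Transformer over a large vocabulary rather than receive them as a list.

```python
import math


def next_token_loss(token_probs):
    """Average negative log-likelihood of the observed next tokens.

    token_probs holds, for each position in a training text, the
    probability the model assigned to the token that actually came next.
    """
    if not token_probs:
        raise ValueError("need at least one position")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# Invented probabilities for three positions (illustration only).
confident = [0.9, 0.8, 0.95]   # model mostly predicted the right tokens
uncertain = [0.2, 0.1, 0.3]    # model spread its probability elsewhere

loss_confident = next_token_loss(confident)
loss_uncertain = next_token_loss(uncertain)
print(f"confident: {loss_confident:.3f}  uncertain: {loss_uncertain:.3f}")
```

Training lowers this loss by adjusting parameters so that the observed next tokens receive higher probability. No labels are needed beyond the text itself, which is why the objective is called self-supervised.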

When it applies

LLMs are appropriate for NLP tasks that benefit from broad world knowledge and generalization capacity: summarization of scientific literature, multi-category text classification, entity extraction in new domains without annotated datasets, response generation in conversational systems, academic translation, and writing assistance. In research, LLMs enable initial triage in systematic reviews (with subsequent human review), categorization of open-ended survey responses, and structured data extraction from literature. Responsible application requires explicit documentation of the model used, the prompts employed, and the human validation performed; journals increasingly require this kind of editorial transparency.
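One way to keep the documentation described above is a small structured record stored alongside the results. The following is a sketch, not a standard: the class `LLMUsageRecord`, its field names, and the example values (including the model name and version string) are all hypothetical and should be adapted to what a given journal or protocol asks for.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class LLMUsageRecord:
    """Hypothetical record of one LLM-assisted step in a study."""
    model_name: str
    model_version: str   # exact version identifier reported by the provider
    prompt: str          # full prompt text as sent
    date_used: str       # ISO 8601 date
    validated_by: str    # person who checked the output

    def prompt_sha256(self) -> str:
        # A hash makes it easy to confirm later that the prompt is unchanged.
        return hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()

    def to_json(self) -> str:
        data = asdict(self)
        data["prompt_sha256"] = self.prompt_sha256()
        return json.dumps(data, indent=2, sort_keys=True)


# Placeholder values for illustration only.
record = LLMUsageRecord(
    model_name="example-model",
    model_version="example-model-2024-01-01",
    prompt="Classify the following survey response into one of: A, B, C.",
    date_used="2024-01-15",
    validated_by="second author",
)
print(record.to_json())
```

The JSON output can be archived with the dataset or attached as supplementary material, so that the model, prompt, and reviewer are recoverable later.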

When it does not apply

LLMs do not replace quantitative data analysis when the goal is a numerically precise answer: hallucinations are a documented problem in statistics, citations, and specific facts. They do not replace human peer review. They should not be used in high-stakes decisions (health, criminal justice, finance) without validation by a qualified human specialist. They do not replace human-conducted bibliographic review in rigorous research, since hallucinated citations are a frequent failure. They should not be used with material under strict NDA unless non-leakage is guaranteed: commercial LLM services commonly retain interactions by default, even where an opt-out is available, so the provider's data policy must be checked. In research involving patient data or sensitive intellectual property, locally run models (Llama, Mistral) are alternatives.

Applications by field

Research in humanities and social sciences: large-scale discourse analysis, thematic categorization, distant reading of historical text corpora.

Health: assistance in systematic review, medical record classification (with regulatory care), writing assistance for patients.

Computer science and engineering: code generation, debugging, technical documentation; IDE integration (GitHub Copilot, Cursor).

Education: adaptive tutoring, exercise generation, automated feedback, all with vigilance over academic integrity.

Common pitfalls

The first pitfall is confusing fluency with factual correctness: LLMs produce fluent text on any topic, including topics on which they have no reliable information, with the same apparent confidence. The second is not verifying generated citations: hallucination of DOIs, authors, and titles is a documented problem, and every citation requires human validation before any editorial use. The third is treating LLM output as a primary source; output should be a starting point for verification, not a conclusion. The fourth is assuming reproducibility: commercial models can change silently between versions, so the same prompt may produce different results over time. Documenting the exact version is a minimum practice. The fifth is ignoring representational bias: training corpora reflect historical and geographic biases (over-representation of English and of Anglophone perspectives), and this appears in responses in both subtle and overt ways.
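As a small aid for the second pitfall, the sketch below pulls DOI-like strings out of generated text so that each one can be checked by a person. It rests on assumptions: the regular expression is a commonly used syntactic pattern for modern DOIs, not an official validator; the function name `extract_dois` and the sample DOIs are hypothetical; and a syntactically valid DOI can still be invented, so each hit must be resolved and compared against the cited authors and title by hand.

```python
import re

# Syntactic pattern only: matches the shape "10.<registrant>/<suffix>".
# It cannot tell whether the DOI exists or points to the cited work.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text):
    """Return DOI-like strings found in text, in order, without duplicates."""
    found = []
    for match in DOI_PATTERN.findall(text):
        # Drop sentence punctuation that the pattern may have swallowed.
        # Assumption: real DOIs in the text do not end in these characters.
        doi = match.rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


# Invented sample text and DOIs, for illustration only.
sample = (
    "See Smith (2020), doi:10.1234/abcd.5678, and also "
    "https://doi.org/10.5555/xyz-001."
)
dois_to_verify = extract_dois(sample)
for doi in dois_to_verify:
    print(f"verify manually: https://doi.org/{doi}")
```

The output is a checklist for human verification, not a verdict: a citation is confirmed only once the resolved record matches the authors, title, and year given in the text.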
