AI & MACHINE LEARNING

Sentiment analysis

NLP subfield that classifies affective polarity (positive, negative, neutral) or identifies specific emotions in text. Approaches evolved from manual lexicons to supervised classifiers to transformer-based models. Pang and Lee (2008) consolidated the field.

Extended definition

Sentiment analysis (also called opinion mining) is the NLP subfield that classifies affective polarity in text, typically as positive, negative, or neutral, or identifies specific emotions such as joy, anger, fear, surprise, sadness, and disgust (categories drawn from models like Ekman’s or Plutchik’s). Pang and Lee (2008, Foundations and Trends in Information Retrieval) and Liu (2012, Synthesis Lectures) are the field’s consolidating references. Approaches evolved in three generations: (1) lexicons of valenced words (LIWC, SentiWordNet, VADER), which are brittle under context shifts and negation; (2) classical supervised classifiers (Naive Bayes, SVM with TF-IDF and n-gram features) trained on labeled datasets, a solid baseline for years; (3) transformer-based models (fine-tuned BERT, RoBERTa, multilingual models like XLM-R), the state of the art for contextual sentiment and for aspect-based sentiment analysis (ABSA), where sentiment is detected for specific attributes of the object (e.g., the battery quality vs. the camera quality of a product).
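
As a rough illustration of why first-generation lexicon scoring is brittle under negation, the sketch below sums word valences from a toy word list. The lexicon, its weights, the negator list, and the function names are all invented for this example; none of it reproduces LIWC, SentiWordNet, or VADER.

```python
# Toy valence lexicon: words and weights are made up for illustration only.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
               "bad": -1.0, "terrible": -2.0, "hate": -2.0}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [tok.strip(".,!?;:\"'") for tok in text.lower().split()]


def lexicon_score(text, handle_negation=False):
    """Sum word valences.

    With handle_negation=True, a word's valence is flipped only when the
    token immediately before it is a negator (a deliberately naive rule).
    """
    tokens = tokenize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = TOY_LEXICON.get(tok, 0.0)
        if handle_negation and i > 0 and tokens[i - 1] in NEGATORS:
            valence = -valence
        total += valence
    return total


def polarity(score):
    """Map a numeric score to a coarse polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ("The battery is not good", "This is not very good"):
        plain = polarity(lexicon_score(sentence))
        negated = polarity(lexicon_score(sentence, handle_negation=True))
        print(f"{sentence!r}: plain={plain}, with negation rule={negated}")
```

Without the negation rule, "The battery is not good" scores as positive. The one-token rule fixes that case but still misses "not very good", because the negator is no longer adjacent to the valenced word. Contextual models are meant to close this kind of gap.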

When it applies

Sentiment analysis applies in brand monitoring (social media, reviews), market research, political communication analysis, media studies, customer experience analytics, and call-center analysis. In academic research, it applies in public-opinion studies, institutional discourse analysis, studies of political polarization in media, and digital humanities work on the emotional analysis of literary or historical texts. It also serves as an initial filter in larger text pipelines, for example prioritizing negative messages in customer support or detecting hate speech in moderation. ABSA is particularly useful in product review analysis, where multiple aspects coexist.
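
To make the ABSA output shape concrete, here is a deliberately naive sketch that assigns one polarity per aspect instead of one per review. It is not a real ABSA model: the aspect terms, the opinion words, and the clause-splitting rule are hypothetical stand-ins for what a fine-tuned model would learn.

```python
import re

# Hypothetical aspect vocabulary and opinion words, for illustration only.
ASPECT_TERMS = {"battery", "camera", "screen"}
OPINION_WORDS = {"great": 1, "excellent": 1, "good": 1,
                 "poor": -1, "bad": -1, "blurry": -1}


def label_from_score(score):
    """Map a summed opinion score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aspect_sentiment(review):
    """Return {aspect: polarity} using a crude clause-level heuristic.

    The review is split into clauses at "but" and at basic punctuation;
    every aspect term in a clause receives that clause's summed polarity.
    """
    results = {}
    clauses = re.split(r"\bbut\b|[,.;]", review.lower())
    for clause in clauses:
        words = re.findall(r"[a-z']+", clause)
        score = sum(OPINION_WORDS.get(word, 0) for word in words)
        for word in words:
            if word in ASPECT_TERMS:
                results[word] = label_from_score(score)
    return results


if __name__ == "__main__":
    print(aspect_sentiment("The battery is great but the camera is blurry."))
```

A single document-level label would have to collapse this review to one polarity. The per-aspect dictionary keeps the positive opinion about the battery separate from the negative opinion about the camera.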

When it does not apply

It does not apply as a substitute for deep qualitative content analysis in research on subjective experience: sentiment encodes a simplified polarity that loses important nuances (irony, sarcasm, ambivalence, humor). It does not apply reliably to short texts that lack sufficient context: sarcasm is a chronic challenge, especially outside the training domain. It does not apply directly to languages with low coverage in pretrained models without careful adaptation: English bias persists. It does not replace human validation in high-impact decisions: errors in automated hate-speech moderation have real social costs. In clinical or legal texts, generic sentiment is rarely informative; these domains demand specialized models.
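
One way to keep human validation in the loop is to route predictions to a reviewer whenever the decision is high-impact or the model is unsure. The sketch below assumes a classifier that returns a label with a confidence score. The `Prediction` type, the `route` function, and the 0.9 threshold are hypothetical choices for illustration, not a recommended policy.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    text: str
    label: str
    confidence: float  # model's score for `label`, assumed to lie in [0, 1]


def route(prediction, high_impact, threshold=0.9):
    """Decide whether a prediction can be acted on automatically.

    High-impact decisions always go to a human, regardless of confidence.
    Other decisions go to a human only when confidence is below threshold.
    """
    if high_impact or prediction.confidence < threshold:
        return "human_review"
    return "auto"


if __name__ == "__main__":
    confident = Prediction("Great service, thanks!", "positive", 0.97)
    unsure = Prediction("Oh sure, fantastic job...", "positive", 0.55)
    print(route(confident, high_impact=False))  # auto
    print(route(unsure, high_impact=False))     # human_review
    print(route(confident, high_impact=True))   # human_review
```

The threshold only helps if the confidence scores are reasonably calibrated in the application domain, which would need to be checked on held-out data before relying on it.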

Applications by field

Marketing and CX: review monitoring in e-commerce and social media; ABSA to extract granular feedback on attributes.
Political communication: polarization analysis in speeches and media; electoral campaign studies.
Mental health and digital phenotyping: detection of affective markers in patient text (with ethical care).
Digital humanities: emotional analysis of historical and literary texts; mapping affective change in longitudinal corpora.

Common pitfalls

The first pitfall is treating positive/negative sentiment as a single sufficient dimension: a text can be ambivalent (positive about A and negative about B in the same utterance), so ABSA is needed for granularity. The second is not testing performance in the application domain: a model trained on Amazon reviews typically performs worse on political tweets, so transfer requires fine-tuning or at least validation on in-domain data. The third is blindly trusting off-the-shelf tools (VADER, TextBlob) in technical domains: specialized vocabulary breaks general-purpose lexicons and requires adaptation. The fourth is ignoring sarcasm and irony: their detection is still an open problem, and even large pretrained models err frequently. The fifth is failing to document the limitations that come from representational bias: models reflect the biases of their training data, which often underrepresents dialects, regional slang, and languages other than English.
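
The second pitfall can be checked by reporting a metric per domain rather than one pooled number. The sketch below computes macro-averaged F1 separately for each domain from (domain, gold label, predicted label) triples. The function names and the sample data are invented for illustration and do not come from any real evaluation.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def f1_by_domain(examples):
    """Group (domain, gold, pred) triples by domain and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for domain, gold, pred in examples:
        grouped[domain][0].append(gold)
        grouped[domain][1].append(pred)
    return {domain: macro_f1(g, p) for domain, (g, p) in grouped.items()}


if __name__ == "__main__":
    # Made-up predictions: perfect on reviews, biased toward "neg" on tweets.
    sample = [
        ("reviews", "pos", "pos"),
        ("reviews", "neg", "neg"),
        ("tweets", "pos", "neg"),
        ("tweets", "neg", "neg"),
    ]
    for domain, score in f1_by_domain(sample).items():
        print(f"{domain}: macro-F1 = {score:.2f}")
```

A pooled score over all four examples would hide the fact that the hypothetical model fails on every positive tweet. The per-domain breakdown makes that gap visible before deployment.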
