algo-social-sentiment

VADER Sentiment Analysis

Overview

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment tool optimized for social media. It returns a compound score in [-1, +1], along with the proportions of the text rated positive, negative, and neutral. It runs in O(n) per text, where n is the word count, and requires no training.

When to Use

Trigger conditions:
  • Analyzing sentiment in social media posts, tweets, or reviews
  • Quick sentiment scoring without ML model training
  • Processing text with slang, emoticons, and informal language
When NOT to use:
  • For formal/academic text (VADER is tuned for social media)
  • When domain-specific sentiment matters (e.g., financial sentiment — use FinBERT)
  • When sarcasm detection is critical (VADER doesn't detect sarcasm)

Algorithm

IRON LAW: VADER Is Designed for SOCIAL MEDIA Text
It handles slang, emoticons, capitalization, and punctuation as
sentiment modifiers. Applying it to formal documents (legal, academic,
medical) produces unreliable scores. For domain-specific text, use
domain-trained models instead.

Phase 1: Input Validation

Tokenize the text, preserving capitalization (ALL CAPS = emphasis), punctuation (! amplifies), and emoticons/emoji. Gate: the text is non-empty and its encoding is handled correctly.
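
The gate above could be sketched as follows. This is a minimal illustration, not part of VADER: the function name `prepare_text` and the assumption that byte input is UTF-8 are both hypothetical.

```python
def prepare_text(raw):
    """Return text ready for scoring, or raise ValueError if it fails the gate."""
    if isinstance(raw, bytes):
        # Assumption: byte input is UTF-8. A bad byte sequence raises
        # UnicodeDecodeError, which is a subclass of ValueError.
        raw = raw.decode("utf-8")
    text = raw.strip()
    if not text:
        raise ValueError("empty text")
    # Deliberately no lowercasing or punctuation stripping:
    # case, "!" and emoji all carry sentiment signal.
    return text
```

The only normalization applied is trimming surrounding whitespace, so the emphasis cues reach the scorer unchanged.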

Phase 2: Core Algorithm

  1. Look up each token in VADER lexicon (7,500+ sentiment-rated terms)
  2. Apply grammatical rules: negation ("not good" = negative), degree modifiers ("very good" > "good"), capitalization boost, punctuation amplification
  3. Compute the positive, negative, and neutral proportions from the adjusted per-token valence scores
  4. Compute the compound score by normalizing the sum of all valence scores: compound = sum / √(sum² + α), where α = 15
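
To make the steps above concrete, here is a toy sketch of lexicon lookup, two simplified rules, and the compound normalization. It is not the VADER implementation: the four-word lexicon, the word lists, and the two rule constants are assumptions chosen for illustration, and capitalization and punctuation handling are left out entirely.

```python
import math

# Hypothetical mini-lexicon for illustration only; the real VADER lexicon
# holds thousands of human-rated entries.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "awful": -2.0}
NEGATIONS = {"not", "never", "no"}
BOOSTERS = {"very": 0.293, "extremely": 0.293}  # assumed increments
NEGATION_SCALAR = -0.74  # assumed flip-and-dampen factor for negated terms


def normalize(total, alpha=15.0):
    """Map an unbounded valence sum into (-1, 1): sum / sqrt(sum^2 + alpha)."""
    return total / math.sqrt(total * total + alpha)


def toy_compound(text):
    """Score text with the toy lexicon and two simplified rules."""
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = TOY_LEXICON.get(tok)
        if valence is None:
            continue  # token carries no sentiment in the toy lexicon
        prev = tokens[i - 1] if i > 0 else ""
        if prev in BOOSTERS:
            # Push the valence further from zero in its own direction.
            valence += math.copysign(BOOSTERS[prev], valence)
        if prev in NEGATIONS:
            valence *= NEGATION_SCALAR
        total += valence
    return normalize(total)
```

With these assumed values, `toy_compound("very good")` scores higher than `toy_compound("good")`, and `toy_compound("not good")` comes out negative, which mirrors the rule descriptions in step 2.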

Phase 3: Verification

Classify: compound ≥ 0.05 → positive, ≤ -0.05 → negative, else neutral. Spot-check sample results. Gate: Classifications pass manual spot-check on 10-20 examples.
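
The classification rule can be written as a small function. The name `classify` and the `threshold` parameter are illustrative; exposing the threshold is one way to make the 0.05 boundary adjustable per use case.

```python
def classify(compound, threshold=0.05):
    """Map a compound score to a polarity label using a symmetric threshold."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

Both boundaries are inclusive, so a score of exactly 0.05 is labeled positive and exactly -0.05 is labeled negative.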

Phase 4: Output

Return compound score and polarity classification per text.

Output Format

```json
{
  "results": [{"text": "...", "compound": 0.76, "pos": 0.45, "neu": 0.55, "neg": 0.0, "label": "positive"}],
  "metadata": {"texts_analyzed": 500, "distribution": {"positive": 0.45, "neutral": 0.35, "negative": 0.20}}
}
```
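
One way to assemble this structure is sketched below. `build_report` and `label_for` are hypothetical helper names. The sketch assumes `analyzer` is any object with a `polarity_scores(text)` method that returns a dict with `neg`, `neu`, `pos`, and `compound` keys.

```python
from collections import Counter


def label_for(compound, threshold=0.05):
    """Polarity label for a compound score (same rule as Phase 3)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def build_report(texts, analyzer):
    """Score each text and summarize the label distribution as fractions."""
    results = []
    for text in texts:
        scores = analyzer.polarity_scores(text)
        results.append({
            "text": text,
            "compound": scores["compound"],
            "pos": scores["pos"],
            "neu": scores["neu"],
            "neg": scores["neg"],
            "label": label_for(scores["compound"]),
        })
    counts = Counter(r["label"] for r in results)
    n = len(results)
    distribution = {
        name: (counts[name] / n if n else 0.0)
        for name in ("positive", "neutral", "negative")
    }
    return {
        "results": results,
        "metadata": {"texts_analyzed": n, "distribution": distribution},
    }
```

If the `vaderSentiment` package is installed, an instance of `SentimentIntensityAnalyzer` from `vaderSentiment.vaderSentiment` should fit the `analyzer` argument, since it exposes `polarity_scores` with those keys.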

Examples

Sample I/O

Input: "This product is AMAZING!!! 😍" Expected: compound ≈ 0.87 (positive). Boosted by: CAPS, !!!, 😍 emoji.
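
To see which emphasis cues are present in a text like the sample, a small feature extractor can help during spot-checks. This is an illustrative helper, not VADER's internal logic: the function name and returned keys are hypothetical, and emoji are ignored here.

```python
import re


def emphasis_features(text):
    """Report surface cues that act as sentiment amplifiers."""
    words = re.findall(r"[A-Za-z]+", text)
    caps = [w for w in words if len(w) > 1 and w.isupper()]
    return {
        "caps_words": caps,
        # Caps only signal emphasis when the text is not entirely upper case.
        "mixed_case": 0 < len(caps) < len(words),
        "exclamations": text.count("!"),
    }
```

For the sample input, this reports one all-caps word (`AMAZING`) and three exclamation marks.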

Edge Cases

边缘案例

Input | Expected | Why
"Not bad at all" | Slightly positive (~0.2) | Double negation partially handled
"😂😂😂" | Positive | Emoji mapped in lexicon
Empty string | Compound = 0, neutral | No tokens to score

Gotchas

  • Sarcasm is invisible: "Oh great, another meeting" reads as positive. VADER has no sarcasm detection.
  • Negation window: VADER applies negation within a 3-word window. In "I do not think this is bad", the "not" sits four words before "bad", outside the window, so the negation may be missed.
  • Emoji coverage: VADER's emoji lexicon may not cover newer emoji. Update or supplement as needed.
  • Language limitation: VADER is English-only. For Chinese/Japanese, use language-specific tools (e.g., SnowNLP for Chinese).
  • Compound threshold sensitivity: The 0.05 boundary is arbitrary. Adjust thresholds based on your specific use case and tolerance for false positives.
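
The negation-window gotcha can be illustrated with a simplified check. The word list and function name are assumptions for this sketch, not VADER's internals.

```python
NEGATIONS = {"not", "no", "never"}  # hypothetical, abbreviated list


def negated_within_window(tokens, index, window=3):
    """True if a negation word appears in the `window` tokens before `index`."""
    start = max(0, index - window)
    return any(tok.lower() in NEGATIONS for tok in tokens[start:index])


tokens = "I do not think this is bad".split()
# "bad" is at index 6; the window covers "think", "this", "is" only.
missed = not negated_within_window(tokens, 6)
```

Here `missed` is `True`, because "not" falls outside the three-token window, whereas in "this is not bad" the same check finds the negation.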

References

  • For VADER lexicon and rules documentation, see
    references/vader-rules.md
  • For comparison with transformer-based sentiment models, see
    references/model-comparison.md