algo-social-sentiment

VADER Sentiment Analysis

Overview

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment tool optimized for social media. It returns a normalized compound score in [-1, +1], along with the proportions of the text that are positive, negative, and neutral. Scoring runs in O(n) per text, where n is the word count, and requires no training.

When to Use

Trigger conditions:

Analyzing sentiment in social media posts, tweets, or reviews
Quick sentiment scoring without ML model training
Processing text with slang, emoticons, and informal language

When NOT to use:

For formal/academic text (VADER is tuned for social media)
When domain-specific sentiment matters (e.g., financial sentiment — use FinBERT)
When sarcasm detection is critical (VADER doesn't detect sarcasm)

Algorithm

IRON LAW: VADER Is Designed for SOCIAL MEDIA Text
It handles slang, emoticons, capitalization, and punctuation as
sentiment modifiers. Applying it to formal documents (legal, academic,
medical) produces unreliable scores. For domain-specific text, use
domain-trained models instead.

Phase 1: Input Validation

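Tokenize the text while preserving the cues VADER treats as sentiment modifiers: capitalization (ALL CAPS signals emphasis), punctuation ("!" amplifies intensity), and emoticons/emoji. Gate: the text is non-empty and correctly decoded, so emoji survive intact.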
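A minimal Phase 1 sketch, assuming UTF-8 input and whitespace tokenization. The `prepare` function and `PreparedText` type are hypothetical; the stripping rule (leave a token untouched when removing punctuation would leave two or fewer characters, so `:)` survives) is loosely modeled on the reference `vaderSentiment` package but is not its API.

```python
import string
import unicodedata
from dataclasses import dataclass


@dataclass
class PreparedText:
    tokens: list      # tokens with original casing preserved
    exclamations: int  # number of "!" (amplifies intensity)
    questions: int     # number of "?"
    mixed_caps: bool   # True if some, but not all, tokens are ALL CAPS


def prepare(raw) -> PreparedText:
    """Phase 1 gate: decode, normalize, reject empty text, then tokenize."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")  # raises UnicodeDecodeError on bad bytes
    text = unicodedata.normalize("NFC", raw).strip()
    if not text:
        raise ValueError("empty text: nothing to score")

    tokens = []
    for word in text.split():
        stripped = word.strip(string.punctuation)
        # Keep short tokens such as ":)" or ":-(" intact so emoticons survive.
        tokens.append(stripped if len(stripped) > 2 else word)

    caps = sum(1 for t in tokens if t.isupper())
    return PreparedText(
        tokens=tokens,
        exclamations=text.count("!"),
        questions=text.count("?"),
        mixed_caps=0 < caps < len(tokens),
    )
```

For `"This product is AMAZING!!! 😍"` this yields the tokens `This`, `product`, `is`, `AMAZING`, `😍`, three exclamation marks, and `mixed_caps=True`. Raising on empty input is a pipeline choice; the VADER library itself simply scores empty text as compound 0.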

Phase 2: Core Algorithm

Look up each token in VADER lexicon (7,500+ sentiment-rated terms)
Apply grammatical rules: negation ("not good" = negative), degree modifiers ("very good" > "good"), capitalization boost, punctuation amplification
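Compute the pos, neg, and neu proportions from the summed positive valences, the summed negative valences, and the count of neutral tokens
Compute the compound score by normalizing the sum of all valence scores: compound = sum / √(sum² + α), where α = 15, which squashes any sum into [-1, +1]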
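The four steps above can be sketched as a toy scorer. This is a simplified illustration, not the `vaderSentiment` implementation: the tiny `LEXICON` and its valences, the booster and negator sets, and the function names are hypothetical, and distance dampening of boosters, idioms, question marks, and "but" handling are omitted. The constants (0.293 booster step, 0.733 caps boost, -0.74 negation scalar, 0.292 per "!" capped at four, α = 15) are assumed to match VADER's published defaults.

```python
import math
import string

# Hypothetical toy lexicon and word lists; real VADER ships 7,500+ rated terms.
LEXICON = {"good": 1.9, "great": 3.1, "amazing": 2.8, "bad": -2.5, "awful": -2.0}
BOOSTERS = {"very": 0.293, "really": 0.293, "slightly": -0.293}
NEGATORS = {"not", "never", "no", "isn't", "don't"}
CAPS_BOOST = 0.733       # ALL-CAPS word in otherwise mixed-case text
NEGATION_SCALAR = -0.74  # flips and dampens valence
EXCLAIM_STEP = 0.292     # added intensity per "!" (max four)
ALPHA = 15               # compound normalization constant


def token_valence(words, i, mixed_caps):
    """Valence of words[i] after caps, booster, and negation rules."""
    v = LEXICON.get(words[i].lower(), 0.0)
    if v == 0.0:
        return 0.0
    if mixed_caps and words[i].isupper():
        v += CAPS_BOOST if v > 0 else -CAPS_BOOST
    for back in range(1, 4):  # inspect up to three preceding words
        if i - back < 0:
            break
        prev = words[i - back].lower()
        if prev in LEXICON:
            continue
        if prev in BOOSTERS:
            step = BOOSTERS[prev]
            # Boosters push away from zero, dampeners pull toward it.
            v += step if v > 0 else -step
        if prev in NEGATORS:
            v *= NEGATION_SCALAR
    return v


def polarity_scores(text):
    words = [w.strip(string.punctuation) or w for w in text.split()]
    caps = sum(1 for w in words if w.isupper())
    mixed_caps = 0 < caps < len(words)
    vals = [token_valence(words, i, mixed_caps) for i in range(len(words))]
    emphasis = min(text.count("!"), 4) * EXCLAIM_STEP

    # Compound: add punctuation emphasis in the direction of the sum, normalize.
    total = sum(vals)
    if total > 0:
        total += emphasis
    elif total < 0:
        total -= emphasis
    compound = max(-1.0, min(1.0, total / math.sqrt(total * total + ALPHA)))

    # Proportions: +1/-1 offsets keep scored words comparable to neutral ones.
    pos = sum(v + 1 for v in vals if v > 0)
    neg = sum(v - 1 for v in vals if v < 0)
    neu = sum(1 for v in vals if v == 0)
    if pos > abs(neg):
        pos += emphasis
    elif pos < abs(neg):
        neg -= emphasis
    denom = pos + abs(neg) + neu
    if denom == 0:  # empty text
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    return {
        "neg": round(abs(neg) / denom, 3),
        "neu": round(neu / denom, 3),
        "pos": round(pos / denom, 3),
        "compound": round(compound, 4),
    }
```

With these toy values, "The food is not good" comes out negative, while "very good", "GOOD", and "good!!!" each score higher than plain "good", which is the ordering the rules are meant to produce.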

Phase 3: Verification

Classify: compound ≥ 0.05 → positive, ≤ -0.05 → negative, else neutral. Spot-check sample results. Gate: Classifications pass manual spot-check on 10-20 examples.
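A sketch of the Phase 3 rule with the thresholds exposed as parameters, plus a helper that draws a reproducible sample for the 10-20 example manual spot-check. The names `label` and `spot_check_sample` and the default sample size of 15 are illustrative choices, not part of VADER.

```python
import random


def label(compound, pos_threshold=0.05, neg_threshold=-0.05):
    """Map a compound score to a polarity label (boundaries are inclusive)."""
    if compound >= pos_threshold:
        return "positive"
    if compound <= neg_threshold:
        return "negative"
    return "neutral"


def spot_check_sample(results, k=15, seed=0):
    """Return up to k results, chosen reproducibly, for manual review."""
    rng = random.Random(seed)
    return rng.sample(results, min(k, len(results)))
```

Keeping the thresholds as arguments makes it easy to tighten or widen the neutral band after the spot-check without touching the scoring step.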

Phase 4: Output

Return compound score and polarity classification per text.

Output Format

```json
{
  "results": [{"text": "...", "compound": 0.76, "pos": 0.45, "neu": 0.55, "neg": 0.0, "label": "positive"}],
  "metadata": {"texts_analyzed": 500, "distribution": {"positive": 0.45, "neutral": 0.35, "negative": 0.20}}
}
```
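One way to assemble this output, sketched under the assumption that each per-text result is already a dict with `text`, `compound`, `pos`, `neu`, `neg`, and `label` keys. `build_report` is a hypothetical helper; the distribution is the share of texts per label, rounded to two decimals as in the example above.

```python
import json
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def build_report(results):
    """Wrap per-text results in the output schema with a label distribution."""
    n = len(results)
    counts = Counter(r["label"] for r in results)
    distribution = {lab: round(counts[lab] / n, 2) if n else 0.0 for lab in LABELS}
    return {
        "results": results,
        "metadata": {"texts_analyzed": n, "distribution": distribution},
    }


if __name__ == "__main__":
    demo = [{"text": "...", "compound": 0.76, "pos": 0.45, "neu": 0.55,
             "neg": 0.0, "label": "positive"}]
    print(json.dumps(build_report(demo), indent=2))
```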

Examples

Sample I/O

Input: "This product is AMAZING!!! 😍" Expected: compound ≈ 0.87 (positive). Boosted by: CAPS, !!!, 😍 emoji.

Edge Cases

Input	Expected	Why
"Not bad at all"	Mildly positive	Negation flips and dampens the negative valence of "bad"
"😂😂😂"	Positive	Emoji mapped in lexicon
Empty string	Compound = 0, neutral	No tokens to score
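A worked check of the first and last rows, assuming "bad" carries a lexicon valence of -2.5, that no other word in "Not bad at all" is scored, and that the negation scalar is -0.74 (all three are assumptions about the lexicon and defaults). Negation turns -2.5 into about +1.85, and the compound normalization with α = 15 maps that to roughly 0.43; an empty text sums to zero, which normalizes to zero.

```python
import math

ALPHA = 15               # compound normalization constant
NEGATION_SCALAR = -0.74  # assumed VADER default
BAD_VALENCE = -2.5       # assumed lexicon valence of "bad"


def normalize(score_sum, alpha=ALPHA):
    """Map a raw valence sum into (-1, 1)."""
    return score_sum / math.sqrt(score_sum * score_sum + alpha)


not_bad = BAD_VALENCE * NEGATION_SCALAR  # about 1.85
print(round(normalize(not_bad), 2))      # 0.43
print(normalize(0.0))                    # 0.0 (empty text)
```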

Gotchas

Sarcasm is invisible: "Oh great, another meeting" reads as positive. VADER has no sarcasm detection.
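Negation window: VADER only looks for negators among the three words preceding a sentiment word. In "I do not think this is bad", "not" sits four words before "bad", so the negation is missed and the sentence scores negative.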
Emoji coverage: VADER's emoji lexicon may not cover newer emoji. Update or supplement as needed.
Language limitation: VADER is English-only. For Chinese/Japanese, use language-specific tools (e.g., SnowNLP for Chinese).
Compound threshold sensitivity: The ±0.05 boundary is a conventional default, not a principled cut-off. Adjust the thresholds to your use case and your tolerance for false positives.
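The negation-window gotcha can be reproduced with a simplified check. `negated_within` and its negator set are hypothetical stand-ins that only inspect the three words before the sentiment word, as described above; they are not the `vaderSentiment` API.

```python
NEGATORS = {"not", "never", "no", "isn't", "don't", "doesn't"}


def negated_within(words, i, window=3):
    """True if a negator appears among the `window` words before words[i]."""
    return any(w.lower() in NEGATORS for w in words[max(0, i - window):i])


words = "I do not think this is bad".split()
i = words.index("bad")
print(negated_within(words, i))            # False: "not" is four words back
print(negated_within(words, i, window=4))  # True once the window is widened
```

Widening the window is not a general fix: a larger window also catches negators that belong to a different clause, so it trades missed negations for spurious ones.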

References

For VADER lexicon and rules documentation, see `references/vader-rules.md`.
For a comparison with transformer-based sentiment models, see `references/model-comparison.md`.
