algo-social-sentiment
VADER Sentiment Analysis
Overview
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment tool optimized for social media. It returns a compound score in [-1, +1], along with the positive, negative, and neutral proportions of the text. Runtime is O(n) per text, where n is the word count. No training is required.
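For orientation, here is a minimal sketch of obtaining these scores. It assumes the open-source `vaderSentiment` Python package is installed; this document does not name a specific implementation, and the helper name `score_text` is hypothetical.

```python
from typing import Optional

try:
    # Assumption: the package published on PyPI as `vaderSentiment`.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
except ImportError:
    # Keep the sketch importable when the package is absent.
    SentimentIntensityAnalyzer = None


def score_text(text: str) -> Optional[dict]:
    """Return the neg/neu/pos/compound scores, or None if the package is missing."""
    if SentimentIntensityAnalyzer is None:
        return None
    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)


if __name__ == "__main__":
    print(score_text("This product is AMAZING!!!"))
```

In a batch job it would be cheaper to create the analyzer once and reuse it, since construction loads the lexicon.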
When to Use
Trigger conditions:
- Analyzing sentiment in social media posts, tweets, or reviews
- Quick sentiment scoring without ML model training
- Processing text with slang, emoticons, and informal language
When NOT to use:
- For formal/academic text (VADER is tuned for social media)
- When domain-specific sentiment matters (e.g., financial sentiment — use FinBERT)
- When sarcasm detection is critical (VADER doesn't detect sarcasm)
Algorithm
IRON LAW: VADER Is Designed for SOCIAL MEDIA Text
It handles slang, emoticons, capitalization, and punctuation as
sentiment modifiers. Applying it to formal documents (legal, academic,
medical) produces unreliable scores. For domain-specific text, use
domain-trained models instead.
Phase 1: Input Validation
Tokenize text. Preserve: capitalization (ALL CAPS = emphasis), punctuation (! amplifies), emoticons/emoji.
Gate: Text is non-empty, encoding handled correctly.
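The validation gate could be sketched as follows. The function name `validate_and_tokenize`, the UTF-8 assumption, and the choice to raise `ValueError` are illustrative and not part of the source.

```python
from typing import List, Union


def validate_and_tokenize(raw: Union[str, bytes]) -> List[str]:
    """Decode if needed, reject empty input, and split on whitespace only."""
    if isinstance(raw, bytes):
        # Assumption: input is UTF-8; undecodable bytes are replaced, not dropped.
        raw = raw.decode("utf-8", errors="replace")
    text = raw.strip()
    if not text:
        raise ValueError("text must be non-empty")
    # Whitespace splitting leaves case, attached punctuation, and emoji untouched,
    # so later steps can still see ALL CAPS and "!!!".
    return text.split()


if __name__ == "__main__":
    print(validate_and_tokenize("This product is AMAZING!!! 😍"))
```

Lowercasing or stripping punctuation at this stage would discard the very signals the rules in Phase 2 depend on.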
Phase 2: Core Algorithm
- Look up each token in VADER lexicon (7,500+ sentiment-rated terms)
- Apply grammatical rules: negation ("not good" = negative), degree modifiers ("very good" > "good"), capitalization boost, punctuation amplification
- Compute the positive, negative, and neutral proportions from the rule-adjusted valence scores
- Compute the compound score by normalizing the sum of all valence scores: compound = sum / √(sum² + α), where α = 15
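The steps above can be sketched in a few lines. The normalization follows the formula given here; the toy lexicon, the booster and negation values, and the one-token lookback are hypothetical simplifications for illustration, not VADER's actual lexicon or rule set.

```python
import math
from typing import List

ALPHA = 15  # normalization constant from the formula above


def normalize(valence_sum: float, alpha: float = ALPHA) -> float:
    """Map an unbounded valence sum into [-1, +1]."""
    score = valence_sum / math.sqrt(valence_sum ** 2 + alpha)
    return max(-1.0, min(1.0, score))


# Hypothetical values, chosen only to make the arithmetic easy to follow.
TOY_LEXICON = {"good": 2.0, "bad": -2.0}
BOOSTERS = {"very": 0.3}
NEGATIONS = {"not", "never"}
NEGATION_SCALAR = -0.75


def toy_valence_sum(tokens: List[str]) -> float:
    """Sum lexicon valences, adjusting each by the single preceding token."""
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token.lower())
        if valence is None:
            continue  # token carries no sentiment in the toy lexicon
        previous = tokens[i - 1].lower() if i > 0 else ""
        if previous in BOOSTERS:
            # Degree modifier: push the valence further from zero.
            valence += math.copysign(BOOSTERS[previous], valence)
        elif previous in NEGATIONS:
            # Negation: flip the sign and dampen the magnitude.
            valence *= NEGATION_SCALAR
        total += valence
    return total


if __name__ == "__main__":
    for phrase in ("good", "very good", "not good"):
        print(phrase, normalize(toy_valence_sum(phrase.split())))
```

Because the denominator always exceeds the magnitude of the sum, the compound score approaches but never reaches ±1, and stacking more sentiment words yields diminishing increases.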
Phase 3: Verification
Classify: compound ≥ 0.05 → positive, ≤ -0.05 → negative, else neutral. Spot-check sample results.
Gate: Classifications pass manual spot-check on 10-20 examples.
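A minimal sketch of the classification rule and of drawing a spot-check sample. The function names and the fixed random seed are assumptions made for reproducibility, not part of the source.

```python
import random
from typing import List


def classify(compound: float, threshold: float = 0.05) -> str:
    """Apply the symmetric compound-score cutoffs described above."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def spot_check_sample(results: List[dict], k: int = 15, seed: int = 0) -> List[dict]:
    """Pick up to k scored texts for manual review (10-20 is the suggested range)."""
    rng = random.Random(seed)  # seeded so the same sample can be re-reviewed
    return rng.sample(results, min(k, len(results)))


if __name__ == "__main__":
    scored = [{"text": f"text {i}", "compound": c} for i, c in enumerate([0.8, -0.3, 0.01])]
    for item in spot_check_sample(scored):
        print(item["text"], classify(item["compound"]))
```

Exposing `threshold` as a parameter keeps the cutoff adjustable per use case without touching the scoring step.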
Phase 4: Output
Return compound score and polarity classification per text.
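One way to assemble the per-text results and batch metadata, using the field names from this document's Output Format section. The helper name `build_output` and the two-decimal rounding are assumptions.

```python
import json
from collections import Counter
from typing import List


def build_output(scored: List[dict]) -> dict:
    """Wrap scored texts (each with a 'label' key) and summarize the label mix."""
    counts = Counter(item["label"] for item in scored)
    total = len(scored)
    distribution = {
        # Guard against division by zero when no texts were analyzed.
        label: (round(counts[label] / total, 2) if total else 0.0)
        for label in ("positive", "neutral", "negative")
    }
    return {
        "results": scored,
        "metadata": {"texts_analyzed": total, "distribution": distribution},
    }


if __name__ == "__main__":
    example = [
        {"text": "...", "compound": 0.76, "pos": 0.45, "neu": 0.55, "neg": 0.0, "label": "positive"},
        {"text": "...", "compound": -0.40, "pos": 0.0, "neu": 0.70, "neg": 0.30, "label": "negative"},
    ]
    print(json.dumps(build_output(example), indent=2))
```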
Output Format
```json
{
  "results": [{"text": "...", "compound": 0.76, "pos": 0.45, "neu": 0.55, "neg": 0.0, "label": "positive"}],
  "metadata": {"texts_analyzed": 500, "distribution": {"positive": 0.45, "neutral": 0.35, "negative": 0.20}}
}
```
Examples
Sample I/O
Input: "This product is AMAZING!!! 😍"
Expected: compound ≈ 0.87 (positive). Boosted by: CAPS, !!!, 😍 emoji.
Edge Cases
| Input | Expected | Why |
|---|---|---|
| "Not bad at all" | Positive (~0.43) | Negation flips and dampens the negative valence of "bad" |
| "😂😂😂" | Positive | Emoji mapped in lexicon |
| Empty string | Compound = 0, neutral | No tokens to score |
Gotchas
- Sarcasm is invisible: "Oh great, another meeting" reads as positive. VADER has no sarcasm detection.
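- Negation window: VADER looks for a negator only among the three tokens preceding a sentiment word. In "I do not think this is bad", "not" sits four tokens before "bad", so the negation can be missed and the sentence scored as negative.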
- Emoji coverage: VADER's emoji lexicon may not cover newer emoji. Update or supplement as needed.
- Language limitation: VADER is English-only. For Chinese/Japanese, use language-specific tools (e.g., SnowNLP for Chinese).
- Compound threshold sensitivity: The 0.05 boundary is arbitrary. Adjust thresholds based on your specific use case and tolerance for false positives.
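The negation-window gotcha above can be illustrated with a small sketch. The negator set is a hypothetical subset and the check is simplified; it is not VADER's full negation logic.

```python
from typing import List

NEGATIONS = {"not", "no", "never"}  # hypothetical subset for illustration


def negated_within_window(tokens: List[str], index: int, window: int = 3) -> bool:
    """Return True if a negator appears in the `window` tokens before `index`."""
    start = max(0, index - window)
    return any(token.lower() in NEGATIONS for token in tokens[start:index])


if __name__ == "__main__":
    for sentence in ("Not bad at all", "I do not think this is bad"):
        tokens = sentence.split()
        position = tokens.index("bad")
        print(sentence, "->", negated_within_window(tokens, position))
```

In the second sentence the three tokens before "bad" are "think", "this", and "is", so the earlier "not" is never seen. Widening the window would catch it but would also raise the risk of negating unrelated words.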
References
- For VADER lexicon and rules documentation, see references/vader-rules.md
- For comparison with transformer-based sentiment models, see references/model-comparison.md