opinion-miner

Opinion Miner Tool

Analyze community comments to uncover users' true core viewpoints and positions.

When to Use This Skill

Use this skill when a user wants to understand what a community thinks about a topic: not just "what they said", but "what positions they hold and where they disagree". The goal is to turn raw comments into structured insights.
Typical triggers:
  • "What are people saying about X on Reddit/Bilibili/GitHub?"
  • "Analyze the comments on this post"
  • "What are the main points of disagreement here?"
  • "Help me understand the community's attitude towards this issue"
  • "Run an opinion analysis on this topic"

Supported Data Sources

Data Source | Scraping Method
--- | ---
Bilibili video comments | agent-browser (requires JS rendering), or the Bilibili API via webfetch
Reddit posts | old.reddit.com or the Reddit JSON API (/.json) via webfetch
GitHub Issues comments | GitHub API (/repos/owner/repo/issues/N/comments) via webfetch

If the user provides a URL, first identify the platform type, then use the corresponding scraping method.
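
Identifying the platform from a URL can be sketched as a hostname check. This is a minimal illustration, not part of the skill itself: the function name `detect_platform` and the returned labels are assumptions, and short-link domains are not handled.

```python
from urllib.parse import urlparse

# Hypothetical mapping from registered domain to a platform label.
KNOWN_PLATFORMS = (
    ("bilibili.com", "bilibili"),
    ("reddit.com", "reddit"),
    ("github.com", "github"),
)

def detect_platform(url: str) -> str:
    """Return 'bilibili', 'reddit', 'github', or 'unknown' for a URL."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in KNOWN_PLATFORMS:
        # Accept the bare domain and any subdomain (www., old., api., ...).
        if host == domain or host.endswith("." + domain):
            return platform
    return "unknown"
```

Matching on the parsed hostname rather than on a substring of the whole URL avoids false positives such as a query parameter that happens to contain "reddit.com".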

Workflow

Step 1: Scrape Comments

Collect all comments from the given URL. Save the raw data to comments_raw.json using the following structure:
[
  {
    "id": "unique-id",
    "author": "username",
    "text": "comment body",
    "likes": 0,
    "replies": [],
    "timestamp": "2026-01-15T10:30:00Z"
  }
]
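
Before moving on, it can help to check that every scraped record matches the structure above. The sketch below is one possible check, assuming replies share the shape of top-level comments (the source does not say so explicitly); `validate_comment` and `REQUIRED_FIELDS` are illustrative names.

```python
# Field names and types taken from the comments_raw.json example above.
REQUIRED_FIELDS = {
    "id": str, "author": str, "text": str,
    "likes": int, "replies": list, "timestamp": str,
}

def validate_comment(record, path: str = "comment") -> list[str]:
    """Return a list of problems found in one record (empty when valid)."""
    if not isinstance(record, dict):
        return [f"{path}: expected an object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"{path}: missing '{field}'")
        elif not isinstance(record[field], expected):
            problems.append(f"{path}.{field}: expected {expected.__name__}")
    # Assumption: nested replies use the same record shape.
    replies = record.get("replies")
    if isinstance(replies, list):
        for i, reply in enumerate(replies):
            problems.extend(validate_comment(reply, f"{path}.replies[{i}]"))
    return problems
```
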
Platform-specific scraping methods:
Bilibili: First try the comment API at https://api.bilibili.com/x/v2/reply/main?type=1&oid={video_id}&mode=3&ps=20&pn={page} and page through the results until no comments remain. If the API fails, fall back to agent-browser:
agent-browser open "https://www.bilibili.com/video/BVxxxxx" && agent-browser wait --load networkidle
agent-browser snapshot -i
Then scroll the page and extract comments via DOM snapshots.
Reddit: Use the JSON API by appending .json to any Reddit URL:
webfetch "https://www.reddit.com/r/subreddit/comments/postid.json?limit=500"
Parse the nested tree structure. Keep replies nested in the raw data, but flatten them for clustering (replies usually repeat the parent comment's viewpoint).
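
Flattening the Reddit tree can be sketched as a recursive walk over the comment Listing (the second element of the `.json` response). The mapping of Reddit's `score` to `likes` and the extra `parent_id` field are assumptions made for this illustration; `flatten_reddit` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def flatten_reddit(listing: dict) -> list[dict]:
    """Walk a Reddit comment Listing depth-first and return flat records."""
    flat = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # skip "more" placeholders
            continue
        data = child["data"]
        created = datetime.fromtimestamp(data["created_utc"], tz=timezone.utc)
        flat.append({
            "id": data["id"],
            "author": data.get("author"),
            "text": data.get("body", ""),
            "likes": data.get("score", 0),      # assumption: score stands in for likes
            "parent_id": data.get("parent_id"),  # kept so thread shape is recoverable
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        })
        replies = data.get("replies")
        if isinstance(replies, dict):  # Reddit sends "" when there are no replies
            flat.extend(flatten_reddit(replies))
    return flat
```

Keeping `parent_id` on each flat record means a later step can still see which reply answered which comment, even though the nesting is gone.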
GitHub Issues: Use the GitHub API:
webfetch "https://api.github.com/repos/owner/repo/issues/issue_number/comments?per_page=100"
Use &page=N for pagination. Also retrieve the issue body, which defines the context of the discussion.
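
Page-numbered pagination of this kind can be sketched as a loop that stops at the first short page. Here `fetch_page` is a hypothetical callable that wraps the webfetch call for page N and returns the parsed list of comments; `max_pages` is an assumed safety cap, not something the source specifies.

```python
from typing import Callable

def fetch_all_pages(fetch_page: Callable[[int], list],
                    per_page: int = 100,
                    max_pages: int = 50) -> list:
    """Collect items from a page-numbered endpoint until a short page appears."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        items.extend(batch)
        if len(batch) < per_page:  # a short (or empty) page is the last one
            break
    return items
```

Injecting the fetch function keeps the loop testable without network access, and the same loop can serve any endpoint that takes a page number.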

Step 2: Preprocessing

Clean the raw comments before analysis:
  1. Remove bot comments, spam, and meaningless comments (e.g., "+1", "bump", single emojis)
  2. If there are more than 500 comments, sample strategically: take the high-like comments plus a random sample of medium-popularity comments to capture minority viewpoints
  3. Retain metadata (like count, author), which helps judge which viewpoints are more popular
Save the cleaned data to comments_cleaned.json.
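
The filtering and sampling steps above might look like the sketch below. The 60/40 split between top-liked and randomly drawn comments, the reading of "medium popularity" as the better-liked half of the remainder, and all function names are assumptions; bot and spam detection is not covered here.

```python
import random

def is_noise(text: str) -> bool:
    """True for comments that carry no argument, such as +1, bump, or emoji only."""
    stripped = text.strip().lower()
    if stripped in {"+1", "bump"}:
        return True
    # No letter or digit at all means emoji-only or punctuation-only.
    return not any(ch.isalnum() for ch in stripped)

def sample_comments(comments, limit=500, top_share=0.6, seed=0):
    """Keep the most-liked comments plus a random draw from the middle tier."""
    if len(comments) <= limit:
        return list(comments)
    ranked = sorted(comments, key=lambda c: c["likes"], reverse=True)
    top_n = int(limit * top_share)
    top, rest = ranked[:top_n], ranked[top_n:]
    # Middle tier: the better-liked half of the rest, widened if that half
    # is too small to fill the sample.
    middle = rest[: max(len(rest) // 2, limit - top_n)]
    return top + random.Random(seed).sample(middle, limit - top_n)

def preprocess(raw_comments):
    """Drop noise, then sample down to the limit if needed."""
    cleaned = [c for c in raw_comments if not is_noise(c["text"])]
    return sample_comments(cleaned)
```

A fixed seed makes the sample reproducible, so rerunning the analysis on the same raw file yields the same cleaned set.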

Step 3: Semantic Clustering

Read all cleaned comments and group them by semantic similarity — comments expressing the same underlying argument are grouped together, even if the wording is completely different.
Efficient Clustering Methods:
  • Read comments in batches (50-100 at a time) and perform the first round of grouping
  • Merge clusters by comparing results across batches — same argument = same cluster
  • Each cluster should represent a unique position or argument, not just a topic
  • Name each cluster with a concise argument statement (rather than a topic label)
Cluster Naming Guidelines: Each cluster name should be an argument, not a topic.
  • Good example: "This feature breaks backward compatibility and should be made optional"
  • Bad example: "Backward compatibility concerns"
Save the clustering results to clusters.json:
[
  {
    "cluster_id": 1,
    "name": "Concise argument statement",
    "comment_count": 45,
    "representative_comments": ["full text of 2-3 best examples"],
    "support_ratio": 0.7,
    "sample_comment_ids": ["id1", "id2", "id3"]
  }
]
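
Once each comment has been assigned to an argument, the records above can be assembled mechanically. The sketch below assumes an `assignments` mapping from comment id to cluster name, produced by the semantic reading step; choosing the most-liked comments as representatives is also an assumption. It leaves out `support_ratio`, since the source does not define how that value is computed.

```python
from collections import defaultdict

def build_clusters(comments: list[dict],
                   assignments: dict[str, str],
                   examples: int = 3) -> list[dict]:
    """Group comments by assigned argument and emit clusters.json-style records."""
    groups = defaultdict(list)
    for comment in comments:
        name = assignments.get(comment["id"])
        if name is not None:  # unassigned comments are skipped
            groups[name].append(comment)

    # Largest clusters first; cluster_id follows that order.
    ordered = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    clusters = []
    for cluster_id, (name, members) in enumerate(ordered, start=1):
        best = sorted(members, key=lambda c: c["likes"], reverse=True)[:examples]
        clusters.append({
            "cluster_id": cluster_id,
            "name": name,
            "comment_count": len(members),
            "representative_comments": [c["text"] for c in best],
            "sample_comment_ids": [c["id"] for c in best],
        })
    return clusters
```
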

Step 4: Debate Analysis

For each cluster, identify the following:
  1. Position: What exactly is this group arguing for?
  2. Arguments: What facts, experiences, or logic do they cite?
  3. Belief Strength: Are they certain or hesitant? Judge from language cues and like counts
  4. Relationship with Other Clusters: Is this an opposing view to another cluster? Or a supplement or extension?
Then synthesize all clusters to identify:
  • Core Debate Axes: Fundamental disagreements (e.g., "Security vs. Convenience", "Innovation vs. Stability")
  • Consensus Points: Points agreed upon by most clusters
  • Disagreement Points: Points where the community has clear opposition
  • Minority Viewpoints: Viewpoints held by few but with strong arguments

Step 5: Generate Report

Output a Markdown report using the following template:

[Topic] Community Opinion Analysis

Data Source: [URL]
Total Comments: N (M valid comments analyzed)
Generated: YYYY-MM-DD

Summary

[2-3 sentences summarizing the community's overall attitude and main disagreements]

Core Debate Points

[Describe the 1-2 most fundamental axes of disagreement and explain why they are the focus of the debate]

Viewpoint Clusters

Viewpoint 1: [Argument Statement]

  • Proportion: ~X% (about N comments)
  • Core Arguments: [Main reasons supporting this viewpoint]
  • Typical Comments: [1-2 representative original comments]
  • Popularity: [Like count/support level]

Viewpoint 2: [Argument Statement]

...

Consensus & Disagreements

Consensus

  • [Points agreed upon by most people]

Disagreements

  • [Main opposing points, which viewpoints are in direct conflict]

Minority Viewpoints

  • [Viewpoints held by few but with strong arguments, worthy of attention]

Sentiment Analysis

  • Overall Sentiment: [Positive/Negative/Neutral/Mixed]
  • Sentiment Intensity: [Intense/Mild]
  • Sentiment Trend: [If timeline data is available]

If the user also requests JSON output, save the structured data as well.
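
Filling the "Viewpoint Clusters" part of the template from cluster records can be sketched as below. The heading levels (`##`, `###`) and the `-` bullet marker are assumptions, since the template's original Markdown markup is not visible here; `render_viewpoints` is an illustrative name, and the core-arguments and popularity lines are left for the analysis step to write.

```python
def render_viewpoints(clusters: list[dict], total_valid: int) -> str:
    """Render the Viewpoint Clusters section of the report as Markdown."""
    lines = ["## Viewpoint Clusters", ""]
    for n, cluster in enumerate(clusters, start=1):
        count = cluster["comment_count"]
        share = round(100 * count / total_valid) if total_valid else 0
        lines.append(f"### Viewpoint {n}: {cluster['name']}")
        lines.append("")
        lines.append(f"- Proportion: ~{share}% (about {count} comments)")
        # The template asks for 1-2 representative original comments.
        for quote in cluster["representative_comments"][:2]:
            lines.append(f"- Typical comment: {quote}")
        lines.append("")
    return "\n".join(lines)
```
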

Analysis Tips

  • Don't just count votes — a high-like minority viewpoint may be more important than a low-participation majority position
  • Pay attention to how people argue, not just what they say. Sarcasm, emotional language, and defensive statements all indicate strong positions
  • Look for implied arguments — sometimes the real disagreement is not explicitly stated (e.g., people arguing about implementation details may actually be arguing about priorities)
  • Cross-reference with replies — a reply strongly opposing a parent comment reveals the debate structure
  • If comments involve multiple languages, cluster by argument (regardless of language), then note the language distribution in each cluster
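
The first tip above can be made concrete by comparing each cluster's share of comments with its average likes. This is only a rough proxy, since likes say nothing about argument quality. The `total_likes` field is hypothetical (it is not part of clusters.json as described), and the 15% share threshold is an arbitrary assumption.

```python
def engagement_outliers(clusters: list[dict], max_share: float = 0.15) -> list[dict]:
    """Flag small clusters whose comments draw above-average likes."""
    total_comments = sum(c["comment_count"] for c in clusters)
    total_likes = sum(c["total_likes"] for c in clusters)
    overall_avg = total_likes / total_comments if total_comments else 0.0

    flagged = []
    for c in clusters:
        if not c["comment_count"]:
            continue
        share = c["comment_count"] / total_comments
        avg_likes = c["total_likes"] / c["comment_count"]
        # Small share of comments, but each comment is liked more than average.
        if share <= max_share and avg_likes > overall_avg:
            flagged.append({
                "name": c["name"],
                "share": round(share, 3),
                "avg_likes": round(avg_likes, 1),
            })
    return flagged
```

Clusters flagged this way are candidates for the "Minority Viewpoints" section of the report; whether their arguments are actually strong still needs a human or model reading.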