Opinion Miner Tool
Analyze community comments to uncover users' true core viewpoints and positions.
When to Use This Skill
Use this skill when users want to understand the community's opinions on a topic — not just "what they said", but "what positions they hold and where disagreements lie". The goal is to transform raw comments into structured insights.
Typical Trigger Scenarios:
- "What are people saying about X on Reddit/Bilibili/GitHub?"
- "Analyze the comments on this post"
- "What are the main points of disagreement here?"
- "Help me understand the community's attitude towards this issue"
- "Conduct opinion analysis on this topic"
Supported Data Sources
| Data Source | Scraping Method |
|---|---|
| Bilibili video comments | agent-browser (requires JS rendering), or call the Bilibili API via webfetch |
| Reddit posts | Fetch the post page or use the Reddit JSON API (.json) via webfetch |
| GitHub issue comments | Call the GitHub API (/repos/owner/repo/issues/N/comments) via webfetch |
If the user provides a URL, first identify the platform type, then use the corresponding method for scraping.
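Platform identification can be sketched as a hostname lookup. This is a minimal illustration, not part of the skill itself: the function name `detect_platform`, the platform labels, and the inclusion of the short-link domains `b23.tv` and `redd.it` are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Hostname suffix -> platform label. The labels are illustrative names,
# not identifiers defined by this skill.
PLATFORM_HOSTS = {
    "bilibili.com": "bilibili",
    "b23.tv": "bilibili",
    "reddit.com": "reddit",
    "redd.it": "reddit",
    "github.com": "github",
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform label for a URL, or None if it is unsupported."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, platform in PLATFORM_HOSTS.items():
        # Match the bare domain or any subdomain (www., old., api., ...).
        if host == suffix or host.endswith("." + suffix):
            return platform
    return None
```

Matching on the hostname suffix rather than a substring keeps look-alike domains from being routed to the wrong scraper.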
Workflow
Step 1: Scrape Comments
Collect all comments from the given URL. Save the raw data as a JSON file using the following structure:
json
[
{
"id": "unique-id",
"author": "username",
"text": "comment body",
"likes": 0,
"replies": [],
"timestamp": "2026-01-15T10:30:00Z"
}
]
Platform-Specific Scraping Methods:
Bilibili: First try the comment API:
https://api.bilibili.com/x/v2/reply/main?type=1&oid={video_id}&mode=3&ps=20&pn={page}
Fetch the paginated results until all comments are retrieved. If the API fails, fall back to agent-browser:
agent-browser open "https://www.bilibili.com/video/BVxxxxx" && agent-browser wait --load networkidle
agent-browser snapshot -i
Then scroll the page and extract comments via DOM snapshots.
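Whichever route succeeds, each Bilibili reply has to be mapped onto the raw-comment structure from Step 1. The sketch below assumes the API returns reply objects with the fields `rpid`, `member.uname`, `content.message`, `like`, `ctime` (Unix seconds), and `replies`. These field names reflect the commonly observed response shape and are assumptions, so verify them against a real response before relying on them. The function name `normalize_bilibili_reply` is hypothetical.

```python
from datetime import datetime, timezone


def normalize_bilibili_reply(reply: dict) -> dict:
    """Map one Bilibili reply object onto the raw-comment schema.

    All source field names here are assumptions about the response shape.
    """
    created = datetime.fromtimestamp(reply.get("ctime", 0), tz=timezone.utc)
    return {
        "id": str(reply.get("rpid", "")),
        "author": (reply.get("member") or {}).get("uname", ""),
        "text": (reply.get("content") or {}).get("message", ""),
        "likes": reply.get("like", 0),
        # Nested replies may be null in the response, hence the `or []`.
        "replies": [normalize_bilibili_reply(r) for r in reply.get("replies") or []],
        "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```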
Reddit: Use the JSON API by appending ".json" to any Reddit post URL:
webfetch "https://www.reddit.com/r/subreddit/comments/postid.json?limit=500"
Parse the nested tree structure. Include replies as nested comments, but flatten them during clustering (replies usually repeat the parent comment's viewpoint).
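The parse-then-flatten step can be sketched as two small functions. The sketch assumes the usual shape of a Reddit post's JSON: a two-element list whose second element is the comment Listing, where each comment has `kind` equal to `"t1"` and its `replies` field is either an empty string or another Listing. The function names are hypothetical, and `score` is used as the like count by assumption.

```python
from datetime import datetime, timezone
from typing import List


def parse_reddit_comments(listing: dict) -> List[dict]:
    """Convert a Reddit comment Listing into nested raw-comment records."""
    comments = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # skip "more" placeholders
            continue
        data = child["data"]
        replies = data.get("replies")  # "" when a comment has no replies
        created = datetime.fromtimestamp(data.get("created_utc", 0), tz=timezone.utc)
        comments.append({
            "id": data["id"],
            "author": data.get("author", ""),
            "text": data.get("body", ""),
            "likes": data.get("score", 0),
            "replies": parse_reddit_comments(replies) if isinstance(replies, dict) else [],
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        })
    return comments


def flatten_comments(comments: List[dict]) -> List[dict]:
    """Return every comment and reply as one flat list, parents first."""
    flat = []
    for comment in comments:
        flat.append({**comment, "replies": []})
        flat.extend(flatten_comments(comment["replies"]))
    return flat
```

Given the decoded response as `payload`, the nested tree would be `parse_reddit_comments(payload[1])`, and `flatten_comments` would be applied only when preparing input for clustering.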
GitHub Issues: Use the GitHub API:
webfetch "https://api.github.com/repos/owner/repo/issues/issue_number/comments?per_page=100"
Page through the results until all comments are retrieved. Also retrieve the issue body, since it defines the context of the discussion.
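A pagination loop for this endpoint might look like the following sketch. To stay independent of any particular fetch tool, it takes a hypothetical `fetch_page` callable that returns the decoded JSON list for a given page number, for example by adding a page-number query parameter to the URL above. The helper names and the choice of the `+1` reaction count as "likes" are assumptions.

```python
from typing import Callable, List


def fetch_all_pages(fetch_page: Callable[[int], List[dict]],
                    per_page: int = 100, max_pages: int = 50) -> List[dict]:
    """Collect items page by page until a short page signals the end."""
    items: List[dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        items.extend(batch)
        if len(batch) < per_page:  # a short or empty page is the last one
            break
    return items


def normalize_github_comment(comment: dict) -> dict:
    """Map one GitHub issue comment onto the raw-comment schema."""
    return {
        "id": str(comment["id"]),
        "author": (comment.get("user") or {}).get("login", ""),
        "text": comment.get("body") or "",
        # Assumption: treat the thumbs-up reaction count as "likes".
        "likes": (comment.get("reactions") or {}).get("+1", 0),
        "replies": [],  # issue comments form a flat list
        "timestamp": comment.get("created_at", ""),
    }
```

The `max_pages` cap is a safety limit so a misbehaving fetcher cannot loop indefinitely.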
Step 2: Preprocessing
Clean the raw comments before analysis:
- Remove bot comments, spam, and meaningless comments (e.g., "+1", "bump", single emojis)
- If there are more than 500 comments, sample strategically: keep the most-liked comments, then randomly sample medium-popularity comments to capture minority viewpoints
- Retain metadata (like count, author), since it helps judge which viewpoints are more popular
Save the cleaned data to a separate JSON file so the raw data stays intact.
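The cleaning and sampling rules above can be sketched as follows. Everything here is illustrative: the noise pattern covers only the examples named in the list, the bot check relies on an assumed `[bot]` author suffix plus a small hypothetical allow-list, and the split of 300 top comments plus 200 random ones is an arbitrary choice. For simplicity the random draw comes from all remaining comments rather than a strict medium-popularity band.

```python
import random
import re
from typing import List

NOISE_PATTERN = re.compile(r"(\+1|bump)[.!]*", re.IGNORECASE)
KNOWN_BOTS = {"AutoModerator"}  # illustrative, extend per platform


def is_noise(comment: dict) -> bool:
    """Return True for bot comments and for comments with no real content."""
    text = comment["text"].strip()
    author = comment.get("author", "")
    if author in KNOWN_BOTS or author.endswith("[bot]"):
        return True
    if NOISE_PATTERN.fullmatch(text):
        return True
    # No letters or digits at all: empty, emoji-only, or punctuation-only.
    return not any(ch.isalnum() for ch in text)


def sample_comments(comments: List[dict], limit: int = 500,
                    top_n: int = 300, seed: int = 0) -> List[dict]:
    """Keep the top_n most-liked comments plus a random draw from the rest."""
    if len(comments) <= limit:
        return list(comments)
    ranked = sorted(comments, key=lambda c: c["likes"], reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return top + rng.sample(rest, limit - top_n)
```

A typical call would be `sample_comments([c for c in raw if not is_noise(c)])`.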
Step 3: Semantic Clustering
Read all cleaned comments and group them by semantic similarity — comments expressing the same underlying argument are grouped together, even if the wording is completely different.
Efficient Clustering Methods:
- Read comments in batches (50-100 at a time) and perform the first round of grouping
- Merge clusters across batches: clusters that make the same argument become a single cluster
- Each cluster should represent a unique position or argument, not just a topic
- Name each cluster with a concise argument statement (instead of a hashtag)
Cluster Naming Guidelines: Each cluster name should be an argument, not a topic.
- Good example: "This feature breaks backward compatibility and should be made optional"
- Bad example: "Backward compatibility concerns"
Save the clustering results as a JSON file using the following structure:
json
[
{
"cluster_id": 1,
"name": "Concise argument statement",
"comment_count": 45,
"representative_comments": ["full text of 2-3 best examples"],
"support_ratio": 0.7,
"sample_comment_ids": ["id1", "id2", "id3"]
}
]
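The batch-then-merge approach can be sketched as below. The semantic judgment itself is not something this code performs: `same_argument` is a hypothetical callback standing in for the decision, made while reading, that two cluster names express the same argument. The intermediate `comment_ids` field is also an assumption, and only a subset of the output fields is produced.

```python
from typing import Callable, List


def merge_batch_clusters(batches: List[List[dict]],
                         same_argument: Callable[[str, str], bool]) -> List[dict]:
    """Merge per-batch clusters whose names express the same argument."""
    merged: List[dict] = []
    for batch in batches:
        for cluster in batch:
            match = next(
                (m for m in merged if same_argument(m["name"], cluster["name"])),
                None,
            )
            if match is None:
                merged.append({"name": cluster["name"],
                               "comment_ids": list(cluster["comment_ids"])})
            else:
                match["comment_ids"].extend(cluster["comment_ids"])
    # Emit records that follow the cluster structure shown above.
    return [
        {
            "cluster_id": i,
            "name": m["name"],
            "comment_count": len(m["comment_ids"]),
            "sample_comment_ids": m["comment_ids"][:3],
        }
        for i, m in enumerate(merged, start=1)
    ]
```

The first name seen for an argument is kept as the cluster name; it can be reworded into a sharper argument statement after merging.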
Step 4: Debate Analysis
For each cluster, identify the following:
- Position: What exactly is this group arguing for?
- Arguments: What facts, experiences, or logic do they cite?
- Belief Strength: Are they definitive or hesitant? Use language cues and like counts to judge
- Relationship with Other Clusters: Is this an opposing view to another cluster? Or a supplement or extension?
Then synthesize all clusters to identify:
- Core Debate Axes: Fundamental disagreements (e.g., "Security vs. Convenience", "Innovation vs. Stability")
- Consensus Points: Points agreed upon by most clusters
- Disagreement Points: Points where the community has clear opposition
- Minority Viewpoints: Viewpoints held by few but with strong arguments
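Whether a minority viewpoint has strong arguments still has to be judged by reading it, but candidates can be shortlisted numerically. The sketch below flags clusters that hold a small share of comments yet earn well above the average likes per comment. The `total_likes` field and both thresholds are assumptions introduced for illustration, and likes are only a rough proxy for argument strength.

```python
from typing import List


def find_minority_candidates(clusters: List[dict], max_share: float = 0.1,
                             like_factor: float = 1.5) -> List[str]:
    """Return names of small clusters with unusually high likes per comment."""
    total_comments = sum(c["comment_count"] for c in clusters)
    if total_comments == 0:
        return []
    overall_avg = sum(c["total_likes"] for c in clusters) / total_comments
    candidates = []
    for c in clusters:
        if c["comment_count"] == 0:
            continue
        share = c["comment_count"] / total_comments
        avg_likes = c["total_likes"] / c["comment_count"]
        if share <= max_share and avg_likes >= like_factor * overall_avg:
            candidates.append(c["name"])
    return candidates
```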
Step 5: Generate Report
Output a Markdown report using the following template:
markdown
# [Topic] Community Opinion Analysis
> Data Source: [URL]
> Total Comments: N (M valid comments analyzed)
> Generated Time: YYYY-MM-DD
## Summary
[2-3 sentences summarizing the community's overall attitude and main disagreements]
## Core Debate Points
[Describe the 1-2 most core disagreement axes, explain why these are the focus of the debate]
## Viewpoint Clusters
### Viewpoint 1: [Argument Statement]
- **Proportion**: ~X% (about N comments)
- **Core Arguments**: [Main reasons supporting this viewpoint]
- **Typical Comments**: [1-2 representative original comments]
- **Popularity**: [Like count/support level]
### Viewpoint 2: [Argument Statement]
...
## Consensus & Disagreements
### Consensus
- [Points agreed upon by most people]
### Disagreements
- [Main opposing points, which viewpoints are in direct conflict]
## Minority Viewpoints
- [Viewpoints held by few but with strong arguments, worthy of attention]
## Sentiment Analysis
- **Overall Sentiment**: [Positive/Negative/Neutral/Mixed]
- **Sentiment Intensity**: [Intense/Mild]
- **Sentiment Trend**: [If timeline data is available]
If the user also requests JSON output, save the structured data as well.
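The "Viewpoint Clusters" section of the template can be filled in mechanically from the clustering results. This is a minimal sketch: the function name `render_viewpoints` is hypothetical, ordering clusters by comment count is an assumption, and the argument and popularity lines are left for the analysis step to write.

```python
from typing import List


def render_viewpoints(clusters: List[dict]) -> str:
    """Render the Viewpoint Clusters section as Markdown text."""
    total = sum(c["comment_count"] for c in clusters)
    ranked = sorted(clusters, key=lambda c: c["comment_count"], reverse=True)
    lines = ["## Viewpoint Clusters"]
    for i, c in enumerate(ranked, start=1):
        pct = round(100 * c["comment_count"] / total) if total else 0
        quotes = " / ".join(f'"{q}"' for q in c["representative_comments"][:2])
        lines.append(f"### Viewpoint {i}: {c['name']}")
        lines.append(f"- **Proportion**: ~{pct}% (about {c['comment_count']} comments)")
        lines.append(f"- **Typical Comments**: {quotes}")
    return "\n".join(lines)
```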
Analysis Tips
- Don't just count votes — a high-like minority viewpoint may be more important than a low-participation majority position
- Pay attention to how people argue, not just what they say. Sarcasm, emotional language, and defensive statements all indicate strong positions
- Look for implied arguments — sometimes the real disagreement is not explicitly stated (e.g., people arguing about implementation details may actually be arguing about priorities)
- Cross-reference with replies — a reply strongly opposing a parent comment reveals the debate structure
- If comments involve multiple languages, cluster by argument (regardless of language), then note the language distribution in each cluster
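For the multilingual case, the per-cluster language note reduces to a simple tally once each comment carries a language tag. The `lang` field is a hypothetical addition to the comment structure; how it gets populated (by the analyst or by a detection library) is outside this sketch.

```python
from collections import Counter
from typing import Dict, List


def language_distribution(comments: List[dict]) -> Dict[str, float]:
    """Return each language's share of the comments in one cluster."""
    counts = Counter(c.get("lang", "unknown") for c in comments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: round(n / total, 2) for lang, n in counts.items()}
```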