opinion-miner

Original language: Chinese (translated)

Analyze community opinions from forums and comment sections. Scrapes comments from Bilibili, Reddit, or GitHub Issues, clusters them by semantic similarity, and extracts the core arguments, debates, and viewpoints. Produces a structured report showing what the community actually thinks — not just a summary of comments, but the underlying positions people hold and where the real disagreements are. Use this skill when the user wants to understand public opinion on a topic, find the main points of contention in a discussion, or do competitive/event research from community sources. Triggers include requests to "analyze comments", "what are people saying about X", "summarize the debate", "find the key arguments", "what's the community consensus", or any task involving opinion extraction from forum or comment data.

14 installs

Source: lumincui/skills

Added on: 2026-04-26

NPX Install

npx skill4agent add lumincui/skills opinion-miner

SKILL.md Content (Chinese)


Opinion Miner Tool

Analyze community comments to uncover users' true core viewpoints and positions.

When to Use This Skill

Use this skill when users want to understand the community's opinions on a topic — not just "what they said", but "what positions they hold and where disagreements lie". The goal is to transform raw comments into structured insights.

Typical Trigger Scenarios:

"What are people saying about X on Reddit/Bilibili/GitHub?"
"Analyze the comments on this post"
"What are the main points of disagreement here?"
"Help me understand the community's attitude towards this issue"
"Conduct opinion analysis on this topic"

Supported Data Sources

| Data Source | Scraping Method |
| --- | --- |
| Bilibili video comments | `agent-browser` (requires JS rendering), or call the Bilibili API via `webfetch` |
| Reddit posts | Access `old.reddit.com` or the Reddit JSON API (`/.json`) via `webfetch` |
| GitHub Issues comments | Call the GitHub API (`/repos/owner/repo/issues/N/comments`) via `webfetch` |

If the user provides a URL, first identify the platform type, then use the corresponding method for scraping.
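The platform check can be sketched as a hostname lookup. This is a minimal illustration, not part of the skill itself: the function name `detect_platform` and the `PLATFORM_HOSTS` mapping are hypothetical, and only the three platforms listed in the table are covered.

```python
from urllib.parse import urlparse

# Hypothetical mapping from platform name to its base domain.
PLATFORM_HOSTS = {
    "bilibili": "bilibili.com",
    "reddit": "reddit.com",
    "github": "github.com",
}


def detect_platform(url: str) -> str:
    """Return 'bilibili', 'reddit', 'github', or 'unknown' for a URL."""
    host = (urlparse(url).hostname or "").lower()
    for platform, domain in PLATFORM_HOSTS.items():
        # Match the bare domain and any subdomain (www., old., api., ...).
        if host == domain or host.endswith("." + domain):
            return platform
    return "unknown"
```

Matching on the parsed hostname rather than on a substring of the whole URL avoids false positives when a platform name only appears in the path or query string.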

Workflow

Step 1: Scrape Comments

Collect all comments from the given URL. Save the raw data to `comments_raw.json` using the following structure:

json

[
  {
    "id": "unique-id",
    "author": "username",
    "text": "comment body",
    "likes": 0,
    "replies": [],
    "timestamp": "2026-01-15T10:30:00Z"
  }
]
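As a rough sketch, the record structure above can be mirrored in Python with a `TypedDict` and a pair of save/load helpers. The names `Comment`, `save_comments`, and `load_comments` are illustrative assumptions, not something the skill defines.

```python
import json
from typing import List, TypedDict


class Comment(TypedDict):
    """One comment record, matching the JSON structure above."""
    id: str
    author: str
    text: str
    likes: int
    replies: List["Comment"]  # nested replies use the same shape
    timestamp: str            # ISO 8601 string, e.g. "2026-01-15T10:30:00Z"


def save_comments(comments: List[Comment], path: str = "comments_raw.json") -> None:
    """Write comments as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(comments, f, ensure_ascii=False, indent=2)


def load_comments(path: str = "comments_raw.json") -> List[Comment]:
    """Read comments back from a JSON file written by save_comments."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Passing `ensure_ascii=False` keeps Chinese comment text human-readable in the saved file instead of escaping it to `\uXXXX` sequences.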

Platform-Specific Scraping Methods:

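Bilibili: First try the comment API at `https://api.bilibili.com/x/v2/reply/main?type=1&oid={video_id}&mode=3&ps=20&pn={page}`, paging through the results until all comments are retrieved. Note that `{video_id}` is passed as `oid`, which for videos is normally the numeric AV ID rather than the BV string. If the API fails, fall back to `agent-browser`: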

agent-browser open "https://www.bilibili.com/video/BVxxxxx" && agent-browser wait --load networkidle
agent-browser snapshot -i

Then scroll the page and extract comments via DOM snapshots.

Reddit: Use the JSON API by appending `.json` to any Reddit URL:

webfetch "https://www.reddit.com/r/subreddit/comments/postid.json?limit=500"

Parse the nested tree structure. Keep replies nested in the raw data, but flatten them for clustering (replies often restate the parent comment's viewpoint).
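The tree parsing and flattening can be sketched as two small recursive functions. This assumes the usual shape of a Reddit post's `.json` response: a two-element array whose second element is a comment Listing, with comments as `kind: "t1"` children and `replies` set to either another Listing or an empty string. The function names, and the choice of `score` as the like count, are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def parse_reddit_listing(listing: Any) -> List[Dict[str, Any]]:
    """Convert a Reddit comment Listing into nested comment records."""
    if not isinstance(listing, dict):  # "" is used when there are no replies
        return []
    comments = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # skip "more" placeholders
            continue
        data = child["data"]
        created = datetime.fromtimestamp(data.get("created_utc", 0), tz=timezone.utc)
        comments.append({
            "id": data.get("id", ""),
            "author": data.get("author", ""),
            "text": data.get("body", ""),
            "likes": data.get("score", 0),
            "replies": parse_reddit_listing(data.get("replies")),
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        })
    return comments


def flatten_comments(comments: List[Dict[str, Any]],
                     parent_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Depth-first flatten; each record keeps a parent_id instead of replies."""
    flat = []
    for comment in comments:
        record = {k: v for k, v in comment.items() if k != "replies"}
        record["parent_id"] = parent_id
        flat.append(record)
        flat.extend(flatten_comments(comment["replies"], comment["id"]))
    return flat
```

With a parsed response `payload`, the nested tree would be `parse_reddit_listing(payload[1])`. Keeping `parent_id` on the flattened records preserves enough structure to trace reply chains later.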

GitHub Issues: Use the GitHub API:

webfetch "https://api.github.com/repos/owner/repo/issues/issue_number/comments?per_page=100"

Use `&page=N` for pagination. Also retrieve the issue body, since it defines the context of the discussion.
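A pagination loop for the endpoint above might look like the following sketch. The `fetch_json` argument is a hypothetical callable standing in for whatever actually performs the request (for example a wrapper around `webfetch`). The mapping onto the comment structure, including the use of the `+1` reaction count as `likes`, is an assumption.

```python
from typing import Any, Callable, Dict, List


def fetch_issue_comments(owner: str, repo: str, issue_number: int,
                         fetch_json: Callable[[str], List[Dict[str, Any]]],
                         per_page: int = 100) -> List[Dict[str, Any]]:
    """Page through an issue's comments until a short page is returned."""
    base = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/comments"
    results: List[Dict[str, Any]] = []
    page = 1
    while True:
        batch = fetch_json(f"{base}?per_page={per_page}&page={page}")
        for item in batch:
            results.append({
                "id": str(item["id"]),
                "author": (item.get("user") or {}).get("login", ""),
                "text": item.get("body") or "",
                # Assumption: thumbs-up reactions serve as the like count.
                "likes": (item.get("reactions") or {}).get("+1", 0),
                "replies": [],  # issue comments are a flat list
                "timestamp": item.get("created_at", ""),
            })
        if len(batch) < per_page:  # last page reached
            break
        page += 1
    return results
```

Injecting the fetcher keeps the loop independent of any particular HTTP client and easy to exercise with canned pages.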

Step 2: Preprocessing

Clean the raw comments before analysis:

Remove bot comments, spam, and meaningless comments (e.g., "+1", "bump", single emojis)
If there are more than 500 comments, sample strategically: keep the most-liked comments and add a random sample of medium-popularity comments so that minority viewpoints are still captured
Retain metadata (like count, author) — helps judge which viewpoints are more popular

Save the cleaned data to `comments_cleaned.json`.
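The cleaning rules above could be approximated as follows. This is a sketch under stated assumptions: the noise pattern covers only the two phrases named in the text, the `[bot]` suffix check is a heuristic based on how GitHub app accounts are named, and the 50/50 split between top-liked and randomly sampled comments is an arbitrary illustrative choice. For simplicity the random draw covers every comment outside the top slice, not only medium-popularity ones.

```python
import random
import re
from typing import Any, Dict, List

# Low-effort phrases named in the text above; extend as needed.
LOW_EFFORT = re.compile(r"^(\+1|bump)[\s.!]*$", re.IGNORECASE)


def is_noise(comment: Dict[str, Any]) -> bool:
    """Heuristically flag bot, spam-like, or meaningless comments."""
    text = comment["text"].strip()
    if not text or LOW_EFFORT.match(text):
        return True
    if comment["author"].endswith("[bot]"):  # heuristic for GitHub app accounts
        return True
    # No letters or digits in any script: emoji or punctuation only.
    return not any(ch.isalnum() for ch in text)


def sample_comments(comments: List[Dict[str, Any]], limit: int = 500,
                    top_fraction: float = 0.5, seed: int = 0) -> List[Dict[str, Any]]:
    """Keep the most-liked comments plus a random draw from the rest."""
    if len(comments) <= limit:
        return list(comments)
    ranked = sorted(comments, key=lambda c: c["likes"], reverse=True)
    n_top = int(limit * top_fraction)
    top, rest = ranked[:n_top], ranked[n_top:]
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    return top + rng.sample(rest, limit - n_top)
```

A typical call would be `sample_comments([c for c in raw if not is_noise(c)])`, with the result written to `comments_cleaned.json`.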

Step 3: Semantic Clustering

Read all cleaned comments and group them by semantic similarity — comments expressing the same underlying argument are grouped together, even if the wording is completely different.

Efficient Clustering Methods:

Read comments in batches (50-100 at a time) and perform the first round of grouping
Merge clusters by comparing results across batches — same argument = same cluster
Each cluster should represent a unique position or argument, not just a topic
Name each cluster with a concise argument statement (instead of a hashtag)

Cluster Naming Guidelines: Each cluster name should be an argument, not a topic.

Good example: "This feature breaks backward compatibility and should be made optional"
Bad example: "Backward compatibility concerns"

Save the clustering results to `clusters.json`:

json

[
  {
    "cluster_id": 1,
    "name": "Concise argument statement",
    "comment_count": 45,
    "representative_comments": ["full text of 2-3 best examples"],
    "support_ratio": 0.7,
    "sample_comment_ids": ["id1", "id2", "id3"]
  }
]
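Once each batch has been grouped, the cross-batch merge is mostly bookkeeping. The sketch below assumes each batch result is a list of `{"name", "comment_ids"}` dictionaries and that equivalent arguments have already been given the same name during semantic comparison. Both the intermediate format and the function name `merge_batches` are hypothetical. Fields that need judgment (`representative_comments`, `support_ratio`) are left for the analysis itself to fill in.

```python
from typing import Any, Dict, List


def merge_batches(batches: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Merge per-batch clusters that share a name and renumber them by size."""
    merged: Dict[str, List[str]] = {}
    for batch in batches:
        for cluster in batch:
            ids = merged.setdefault(cluster["name"], [])
            for comment_id in cluster["comment_ids"]:
                if comment_id not in ids:  # avoid double-counting a comment
                    ids.append(comment_id)
    # Largest clusters first; sorted() is stable, so ties keep first-seen order.
    ordered = sorted(merged.items(), key=lambda item: len(item[1]), reverse=True)
    return [
        {
            "cluster_id": index,
            "name": name,
            "comment_count": len(ids),
            "sample_comment_ids": ids[:3],
        }
        for index, (name, ids) in enumerate(ordered, start=1)
    ]
```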

Step 4: Debate Analysis

For each cluster, identify the following:

Position: What exactly is this group arguing for?
Arguments: What facts, experiences, or logic do they cite?
Belief Strength: Are they certain or hesitant? Judge from language cues and like counts
Relationship with Other Clusters: Is this an opposing view to another cluster? Or a supplement or extension?

Then synthesize all clusters to identify:

Core Debate Axes: Fundamental disagreements (e.g., "Security vs. Convenience", "Innovation vs. Stability")
Consensus Points: Points agreed upon by most clusters
Disagreement Points: Points where the community has clear opposition
Minority Viewpoints: Viewpoints held by few but with strong arguments
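Whether a minority argument is strong is a qualitative call, but like counts can help shortlist candidates for closer reading. The sketch below flags clusters that are small by share yet well above average in likes per comment. The input shape (cluster name mapped to a list of like counts) and both thresholds are assumptions chosen for illustration.

```python
from statistics import mean
from typing import Dict, List


def flag_minority_candidates(cluster_likes: Dict[str, List[int]],
                             max_share: float = 0.10,
                             like_factor: float = 2.0) -> List[str]:
    """Return names of small clusters whose comments are unusually well liked."""
    all_likes = [n for likes in cluster_likes.values() for n in likes]
    if not all_likes:
        return []
    total = len(all_likes)
    overall = mean(all_likes)
    flagged = []
    for name, likes in cluster_likes.items():
        if not likes or overall <= 0:
            continue
        share = len(likes) / total
        # Small share of comments, but far more likes per comment than average.
        if share <= max_share and mean(likes) >= like_factor * overall:
            flagged.append(name)
    return flagged
```

The output is only a shortlist: each flagged cluster still needs to be read to decide whether its argument is actually well supported.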

Step 5: Generate Report

Output a Markdown report using the following template:

markdown

# [Topic] Community Opinion Analysis

> Data Source: [URL]
> Total Comments: N (M valid comments analyzed)
> Generated Time: YYYY-MM-DD

## Summary

[2-3 sentences summarizing the community's overall attitude and main disagreements]

## Core Debate Points

[Describe the 1-2 most core disagreement axes, explain why these are the focus of the debate]

## Viewpoint Clusters

### Viewpoint 1: [Argument Statement]
- **Proportion**: ~X% (about N comments)
- **Core Arguments**: [Main reasons supporting this viewpoint]
- **Typical Comments**: [1-2 representative original comments]
- **Popularity**: [Like count/support level]

### Viewpoint 2: [Argument Statement]
...

## Consensus & Disagreements

### Consensus
- [Points agreed upon by most people]

### Disagreements
- [Main opposing points, which viewpoints are in direct conflict]

## Minority Viewpoints
- [Viewpoints held by few but with strong arguments, worthy of attention]

## Sentiment Analysis
- **Overall Sentiment**: [Positive/Negative/Neutral/Mixed]
- **Sentiment Intensity**: [Intense/Mild]
- **Sentiment Trend**: [If timeline data is available]

If the user also requests JSON output, save the structured data as well.
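Parts of the template can be filled mechanically from `clusters.json`. As a sketch, the function below renders only the "Viewpoint Clusters" section; the function name and the rounding of the percentage are illustrative choices. The remaining fields (core arguments, popularity) are left to the analysis.

```python
from typing import Any, Dict, List


def render_viewpoints(clusters: List[Dict[str, Any]], total_valid: int) -> str:
    """Render the 'Viewpoint Clusters' section of the report as Markdown."""
    lines = ["## Viewpoint Clusters", ""]
    for index, cluster in enumerate(clusters, start=1):
        count = cluster["comment_count"]
        percent = round(100 * count / total_valid) if total_valid else 0
        # Quote at most two representative comments, as the template suggests.
        quotes = "; ".join(
            f'"{q}"' for q in cluster.get("representative_comments", [])[:2]
        )
        lines += [
            f"### Viewpoint {index}: {cluster['name']}",
            f"- **Proportion**: ~{percent}% (about {count} comments)",
            f"- **Typical Comments**: {quotes}",
            "",
        ]
    return "\n".join(lines)
```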

Analysis Tips

Don't just count votes: a highly liked minority viewpoint may matter more than a majority position with little engagement
Pay attention to how people argue, not just what they say. Sarcasm, emotional language, and defensive statements all indicate strong positions
Look for implied arguments — sometimes the real disagreement is not explicitly stated (e.g., people arguing about implementation details may actually be arguing about priorities)
Cross-reference with replies — a reply strongly opposing a parent comment reveals the debate structure
If comments involve multiple languages, cluster by argument (regardless of language), then note the language distribution in each cluster