LLM-Based Public Opinion Analytics Assistant

Skill by ara.so — Data Skills collection.

Overview

This project is an intelligent public opinion analysis assistant that integrates real-time data from 15 mainstream platforms across 26 ranking lists with large language model (LLM) analysis capabilities. It provides conversational hot search queries, topic-specific searches, topic clustering, and sentiment analysis. The system supports:

Real-time web scraping from platforms like Weibo, Bilibili, Douyin, Baidu, etc.
LLM-powered content analysis (including video content extraction)
Multi-channel push notifications (WeChat, Enterprise WeChat, Telegram, Email)
Keyboard shortcuts for crawler control
Quick data lookup and platform jumping

Installation

Prerequisites

Python Environment: Python 3.8+
MySQL Database: MySQL 5.7+ or 8.0+
Browser Driver: ChromeDriver or EdgeDriver

Step 1: Browser Driver Setup

Download the driver matching your browser version:

Chrome: ChromeDriver Downloads
Edge: EdgeDriver Downloads

Add the driver to your system PATH:

bash

# macOS/Linux
export PATH=$PATH:/path/to/driver/directory

# Windows: Add to System Environment Variables

Verify installation:

bash

chromedriver --version
# or
msedgedriver --version

Step 2: Clone and Install Dependencies

bash

git clone https://github.com/hmmnxkl/LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant.git
cd LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Step 3: Database Setup

Create MySQL database and tables:

python

# Reference init.py for schema
import mysql.connector

conn = mysql.connector.connect(
    host=os.getenv('MYSQL_HOST', 'localhost'),
    user=os.getenv('MYSQL_USER'),
    password=os.getenv('MYSQL_PASSWORD')
)

cursor = conn.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS hotsearch_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
cursor.execute("USE hotsearch_db")

# Create tables (see init.py for full schema)
cursor.execute("""
    CREATE TABLE IF NOT EXISTS hot_search_items (
        id INT AUTO_INCREMENT PRIMARY KEY,
        platform VARCHAR(50),
        title VARCHAR(500),
        url TEXT,
        rank_index INT,
        heat_value VARCHAR(100),
        collected_at DATETIME,
        content TEXT,
        sentiment VARCHAR(20),
        INDEX idx_platform (platform),
        INDEX idx_collected (collected_at)
    )
""")

conn.commit()

Step 4: Environment Configuration

Create

.env

file in project root:

bash

# MySQL Configuration
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=your_mysql_user
MYSQL_PASSWORD=your_mysql_password
MYSQL_DATABASE=hotsearch_db

# LLM Configuration (OpenAI-compatible API)
OPENAI_API_KEY=your_api_key
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4

# Or use Huawei Pangu Model (local deployment)
# PANGU_MODEL_PATH=/path/to/pangu/model
# PANGU_API_URL=http://localhost:8080

# Push Notification Channels
# WeChat Work Bot
WECHAT_WORK_BOT_WEBHOOK=your_webhook_url

# WeChat Work App
WECHAT_WORK_CORP_ID=your_corp_id
WECHAT_WORK_AGENT_ID=your_agent_id
WECHAT_WORK_SECRET=your_secret

# Telegram
TELEGRAM_BOT_TOKEN=your_bot_token
TELEGRAM_CHAT_ID=your_chat_id

# Email (SMTP)
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your_email@gmail.com
SMTP_PASSWORD=your_app_password
SMTP_RECIPIENTS=recipient1@example.com,recipient2@example.com

Core Components

1. Web Scraping System (

hotsearchcrawler/

)

The crawler cluster supports 15 platforms with 26 ranking lists:

python

# Run all spiders
python run_spiders.py

# Test specific spider
python runspider-test.py weibo  # Test Weibo scraper

Crawler Configuration

Edit

hotsearchcrawler/settings.py

python

# MySQL settings
MYSQL_HOST = os.getenv('MYSQL_HOST', 'localhost')
MYSQL_PORT = int(os.getenv('MYSQL_PORT', 3306))
MYSQL_USER = os.getenv('MYSQL_USER')
MYSQL_PASSWORD = os.getenv('MYSQL_PASSWORD')
MYSQL_DATABASE = os.getenv('MYSQL_DATABASE', 'hotsearch_db')

# Optional: Platform-specific cookies
COOKIES = {
    'weibo': 'your_weibo_cookies',
    'bilibili': 'your_bilibili_cookies'
}

# Crawler settings
CONCURRENT_REQUESTS = 16
DOWNLOAD_DELAY = 1
RANDOMIZE_DOWNLOAD_DELAY = True

Available Platforms

Social Media: Weibo, Douyin, Kuaishou
Video: Bilibili, Tencent Video
News: Baidu, Toutiao, Zhihu
E-commerce: Taobao, JD.com
Gaming: Steam, Tap Tap
Others: Tieba, Douban, etc.

2. Analysis System (

hotsearch_analysis_agent/

)

LLM-powered analysis engine for topic clustering, sentiment analysis, and report generation.

python

from hotsearch_analysis_agent.analyzer import HotSearchAnalyzer

# Initialize analyzer
analyzer = HotSearchAnalyzer(
    api_key=os.getenv('OPENAI_API_KEY'),
    api_base=os.getenv('OPENAI_API_BASE'),
    model_name=os.getenv('MODEL_NAME', 'gpt-4')
)

# Analyze topics
topics = analyzer.fetch_topics(
    platform='weibo',
    start_date='2026-05-01',
    end_date='2026-05-20'
)

# Topic clustering
clusters = analyzer.cluster_topics(topics, n_clusters=5)

# Sentiment analysis
for topic in topics:
    sentiment = analyzer.analyze_sentiment(topic['title'], topic['content'])
    print(f"{topic['title']}: {sentiment}")

# Generate report
report = analyzer.generate_report(
    query="人工智能与前沿科技",
    platforms=['weibo', 'bilibili', 'zhihu'],
    days=7
)
print(report)

Custom LLM Integration

python

# Using Huawei Pangu Model (local deployment)
from hotsearch_analysis_agent.llm import PanguLLM

pangu = PanguLLM(
    model_path=os.getenv('PANGU_MODEL_PATH'),
    api_url=os.getenv('PANGU_API_URL')
)

response = pangu.generate(
    prompt="分析以下新闻的情感倾向:\n{news_content}",
    max_tokens=500
)

3. Web Application (

app.py

)

FastAPI-based web interface for interactive queries and control.

python

# Start the web application
python app.py

# Default runs on http://localhost:8000

API Endpoints

python

from fastapi import FastAPI
from hotsearch_analysis_agent.api import router

app = FastAPI()
app.include_router(router)

# Example API calls
import httpx

# Query hot searches
response = httpx.get('http://localhost:8000/api/hot-search', params={
    'platform': 'weibo',
    'limit': 20
})

# Search by keyword
response = httpx.post('http://localhost:8000/api/search', json={
    'keyword': '人工智能',
    'platforms': ['weibo', 'zhihu'],
    'days': 7
})

# Start crawler
response = httpx.post('http://localhost:8000/api/crawler/start', json={
    'platforms': ['weibo', 'bilibili']
})

# Stop crawler
response = httpx.post('http://localhost:8000/api/crawler/stop')

Push Notification System

Configure and test multi-channel alerts:

python

# test_push_task.py
from hotsearch_analysis_agent.push import PushManager

manager = PushManager()

# Configure push task
task = {
    'name': 'AI Tech Monitor',
    'query': '人工智能',
    'platforms': ['weibo', 'zhihu', 'bilibili'],
    'schedule': '0 9,18 * * *',  # Cron format: 9 AM and 6 PM daily
    'channels': ['wechat_work', 'telegram', 'email'],
    'min_heat': 100000  # Minimum heat value threshold
}

manager.create_task(task)

# Test push manually
report = """
## AI Technology Hot Topics - 2026-05-20

### Key Findings
- GPT-6 context window leaked: 2M tokens
- DeepSeek V4 uses Huawei Ascend chips
- Chinese LLM API calls lead globally for 5 weeks

[Full report content...]
"""

# Send to WeChat Work
manager.send_wechat_work(report)

# Send to Telegram
manager.send_telegram(report)

# Send email
manager.send_email(
    subject="AI Technology Hot Topics - 2026-05-20",
    content=report
)

Push Channel Configuration

python

# WeChat Work Bot (Group Webhook)
import requests

def send_wechat_work_bot(content):
    webhook = os.getenv('WECHAT_WORK_BOT_WEBHOOK')
    data = {
        "msgtype": "markdown",
        "markdown": {
            "content": content
        }
    }
    requests.post(webhook, json=data)

# Telegram Bot
from telegram import Bot

def send_telegram(content):
    bot = Bot(token=os.getenv('TELEGRAM_BOT_TOKEN'))
    chat_id = os.getenv('TELEGRAM_CHAT_ID')
    bot.send_message(chat_id=chat_id, text=content, parse_mode='Markdown')

# Email via SMTP
import smtplib
from email.mime.text import MIMEText

def send_email(subject, content):
    msg = MIMEText(content, 'html', 'utf-8')
    msg['Subject'] = subject
    msg['From'] = os.getenv('SMTP_USER')
    msg['To'] = os.getenv('SMTP_RECIPIENTS')
    
    with smtplib.SMTP(os.getenv('SMTP_HOST'), int(os.getenv('SMTP_PORT'))) as server:
        server.starttls()
        server.login(os.getenv('SMTP_USER'), os.getenv('SMTP_PASSWORD'))
        server.send_message(msg)

Common Usage Patterns

Pattern 1: Daily Hot Topic Monitoring

python

from datetime import datetime, timedelta
from hotsearch_analysis_agent.analyzer import HotSearchAnalyzer
from hotsearch_analysis_agent.push import PushManager

analyzer = HotSearchAnalyzer()
push_manager = PushManager()

# Get yesterday's hot topics
yesterday = datetime.now() - timedelta(days=1)
topics = analyzer.fetch_topics(
    platforms=['weibo', 'zhihu', 'bilibili'],
    start_date=yesterday.strftime('%Y-%m-%d'),
    heat_threshold=50000
)

# Cluster and analyze
clusters = analyzer.cluster_topics(topics, n_clusters=5)

# Generate report
report = analyzer.generate_report_from_clusters(clusters)

# Push to all channels
push_manager.broadcast(report, channels=['wechat_work', 'telegram', 'email'])

Pattern 2: Keyword Alert System

python

# Monitor specific keywords and send immediate alerts
from hotsearch_analysis_agent.monitor import KeywordMonitor

monitor = KeywordMonitor(
    keywords=['芯片', 'AI', '大模型', '华为'],
    platforms=['weibo', 'toutiao', 'zhihu'],
    check_interval=300  # Check every 5 minutes
)

def on_match(topic):
    """Callback when keyword is matched"""
    alert = f"""
    🔔 Keyword Alert: {topic['title']}
    Platform: {topic['platform']}
    Heat: {topic['heat_value']}
    URL: {topic['url']}
    """
    push_manager.send_telegram(alert)

monitor.start(callback=on_match)

Pattern 3: Deep Content Analysis

python

# Analyze news detail pages (including video content)
from hotsearch_analysis_agent.content_extractor import ContentExtractor

extractor = ContentExtractor()

# Get detailed content from URL
url = 'https://www.bilibili.com/video/BV13pSoBBEvX/'
content = extractor.extract(url)

print(f"Title: {content['title']}")
print(f"Type: {content['type']}")  # 'video' or 'article'
print(f"Content: {content['text'][:500]}...")  # Extracted transcript/text

# Analyze sentiment
sentiment = analyzer.analyze_sentiment(content['title'], content['text'])
print(f"Sentiment: {sentiment}")

# Extract entities
entities = analyzer.extract_entities(content['text'])
print(f"Entities: {entities}")

Pattern 4: Custom Report Generation

python

# Generate custom analytical report
report_config = {
    'title': '科技行业周报',
    'query': '人工智能 OR 芯片 OR 量子计算',
    'platforms': ['all'],
    'date_range': 7,
    'sections': [
        'core_findings',  # Key discoveries
        'news_details',   # Detailed news list
        'trend_analysis', # Trend analysis
        'entity_network'  # Entity relationship graph
    ],
    'output_format': 'markdown'
}

report = analyzer.generate_custom_report(**report_config)

# Save to file
with open(f"report_{datetime.now().strftime('%Y%m%d')}.md", 'w', encoding='utf-8') as f:
    f.write(report)

Troubleshooting

Issue 1: Browser Driver Errors

selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH

Solution: Ensure ChromeDriver/EdgeDriver is in system PATH and matches browser version.

bash

# Check driver version
chromedriver --version

# Check Chrome version
google-chrome --version  # Linux
# or open chrome://version in browser

# Download matching version from https://chromedriver.chromium.org/

Issue 2: Database Connection Failures

mysql.connector.errors.ProgrammingError: Access denied for user

Solution: Verify MySQL credentials in

.env

and ensure user has proper permissions.

sql

-- Grant permissions
GRANT ALL PRIVILEGES ON hotsearch_db.* TO 'your_user'@'localhost';
FLUSH PRIVILEGES;

Issue 3: LLM API Rate Limits

openai.error.RateLimitError: Rate limit exceeded

Solution: Implement request throttling or switch to local model:

python

import time
from functools import wraps

def rate_limit(calls_per_minute=10):
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]
    
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            wait_time = min_interval - elapsed
            if wait_time > 0:
                time.sleep(wait_time)
            result = func(*args, **kwargs)
            last_called[0] = time.time()
            return result
        return wrapper
    return decorator

@rate_limit(calls_per_minute=10)
def call_llm(prompt):
    return analyzer.generate(prompt)

Issue 4: Crawler Being Blocked

Solution: Rotate user agents and add delays:

python

# In hotsearchcrawler/settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}

DOWNLOAD_DELAY = 3
RANDOMIZE_DOWNLOAD_DELAY = True
CONCURRENT_REQUESTS_PER_DOMAIN = 2

Issue 5: Encoding Issues with Chinese Text

Solution: Ensure UTF-8 encoding throughout:

python

# Database connection
import mysql.connector

conn = mysql.connector.connect(
    host=os.getenv('MYSQL_HOST'),
    user=os.getenv('MYSQL_USER'),
    password=os.getenv('MYSQL_PASSWORD'),
    database=os.getenv('MYSQL_DATABASE'),
    charset='utf8mb4',
    collation='utf8mb4_unicode_ci'
)

# File operations
with open('report.md', 'w', encoding='utf-8') as f:
    f.write(report)

Advanced Configuration

Using Huawei Pangu Model (Local Deployment)

Download and deploy the model:

bash

# Download from https://ai.gitcode.com/ascend-tribe/openpangu-embedded-7b-model
# Start model service
python -m hotsearch_analysis_agent.llm.pangu_server --model_path /path/to/model --port 8080

Configure in code:

python

from hotsearch_analysis_agent.llm import PanguLLM

analyzer = HotSearchAnalyzer(
    llm=PanguLLM(api_url='http://localhost:8080')
)

Distributed Crawling

Scale up with multiple crawler instances:

bash

# Instance 1: Weibo, Zhihu
python run_spiders.py --platforms weibo,zhihu

# Instance 2: Bilibili, Douyin
python run_spiders.py --platforms bilibili,douyin

# Instance 3: News platforms
python run_spiders.py --platforms baidu,toutiao

Project Structure Reference

.
├── app.py                          # Web application entry
├── run_spiders.py                  # Crawler launcher
├── runspider-test.py               # Crawler testing
├── test_push_task.py               # Push notification testing
├── init.py                         # Database initialization
├── requirements.txt                # Python dependencies
├── .env                            # Environment configuration
├── hotsearchcrawler/               # Crawler cluster
│   ├── spiders/                    # Platform-specific spiders
│   ├── settings.py                 # Crawler settings
│   └── pipelines.py                # Data pipelines
└── hotsearch_analysis_agent/       # Analysis system
    ├── analyzer.py                 # Core analysis engine
    ├── llm/                        # LLM integrations
    ├── push/                       # Push notification modules
    ├── api/                        # Web API endpoints
    └── content_extractor.py        # Content extraction utilities

llm-public-opinion-analytics

NPX Install

Tags

SKILL.md Content

LLM-Based Public Opinion Analytics Assistant

Overview

Installation

Prerequisites

Step 1: Browser Driver Setup

Step 2: Clone and Install Dependencies

Step 3: Database Setup

Step 4: Environment Configuration

Core Components

1. Web Scraping System (
`hotsearchcrawler/`
)

Crawler Configuration

Available Platforms

2. Analysis System (
`hotsearch_analysis_agent/`
)

Custom LLM Integration

3. Web Application (
`app.py`
)

API Endpoints

Push Notification System

Push Channel Configuration

Common Usage Patterns

Pattern 1: Daily Hot Topic Monitoring

Pattern 2: Keyword Alert System

Pattern 3: Deep Content Analysis

Pattern 4: Custom Report Generation

Troubleshooting

Issue 1: Browser Driver Errors

Issue 2: Database Connection Failures

Issue 3: LLM API Rate Limits

Issue 4: Crawler Being Blocked

Issue 5: Encoding Issues with Chinese Text

Advanced Configuration

Using Huawei Pangu Model (Local Deployment)

Distributed Crawling

Project Structure Reference

llm-public-opinion-analytics

NPX Install

Tags

SKILL.md Content

LLM-Based Public Opinion Analytics Assistant

Overview

Installation

Prerequisites

Step 1: Browser Driver Setup

Step 2: Clone and Install Dependencies

Step 3: Database Setup

Step 4: Environment Configuration

Core Components

1. Web Scraping System (hotsearchcrawler/)

Crawler Configuration

Available Platforms

2. Analysis System (hotsearch_analysis_agent/)

Custom LLM Integration

3. Web Application (app.py)

API Endpoints

Push Notification System

Push Channel Configuration

Common Usage Patterns

Pattern 1: Daily Hot Topic Monitoring

Pattern 2: Keyword Alert System

Pattern 3: Deep Content Analysis

Pattern 4: Custom Report Generation

Troubleshooting

Issue 1: Browser Driver Errors

Issue 2: Database Connection Failures

Issue 3: LLM API Rate Limits

Issue 4: Crawler Being Blocked

Issue 5: Encoding Issues with Chinese Text

Advanced Configuration

Using Huawei Pangu Model (Local Deployment)

Distributed Crawling

Project Structure Reference

1. Web Scraping System (
`hotsearchcrawler/`
)

2. Analysis System (
`hotsearch_analysis_agent/`
)

3. Web Application (
`app.py`
)