content-analysis

Content Analysis Skill

Analyze text content using advanced NLP techniques and LLM-powered insights to extract sentiment, topics, and actionable intelligence from various content sources.

Quick Start

This skill helps you:

Analyze sentiment using both traditional NLP and LLM methods
Extract topics and keywords from large text datasets
Classify and cluster content automatically
Identify viral content patterns and characteristics
Generate content insights and recommendations
Support multiple languages and content formats

When to Use

Social Media Analysis: Facebook, Twitter, Instagram, Weibo posts
Content Marketing: Blog posts, articles, marketing copy analysis
Video Content: YouTube titles, descriptions, comments analysis
Product Reviews: Amazon, e-commerce customer feedback
News Analysis: Article categorization, sentiment tracking
Customer Feedback: Support tickets, surveys, reviews analysis

Key Requirements

Traditional NLP Analysis

bash

pip install pandas numpy matplotlib seaborn nltk scikit-learn wordcloud

LLM-Enhanced Analysis (Optional)

bash

pip install openai dashscope  # For OpenAI and Qwen API access

Set Up NLTK Data

python

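import nltk

# One-time download of the resources used by the traditional NLP steps.
nltk.download('vader_lexicon')  # lexicon for VADER sentiment scoring
nltk.download('punkt')          # tokenizer models
nltk.download('stopwords')      # stopword lists for filtering
# Recent NLTK releases load the tokenizer from 'punkt_tab' instead:
# nltk.download('punkt_tab')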

Core Workflow

1. Data Preparation

Your data should include:

Text Content: Main text to analyze (titles, descriptions, comments, etc.)
Metadata: Optional (author, date, category, engagement metrics)
Multiple Languages: Text may be in English, Chinese, or other languages
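
As a minimal sketch of the input shape described above, the snippet below loads rows from a CSV file into records with one required text field and free-form metadata. The `ContentRecord` class, the `load_records` helper, and the column names are illustrative assumptions, not part of the skill itself.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentRecord:
    """One item to analyze: required text plus optional metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def load_records(csv_file, text_column: str = "text") -> List[ContentRecord]:
    """Read rows from an open CSV file; every other column becomes metadata."""
    records = []
    for row in csv.DictReader(csv_file):
        text = (row.pop(text_column, "") or "").strip()
        if not text:
            continue  # skip rows with no text to analyze
        # Keep only metadata fields that are actually filled in.
        metadata = {k: v for k, v in row.items() if v}
        records.append(ContentRecord(text=text, metadata=metadata))
    return records


# Hypothetical sample data: one row has no text, one has no date.
sample_csv = io.StringIO(
    "text,author,date,likes\n"
    "Great phone with a solid battery,alice,2024-01-05,12\n"
    ",bob,2024-01-06,3\n"
    "这款手机的屏幕很清晰,carol,,40\n"
)
records = load_records(sample_csv)
```

Metadata values stay as strings here; converting engagement counts to integers is left to whichever step actually uses them.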

2. Analysis Process

Text Preprocessing: Clean, tokenize, and normalize text
Sentiment Analysis: Traditional VADER + LLM-enhanced analysis
Topic Extraction: TF-IDF keywords + LLM semantic topics
Content Classification: Automated categorization and clustering
Pattern Recognition: Identify viral content characteristics
Insight Generation: Actionable recommendations
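
The preprocessing and keyword steps above can be sketched with the standard library alone. This is a hand-rolled illustration, assuming English text, a tiny stopword list, and plain `tf * log(N / df)` weighting; scikit-learn's `TfidfVectorizer` uses a smoothed variant, so its scores will differ. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would likely use NLTK's.
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "it", "this", "for"}


def preprocess(text):
    """Lowercase, strip URLs and mentions, and split into word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    tokens = re.findall(r"[a-z0-9']+", text)
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def tfidf_keywords(documents, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [preprocess(doc) for doc in documents]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(tokenized)
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_n]])
    return results


docs = [
    "The battery life of this phone is great! https://example.com",
    "Battery drains fast and the screen is dim @support",
    "Great screen, great camera, great value",
]
keywords = tfidf_keywords(docs, top_n=2)
```

The regex tokenizer only handles Latin-script text; Chinese content would need a word segmenter before the same weighting applies.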

3. Output Deliverables

Sentiment analysis reports with confidence scores
Topic models and keyword extractions
Content classification results
Viral content pattern analysis
Optimization recommendations

Example Usage Scenarios

Social Media Content Analysis

Analyze Twitter posts for brand sentiment
Identify trending topics and hashtags
Measure engagement patterns
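
One possible way to approach the hashtag and engagement parts of this scenario is sketched below. The post dictionaries with `text`, `likes`, and `shares` keys are an assumed shape, and the sample posts are invented for illustration.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def trending_hashtags(posts, top_n=3):
    """Count each hashtag once per post, case-insensitively."""
    counts = Counter()
    for post in posts:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(post["text"])}
        counts.update(tags)
    # Sort by count (descending), then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def mean_engagement(posts):
    """Average of likes + shares per post (0.0 for an empty list)."""
    if not posts:
        return 0.0
    return sum(p["likes"] + p["shares"] for p in posts) / len(posts)


# Hypothetical sample posts.
posts = [
    {"text": "Loving the new release #AcmeLaunch #tech", "likes": 120, "shares": 30},
    {"text": "Shipping delays again... #acmelaunch", "likes": 15, "shares": 5},
    {"text": "Unboxing video is up #Tech #unboxing #tech", "likes": 60, "shares": 10},
]
top_tags = trending_hashtags(posts)
```

Counting a tag once per post keeps a single spammy post from dominating the ranking.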

YouTube Video Analysis

Analyze video titles and descriptions
Extract topics from comments
Identify viral content patterns
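
A rough sketch of title pattern analysis follows, assuming each video is a dictionary with `title` and `views`. The surface features chosen here (length, digits, question marks, all-caps words) are illustrative guesses, and a difference between groups is only a correlation, not evidence of what causes high view counts.

```python
import re
from statistics import mean, median


def title_features(title):
    """Simple surface features often examined in title analysis."""
    return {
        "length": len(title),
        "has_number": bool(re.search(r"\d", title)),
        "has_question": "?" in title,
        "caps_words": sum(1 for w in title.split() if len(w) > 1 and w.isupper()),
    }


def compare_by_views(videos):
    """Split videos at the median view count and average features per group."""
    cutoff = median(v["views"] for v in videos)
    groups = {"high": [], "low": []}
    for v in videos:
        key = "high" if v["views"] > cutoff else "low"
        groups[key].append(title_features(v["title"]))
    summary = {}
    for name, feats in groups.items():
        if feats:
            summary[name] = {
                k: round(mean(float(f[k]) for f in feats), 2) for k in feats[0]
            }
    return summary


# Hypothetical sample data.
videos = [
    {"title": "10 Python Tricks You NEED to Know", "views": 90000},
    {"title": "Is Rust Worth Learning in 2024?", "views": 75000},
    {"title": "Notes on my editor setup", "views": 1200},
    {"title": "Weekly update", "views": 800},
]
summary = compare_by_views(videos)
```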

Product Review Analysis

Analyze customer feedback sentiment
Extract product feature mentions
Identify improvement opportunities
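
Feature mention extraction can be approximated with a keyword lexicon, as in the sketch below. The `FEATURE_TERMS` mapping and the sample reviews are hypothetical, and the sentence splitter is deliberately naive.

```python
import re
from collections import defaultdict

# Hypothetical feature lexicon; replace with terms relevant to your product.
FEATURE_TERMS = {
    "battery": ["battery", "charge", "charging"],
    "screen": ["screen", "display"],
    "shipping": ["shipping", "delivery"],
}


def feature_mentions(reviews):
    """Group review sentences by the product feature they mention."""
    mentions = defaultdict(list)
    for review in reviews:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", review.strip()):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            for feature, terms in FEATURE_TERMS.items():
                if words & set(terms):
                    mentions[feature].append(sentence)
    return dict(mentions)


reviews = [
    "The battery lasts two days. Screen is too dim outdoors!",
    "Delivery took three weeks.",
    "Fast charging works well, but the display scratches easily.",
]
mentions = feature_mentions(reviews)
```

Scoring the sentences collected for each feature with a sentiment method would then point at the features with the most negative feedback, which is one way to surface improvement opportunities.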

Key Analysis Methods

Traditional NLP Techniques

VADER Sentiment Analysis: Rule-based sentiment scoring
TF-IDF Keyword Extraction: Statistical term importance
Text Clustering: K-means and hierarchical clustering
Word Frequency Analysis: Term frequency and co-occurrence
Language Detection: Automatic language identification
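
For the VADER item above, a minimal sketch might look like this. `SentimentIntensityAnalyzer.polarity_scores` is NLTK's API and returns a `compound` score between -1 and 1; the ±0.05 cutoffs are the commonly cited convention, and the wrapper functions are illustrative assumptions. The scorer is passed in as a parameter so the labeling logic can be exercised without NLTK installed.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_scorer():
    """Build a scorer backed by NLTK's VADER (needs the vader_lexicon data)."""
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def analyze_sentiment(texts, scorer):
    """Score each text and attach a label; `scorer` maps text -> compound."""
    results = []
    for text in texts:
        compound = scorer(text)
        results.append(
            {"text": text, "compound": compound, "label": label_from_compound(compound)}
        )
    return results


# With NLTK installed and the lexicon downloaded:
# results = analyze_sentiment(["I love this!", "Terrible service."], vader_scorer())
```

VADER's lexicon targets English, so non-English text is better routed to a different scorer with the same text-to-score interface.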

LLM-Enhanced Analysis

Context-Aware Sentiment: Nuanced emotion understanding
Semantic Topic Extraction: Meaning-based topic identification
Content Summarization: Automatic text summarization
Multi-Language Support: Cross-lingual analysis
Zero-Shot Classification: Categorization without training data
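
Zero-shot classification can be sketched as a prompt plus a strict parser. Here `call_llm` is a hypothetical callable (prompt in, response string out) standing in for whichever provider you use; the prompt wording and JSON reply format are assumptions, not a documented interface.

```python
import json


def build_prompt(text, labels):
    """Ask for exactly one label, returned as a small JSON object."""
    return (
        "Classify the text into exactly one of these categories: "
        + ", ".join(labels)
        + '.\nRespond with JSON like {"label": "<category>"}.\n\n'
        + f"Text: {text}"
    )


def zero_shot_classify(text, labels, call_llm):
    """Classify `text` without training data.

    `call_llm` is a hypothetical callable (prompt -> response string) that
    wraps whichever provider you use.
    """
    raw = call_llm(build_prompt(text, labels))
    try:
        label = json.loads(raw).get("label")
    except (json.JSONDecodeError, AttributeError, TypeError):
        label = None  # reply was not the JSON object we asked for
    # Guard against replies outside the allowed label set.
    return label if label in labels else "unknown"
```

Returning "unknown" for malformed or off-list replies keeps downstream counts honest instead of silently accepting whatever the model produced.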

Advanced Analytics

Time Series Analysis: Content trends over time
Engagement Prediction: Predict viral potential
Competitive Analysis: Compare content performance
Audience Insights: Demographic and preference analysis
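
As one illustration of the time series item, the sketch below averages sentiment scores per day and smooths them with a trailing moving average. The record shape (`date` as an ISO string, `score` as a float) and the function names are assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_sentiment(records):
    """Average sentiment score per calendar day, sorted by date."""
    by_day = defaultdict(list)
    for rec in records:
        by_day[date.fromisoformat(rec["date"])].append(rec["score"])
    return [(day, round(mean(scores), 3)) for day, scores in sorted(by_day.items())]


def moving_average(values, window=3):
    """Trailing moving average; early points use the values available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(round(sum(chunk) / len(chunk), 3))
    return out


# Hypothetical scored records, deliberately out of order.
records = [
    {"date": "2024-03-02", "score": 0.4},
    {"date": "2024-03-01", "score": 0.2},
    {"date": "2024-03-01", "score": 0.6},
    {"date": "2024-03-03", "score": -0.5},
]
trend = daily_sentiment(records)
```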

Common Business Questions Answered

What is the overall sentiment toward our brand?
Which topics are trending in our industry?
What makes content go viral?
How does sentiment vary by demographic or region?
What are customers saying about our products?
Which content formats perform best?

Integration Examples

See the examples/ directory for:

basic_content_analysis.py: Traditional NLP analysis
llm_enhanced_analysis.py: LLM-powered analysis
social_media_analysis.py: Social media-specific analysis
Sample datasets for testing

LLM Configuration

Supported LLM Providers

OpenAI: GPT-3.5, GPT-4 models
Qwen (通义千问): Chinese-optimized models
Open Source: Local models via HuggingFace

API Setup Examples

python

# OpenAI Configuration
import openai
openai.api_key = 'your-api-key'

# Qwen Configuration
import dashscope
dashscope.api_key = 'your-api-key'
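
Hard-coding keys is fine for a quick test, but a common alternative is to read them from environment variables. The sketch below assumes the variable names `OPENAI_API_KEY` and `DASHSCOPE_API_KEY`; `load_api_key` and `configure_providers` are hypothetical helpers, not part of either SDK.

```python
import os


def load_api_key(env_var):
    """Return the key stored in `env_var`, or raise with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def configure_providers():
    """Assign keys to whichever provider SDKs are installed."""
    configured = []
    try:
        import openai
        openai.api_key = load_api_key("OPENAI_API_KEY")
        configured.append("openai")
    except (ImportError, RuntimeError):
        pass  # SDK missing or key not set; skip this provider
    try:
        import dashscope
        dashscope.api_key = load_api_key("DASHSCOPE_API_KEY")
        configured.append("dashscope")
    except (ImportError, RuntimeError):
        pass  # SDK missing or key not set; skip this provider
    return configured
```

Calling `configure_providers()` returns the list of providers that were set up, so a script can fall back to traditional NLP when the list is empty.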

Best Practices

Data Quality: Ensure clean, consistent text data
Sampling Strategy: Use representative samples for LLM analysis
Cost Management: Balance traditional NLP with LLM calls
Language Handling: Configure appropriate language models
Validation: Cross-validate sentiment analysis results
Privacy: Ensure compliance with data protection regulations
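
One reading of the validation item is to compare the labels from two methods (for example VADER and an LLM) on the same sample and inspect where they disagree. The sketch below computes raw agreement and Cohen's kappa; the function name and label strings are assumptions.

```python
from collections import Counter


def agreement_report(labels_a, labels_b):
    """Compare two label lists: raw agreement, Cohen's kappa, disagreements."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each method's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    disagreements = [
        i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b
    ]
    return {
        "agreement": observed,
        "kappa": round(kappa, 3),
        "disagreements": disagreements,
    }
```

The indices in `disagreements` are the items worth reviewing by hand, since neither automated method can be assumed correct there.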

Performance Optimization

For Large Datasets

Use data sampling for LLM analysis
Implement batch processing
Cache LLM responses when possible
Use traditional NLP for initial filtering
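
Batching and caching can be sketched in a few lines. `CachedLLM` wraps a hypothetical `call_llm(prompt) -> str` function with an in-memory cache keyed by a hash of the prompt; a persistent store would be needed to reuse responses across runs.

```python
import hashlib


def batched(items, batch_size):
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class CachedLLM:
    """Wrap a hypothetical `call_llm(prompt) -> str` with an in-memory cache."""

    def __init__(self, call_llm):
        self._call_llm = call_llm
        self._cache = {}
        self.api_calls = 0  # number of calls that actually reached the API

    def __call__(self, prompt):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._call_llm(prompt)
            self.api_calls += 1
        return self._cache[key]


# Stand-in for a real API call, for illustration only.
llm = CachedLLM(lambda prompt: prompt.upper())
texts = ["good", "bad", "good", "fine", "bad"]
results = [llm(t) for batch in batched(texts, 2) for t in batch]
```

Five inputs trigger only three underlying calls here because the repeated texts are served from the cache.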

Cost Management

Prioritize important content for LLM analysis
Use traditional NLP for bulk processing
Implement smart sampling strategies
Monitor API usage and costs
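
A simple sampling strategy along these lines sends the highest-engagement items to the LLM first and fills the rest of the budget with a random sample. Everything here is an assumption for illustration: the `engagement` key, the 50/50 split, the four-characters-per-token heuristic, and the price passed to `estimate_cost`.

```python
import random


def select_for_llm(items, budget, top_fraction=0.5, seed=0):
    """Pick up to `budget` items: top engagement first, then a random sample."""
    ranked = sorted(items, key=lambda it: it["engagement"], reverse=True)
    n_top = min(budget, int(budget * top_fraction))
    chosen = ranked[:n_top]
    remaining = ranked[n_top:]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    n_random = min(budget - n_top, len(remaining))
    chosen += rng.sample(remaining, n_random)
    return chosen


def estimate_cost(texts, price_per_1k_tokens, chars_per_token=4):
    """Rough cost estimate; both rates are assumptions to replace."""
    tokens = sum(len(t) for t in texts) / chars_per_token
    return round(tokens / 1000 * price_per_1k_tokens, 4)


# Hypothetical items with increasing engagement.
items = [{"id": i, "engagement": i * 10} for i in range(10)]
picked = select_for_llm(items, budget=4)
```

Running `estimate_cost` on the selected texts before sending them gives a ballpark figure to compare against actual API usage.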

Advanced Features

Real-time Analysis: Stream processing for live content
Multi-modal Analysis: Text + image + video content
Custom Models: Fine-tune models for specific domains
Integration APIs: Connect with content management systems
Automated Reporting: Scheduled analysis and reporting
