content-analysis
Content Analysis Skill
Analyze text content using advanced NLP techniques and LLM-powered insights to extract sentiment, topics, and actionable intelligence from various content sources.
Quick Start
This skill helps you:
- Analyze sentiment using both traditional NLP and LLM methods
- Extract topics and keywords from large text datasets
- Classify and cluster content automatically
- Identify viral content patterns and characteristics
- Generate content insights and recommendations
- Work with multiple languages and content formats
When to Use
- Social Media Analysis: Facebook, Twitter, Instagram, Weibo posts
- Content Marketing: Blog posts, articles, marketing copy analysis
- Video Content: YouTube titles, descriptions, comments analysis
- Product Reviews: Amazon, e-commerce customer feedback
- News Analysis: Article categorization, sentiment tracking
- Customer Feedback: Support tickets, surveys, reviews analysis
Key Requirements
Traditional NLP Analysis
```bash
pip install pandas numpy matplotlib seaborn nltk scikit-learn wordcloud
```
LLM-Enhanced Analysis (Optional)
```bash
pip install openai dashscope  # For OpenAI and Qwen API access
```
Setup NLTK Data
```python
import nltk
nltk.download('vader_lexicon')
nltk.download('punkt')
nltk.download('stopwords')
```
Core Workflow
1. Data Preparation
Your data should include:
- Text Content: Main text to analyze (titles, descriptions, comments, etc.)
- Metadata: Optional (author, date, category, engagement metrics)
- Language: Text may be in English, Chinese, or other languages
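The input shape described above can be sketched with the standard library alone. The `ContentRecord` class, its field names, and `prepare_records` are hypothetical names chosen for illustration; they are not part of this skill's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentRecord:
    """One item to analyze: required text plus optional metadata."""
    text: str
    author: Optional[str] = None
    date: Optional[str] = None      # ISO date string, e.g. "2024-01-31"
    category: Optional[str] = None
    engagement: dict = field(default_factory=dict)  # e.g. {"likes": 10}


def prepare_records(rows):
    """Build records from dicts, skipping rows with missing or blank text."""
    records = []
    for row in rows:
        # Collapse runs of whitespace and trim the ends.
        text = " ".join(str(row.get("text") or "").split())
        if not text:
            continue
        records.append(ContentRecord(
            text=text,
            author=row.get("author"),
            date=row.get("date"),
            category=row.get("category"),
            engagement=row.get("engagement") or {},
        ))
    return records


rows = [
    {"text": "  Great   phone, love the camera!  ", "author": "a1",
     "engagement": {"likes": 12}},
    {"text": "", "author": "a2"},                      # dropped: no text
    {"text": "这款手机电池很耐用", "category": "review"},  # non-English text is kept as-is
]
records = prepare_records(rows)
print(len(records), records[0].text)
```

In a pandas-based workflow the same checks would typically be column operations on a DataFrame; the dataclass form is used here only to make the required and optional fields explicit.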
2. Analysis Process
- Text Preprocessing: Clean, tokenize, and normalize text
- Sentiment Analysis: Traditional VADER + LLM-enhanced analysis
- Topic Extraction: TF-IDF keywords + LLM semantic topics
- Content Classification: Automated categorization and clustering
- Pattern Recognition: Identify viral content characteristics
- Insight Generation: Actionable recommendations
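As a rough illustration of the first step (clean, tokenize, normalize), here is a minimal standard-library sketch. The stopword set is a tiny stand-in for NLTK's `stopwords` corpus, and the `preprocess` function is a hypothetical helper, not the skill's actual implementation.

```python
import re
import unicodedata

# Tiny illustrative stopword list; NLTK's stopwords corpus is far larger.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "this"}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")


def preprocess(text):
    """Clean, tokenize, and normalize one piece of text."""
    text = unicodedata.normalize("NFKC", text)  # unify full-width/compatibility forms
    text = URL_RE.sub(" ", text)                # drop links
    tokens = TOKEN_RE.findall(text.lower())     # keep words, #hashtags, @mentions
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("This is the BEST phone of 2024! https://example.com #tech"))
```

This regex tokenizer splits on whitespace and punctuation, so it suits space-delimited languages. Chinese text would come through as long unsegmented runs and would need a dedicated word segmenter before keyword extraction.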
3. Output Deliverables
- Sentiment analysis reports with confidence scores
- Topic models and keyword extractions
- Content classification results
- Viral content pattern analysis
- Optimization recommendations
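One possible shape for a sentiment report with confidence scores is sketched here. It assumes each item already has a VADER-style compound score in [-1, 1] and uses the conventional ±0.05 cutoffs for labeling. Treating the absolute compound value as a confidence proxy is an assumption made for this sketch, and the input scores are made-up sample values rather than real model output.

```python
import json
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a label using the usual VADER cutoffs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def build_report(scored_items):
    """scored_items: list of (text, compound) pairs from any sentiment scorer."""
    items = [
        {
            "text": text,
            "label": label_from_compound(compound),
            # Assumption: |compound| serves as a rough confidence proxy.
            "confidence": round(abs(compound), 3),
        }
        for text, compound in scored_items
    ]
    counts = Counter(item["label"] for item in items)
    return {"total": len(items), "label_counts": dict(counts), "items": items}


# Hypothetical scores for illustration only.
report = build_report([
    ("Love it", 0.64),
    ("It arrived", 0.0),
    ("Terrible support", -0.48),
])
print(json.dumps(report, indent=2))
```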
Example Usage Scenarios
Social Media Content Analysis
```python
# Analyze Twitter posts for brand sentiment
# Identify trending topics and hashtags
# Measure engagement patterns
```
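The hashtag and engagement steps of this scenario might look like the following sketch. The post dictionaries and their `likes`/`shares` keys are hypothetical sample data, not a real Twitter API response format.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(posts, n=3):
    """Count hashtags case-insensitively across posts and return the top n."""
    counts = Counter()
    for post in posts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(post["text"]))
    return counts.most_common(n)


def average_engagement(posts):
    """Mean of likes + shares per post (0.0 for an empty list)."""
    if not posts:
        return 0.0
    return sum(p["likes"] + p["shares"] for p in posts) / len(posts)


# Hypothetical sample posts.
posts = [
    {"text": "Loving the new release! #Launch #AI", "likes": 120, "shares": 30},
    {"text": "Battery life could be better #launch", "likes": 15, "shares": 2},
    {"text": "#AI features are impressive", "likes": 80, "shares": 13},
]
print(top_hashtags(posts))
print(average_engagement(posts))
```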
YouTube Video Analysis
```python
# Analyze video titles and descriptions
# Extract topics from comments
# Identify viral content patterns
```
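One simple way to look for title patterns is to compare how often a few hand-picked features appear in above-median versus other videos. The features, the 40-character cutoff, and the sample videos are all assumptions for illustration. A comparison like this shows association on the given sample only, not what causes a video to spread.

```python
from statistics import median


def title_features(title):
    """A few simple, hand-picked title features (illustrative only)."""
    return {
        "has_number": any(ch.isdigit() for ch in title),
        "has_question": "?" in title,
        "is_short": len(title) <= 40,
    }


def feature_rates(titles):
    """Fraction of titles showing each feature (empty dict for no titles)."""
    if not titles:
        return {}
    feats = [title_features(t) for t in titles]
    return {name: sum(f[name] for f in feats) / len(feats) for name in feats[0]}


def compare_by_views(videos):
    """Feature rates for videos above the median view count vs. the rest."""
    cutoff = median(v["views"] for v in videos)
    high = [v["title"] for v in videos if v["views"] > cutoff]
    low = [v["title"] for v in videos if v["views"] <= cutoff]
    return {"high": feature_rates(high), "low": feature_rates(low)}


# Hypothetical sample data.
videos = [
    {"title": "10 Tips for Better Sleep", "views": 90000},
    {"title": "Why Do Cats Purr?", "views": 50000},
    {"title": "My quarterly channel update and some thoughts on upcoming plans", "views": 1200},
    {"title": "Unboxing", "views": 3000},
]
comparison = compare_by_views(videos)
print(comparison)
```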
Product Review Analysis
```python
# Analyze customer feedback sentiment
# Extract product feature mentions
# Identify improvement opportunities
```
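Feature mentions can be approximated with keyword matching, as in this sketch. The `FEATURES` vocabulary, the star ratings, and the review texts are hypothetical. Substring matching is deliberately naive and will miss paraphrases or match unrelated words that contain a keyword.

```python
from collections import defaultdict

# Hypothetical feature vocabulary; in practice it might come from the
# product team or from keyword extraction.
FEATURES = {
    "battery": ["battery", "charge", "charging"],
    "camera": ["camera", "photo"],
    "screen": ["screen", "display"],
}


def feature_mentions(reviews):
    """Group (rating, text) reviews by the product features they mention."""
    by_feature = defaultdict(list)
    for rating, text in reviews:
        lowered = text.lower()
        for feature, keywords in FEATURES.items():
            if any(word in lowered for word in keywords):
                by_feature[feature].append(rating)  # one count per review
    return {
        feature: {"mentions": len(r), "avg_rating": sum(r) / len(r)}
        for feature, r in by_feature.items()
    }


# Hypothetical sample reviews as (star rating, text).
reviews = [
    (5, "Camera is amazing, photos look sharp"),
    (2, "Battery drains fast and charging is slow"),
    (1, "Battery died after a month"),
    (4, "Great display"),
]
stats = feature_mentions(reviews)
# Lowest-rated features first: candidates for improvement.
worst_first = sorted(stats, key=lambda f: stats[f]["avg_rating"])
print(stats)
print(worst_first)
```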
Key Analysis Methods
Traditional NLP Techniques
- VADER Sentiment Analysis: Rule-based sentiment scoring
- TF-IDF Keyword Extraction: Statistical term importance
- Text Clustering: K-means and hierarchical clustering
- Word Frequency Analysis: Term frequency and co-occurrence
- Language Detection: Automatic language identification
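To make the TF-IDF idea concrete, here is a from-scratch sketch using the plain formula tf × log(N / df) on pre-tokenized documents. It is not scikit-learn's implementation: `TfidfVectorizer` applies IDF smoothing and L2 normalization by default, so its scores will differ from these.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_n=3):
    """Return the top_n highest-scoring terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score descending, then alphabetically to break ties.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_n]])
    return results


docs = [
    ["battery", "life", "great", "battery"],
    ["camera", "great", "photos"],
    ["screen", "great", "bright"],
]
print(tfidf_keywords(docs, top_n=2))
```

The word "great" appears in every document, so its IDF is log(3/3) = 0 and it never ranks as a keyword, which is the behavior TF-IDF is meant to provide.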
LLM-Enhanced Analysis
- Context-Aware Sentiment: Nuanced emotion understanding
- Semantic Topic Extraction: Meaning-based topic identification
- Content Summarization: Automatic text summarization
- Multi-Language Support: Cross-lingual analysis
- Zero-Shot Classification: Categorization without training data
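Zero-shot classification with an LLM usually comes down to a constrained prompt plus strict parsing of the reply. The sketch here keeps the model call abstract: `complete` is a hypothetical callable that takes a prompt string and returns the model's text, and `fake_complete` stands in for a real API call so the example runs offline.

```python
def build_zero_shot_prompt(text, labels):
    """Compose a prompt asking the model to pick exactly one label."""
    options = ", ".join(labels)
    return (
        f"Classify the text into exactly one of these categories: {options}.\n"
        f"Reply with the category name only.\n\nText: {text}"
    )


def classify(text, labels, complete):
    """`complete` is any callable that sends a prompt to an LLM and returns its reply."""
    reply = complete(build_zero_shot_prompt(text, labels)).strip().lower()
    for label in labels:
        if label.lower() == reply:
            return label
    return None  # the model answered outside the allowed set


def fake_complete(prompt):
    """Stand-in for a real API call; always answers 'Complaint'."""
    return " Complaint\n"


labels = ["praise", "complaint", "question"]
print(classify("My order arrived broken.", labels, fake_complete))
```

Returning `None` for off-list replies makes it easy to count and retry them instead of silently accepting a label the model invented.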
Advanced Analytics
- Time Series Analysis: Content trends over time
- Engagement Prediction: Predict viral potential
- Competitive Analysis: Compare content performance
- Audience Insights: Demographic and preference analysis
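For the time series case, a minimal sketch is to average sentiment scores per day and smooth the result with a trailing moving average. The function names and the sample scores are hypothetical. A pandas `groupby` or `resample` would be the more usual tool on real data.

```python
from collections import defaultdict
from datetime import date


def daily_average(scored_posts):
    """Average sentiment score per calendar day, sorted by date."""
    buckets = defaultdict(list)
    for day, score in scored_posts:
        buckets[day].append(score)
    return [(day, sum(s) / len(s)) for day, s in sorted(buckets.items())]


def moving_average(values, window=3):
    """Trailing moving average; early points use the values available so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical (date, sentiment score) pairs.
scored = [
    (date(2024, 1, 1), 0.5),
    (date(2024, 1, 1), 0.1),
    (date(2024, 1, 2), -0.2),
    (date(2024, 1, 3), 0.4),
]
daily = daily_average(scored)
trend = moving_average([avg for _, avg in daily], window=2)
print(daily)
print(trend)
```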
Common Business Questions Answered
- What is the overall sentiment toward our brand?
- Which topics are trending in our industry?
- What makes content go viral?
- How does sentiment vary by demographic or region?
- What are customers saying about our products?
- Which content formats perform best?
Integration Examples
See the examples/ directory for:
- basic_content_analysis.py: Traditional NLP analysis
- llm_enhanced_analysis.py: LLM-powered analysis
- social_media_analysis.py: Social media specific analysis
- Sample datasets for testing
LLM Configuration
Supported LLM Providers
- OpenAI: GPT-3.5, GPT-4 models
- Qwen (通义千问): Chinese-optimized models
- Open Source: Local models via HuggingFace
API Setup Examples
```python
# OpenAI Configuration
import openai
openai.api_key = 'your-api-key'

# Qwen Configuration
import dashscope
dashscope.api_key = 'your-api-key'
```
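Hard-coding keys in source files makes them easy to leak, so a common alternative is to read them from environment variables. `load_api_key` is a hypothetical helper, and the variable names in the comments are assumptions to adjust for your deployment.

```python
import os


def load_api_key(env_var):
    """Read an API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Assumed variable names; uncomment once the packages are installed
# and the variables are set.
# openai.api_key = load_api_key("OPENAI_API_KEY")
# dashscope.api_key = load_api_key("DASHSCOPE_API_KEY")
```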
Best Practices
- Data Quality: Ensure clean, consistent text data
- Sampling Strategy: Use representative samples for LLM analysis
- Cost Management: Balance traditional NLP with LLM calls
- Language Handling: Configure appropriate language models
- Validation: Cross-validate sentiment analysis results
- Privacy: Ensure compliance with data protection regulations
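For the validation point, one lightweight check is to compare labels from two methods on the same items (for example VADER against an LLM or a human-labeled sample) and review the items where they disagree. `agreement_report` is a hypothetical helper. Raw agreement does not correct for chance, so treat it as a first-pass signal only.

```python
def agreement_report(labels_a, labels_b):
    """Compare two label lists covering the same items, in the same order."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both label lists must cover the same items.")
    disagreements = [
        i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b
    ]
    rate = 1 - len(disagreements) / len(labels_a) if labels_a else 0.0
    return {"agreement": rate, "disagreements": disagreements}


# Hypothetical labels from two methods for four items.
vader_labels = ["positive", "negative", "neutral", "positive"]
llm_labels = ["positive", "negative", "positive", "positive"]
check = agreement_report(vader_labels, llm_labels)
print(check)
```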
Performance Optimization
For Large Datasets
- Use data sampling for LLM analysis
- Implement batch processing
- Cache LLM responses when possible
- Use traditional NLP for initial filtering
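Sampling, batching, and caching can be combined in a few lines, as in this sketch. All names here are hypothetical, the cache lives only in memory, and the `analyze` callable stands in for an LLM request.

```python
import hashlib
import random


def sample_items(items, k, seed=0):
    """Reproducible random sample of at most k items."""
    rng = random.Random(seed)
    return list(items) if len(items) <= k else rng.sample(items, k)


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class CachedAnalyzer:
    """Wrap an expensive analyze(text) callable with an in-memory cache."""

    def __init__(self, analyze):
        self.analyze = analyze
        self.cache = {}
        self.calls = 0  # number of times the wrapped callable actually ran

    def __call__(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.analyze(text)
        return self.cache[key]


# len() stands in for an expensive LLM call.
analyzer = CachedAnalyzer(lambda text: len(text))
texts = ["a", "bb", "a", "ccc", "bb"]
results = [analyzer(t) for batch in batches(texts, 2) for t in batch]
print(results, analyzer.calls)
```

Duplicate texts hit the cache, so five inputs trigger only three underlying calls. A persistent store would be needed to keep the cache across runs.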
Cost Management
- Prioritize important content for LLM analysis
- Use traditional NLP for bulk processing
- Implement smart sampling strategies
- Monitor API usage and costs
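One way to balance the two approaches is to let a cheap traditional score decide which items are clear enough to keep local and send only the ambiguous ones to the LLM, while tracking token usage. The ±0.3 band, the sample scores, and the per-1K-token price are placeholder assumptions, not recommended values or real rates.

```python
def route(items, low=-0.3, high=0.3):
    """Split (text, score) pairs: confident scores stay local, ambiguous ones go to the LLM."""
    local, needs_llm = [], []
    for text, score in items:
        (needs_llm if low < score < high else local).append(text)
    return local, needs_llm


class UsageTracker:
    """Accumulate token counts and estimate spend from a per-1K-token price."""

    def __init__(self, price_per_1k_tokens):
        self.price = price_per_1k_tokens
        self.tokens = 0

    def record(self, tokens):
        self.tokens += tokens

    @property
    def estimated_cost(self):
        return self.tokens / 1000 * self.price


# Hypothetical traditional-NLP scores.
local, needs_llm = route([
    ("Love it", 0.8),
    ("Well, it works I guess", 0.1),
    ("Awful", -0.7),
])

tracker = UsageTracker(price_per_1k_tokens=0.5)  # placeholder price
tracker.record(1200)
tracker.record(800)
print(local, needs_llm, tracker.estimated_cost)
```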
Advanced Features
- Real-time Analysis: Stream processing for live content
- Multi-modal Analysis: Text + image + video content
- Custom Models: Fine-tune models for specific domains
- Integration APIs: Connect with content management systems
- Automated Reporting: Scheduled analysis and reporting
Troubleshooting
Common Issues
- Low Sentiment Accuracy: Check language settings and text preprocessing
- High API Costs: Optimize sampling and caching strategies
- Slow Processing: Implement parallel processing and batching
- Language Support: Ensure appropriate models for non-English content
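For slow processing, batching and parallelism can be combined with the standard library's `concurrent.futures`. In this sketch `analyze_batch` is a placeholder that counts words; in practice it would wrap an API call or a model invocation.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_batch(batch):
    """Placeholder for per-batch work such as an API call; here it counts words."""
    return [len(text.split()) for text in batch]


def analyze_parallel(texts, batch_size=2, max_workers=4):
    """Split texts into batches and process the batches concurrently, preserving order."""
    chunks = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        results = list(pool.map(analyze_batch, chunks))
    return [value for chunk in results for value in chunk]


print(analyze_parallel(["a b", "c", "d e f", "g h", "i"]))
```

Threads suit I/O-bound work such as waiting on API responses. For CPU-bound preprocessing, `ProcessPoolExecutor` offers the same `map` interface.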
Performance Tips
- Pre-process text data effectively
- Use appropriate model sizes for tasks
- Implement result caching
- Monitor resource usage and optimize