categorizing-bsky-accounts
Categorizing Bluesky Accounts
Fetch Bluesky account data and extract keywords for Claude to categorize by topic. The script compresses account context (bio + posts) into bio + keywords, then Claude performs intelligent categorization.
Prerequisites
Requires: extracting-keywords skill (provides YAKE venv + domain stopwords)
The analyzer delegates keyword extraction to the extracting-keywords skill, which provides:
- Optimized YAKE installation with minimal dependencies
- Domain-specific stopwords: English (574), AI/ML (1357), Life Sciences (1293)
- Support for 34 languages
Core Workflow
When users request Bluesky account analysis:
- Ensure keyword extraction is set up - Invoke the extracting-keywords skill using the Skill tool to ensure YAKE venv exists (skip if already invoked in this session)
- Determine input mode based on user's request:
  - Following list → use --following handle
  - Followers → use --followers handle
  - List of handles → use --handles "h1,h2,h3"
  - File provided → use --file accounts.txt
- Configure parameters:
  - --accounts N - Number to analyze (default: 100, max: 100)
  - --posts N - Posts per account (default: 20, max: 100)
  - --stopwords [en|ai|ls] - Choose domain-specific stopwords:
    - en: English (general purpose)
    - ai: AI/ML domain (recommended for tech accounts)
    - ls: Life Sciences (for biomedical/research accounts)
  - --exclude "pattern1,pattern2" - Skip spam/bot accounts
- Run script - Outputs simple text format to stdout:
  @handle1.bsky.social (Display Name)
  Bio text here
  Keywords: keyword1, keyword2, keyword3
  @handle2.bsky.social (Another Name)
  Bio text here
  Keywords: keyword4, keyword5, keyword6
- Categorize accounts - Claude analyzes bio + keywords to categorize by topic
Quick Start
Analyze following list with AI/ML stopwords:
bash
python scripts/bluesky_analyzer.py --following austegard.com --stopwords ai
Analyze followers:
bash
python scripts/bluesky_analyzer.py --followers austegard.com
Analyze specific handles:
bash
python scripts/bluesky_analyzer.py --handles "user1.bsky.social,user2.bsky.social,user3.bsky.social"
From file:
bash
python scripts/bluesky_analyzer.py --file accounts.txt --stopwords ai
Filter out bot accounts:
bash
python scripts/bluesky_analyzer.py --following handle --exclude "bot,spam,promo" --stopwords ai
Parameters
Input Modes (choose one)
--handles "h1,h2,h3"
Comma-separated list of Bluesky handles
--following HANDLE
Analyze accounts followed by HANDLE
--followers HANDLE
Analyze accounts following HANDLE
--file PATH
Read handles from file (one per line)
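A minimal sketch of how a one-handle-per-line file could be read. The function names `normalize_handle` and `read_handles` are hypothetical, and the choices to strip a leading `@`, lowercase the handle, and reject handles without a domain are assumptions rather than documented behavior of the script.

```python
from pathlib import Path
from typing import List


def normalize_handle(raw: str) -> str:
    """Strip whitespace and a leading '@', and lowercase the handle."""
    return raw.strip().lstrip("@").lower()


def read_handles(path: str) -> List[str]:
    """Read one handle per line, skipping blank lines."""
    handles = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        handle = normalize_handle(line)
        if not handle:
            continue
        # A full handle includes its domain, e.g. alice.bsky.social.
        if "." not in handle:
            raise ValueError(f"handle is missing a domain: {line.strip()!r}")
        handles.append(handle)
    return handles
```

Failing early on a handle without a domain mirrors the "Verify handle format" advice in Troubleshooting.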
Analysis Options
--accounts N
Number of accounts to analyze (1-100, default: 100)
--posts N
Posts to fetch per account (1-100, default: 20)
--stopwords [en|ai|ls]
Stopwords to use for keyword extraction (default: en)
- en: English stopwords (574 terms) - general purpose
- ai: AI/ML domain stopwords (1357 terms) - tech-focused accounts
- ls: Life Sciences stopwords (1293 terms) - biomedical/research accounts
--exclude "word1,word2"
Skip accounts with these keywords in bio/posts
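The flags above map naturally onto an `argparse` parser. The sketch below is an illustration of the documented interface, not the script's actual source: `build_parser` and `bounded_int` are hypothetical names, and the parser only encodes the ranges, defaults, and choices listed in this section.

```python
import argparse


def bounded_int(low: int, high: int):
    """Return an argparse type that accepts integers in [low, high]."""
    def parse(value: str) -> int:
        number = int(value)
        if not low <= number <= high:
            raise argparse.ArgumentTypeError(f"must be between {low} and {high}")
        return number
    return parse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bluesky_analyzer.py")
    # Exactly one input mode must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--handles", help="comma-separated Bluesky handles")
    mode.add_argument("--following", metavar="HANDLE")
    mode.add_argument("--followers", metavar="HANDLE")
    mode.add_argument("--file", metavar="PATH")
    # Analysis options with the documented ranges and defaults.
    parser.add_argument("--accounts", type=bounded_int(1, 100), default=100)
    parser.add_argument("--posts", type=bounded_int(1, 100), default=20)
    parser.add_argument("--stopwords", choices=["en", "ai", "ls"], default="en")
    parser.add_argument("--exclude", default="", help="comma-separated keywords")
    return parser
```

Using a mutually exclusive group makes "choose one" an error the parser reports itself, so a command that mixes `--following` and `--file` fails before any API request is made.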
Output Format
The script outputs simple text format for Claude to process:
@alice.bsky.social (Alice Smith)
AI researcher working on LLM alignment and safety
Keywords: alignment, safety research, interpretability, llm evaluation
@bob.bsky.social (Bob Johnson)
Full-stack developer building web applications
Keywords: react, typescript, node.js, api design, postgresql
@carol.bsky.social (Carol Williams)
Biotech researcher studying CRISPR applications
Keywords: crispr, gene editing, therapeutics, clinical trials
Claude then categorizes accounts based on bio + keywords without hardcoded rules.
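If the output needs to be post-processed programmatically, the three-line blocks shown above can be parsed back into records. This is a sketch under assumptions: the `Account` class and `parse_accounts` function are hypothetical, and the parser assumes every record starts with a line that consists only of `@handle` and an optional `(Display Name)`.

```python
import re
from dataclasses import dataclass, field
from typing import List

# A header line is "@handle" optionally followed by "(Display Name)".
HEADER = re.compile(r"^@(?P<handle>\S+)(?:\s+\((?P<name>.*)\))?\s*$")


@dataclass
class Account:
    handle: str
    name: str = ""
    bio: str = ""
    keywords: List[str] = field(default_factory=list)


def parse_accounts(text: str) -> List[Account]:
    """Parse analyzer output into Account records, one per '@handle' header."""
    accounts: List[Account] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            accounts.append(Account(header["handle"], header["name"] or ""))
        elif not accounts:
            continue  # ignore anything before the first header
        elif line.startswith("Keywords:"):
            keywords = line[len("Keywords:"):].split(",")
            accounts[-1].keywords = [k.strip() for k in keywords if k.strip()]
        else:
            # Any other line is treated as (part of) the bio.
            accounts[-1].bio = f"{accounts[-1].bio} {line}".strip()
    return accounts
```

Because the header pattern must match the whole line, a bio that merely begins with an @-mention is still read as bio text.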
Common Workflows
Audit Your Following List
bash
python scripts/bluesky_analyzer.py --following your-handle.bsky.social --stopwords ai
Claude will categorize accounts by topic and identify patterns in who you follow.
Find Experts in a Topic
bash
python scripts/bluesky_analyzer.py --following alice.bsky.social --stopwords ai
Ask Claude: "Which of these accounts are ML researchers?" or "Who focuses on climate tech?"
Analyze a Curated List
bash
cat > accounts.txt << 'EOF'
expert1.bsky.social
expert2.bsky.social
expert3.bsky.social
EOF
python scripts/bluesky_analyzer.py --file accounts.txt --stopwords ls
Filter Out Bot Accounts
bash
python scripts/bluesky_analyzer.py --following handle --exclude "bot,spam,promo,follow back" --stopwords ai
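How the script matches `--exclude` keywords is not specified here. A plausible reading is a case-insensitive substring check over the bio and posts, sketched below with the hypothetical helpers `parse_patterns` and `is_excluded`.

```python
from typing import Iterable, List


def parse_patterns(raw: str) -> List[str]:
    """Split a comma-separated --exclude value into lowercase patterns."""
    return [p.strip().lower() for p in raw.split(",") if p.strip()]


def is_excluded(bio: str, posts: Iterable[str], patterns: List[str]) -> bool:
    """True if any pattern occurs as a substring of the bio or any post."""
    haystack = " ".join([bio, *posts]).lower()
    return any(pattern in haystack for pattern in patterns)
```

Under this assumption a short pattern such as `bot` would also exclude a bio that mentions "robotics", so longer or more specific patterns (for example `follow back`) are safer choices.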
Technical Details
Keyword Extraction
Delegates to extracting-keywords skill using YAKE venv:
- Stopwords options (--stopwords):
  - en: English (574 terms) - general purpose
  - ai: AI/ML domain (1357 terms) - filters technical noise, ML boilerplate
  - ls: Life Sciences (1293 terms) - filters research methodology, clinical terms
- N-grams: 1-3 words
- Deduplication: 0.9 threshold
- Top keywords: 10 per account
- Performance: ~5% overhead with domain stopwords vs English
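For orientation, the settings above correspond to YAKE's extractor options roughly as follows. This is a sketch, not the skill's code: `extract_keywords` is a hypothetical wrapper, the keyword argument names (`lan`, `n`, `dedupLim`, `top`, `stopwords`) follow the long-standing YAKE README and may be spelled differently in newer releases, and loading the skill's domain stopword lists is left to the caller because their file locations are not given here.

```python
from typing import Iterable, List, Optional

# Settings taken from the list above: 1-3 word n-grams,
# 0.9 deduplication threshold, 10 keywords per account.
YAKE_SETTINGS = {"n": 3, "dedupLim": 0.9, "top": 10}


def extract_keywords(text: str, stopwords: Optional[Iterable[str]] = None,
                     language: str = "en") -> List[str]:
    """Return up to 10 keywords for `text`, best (lowest YAKE score) first."""
    import yake  # imported lazily; expected to come from the YAKE venv

    extractor = yake.KeywordExtractor(
        lan=language,
        stopwords=set(stopwords) if stopwords is not None else None,
        **YAKE_SETTINGS,
    )
    # extract_keywords returns (keyword, score) pairs; lower scores rank higher.
    return [keyword for keyword, _score in extractor.extract_keywords(text)]
```

Passing `stopwords=None` lets YAKE fall back to its bundled list for the chosen language, which corresponds most closely to the `en` option.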
API Rate Limits
Bluesky API limits:
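- 3000 requests per 5 minutes (measured per IP)
- 5000 points per hour (this limit covers record writes, which fetching profiles and posts does not consume)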
The analyzer respects these limits with built-in delays.
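The built-in delays are not described in detail. One common way to stay under a "requests per window" limit is a sliding-window limiter, sketched below. `SlidingWindowLimiter` is a hypothetical class, not the analyzer's implementation, and the injectable `clock` and `sleep` arguments exist only to make the behavior testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()

    def wait(self) -> None:
        """Block until another request fits in the window, then record it."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call expires, then drop it.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)


# 3000 requests per 5 minutes, as listed above.
limiter = SlidingWindowLimiter(max_calls=3000, period=300)
```

Calling `limiter.wait()` before each API request keeps the request rate within the window. With the defaults of 100 accounts, a run needs far fewer than 3000 requests, so the limiter would rarely sleep.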
Categorization Algorithm
Script's role:
- Fetch account data (bio + posts)
- Extract keywords to compress context
- Output bio + keywords in simple format
Claude's role:
- Read bio + keywords for each account
- Intelligently categorize by topic (no hardcoded rules)
- Group accounts, identify patterns, answer user questions
This agentic pattern is more flexible than hardcoded keyword matching.
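The script's half of this split ends with rendering each account in the layout shown under Output Format. A minimal sketch of that last step, using a hypothetical `format_account` helper (collapsing the bio onto one line is an assumption):

```python
from typing import Iterable


def format_account(handle: str, display_name: str, bio: str,
                   keywords: Iterable[str]) -> str:
    """Render one account as the three-line block shown under Output Format."""
    # Collapse newlines and repeated spaces so the bio stays on one line.
    one_line_bio = " ".join(bio.split())
    return "\n".join([
        f"@{handle} ({display_name})",
        one_line_bio,
        "Keywords: " + ", ".join(keywords),
    ])
```

Each account then costs a few short lines of context instead of its bio plus up to 100 posts, which is what lets Claude categorize a full batch in one pass.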
Troubleshooting
"No accounts to analyze"
- Verify handle format (include domain: handle.bsky.social)
- Check if account exists and has public following/followers
"Insufficient content for keyword extraction"
- Account has few posts (<5)
- Posts are very short
- Try increasing the --posts parameter
Rate limit errors
- Reduce the --accounts parameter
- Add delays between batches
- Check Bluesky API status
Import errors
- Verify extracting-keywords skill is available
- Check YAKE venv exists: /home/claude/yake-venv/bin/python -c "import yake"
- Verify Python 3.8+: python3 --version
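The two import checks above can also be run from Python. This is a small diagnostic sketch with the hypothetical helpers `python_ok` and `yake_ok`; the venv path is the one given in the check above.

```python
import subprocess
import sys
from pathlib import Path

# Path taken from the troubleshooting step above.
YAKE_PYTHON = Path("/home/claude/yake-venv/bin/python")


def python_ok(minimum=(3, 8)) -> bool:
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def yake_ok(python: Path = YAKE_PYTHON) -> bool:
    """True if the venv interpreter exists and can import yake."""
    if not python.exists():
        return False
    result = subprocess.run([str(python), "-c", "import yake"],
                            capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    print("Python 3.8+:", python_ok())
    print("YAKE venv:  ", yake_ok())
```

If `yake_ok()` returns False, invoking the extracting-keywords skill again is the first thing to try, since that skill is what creates the venv.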
Integration with Other Skills
Built-in integration:
- extracting-keywords: Automatically delegates keyword extraction to this skill's optimized YAKE venv with domain-specific stopwords
Example Sessions
User: "Can you analyze the accounts I follow on Bluesky and tell me what topics they focus on?"
Claude:
bash
python scripts/bluesky_analyzer.py --following user-handle.bsky.social --stopwords ai
Based on the output, I can see you follow:
- AI/ML researchers (15 accounts): Focus on LLM safety, alignment, interpretability
- Software engineers (20 accounts): Web development, React, TypeScript, DevOps
- Writers (8 accounts): Tech journalism, newsletters, long-form content
- Scientists (7 accounts): Climate science, biotech, physics
User: "Find ML researchers in @alice's network"
Claude:
bash
python scripts/bluesky_analyzer.py --following alice.bsky.social --stopwords ai
I found 23 ML researchers in Alice's network:
- 8 working on LLM alignment and safety
- 6 focused on model evaluation and benchmarks
- 5 in ML infrastructure and MLOps
- 4 in computer vision and multimodal models
User: "Here's a list of 30 accounts, categorize them"
Claude:
bash
python scripts/bluesky_analyzer.py --file accounts.txt --stopwords ai
Categorized into:
- Climate Tech (8 accounts)
- Biotech (6 accounts)
- Fintech (5 accounts)
- AI/ML (7 accounts)
- Other (4 accounts)