categorizing-bsky-accounts

Categorizing Bluesky Accounts

Fetch Bluesky account data and extract keywords for Claude to categorize by topic. The script compresses account context (bio + posts) into bio + keywords, then Claude performs intelligent categorization.

Prerequisites

Requires: extracting-keywords skill (provides YAKE venv + domain stopwords)
The analyzer delegates keyword extraction to the extracting-keywords skill, which provides:
  • Optimized YAKE installation with minimal dependencies
  • Domain-specific stopwords: English (574), AI/ML (1357), Life Sciences (1293)
  • Support for 34 languages

Core Workflow

When users request Bluesky account analysis:
  1. Ensure keyword extraction is set up - Invoke the extracting-keywords skill using the Skill tool to ensure YAKE venv exists (skip if already invoked in this session)
  2. Determine input mode based on user's request:
    • Following list → use --following handle
    • Followers → use --followers handle
    • List of handles → use --handles "h1,h2,h3"
    • File provided → use --file accounts.txt
  3. Configure parameters:
    • --accounts N - Number to analyze (default: 100, max: 100)
    • --posts N - Posts per account (default: 20, max: 100)
    • --stopwords [en|ai|ls] - Choose domain-specific stopwords:
      • en: English (general purpose)
      • ai: AI/ML domain (recommended for tech accounts)
      • ls: Life Sciences (for biomedical/research accounts)
    • --exclude "pattern1,pattern2" - Skip spam/bot accounts
  4. Run script - Outputs simple text format to stdout:
    @handle1.bsky.social (Display Name)
    Bio text here
    Keywords: keyword1, keyword2, keyword3
    
    @handle2.bsky.social (Another Name)
    Bio text here
    Keywords: keyword4, keyword5, keyword6
  5. Categorize accounts - Claude analyzes bio + keywords to categorize by topic

Quick Start

Analyze following list with AI/ML stopwords:
bash
python scripts/bluesky_analyzer.py --following austegard.com --stopwords ai
Analyze followers:
bash
python scripts/bluesky_analyzer.py --followers austegard.com
Analyze specific handles:
bash
python scripts/bluesky_analyzer.py --handles "user1.bsky.social,user2.bsky.social,user3.bsky.social"
From file:
bash
python scripts/bluesky_analyzer.py --file accounts.txt --stopwords ai
Filter out bot accounts:
bash
python scripts/bluesky_analyzer.py --following handle --exclude "bot,spam,promo" --stopwords ai

Parameters

Input Modes (choose one)

--handles "h1,h2,h3" - Comma-separated list of Bluesky handles
--following HANDLE - Analyze accounts followed by HANDLE
--followers HANDLE - Analyze accounts that follow HANDLE
--file PATH - Read handles from file (one per line)
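The "choose one" constraint on input modes can be expressed with a mutually exclusive argument group. The following is a minimal sketch, not the actual parser in bluesky_analyzer.py: the function names build_parser and split_handles are hypothetical, and stripping a leading "@" from handles is an assumption.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly one of the four input modes."""
    parser = argparse.ArgumentParser(prog="bluesky_analyzer.py")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--handles", metavar="H1,H2,H3",
                      help="comma-separated list of Bluesky handles")
    mode.add_argument("--following", metavar="HANDLE",
                      help="analyze accounts followed by HANDLE")
    mode.add_argument("--followers", metavar="HANDLE",
                      help="analyze accounts that follow HANDLE")
    mode.add_argument("--file", metavar="PATH",
                      help="read handles from a file, one per line")
    return parser


def split_handles(raw: str) -> List[str]:
    """Split a comma-separated handle string, dropping blanks and a leading '@'."""
    return [h.strip().lstrip("@") for h in raw.split(",") if h.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.handles:
        print(split_handles(args.handles))
```

With this shape, passing two modes at once (for example --following and --followers together) makes argparse exit with a usage error instead of silently picking one.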

Analysis Options

--accounts N - Number of accounts to analyze (1-100, default: 100)
--posts N - Posts to fetch per account (1-100, default: 20)
--stopwords [en|ai|ls] - Stopwords to use for keyword extraction (default: en)
  • en: English stopwords (574 terms) - general purpose
  • ai: AI/ML domain stopwords (1357 terms) - tech-focused accounts
  • ls: Life Sciences stopwords (1293 terms) - biomedical/research accounts
--exclude "word1,word2" - Skip accounts with these keywords in bio/posts
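To make the --exclude behavior concrete, here is a small sketch of how such a filter might work. It assumes case-insensitive substring matching over the bio and post text; the script's real matching rules are not documented here, and parse_exclude and is_excluded are hypothetical names.

```python
from typing import Iterable, List


def parse_exclude(raw: str) -> List[str]:
    """Turn 'bot, spam,Follow Back' into lowercase patterns."""
    return [p.strip().lower() for p in raw.split(",") if p.strip()]


def is_excluded(bio: str, posts: Iterable[str], patterns: List[str]) -> bool:
    """Return True if any pattern occurs anywhere in the bio or posts."""
    haystack = " ".join([bio, *posts]).lower()
    return any(pattern in haystack for pattern in patterns)


if __name__ == "__main__":
    patterns = parse_exclude("bot,spam,promo,follow back")
    print(is_excluded("Daily promo codes!", [], patterns))
    print(is_excluded("Climate scientist", ["new paper out"], patterns))
```

One consequence of plain substring matching is that a short pattern such as "bot" also matches "robotics", so under this assumption longer, more specific patterns are safer.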

Output Format

The script outputs simple text format for Claude to process:
@alice.bsky.social (Alice Smith)
AI researcher working on LLM alignment and safety
Keywords: alignment, safety research, interpretability, llm evaluation

@bob.bsky.social (Bob Johnson)
Full-stack developer building web applications
Keywords: react, typescript, node.js, api design, postgresql

@carol.bsky.social (Carol Williams)
Biotech researcher studying CRISPR applications
Keywords: crispr, gene editing, therapeutics, clinical trials
Claude then categorizes accounts based on bio + keywords without hardcoded rules.
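If the output needs further processing in code, the text format above is simple enough to parse back into records. This is an illustrative sketch only: parse_accounts is a hypothetical helper, and it assumes blocks are separated by blank lines, start with an @handle line, and carry at most one "Keywords:" line.

```python
import re
from typing import Dict, List

# "@handle (Display Name)"; the display name part is treated as optional.
HEADER = re.compile(r"^@(?P<handle>\S+)(?:\s+\((?P<name>.*)\))?$")


def parse_accounts(text: str) -> List[Dict[str, object]]:
    """Parse analyzer output into dicts with handle, name, bio, keywords."""
    accounts: List[Dict[str, object]] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        match = HEADER.match(lines[0])
        if match is None:
            continue  # skip blocks that do not start with an @handle line
        keywords: List[str] = []
        bio_lines: List[str] = []
        for line in lines[1:]:
            if line.startswith("Keywords:"):
                raw = line[len("Keywords:"):]
                keywords = [k.strip() for k in raw.split(",") if k.strip()]
            else:
                bio_lines.append(line)
        accounts.append({
            "handle": match.group("handle"),
            "name": match.group("name") or "",
            "bio": " ".join(bio_lines),
            "keywords": keywords,
        })
    return accounts
```

Each record keeps the bio and keywords together, which is the same compressed context Claude reads when categorizing.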

Common Workflows

Audit Your Following List

bash
python scripts/bluesky_analyzer.py --following your-handle.bsky.social --stopwords ai
Claude will categorize accounts by topic and identify patterns in who you follow.

Find Experts in a Topic

bash
python scripts/bluesky_analyzer.py --following alice.bsky.social --stopwords ai
Ask Claude: "Which of these accounts are ML researchers?" or "Who focuses on climate tech?"

Analyze a Curated List

bash
cat > accounts.txt << 'EOF'
expert1.bsky.social
expert2.bsky.social
expert3.bsky.social
EOF

python scripts/bluesky_analyzer.py --file accounts.txt --stopwords ls

Filter Out Bot Accounts

bash
python scripts/bluesky_analyzer.py --following handle --exclude "bot,spam,promo,follow back" --stopwords ai

Technical Details

Keyword Extraction

Keyword extraction is delegated to the extracting-keywords skill, which runs YAKE from its venv:
  • Stopwords options (--stopwords):
    • en: English (574 terms) - general purpose
    • ai: AI/ML domain (1357 terms) - filters technical noise, ML boilerplate
    • ls: Life Sciences (1293 terms) - filters research methodology, clinical terms
  • N-grams: 1-3 words
  • Deduplication: 0.9 threshold
  • Top keywords: 10 per account
  • Performance: ~5% overhead with domain stopwords vs English
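The settings listed above map onto YAKE's KeywordExtractor roughly as follows. This is a sketch under assumptions, not the skill's actual invocation: the keyword names (lan, n, dedupLim, top, stopwords) follow the YAKE README and may differ in newer releases, the language is fixed to English for illustration, and yake_settings and extract_keywords are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional


def yake_settings(stopwords: Optional[List[str]] = None) -> Dict[str, Any]:
    """Collect the extraction settings described above in one place."""
    return {
        "lan": "en",             # assumption: English text
        "n": 3,                  # n-grams of 1-3 words
        "dedupLim": 0.9,         # deduplication threshold
        "top": 10,               # top keywords per account
        "stopwords": stopwords,  # None lets YAKE use its default list
    }


def extract_keywords(text: str, stopwords: Optional[List[str]] = None) -> List[str]:
    """Run YAKE on text and return keyword strings."""
    # Imported lazily so yake_settings() works without YAKE installed.
    import yake

    extractor = yake.KeywordExtractor(**yake_settings(stopwords))
    # Assumption: results are (keyword, score) pairs, lower score = more relevant.
    return [keyword for keyword, _score in extractor.extract_keywords(text)]
```

Passing a domain stopword list (for example the 1357-term AI/ML list) would replace YAKE's default list in this sketch rather than extend it.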

API Rate Limits

Bluesky API limits:
  • 3000 requests per 5 minutes
  • 5000 requests per hour
The analyzer respects these limits with built-in delays.
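One way to stay under a "N requests per window" limit is a sliding-window limiter. The sketch below is a swapped-in illustration of that technique, not the analyzer's own delay logic, which is only described as "built-in delays"; the class name and the injectable clock and sleep parameters are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()

    def acquire(self) -> float:
        """Block until a request is allowed; return the seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._stamps and now - self._stamps[0] >= self.window_seconds:
                self._stamps.popleft()
            if len(self._stamps) < self.max_requests:
                self._stamps.append(now)
                return waited
            # Wait until the oldest request leaves the window.
            delay = self.window_seconds - (now - self._stamps[0])
            self._sleep(delay)
            waited += delay


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3000, window_seconds=300)
    limiter.acquire()  # call once before each API request
```

A run at the defaults (100 accounts, one profile request plus one feed request each) stays far below 3000 requests, so a limiter like this would mostly matter when several runs are batched back to back.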

Categorization Algorithm

Script's role:
  1. Fetch account data (bio + posts)
  2. Extract keywords to compress context
  3. Output bio + keywords in simple format
Claude's role:
  1. Read bio + keywords for each account
  2. Intelligently categorize by topic (no hardcoded rules)
  3. Group accounts, identify patterns, answer user questions
This agentic pattern is more flexible than hardcoded keyword matching.
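The first step of the script's role, fetching account data, could look like the following for the --following mode. This is a sketch under assumptions: it uses the public app.bsky.graph.getFollows XRPC endpoint on public.api.bsky.app with cursor pagination, which may not be how bluesky_analyzer.py fetches data, and collect_following and http_get_follows are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Optional

# Assumption: unauthenticated reads via the public AppView host.
API = "https://public.api.bsky.app/xrpc/app.bsky.graph.getFollows"


def http_get_follows(actor: str, cursor: Optional[str], limit: int) -> Dict:
    """Fetch one page of accounts that actor follows."""
    params = {"actor": actor, "limit": str(limit)}
    if cursor:
        params["cursor"] = cursor
    url = API + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_following(
    actor: str,
    max_accounts: int = 100,
    fetch_page: Callable[[str, Optional[str], int], Dict] = http_get_follows,
) -> List[Dict[str, str]]:
    """Follow cursors until max_accounts profiles are collected."""
    accounts: List[Dict[str, str]] = []
    cursor: Optional[str] = None
    while len(accounts) < max_accounts:
        page = fetch_page(actor, cursor, min(100, max_accounts - len(accounts)))
        follows = page.get("follows", [])
        for profile in follows:
            accounts.append({
                "handle": profile["handle"],
                "name": profile.get("displayName") or "",
                "bio": profile.get("description") or "",
            })
        cursor = page.get("cursor")
        if not cursor or not follows:
            break  # no more pages
    return accounts[:max_accounts]
```

Passing fetch_page as a parameter keeps the pagination logic testable without network access; the posts for each account would need a separate request per handle.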

Troubleshooting

"No accounts to analyze"
  • Verify handle format (include domain: handle.bsky.social)
  • Check if account exists and has public following/followers
"Insufficient content for keyword extraction"
  • Account has few posts (<5)
  • Posts are very short
  • Try increasing the --posts parameter
Rate limit errors
  • Reduce the --accounts parameter
  • Add delays between batches
  • Check Bluesky API status
Import errors
  • Verify extracting-keywords skill is available
  • Check YAKE venv exists: /home/claude/yake-venv/bin/python -c "import yake"
  • Verify Python 3.8+: python3 --version

Integration with Other Skills

Built-in integration:
  • extracting-keywords: Automatically delegates keyword extraction to this skill's optimized YAKE venv with domain-specific stopwords

Example Sessions

User: "Can you analyze the accounts I follow on Bluesky and tell me what topics they focus on?"
Claude:
bash
python scripts/bluesky_analyzer.py --following user-handle.bsky.social --stopwords ai
Based on the output, I can see you follow:
  • AI/ML researchers (15 accounts): Focus on LLM safety, alignment, interpretability
  • Software engineers (20 accounts): Web development, React, TypeScript, DevOps
  • Writers (8 accounts): Tech journalism, newsletters, long-form content
  • Scientists (7 accounts): Climate science, biotech, physics
User: "Find ML researchers in @alice's network"
Claude:
bash
python scripts/bluesky_analyzer.py --following alice.bsky.social --stopwords ai
I found 23 ML researchers in Alice's network:
  • 8 working on LLM alignment and safety
  • 6 focused on model evaluation and benchmarks
  • 5 in ML infrastructure and MLOps
  • 4 in computer vision and multimodal models
User: "Here's a list of 30 accounts, categorize them"
Claude:
bash
python scripts/bluesky_analyzer.py --file accounts.txt --stopwords ai
Categorized into:
  • Climate Tech (8 accounts)
  • Biotech (6 accounts)
  • Fintech (5 accounts)
  • AI/ML (7 accounts)
  • Other (4 accounts)