x-scraper

X.com Post Scraper

Extracts recent posts from X.com users with full engagement data using authenticated cookies.

Quick Start

Basic command:
```bash
cd .opencode/skills/x-scraper/scripts
python3 scraper.py <username> [count]
```
Example:
```bash
python3 scraper.py example_user 15
```
Output: /tmp/x_{username}_posts.json

Prerequisites

Before first use, verify the environment requirements:
  1. Python 3.11+: check with `python3 --version`
  2. Playwright: check with `python3 -c "import playwright"`
  3. Cookie file: check with `ls /tmp/x_cookies_pw.json`
If any prerequisite is missing, see references/setup.md for a detailed installation and configuration guide.
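
The three checks above can also be run in one step from Python. The following is a minimal sketch, not part of the skill itself: the function name `check_prerequisites` and the result keys are hypothetical, and only the version requirement, the `playwright` package name, and the default cookie path come from this document.

```python
import importlib.util
import sys
from pathlib import Path

# Default cookie path as documented in this README.
DEFAULT_COOKIE_FILE = "/tmp/x_cookies_pw.json"


def check_prerequisites(cookie_file: str = DEFAULT_COOKIE_FILE) -> dict[str, bool]:
    """Return a pass/fail flag for each documented prerequisite (hypothetical helper)."""
    return {
        "python_3_11_plus": sys.version_info >= (3, 11),
        # find_spec returns None when the package cannot be imported.
        "playwright": importlib.util.find_spec("playwright") is not None,
        "cookie_file": Path(cookie_file).is_file(),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

A `MISSING` line only indicates which section of references/setup.md to consult; the sketch does not install or configure anything.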

Common Workflows

First-time setup

See references/setup.md for complete environment configuration.

Daily scraping

```bash
python3 scraper.py <username> [count]
```

Custom cookie file

```bash
python3 scraper.py <username> [count] --cookie-file /path/to/cookies.json
```

Troubleshooting

If the scraper fails, see references/troubleshooting.md for common issues and solutions.

Output Format

```json
{
  "index": 1,
  "username": "example_user",
  "postId": "1234567890123456789",
  "publishTime": "2025-12-03T18:28:32.000Z",
  "postLink": "https://x.com/example_user/status/1234567890123456789",
  "textContent": "Post text content...",
  "views": "471K",
  "likes": "1.1K",
  "retweets": "153",
  "replies": "44"
}
```
Key fields:
  • postLink - Direct URL to the post
  • publishTime - ISO 8601 timestamp
  • views/likes/retweets/replies - Abbreviated metric strings (K = thousand, M = million)
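
Because the engagement metrics are abbreviated strings rather than numbers, they need converting before they can be sorted or summed. The sketch below shows one way to do that and to parse `publishTime`. The helper names `parse_metric` and `parse_publish_time` are hypothetical, and treating an empty string as 0 is an assumption, not documented scraper behavior.

```python
from datetime import datetime
from decimal import Decimal

# Multipliers for the K/M suffixes described under "Key fields".
_SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_metric(text: str) -> int:
    """Convert a metric string such as "471K" or "153" to an integer."""
    text = text.strip().replace(",", "")
    if not text:
        return 0  # assumption: a missing metric counts as zero
    suffix = text[-1].upper()
    if suffix in _SUFFIXES:
        # Decimal avoids float rounding artifacts for values like "1.1K".
        return int(Decimal(text[:-1]) * _SUFFIXES[suffix])
    return int(text)


def parse_publish_time(text: str) -> datetime:
    """Parse the ISO 8601 timestamp into a timezone-aware datetime."""
    # Replacing the trailing "Z" keeps this working on Python versions
    # whose fromisoformat() does not accept it directly.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))
```

Abbreviated values are already rounded by the site, so the converted numbers are approximations of the real counts.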

When to Use This Skill

Trigger when the user makes a request such as:
  • "整理 @某人 最近的发言" ("Summarize @someone's recent posts")
  • "看看某人在X上说了什么" ("See what someone has been saying on X")
  • "Scrape X.com posts from @username"
  • "Get latest tweets from user"
  • "Analyze X user's recent posts"

Available Scripts

scraper.py - Main scraper

```bash
python3 scraper.py <username> [count] [--cookie-file <path>]
```
  • Scrapes the user's timeline, including replies
  • Default count: 10 posts
  • Default cookie file: /tmp/x_cookies_pw.json
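
To make the argument order and defaults concrete, the command line above can be modeled with `argparse`. This is an illustrative sketch of the documented interface only: it is not taken from scraper.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the documented CLI: <username> [count] [--cookie-file <path>]."""
    parser = argparse.ArgumentParser(prog="scraper.py")
    parser.add_argument("username", help="X.com username to scrape")
    # nargs="?" makes the positional optional, with the documented default of 10.
    parser.add_argument("count", nargs="?", type=int, default=10,
                        help="number of posts to fetch")
    parser.add_argument("--cookie-file", default="/tmp/x_cookies_pw.json",
                        help="path to the Playwright-format cookie file")
    return parser
```

For example, `build_parser().parse_args(["example_user", "15"])` yields a count of 15 with the default cookie file.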

convert_cookies.py - Cookie converter

```bash
python3 convert_cookies.py <input-file> [output-file]
```
  • Converts Cookie-Editor JSON to Playwright format
  • Must be run before the first scrape
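
The conversion is mostly a matter of renaming fields and normalizing `sameSite`. The sketch below illustrates the idea and is not the actual convert_cookies.py. It assumes the browser export uses `expirationDate` and lowercase `sameSite` values such as `no_restriction`, and it falls back to `"Lax"` for unknown values. The function name `to_playwright_cookie` is hypothetical.

```python
from typing import Any

# Playwright accepts only "Strict", "Lax", or "None" for sameSite.
_SAME_SITE = {"strict": "Strict", "lax": "Lax", "no_restriction": "None"}


def to_playwright_cookie(cookie: dict[str, Any]) -> dict[str, Any]:
    """Map one exported cookie to the shape Playwright's add_cookies() expects."""
    raw_same_site = str(cookie.get("sameSite") or "").lower()
    converted: dict[str, Any] = {
        "name": cookie["name"],
        "value": cookie["value"],
        "domain": cookie["domain"],
        "path": cookie.get("path", "/"),
        "httpOnly": bool(cookie.get("httpOnly", False)),
        "secure": bool(cookie.get("secure", False)),
        # Assumption: unknown or missing values fall back to "Lax".
        "sameSite": _SAME_SITE.get(raw_same_site, "Lax"),
    }
    # Session cookies carry no expiry; only copy it when present.
    if cookie.get("expirationDate") is not None:
        converted["expires"] = float(cookie["expirationDate"])
    return converted
```

Applying this to every entry of the exported list and writing the result with `json.dump` would produce a file in the spirit of `/tmp/x_cookies_pw.json`, though the real script may handle more fields.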

Reference Documents

  • setup.md - Complete environment setup guide (Python, Playwright, cookies)
  • troubleshooting.md - Error diagnosis and solutions
  • usage.md - Detailed usage examples and advanced options

Limitations

  • Requires X.com authentication cookies
  • Cookies expire after roughly 7 days and must then be re-exported
  • Rate limits may apply
  • Cannot access private/protected accounts
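
Since expired cookies are a likely cause of failed runs, it can help to check the cookie file before scraping. The sketch below assumes the file is a flat JSON list of Playwright-style cookie objects with an `expires` field in Unix seconds (non-positive for session cookies). The actual file layout is not confirmed by this document, and both helper names are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Any, Optional


def load_cookies(path: str = "/tmp/x_cookies_pw.json") -> list[dict[str, Any]]:
    """Load the cookie file (assumed to be a JSON list of cookie objects)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def expired_cookie_names(cookies: list[dict[str, Any]],
                         now: Optional[float] = None) -> list[str]:
    """Return the names of cookies whose expiry time has already passed."""
    now = time.time() if now is None else now
    # A non-positive or missing "expires" marks a session cookie, which is skipped.
    return [c["name"] for c in cookies if 0 < c.get("expires", -1) <= now]
```

If `expired_cookie_names(load_cookies())` returns any names, re-exporting the cookies as described in references/setup.md is the likely fix.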