# crawl-xueqiu-user-timeline
Crawl a Xueqiu user's post timeline and save it as a Markdown file.
## Prerequisites

Base directory for this skill: `{base_dir}`

Replace `{base_dir}` with the actual path when calling, for example:
`/home/cnife/code/try-agent-browser-automation/.agents/skills/crawl-xueqiu-user-timeline`

Ensure Chrome is running in debug mode and agent-browser is installed:

```bash
sh {base_dir}/scripts/check-cdp.sh
sh {base_dir}/scripts/check-agent-browser.sh
```
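The `check-cdp.sh` script verifies that Chrome's debug endpoint is reachable. An equivalent check can be sketched in Python; this is an illustrative sketch assuming the default CDP port 9222, not the actual contents of the script:

```python
import json
import urllib.request

def cdp_available(host="127.0.0.1", port=9222, timeout=2.0):
    """Return Chrome's CDP version info if the debug endpoint responds, else None.

    Chrome started with --remote-debugging-port exposes /json/version over HTTP.
    """
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except OSError:
        # Connection refused, timeout, etc. -> Chrome is not in debug mode
        return None
```

If this returns `None`, restart Chrome with remote debugging enabled before running the crawler.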
## Usage

Run the crawling script directly:

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py <Xueqiu user homepage URL> [options]
```
## Parameters

| Parameter | Description | Default |
|---|---|---|
| `<url>` | Xueqiu user homepage URL | Required |
| `--days` | Crawl posts from the last N days | 3 |
| `--start-date` | Start date (YYYY-MM-DD) | 3 days ago |
| `--end-date` | End date (YYYY-MM-DD) | Today |
| `-o` | Output file name | Auto-generated |

Note: `--days` and `--start-date` are mutually exclusive and cannot be used together.
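The parameter rules above (mutual exclusion of `--days` and `--start-date`, and the date-range defaults) can be sketched with `argparse`. This is a hypothetical model of the semantics, not the script's actual source:

```python
import argparse
from datetime import date, timedelta

def parse_args(argv):
    """Illustrative parser matching the parameter table above (assumed behavior)."""
    p = argparse.ArgumentParser()
    p.add_argument("url", help="Xueqiu user homepage URL")
    # --days and --start-date cannot be combined
    g = p.add_mutually_exclusive_group()
    g.add_argument("--days", type=int, default=None, help="crawl the last N days")
    g.add_argument("--start-date", default=None, help="start date, YYYY-MM-DD")
    p.add_argument("--end-date", default=None, help="end date, YYYY-MM-DD")
    p.add_argument("-o", "--output", default=None, help="output file name")
    args = p.parse_args(argv)

    # Defaults: end today, start 3 days earlier unless overridden.
    end = date.fromisoformat(args.end_date) if args.end_date else date.today()
    if args.start_date:
        start = date.fromisoformat(args.start_date)
    else:
        start = end - timedelta(days=args.days if args.days is not None else 3)
    return args.url, start, end, args.output
```

With this model, `--days 7` yields a seven-day window ending today, while an explicit `--start-date`/`--end-date` pair pins both boundaries.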
## Examples

Crawl the last 3 days (default):

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686
```

Crawl the last 7 days:

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686 --days 7
```

Crawl the last 30 days:

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686 --days 30
```

Specify a date range:

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686 --start-date 2026-01-01 --end-date 2026-03-05
```

Specify the output file name:

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686 -o my_timeline.md
```
## Output Format

The generated Markdown file includes:

- Basic user information (UID, followers, following, post count)
- Post records sorted chronologically
- For each post: publication time, content, quoted content (if any), interaction data, and a link to the original post
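As an illustration of this layout, one post record might be rendered as in the sketch below. The field names (`created_at`, `text`, `quote`, etc.) are assumptions for the example, not the script's actual data model:

```python
def format_post(post):
    """Render one post record as Markdown (illustrative layout only)."""
    lines = [f"## {post['created_at']}", "", post["text"]]
    if post.get("quote"):
        # Quoted content, when present, is shown as a blockquote
        lines += ["", "> " + post["quote"]]
    lines += [
        "",
        f"Likes: {post['likes']} | Comments: {post['comments']} | Reposts: {post['reposts']}",
        f"[Original post]({post['url']})",
    ]
    return "\n".join(lines)
```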
## Notes

- You must be logged in to your Xueqiu account first
- If a verification page appears, complete the verification manually and run the script again
- Pagination and the md5__1038 token are handled automatically during crawling
- The output file is saved in the current working directory

## Need further analysis?

After crawling completes, ask the user: would you like a summary and analysis of this Xueqiu user's posts?