crawl-xueqiu-user-timeline


Crawl a Xueqiu user's post timeline and save it as a Markdown file.

Prerequisites

Base directory for this skill: {base_dir}

Replace {base_dir} with the actual path when calling, for example:
/home/cnife/code/try-agent-browser-automation/.agents/skills/crawl-xueqiu-user-timeline

Ensure Chrome is running in debug mode and agent-browser is installed:

```bash
sh {base_dir}/scripts/check-cdp.sh
sh {base_dir}/scripts/check-agent-browser.sh
```
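The check scripts themselves are not shown here; as a rough illustration, a CDP availability check like `check-cdp.sh` typically probes Chrome's DevTools endpoint. A minimal Python sketch (assuming the default debug port 9222; this is not the script's actual code):

```python
import json
import urllib.request

def cdp_available(host="localhost", port=9222, timeout=2.0):
    """Return Chrome's version string if the CDP endpoint responds, else None.

    Chrome started with --remote-debugging-port exposes /json/version,
    which serves a small JSON document describing the browser.
    """
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
        return info.get("Browser")
    except OSError:
        # Connection refused or timed out: Chrome is not in debug mode.
        return None
```

If this returns `None`, start Chrome with `--remote-debugging-port=9222` before running the crawler.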

Usage

Run the crawling script directly:

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py <Xueqiu user homepage URL> [options]
```

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| `url` | Xueqiu user homepage URL | Required |
| `--days` | Crawl posts from the last N days | 3 |
| `--start-date` | Start date (YYYY-MM-DD) | 3 days ago |
| `--end-date` | End date (YYYY-MM-DD) | Today |
| `-o, --output` | Output file name | Auto-generated |

Note: `--days` and `--start-date` are mutually exclusive and cannot be used together.
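The mutual exclusion of `--days` and `--start-date` is the kind of constraint `argparse` expresses directly. A sketch of how the options above might be declared (option names match the table; everything else is an assumption, not the script's actual code):

```python
import argparse

def build_parser():
    """Declare the CLI described in the parameter table above."""
    p = argparse.ArgumentParser(
        description="Crawl a Xueqiu user's post timeline into Markdown"
    )
    p.add_argument("url", help="Xueqiu user homepage URL")
    # --days and --start-date cannot be combined; the default of 3 days
    # applies only when neither is given explicitly.
    g = p.add_mutually_exclusive_group()
    g.add_argument("--days", type=int, default=3,
                   help="crawl posts from the last N days (default: 3)")
    g.add_argument("--start-date", help="start date (YYYY-MM-DD)")
    p.add_argument("--end-date", help="end date (YYYY-MM-DD), defaults to today")
    p.add_argument("-o", "--output",
                   help="output file name (auto-generated if omitted)")
    return p
```

Passing both conflicting options makes the parser exit with a usage error, which matches the note above.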

Examples


Crawl the last 3 days (default)

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686
```

Crawl the last 7 days

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686 --days 7
```

Crawl the last 30 days

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686 --days 30
```

Specify a date range

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686 --start-date 2026-01-01 --end-date 2026-03-05
```
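The defaults in the table (end date today, start date 3 days earlier) can be combined with explicit dates into one concrete window. A sketch of how that resolution might work (the function name and logic are assumptions, not the script's actual code):

```python
from datetime import date, timedelta

def resolve_window(days=3, start_date=None, end_date=None):
    """Turn --days / --start-date / --end-date into a concrete (start, end) pair.

    An explicit --start-date wins over --days; --end-date defaults to today.
    """
    end = date.fromisoformat(end_date) if end_date else date.today()
    start = date.fromisoformat(start_date) if start_date else end - timedelta(days=days)
    return start, end
```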

Specify the output file name

```bash
{base_dir}/scripts/crawl_xueqiu_user_timeline_api.py https://xueqiu.com/u/9493911686 -o my_timeline.md
```

Output Format

The generated Markdown file includes:
  • Basic user information (UID, followers, following, post count)
  • Post records sorted chronologically
  • For each post: publication time, content, quoted content (if any), interaction counts, and a link to the original post
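The per-post structure above maps naturally to a small rendering helper. A sketch of what emitting one post record as Markdown might look like (the dict field names are illustrative assumptions, not the script's actual schema):

```python
def post_to_markdown(post):
    """Render one post record (a dict; field names are illustrative) as Markdown."""
    lines = [f"## {post['created_at']}", "", post["text"]]
    if post.get("quote"):
        # Quoted content, when present, is rendered as a blockquote.
        lines += ["", "> " + post["quote"]]
    lines += [
        "",
        f"Likes: {post['likes']} | Comments: {post['comments']} | Reposts: {post['reposts']}",
        f"[Original post]({post['url']})",
    ]
    return "\n".join(lines)
```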

Notes

  1. You must be logged in to a Xueqiu account first
  2. If you hit a verification page, complete the verification manually and run the script again
  3. Pagination and the md5__1038 token are handled automatically during crawling
  4. The output file is saved in the current working directory
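The pagination mentioned in note 3 can be pictured as a loop that keeps fetching pages until it runs past the requested window. A simplified sketch (the page-fetching callable, field names, and stop condition are assumptions; the md5__1038 token handling is not modeled here):

```python
def crawl_timeline(fetch_page, stop_before):
    """Collect posts page by page until one older than `stop_before` appears.

    `fetch_page(page)` is expected to return a list of post dicts, each with
    a comparable 'created_at' value; an empty list signals the last page.
    """
    posts = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break  # no more pages
        for post in batch:
            if post["created_at"] < stop_before:
                # Posts arrive newest-first, so the first too-old post
                # means the rest of the timeline is out of range.
                return posts
            posts.append(post)
        page += 1
    return posts
```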

Need further analysis?

After crawling completes, ask the user whether they would like the Xueqiu user's posts summarized and analyzed.