tiktok-collection-scraper
TikTok Collection Scraper
Batch extract TikTok user collection folders and their video links, including play counts, likes, comments, and shares. Zero external API, no paid service, just `curl_cffi`.
Features
- 🔓 No login required: works without cookies for public collections (roughly 80% of videos)
- 🔑 Full access with cookie: gets all collections, including private ones (100%)
- 🚀 Zero external API: only needs `curl_cffi`, no TikHub/RapidAPI/paid services
- 📥 7 input formats: username, @username, profile URL, video URL, short link, user_id, secUid
- 📊 Rich metadata: plays, likes, comments, shares per video
- ⚡ Fast: 50 collections + 300 videos in ~40 seconds
Prerequisites
Ensure `curl_cffi` is installed:
```bash
pip install curl_cffi
```
Quick Start
Run the bundled script. All paths below are relative to this skill's directory.
```bash
# Guest mode (public collections, no cookie needed)
python3 scripts/scrape_collections.py <target> -o /tmp/result.json

# Login mode (all collections, 100% coverage)
python3 scripts/scrape_collections.py <target> --cookie /path/to/cookie.txt -o /tmp/result.json
```
Supported Input Formats
| Format | Example |
|---|---|
| Username | |
| @Username | |
| Profile URL | |
| Video URL | |
| Short link | |
| User ID | |
| secUid | |
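As a rough illustration of how the seven formats can be told apart, here is a minimal sketch. It is not the bundled script's logic: `classify_target`, the short-link hosts, and the `MS4wLjAB` prefix test (taken from the sample output) are all assumptions made for the example.

```python
import re
from urllib.parse import urlparse

# Assumed short-link hosts; the bundled script may accept others.
SHORT_LINK_HOSTS = {"vm.tiktok.com", "vt.tiktok.com"}


def classify_target(target: str) -> str:
    """Return a label for the input format of `target` (illustrative heuristics)."""
    target = target.strip()
    if target.startswith(("http://", "https://")):
        parsed = urlparse(target)
        if parsed.netloc.lower() in SHORT_LINK_HOSTS:
            return "short_link"
        if re.search(r"/@[^/]+/video/\d+", parsed.path):
            return "video_url"
        if re.match(r"^/@[^/]+/?$", parsed.path):
            return "profile_url"
        return "unknown_url"
    if target.startswith("@"):
        return "at_username"
    # Caveat: an all-digit username would also be treated as a user ID here.
    if target.isdigit():
        return "user_id"
    # Assumption: secUid values start with "MS4wLjAB", as in the sample output.
    if target.startswith("MS4wLjAB"):
        return "sec_uid"
    return "username"
```

For example, `classify_target("@chengfeng_yulin")` yields `"at_username"`. A real resolver would still need to follow redirects and parse the page to reach the `secUid`.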
Output Format
JSON with structure:
```json
{
  "target": "chengfeng_yulin",
  "secUid": "MS4wLjAB...",
  "uid": "68118...",
  "uniqueId": "chengfeng_yulin",
  "mode": "guest",
  "totalCollections": 50,
  "totalVideos": 308,
  "elapsedSeconds": 40.0,
  "collections": [
    {
      "collectionId": "760379...",
      "name": "Collection name",
      "expected": 3,
      "actual": 3,
      "items": [
        {
          "id": "760251...",
          "url": "https://www.tiktok.com/@author/video/760251...",
          "desc": "Video description...",
          "author": "author_username",
          "plays": 2000000,
          "likes": 25100,
          "comments": 632,
          "shares": 85000
        }
      ]
    }
  ]
}
```
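To show how this structure might be consumed, the sketch below rolls each collection up into a summary row. The field names come from the structure above; the `summarize` helper and the sample values are made up for illustration.

```python
import json

# Tiny stand-in for the script's output; the values are invented for the example.
SAMPLE = """
{
  "mode": "guest",
  "collections": [
    {"collectionId": "c1", "name": "Recipes", "expected": 3, "actual": 2,
     "items": [{"id": "v1", "plays": 1200, "likes": 90, "comments": 4, "shares": 7},
               {"id": "v2", "plays": 800, "likes": 10, "comments": 1, "shares": 0}]},
    {"collectionId": "c2", "name": "Travel", "expected": 1, "actual": 0, "items": []}
  ]
}
"""


def summarize(result: dict) -> list[dict]:
    """Return one row per collection: video count, total plays, missing videos."""
    rows = []
    for col in result.get("collections", []):
        items = col.get("items", [])
        rows.append({
            "name": col.get("name", ""),
            "videos": len(items),
            "total_plays": sum(item.get("plays", 0) for item in items),
            # Videos the folder reports but the scrape did not return.
            "missing": max(col.get("expected", 0) - col.get("actual", 0), 0),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(json.loads(SAMPLE)):
        print(row)
```

To run it against a real result, load the file written with `-o` (for instance `/tmp/result.json`) with `json.load` instead of parsing `SAMPLE`.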
Cookie
- Not needed for public collections (status=3, typically ~50% of folders, ~80% of videos)
- Needed for private collections (status=1) — must be the target account's own login cookie
- Cookie format: raw cookie string from browser (semicolon-separated key=value pairs)
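The cookie format described above can be illustrated with a small parser. This is only a sketch of the expected input shape, not the script's own parsing; `parse_cookie_string` and the cookie names in the usage note are hypothetical.

```python
def parse_cookie_string(raw: str) -> dict[str, str]:
    """Split a browser-style cookie string into a name -> value mapping."""
    cookies: dict[str, str] = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty segments and malformed pairs
        name, value = part.split("=", 1)  # values may themselves contain "="
        cookies[name.strip()] = value.strip()
    return cookies
```

For example, `parse_cookie_string("name_a=abc; name_b=x=y")` returns `{"name_a": "abc", "name_b": "x=y"}`. Splitting on the first `=` only keeps values that contain `=` intact.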
How It Works
Uses TikTok's internal web APIs with `curl_cffi` for Chrome TLS fingerprint impersonation:
- Resolve user: any input format → `secUid` (via TikTok's own redirects and page parsing)
- Fetch collections: `GET /api/user/collection_list/` (no auth needed)
- Fetch videos: `GET /api/collection/item_list/` with `sourceType=113` (the undocumented key parameter)
`sourceType=113` is an undocumented parameter discovered through browser request interception. Without it, the API returns success with empty results.
See `references/api-notes.md` for full API documentation.
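The two API calls above could be sketched roughly as follows. Only the endpoint paths and `sourceType=113` come from this document. The host, the other query parameter names (`secUid`, `collectionId`, `cursor`, `count`), and the helper functions are assumptions, so check `references/api-notes.md` for the real ones. The `impersonate="chrome"` alias may need a pinned name such as `chrome110` on older `curl_cffi` releases.

```python
from typing import Optional

# Assumption: the API is served from the same host as the video URLs.
BASE_URL = "https://www.tiktok.com"


def collection_list_request(sec_uid: str, cursor: int = 0, count: int = 30):
    """Build (url, params) for the collection-list call.

    Assumption: `secUid`, `cursor`, and `count` are guessed parameter names.
    """
    params = {"secUid": sec_uid, "cursor": cursor, "count": count}
    return f"{BASE_URL}/api/user/collection_list/", params


def item_list_request(collection_id: str, cursor: int = 0, count: int = 30):
    """Build (url, params) for the item-list call.

    `sourceType=113` comes from the text above; the other names are guesses.
    """
    params = {
        "collectionId": collection_id,
        "cursor": cursor,
        "count": count,
        "sourceType": 113,  # without it the API reportedly returns empty results
    }
    return f"{BASE_URL}/api/collection/item_list/", params


def fetch_json(url: str, params: dict, cookie: Optional[str] = None) -> dict:
    """Send one GET with a Chrome TLS fingerprint and return the decoded JSON."""
    # Imported lazily so the request builders above work without curl_cffi.
    from curl_cffi import requests

    headers = {"Cookie": cookie} if cookie else {}
    response = requests.get(
        url, params=params, headers=headers, impersonate="chrome", timeout=30
    )
    response.raise_for_status()
    return response.json()
```

Keeping the request builders separate from `fetch_json` makes the parameters easy to inspect or log without touching the network.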
Error Handling
- ⚠️ = fewer videos than expected (likely deleted videos)
- ❌ = zero videos returned (video removed or API issue)
- Script retries failed requests up to 3 times with 5s backoff
- Progress is printed to stderr; JSON output goes to stdout (or file with `-o`)
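The retry and status rules above might look something like the sketch below. It reads "up to 3 times" as three attempts in total with a fixed 5-second pause, which is an assumption; the script may count or space its retries differently. `with_retries` and `status_mark` are hypothetical helper names.

```python
import sys
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on any exception with a fixed pause between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Progress goes to stderr so stdout stays clean for JSON output.
            print(f"attempt {attempt} failed ({exc}); retrying in {delay}s",
                  file=sys.stderr)
            sleep(delay)
    raise RuntimeError("attempts must be at least 1")


def status_mark(expected: int, actual: int) -> str:
    """Map a collection's expected/actual counts to the marks described above."""
    if actual >= expected:
        return ""
    if actual == 0:
        return "❌"
    return "⚠️"
```

The `sleep` argument is injectable so the pause can be replaced in tests; in normal use the default `time.sleep` applies.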