tiktok-collection-scraper


TikTok Collection Scraper

Batch extract TikTok user collection folders and their video links, including play counts, likes, comments, and shares. Zero external API, no paid service, just curl_cffi.

Features

  • 🔓 No login required: works without cookies for public collections (~80% of videos)
  • 🔑 Full access with cookie: get all collections including private ones (100% coverage)
  • 🚀 Zero external API: only needs curl_cffi, no TikHub/RapidAPI/paid services
  • 📥 7 input formats: username, @username, profile URL, video URL, short link, user_id, secUid
  • 📊 Rich metadata: plays, likes, comments, shares per video
  • Fast: 50 collections + 300 videos in ~40 seconds

Prerequisites

Ensure curl_cffi is installed:

```bash
pip install curl_cffi
```

Quick Start

Run the bundled script. All paths below are relative to this skill's directory.

```bash
# Guest mode (public collections, no cookie needed)
python3 scripts/scrape_collections.py <target> -o /tmp/result.json

# Login mode (all collections, 100% coverage)
python3 scripts/scrape_collections.py <target> --cookie /path/to/cookie.txt -o /tmp/result.json
```

Supported Input Formats

| Format | Example |
| --- | --- |
| Username | chengfeng_yulin |
| @Username | @chengfeng_yulin |
| Profile URL | https://www.tiktok.com/@chengfeng_yulin |
| Video URL | https://www.tiktok.com/@user/video/7602514407133941000 |
| Short link | https://vm.tiktok.com/ZMkVKQxsb/ |
| User ID | 6811802142106764293 |
| secUid | MS4wLjABAAAA... |
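
To illustrate how the seven formats in the table could be told apart before resolving a user, here is a minimal sketch. The function `classify_target` and its return labels are hypothetical and are not taken from the bundled script. The `MS4wLjAB` prefix check for secUid is inferred only from the examples in this document, and treating any all-digit string as a user ID is an assumption.

```python
import re

def classify_target(target: str) -> str:
    """Guess which of the seven input formats a target string uses.

    Hypothetical helper for illustration; the bundled script may
    detect formats differently.
    """
    t = target.strip()
    # Short links use the vm.tiktok.com host shown in the table.
    if re.match(r"^https?://vm\.tiktok\.com/", t):
        return "short_link"
    # Video URLs contain /@<user>/video/<numeric id>.
    if re.match(r"^https?://(www\.)?tiktok\.com/@[^/?#]+/video/\d+", t):
        return "video_url"
    # Profile URLs end right after /@<user>.
    if re.match(r"^https?://(www\.)?tiktok\.com/@[^/?#]+/?(\?.*)?$", t):
        return "profile_url"
    # Assumption: secUid values start with this prefix, as in the examples.
    if t.startswith("MS4wLjAB"):
        return "sec_uid"
    # Assumption: an all-digit string is a numeric user ID, not a username.
    if t.isdigit():
        return "user_id"
    if t.startswith("@"):
        return "at_username"
    return "username"

if __name__ == "__main__":
    for example in ["chengfeng_yulin", "@chengfeng_yulin", "6811802142106764293"]:
        print(example, "->", classify_target(example))
```

The URL checks run first because a URL would otherwise fall through to the plain-username case.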

Output Format

JSON with the following structure:

```json
{
  "target": "chengfeng_yulin",
  "secUid": "MS4wLjAB...",
  "uid": "68118...",
  "uniqueId": "chengfeng_yulin",
  "mode": "guest",
  "totalCollections": 50,
  "totalVideos": 308,
  "elapsedSeconds": 40.0,
  "collections": [
    {
      "collectionId": "760379...",
      "name": "Collection name",
      "expected": 3,
      "actual": 3,
      "items": [
        {
          "id": "760251...",
          "url": "https://www.tiktok.com/@author/video/760251...",
          "desc": "Video description...",
          "author": "author_username",
          "plays": 2000000,
          "likes": 25100,
          "comments": 632,
          "shares": 85000
        }
      ]
    }
  ]
}
```
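
As a rough illustration of consuming this output, the sketch below loads a result file and reports which collections returned fewer videos than expected and which video has the most plays. The helper names `load_result` and `summarize` are hypothetical. The field names come from the structure above, and the sample data is made up for the demonstration.

```python
import json

def load_result(path: str) -> dict:
    """Read a result file written with -o (for example /tmp/result.json)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def summarize(result: dict) -> dict:
    """Summarize a scrape result using only fields shown in the structure above."""
    collections = result["collections"]
    # Collections where fewer videos came back than the folder reports.
    incomplete = [c["name"] for c in collections if c["actual"] < c["expected"]]
    videos = [item for c in collections for item in c["items"]]
    top = max(videos, key=lambda v: v["plays"], default=None)
    return {
        "collections": len(collections),
        "videos": len(videos),
        "incomplete": incomplete,
        "top_video_url": top["url"] if top else None,
    }

# Made-up sample data in the documented shape, trimmed to the fields used here.
sample = {
    "collections": [
        {"name": "Recipes", "expected": 3, "actual": 2, "items": [
            {"url": "https://www.tiktok.com/@author/video/1", "plays": 100},
            {"url": "https://www.tiktok.com/@author/video/2", "plays": 500},
        ]},
        {"name": "Travel", "expected": 1, "actual": 1, "items": [
            {"url": "https://www.tiktok.com/@author/video/3", "plays": 50},
        ]},
    ]
}

if __name__ == "__main__":
    print(summarize(sample))
```

For a real run, `summarize(load_result("/tmp/result.json"))` would apply the same summary to the file produced by the script.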

Cookie

  • Not needed for public collections (status=3, typically ~50% of folders, ~80% of videos)
  • Needed for private collections (status=1) — must be the target account's own login cookie
  • Cookie format: raw cookie string from browser (semicolon-separated key=value pairs)
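
The cookie format described above (semicolon-separated key=value pairs) can be parsed along these lines. This is a minimal sketch: `parse_cookie_string` is a hypothetical helper, the cookie names in the example are placeholders, and the bundled script may simply pass the raw string through as a header instead.

```python
def parse_cookie_string(raw: str) -> dict[str, str]:
    """Split a raw browser cookie string into a name -> value mapping."""
    cookies: dict[str, str] = {}
    for part in raw.split(";"):
        part = part.strip()
        # Skip empty fragments and anything without a key=value shape.
        if not part or "=" not in part:
            continue
        # Split on the first "=" only, since values may contain "=" themselves.
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies

if __name__ == "__main__":
    # Placeholder names and values, not real TikTok cookies.
    print(parse_cookie_string("name_a=1; name_b=x=y; "))
```

A quick check like `len(parse_cookie_string(raw)) > 0` can catch an empty or mis-copied cookie file before starting a long scrape.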

How It Works

Uses TikTok's internal web APIs with curl_cffi for Chrome TLS fingerprint impersonation:
  1. Resolve user: any input format → secUid (via TikTok's own redirects and page parsing)
  2. Fetch collections: GET /api/user/collection_list/ (no auth needed)
  3. Fetch videos: GET /api/collection/item_list/ with sourceType=113 (the undocumented key parameter)

sourceType=113 is an undocumented parameter discovered through browser request interception. Without it, the API returns success with empty results.

See references/api-notes.md for full API documentation.

Error Handling

  • ⚠️ = fewer videos than expected (likely deleted videos)
  • = zero videos returned (video removed or API issue)
  • Script retries failed requests up to 3 times with 5s backoff
  • Progress is printed to stderr; JSON output goes to stdout (or to a file with -o)
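
The retry behavior described above could look roughly like the sketch below. `with_retries` is a hypothetical helper, not the script's own code. It assumes "up to 3 times" means three attempts in total with a fixed 5-second pause between them, and it writes progress to stderr so that stdout stays clean for JSON.

```python
import sys
import time

def with_retries(func, attempts: int = 3, backoff_seconds: float = 5.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts are used up.

    Assumption: 3 attempts in total with a fixed pause between them.
    The sleep argument is injectable so the logic can be tested quickly.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            # Progress goes to stderr, keeping stdout free for JSON output.
            print(f"attempt {attempt}/{attempts} failed: {exc}", file=sys.stderr)
            if attempt < attempts:
                sleep(backoff_seconds)
    raise last_error

if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to exercise the retry path.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("temporary failure")
        return "ok"

    # Pass a no-op sleep so the demonstration finishes immediately.
    print(with_retries(flaky, sleep=lambda seconds: None))
```

A fixed pause is the simplest reading of "5s backoff". An exponential schedule would be a different design and is not implied by the text.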