tikspyder

TikSpyder — Guided TikTok Data Collection

You are guiding a user (who may not be technical) through collecting TikTok data using TikSpyder, an open-source data collection tool. Your job is to set up the environment, configure credentials, gather search parameters conversationally, run the tool, and summarize results.

Throughout this process, communicate clearly and explain what you're doing at each step. If something fails, explain the error in plain language and suggest a fix.

Phase 1: Locate TikSpyder & Environment Check

Before anything else, find (or install) TikSpyder and verify the environment is ready. Work through these steps in order and track two key variables:

TIKSPYDER_DIR: the root of the TikSpyder source repo (contains `main.py`, `config/`, etc.)
SKILL_DIR: the directory where this skill file lives

Conda activation note

Throughout this skill, whenever you need to activate a conda environment, run:

bash

eval "$(conda shell.bash hook)" && conda activate tikspyder

This works on macOS, Linux, and Windows (Git Bash / miniforge). The `eval` line initializes conda for the current shell session so that `conda activate` works without requiring `conda init` to have modified the user's shell profile.

If this command fails, the user's conda installation likely needs repair; do not try to work around it with PATH manipulation. Tell the user what happened and suggest they run `conda init bash` and restart their terminal.

1.0 Check for cached environment (fast path)

After a successful run, the skill saves environment details to `.tikspyder-env.json` inside the skill directory. If this file exists, read it; it contains everything you need to skip most of Phase 1:

bash

for candidate in .claude/skills/tikspyder "$HOME/.claude/skills/tikspyder"; do
  if [ -f "$candidate/.tikspyder-env.json" ]; then
    cat "$candidate/.tikspyder-env.json"
    break
  fi
done

If the file exists and contains valid data, use the cached values directly:

Set SKILL_DIR, TIKSPYDER_DIR, env_type, and ffmpeg status from the cached data
Activate the environment using the cached `env_type`: if conda, use the standard conda activation command; if venv, source the activate script
Do a quick sanity check: verify that TIKSPYDER_DIR still exists and `tikspyder --help` works after activation
If the sanity check passes, skip to Phase 1.7 (show the summary, but keep it brief since the user has seen it before; just confirm and move on)
If the sanity check fails (e.g., the directory was moved or the env was deleted), delete the cache file and fall through to the full Phase 1

This saves significant time on repeat runs — no searching for environments, no conda env listing.

1.1 Find the skill directory

The skill directory contains this SKILL.md file. Search for it in standard locations:

bash

for candidate in .claude/skills/tikspyder "$HOME/.claude/skills/tikspyder"; do
  if [ -f "$candidate/SKILL.md" ]; then
    echo "SKILL_DIR=$candidate"
    break
  fi
done

Store this as SKILL_DIR for later use.

1.2 Check if tikspyder is already installed

Try these checks in order. Stop at the first one that works.

Step A — Check current environment:

bash

pip show tikspyder 2>&1

If found, extract `Location:` and verify `main.py` exists there. Set TIKSPYDER_DIR accordingly.
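
One way to pull the install path out of the `pip show` output is sketched below. The helper name `pip_install_location` is hypothetical. Preferring an `Editable project location:` line is an assumption about how recent pip versions report `pip install -e .` installs; when that line is absent the helper falls back to `Location:`.

```bash
# Hypothetical helper: reads `pip show` output on stdin and prints the install path.
# Assumption: recent pip versions add an "Editable project location:" line for
# editable installs; when present it points at the source checkout.
pip_install_location() {
  awk '
    /^Editable project location: / { sub(/^Editable project location: /, ""); editable = $0 }
    /^Location: /                  { sub(/^Location: /, ""); location = $0 }
    END { if (editable != "") print editable; else print location }
  '
}

# Usage (not run here):
#   TIKSPYDER_DIR="$(pip show tikspyder 2>/dev/null | pip_install_location)"
#   [ -f "$TIKSPYDER_DIR/main.py" ] && echo "TIKSPYDER_DIR=$TIKSPYDER_DIR"
```

The check for `main.py` still matters: a regular (non-editable) install reports a site-packages path, which may not be the repo root.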

Step B — Check for a conda/mamba environment named `tikspyder`:

bash

conda env list 2>/dev/null | grep tikspyder

If a `tikspyder` environment exists, activate it (using the conda pattern from above) and run `pip show tikspyder` again. If found, use that environment for all subsequent commands.

Step C — Check for an existing venv with tikspyder:

Look for a venv in common locations where the user may have cloned and installed tikspyder:

bash

for candidate in "$SKILL_DIR/tik-spyder" "$HOME/tik-spyder" "./tik-spyder"; do
  for venv_dir in ".venv" "venv"; do
    activate_script="$candidate/$venv_dir/bin/activate"
    # On Windows (Git Bash), the activate script is in Scripts/
    [ ! -f "$activate_script" ] && activate_script="$candidate/$venv_dir/Scripts/activate"
    if [ -f "$activate_script" ]; then
      echo "Found venv at $candidate/$venv_dir"
      break 2
    fi
  done
done

If found, activate the venv with `source "$activate_script"`, then run `pip show tikspyder`. If tikspyder is installed, set TIKSPYDER_DIR to that candidate directory and use this venv for all subsequent commands.

Step D — Check if repo is already cloned in the skill directory (without an environment):

bash

ls "$SKILL_DIR/tik-spyder/main.py" 2>/dev/null

If found, set `TIKSPYDER_DIR=$SKILL_DIR/tik-spyder`. The environment will be created in Phase 1.5.

1.3 Download and install if needed

If TikSpyder isn't found by any of the checks above, download the official repository from GitHub into the skill directory:

bash

cd "$SKILL_DIR"
git clone https://github.com/estebanpdl/tik-spyder.git

Then set `TIKSPYDER_DIR=$SKILL_DIR/tik-spyder`.

1.4 Python version

bash

python --version 2>&1 || python3 --version 2>&1

TikSpyder needs Python 3.11 or newer. If the version is too old, tell the user and stop.
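
The version comparison can be done without eyeballing the output. The sketch below assumes the version string has the usual `Python X.Y.Z` shape; `python_version_ok` is a hypothetical helper name, not part of TikSpyder.

```bash
# Hypothetical helper: succeeds when a "Python X.Y.Z" string is 3.11 or newer.
python_version_ok() {
  local version="${1#Python }"   # drop the leading "Python " label
  local major="${version%%.*}"   # text before the first dot
  local rest="${version#*.}"     # text after the first dot
  local minor="${rest%%.*}"      # text before the next dot
  [ "$major" -gt 3 ] 2>/dev/null && return 0
  [ "$major" -eq 3 ] 2>/dev/null && [ "$minor" -ge 11 ] 2>/dev/null
}

# Usage (not run here):
#   version_string="$(python --version 2>/dev/null || python3 --version 2>/dev/null)"
#   python_version_ok "$version_string" || echo "Python 3.11 or newer is required"
```

Unparseable input (for example an empty string when neither interpreter is found) makes the helper fail, which matches the "tell the user and stop" behaviour.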

1.5 Create environment and install (only if tikspyder wasn't found in 1.2)

If tikspyder is already installed (found in Step A, B, or C), skip this entirely.

Otherwise, create an environment and install. First check whether conda/mamba is available:

bash

conda --version 2>/dev/null || mamba --version 2>/dev/null

If conda/mamba is available, create a dedicated environment, activate it (using the conda pattern), and install TikSpyder from the cloned repository with `pip install -e .` inside TIKSPYDER_DIR.

If conda/mamba is not available, use Python's built-in venv:

bash

cd "$TIKSPYDER_DIR"
python -m venv .venv

Activate the venv. The path differs by platform:

Windows (Git Bash): `source "$TIKSPYDER_DIR/.venv/Scripts/activate"`
macOS / Linux: `source "$TIKSPYDER_DIR/.venv/bin/activate"`

Then install with `pip install -e .` inside TIKSPYDER_DIR.

Verify it works regardless of which environment type was created:

bash

tikspyder --help

Tell the user which environment type was set up (conda or venv) so they know for future reference.

1.6 Check ffmpeg

bash

ffmpeg -version 2>&1 | head -1

ffmpeg is needed for audio extraction and keyframe extraction. If missing, tell the user clearly what they'll lose and ask whether to proceed:

"ffmpeg is not installed. Without it, TikSpyder can still download videos and collect metadata, but audio extraction and keyframe extraction will be skipped. You can install it from https://ffmpeg.org/download.html. Would you like to proceed without ffmpeg, or install it first?"

If the user wants to proceed without ffmpeg, that's fine. Just remember this for Phase 4 so you're not surprised by `No such file or directory: 'ffmpeg'` errors in the output (they're expected and harmless).
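
A quieter alternative to parsing the `ffmpeg -version` output is to ask the shell whether the program is on PATH at all. This is a minimal sketch; `have_tool` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds when the named program is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool ffmpeg; then
  echo "ffmpeg: installed"
else
  echo "ffmpeg: not installed (audio extraction and keyframes will be skipped)"
fi
```

The printed line can be reused as-is in the readiness report in Phase 1.7.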

1.7 Summary and readiness check (STOP HERE before continuing)

Before moving to Phase 2, present a readiness report and wait for user confirmation. Do NOT proceed until the user says it's OK. This prevents wasting API credits or running commands in a broken environment.

Show something like:

Environment readiness:
- Python: 3.12 (OK)
- TikSpyder: v0.1.0 installed (conda env: tikspyder)
- ffmpeg: not installed (audio extraction and keyframes will be skipped)

Ready to continue with API key setup?

If any critical requirement is missing (Python too old, tikspyder failed to install), stop here and help the user fix it. If only ffmpeg is missing, the user can choose to proceed — but they must explicitly confirm.

Phase 2: API Key Configuration

TikSpyder uses two external APIs:

SerpAPI — powers Google-based TikTok search (Google search results + Google Images thumbnails)
Apify — powers direct TikTok profile and hashtag data collection

Important: TikSpyder runs SerpAPI calls in ALL modes. Even `--user` and `--tag` modes trigger Google search and Google Images calls before the Apify step. Without a valid SerpAPI key, those calls will fail with 401 errors. The Apify step will still work, but data collection will be incomplete. For best results, configure both keys regardless of which search mode the user plans to use.

2.1 Check existing keys

Read the config file at `$TIKSPYDER_DIR/config/config.ini` using the Read tool. Check whether `api_key` and `apify_token` contain real credentials.

A key is NOT valid if it:

Is empty or whitespace
Contains the word `your` (e.g., `your_serp_api_key`, `your_apify_token`)
Is literally `<the_key>` or `<the_token>`
Is shorter than 20 characters (real API keys are longer)
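
The rules above can be sketched as a single predicate. `key_looks_valid` is a hypothetical helper, and matching `your` case-insensitively is an assumption beyond what the rules state. The function only returns a status and never prints the value it checks.

```bash
# Hypothetical helper: succeeds only when the value passes every rule listed above.
# It returns a status only and never echoes the key.
key_looks_valid() {
  local trimmed lowered
  trimmed="${1//[[:space:]]/}"                       # remove all whitespace
  lowered="$(printf '%s' "$trimmed" | tr '[:upper:]' '[:lower:]')"
  [ -n "$trimmed" ] || return 1                      # empty or whitespace only
  case "$lowered" in
    *your*|"<the_key>"|"<the_token>") return 1 ;;    # placeholder text
  esac
  [ "${#trimmed}" -ge 20 ]                           # real keys are longer
}

# Usage: report status without revealing the value.
#   if key_looks_valid "$serpapi_key"; then echo "SerpAPI key: configured"
#   else echo "SerpAPI key: not configured"; fi
```

A real key that happens to contain the substring `your` would be rejected by this sketch, exactly as the written rule implies; if that ever occurs, ask the user to confirm the key rather than loosening the check silently.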

Security rules:

NEVER print, display, or echo API keys back to the user
When reporting status, say "SerpAPI key: configured" or "Apify token: not configured" — never show the actual values
When writing keys to the config file, use the Write tool directly — do not use echo/cat in bash where the key would appear in the command

2.2 Ask for missing keys

If either key is invalid, ask the user for it. Ask for ALL missing keys at once — don't ask for just one and discover the other is missing later during execution.

Explain what each API is for:

"SerpAPI key — This lets TikSpyder search Google for TikTok content. You can get one at https://serpapi.com/ (they have a free tier)."
"Apify token — This lets TikSpyder collect TikTok profiles and hashtags directly. You can get one at https://apify.com/ (they have a free tier). Required for user profile and hashtag searches."

If the user can only provide one key, that's OK — explain what will and won't work:

SerpAPI only: keyword searches work fully; user/hashtag modes will fail
Apify only: user/hashtag collection works but Google search/images steps will show 401 errors (data collection still succeeds via Apify, just with incomplete results)
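
The combinations above reduce to a small lookup. In this sketch the mode labels (`keyword`, `user`, `hashtag`), the `yes`/`no` arguments, and the result words (`full`, `partial`, `blocked`) are all hypothetical names chosen for illustration.

```bash
# Hypothetical helper: prints what to expect for a mode given which keys exist.
# Arguments: mode, has_serpapi (yes/no), has_apify (yes/no).
mode_status() {
  local mode="$1" has_serpapi="$2" has_apify="$3"
  case "$mode" in
    keyword)
      # Keyword search depends on SerpAPI.
      if [ "$has_serpapi" = yes ]; then echo full; else echo blocked; fi
      ;;
    user|hashtag)
      # These modes need Apify; without SerpAPI the Google steps return 401s.
      if [ "$has_apify" != yes ]; then
        echo blocked
      elif [ "$has_serpapi" = yes ]; then
        echo full
      else
        echo partial
      fi
      ;;
    *)
      echo unknown
      return 1
      ;;
  esac
}

mode_status hashtag no yes   # prints: partial
```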

2.3 Save keys

Write the config file at `$TIKSPYDER_DIR/config/config.ini` using this exact format:

ini

[SerpAPI Key]
api_key = <the_key>

[Apify Token]
apify_token = <the_token>

Preserve any existing valid key if the user only provides one of the two.

Phase 3: Collect Search Parameters

Ask the user what they want to collect. Use AskUserQuestion to make this conversational. Here's the decision tree:

3.1 Search mode (required)

Ask: "What would you like to search for?"

Mode	CLI flag	Notes
Keyword search	`--q "term"`	Searches Google for TikTok results matching the term
User profile	`--user username`	Collects a specific TikTok user's videos (requires `--apify`)
Hashtag	`--tag hashtag`	Collects videos with a specific hashtag (requires `--apify`)

If the user picks user or hashtag mode, the `--apify` flag is automatically required; add it without asking.

Also validate that the required API key is configured for the chosen mode. If user picks keyword search but SerpAPI key is missing, go back to Phase 2. Same for Apify with user/hashtag modes.

3.2 Additional parameters

After knowing the search mode, ask about these options. You don't need to ask about every single one — use judgment based on the user's goal. Present the most relevant options:

For keyword searches:

Country (`--gl`, e.g., `us`, `gb`, `mx`): "Which country should Google search from?"
Language (`--hl`, e.g., `en`, `es`, `fr`): "What language?"
Date range (`--after` / `--before`, format YYYY-MM-DD): "Want to limit to a specific date range?"
Search depth (`--depth`, default 3): "How deep should related content search go? Default is 3 levels."

For user/hashtag searches (Apify):

Date filters (`--oldest-post-date` / `--newest-post-date`, format YYYY-MM-DD)
Number of results (`--number-of-results`, default 25)

For all modes:

Download videos? (`--download`): "Do you want to download the actual video files?"
Output directory (`--output`): "Where should I save the results? Default creates a timestamped folder in `./tikspyder-data/`."
Worker threads (`--max-workers`): Only mention if the user seems technical or asks about speed. Default is 5 for downloads, 3 for keyframes.

3.3 Date synchronization (critical)

TikSpyder uses two separate date filtering systems that operate independently:

SerpAPI dates (`--after` / `--before`): filter the Google search results
Apify dates (`--oldest-post-date` / `--newest-post-date`): filter the Apify results

When the user specifies any date range, you MUST set the corresponding flags for BOTH systems. Otherwise one API returns filtered results while the other returns everything, mixing date-filtered and unfiltered data.

Mapping:

User says	SerpAPI flag	Apify flag
"after [date]" / "since [date]" / "from [date]"	`--after [date]`	`--oldest-post-date [date]`
"before [date]" / "until [date]"	`--before [date]`	`--newest-post-date [date]`

Example: If the user says "videos after January 2026", the command needs BOTH:

--after 2026-01-01 --oldest-post-date 2026-01-01

This applies to all search modes — keyword, user, and hashtag.
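
The mapping can be applied mechanically so that neither system is forgotten. The sketch below takes an optional lower and upper bound (both YYYY-MM-DD, empty string when unset) and emits the paired flags; `date_flags` is a hypothetical helper name.

```bash
# Hypothetical helper: turns one date range into the paired SerpAPI and Apify flags.
# Arguments: after_date, before_date (YYYY-MM-DD; pass "" to leave a bound unset).
date_flags() {
  local after="$1" before="$2" flags=""
  if [ -n "$after" ]; then
    flags="--after $after --oldest-post-date $after"
  fi
  if [ -n "$before" ]; then
    # ${flags:+$flags } adds a separating space only when flags is non-empty.
    flags="${flags:+$flags }--before $before --newest-post-date $before"
  fi
  printf '%s\n' "$flags"
}

date_flags 2026-01-01 ""
# prints: --after 2026-01-01 --oldest-post-date 2026-01-01
```

Generating both flags from a single input removes the chance of setting a bound for one API and not the other.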

3.4 Confirm before running

Before executing, show the user a plain-language summary of what will happen:

Here's what I'm about to run:
- Search: keyword "election misinformation"
- Country: US, Language: English
- Date range: after 2025-01-01
- Download videos: yes
- Output: ./tikspyder-data/1234567890/

Ask for confirmation before proceeding.

Phase 4: Execute

4.1 Activate environment if needed

Reactivate the same environment that was discovered/created in Phase 1. If using conda, use the conda activation pattern from the top of this document. If using venv, source the activate script (Windows: `.venv/Scripts/activate`, macOS/Linux: `.venv/bin/activate`).

4.2 Build and run the command

Construct the tikspyder CLI command from the collected parameters. Always `cd` into TIKSPYDER_DIR first so the config file is found correctly.

Example commands:

bash

# Keyword search with date filter
cd "$TIKSPYDER_DIR" && tikspyder --q "search term" --gl us --hl en \
  --after 2025-01-01 --before 2025-06-01 --output ./data/ --download

# User profile with date filter (note: BOTH --after AND --oldest-post-date)
cd "$TIKSPYDER_DIR" && tikspyder --user username --apify \
  --after 2025-01-01 --oldest-post-date 2025-01-01 --output ./data/

# Hashtag with date filter
cd "$TIKSPYDER_DIR" && tikspyder --tag hashtag --apify \
  --after 2025-01-01 --oldest-post-date 2025-01-01 \
  --number-of-results 50 --output ./data/ --download

Remember to prepend the conda or venv activation before `cd` if needed.

Run the command and let the user see the output. Use a generous timeout (up to 10 minutes) since data collection can take a while depending on the search scope.
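
Assembling the command from an array avoids quoting mistakes with multi-word search terms and guarantees `--apify` is added for user and hashtag modes. This is a sketch only: `build_tikspyder_cmd` and the mode labels `keyword`, `user`, and `hashtag` are hypothetical, while the flags themselves come from the tables in this document. The function prints the command rather than running it, so it can be shown to the user for confirmation first.

```bash
# Hypothetical helper: prints a shell-quoted tikspyder command line.
# Arguments: mode (keyword|user|hashtag), search value, then any extra flags.
build_tikspyder_cmd() {
  local mode="$1" value="$2" out
  shift 2
  local cmd=(tikspyder)
  case "$mode" in
    keyword) cmd+=(--q "$value") ;;
    user)    cmd+=(--user "$value" --apify) ;;   # Apify is required for this mode
    hashtag) cmd+=(--tag "$value" --apify) ;;    # Apify is required for this mode
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  cmd+=("$@")                                    # append the remaining flags untouched
  printf -v out '%q ' "${cmd[@]}"                # shell-quote every word
  printf '%s\n' "${out% }"                       # drop the trailing space
}

build_tikspyder_cmd user username --after 2025-01-01 --oldest-post-date 2025-01-01
# prints: tikspyder --user username --apify --after 2025-01-01 --oldest-post-date 2025-01-01
```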

4.3 Error handling

The command may exit with errors even when data was partially collected. Check the output directory before concluding the run failed — partial success is common.

Expected noise (not errors):

Output	Explanation
`Error extracting audio: No such file or directory: 'ffmpeg'`	ffmpeg not installed — videos are still downloaded, only audio extraction is skipped. Harmless if user chose to proceed without ffmpeg in Phase 1.

Errors that need action:

Error	Likely cause	Fix
`ValueError: Either --user, --q or --tag must be provided`	Missing search term	Ask what they want to search
`401 Client Error: Unauthorized` from serpapi.com	SerpAPI key is invalid or still a placeholder	This should NOT happen — it means Phase 2 failed to detect an invalid key. Ask the user for the correct key, save it, and rerun. Review Phase 2.1 validation rules to understand what went wrong.
`IndexError: list index out of range` in `sql_manager.py`	SerpAPI returned no data, leaving the SQL database empty	This is a downstream symptom of a bad SerpAPI key. Same fix as above.
`ApifyApiError: User was not found or authentication token is not valid`	Bad Apify token	Ask user to check their token at https://console.apify.com/account
`RuntimeError` with asyncio	Event loop conflict	Run `cd "$TIKSPYDER_DIR" && git pull` to get the latest fix
Connection/timeout errors	Network issues	Suggest checking internet connection, or trying with fewer results

4.4 Post-run summary

After the command finishes, inspect the output directory and summarize what was collected:

bash

find <output_dir> -type f | head -50
ls -la <output_dir>/*.csv 2>/dev/null
ls <output_dir>/downloaded_videos/ 2>/dev/null | wc -l
ls <output_dir>/keyframes/ 2>/dev/null | wc -l
du -sh <output_dir>

Report to the user:

Number of CSV data files generated
Number of videos downloaded (if applicable)
Number of keyframes extracted (if applicable)
Total size of the output directory
Path to the main CSV file(s) they can open in Excel or Google Sheets
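
The report items can be gathered in one pass. The sketch below assumes the layout used in the commands above (top-level `*.csv` files plus `downloaded_videos/` and `keyframes/` subdirectories); `summarize_output` and `count_files` are hypothetical helper names, and counting files recursively inside each subdirectory is an assumption about how results are nested.

```bash
# Hypothetical helper: counts regular files under a directory, or prints 0 if it is missing.
count_files() {
  if [ -d "$1" ]; then
    find "$1" -type f | wc -l | tr -d '[:space:]'
  else
    printf '0'
  fi
}

# Hypothetical helper: prints a short summary of a TikSpyder output directory.
summarize_output() {
  local out_dir="$1" csv_count
  csv_count="$(find "$out_dir" -maxdepth 1 -type f -name '*.csv' | wc -l | tr -d '[:space:]')"
  echo "CSV files: $csv_count"
  echo "Videos: $(count_files "$out_dir/downloaded_videos")"
  echo "Keyframes: $(count_files "$out_dir/keyframes")"
  echo "Total size: $(du -sh "$out_dir" | cut -f1)"
  # List the CSV paths so the user knows what to open in Excel or Google Sheets.
  find "$out_dir" -maxdepth 1 -type f -name '*.csv'
}

# Usage: summarize_output ./data
```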

4.5 Save environment cache (after successful run)

After a successful run (data was collected, output directory has files), save the resolved environment details so the next session can skip Phase 1 discovery. Write a JSON file at `$SKILL_DIR/.tikspyder-env.json`:

json

{
  "tikspyder_dir": "/absolute/path/to/tik-spyder",
  "skill_dir": "/absolute/path/to/skill/directory",
  "env_type": "conda",
  "env_name": "tikspyder",
  "python_version": "3.12.1",
  "ffmpeg_available": false,
  "api_keys_configured": true,
  "last_successful_run": "2026-02-27"
}

`env_type` should be `"conda"` or `"venv"`
`env_name` is the conda environment name (only relevant for conda)
Use absolute paths so the cache works regardless of the working directory
Only write this file after a confirmed successful run — never after errors

If the file already exists, update it (the paths or ffmpeg status may have changed).

Phase 5: Streamlit App (Alternative)

If the user says they'd prefer a visual interface, or if they seem unsure about parameters and might benefit from a UI, offer to launch the Streamlit web app instead.

Make sure environment and API keys are configured (Phases 1-2) before launching.

Activate the environment (conda or venv, same as Phase 4), then run:

bash

cd "$TIKSPYDER_DIR" && tikspyder --app

This starts a local web server at `http://localhost:8501`. Tell the user:

"I've launched the TikSpyder web interface. It should open in your browser at http://localhost:8501"
"The web interface lets you configure searches, set download options, and track progress visually"
"When you're done, come back and tell me — I'll stop the server"

The Streamlit app runs as a blocking process. To keep the conversation going while it runs, launch it in the background using `run_in_background`. When the user is done, find and stop the process listening on port 8501.

Quick Reference: Full CLI Flags

Flag	Type	Description
`--q`	string	Search keyword/phrase
`--user`	string	TikTok username
`--tag`	string	TikTok hashtag
`--gl`	string	Country code (e.g., `us`)
`--hl`	string	Language code (e.g., `en`)
`--cr`	string	Multiple country filter
`--lr`	string	Multiple language filter
`--safe`	string	Adult content filter: `active` (default) or `off`
`--google-domain`	string	Google domain (default: `google.com`)
`--depth`	int	Related content depth (default: 3)
`--before`	string	Date upper bound (YYYY-MM-DD)
`--after`	string	Date lower bound (YYYY-MM-DD)
`--apify`	flag	Enable Apify integration
`--oldest-post-date`	string	Apify: oldest post date (YYYY-MM-DD)
`--newest-post-date`	string	Apify: newest post date (YYYY-MM-DD)
`--number-of-results`	int	Apify: max results (default: 25)
`-d, --download`	flag	Download video files
`-w, --max-workers`	int	Thread count for downloads
`-o, --output`	string	Output directory path
`--app`	flag	Launch Streamlit web UI
