solo-index-youtube

/index-youtube

Index YouTube video transcripts into a searchable knowledge base. Supports two modes depending on available tools.

Prerequisites

Check that yt-dlp is available:
```bash
which yt-dlp || echo "MISSING: install yt-dlp (brew install yt-dlp / pip install yt-dlp / pipx install yt-dlp)"
```

Arguments

Parse `$ARGUMENTS` for channel handles or "all":
  • If empty or "all": index all channels (from config or ask user)
  • If one or more handles: index only those channels (e.g., `GregIsenberg ycombinator`)
  • Optional flags: `-n <limit>` (max videos per channel, default 10), `--dry-run` (parse only)
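
The parsing rules above can be sketched in shell. This is a minimal illustration, not the skill's actual parser: `parse_arguments` and the variables `handles`, `limit`, and `dry_run` are hypothetical names, and the sketch assumes handles contain no spaces.

```bash
# Hypothetical helper: split a raw $ARGUMENTS string into handles, limit, and dry-run flag.
parse_arguments() {
  handles=""        # space-separated handles; empty means "all channels"
  limit=10          # default from the flag description above
  dry_run=false
  # Intentional word splitting: $1 holds the raw argument string.
  set -- $1
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -n)        limit="${2:-10}"; [ "$#" -gt 1 ] && shift ;;  # consume the value if present
      --dry-run) dry_run=true ;;
      all)       handles="" ;;
      *)         handles="${handles:+$handles }$1" ;;
    esac
    shift
  done
}

parse_arguments "GregIsenberg ycombinator -n 5 --dry-run"
echo "handles=$handles limit=$limit dry_run=$dry_run"
# prints: handles=GregIsenberg ycombinator limit=5 dry_run=true
```

An empty `handles` value after parsing signals the "index all channels" branch.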

Mode Detection

Check which mode is available:

Mode 1: With solograph MCP (recommended)

If the MCP tools `source_search`, `source_list`, and `source_tags` are available, use solograph for indexing and search.

**Setup (if not yet installed):**
```bash
# Install solograph
pip install solograph

# or
uvx solograph
```
**Indexing via solograph CLI:**
```bash
# Single channel
solograph-cli index-youtube -c GregIsenberg -n 10

# Multiple channels
solograph-cli index-youtube -c GregIsenberg -c ycombinator -n 10

# All channels (from channels.yaml in solograph config)
solograph-cli index-youtube -n 10

# Dry run (parse only, no DB writes)
solograph-cli index-youtube --dry-run
```

If `solograph-cli` is not on PATH, try:
```bash
uvx solograph-cli index-youtube -c <handle> -n 10
```

**Verification via MCP:**
  • `source_list`: check that the youtube source appears
  • `source_search("startup idea", source="youtube")`: test semantic search
  • `source_tags`: see auto-detected topics from transcripts
  • `source_related(video_id)`: find related videos by tags

Mode 2: Without MCP (standalone fallback)

If solograph MCP tools are NOT available, use yt-dlp directly to download transcripts and analyze them.

**Step 1: Download video list**
```bash
# Get recent video URLs from a channel
yt-dlp --flat-playlist --print url "https://www.youtube.com/@GregIsenberg/videos" | head -n 10
```
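
When several handles are passed, the same listing command can be wrapped in a small helper. The sketch below is one possible shape: `channel_url` and `list_recent_videos` are hypothetical names, and it swaps `head` for yt-dlp's `--playlist-end` option so the limit is applied by yt-dlp itself.

```bash
# Hypothetical helper: build the videos-tab URL for a channel handle.
channel_url() {
  handle="${1#@}"   # accept both "GregIsenberg" and "@GregIsenberg"
  printf 'https://www.youtube.com/@%s/videos\n' "$handle"
}

# List up to $2 recent video URLs (default 10) for the handle in $1.
# Needs network access and yt-dlp; nothing is downloaded.
list_recent_videos() {
  yt-dlp --flat-playlist --playlist-end "${2:-10}" --print url "$(channel_url "$1")"
}

# Example:
#   for h in GregIsenberg ycombinator; do list_recent_videos "$h" 10; done > urls.txt
```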

**Step 2: Download transcripts**
```bash
# Download auto-generated subtitles (no video download)
yt-dlp --write-auto-sub --sub-lang en --skip-download --sub-format vtt \
  -o "docs/youtube/%(channel)s/%(title)s.%(ext)s" \
  "<video-url>"
```

**Step 3: Convert VTT to readable text**
```bash
# Strip VTT formatting (timestamps, positioning)
sed '/^$/d; /^[0-9]/d; /^NOTE/d; /^WEBVTT/d; /-->/d' docs/youtube/channel/video.vtt |
  awk '!seen[$0]++' > docs/youtube/channel/video.txt
```
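
The one-liner above also drops any spoken line that happens to start with a digit, and it leaves inline cue tags in place. A possible variant is sketched below as a filter function; `vtt_to_text` is a hypothetical name, and the `Kind:`/`Language:` header lines and `<...>` inline tags are assumptions about what auto-generated subtitle files typically contain.

```bash
# Hypothetical filter: read VTT on stdin, write de-duplicated plain text on stdout.
vtt_to_text() {
  sed -E \
    -e '/^WEBVTT/d' \
    -e '/^(Kind|Language):/d' \
    -e '/^NOTE/d' \
    -e '/-->/d' \
    -e 's/<[^>]*>//g' \
    -e '/^[[:space:]]*$/d' |
  awk '!seen[$0]++'   # keep only the first occurrence of each line
}

# Example: convert every downloaded subtitle file in place.
#   find docs/youtube -name '*.vtt' | while IFS= read -r f; do
#     vtt_to_text < "$f" > "${f%.vtt}.txt"
#   done
```

Because cue timing lines always contain `-->`, they are removed without a digit-prefix rule, so a transcript line such as "2024 was a big year" survives.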

**Step 4: Create index**

Read each transcript with the Read tool. For each video, extract:
- Title (from filename or yt-dlp metadata)
- Key topics and insights
- Actionable takeaways
- Timestamps for notable segments (if chapter markers exist)

Write a summary index to `docs/youtube/index.md`:

```markdown
# YouTube Knowledge Index

## Channel: {channel_name}

### {video_title}

- URL: {url}
- Key topics: {topic1}, {topic2}
- Insights: {summary}
- Actionable: {takeaway}
```
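
As a starting point, the index layout can be pre-generated from the files on disk, leaving the placeholders to be filled in after reading each transcript. This is only a sketch: `write_index_skeleton` is a hypothetical helper, the heading levels are an assumption, and it assumes one directory per channel with `.txt` transcripts inside.

```bash
# Hypothetical helper: print a skeleton index for every .txt transcript under $1.
write_index_skeleton() {
  root="${1:-docs/youtube}"
  echo "# YouTube Knowledge Index"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    echo
    echo "## Channel: $(basename "$dir")"
    for file in "$dir"*.txt; do
      [ -f "$file" ] || continue
      echo
      echo "### $(basename "$file" .txt)"
      echo "- URL: {url}"
      echo "- Key topics: {topic1}, {topic2}"
      echo "- Insights: {summary}"
      echo "- Actionable: {takeaway}"
    done
  done
}

# Example: write_index_skeleton docs/youtube > docs/youtube/index.md
```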

**Step 5: Search indexed content**

With transcripts saved as text files, use Grep to search:
```bash
# Search across all transcripts
grep -ri "startup idea" docs/youtube/
```
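
Two variations on the search may be handy: listing only the files that match, and showing a few lines of context around each hit. The function names below are hypothetical, and the sketch assumes a grep that supports `--include` (GNU and BSD grep do).

```bash
# List transcript files that mention the term (case-insensitive, literal match).
search_transcripts() {
  term="$1"; root="${2:-docs/youtube}"
  grep -rilF --include='*.txt' -- "$term" "$root"
}

# Show each match with line numbers and two lines of surrounding context.
show_matches() {
  term="$1"; root="${2:-docs/youtube}"
  grep -rinF -C 2 --include='*.txt' -- "$term" "$root"
}

# Example: search_transcripts "startup idea"
```

Restricting the search to `*.txt` keeps the raw `.vtt` files, which repeat each caption line, out of the results.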

Output

Report to the user:
  1. Number of videos indexed
  2. Number of transcripts downloaded (vs skipped — no transcript available)
  3. How many had chapter markers
  4. Index file location
  5. How to search the indexed content (MCP tool or Grep command)
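
In standalone mode, the first few numbers in this report can be gathered by counting files. A minimal sketch, assuming the directory layout used in the steps above; `report_counts` is a hypothetical helper, and chapter-marker counts are not covered because they depend on video metadata.

```bash
# Hypothetical helper: summarize what is on disk under $1 (default docs/youtube).
report_counts() {
  root="${1:-docs/youtube}"
  # tr strips the padding some wc implementations add.
  vtt=$(find "$root" -type f -name '*.vtt' | wc -l | tr -d ' ')
  txt=$(find "$root" -type f -name '*.txt' | wc -l | tr -d ' ')
  echo "Subtitle files downloaded: $vtt"
  echo "Transcripts converted to text: $txt"
  echo "Index file: $root/index.md"
}

# Example: report_counts docs/youtube
```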

Common Issues

"MISSING: install yt-dlp"

Cause: yt-dlp is not installed. Fix: Run `brew install yt-dlp` (macOS), `pip install yt-dlp`, or `pipx install yt-dlp`.

Videos skipped (no transcript)

Cause: Video has no auto-generated or manual subtitles. Fix: This is expected — some videos lack transcripts. Only videos with available subtitles can be indexed.

Rate limiting from YouTube

Cause: Too many requests in a short time. Fix: Reduce the `-n` limit, add `--sleep-interval 2` to yt-dlp commands, or use `--cookies-from-browser chrome` for authenticated access.
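
Another way to stay under the limit in standalone mode is to fetch subtitles one video at a time with a pause in between. The loop below is a sketch under stated assumptions: `download_paced`, `YTDLP`, and `PAUSE` are hypothetical names introduced here, not yt-dlp settings, and the yt-dlp flags are the ones already used in Step 2.

```bash
# Hypothetical knobs for this sketch.
YTDLP="${YTDLP:-yt-dlp}"   # command to run; override with a stub to rehearse the loop
PAUSE="${PAUSE:-2}"        # seconds to wait between videos

# Read video URLs from stdin, one per line, and fetch subtitles one at a time.
download_paced() {
  ok=0; failed=0
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # skip blank lines
    if "$YTDLP" --write-auto-sub --sub-lang en --skip-download --sub-format vtt \
         -o "docs/youtube/%(channel)s/%(title)s.%(ext)s" "$url" </dev/null; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
    fi
    sleep "$PAUSE"
  done
  # A clean exit may still mean no subtitle file was written; count .vtt files to confirm.
  echo "ok=$ok failed=$failed"
}

# Example (needs network access): download_paced < urls.txt
```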

solograph-cli not found

Cause: solograph is not installed or not on PATH. Fix: Install with `pip install solograph` or `uvx solograph`, then check `which solograph-cli`.
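
The PATH check and the `uvx` fallback can be combined into one lookup. A minimal sketch, where `solograph_cmd` is a hypothetical helper name and the two invocations are the ones shown in Mode 1:

```bash
# Print the command prefix to use for solograph, or fail with a hint on stderr.
solograph_cmd() {
  if command -v solograph-cli >/dev/null 2>&1; then
    echo "solograph-cli"
  elif command -v uvx >/dev/null 2>&1; then
    echo "uvx solograph-cli"
  else
    echo "MISSING: install solograph (pip install solograph / uvx solograph)" >&2
    return 1
  fi
}

# Example:
#   cmd=$(solograph_cmd) && $cmd index-youtube -c GregIsenberg -n 10
```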