solo-index-youtube
/index-youtube
Index YouTube video transcripts into a searchable knowledge base. Supports two modes depending on available tools.
Prerequisites
Check that yt-dlp is available:
```bash
which yt-dlp || echo "MISSING: install yt-dlp (brew install yt-dlp / pip install yt-dlp / pipx install yt-dlp)"
```

Arguments
Parse `$ARGUMENTS` for channel handles or "all":
- If empty or "all": index all channels (from config or ask user)
- If one or more handles: index only those channels (e.g., `GregIsenberg ycombinator`)
- Optional flags: `-n <limit>` (max videos per channel, default 10), `--dry-run` (parse only)
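
The argument rules above could be handled with a small shell parser. This is a minimal sketch, not part of solograph: the function name `parse_index_args` and the variables `HANDLES`, `LIMIT`, and `DRY_RUN` are hypothetical names chosen for illustration.

```bash
# Hypothetical helper: parse "[handles...] [-n <limit>] [--dry-run]".
# Results are left in HANDLES, LIMIT and DRY_RUN (illustrative names).
parse_index_args() {
  HANDLES=""   # space-separated handles; empty means "all channels"
  LIMIT=10     # default max videos per channel
  DRY_RUN=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -n)
        [ "$#" -ge 2 ] || { echo "error: -n requires a value" >&2; return 1; }
        LIMIT="$2"; shift 2 ;;
      --dry-run) DRY_RUN=1; shift ;;
      all) shift ;;   # "all" behaves like an empty handle list
      -*) echo "error: unknown flag: $1" >&2; return 1 ;;
      *) HANDLES="${HANDLES:+$HANDLES }$1"; shift ;;
    esac
  done
}
```

For example, `parse_index_args GregIsenberg ycombinator -n 5 --dry-run` leaves two handles in `HANDLES`, sets `LIMIT` to 5, and sets `DRY_RUN` to 1.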
Mode Detection
Check which mode is available:
Mode 1: With solograph MCP (recommended)
If MCP tools `source_search`, `source_list`, `source_tags` are available, use solograph for indexing and search.

**Setup (if not yet installed):**
```bash
# Install solograph
pip install solograph
# or
uvx solograph
```

**Indexing via solograph CLI:**
```bash
# Single channel
solograph-cli index-youtube -c GregIsenberg -n 10

# Multiple channels
solograph-cli index-youtube -c GregIsenberg -c ycombinator -n 10

# All channels (from channels.yaml in solograph config)
solograph-cli index-youtube -n 10

# Dry run (parse only, no DB writes)
solograph-cli index-youtube --dry-run
```

If `solograph-cli` is not on PATH, try:
```bash
uvx solograph-cli index-youtube -c <handle> -n 10
```

**Verification via MCP:**
- `source_list`: check that youtube source appears
- `source_search("startup idea", source="youtube")`: test semantic search
- `source_tags`: see auto-detected topics from transcripts
- `source_related(video_id)`: find related videos by tags
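
The PATH check and the `uvx` fallback described above can be folded into one helper that builds the command line. This is a sketch only: `solograph_index_cmd` is a hypothetical name, and it uses just the subcommand and flags (`index-youtube`, `-c`, `-n`) that appear in this document. It prints the command rather than running it, so the result can be inspected first.

```bash
# Hypothetical helper: print the solograph command for the given handles.
# Usage: solograph_index_cmd <limit> [handle...]
solograph_index_cmd() (
  limit="$1"; shift
  if command -v solograph-cli >/dev/null 2>&1; then
    cmd="solograph-cli index-youtube"
  else
    cmd="uvx solograph-cli index-youtube"   # fallback when not on PATH
  fi
  for handle in "$@"; do
    cmd="$cmd -c $handle"                   # one -c per channel handle
  done
  echo "$cmd -n $limit"                     # no handles: all channels
)
```

Calling `solograph_index_cmd 10 GregIsenberg ycombinator` prints a command ending in `index-youtube -c GregIsenberg -c ycombinator -n 10`.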
Mode 2: Without MCP (standalone fallback)
If solograph MCP tools are NOT available, use yt-dlp directly to download transcripts and analyze them.

**Step 1: Download video list**
```bash
# Get recent video URLs from a channel
yt-dlp --flat-playlist --print url "https://www.youtube.com/@GregIsenberg/videos" | head -n 10
```

**Step 2: Download transcripts**
```bash
# Download auto-generated subtitles (no video download)
yt-dlp --write-auto-sub --sub-lang en --skip-download --sub-format vtt \
  -o "docs/youtube/%(channel)s/%(title)s.%(ext)s" \
  "<video-url>"
```

**Step 3: Convert VTT to readable text**
```bash
# Strip VTT formatting (timestamps, positioning)
sed '/^$/d; /^[0-9]/d; /^NOTE/d; /^WEBVTT/d; /-->/d' docs/youtube/channel/video.vtt | \
  awk '!seen[$0]++' > docs/youtube/channel/video.txt
```

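
Steps 1 to 3 can be chained into two reusable functions, sketched below under a few assumptions. The names `vtt_to_text` and `fetch_transcripts` are hypothetical. The `sed` filter is a variation on the Step 3 command, not the same one: it also strips inline cue tags such as `<c>` and the `Kind:` / `Language:` header lines that auto-generated subtitles often contain, and it only drops lines made up entirely of digits, so caption text that starts with a number is kept. As in Step 3, `awk` removes every repeated line, not only adjacent ones.

```bash
# Hypothetical helper: read WebVTT on stdin, print de-duplicated caption text.
vtt_to_text() {
  sed -e 's/<[^>]*>//g' \
      -e '/^WEBVTT/d' -e '/^Kind:/d' -e '/^Language:/d' -e '/^NOTE/d' \
      -e '/-->/d' -e '/^[0-9][0-9]*$/d' -e '/^[[:space:]]*$/d' |
    awk '!seen[$0]++'
}

# Hypothetical helper: download English auto-generated subtitles for the
# newest videos of one channel, using the yt-dlp flags from Steps 1 and 2.
# Usage: fetch_transcripts <handle> [limit]
fetch_transcripts() (
  handle="$1"
  limit="${2:-10}"
  yt-dlp --flat-playlist --print url "https://www.youtube.com/@${handle}/videos" |
    head -n "$limit" |
    while IFS= read -r url; do
      yt-dlp --write-auto-sub --sub-lang en --skip-download --sub-format vtt \
        -o "docs/youtube/%(channel)s/%(title)s.%(ext)s" \
        "$url" </dev/null || echo "yt-dlp failed for: $url" >&2
    done
)
```

A possible usage: run `fetch_transcripts GregIsenberg 10`, then convert each downloaded file with `vtt_to_text < video.vtt > video.txt`.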
**Step 4: Create index**
Read each transcript with the Read tool. For each video, extract:
- Title (from filename or yt-dlp metadata)
- Key topics and insights
- Actionable takeaways
- Timestamps for notable segments (if chapter markers exist)

Write a summary index to `docs/youtube/index.md`:
```markdown
# YouTube Knowledge Index

## Channel: {channel_name}

### {video_title}
- URL: {url}
- Key topics: {topic1}, {topic2}
- Insights: {summary}
- Actionable: {takeaway}
```

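
Before filling in topics and insights by hand, an empty skeleton of the index can be generated from the converted transcripts. This is a sketch under assumptions: `index_skeleton` is a hypothetical name, the `docs/youtube/<channel>/<title>.txt` layout follows the paths used in the earlier steps, and the heading levels (`#`, `##`, `###`) are a guess at the template's structure. The `{...}` placeholders are printed as-is for later editing.

```bash
# Hypothetical helper: print an index skeleton for every *.txt transcript
# found one level below the given root (default: docs/youtube).
index_skeleton() (
  root="${1:-docs/youtube}"
  echo "# YouTube Knowledge Index"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue        # glob matched nothing
    echo
    echo "## Channel: $(basename "$dir")"
    for file in "$dir"*.txt; do
      [ -f "$file" ] || continue     # channel has no transcripts
      echo
      echo "### $(basename "$file" .txt)"
      echo "- URL: {url}"
      echo "- Key topics: {topic1}, {topic2}"
      echo "- Insights: {summary}"
      echo "- Actionable: {takeaway}"
    done
  done
)
```

One way to use it is `index_skeleton docs/youtube > docs/youtube/index.md`, followed by replacing the placeholders while reading each transcript.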
**Step 5: Search indexed content**
With transcripts saved as text files, use Grep to search:
```bash
# Search across all transcripts
grep -ri "startup idea" docs/youtube/
```
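
When only the names of matching transcripts are needed, the search can be limited to the converted `.txt` files so that raw `.vtt` files and `index.md` are ignored. The function name `search_transcripts` is hypothetical; this is a sketch built on standard `find` and `grep` options.

```bash
# Hypothetical helper: list transcript files containing the query,
# case-insensitively. Usage: search_transcripts <query> [root]
search_transcripts() {
  find "${2:-docs/youtube}" -type f -name '*.txt' \
    -exec grep -il -- "$1" {} +
}
```

For example, `search_transcripts "startup idea"` prints one path per matching transcript under `docs/youtube`.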

Output
Report to the user:
- Number of videos indexed
- Number of transcripts downloaded (vs skipped — no transcript available)
- How many had chapter markers
- Index file location
- How to search the indexed content (MCP tool or Grep command)
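
In standalone mode, some of the numbers for this report can be counted from the files on disk. The sketch below assumes each successfully fetched video leaves exactly one `.vtt` file under the root, so the skipped count is only an estimate. The name `report_counts` and its arguments are hypothetical.

```bash
# Hypothetical helper: summarize transcript files under a root directory.
# Usage: report_counts [root] [videos_requested]
report_counts() (
  root="${1:-docs/youtube}"
  requested="${2:-0}"
  downloaded=$(find "$root" -type f -name '*.vtt' | wc -l | tr -d ' ')
  converted=$(find "$root" -type f -name '*.txt' | wc -l | tr -d ' ')
  echo "Transcripts downloaded: $downloaded"
  if [ "$requested" -gt "$downloaded" ]; then
    # Assumption: one .vtt per video that had a transcript
    echo "Skipped (no transcript available): $((requested - downloaded))"
  fi
  echo "Converted to text: $converted"
  echo "Index file: $root/index.md"
)
```

Chapter-marker counts are not covered here, since they depend on video metadata rather than on the subtitle files.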
Common Issues
"MISSING: install yt-dlp"
Cause: yt-dlp not installed.
Fix: Run `brew install yt-dlp` (macOS), `pip install yt-dlp`, or `pipx install yt-dlp`.
Videos skipped (no transcript)
Cause: Video has no auto-generated or manual subtitles.
Fix: This is expected — some videos lack transcripts. Only videos with available subtitles can be indexed.
Rate limiting from YouTube
Cause: Too many requests in a short time.
Fix: Reduce the `-n` limit, add `--sleep-interval 2` to yt-dlp commands, or use `--cookies-from-browser chrome` for authenticated access.
solograph-cli not found
Cause: solograph not installed or not on PATH.
Fix: Install with `pip install solograph` or `uvx solograph`. Check `which solograph-cli`.