baoyu-youtube-transcript
YouTube Transcript
Downloads transcripts (subtitles/captions) from YouTube videos. Works with both manually created and auto-generated transcripts. No API key or browser required — uses YouTube's InnerTube API directly.
Fetches video metadata and cover image on first run, caches raw data for fast re-formatting.
Script Directory
Scripts in `scripts/` subdirectory. `{baseDir}` = this SKILL.md's directory path. Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun. Replace `{baseDir}` and `${BUN_X}` with actual values.

| Script | Purpose |
|---|---|
| `scripts/main.ts` | Transcript download CLI |
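
The resolution rule above can be sketched as a small pure function. This is an illustration only: `resolveBunRunner`, `expandCommand`, and the injected `isAvailable` probe are hypothetical names, not part of the skill's scripts.

```typescript
// Hypothetical helper: picks the command that stands in for ${BUN_X}.
// `isAvailable` is injected so the decision logic does not depend on PATH.
type Probe = (command: string) => boolean;

function resolveBunRunner(isAvailable: Probe): string | null {
  if (isAvailable("bun")) return "bun"; // bun installed: use it directly
  if (isAvailable("npx")) return "npx -y bun"; // otherwise fall back to npx
  return null; // neither found: the caller should suggest installing bun
}

// Substitutes the two placeholders used throughout this document.
function expandCommand(template: string, baseDir: string, runner: string): string {
  return template.split("${BUN_X}").join(runner).split("{baseDir}").join(baseDir);
}
```

With `bun` on the machine, `expandCommand("${BUN_X} {baseDir}/scripts/main.ts --list", "/opt/skill", "bun")` yields `bun /opt/skill/scripts/main.ts --list`.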
Usage
```bash
# Default: markdown with timestamps (English)
${BUN_X} {baseDir}/scripts/main.ts <youtube-url-or-id>

# Specify languages (priority order)
${BUN_X} {baseDir}/scripts/main.ts <url> --languages zh,en,ja

# Without timestamps
${BUN_X} {baseDir}/scripts/main.ts <url> --no-timestamps

# With chapter segmentation
${BUN_X} {baseDir}/scripts/main.ts <url> --chapters

# With speaker identification (requires AI post-processing)
${BUN_X} {baseDir}/scripts/main.ts <url> --speakers

# SRT subtitle file
${BUN_X} {baseDir}/scripts/main.ts <url> --format srt

# Translate transcript
${BUN_X} {baseDir}/scripts/main.ts <url> --translate zh-Hans

# List available transcripts
${BUN_X} {baseDir}/scripts/main.ts <url> --list

# Force re-fetch (ignore cache)
${BUN_X} {baseDir}/scripts/main.ts <url> --refresh
```
Options
| Option | Description | Default |
|---|---|---|
| `<youtube-url-or-id>` | YouTube URL or video ID (multiple allowed) | Required |
| `--languages` | Language codes, comma-separated, in priority order | |
| `--format` | Output format, e.g. `srt` | |
| `--translate` | Translate to specified language code | |
| `--list` | List available transcripts instead of fetching | |
| | Include timestamps on each paragraph | on |
| `--no-timestamps` | Disable timestamps | |
| `--chapters` | Chapter segmentation from video description | |
| `--speakers` | Raw transcript with metadata for speaker identification | |
| | Skip auto-generated transcripts | |
| | Skip manually created transcripts | |
| `--refresh` | Force re-fetch, ignore cached data | |
| | Save to specific file path | auto-generated |
| | Base output directory | |
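
To illustrate what "priority order" means for the language option, here is a minimal sketch of picking a transcript from a list of available ones. The `TranscriptInfo` shape, the `pickTranscript` name, and the rule of preferring a manual transcript over an auto-generated one within the same language are all assumptions made for this example; the script's real selection logic may differ.

```typescript
// Assumed shape of one available transcript; field names are illustrative.
interface TranscriptInfo {
  languageCode: string;
  isGenerated: boolean;
}

interface PickOptions {
  excludeGenerated?: boolean; // mirrors "skip auto-generated transcripts"
  excludeManual?: boolean; // mirrors "skip manually created transcripts"
}

function pickTranscript(
  available: TranscriptInfo[],
  priorities: string[],
  opts: PickOptions = {},
): TranscriptInfo | undefined {
  const candidates = available.filter((t) =>
    t.isGenerated ? !opts.excludeGenerated : !opts.excludeManual,
  );
  for (const code of priorities) {
    const matches = candidates.filter((t) => t.languageCode === code);
    // Assumption: within one language, a manual transcript beats a generated one.
    const manual = matches.find((t) => !t.isGenerated);
    if (manual) return manual;
    if (matches.length > 0) return matches[0];
  }
  return undefined; // no requested language is available
}
```

With `zh,en,ja` as the priority list and only an auto-generated `en` and a manual `ja` available, this sketch returns the `en` transcript, because language priority is checked before transcript type.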
Input Formats
Accepts any of these as video input:
- Full URL: `https://www.youtube.com/watch?v=dQw4w9WgXcQ`
- Short URL: `https://youtu.be/dQw4w9WgXcQ`
- Embed URL: `https://www.youtube.com/embed/dQw4w9WgXcQ`
- Shorts URL: `https://www.youtube.com/shorts/dQw4w9WgXcQ`
- Video ID: `dQw4w9WgXcQ`
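
A minimal sketch of how the five input shapes above could be normalized to a video ID. `extractVideoId` is a hypothetical name, and the 11-character ID pattern is a widely observed convention rather than something this document states.

```typescript
// Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

function tryParseUrl(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null; // not an absolute URL
  }
}

function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID.test(trimmed)) return trimmed; // bare video ID

  const url = tryParseUrl(trimmed);
  if (url === null) return null;

  const host = url.hostname.replace(/^www\./, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.split("/")[1] ?? null; // short URL
  } else if (host === "youtube.com" || host === "m.youtube.com") {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v"); // full URL
    } else {
      const match = /^\/(?:embed|shorts)\/([^/]+)/.exec(url.pathname);
      candidate = match?.[1] ?? null; // embed or Shorts URL
    }
  }
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```

Parsing with the standard `URL` class, rather than one large regular expression, keeps extra query parameters such as a playlist or start time from affecting the result.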
Output Formats
| Format | Extension | Description |
|---|---|---|
| | `.md` | Markdown with frontmatter (incl. …) |
| `srt` | `.srt` | SubRip subtitle format for video players |
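
For reference, a sketch of turning raw snippets into SubRip text. It assumes each snippet has the shape `{ text, start, duration }` with times in seconds; `RawSnippet`, `srtTime`, and `toSrt` are illustrative names, not the script's actual code.

```typescript
// Assumed snippet shape; start and duration are taken to be in seconds.
interface RawSnippet {
  text: string;
  start: number;
  duration: number;
}

// Formats seconds as the SubRip timestamp HH:MM:SS,mmm.
function srtTime(seconds: number): string {
  const ms = Math.max(0, Math.round(seconds * 1000));
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Each cue is: 1-based index, time range, text, then a blank separator line.
function toSrt(snippets: RawSnippet[]): string {
  return snippets
    .map((snip, i) => {
      const end = snip.start + snip.duration;
      return `${i + 1}\n${srtTime(snip.start)} --> ${srtTime(end)}\n${snip.text.trim()}\n`;
    })
    .join("\n");
}
```

SubRip uses a comma before the milliseconds, which is why `srtTime` does not reuse a plain `HH:mm:ss` formatter.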
Output Directory
```
youtube-transcript/
├── .index.json                       # Video ID → directory path mapping (for cache lookup)
└── {channel-slug}/{title-full-slug}/
    ├── meta.json                     # Video metadata (title, channel, description, duration, chapters, etc.)
    ├── transcript-raw.json           # Raw transcript snippets from YouTube API (cached)
    ├── transcript-sentences.json     # Sentence-segmented transcript (split by punctuation, merged across snippets)
    ├── imgs/
    │   └── cover.jpg                 # Video thumbnail
    ├── transcript.md                 # Markdown transcript (generated from sentences)
    └── transcript.srt                # SRT subtitle (generated from raw snippets, if --format srt)
```
- `{channel-slug}`: Channel name in kebab-case
- `{title-full-slug}`: Full video title in kebab-case

The `--list` mode outputs to stdout only (no file saved).
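
A rough sketch of how the directory for a video might be resolved from the index and the two slugs. The names `toKebabCase` and `resolveVideoDir` are hypothetical, and this slug rule keeps only ASCII letters and digits; how the script handles non-Latin titles is not described here.

```typescript
// Video ID -> directory path, mirroring the role of .index.json.
type CacheIndex = Record<string, string>;

// Simplified kebab-case: lowercase, non-alphanumerics collapsed to single hyphens.
// Assumption: ASCII only; the real slug rule may keep other scripts.
function toKebabCase(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function resolveVideoDir(
  index: CacheIndex,
  videoId: string,
  channel: string,
  title: string,
  outputDir: string = "youtube-transcript",
): string {
  const cached = index[videoId];
  if (cached !== undefined) return cached; // cache hit: reuse the recorded path
  return `${outputDir}/${toKebabCase(channel)}/${toKebabCase(title)}`;
}
```

Looking the video ID up in the index first means a cached video can be found even if its title or channel name has changed since the first fetch.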
Caching
On first fetch, the script saves:
- `meta.json`: video metadata, chapters, cover image path, language info
- `transcript-raw.json`: raw transcript snippets from YouTube API (`{ text, start, duration }[]`)
- `transcript-sentences.json`: sentence-segmented transcript (`{ text, start: "HH:mm:ss", end: "HH:mm:ss" }[]`), split by sentence-ending punctuation (`.?!…。?!` etc.), timestamps proportionally allocated by character length, CJK-aware text merging
- `imgs/cover.jpg`: video thumbnail

Subsequent runs for the same video use cached data (no network calls). Use `--refresh` to force re-fetch. If a different language is requested, the cache is automatically refreshed.

SRT output (`--format srt`) is generated from `transcript-raw.json`. Text/markdown output uses `transcript-sentences.json` for natural sentence boundaries.
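
The sentence segmentation described above can be approximated as follows. This is a simplified sketch, not the script's implementation: it assumes snippet times are in seconds, treats full-width `?` and `!` as sentence enders alongside the listed punctuation, and will wrongly split on abbreviations or decimals such as `3.14`. All names are illustrative.

```typescript
interface TimedSnippet { text: string; start: number; duration: number } // seconds (assumed)
interface Sentence { text: string; start: string; end: string } // "HH:mm:ss"

const SENTENCE_END = /[.?!…。?!]/; // assumption: full-width ?! included
const CJK = /[\u3000-\u303f\u3040-\u30ff\u4e00-\u9fff\uff00-\uffef]/;

function hms(seconds: number): string {
  const total = Math.floor(seconds);
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `${pad(Math.floor(total / 3600))}:${pad(Math.floor((total % 3600) / 60))}:${pad(total % 60)}`;
}

function segmentSentences(snippets: TimedSnippet[]): Sentence[] {
  let joined = "";
  const times: number[] = []; // times[i] = estimated time of joined[i]
  let lastEnd = 0;
  for (const snip of snippets) {
    const text = snip.text.trim();
    if (text.length === 0) continue;
    // CJK-aware merge: add a space at the seam only between non-CJK characters.
    const prev = joined.charAt(joined.length - 1);
    if (prev !== "" && !CJK.test(prev) && !CJK.test(text.charAt(0))) {
      joined += " ";
      times.push(snip.start);
    }
    // Spread the snippet's duration across its characters proportionally.
    for (let i = 0; i < text.length; i++) {
      joined += text.charAt(i);
      times.push(snip.start + (snip.duration * i) / text.length);
    }
    lastEnd = snip.start + snip.duration;
  }
  const timeAt = (i: number): number => times[i] ?? lastEnd;

  const sentences: Sentence[] = [];
  let begin = 0;
  for (let i = 0; i < joined.length; i++) {
    const isLast = i === joined.length - 1;
    // A run such as "?!" ends the sentence only at its final character.
    const endsHere = SENTENCE_END.test(joined.charAt(i)) && !SENTENCE_END.test(joined.charAt(i + 1));
    if (!endsHere && !isLast) continue;
    const text = joined.slice(begin, i + 1).trim();
    if (text.length > 0) {
      sentences.push({ text, start: hms(timeAt(begin)), end: hms(timeAt(i + 1)) });
    }
    begin = i + 1;
  }
  return sentences;
}
```

Because each character gets an interpolated time, a sentence that starts in the middle of one snippet and ends in the middle of another still receives plausible start and end timestamps.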
Workflow
When user provides a YouTube URL and wants the transcript:
- Run with `--list` first if the user hasn't specified a language, to show available options
- Always single-quote the URL when running the script. zsh treats `?` as a glob wildcard, so an unquoted YouTube URL causes "no matches found": use `'https://www.youtube.com/watch?v=ID'`
- Default: run with `--chapters --speakers` for the richest output (chapters + speaker identification)
- The script auto-saves cached data + output file and prints the file path
- For `--speakers` mode: after the script saves the raw file, follow the speaker identification workflow below to post-process with speaker labels

When user only wants a cover image or metadata, running the script with any option will also cache `meta.json` and `imgs/cover.jpg`.

When re-formatting the same video (e.g., first text then SRT), the cached data is reused, so no re-fetch is needed.
Chapter & Speaker Workflow
Chapters (`--chapters`)
The script parses chapter timestamps from the video description (e.g., `0:00 Introduction`), segments the transcript by chapter boundaries, groups snippets into readable paragraphs, and saves as `.md` with a Table of Contents. No further processing needed.

If no chapter timestamps exist in the description, the transcript is output as grouped paragraphs without chapter headings.
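
As an illustration of the chapter parsing described above, the sketch below reads lines such as `0:00 Introduction` from a description and finds the chapter that covers a given time. The regular expression and the names `parseChapters` and `chapterAt` are assumptions; the script's actual parser may accept different layouts.

```typescript
interface Chapter {
  title: string;
  start: number; // seconds from the beginning of the video
}

// Matches "M:SS Title" or "H:MM:SS Title", with an optional separator after the time.
const CHAPTER_LINE = /^\s*(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s*[-–:]?\s*(.+?)\s*$/;

function parseChapters(description: string): Chapter[] {
  const chapters: Chapter[] = [];
  for (const line of description.split(/\r?\n/)) {
    const m = CHAPTER_LINE.exec(line);
    if (m === null) continue; // ordinary description text
    const hours = Number(m[1] ?? 0);
    const minutes = Number(m[2]);
    const seconds = Number(m[3]);
    chapters.push({ title: (m[4] ?? "").trim(), start: hours * 3600 + minutes * 60 + seconds });
  }
  return chapters; // empty when the description has no timestamps
}

// Returns the chapter covering `time`, assuming chapters are in ascending order.
function chapterAt(chapters: Chapter[], time: number): Chapter | undefined {
  let current: Chapter | undefined;
  for (const chapter of chapters) {
    if (chapter.start <= time) current = chapter;
  }
  return current;
}
```

An empty result from `parseChapters` corresponds to the fallback case above, where the transcript is output as grouped paragraphs without chapter headings.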
Speaker Identification (`--speakers`)
Speaker identification requires AI processing. The script outputs a raw `.md` file containing:
- YAML frontmatter with video metadata (title, channel, date, cover, description, language)
- Video description (for speaker name extraction)
- Chapter list from description (if available)
- Raw transcript in SRT format (pre-computed start/end timestamps, token-efficient)

After the script saves the raw file, spawn a sub-agent (use a cheaper model like Sonnet for cost efficiency) to process speaker identification:
- Read the saved `.md` file
- Read the prompt template at `{baseDir}/prompts/speaker-transcript.md`
- Process the raw transcript following the prompt:
  - Identify speakers using video metadata (title → guest, channel → host, description → names)
  - Detect speaker turns from conversation flow, question-answer patterns, and contextual cues
  - Segment into chapters (use description chapters if available, else create from topic shifts)
  - Format with `**Speaker Name:**` labels, paragraph grouping (2-4 sentences), and `[HH:MM:SS → HH:MM:SS]` timestamps
- Overwrite the `.md` file with the processed transcript (keep the YAML frontmatter)

When `--speakers` is used, `--chapters` is implied: the processed output always includes chapter segmentation.
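
The final overwrite step has to keep the YAML frontmatter intact. One way to do that is sketched below, under the assumption that the frontmatter is delimited by `---` lines at the very top of the file. `splitFrontmatter` and `replaceBody` are hypothetical helpers, not part of the skill.

```typescript
// Splits a Markdown document into leading YAML frontmatter (fences included) and body.
function splitFrontmatter(doc: string): { frontmatter: string; body: string } {
  const m = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/.exec(doc);
  if (m === null) return { frontmatter: "", body: doc };
  return { frontmatter: m[0], body: doc.slice(m[0].length) };
}

// Keeps the frontmatter of the raw file and swaps in the processed transcript.
// If the processed text echoes a frontmatter block of its own, that copy is dropped.
function replaceBody(rawDoc: string, processed: string): string {
  const { frontmatter } = splitFrontmatter(rawDoc);
  const body = splitFrontmatter(processed).body.replace(/^\s+/, "");
  return frontmatter === "" ? body : `${frontmatter.trimEnd()}\n\n${body}`;
}
```

Taking the frontmatter from the raw file rather than from the sub-agent's output guards against the metadata being reworded during processing.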
Error Cases
| Error | Meaning |
|---|---|
| Transcripts disabled | Video has no captions at all |
| No transcript found | Requested language not available |
| Video unavailable | Video deleted, private, or region-locked |
| IP blocked | Too many requests, try again later |
| Age restricted | Video requires login for age verification |