youtube-transcript


YouTube Transcript Skill


Production-grade YouTube transcript extraction with comprehensive format support, intelligent caching, and resilient networking.

When to Use


USE this skill when:
  • Extracting transcripts from YouTube videos
  • Converting YouTube captions to SRT/VTT subtitle files
  • Analyzing video content via transcripts
  • Creating subtitles for downloaded videos
  • Batch processing multiple video transcripts
  • Needing transcripts in specific languages
  • Processing auto-generated captions
DON'T use this skill when:
  • Transcript not available (disabled by creator)
  • Video is private or age-restricted
  • Livestream that hasn't ended
  • Need speech-to-text from audio → Use transcribe
  • Need video frames → Use video-frames

Prerequisites

bash
# Requires Node.js (already available)
node --version

# No additional dependencies required

Commands

Basic Usage

bash
# Extract transcript with video ID
{baseDir}/youtube-transcript.js VIDEO_ID

# Extract with full URL
{baseDir}/youtube-transcript.js "https://www.youtube.com/watch?v=VIDEO_ID"

# Extract with short URL
{baseDir}/youtube-transcript.js "https://youtu.be/VIDEO_ID"

Output Formats

bash
# Plain text with timestamps (default)
{baseDir}/youtube-transcript.js VIDEO_ID --format text
# [0:00:00.00] Here is the transcript text
# [0:00:05.32] More transcript content

# Plain text without timestamps
{baseDir}/youtube-transcript.js VIDEO_ID --format plain
# Here is the transcript text
# More transcript content

# JSON with metadata
{baseDir}/youtube-transcript.js VIDEO_ID --format json
# { "title": "Video Title", "author": "Channel Name", "language": "en", "isAutoGenerated": false, "transcript": [...] }

# SRT subtitle format
{baseDir}/youtube-transcript.js VIDEO_ID --format srt > video.srt
# 1
# 00:00:00,000 --> 00:00:05,320
# Here is the transcript text
#
# 2
# 00:00:05,320 --> 00:00:08,150
# More transcript content

# VTT subtitle format
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > video.vtt
# WEBVTT
#
# 1
# 00:00.000 --> 00:05.320
# Here is the transcript text

# TSV tab-separated values
{baseDir}/youtube-transcript.js VIDEO_ID --format tsv
# start\tduration\ttext
# 0.000\t5.320\tHere is the transcript text

# CSV comma-separated values
{baseDir}/youtube-transcript.js VIDEO_ID --format csv
# start,duration,text
# 0.000,5.320,"Here is the transcript text"
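The CSV and TSV rows shown in this section can be produced from a list of transcript entries with very little code. The following is a minimal sketch, not the skill's actual implementation: the function names `toCsv` and `toTsv` are hypothetical, the entry shape (`start`, `duration`, `text`) is taken from the JSON output, and the choices to always quote the text column and to flatten tabs in TSV text are assumptions.

```javascript
// Quote a CSV field, doubling any embedded double quotes (RFC 4180 style).
function quoteCsv(text) {
  return `"${String(text).replace(/"/g, '""')}"`;
}

// Hypothetical helper: entries are { start, duration, text } objects.
function toCsv(entries) {
  const rows = entries.map((e) =>
    [e.start.toFixed(3), e.duration.toFixed(3), quoteCsv(e.text)].join(",")
  );
  return ["start,duration,text", ...rows].join("\n");
}

// TSV has no quoting convention, so tabs and line breaks in the text
// are collapsed to a single space (an assumption of this sketch).
function toTsv(entries) {
  const rows = entries.map((e) =>
    [
      e.start.toFixed(3),
      e.duration.toFixed(3),
      e.text.replace(/[\t\r\n]+/g, " "),
    ].join("\t")
  );
  return ["start\tduration\ttext", ...rows].join("\n");
}
```

Quoting every text field keeps the output stable for captions that contain commas or quotation marks, which are common in spoken-language transcripts.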

Language Selection

bash
# Auto-select best available (default)
{baseDir}/youtube-transcript.js VIDEO_ID

# Specific language by code
{baseDir}/youtube-transcript.js VIDEO_ID --language en
{baseDir}/youtube-transcript.js VIDEO_ID --language es
{baseDir}/youtube-transcript.js VIDEO_ID --language fr

# Partial matches work too
{baseDir}/youtube-transcript.js VIDEO_ID --language zh  # Matches zh-CN, zh-TW, etc.

# Language with auto-generated preference
{baseDir}/youtube-transcript.js VIDEO_ID --language ja --format srt

**Common Language Codes:**
| Code | Language |
|------|----------|
| en | English |
| es | Spanish |
| fr | French |
| de | German |
| ja | Japanese |
| ko | Korean |
| zh | Chinese |
| pt | Portuguese |
| ru | Russian |
| hi | Hindi |
| ar | Arabic |
| it | Italian |
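The partial matching described in this section (`zh` matching `zh-CN`, `zh-TW`, and so on) could work roughly as sketched below. This is an illustration under assumptions, not the skill's code: `selectTrack` and the track shape `{ languageCode, isAutoGenerated }` are hypothetical, and the ranking (exact code first, then manual captions ahead of auto-generated ones) is one plausible reading of "best available".

```javascript
// Hypothetical track shape: { languageCode: "zh-CN", isAutoGenerated: false }
function selectTrack(tracks, requested) {
  const want = requested ? requested.toLowerCase() : null;

  // Keep tracks whose code equals the request or extends it with a region suffix.
  const candidates = tracks.filter((t) => {
    if (!want) return true; // no request: every track is a candidate
    const code = t.languageCode.toLowerCase();
    return code === want || code.startsWith(want + "-");
  });
  if (candidates.length === 0) return null;

  // Lower rank wins: exact match beats prefix match, manual beats auto-generated.
  const rank = (t) => {
    const exact = want && t.languageCode.toLowerCase() === want ? 0 : 1;
    const auto = t.isAutoGenerated ? 1 : 0;
    return exact * 2 + auto;
  };
  return candidates.slice().sort((a, b) => rank(a) - rank(b))[0];
}
```

Requiring a `-` after the prefix prevents a request for `en` from accidentally matching an unrelated code that merely begins with the same letters.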

Save to File

bash
# Save transcript directly to file
{baseDir}/youtube-transcript.js VIDEO_ID --output transcript.txt
{baseDir}/youtube-transcript.js VIDEO_ID --format srt --output subtitles.srt
{baseDir}/youtube-transcript.js VIDEO_ID --format json --output data.json

# Shell redirection (equivalent)
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > captions.vtt

Advanced Options

bash
# Skip cache (force fresh fetch)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache

# Verbose debugging output
DEBUG=1 {baseDir}/youtube-transcript.js VIDEO_ID

# Combine options
{baseDir}/youtube-transcript.js VIDEO_ID --language es --format srt --output spanish.srt --no-cache

Features

Format Comparison

| Format | Use Case | Human Readable | Machine Readable |
|--------|----------|----------------|------------------|
| text | Default viewing | ✅ | ⚠️ |
| plain | Content only | ✅ | ⚠️ |
| json | API integration | ⚠️ | ✅ |
| srt | Subtitle files | ✅ | ✅ |
| vtt | Web captions | ✅ | ✅ |
| tsv | Spreadsheet import | ⚠️ | ✅ |
| csv | Database import | ⚠️ | ✅ |

Supported Video URL Formats

  • Plain video ID (11 characters): EBw7gsDPAYQ
  • Standard YouTube URL
  • Short youtu.be URL
  • Embed URL
  • YouTube Live URL
  • URLs with additional parameters (automatically handled)
  • Playlist URLs (extracts first video)
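To make the URL handling concrete, here is a minimal sketch of how a video ID might be pulled out of the input shapes listed above. It is an assumption-laden illustration rather than the skill's parser: `extractVideoId` is a hypothetical name, the path patterns (`/watch?v=`, `/embed/`, `/live/`, `youtu.be/`) are the commonly seen public URL shapes, and playlist-only URLs (which would need a network lookup to find the first video) are deliberately not covered.

```javascript
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// Returns the 11-character video ID, or null if none can be found.
function extractVideoId(input) {
  const value = input.trim();
  if (VIDEO_ID_PATTERN.test(value)) return value; // plain video ID

  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // neither a bare ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate = null;
  if (host === "youtu.be") {
    candidate = url.pathname.split("/")[1]; // short URL: /VIDEO_ID
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      // Extra parameters such as t= or list= are simply ignored.
      candidate = url.searchParams.get("v");
    } else {
      const match = url.pathname.match(/^\/(?:embed|live)\/([^/]+)/);
      if (match) candidate = match[1];
    }
  }
  return candidate && VIDEO_ID_PATTERN.test(candidate) ? candidate : null;
}
```

Validating the candidate against the 11-character pattern at the end means a malformed URL fails early, which corresponds to the `INVALID_VIDEO_ID` case rather than a later network error.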

Intelligent Caching

The skill implements intelligent caching to improve performance:
  • Cache Location: /tmp/youtube-transcript-cache/
  • TTL: 24 hours per entry
  • Max Entries: 100 videos
  • Benefits:
    • Instant retrieval of previously fetched transcripts
    • Reduced load on YouTube servers
    • Better performance for repeated operations
Cache Bypass:
bash
# Force fresh fetch
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache
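The TTL and entry limit described in this section can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the real cache lives on disk under `/tmp/youtube-transcript-cache/`, the class name `TranscriptCache` is hypothetical, and evicting the oldest entry first when the limit is exceeded is an assumed policy that the documentation does not spell out.

```javascript
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours per entry
const MAX_ENTRIES = 100; // at most 100 videos

// Hypothetical in-memory stand-in for the on-disk cache directory.
class TranscriptCache {
  constructor(now = () => Date.now()) {
    this.entries = new Map(); // videoId -> { storedAt, value }
    this.now = now; // injectable clock, handy for testing expiry
  }

  get(videoId) {
    const entry = this.entries.get(videoId);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= TTL_MS) {
      this.entries.delete(videoId); // expired: treat as a cache miss
      return undefined;
    }
    return entry.value;
  }

  set(videoId, value) {
    this.entries.delete(videoId); // re-insert so the entry counts as newest
    this.entries.set(videoId, { storedAt: this.now(), value });
    // A Map iterates in insertion order, so the first key is the oldest entry.
    while (this.entries.size > MAX_ENTRIES) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}
```

In this model, `--no-cache` corresponds to skipping `get` and going straight to a fresh fetch.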

Rate Limiting

To avoid being blocked by YouTube:
  • Max 60 requests per minute
  • Minimum 1 second delay between requests
  • Exponential backoff on retries
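The two request limits above (a per-minute cap and a minimum gap) can be combined into a single "how long should I wait" calculation. The sketch below is illustrative only: `delayBeforeNextRequest` and its `history` argument are hypothetical names, and the sliding-window interpretation of "60 requests per minute" is an assumption.

```javascript
const MIN_INTERVAL_MS = 1000; // at least 1 second between requests
const MAX_PER_MINUTE = 60;
const WINDOW_MS = 60 * 1000;

// Returns how many milliseconds to wait before the next request may start.
// `history` holds the start times (ms) of earlier requests, oldest first.
function delayBeforeNextRequest(history, now) {
  const recent = history.filter((t) => now - t < WINDOW_MS);
  let wait = 0;

  if (recent.length > 0) {
    // Enforce the minimum gap since the most recent request.
    const last = recent[recent.length - 1];
    wait = Math.max(wait, last + MIN_INTERVAL_MS - now);
  }
  if (recent.length >= MAX_PER_MINUTE) {
    // The window is full: the oldest counted request must age out first.
    const oldest = recent[recent.length - MAX_PER_MINUTE];
    wait = Math.max(wait, oldest + WINDOW_MS - now);
  }
  return Math.max(0, wait);
}
```

Keeping the calculation pure (timestamps in, delay out) makes it straightforward to test without real timers.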

Retry Logic

When requests fail:
  1. First attempt
  2. Wait 2 seconds, retry
  3. Wait 4 seconds, retry
  4. Wait 6 seconds, retry
  5. Final error reported
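The retry sequence above can be sketched as a small wrapper around any asynchronous operation. This is a minimal illustration, not the skill's code: `withRetries` and its injectable `wait` parameter are hypothetical. The delays follow the schedule exactly as listed (2 s, 4 s, 6 s), which grows by a fixed step rather than doubling.

```javascript
const RETRY_DELAYS_MS = [2000, 4000, 6000]; // waits before retries 1 to 3

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `operation` is any async function. `wait` is injectable so tests can skip real delays.
async function withRetries(operation, delays = RETRY_DELAYS_MS, wait = sleep) {
  let lastError;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await operation(attempt); // success: stop retrying
    } catch (error) {
      lastError = error;
      // Wait only if another attempt is still to come.
      if (attempt < delays.length) await wait(delays[attempt]);
    }
  }
  throw lastError; // every attempt failed: report the final error
}
```

A caller would wrap the page fetch, for example `await withRetries(() => fetchPage(url))`, where `fetchPage` stands for whatever function performs the HTTP request.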

Error Handling

Error Codes

| Code | Name | Description | Resolution |
|------|------|-------------|------------|
| 0 | SUCCESS | Transcript fetched | None needed |
| 1 | INVALID_VIDEO_ID | Bad URL/ID format | Double-check the video ID |
| 2 | VIDEO_NOT_FOUND | Video doesn't exist | Verify video exists |
| 3 | TRANSCRIPT_DISABLED | Creator disabled captions | Contact creator |
| 4 | NO_TRANSCRIPT | No captions available | Wait for transcript |
| 5 | VIDEO_UNAVAILABLE | Can't access | Check restrictions |
| 6 | PRIVATE_VIDEO | Video is private | Get access/permission |
| 7 | RATE_LIMITED | Too many requests | Wait before retry |
| 8 | NETWORK_ERROR | Connection issue | Check internet |
| 9 | PARSE_ERROR | Data extraction failed | Try again |
| 99 | UNKNOWN | Unexpected error | Report issue |
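A calling script can turn these numeric codes into something easier to branch on. The sketch below assumes the codes are surfaced as the process exit status, which the table suggests but does not state outright. `describeExitCode` is a hypothetical helper, and the set of "retryable" codes is an assumption derived from the Resolution column.

```javascript
const ERROR_CODES = {
  0: "SUCCESS",
  1: "INVALID_VIDEO_ID",
  2: "VIDEO_NOT_FOUND",
  3: "TRANSCRIPT_DISABLED",
  4: "NO_TRANSCRIPT",
  5: "VIDEO_UNAVAILABLE",
  6: "PRIVATE_VIDEO",
  7: "RATE_LIMITED",
  8: "NETWORK_ERROR",
  9: "PARSE_ERROR",
  99: "UNKNOWN",
};

// Assumption: these are the cases where waiting and trying again may help.
const RETRYABLE = new Set(["RATE_LIMITED", "NETWORK_ERROR", "PARSE_ERROR"]);

function describeExitCode(code) {
  const name = ERROR_CODES[code] ?? "UNKNOWN"; // unlisted codes fall back to UNKNOWN
  return { code, name, ok: name === "SUCCESS", retryable: RETRYABLE.has(name) };
}
```

In a batch job, a `retryable` result could put the video back on the queue, while the other failures are logged and skipped.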

Common Errors and Solutions


"Could not extract player data"
  • YouTube may have changed their page structure
  • The video may be age-restricted
  • The video may require login
  • Solution: Try again later or check if video is publicly accessible
"No captions available for this video"
  • Creator hasn't added captions
  • Auto-generated captions aren't ready (may take a few hours after upload)
  • Video is too new
  • Solution: Wait for YouTube to generate captions, or check if manual captions exist
"Rate limited by YouTube"
  • Too many requests in short period
  • Solution: Wait 1-2 minutes before retrying
"Transcript too long"
  • Video exceeds 500K characters
  • Solution: Use --format json, which handles large transcripts better
"Video unavailable or not found"
  • Video removed or never existed
  • Region-restricted
  • Solution: Verify video ID/URL is correct

Technical Architecture

Data Flow

Video ID/URL
    ↓
Extract Video ID ← URL parser (7+ formats)
    ↓
Check Cache ← 24hr TTL store
    ↓[cache miss]
Fetch YouTube Page ← HTTP with retry logic
    ↓
Extract Player Data ← ytInitialPlayerResponse
    ↓
Parse Caption Tracks ← Language selection
    ↓
Fetch Transcript ← Select appropriate URL
    ↓
Parse Entries ← XML/JSON parsing
    ↓
Format Output ← 7 output formats
    ↓
Cache & Return ← Store for 24hr

Player Data Extraction

Extracts data from multiple potential sources:
  1. ytInitialPlayerResponse JavaScript variable
  2. playerResponse JSON in script tags
  3. Caption tracks from various locations
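As an illustration of the first source, the sketch below locates the `ytInitialPlayerResponse` assignment in page HTML and parses the object that follows it. It is a simplified example under assumptions, not the skill's extractor: the function names are hypothetical, and the property path used to reach the caption tracks reflects what YouTube pages have commonly exposed, which is undocumented and may change.

```javascript
// Returns the parsed object assigned to `ytInitialPlayerResponse`, or null.
function extractPlayerResponse(html) {
  const marker = html.match(/ytInitialPlayerResponse\s*=\s*\{/);
  if (!marker) return null;

  const start = marker.index + marker[0].length - 1; // index of the opening brace
  let depth = 0;
  let inString = false;
  let escaped = false;

  // Walk forward to the matching closing brace, ignoring braces inside strings.
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(html.slice(start, i + 1));
        } catch {
          return null; // not valid JSON after all
        }
      }
    }
  }
  return null; // unbalanced braces
}

// Assumed property path. Returns an empty list when captions are absent.
function listCaptionTracks(playerResponse) {
  return (
    playerResponse?.captions?.playerCaptionsTracklistRenderer?.captionTracks ?? []
  );
}
```

Brace counting is used instead of a single regular expression because the embedded JSON is deeply nested and its string values may themselves contain braces.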

Transcript Parsing

Supports multiple formats:
  1. JSON API Response: Modern format
  2. Timed Text XML: Legacy format
  3. Alternative XML: Older structure
  4. Special handling for: Auto-generated vs manual captions
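For the legacy Timed Text XML case, parsing can be as simple as scanning for `<text>` elements and reading their timing attributes. The sketch below assumes the widely seen shape `<text start="..." dur="...">caption</text>` and that every element has an explicit closing tag. `parseTimedTextXml` is a hypothetical name, and a production parser would more likely use a real XML library.

```javascript
// Converts legacy timed-text XML into { start, duration, text } entries.
function parseTimedTextXml(xml) {
  const entries = [];
  const pattern = /<text\b([^>]*)>([\s\S]*?)<\/text>/g;

  for (const match of xml.matchAll(pattern)) {
    const attributes = match[1];
    const start = /\bstart="([^"]+)"/.exec(attributes);
    const dur = /\bdur="([^"]+)"/.exec(attributes);
    if (!start) continue; // an entry without a start time cannot be placed

    entries.push({
      start: Number(start[1]),
      duration: dur ? Number(dur[1]) : 0,
      text: match[2], // still entity-escaped at this point
    });
  }
  return entries;
}
```

The resulting entries have the same `start` / `duration` / `text` shape as the JSON output, so later formatting steps do not need to know which source format was parsed.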

Data Unescaping

Properly handles:
  • &amp; → &
  • &lt; → <
  • &gt; → >
  • &quot; → "
  • &#39; / &#039; / &apos; → '
  • Whitespace normalization
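The entity handling listed above fits in a single replacement pass followed by whitespace cleanup. This is a minimal sketch with a hypothetical function name, `unescapeCaption`. It covers only the entities named in the list and treats "whitespace normalization" as collapsing runs of whitespace to one space, which is an assumption.

```javascript
const ENTITIES = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&#039;": "'",
  "&apos;": "'",
};

function unescapeCaption(text) {
  return text
    // Single pass: each entity is decoded once, so "&amp;lt;" becomes "&lt;", not "<".
    .replace(/&(?:amp|lt|gt|quot|apos|#0?39);/g, (entity) => ENTITIES[entity])
    // Collapse line breaks, tabs, and repeated spaces into one space.
    .replace(/\s+/g, " ")
    .trim();
}
```

Doing all replacements in one pass avoids the classic double-decoding bug that appears when `&amp;` is replaced first and the result is then scanned again.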

Sample Output


JSON Format (Full)


json
{
  "title": "How Artificial Intelligence Works",
  "author": "Example Channel",
  "duration": "PT10M32S",
  "language": "en",
  "isAutoGenerated": true,
  "transcript": [
    {
      "start": 0.000,
      "duration": 5.320,
      "text": "In this video, we'll explore how AI systems learn and adapt"
    },
    {
      "start": 5.320,
      "duration": 4.180,
      "text": "to perform tasks that traditionally required human intelligence"
    }
  ],
  "word_count": 2847,
  "total_entries": 156
}

SRT Format (SubRip)


srt
1
00:00:00,000 --> 00:00:05,320
In this video, we'll explore how AI systems
learn and adapt

2
00:00:05,320 --> 00:00:09,500
to perform tasks that traditionally
required human intelligence

3
00:00:09,500 --> 00:00:13,240
This process is called
machine learning

...

VTT Format (WebVTT)


vtt
WEBVTT

1
00:00.000 --> 00:05.320
In this video, we'll explore how AI systems
learn and adapt

2
00:05.320 --> 00:09.500
to perform tasks that traditionally
required human intelligence

...
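The SRT and VTT samples differ mainly in their timestamp syntax: SRT uses a comma before the milliseconds and always includes hours, while WebVTT uses a period and allows the hours to be omitted. The sketch below shows one way to produce both from the `start` and `duration` values in the JSON output. The function names are hypothetical and this is not necessarily how the skill formats its output.

```javascript
function pad(value, width) {
  return String(value).padStart(width, "0");
}

// Splits a time in seconds into clock components, rounded to the nearest millisecond.
function splitTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  return {
    h: Math.floor(totalMs / 3600000),
    m: Math.floor(totalMs / 60000) % 60,
    s: Math.floor(totalMs / 1000) % 60,
    ms: totalMs % 1000,
  };
}

function srtTimestamp(seconds) {
  const { h, m, s, ms } = splitTime(seconds);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

function vttTimestamp(seconds) {
  const { h, m, s, ms } = splitTime(seconds);
  const clock = `${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
  return h > 0 ? `${pad(h, 2)}:${clock}` : clock; // hours are optional in WebVTT
}

// Builds SRT cues: index line, timing line, text, then a blank separator line.
function toSrt(entries) {
  return entries
    .map((e, i) => {
      const end = e.start + e.duration;
      return `${i + 1}\n${srtTimestamp(e.start)} --> ${srtTimestamp(end)}\n${e.text}\n`;
    })
    .join("\n");
}
```

Converting to whole milliseconds before splitting avoids floating-point artifacts such as `5.32 + 4.18` not being exactly `9.5`.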

Examples

Download Transcripts for Playlist

bash
#!/bin/bash
# Process multiple videos from IDs file (one video ID per line)
mkdir -p transcripts

for video_id in $(cat video_ids.txt); do
  echo "Processing: $video_id"

  if {baseDir}/youtube-transcript.js "$video_id" --format srt --output "transcripts/${video_id}.srt" 2>/dev/null; then
    echo "  ✓ Success"
  else
    echo "  ✗ Failed"
  fi

  # Sleep to respect rate limits
  sleep 2
done

Convert to PDF for Reading

bash
#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get transcript
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format plain > transcript.txt

# Convert to PDF (requires pandoc)
pandoc transcript.txt -o transcript.pdf
echo "PDF created: transcript.pdf"

Analyze Word Counts

bash
#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get JSON format and print summary fields
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format json | jq -r '
  "Title: \(.title)",
  "Author: \(.author)",
  "Words: \(.word_count)",
  "Entries: \(.total_entries)",
  "Language: \(.language)\(if .isAutoGenerated then " (auto)" else "" end)"
'

Batch Download with Progress


bash
#!/bin/bash
VIDEOS=("VIDEO1" "VIDEO2" "VIDEO3")
TOTAL=${#VIDEOS[@]}

for i in "${!VIDEOS[@]}"; do
  id="${VIDEOS[$i]}"
  echo "[$((i+1))/$TOTAL] Processing $id..."
  
  {baseDir}/youtube-transcript.js "$id" --format json --output "data/${id}.json" 2>/dev/null
  
  sleep 1  # Rate limit protection
done

Create Bilingual Subtitles

bash
#!/bin/bash
VIDEO_ID="your-video-id"

# Get English and Spanish
{baseDir}/youtube-transcript.js "$VIDEO_ID" --language en --format srt > english.srt
echo "English ✓"
{baseDir}/youtube-transcript.js "$VIDEO_ID" --language es --format srt > spanish.srt
echo "Spanish ✓"

# Combine (requires ffmpeg)
ffmpeg -i video.mp4 -i english.srt -i spanish.srt \
  -map 0:v -map 0:a -map 1:s:0 -map 2:s:0 \
  -c:v copy -c:a copy -c:s mov_text \
  "${VIDEO_ID}_bilingual.mp4"
echo "Bilingual video created ✓"

Performance Tips

1. Use Caching

First fetch: ~2-5 seconds
Cached fetch: ~100ms
bash
# First time (slow)
{baseDir}/youtube-transcript.js VIDEO_ID

# Second time (fast - from cache)
{baseDir}/youtube-transcript.js VIDEO_ID

# Force refresh (slow)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache

2. Batch Processing with Delays

bash
# Bad - might hit rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
done

# Good - respects rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
  sleep 2
done

3. Parallel Processing (Limited)

bash
# Process 2-3 at a time (don't exceed rate limit)
{baseDir}/youtube-transcript.js VIDEO1 &
{baseDir}/youtube-transcript.js VIDEO2 &
{baseDir}/youtube-transcript.js VIDEO3 &
wait

4. Output Format Selection

  • Fastest: plain (smallest output, fastest write)
  • Recommended: text or json (balanced)
  • For subtitles: srt or vtt (industry standard)

Limitations


  1. No Private Videos: Requires public access
  2. No Age-Restricted: Some videos unavailable
  3. No Members-Only: Requires YouTube membership
  4. Livestream Lag: Captions may be delayed
  5. New Videos: Auto-generated captions take time
  6. Rate Limits: Max 60 requests/minute
  7. Large Transcripts: Limited to 500K characters

Notes


  • Cached transcripts expire after 24 hours
  • Auto-generated captions may have errors
  • Manual captions are preferred when available
  • Language codes follow YouTube's internal format
  • SRT format uses comma for milliseconds (WebVTT uses period)
  • TSV and CSV formats are UTF-8 encoded
  • JSON output includes metadata for programmatic use
  • Script is network-resilient with automatic retries
  • Use --output to save directly to file (handles special characters)
  • STDERR contains progress messages and metadata
  • STDOUT contains the actual transcript data