youtube-transcript
YouTube Transcript Skill
Production-grade YouTube transcript extraction with comprehensive format support, intelligent caching, and resilient networking.
When to Use
✅ USE this skill when:
- Extracting transcripts from YouTube videos
- Converting YouTube captions to SRT/VTT subtitle files
- Analyzing video content via transcripts
- Creating subtitles for downloaded videos
- Batch processing multiple video transcripts
- Needing transcripts in specific languages
- Processing auto-generated captions
❌ DON'T use this skill when:
- Transcript not available (disabled by creator)
- Video is private or age-restricted
- Livestream that hasn't ended
- Need speech-to-text from audio → Use transcribe
- Need video frames → Use video-frames
Prerequisites
```bash
# Requires Node.js (already available)
node --version

# No additional dependencies required
```
Commands
Basic Usage
```bash
# Extract transcript with video ID
{baseDir}/youtube-transcript.js VIDEO_ID

# Extract with full URL
{baseDir}/youtube-transcript.js "https://www.youtube.com/watch?v=VIDEO_ID"

# Extract with short URL
{baseDir}/youtube-transcript.js "https://youtu.be/VIDEO_ID"
```
Output Formats
```bash
# Plain text with timestamps (default)
{baseDir}/youtube-transcript.js VIDEO_ID --format text
# [0:00:00.00] Here is the transcript text
# [0:00:05.32] More transcript content

# Plain text without timestamps
{baseDir}/youtube-transcript.js VIDEO_ID --format plain
# Here is the transcript text More transcript content

# JSON with metadata
{baseDir}/youtube-transcript.js VIDEO_ID --format json
# {
#   "title": "Video Title",
#   "author": "Channel Name",
#   "language": "en",
#   "isAutoGenerated": false,
#   "transcript": [...]
# }

# SRT subtitle format
{baseDir}/youtube-transcript.js VIDEO_ID --format srt > video.srt
# 1
# 00:00:00,000 --> 00:00:05,320
# Here is the transcript text
#
# 2
# 00:00:05,320 --> 00:00:08,150
# More transcript content

# VTT subtitle format
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > video.vtt
# WEBVTT
#
# 1
# 00:00.000 --> 00:05.320
# Here is the transcript text

# TSV tab-separated values
{baseDir}/youtube-transcript.js VIDEO_ID --format tsv
# start\tduration\ttext
# 0.000\t5.320\tHere is the transcript text

# CSV comma-separated values
{baseDir}/youtube-transcript.js VIDEO_ID --format csv
# start,duration,text
# 0.000,5.320,"Here is the transcript text"
```
Language Selection
```bash
# Auto-select best available (default)
{baseDir}/youtube-transcript.js VIDEO_ID

# Specific language by code
{baseDir}/youtube-transcript.js VIDEO_ID --language en
{baseDir}/youtube-transcript.js VIDEO_ID --language es
{baseDir}/youtube-transcript.js VIDEO_ID --language fr

# Partial matches work too
{baseDir}/youtube-transcript.js VIDEO_ID --language zh  # Matches zh-CN, zh-TW, etc.

# Language with auto-generated preference
{baseDir}/youtube-transcript.js VIDEO_ID --language ja --format srt
```
**Common Language Codes:**
| Code | Language |
|------|----------|
| en | English |
| es | Spanish |
| fr | French |
| de | German |
| ja | Japanese |
| ko | Korean |
| zh | Chinese |
| pt | Portuguese |
| ru | Russian |
| hi | Hindi |
| ar | Arabic |
| it | Italian |
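
The partial-match behavior described above (`zh` matching `zh-CN`, `zh-TW`) can be pictured with a small sketch. This is not the skill's actual code: the track shape (`languageCode`, `isAutoGenerated`) and the function name `selectTrack` are assumptions made for illustration, and the tie-breaking order (exact match first, then prefix match, manual captions ahead of auto-generated ones) is one plausible reading of "best available".

```javascript
// Hypothetical track shape: { languageCode: "zh-CN", isAutoGenerated: false }
function selectTrack(tracks, requested) {
  if (tracks.length === 0) return null;

  // Manual captions sort ahead of auto-generated ones (sort is stable).
  const byPreference = [...tracks].sort(
    (a, b) => Number(a.isAutoGenerated) - Number(b.isAutoGenerated)
  );
  if (!requested) return byPreference[0];

  const want = requested.toLowerCase();
  const code = (track) => track.languageCode.toLowerCase();

  return (
    byPreference.find((t) => code(t) === want) ||                // exact: "en"
    byPreference.find((t) => code(t).startsWith(want + "-")) ||  // partial: "zh" -> "zh-CN"
    null
  );
}
```

Requiring the hyphen after the prefix keeps `zh` from matching an unrelated code that merely starts with the same letters.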
Save to File
```bash
# Save transcript directly to file
{baseDir}/youtube-transcript.js VIDEO_ID --output transcript.txt
{baseDir}/youtube-transcript.js VIDEO_ID --format srt --output subtitles.srt
{baseDir}/youtube-transcript.js VIDEO_ID --format json --output data.json

# Shell redirection (equivalent)
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > captions.vtt
```
Advanced Options
```bash
# Skip cache (force fresh fetch)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache

# Verbose debugging output
DEBUG=1 {baseDir}/youtube-transcript.js VIDEO_ID

# Combine options
{baseDir}/youtube-transcript.js VIDEO_ID --language es --format srt --output spanish.srt --no-cache
```
Features
Format Comparison
| Format | Use Case | Human Readable | Machine Readable |
|---|---|---|---|
| `text` | Default viewing | ✅ | ⚠️ |
| `plain` | Content only | ✅ | ⚠️ |
| `json` | API integration | ⚠️ | ✅ |
| `srt` | Subtitle files | ✅ | ✅ |
| `vtt` | Web captions | ✅ | ✅ |
| `tsv` | Spreadsheet import | ⚠️ | ✅ |
| `csv` | Database import | ⚠️ | ✅ |
Supported Video URL Formats
- Plain video ID (11 characters): `EBw7gsDPAYQ`
- Standard YouTube URL: `https://www.youtube.com/watch?v=VIDEO_ID`
- Short youtu.be URL: `https://youtu.be/VIDEO_ID`
- Embed URL
- YouTube Live URL
- URLs with additional parameters (automatically handled)
- Playlist URLs (extracts first video)
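
As a rough illustration of how these input shapes can be reduced to a single video ID, here is a minimal sketch. It is not the skill's parser: `extractVideoId` is a hypothetical name, and the `/embed/ID` and `/live/ID` path shapes are assumptions about the embed and live URL forms, which the list above names without spelling out.

```javascript
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/; // 11-character video ID

function extractVideoId(input) {
  const value = input.trim();
  if (ID_PATTERN.test(value)) return value; // already a bare ID

  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // neither an ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate = null;

  if (host === "youtu.be") {
    candidate = url.pathname.split("/")[1];
  } else if (host === "youtube.com") {
    const parts = url.pathname.split("/").filter(Boolean);
    if (parts[0] === "watch") {
      // Extra parameters (t=, list=, ...) are ignored; only v= matters.
      candidate = url.searchParams.get("v");
    } else if (parts[0] === "embed" || parts[0] === "live") {
      candidate = parts[1]; // assumed path shapes: /embed/ID and /live/ID
    }
  }

  return candidate && ID_PATTERN.test(candidate) ? candidate : null;
}
```

Because the ID is read from the `v` parameter, a playlist-style watch URL resolves to the video named in that URL.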
Intelligent Caching
The skill implements intelligent caching to improve performance:
- Cache Location: `/tmp/youtube-transcript-cache/`
- TTL: 24 hours per entry
- Max Entries: 100 videos
- Benefits:
- Instant retrieval of previously fetched transcripts
- Reduced load on YouTube servers
- Better performance for repeated operations
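
The two limits above (24-hour TTL, at most 100 entries) can be sketched as follows. This is an in-memory illustration only: the skill stores its cache on disk, and the class name `TranscriptCache`, the injected clock, and the choice to evict the oldest entry when full are all assumptions.

```javascript
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours per entry
const MAX_ENTRIES = 100;            // at most 100 videos

class TranscriptCache {
  // `now` is injectable so expiry can be exercised without waiting.
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map(); // videoId -> { value, storedAt }
  }

  get(videoId) {
    const entry = this.entries.get(videoId);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= TTL_MS) {
      this.entries.delete(videoId); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(videoId, value) {
    this.entries.delete(videoId); // re-insert so this entry becomes the newest
    if (this.entries.size >= MAX_ENTRIES) {
      // Map preserves insertion order, so the first key is the oldest entry.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    this.entries.set(videoId, { value, storedAt: this.now() });
  }
}
```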
Cache Bypass:
```bash
# Force fresh fetch
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache
```
Rate Limiting
To avoid being blocked by YouTube:
- Max 60 requests per minute
- Minimum 1 second delay between requests
- Exponential backoff on retries
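
One way to picture the minimum-delay rule is a small slot reservation helper. This is a sketch under assumptions, not the skill's limiter: `createThrottle` and `reserveSlot` are hypothetical names. Spacing requests at least one second apart also keeps the rate at or below 60 per minute.

```javascript
const MIN_INTERVAL_MS = 1000; // minimum 1 second between requests

function createThrottle(now = () => Date.now()) {
  let nextAllowedAt = 0;

  // Reserves the next free slot and returns how many milliseconds
  // the caller should wait before sending its request.
  return function reserveSlot() {
    const current = now();
    const startAt = Math.max(current, nextAllowedAt);
    nextAllowedAt = startAt + MIN_INTERVAL_MS;
    return startAt - current;
  };
}
```

A caller would typically do `await new Promise((resolve) => setTimeout(resolve, reserveSlot()))` before each fetch.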
Retry Logic
When requests fail:
- First attempt
- Wait 2 seconds, retry
- Wait 4 seconds, retry
- Wait 6 seconds, retry
- Final error reported
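
The sequence above can be sketched as a small wrapper that follows the listed 2, 4 and 6 second waits. The names `withRetries` and `RETRY_DELAYS_MS` and the injectable `sleep` parameter are assumptions for illustration; the skill's own retry code is not shown in this document.

```javascript
const RETRY_DELAYS_MS = [2000, 4000, 6000]; // waits before retries 1, 2 and 3

async function withRetries(
  task,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  let lastError;
  // Attempt 0 is the first try; attempts 1..3 each follow a wait.
  for (let attempt = 0; attempt <= RETRY_DELAYS_MS.length; attempt++) {
    if (attempt > 0) await sleep(RETRY_DELAYS_MS[attempt - 1]);
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error; // remember the failure and move on to the next attempt
    }
  }
  throw lastError; // final error reported after the last retry fails
}
```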
Error Handling
Error Codes
| Code | Name | Description | Resolution |
|---|---|---|---|
| 0 | SUCCESS | Transcript fetched | None needed |
| 1 | INVALID_VIDEO_ID | Bad URL/ID format | Double-check the video ID |
| 2 | VIDEO_NOT_FOUND | Video doesn't exist | Verify video exists |
| 3 | TRANSCRIPT_DISABLED | Creator disabled captions | Contact creator |
| 4 | NO_TRANSCRIPT | No captions available | Wait for transcript |
| 5 | VIDEO_UNAVAILABLE | Can't access | Check restrictions |
| 6 | PRIVATE_VIDEO | Video is private | Get access/permission |
| 7 | RATE_LIMITED | Too many requests | Wait before retry |
| 8 | NETWORK_ERROR | Connection issue | Check internet |
| 9 | PARSE_ERROR | Data extraction failed | Try again |
| 99 | UNKNOWN | Unexpected error | Report issue |
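
A wrapper script might turn these codes into a decision about whether to retry. The sketch below assumes the codes are the script's process exit codes, and the "retryable" grouping is an interpretation of the Resolution column (wait, check the connection, try again), not something the skill defines. `describeExit` is a hypothetical helper.

```javascript
const EXIT_CODES = {
  0: "SUCCESS",
  1: "INVALID_VIDEO_ID",
  2: "VIDEO_NOT_FOUND",
  3: "TRANSCRIPT_DISABLED",
  4: "NO_TRANSCRIPT",
  5: "VIDEO_UNAVAILABLE",
  6: "PRIVATE_VIDEO",
  7: "RATE_LIMITED",
  8: "NETWORK_ERROR",
  9: "PARSE_ERROR",
  99: "UNKNOWN",
};

// Assumption: only transient failures are worth retrying automatically.
const RETRYABLE = new Set(["RATE_LIMITED", "NETWORK_ERROR", "PARSE_ERROR"]);

function describeExit(code) {
  const name = EXIT_CODES[code] ?? "UNKNOWN"; // unlisted codes fall back to UNKNOWN
  return { name, ok: name === "SUCCESS", retryable: RETRYABLE.has(name) };
}
```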
Common Errors and Solutions
"Could not extract player data"
- YouTube may have changed their page structure
- The video may be age-restricted
- The video may require login
- Solution: Try again later or check if video is publicly accessible
"No captions available for this video"
- Creator hasn't added captions
- Auto-generated captions aren't ready (may take a few hours after upload)
- Video is too new
- Solution: Wait for YouTube to generate captions, or check if manual captions exist
"Rate limited by YouTube"
- Too many requests in short period
- Solution: Wait 1-2 minutes before retrying
"Transcript too long"
- Video exceeds 500K characters
- Solution: Use `--format json`, which handles large transcripts better
"Video unavailable or not found"
- Video removed or never existed
- Region-restricted
- Solution: Verify video ID/URL is correct
Technical Architecture
Data Flow
Video ID/URL
↓
Extract Video ID ← URL parser (7+ formats)
↓
Check Cache ← 24hr TTL store
↓[cache miss]
Fetch YouTube Page ← HTTP with retry logic
↓
Extract Player Data ← ytInitialPlayerResponse
↓
Parse Caption Tracks ← Language selection
↓
Fetch Transcript ← Select appropriate URL
↓
Parse Entries ← XML/JSON parsing
↓
Format Output ← 7 output formats
↓
Cache & Return ← Store for 24hr
Player Data Extraction
Checks multiple potential sources:
- JavaScript variable `ytInitialPlayerResponse`
- JSON in script tags (`playerResponse`)
- Caption tracks from various locations
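
Pulling a JSON object out of page HTML is usually done by locating the variable name and then scanning for the matching closing brace. The following is a minimal sketch of that idea, not the skill's extractor: `extractJsonAfter` is a hypothetical helper, and the sample marker string in the usage line is an assumption about how the assignment appears in the page.

```javascript
function extractJsonAfter(html, marker) {
  const markerIndex = html.indexOf(marker);
  if (markerIndex === -1) return null;
  const start = html.indexOf("{", markerIndex + marker.length);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      // Braces inside string literals must not affect the depth count.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}" && --depth === 0) {
      try {
        return JSON.parse(html.slice(start, i + 1));
      } catch {
        return null; // balanced braces, but not valid JSON
      }
    }
  }
  return null; // ran out of input before the object closed
}
```

Usage would look like `extractJsonAfter(pageHtml, "ytInitialPlayerResponse")`. Counting braces rather than using a regular expression avoids cutting the object short at a `}` that appears inside a string value.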
Transcript Parsing
Supports multiple formats:
- JSON API Response: Modern format
- Timed Text XML: Legacy format
- Alternative XML: Older structure
- Special handling for: Auto-generated vs manual captions
Data Unescaping
Properly handles:
- `&amp;` → `&`
- `&lt;` → `<`
- `&gt;` → `>`
- `&quot;` → `"`
- Apostrophe entities (e.g. `&#39;`) → `'`
- Whitespace normalization
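
A minimal sketch of this unescaping step is shown below. The function name `unescapeCaptionText` is hypothetical, and treating both `&#39;` and `&apos;` as apostrophe entities is an assumption. The one detail worth noting is the ordering: `&amp;` is decoded last so that text which was escaped twice is only unescaped one level.

```javascript
function unescapeCaptionText(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;|&apos;/g, "'")
    // Decode &amp; last: "&amp;lt;" should become "&lt;", not "<".
    .replace(/&amp;/g, "&")
    // Whitespace normalization: collapse runs (including newlines) and trim.
    .replace(/\s+/g, " ")
    .trim();
}
```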
Sample Output
JSON Format (Full)
```json
{
  "title": "How Artificial Intelligence Works",
  "author": "Example Channel",
  "duration": "PT10M32S",
  "language": "en",
  "isAutoGenerated": true,
  "transcript": [
    {
      "start": 0.000,
      "duration": 5.320,
      "text": "In this video, we'll explore how AI systems learn and adapt"
    },
    {
      "start": 5.320,
      "duration": 4.180,
      "text": "to perform tasks that traditionally required human intelligence"
    }
  ],
  "word_count": 2847,
  "total_entries": 156
}
```
SRT Format (SubRip)
```srt
1
00:00:00,000 --> 00:00:05,320
In this video, we'll explore how AI systems
learn and adapt

2
00:00:05,320 --> 00:00:09,500
to perform tasks that traditionally
required human intelligence

3
00:00:09,500 --> 00:00:13,240
This process is called
machine learning
...
```
VTT Format (WebVTT)
```vtt
WEBVTT

1
00:00.000 --> 00:05.320
In this video, we'll explore how AI systems
learn and adapt

2
00:05.320 --> 00:09.500
to perform tasks that traditionally
required human intelligence
...
```
Examples
Download Transcripts for Playlist
```bash
#!/bin/bash
# Process multiple videos from an IDs file (one video ID per line)
mkdir -p transcripts  # make sure the output directory exists

for video_id in $(cat video_ids.txt); do
  echo "Processing: $video_id"
  if {baseDir}/youtube-transcript.js "$video_id" --format srt --output "transcripts/${video_id}.srt" 2>/dev/null; then
    echo "  ✓ Success"
  else
    echo "  ✗ Failed"
  fi
  # Sleep to respect rate limits
  sleep 2
done
```
Convert to PDF for Reading
```bash
#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get transcript
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format plain > transcript.txt

# Convert to PDF (requires pandoc)
pandoc transcript.txt -o transcript.pdf
echo "PDF created: transcript.pdf"
```
Analyze Word Counts
```bash
#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get JSON format and print a summary (requires jq)
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format json | jq -r '
  "Title: \(.title)",
  "Author: \(.author)",
  "Words: \(.word_count)",
  "Entries: \(.total_entries)",
  "Language: \(.language)\(if .isAutoGenerated then " (auto)" else "" end)"
'
```
Batch Download with Progress
```bash
#!/bin/bash
VIDEOS=("VIDEO1" "VIDEO2" "VIDEO3")
TOTAL=${#VIDEOS[@]}
mkdir -p data  # make sure the output directory exists

for i in "${!VIDEOS[@]}"; do
  id="${VIDEOS[$i]}"
  echo "[$((i+1))/$TOTAL] Processing $id..."
  {baseDir}/youtube-transcript.js "$id" --format json --output "data/${id}.json" 2>/dev/null
  sleep 1  # Rate limit protection
done
```
Create Bilingual Subtitles
```bash
#!/bin/bash
VIDEO_ID="your-video-id"

# Get English and Spanish
{baseDir}/youtube-transcript.js "$VIDEO_ID" --language en --format srt > english.srt
echo "English ✓"
{baseDir}/youtube-transcript.js "$VIDEO_ID" --language es --format srt > spanish.srt
echo "Spanish ✓"

# Combine (requires ffmpeg and a previously downloaded video.mp4)
ffmpeg -i video.mp4 -i english.srt -i spanish.srt \
  -map 0:v -map 0:a -map 1:s:0 -map 2:s:0 \
  -c:v copy -c:a copy -c:s mov_text \
  "${VIDEO_ID}_bilingual.mp4"
echo "Bilingual video created ✓"
```
Performance Tips
1. Use Caching
First fetch: ~2-5 seconds
Cached fetch: ~100ms
```bash
# First time (slow)
{baseDir}/youtube-transcript.js VIDEO_ID

# Second time (fast - from cache)
{baseDir}/youtube-transcript.js VIDEO_ID

# Force refresh (slow)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache
```
2. Batch Processing with Delays
```bash
# Bad - might hit rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
done

# Good - respects rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
  sleep 2
done
```
3. Parallel Processing (Limited)
```bash
# Process 2-3 at a time (don't exceed rate limit)
{baseDir}/youtube-transcript.js VIDEO1 &
{baseDir}/youtube-transcript.js VIDEO2 &
{baseDir}/youtube-transcript.js VIDEO3 &
wait
```
4. Output Format Selection
- Fastest: `plain` (smallest output, fastest write)
- Recommended: `text` or `json` (balanced)
- For subtitles: `srt` or `vtt` (industry standard)
Limitations
- No Private Videos: Requires public access
- No Age-Restricted: Some videos unavailable
- No Members-Only: Requires YouTube membership
- Livestream Lag: Captions may be delayed
- New Videos: Auto-generated captions take time
- Rate Limits: Max 60 requests/minute
- Large Transcripts: Limited to 500K characters
Notes
- Cached transcripts expire after 24 hours
- Auto-generated captions may have errors
- Manual captions are preferred when available
- Language codes follow YouTube's internal format
- SRT format uses comma for milliseconds (WebVTT uses period)
- TSV and CSV formats are UTF-8 encoded
- JSON output includes metadata for programmatic use
- Script is network-resilient with automatic retries
- Use `--output` to save directly to file (handles special characters)
- STDERR contains progress messages and metadata
- STDOUT contains the actual transcript data
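
The note about SRT using a comma and WebVTT using a period before the milliseconds can be made concrete with a small formatter. This is a sketch, not the skill's code: `formatTimestamp`, `toSrtTime` and `toVttTime` are hypothetical names, and this version always writes the hours field, which WebVTT permits even though the VTT samples in this document omit it.

```javascript
function formatTimestamp(seconds, separator) {
  // Work in whole milliseconds to avoid floating-point drift (5.32 * 1000).
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${separator}${pad(ms, 3)}`;
}

const toSrtTime = (seconds) => formatTimestamp(seconds, ","); // 00:00:05,320
const toVttTime = (seconds) => formatTimestamp(seconds, "."); // 00:00:05.320
```

A cue's end time is its `start` plus its `duration`, so an entry with `start: 5.32` and `duration: 4.18` ends at `toSrtTime(5.32 + 4.18)`.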