youtube-transcript

YouTube Transcript Downloader

This skill helps download transcripts (subtitles/captions) from YouTube videos using yt-dlp.

When to Use This Skill

Activate this skill when the user:

- Provides a YouTube URL and wants the transcript
- Asks to "download transcript from YouTube"
- Wants to "get captions" or "get subtitles" from a video
- Asks to "transcribe a YouTube video"
- Needs text content from a YouTube video

How It Works

Priority Order:

1. Check if yt-dlp is installed - install if needed
2. List available subtitles - see what's actually available
3. Try manual subtitles first (`--write-sub`) - highest quality
4. Fall back to auto-generated (`--write-auto-sub`) - usually available
5. Last resort: Whisper transcription - if no subtitles exist (requires user confirmation)
6. Confirm the download and show the user where the file is saved
7. Optionally clean up the VTT format if the user wants plain text
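
The priority order above can be sketched as a small decision function. This is only an illustration: the name `choose_strategy` and its two boolean inputs are assumptions made for this example, not part of yt-dlp; the returned flags are the ones named in the list.

```python
def choose_strategy(has_manual: bool, has_auto: bool) -> str:
    """Pick the transcript source in the priority order described above."""
    if has_manual:
        return "--write-sub"       # manual subtitles: highest quality
    if has_auto:
        return "--write-auto-sub"  # auto-generated: usually available
    return "whisper"               # last resort, needs user confirmation


print(choose_strategy(has_manual=False, has_auto=True))  # --write-auto-sub
```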

Installation Check

IMPORTANT: Always check if yt-dlp is installed first:

```bash
which yt-dlp || command -v yt-dlp
```
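
If the check is driven from Python rather than the shell, the same lookup might be sketched with the standard library. The helper name `is_installed` is hypothetical; `shutil.which` searches `PATH` much like `command -v` does.

```python
import shutil


def is_installed(tool: str) -> bool:
    """Return True if `tool` is found on PATH (same idea as `command -v`)."""
    return shutil.which(tool) is not None


if not is_installed("yt-dlp"):
    print("yt-dlp not found; it must be installed before continuing")
```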

If Not Installed

Attempt automatic installation based on the system:

macOS (Homebrew):

```bash
brew install yt-dlp
```

Linux (apt/Debian/Ubuntu):

```bash
sudo apt update && sudo apt install -y yt-dlp
```

Alternative (pip - works on all systems):

```bash
pip3 install yt-dlp
```

or

```bash
python3 -m pip install yt-dlp
```


**If installation fails**: Inform the user they need to install yt-dlp manually and provide them with installation instructions from https://github.com/yt-dlp/yt-dlp#installation
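
A minimal sketch of how the installer choice could be made programmatically, assuming the same brew, apt, pip order as above. The function `pick_install_command` and its injectable `has_tool` parameter are hypothetical names for this example; on apt systems `sudo apt update` would still need to run first.

```python
import shutil


def pick_install_command(has_tool=shutil.which) -> list[str]:
    """Choose an install command, trying brew, then apt, then pip."""
    if has_tool("brew"):
        return ["brew", "install", "yt-dlp"]
    if has_tool("apt"):
        # Run `sudo apt update` beforehand, as in the shell command above
        return ["sudo", "apt", "install", "-y", "yt-dlp"]
    return ["python3", "-m", "pip", "install", "yt-dlp"]


print(" ".join(pick_install_command()))
```

If the chosen command fails, the manual installation link above remains the fallback.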

Check Available Subtitles

ALWAYS do this first before attempting to download:

```bash
yt-dlp --list-subs "YOUTUBE_URL"
```

This shows what subtitle types are available without downloading anything. Look for:

- Manual subtitles (better quality)
- Auto-generated subtitles (usually available)
- Available languages
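
For a machine-readable version of the same check, one option is to parse the metadata that `yt-dlp --dump-json "YOUTUBE_URL"` prints. The sketch below assumes that JSON has already been loaded into a dict and reads its `subtitles` and `automatic_captions` fields; the helper name `subtitle_summary` and the sample data are made up for illustration.

```python
import json


def subtitle_summary(info: dict) -> dict:
    """List the languages that have manual and auto-generated subtitles."""
    manual = sorted(info.get("subtitles") or {})
    auto = sorted(info.get("automatic_captions") or {})
    return {"manual": manual, "auto": auto}


# Hypothetical, heavily trimmed metadata for illustration only
sample = json.loads('{"subtitles": {"en": []}, "automatic_captions": {"en": [], "de": []}}')
print(subtitle_summary(sample))  # {'manual': ['en'], 'auto': ['de', 'en']}
```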

Download Strategy

Option 1: Manual Subtitles (Preferred)

Try this first - highest quality, human-created:

```bash
yt-dlp --write-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
```

Option 2: Auto-Generated Subtitles (Fallback)

If manual subtitles aren't available:

```bash
yt-dlp --write-auto-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
```

Both commands create a `.vtt` file (WebVTT subtitle format).
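
The two commands differ only in one flag, which a small builder makes explicit. This is a sketch under the assumption that the command is later run with something like `subprocess.run`; the function name `subtitle_command` is hypothetical.

```python
def subtitle_command(url: str, output_name: str, auto: bool = False) -> list[str]:
    """Build the yt-dlp argument list for manual or auto-generated subtitles."""
    flag = "--write-auto-sub" if auto else "--write-sub"
    return ["yt-dlp", flag, "--skip-download", "--output", output_name, url]


print(subtitle_command("YOUTUBE_URL", "OUTPUT_NAME", auto=True))
```

Passing the arguments as a list avoids shell quoting problems when the URL or output name contains special characters.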

Option 3: Whisper Transcription (Last Resort)

ONLY use this if both manual and auto-generated subtitles are unavailable.

Step 1: Show File Size and Ask for Confirmation

```bash
# Get audio file size estimate
yt-dlp --print "%(filesize,filesize_approx)s" -f "bestaudio" "YOUTUBE_URL"

# Or get duration to estimate
yt-dlp --print "%(duration)s %(title)s" "YOUTUBE_URL"
```


**IMPORTANT**: Display the file size to the user and ask: "No subtitles are available. I can download the audio (approximately X MB) and transcribe it using Whisper. Would you like to proceed?"

**Wait for user confirmation before continuing.**
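
As a rough sketch of turning the printed size into the prompt above: the value arrives as a byte count, or as a placeholder such as `NA` when yt-dlp cannot determine it. The helper name `confirmation_prompt` is an assumption for this example.

```python
def confirmation_prompt(size_field: str) -> str:
    """Format the confirmation question from yt-dlp's printed size field."""
    try:
        size = f"approximately {int(float(size_field)) / (1024 * 1024):.0f} MB"
    except ValueError:
        # Non-numeric output (for example "NA") means the size is unknown
        size = "size unknown"
    return (
        "No subtitles are available. I can download the audio "
        f"({size}) and transcribe it using Whisper. Would you like to proceed?"
    )


print(confirmation_prompt("5242880"))
```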

Step 2: Check for Whisper Installation

```bash
command -v whisper
```

If not installed, ask user: "Whisper is not installed. Install it with `pip install openai-whisper` (requires ~1-3GB for models)? This is a one-time installation."

Wait for user confirmation before installing.

Install if approved:

```bash
pip3 install openai-whisper
```

Step 3: Download Audio Only

```bash
yt-dlp -x --audio-format mp3 --output "audio_%(id)s.%(ext)s" "YOUTUBE_URL"
```

Step 4: Transcribe with Whisper

```bash
# Auto-detect language (recommended)
whisper audio_VIDEO_ID.mp3 --model base --output_format vtt

# Or specify language if known
whisper audio_VIDEO_ID.mp3 --model base --language en --output_format vtt
```


**Model Options** (stick to `base` for now; the figures are approximate memory (VRAM) requirements, not download sizes):
- `tiny` - fastest, least accurate (~1GB)
- `base` - good balance (~1GB) ← **USE THIS**
- `small` - better accuracy (~2GB)
- `medium` - very good (~5GB)
- `large` - best accuracy (~10GB)

Step 5: Cleanup

After transcription completes, ask user: "Transcription complete! Would you like me to delete the audio file to save space?"

If yes:

```bash
rm audio_VIDEO_ID.mp3
```

Getting Video Information

Extract Video Title (for filename)

```bash
yt-dlp --print "%(title)s" "YOUTUBE_URL"
```

Use this to create meaningful filenames based on the video title. Clean the title for filesystem compatibility:

- Replace `/` with `-`
- Replace special characters that might cause issues

Consider using a sanitized version: `$(yt-dlp --print "%(title)s" "URL" | tr '/' '-' | tr ':' '-')`
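
The same cleanup might look like this in Python. It is a sketch only: the function name `sanitize_title` and the exact set of stripped characters are choices made for this example, not something the tool prescribes.

```python
import re


def sanitize_title(title: str) -> str:
    """Make a video title safe to use as a filename."""
    # Path separators and colons become dashes, as in the tr pipeline above
    title = title.replace("/", "-").replace(":", "-")
    # Drop other characters that commonly cause trouble on filesystems
    title = re.sub(r'[\\?*"<>|]', "", title)
    return title.strip()


print(sanitize_title('AC/DC: "Live"?'))  # AC-DC- Live
```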

Post-Processing

Convert to Plain Text (Recommended)

YouTube's auto-generated VTT files contain duplicate lines because captions are shown progressively with overlapping timestamps. Always deduplicate when converting to plain text while preserving the original speaking order.

```bash
python3 -c "
import html, re
seen = set()
with open('transcript.en.vtt', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        # Skip blank lines, the VTT header fields, and timestamp lines
        if line and not line.startswith(('WEBVTT', 'Kind:', 'Language:')) and '-->' not in line:
            # Strip inline tags, then decode HTML entities such as &amp;
            clean = html.unescape(re.sub('<[^>]*>', '', line))
            # Print each distinct line once, in order of first appearance
            if clean and clean not in seen:
                print(clean)
                seen.add(clean)
" > transcript.txt
```
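
To see why the duplicates arise, here is the same deduplication as a standalone function run on a tiny hand-written sample. The sample is illustrative and not real YouTube output; `vtt_to_lines` is a hypothetical name.

```python
import html
import re

# Each cue repeats the previous cue's last line, as progressive captions do
SAMPLE_VTT = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000
hello and welcome

00:00:02.000 --> 00:00:04.000
hello and welcome
to the &amp; show

00:00:04.000 --> 00:00:06.000
to the &amp; show
<c>thanks</c> for watching
"""


def vtt_to_lines(vtt_text: str) -> list[str]:
    """Return caption lines once each, in order of first appearance."""
    seen, lines = set(), []
    for raw in vtt_text.splitlines():
        line = raw.strip()
        if not line or "-->" in line or line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue
        clean = html.unescape(re.sub(r"<[^>]*>", "", line))
        if clean and clean not in seen:
            seen.add(clean)
            lines.append(clean)
    return lines


print(vtt_to_lines(SAMPLE_VTT))
# ['hello and welcome', 'to the & show', 'thanks for watching']
```

One trade-off of this approach: a line that is genuinely spoken twice is also kept only once.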

Complete Post-Processing with Video Title

```bash
# Get video title and strip characters that are unsafe in filenames
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "YOUTUBE_URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')

# Find the VTT file
VTT_FILE=$(ls *.vtt | head -n 1)

# Convert with deduplication (the file name is passed as an argument
# so that quotes in the name cannot break the Python code)
python3 -c "
import html, re, sys
seen = set()
with open(sys.argv[1], 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith(('WEBVTT', 'Kind:', 'Language:')) and '-->' not in line:
            clean = html.unescape(re.sub('<[^>]*>', '', line))
            if clean and clean not in seen:
                print(clean)
                seen.add(clean)
" "$VTT_FILE" > "${VIDEO_TITLE}.txt"

echo "✓ Saved to: ${VIDEO_TITLE}.txt"

# Clean up VTT file
rm "$VTT_FILE"
echo "✓ Cleaned up temporary VTT file"
```

Output Formats

- VTT format (`.vtt`): Includes timestamps and formatting, good for video players
- Plain text (`.txt`): Just the text content, good for reading or analysis

Tips

- The filename will be `{output_name}.{language_code}.vtt` (e.g., `transcript.en.vtt`)
- Most YouTube videos have auto-generated English subtitles
- Some videos may have multiple language options
- If auto-subtitles aren't available, try `--write-sub` instead for manual subtitles
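
Given that naming pattern, the language code can be read back from a downloaded file's name. A minimal sketch, with `subtitle_language` as a hypothetical helper:

```python
from pathlib import Path


def subtitle_language(filename: str) -> str:
    """Return the language code from a name like `transcript.en.vtt`."""
    suffixes = Path(filename).suffixes  # e.g. ['.en', '.vtt']
    if len(suffixes) < 2 or suffixes[-1] != ".vtt":
        raise ValueError(f"not a subtitle file name: {filename}")
    return suffixes[-2].lstrip(".")


print(subtitle_language("transcript.en.vtt"))  # en
```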

Complete Workflow Example

```bash
VIDEO_URL="https://www.youtube.com/watch?v=dQw4w9WgXcQ"
OUTPUT_NAME="transcript_temp"

# ============================================
# STEP 1: Check if yt-dlp is installed
# ============================================
if ! command -v yt-dlp &> /dev/null; then
    echo "yt-dlp not found, attempting to install..."
    if command -v brew &> /dev/null; then
        brew install yt-dlp
    elif command -v apt &> /dev/null; then
        sudo apt update && sudo apt install -y yt-dlp
    else
        pip3 install yt-dlp
    fi
fi

# Get video title for filename (needs yt-dlp, so it runs after the install check)
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')

# ============================================
# STEP 2: List available subtitles
# ============================================
echo "Checking available subtitles..."
yt-dlp --list-subs "$VIDEO_URL"

# ============================================
# STEP 3: Try manual subtitles first
# ============================================
echo "Attempting to download manual subtitles..."
# yt-dlp may exit with status 0 even when nothing was written,
# so test for the .vtt file rather than the exit code
yt-dlp --write-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null
if ls "${OUTPUT_NAME}".*.vtt &> /dev/null; then
    echo "✓ Manual subtitles downloaded successfully!"
    ls -lh "${OUTPUT_NAME}".*
else
    # ============================================
    # STEP 4: Fallback to auto-generated
    # ============================================
    echo "Manual subtitles not available. Trying auto-generated..."
    yt-dlp --write-auto-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null
    if ls "${OUTPUT_NAME}".*.vtt &> /dev/null; then
        echo "✓ Auto-generated subtitles downloaded successfully!"
        ls -lh "${OUTPUT_NAME}".*
    else
        # ============================================
        # STEP 5: Last resort - Whisper transcription
        # ============================================
        echo "⚠ No subtitles available for this video."

        # Get file size
        FILE_SIZE=$(yt-dlp --print "%(filesize_approx)s" -f "bestaudio" "$VIDEO_URL")
        DURATION=$(yt-dlp --print "%(duration)s" "$VIDEO_URL")
        TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL")

        echo "Video: $TITLE"
        echo "Duration: $((DURATION / 60)) minutes"
        echo "Audio size: ~$((FILE_SIZE / 1024 / 1024)) MB"
        echo ""
        echo "Would you like to download and transcribe with Whisper? (y/n)"
        read -r RESPONSE

        if [[ "$RESPONSE" =~ ^[Yy]$ ]]; then
            # Check for Whisper
            if ! command -v whisper &> /dev/null; then
                echo "Whisper not installed. Install now? (requires ~1-3GB) (y/n)"
                read -r INSTALL_RESPONSE
                if [[ "$INSTALL_RESPONSE" =~ ^[Yy]$ ]]; then
                    pip3 install openai-whisper
                else
                    echo "Cannot proceed without Whisper. Exiting."
                    exit 1
                fi
            fi

            # Download audio
            echo "Downloading audio..."
            yt-dlp -x --audio-format mp3 --output "audio_%(id)s.%(ext)s" "$VIDEO_URL"

            # Get the actual audio filename
            AUDIO_FILE=$(ls audio_*.mp3 | head -n 1)

            # Transcribe
            echo "Transcribing with Whisper (this may take a few minutes)..."
            whisper "$AUDIO_FILE" --model base --output_format vtt

            # Cleanup
            echo "Transcription complete! Delete audio file? (y/n)"
            read -r CLEANUP_RESPONSE
            if [[ "$CLEANUP_RESPONSE" =~ ^[Yy]$ ]]; then
                rm "$AUDIO_FILE"
                echo "Audio file deleted."
            fi

            ls -lh *.vtt
        else
            echo "Transcription cancelled."
            exit 0
        fi
    fi
fi

# ============================================
# STEP 6: Convert to readable plain text with deduplication
# ============================================
VTT_FILE=$( (ls "${OUTPUT_NAME}"*.vtt 2>/dev/null || ls *.vtt 2>/dev/null) | head -n 1)
if [ -f "$VTT_FILE" ]; then
    echo "Converting to readable format and removing duplicates..."
    python3 -c "
import html, re, sys
seen = set()
with open(sys.argv[1], 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith(('WEBVTT', 'Kind:', 'Language:')) and '-->' not in line:
            clean = html.unescape(re.sub('<[^>]*>', '', line))
            if clean and clean not in seen:
                print(clean)
                seen.add(clean)
" "$VTT_FILE" > "${VIDEO_TITLE}.txt"
    echo "✓ Saved to: ${VIDEO_TITLE}.txt"

    # Clean up temporary VTT file
    rm "$VTT_FILE"
    echo "✓ Cleaned up temporary VTT file"
else
    echo "⚠ No VTT file found to convert"
fi

echo "✓ Complete!"
```

**Note**: This workflow covers the manual, auto-generated, and Whisper paths, and prompts the user at each decision point.

Error Handling

Common Issues and Solutions:

1. yt-dlp not installed

- Attempt automatic installation based on system (Homebrew/apt/pip)
- If installation fails, provide manual installation link
- Verify installation before proceeding

2. No subtitles available

- List available subtitles first to confirm
- Try both `--write-sub` and `--write-auto-sub`
- If both fail, offer Whisper transcription option
- Show file size and ask for user confirmation before downloading audio

3. Invalid or private video

- Check if URL is correct format: `https://www.youtube.com/watch?v=VIDEO_ID`
- Some videos may be private, age-restricted, or geo-blocked
- Inform user of the specific error from yt-dlp

4. Whisper installation fails

- May require system dependencies (ffmpeg, rust)
- Provide fallback: "Install manually with: `pip3 install openai-whisper`"
- Check available disk space (models require 1-10GB depending on size)

5. Download interrupted or failed

- Check internet connection
- Verify sufficient disk space
- Try again with `--no-check-certificate` if SSL issues occur

6. Multiple subtitle languages

- Without a language option, yt-dlp typically downloads a single language (English when available) rather than every language
- Can specify with `--sub-langs en` for English only
- List available with `--list-subs` first

Best Practices:

- ✅ Always check what's available before attempting download (`--list-subs`)
- ✅ Verify success at each step before proceeding to next
- ✅ Ask user before large downloads (audio files, Whisper models)
- ✅ Clean up temporary files after processing
- ✅ Provide clear feedback about what's happening at each stage
- ✅ Handle errors gracefully with helpful messages
