youtube-transcript

Original🇺🇸 English
Translated
2 scripts

Production-grade YouTube transcript extraction with multi-format output, language selection, auto-generated support, intelligent caching, rate limiting, and retry logic. Supports SRT, VTT, JSON, CSV, TSV, and plain text.

4installs
Added on

NPX Install

npx skill4agent add winsorllc/upgraded-carnival youtube-transcript

YouTube Transcript Skill

Production-grade YouTube transcript extraction with comprehensive format support, intelligent caching, and resilient networking.

When to Use

USE this skill when:
  • Extracting transcripts from YouTube videos
  • Converting YouTube captions to SRT/VTT subtitle files
  • Analyzing video content via transcripts
  • Creating subtitles for downloaded videos
  • Batch processing multiple video transcripts
  • Needing transcripts in specific languages
  • Processing auto-generated captions
DON'T use this skill when:
  • Transcript not available (disabled by creator)
  • Video is private or age-restricted
  • Livestream that hasn't ended
  • Need speech-to-text from audio → Use transcribe
  • Need video frames → Use video-frames

Prerequisites

bash
# Requires Node.js (already available)
node --version

# No additional dependencies required

Commands

Basic Usage

bash
# Extract transcript with video ID
{baseDir}/youtube-transcript.js VIDEO_ID

# Extract with full URL
{baseDir}/youtube-transcript.js "https://www.youtube.com/watch?v=VIDEO_ID"

# Extract with short URL
{baseDir}/youtube-transcript.js "https://youtu.be/VIDEO_ID"

Output Formats

bash
# Plain text with timestamps (default)
{baseDir}/youtube-transcript.js VIDEO_ID --format text
[0:00:00.00] Here is the transcript text
[0:00:05.32] More transcript content

# Plain text without timestamps
{baseDir}/youtube-transcript.js VIDEO_ID --format plain
Here is the transcript text More transcript content

# JSON with metadata
{baseDir}/youtube-transcript.js VIDEO_ID --format json
{
  "title": "Video Title",
  "author": "Channel Name",
  "language": "en",
  "isAutoGenerated": false,
  "transcript": [...]
}

# SRT subtitle format
{baseDir}/youtube-transcript.js VIDEO_ID --format srt > video.srt
1
00:00:00,000 --> 00:00:05,320
Here is the transcript text

2
00:00:05,320 --> 00:00:08,150
More transcript content

# VTT subtitle format
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > video.vtt
WEBVTT

1
00:00.000 --> 00:05.320
Here is the transcript text

# TSV tab-separated values
{baseDir}/youtube-transcript.js VIDEO_ID --format tsv
start\tduration\ttext
0.000\t5.320\tHere is the transcript text

# CSV comma-separated values
{baseDir}/youtube-transcript.js VIDEO_ID --format csv
start,duration,text
0.000,5.320,"Here is the transcript text"

Language Selection

bash
# Auto-select best available (default)
{baseDir}/youtube-transcript.js VIDEO_ID

# Specific language by code
{baseDir}/youtube-transcript.js VIDEO_ID --language en
{baseDir}/youtube-transcript.js VIDEO_ID --language es
{baseDir}/youtube-transcript.js VIDEO_ID --language fr

# Partial matches work too
{baseDir}/youtube-transcript.js VIDEO_ID --language zh   # Matches zh-CN, zh-TW, etc.

# Language with auto-generated preference
{baseDir}/youtube-transcript.js VIDEO_ID --language ja --format srt
Common Language Codes:
CodeLanguage
enEnglish
esSpanish
frFrench
deGerman
jaJapanese
koKorean
zhChinese
ptPortuguese
ruRussian
hiHindi
arArabic
itItalian

Save to File

bash
# Save transcript directly to file
{baseDir}/youtube-transcript.js VIDEO_ID --output transcript.txt
{baseDir}/youtube-transcript.js VIDEO_ID --format srt --output subtitles.srt
{baseDir}/youtube-transcript.js VIDEO_ID --format json --output data.json

# Shell redirection (equivalent)
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > captions.vtt

Advanced Options

bash
# Skip cache (force fresh fetch)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache

# Verbose debugging output
DEBUG=1 {baseDir}/youtube-transcript.js VIDEO_ID

# Combine options
{baseDir}/youtube-transcript.js VIDEO_ID --language es --format srt --output spanish.srt --no-cache

Features

Format Comparison

FormatUse CaseHuman ReadableMachine Readable
text
Default viewing⚠️
plain
Content only⚠️
json
API integration⚠️
srt
Subtitle files
vtt
Web captions
tsv
Spreadsheet import⚠️
csv
Database import⚠️

Supported Video URL Formats

# Plain video ID (11 characters)
EBw7gsDPAYQ

# Standard YouTube URL
https://www.youtube.com/watch?v=EBw7gsDPAYQ

# Short youtu.be URL
https://youtu.be/EBw7gsDPAYQ

# Embed URL
https://www.youtube.com/embed/EBw7gsDPAYQ

# YouTube Live URL
https://www.youtube.com/live/EBw7gsDPAYQ

# URLs with additional parameters (automatically handled)
https://www.youtube.com/watch?v=EBw7gsDPAYQ&t=120s
https://www.youtube.com/watch?v=EBw7gsDPAYQ&index=2

# Playlist URLs (extracts first video)
https://www.youtube.com/watch?v=EBw7gsDPAYQ&list=...

Intelligent Caching

The skill implements intelligent caching to improve performance:
  • Cache Location:
    /tmp/youtube-transcript-cache/
  • TTL: 24 hours per entry
  • Max Entries: 100 videos
  • Benefits:
    • Instant retrieval of previously fetched transcripts
    • Reduced load on YouTube servers
    • Better performance for repeated operations
Cache Bypass:
bash
# Force fresh fetch
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache

Rate Limiting

To avoid being blocked by YouTube:
  • Max 60 requests per minute
  • Minimum 1 second delay between requests
  • Exponential backoff on retries

Retry Logic

When requests fail:
  1. First attempt
  2. Wait 2 seconds, retry
  3. Wait 4 seconds, retry
  4. Wait 6 seconds, retry
  5. Final error reported

Error Handling

Error Codes

CodeNameDescriptionResolution
0SUCCESSTranscript fetchedNone needed
1INVALID_VIDEO_IDBad URL/ID format double-check the video ID
2VIDEO_NOT_FOUNDVideo doesn't existVerify video exists
3TRANSCRIPT_DISABLEDCreator disabled captionsContact creator
4NO_TRANSCRIPTNo captions availableWait for transcript
5VIDEO_UNAVAILABLECan't accessCheck restrictions
6PRIVATE_VIDEOVideo is privateGet access/permission
7RATE_LIMITEDToo many requestsWait before retry
8NETWORK_ERRORConnection issueCheck internet
9PARSE_ERRORData extraction failedTry again
99UNKNOWNUnexpected errorReport issue

Common Errors and Solutions

"Could not extract player data"
  • YouTube may have changed their page structure
  • The video may be age-restricted
  • The video may require login
  • Solution: Try again later or check if video is publicly accessible
"No captions available for this video"
  • Creator hasn't added captions
  • Auto-generated captions aren't ready (may take a few hours after upload)
  • Video is too new
  • Solution: Wait for YouTube to generate captions, or check if manual captions exist
"Rate limited by YouTube"
  • Too many requests in short period
  • Solution: Wait 1-2 minutes before retrying
"Transcript too long"
  • Video exceeds 500K characters
  • Solution: Use
    --format json
    which handles large transcripts better
"Video unavailable or not found"
  • Video removed or never existed
  • Region-restricted
  • Solution: Verify video ID/URL is correct

Technical Architecture

Data Flow

Video ID/URL
Extract Video ID ← URL parser (7+ formats)
Check Cache ← 24hr TTL store
    ↓[cache miss]
Fetch YouTube Page ← HTTP with retry logic
Extract Player Data ← ytInitialPlayerResponse
Parse Caption Tracks ← Language selection
Fetch Transcript ← Select appropriate URL
Parse Entries ← XML/JSON parsing
Format Output ← 7 output formats
Cache & Return ← Store for 24hr

Player Data Extraction

Extracts multiple potential sources:
  1. ytInitialPlayerResponse
    JavaScript variable
  2. playerResponse
    JSON in script tags
  3. Caption tracks from various locations

Transcript Parsing

Supports multiple formats:
  1. JSON API Response: Modern format
  2. Timed Text XML: Legacy format
  3. Alternative XML: Older structure
  4. Special handling for: Auto-generated vs manual captions

Data Unescaping

Properly handles:
  • &
    &
  • <
    <
  • &gt;
    >
  • &quot;
    "
  • &#39;
    /
    &#039;
    /
    &apos;
    '
  • Whitespace normalization

Sample Output

JSON Format (Full)

json
{
  "title": "How Artificial Intelligence Works",
  "author": "Example Channel",
  "duration": "PT10M32S",
  "language": "en",
  "isAutoGenerated": true,
  "transcript": [
    {
      "start": 0.000,
      "duration": 5.320,
      "text": "In this video, we'll explore how AI systems learn and adapt"
    },
    {
      "start": 5.320,
      "duration": 4.180,
      "text": "to perform tasks that traditionally required human intelligence"
    }
  ],
  "word_count": 2847,
  "total_entries": 156
}

SRT Format (SubRip)

srt
1
00:00:00,000 --> 00:00:05,320
In this video, we'll explore how AI systems
learn and adapt

2
00:00:05,320 --> 00:00:09,500
to perform tasks that traditionally
required human intelligence

3
00:00:09,500 --> 00:00:13,240
This process is called
machine learning

...

VTT Format (WebVTT)

vtt
WEBVTT

1
00:00.000 --> 00:05.320
In this video, we'll explore how AI systems
learn and adapt

2
00:05.320 --> 00:09.500
to perform tasks that traditionally
required human intelligence

...

Examples

Download Transcripts for Playlist

bash
#!/bin/bash
# Process multiple videos from IDs file

for video_id in $(cat video_ids.txt); do
  echo "Processing: $video_id"
  
  {baseDir}/youtube-transcript.js "$video_id" --format srt --output "transcripts/${video_id}.srt" 2>/dev/null
  
  if [ $? -eq 0 ]; then
    echo "  ✓ Success"
  else
    echo "  ✗ Failed"
  fi
  
  # Sleep to respect rate limits
  sleep 2
done

Convert to PDF for Reading

bash
#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get transcript
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format plain > transcript.txt

# Convert to PDF (requires pandoc)
pandoc transcript.txt -o transcript.pdf
echo "PDF created: transcript.pdf"

Analyze Word Counts

bash
#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get JSON format
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format json | jq -r '
  "Title: \(.title)",
  "Author: \(.author)",
  "Words: \(.word_count)",
  "Entries: \(.total_entries)",
  "Language: \(.language)\(.isAutoGenerated ? " (auto)" : "")"
'

Batch Download with Progress

bash
#!/bin/bash
VIDEOS=("VIDEO1" "VIDEO2" "VIDEO3")
TOTAL=${#VIDEOS[@]}

for i in "${!VIDEOS[@]}"; do
  id="${VIDEOS[$i]}"
  echo "[$((i+1))/$TOTAL] Processing $id..."
  
  {baseDir}/youtube-transcript.js "$id" --format json --output "data/${id}.json" 2>/dev/null
  
  sleep 1  # Rate limit protection
done

Create Bilingual Subtitles

bash
#!/bin/bash
VIDEO_ID="your-video-id"

# Get English and Spanish
{baseDir}/youtube-transcript.js "$VIDEO_ID" --language en --format srt > english.srt
echo "English ✓"

{baseDir}/youtube-transcript.js "$VIDEO_ID" --language es --format srt > spanish.srt
echo "Spanish ✓"

# Combine (requires ffmpeg)
ffmpeg -i video.mp4 -i english.srt -i spanish.srt \
  -map 0:v -map 0:a -map 1:s:0 -map 2:s:0 \
  -c:v copy -c:a copy -c:s mov_text \
  "${VIDEO_ID}_bilingual.mp4"

echo "Bilingual video created ✓"

Performance Tips

1. Use Caching

First fetch: ~2-5 seconds
Cached fetch: ~100ms
bash
# First time (slow)
{baseDir}/youtube-transcript.js VIDEO_ID

# Second time (fast - from cache)
{baseDir}/youtube-transcript.js VIDEO_ID

# Force refresh (slow)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache

2. Batch Processing with Delays

bash
# Bad - might hit rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
done

# Good - respects rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
  sleep 2
done

3. Parallel Processing (Limited)

bash
# Process 2-3 at a time (don't exceed rate limit)
{baseDir}/youtube-transcript.js VIDEO1 &
{baseDir}/youtube-transcript.js VIDEO2 &
{baseDir}/youtube-transcript.js VIDEO3 &
wait

4. Output Format Selection

  • Fastest:
    plain
    (smallest output, fastest write)
  • Recommended:
    text
    or
    json
    (balanced)
  • For subtitles:
    srt
    or
    vtt
    (industry standard)

Limitations

  1. No Private Videos: Requires public access
  2. No Age-Restricted: Some videos unavailable
  3. No Members-Only: Requires YouTube membership
  4. Livestream Lag: Captions may be delayed
  5. New Videos: Auto-generated captions take time
  6. Rate Limits: Max 60 requests/minute
  7. Large Transcripts: Limited to 500K characters

Notes

  • Cached transcripts expire after 24 hours
  • Auto-generated captions may have errors
  • Manual captions are preferred when available
  • Language codes follow YouTube's internal format
  • SRT format uses comma for milliseconds (WebVTT uses period)
  • TSV and CSV formats are UTF-8 encoded
  • JSON output includes metadata for programmatic use
  • Script is network-resilient with automatic retries
  • Use
    --output
    to save directly to file (handles special characters)
  • STDERR contains progress messages and metadata
  • STDOUT contains the actual transcript data