Transcribe audio files to text using local whisper.cpp (no cloud API required).
```
┌─────────────────────────────┐
│        transcribe.sh        │
│ audio_file, [model], [lang] │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│   ffmpeg: convert to WAV    │
│   16kHz, mono, pcm_s16le    │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│   whisper-cli: transcribe   │
│   with Metal acceleration   │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│        Save to files        │
│     .json (full) + .txt     │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│      Return file paths      │
│ {file_path, text_file_path} │
└─────────────────────────────┘
```
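The first two stages of the pipeline above can be sketched as two command lines. This is a minimal Python sketch, not the actual contents of `transcribe.sh`: the function names and the intermediate `.16k.wav` file name are hypothetical, and the `whisper-cli` flags (`-m`, `-f`, `-l`, `-oj`, `-otxt`, `-of`) are assumed from common whisper.cpp usage and should be checked against `whisper-cli --help`. The ffmpeg options mirror the 16 kHz, mono, `pcm_s16le` settings in the diagram.

```python
from pathlib import Path


def ffmpeg_command(audio_file: str, wav_file: str) -> list[str]:
    """Build the ffmpeg call that converts the input to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",
        "-i", audio_file,
        "-ar", "16000",       # sample rate: 16 kHz
        "-ac", "1",           # channels: mono
        "-c:a", "pcm_s16le",  # codec: signed 16-bit little-endian PCM
        wav_file,
    ]


def whisper_command(wav_file: str, model_path: str, lang: str, out_base: str) -> list[str]:
    """Build a whisper-cli call. The flags here are assumptions, not confirmed by this document."""
    return [
        "whisper-cli",
        "-m", model_path,
        "-f", wav_file,
        "-l", lang,
        "-oj", "-otxt",   # assumed: write JSON and plain-text outputs
        "-of", out_base,  # assumed: output path without extension
    ]


def plan(audio_file: str, model_path: str, lang: str = "en") -> list[list[str]]:
    """Return the two commands in pipeline order without running them."""
    src = Path(audio_file)
    # Hypothetical intermediate name, chosen so a .wav input is never overwritten.
    wav = str(src.with_name(src.stem + ".16k.wav"))
    out_base = str(src.with_suffix(""))
    return [ffmpeg_command(audio_file, wav), whisper_command(wav, model_path, lang, out_base)]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` so that a failing stage stops the pipeline.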
```json
{
  "status": "success",
  "file_path": "{baseDir}/data/20091025__VIDEO_ID.json",
  "text_file_path": "{baseDir}/data/20091025__VIDEO_ID.txt",
  "language": "en",
  "duration": "3:32",
  "model": "medium",
  "char_count": 12345,
  "line_count": 100,
  "text_char_count": 10000,
  "text_line_count": 50,
  "cached": false,
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "channel": "Channel Name",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
```
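A caller would typically check `status` before touching the files named in a response like the one above. The following is a small sketch of that check; `load_transcript` is a hypothetical helper, and it assumes `text_file_path` points at a UTF-8 text file.

```python
import json
from pathlib import Path


def load_transcript(response_json: str) -> str:
    """Return the transcript text referenced by a success response."""
    response = json.loads(response_json)
    if response.get("status") != "success":
        # Surface the error code instead of reading a file that may not exist.
        raise ValueError(f"transcription failed: {response.get('error_code')}")
    # Assumption: the .txt output is UTF-8 encoded.
    return Path(response["text_file_path"]).read_text(encoding="utf-8")
```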
```json
{
  "status": "error",
  "error_code": "UNKNOWN_MODEL",
  "message": "Unknown model: invalid-name",
  "available_models": ["tiny", "base", "small", "medium", "large-v3", "large-v3-turbo", "belle-zh", "kotoba-ja", "kotoba-ja-q5"]
}
```
When you receive an `UNKNOWN_MODEL` error, suggest a valid model from the `available_models` list.
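One way to pick a suggestion is to offer the closest spelling to what the user typed and fall back to the full list when nothing is close. This is only a sketch: `suggest_models` and its `limit` parameter are hypothetical, and the similarity cutoff of 0.6 is an arbitrary choice, not something the tool prescribes.

```python
import difflib


def suggest_models(requested: str, response: dict, limit: int = 3) -> list[str]:
    """Return the closest valid model names, or every available model if none is close."""
    available = response.get("available_models", [])
    close = difflib.get_close_matches(requested, available, n=limit, cutoff=0.6)
    return close or list(available)
```

For a near miss such as `medum` the closest name comes first; for an unrelated string such as `invalid-name` the caller gets the whole list to choose from.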
```json
{
  "status": "error",
  "error_code": "MODEL_NOT_FOUND",
  "message": "Model 'medium' not found. Please download it first.",
  "model": "medium",
  "model_size": "1.4GB",
  "download_url": "https://huggingface.co/...",
  "download_command": "curl -L --progress-bar -o '/path/to/models/ggml-medium.bin' 'https://...' 2>&1"
}
```
```json
{
  "status": "error",
  "error_code": "MODEL_CORRUPTED",
  "message": "Model 'medium' is corrupted or incomplete. Please re-download.",
  "model": "medium",
  "model_size": "1.4GB",
  "expected_sha256": "6c14d5adee5f86394037b4e4e8b59f1673b6cee10e3cf0b11bbdbee79c156208",
  "actual_sha256": "def456...",
  "model_path": "/path/to/models/ggml-medium.bin",
  "download_command": "rm '/path/to/models/ggml-medium.bin' && curl -L --progress-bar -o '/path/to/models/ggml-medium.bin' 'https://...' 2>&1"
}
```
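The `expected_sha256` and `actual_sha256` fields suggest the model file is validated by checksum. A check of that kind could look like the sketch below; the function names are hypothetical and this is not necessarily how `transcribe.sh` performs the comparison.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte models are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_model_intact(path: str, expected_sha256: str) -> bool:
    """Compare the file's digest with the expected hex digest, ignoring case."""
    return sha256_of(path) == expected_sha256.lower()
```

Running the same check after a re-download confirms that the new file is complete before transcription is retried.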
Output files preserve the input audio filename's unified naming format with date prefix:
{YYYYMMDD}__{video_id}.{ext}
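The naming pattern can be parsed and reused to derive sibling file names. The sketch below assumes the date is exactly eight digits and that the extension is whatever follows the last dot; `parse_output_name` and `sibling_name` are hypothetical helpers.

```python
import re
from typing import NamedTuple

# {YYYYMMDD}__{video_id}.{ext}; the video ID may itself contain "_" or "-".
_NAME = re.compile(r"^(?P<date>\d{8})__(?P<video_id>.+)\.(?P<ext>[^.]+)$")


class OutputName(NamedTuple):
    date: str
    video_id: str
    ext: str


def parse_output_name(filename: str) -> OutputName:
    """Split a file name into its date prefix, video ID, and extension."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not in YYYYMMDD__video_id.ext form: {filename!r}")
    return OutputName(**match.groupdict())


def sibling_name(filename: str, ext: str) -> str:
    """Keep the date and video ID, swap only the extension."""
    name = parse_output_name(filename)
    return f"{name.date}__{name.video_id}.{ext}"
```

For example, `sibling_name("20091025__dQw4w9WgXcQ.m4a", "txt")` yields `20091025__dQw4w9WgXcQ.txt`.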
When the model is left at its default, the system automatically selects the best model based on the language.
Models must be downloaded before transcription. When you receive a `MODEL_NOT_FOUND` error, execute the `download_command` from the response.
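The responses documented here can be routed with a small dispatcher. This is a sketch under assumptions: `next_step` and its action labels are hypothetical, treating `MODEL_CORRUPTED` the same as a missing model is inferred from its "Please re-download" message, and passing `text_file_path` to the summarizer is a guess at which path it expects. The function only returns the command string; running it is left to the caller.

```python
def next_step(response: dict) -> tuple[str, str]:
    """Map a transcribe response to an (action, detail) pair."""
    if response.get("status") == "success":
        # Assumption: the summarizer consumes the plain-text transcript.
        return ("summarize", response["text_file_path"])
    code = response.get("error_code")
    if code in ("MODEL_NOT_FOUND", "MODEL_CORRUPTED"):
        # Both error payloads carry a ready-made download_command.
        return ("run_download", response["download_command"])
    if code == "UNKNOWN_MODEL":
        return ("choose_model", ", ".join(response.get("available_models", [])))
    return ("report", response.get("message", "unknown error"))
```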
After transcription completes, invoke `/mk-youtube-transcript-summarize` with the text file path from the output to generate a structured summary.
IMPORTANT: Always use the Skill tool to invoke `/mk-youtube-transcript-summarize`. Do NOT generate summaries directly without loading the skill: it contains critical rules for compression ratio, section structure, data preservation, and language handling.