transcribe-audio

Skill: Transcribe Audio

Transcribes video audio using WhisperX and creates clean JSON transcripts with word-level timing data.

When to Use

Videos need audio transcripts before visual analysis

Critical Requirements

Use WhisperX, NOT standard Whisper. WhisperX preserves the original video timeline, including leading silence, so transcript timestamps match the actual video timestamps. Run WhisperX directly on the video file. Do not extract the audio separately: transcribing the video itself keeps the timestamps aligned.

Workflow

1. Read Language from Library File

Read the library's library.yaml to get the language code:

yaml

# Library metadata
library_name: [library-name]
language: en  # Language code stored here
...
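
Reading the language code could look like the following minimal Ruby sketch. It assumes only the `language` key shown above; the helper name `library_language` is hypothetical and is not part of the skill.

```ruby
require "yaml"

# Hypothetical helper: returns the language code stored in a library.yaml file.
# Raises if the `language` key is missing, so WhisperX is never run on a guess.
def library_language(library_yaml_path)
  metadata = YAML.safe_load(File.read(library_yaml_path)) || {}
  language = metadata["language"]
  if language.nil? || language.to_s.strip.empty?
    raise ArgumentError, "no `language` key in #{library_yaml_path}"
  end
  language.to_s
end
```

The returned string can be passed to the `--language` flag in the next step.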

2. Run WhisperX

bash

whisperx "/full/path/to/video.mov" \
  --language en \
  --model medium \
  --compute_type float32 \
  --device cpu \
  --output_format json \
  --output_dir libraries/[library-name]/transcripts
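
When the command is assembled from a script rather than typed by hand, a sketch like the one below may help. It only reuses the flags shown above; the function names `whisperx_command` and `run_whisperx` are hypothetical.

```ruby
require "shellwords"

# Hypothetical helper: builds the WhisperX argument list used in this step.
def whisperx_command(video_path, language:, output_dir:)
  [
    "whisperx", video_path,
    "--language", language,
    "--model", "medium",
    "--compute_type", "float32",
    "--device", "cpu",
    "--output_format", "json",
    "--output_dir", output_dir
  ]
end

# Runs WhisperX on the video file itself; returns true on a zero exit status.
def run_whisperx(video_path, language:, output_dir:)
  # Passing separate arguments to system bypasses the shell, so paths
  # containing spaces need no manual quoting.
  system(*whisperx_command(video_path, language: language, output_dir: output_dir)) == true
end
```

For logging, `whisperx_command(...).shelljoin` gives a copy-pasteable shell line with the path escaped.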

3. Prepare Audio Transcript

After WhisperX completes, format the JSON using our prepare_audio_script:

bash

ruby .claude/skills/transcribe-audio/prepare_audio_script.rb \
  libraries/[library-name]/transcripts/video_name.json \
  /full/path/to/original/video_name.mov

This script:

Adds video source path as metadata
Removes unnecessary fields to reduce file size
Prettifies JSON
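
The three steps above can be sketched roughly as follows. This is not the actual prepare_audio_script.rb, whose contents are not shown here. It assumes the WhisperX JSON has a top-level `segments` array whose entries carry a `words` list, and it guesses that `word_segments` and per-word `score` are the fields to drop. The `video_path` metadata key and the in-place overwrite are also assumptions.

```ruby
require "json"

# Assumed (not confirmed) lists of fields to drop from the WhisperX output.
DROPPED_TOP_LEVEL_KEYS = ["word_segments"].freeze
DROPPED_WORD_KEYS = ["score"].freeze

# Returns a cleaned copy of the parsed transcript with the video path added.
def prepare_transcript(raw, video_path)
  cleaned = raw.reject { |key, _| DROPPED_TOP_LEVEL_KEYS.include?(key) }
  cleaned["segments"] = (raw["segments"] || []).map do |segment|
    words = (segment["words"] || []).map do |word|
      word.reject { |key, _| DROPPED_WORD_KEYS.include?(key) }
    end
    segment.merge("words" => words)
  end
  # Metadata goes first so it is visible at the top of the file.
  { "video_path" => video_path }.merge(cleaned)
end

# Rewrites the transcript file in place as pretty-printed JSON.
def prepare_transcript_file(json_path, video_path)
  raw = JSON.parse(File.read(json_path))
  File.write(json_path, JSON.pretty_generate(prepare_transcript(raw, video_path)) + "\n")
end
```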

4. Return Success Response

After audio preparation completes, return this structured response to the parent agent:

✓ [video_filename.mov] transcribed successfully
  Audio transcript: libraries/[library-name]/transcripts/video_name.json
  Video path: /full/path/to/video_filename.mov

DO NOT update library.yaml - the parent agent will handle this to avoid race conditions when running multiple transcriptions in parallel.
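
If the response is generated by code, the format above could be produced by a small helper such as this sketch. The name `success_response` is hypothetical, and the square brackets in the template are treated as placeholder markers, which is an assumption.

```ruby
# Hypothetical helper: formats the structured response for the parent agent.
# Assumes the square brackets in the template mark a placeholder, so they
# are not reproduced around the filename.
def success_response(video_path, transcript_path)
  <<~TEXT
    ✓ #{File.basename(video_path)} transcribed successfully
      Audio transcript: #{transcript_path}
      Video path: #{video_path}
  TEXT
end
```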

Running in Parallel

This skill is designed to run inside a Task agent for parallel execution:

Each agent handles ONE video file
Multiple agents can run simultaneously
Parent thread updates library.yaml sequentially after each agent completes
No race conditions on shared YAML file
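
The pattern described above, with parallel workers and a single writer, can be illustrated with plain Ruby threads. This is only an analogy for how Task agents and the parent thread interact: the `transcribe` stub, the `transcribe_all` function, and the `videos` key in library.yaml are all hypothetical.

```ruby
require "yaml"

# Hypothetical worker: stands in for one agent handling ONE video file.
def transcribe(video_path)
  name = File.basename(video_path, ".*")
  { "video" => video_path, "transcript" => "transcripts/#{name}.json" }
end

def transcribe_all(video_paths, library_yaml_path)
  results = Queue.new

  # Workers run concurrently and never touch library.yaml.
  workers = video_paths.map do |path|
    Thread.new { results << transcribe(path) }
  end

  # Only the parent thread writes the shared file, one result at a time,
  # so no two writers can interleave.
  video_paths.size.times do
    result = results.pop
    library = YAML.safe_load(File.read(library_yaml_path)) || {}
    (library["videos"] ||= []) << result # `videos` is an assumed key
    File.write(library_yaml_path, library.to_yaml)
  end

  workers.each(&:join)
end
```

Funnelling results through a queue keeps every read-modify-write of the YAML file on one thread, which is why no locking is needed.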

Next Step

After audio transcription, use the analyze-video skill to add visual descriptions and create the visual transcript.

Installation

Ensure WhisperX is installed. Use the setup skill to verify dependencies.
