Audio Transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
Workflow
- Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
- Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).
- Run the bundled CLI (`transcribe_diarize.py`) with sensible defaults (fast text transcription).
- Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
- Save outputs under `output/transcribe/` when working in this repo.
Decision rules
- Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
- If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
- If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
- Prompting is not supported for `gpt-4o-transcribe-diarize`.
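These rules can be sketched as a small flag-selection helper. The `choose_flags` function and its two arguments are hypothetical illustrations, not part of the bundled CLI:

```shell
# Hypothetical helper sketching the decision rules above.
# Usage: choose_flags <diarize: yes|no> <duration-in-seconds>
choose_flags() {
  local diarize="$1" duration="$2" flags
  if [ "$diarize" = "yes" ]; then
    # Speaker labels requested: diarization model + structured output.
    flags="--model gpt-4o-transcribe-diarize --response-format diarized_json"
  else
    # Default: fast text transcription.
    flags="--model gpt-4o-mini-transcribe --response-format text"
  fi
  if [ "$duration" -gt 30 ]; then
    # Longer clips keep automatic chunking.
    flags="$flags --chunking-strategy auto"
  fi
  printf '%s\n' "$flags"
}
```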
Output conventions
- Use `output/transcribe/<job-id>/` for evaluation runs.
- Use `--out-dir` for multiple files to avoid overwriting.
Dependencies (install if missing)
Prefer `uv` for dependency management:

```bash
uv pip install openai
```

If `uv` is unavailable:

```bash
python3 -m pip install openai
```
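The two install paths can be combined into one availability check. This sketch only prints the command it would run, so you can review before executing; `installer` is an illustrative variable name, not part of the skill:

```shell
# Prefer uv when present; otherwise fall back to pip.
if command -v uv >/dev/null 2>&1; then
  installer="uv pip install openai"
else
  installer="python3 -m pip install openai"
fi
echo "$installer"
```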
Environment
- `OPENAI_API_KEY` must be set for live API calls.
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
- Never ask the user to paste the full key in chat.
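A preflight check along these lines confirms the key exists while keeping its value out of chat and logs. The `require_openai_key` helper name is hypothetical:

```shell
# Fail fast if the key is missing, without ever printing its value.
require_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it in your shell first." >&2
    return 1
  fi
}
```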
Skill path (set once)
User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).

```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
```

CLI quick start
Single file (fast text default):

```bash
python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt
```

Diarization with known speakers (up to 4):

```bash
python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
```

Plain text output (explicit):

```bash
python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
```

Reference map
- `references/api.md`: supported formats, limits, response formats, and known-speaker notes.
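Downstream tooling often wants speaker-labelled lines rather than raw JSON. A minimal post-processing sketch, assuming (this document does not specify the schema; check `references/api.md`) that `diarized_json` output carries a top-level `segments` list whose entries have `speaker` and `text` fields:

```python
import json

def format_diarized(raw: str) -> str:
    """Render diarized output as one 'Speaker: text' line per segment.

    The "segments"/"speaker"/"text" field names are assumptions, not
    confirmed by this skill's docs; adjust to the actual schema.
    """
    data = json.loads(raw)
    lines = []
    for seg in data.get("segments", []):
        speaker = seg.get("speaker", "unknown")
        text = seg.get("text", "").strip()
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)

sample = '{"segments": [{"speaker": "Alice", "text": " Hi Bob. "}, {"speaker": "Bob", "text": "Hello."}]}'
print(format_diarized(sample))
```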