transcribe


Audio Transcribe


Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

Workflow

工作流程

  1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
  2. Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).
  3. Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).
  4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
  5. Save outputs under `output/transcribe/` when working in this repo.
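
Steps 1–2 can be sketched as a small preflight check. This is a hypothetical helper, not part of the bundled CLI; it only confirms the key is exported and the audio file exists before step 3.

```bash
# Preflight sketch (hypothetical): verify the key and the input file
# before invoking the transcribe CLI. Never echoes the key itself.
check_transcribe_inputs() {
  audio="$1"
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it locally first" >&2
    return 1
  fi
  if [ ! -f "$audio" ]; then
    echo "audio file not found: $audio" >&2
    return 1
  fi
}
```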

Decision rules

决策规则

  • Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
  • If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
  • If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
  • Prompting is not supported for `gpt-4o-transcribe-diarize`.
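
The first two rules can be sketched as a flag picker (a hypothetical helper; the flag strings themselves are exactly the ones listed above):

```bash
# Sketch: choose model/format arguments based on whether the user
# asked for speaker diarization.
pick_transcribe_flags() {
  if [ "$1" = "diarize" ]; then
    echo "--model gpt-4o-transcribe-diarize --response-format diarized_json"
  else
    echo "--model gpt-4o-mini-transcribe --response-format text"
  fi
}
```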

Output conventions

输出约定

  • Use `output/transcribe/<job-id>/` for evaluation runs.
  • Use `--out-dir` for multiple files to avoid overwriting.
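
A minimal sketch of the convention, using a `job_out_dir` helper name of my own invention:

```bash
# Sketch: create and print the per-job output directory so it can be
# passed straight to --out-dir.
job_out_dir() {
  dir="output/transcribe/$1"
  mkdir -p "$dir"
  echo "$dir"
}
```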

Dependencies (install if missing)

依赖项(缺失时安装)

Prefer `uv` for dependency management:

```bash
uv pip install openai
```

If `uv` is unavailable:

```bash
python3 -m pip install openai
```
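
The fallback logic can be sketched as a helper that prints which install command applies on the current machine (hypothetical; it does not run the install itself):

```bash
# Sketch: prefer uv when present, otherwise fall back to pip.
pick_installer() {
  if command -v uv >/dev/null 2>&1; then
    echo "uv pip install openai"
  else
    echo "python3 -m pip install openai"
  fi
}
```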

Environment

环境要求

  • `OPENAI_API_KEY` must be set for live API calls.
  • If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
  • Never ask the user to paste the full key in chat.
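
A safe way to confirm the key is present without ever printing it (sketch; reports only the length):

```bash
# Sketch: check for the key and report its length, never its value.
check_openai_key() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is set (${#OPENAI_API_KEY} chars)"
  else
    echo "OPENAI_API_KEY is missing; create one in the OpenAI platform and export it" >&2
    return 1
  fi
}
```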

Skill path (set once)

技能路径(仅需设置一次)

```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
```

User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
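
An optional guard (sketch, not part of the skill) to fail fast when the script is not at the expected path:

```bash
# Sketch: verify TRANSCRIBE_CLI points at an existing file before use.
require_transcribe_cli() {
  if [ ! -f "${TRANSCRIBE_CLI:-}" ]; then
    echo "transcribe_diarize.py not found; re-run the exports above" >&2
    return 1
  fi
}
```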

CLI quick start

CLI快速开始

Single file (fast text default):

```bash
python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt
```

Diarization with known speakers (up to 4):

```bash
python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
```

Plain text output (explicit):

```bash
python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
```
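
After a diarized run, a quick inspection sketch can list speaker turns. This assumes the diarized JSON has a top-level `segments` array with `speaker` and `text` fields; confirm the exact schema in references/api.md before relying on it.

```bash
# Sketch: print "Speaker: text" for each segment of a diarized JSON file.
# Uses python3 for JSON parsing so no extra tools are needed.
speaker_turns() {
  python3 -c 'import json, sys
for seg in json.load(open(sys.argv[1]))["segments"]:
    print(seg["speaker"] + ": " + seg["text"])' "$1"
}
```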

Reference map

参考映射

  • `references/api.md`: supported formats, limits, response formats, and known-speaker notes.