transcribe


Audio Transcribe


Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

Workflow

工作流程

  1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
  2. Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).
  3. Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).
  4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
  5. Save outputs under `output/transcribe/` when working in this repo.
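
Steps 1–2 can be sketched as a small preflight check. This is a hypothetical helper, not part of the bundled CLI; it only confirms the key is exported and the audio file exists before step 3.

```bash
# Preflight sketch (hypothetical): verify the key and the input file
# before invoking the transcribe CLI. Never echoes the key itself.
check_transcribe_inputs() {
  audio="$1"
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it locally first" >&2
    return 1
  fi
  if [ ! -f "$audio" ]; then
    echo "audio file not found: $audio" >&2
    return 1
  fi
}
```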

Decision rules

决策规则

  • Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
  • If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
  • If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
  • Prompting is not supported for `gpt-4o-transcribe-diarize`.
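
The first two rules can be sketched as a flag picker (a hypothetical helper; the flag strings themselves are exactly the ones listed above):

```bash
# Sketch: choose model/format arguments based on whether the user
# asked for speaker diarization.
pick_transcribe_flags() {
  if [ "$1" = "diarize" ]; then
    echo "--model gpt-4o-transcribe-diarize --response-format diarized_json"
  else
    echo "--model gpt-4o-mini-transcribe --response-format text"
  fi
}
```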

Output conventions

输出约定

  • Use `output/transcribe/<job-id>/` for evaluation runs.
  • Use `--out-dir` for multiple files to avoid overwriting.
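
A minimal sketch of the convention, using a `job_out_dir` helper name of my own invention:

```bash
# Sketch: create and print the per-job output directory so it can be
# passed straight to --out-dir.
job_out_dir() {
  dir="output/transcribe/$1"
  mkdir -p "$dir"
  echo "$dir"
}
```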

Dependencies (install if missing)

依赖项(缺失时安装)

Prefer `uv` for dependency management:

```bash
uv pip install openai
```

If `uv` is unavailable:

```bash
python3 -m pip install openai
```
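
The fallback logic can be sketched as a helper that prints which install command applies on the current machine (hypothetical; it does not run the install itself):

```bash
# Sketch: prefer uv when present, otherwise fall back to pip.
pick_installer() {
  if command -v uv >/dev/null 2>&1; then
    echo "uv pip install openai"
  else
    echo "python3 -m pip install openai"
  fi
}
```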

Environment

环境要求

  • `OPENAI_API_KEY` must be set for live API calls.
  • If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
  • Never ask the user to paste the full key in chat.
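
A safe way to confirm the key is present without ever printing it (sketch; reports only the length):

```bash
# Sketch: check for the key and report its length, never its value.
check_openai_key() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is set (${#OPENAI_API_KEY} chars)"
  else
    echo "OPENAI_API_KEY is missing; create one in the OpenAI platform and export it" >&2
    return 1
  fi
}
```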

Skill path (set once)

技能路径(仅需设置一次)

```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
```

User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
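
An optional guard (sketch, not part of the skill) to fail fast when the script is not at the expected path:

```bash
# Sketch: verify TRANSCRIBE_CLI points at an existing file before use.
require_transcribe_cli() {
  if [ ! -f "${TRANSCRIBE_CLI:-}" ]; then
    echo "transcribe_diarize.py not found; re-run the exports above" >&2
    return 1
  fi
}
```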

CLI quick start

CLI快速开始

Single file (fast text default):

```bash
python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt
```

Diarization with known speakers (up to 4):

```bash
python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
```

Plain text output (explicit):

```bash
python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
```
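
After a diarized run, a quick inspection sketch can list speaker turns. This assumes the diarized JSON has a top-level `segments` array with `speaker` and `text` fields; confirm the exact schema in references/api.md before relying on it.

```bash
# Sketch: print "Speaker: text" for each segment of a diarized JSON file.
# Uses python3 for JSON parsing so no extra tools are needed.
speaker_turns() {
  python3 -c 'import json, sys
for seg in json.load(open(sys.argv[1]))["segments"]:
    print(seg["speaker"] + ": " + seg["text"])' "$1"
}
```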

Reference map

参考映射

  • `references/api.md`: supported formats, limits, response formats, and known-speaker notes.