openrouter-stt

OpenRouter Speech-to-Text

Transcribe audio via POST /api/v1/audio/transcriptions using curl. Requires OPENROUTER_API_KEY (get one at https://openrouter.ai/keys). If unset, stop and ask.
This endpoint is not OpenAI-compatible. The body is JSON with base64 audio under input_audio: { data, format }, not multipart/form-data with a file field the way OpenAI's /v1/audio/transcriptions works. Do not point the OpenAI SDK at this endpoint; it will send the wrong shape. Use curl, fetch, or requests directly.
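To make the request shape concrete, here is a minimal sketch of building the JSON body in Python. The helper name build_transcription_body is hypothetical, and the four-byte input stands in for real audio purely for illustration.

```python
import base64
import json


def build_transcription_body(audio: bytes, model: str, fmt: str) -> str:
    """Return the JSON request body as a string (hypothetical helper)."""
    payload = {
        "model": model,
        "input_audio": {
            # Plain base64 of the raw bytes: no multipart, no "file" field.
            "data": base64.b64encode(audio).decode("ascii"),
            "format": fmt,
        },
    }
    return json.dumps(payload)


# Placeholder bytes, not a valid audio file.
body = build_transcription_body(b"RIFF", "google/chirp-3", "wav")
print(body)
```

The resulting string is what would go on the wire with Content-Type: application/json.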

One call, JSON back

Both request and response are JSON. The response body carries:
  • text: the transcript.
  • usage: always includes cost. Providers additionally report either seconds of audio billed or a token breakdown (total_tokens, input_tokens, output_tokens), depending on how they price the request. Don't assume both are present.
Sample response (duration-priced provider, e.g. google/chirp-3):
json
{
  "text": "I used to rule the world.",
  "usage": {
    "seconds": 20,
    "cost": 0.005333
  }
}
Sample response (token-priced provider):
json
{
  "text": "Hello, this is a test of speech-to-text transcription.",
  "usage": {
    "total_tokens": 113,
    "input_tokens": 83,
    "output_tokens": 30,
    "cost": 0.000508
  }
}
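Because the usage object varies by provider, client code may want to branch on which fields are present. The following is a sketch under that assumption; describe_usage is a hypothetical helper, and the two dictionaries are the usage objects from the sample responses above.

```python
def describe_usage(usage: dict) -> str:
    """Summarise a usage object without assuming which billing fields exist."""
    cost = usage["cost"]  # described above as always present
    if "seconds" in usage:
        return f"{usage['seconds']}s of audio billed, cost {cost}"
    if "total_tokens" in usage:
        return (
            f"{usage['total_tokens']} tokens "
            f"({usage.get('input_tokens')} in / {usage.get('output_tokens')} out), "
            f"cost {cost}"
        )
    # Neither breakdown reported: fall back to cost alone.
    return f"cost {cost}"


duration_priced = {"seconds": 20, "cost": 0.005333}
token_priced = {
    "total_tokens": 113,
    "input_tokens": 83,
    "output_tokens": 30,
    "cost": 0.000508,
}
print(describe_usage(duration_priced))
print(describe_usage(token_priced))
```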

Drop-in workflow

bash
#!/usr/bin/env bash
set -euo pipefail

MODEL="google/chirp-3"
FORMAT="wav"                          # wav, mp3, flac, m4a, ogg, webm, aac
AUDIO="audio.wav"
BODY=$(mktemp)
PAYLOAD=$(mktemp)

audio_b64=$(base64 < "$AUDIO" | tr -d '\n')

jq -n --arg model "$MODEL" --arg data "$audio_b64" --arg fmt "$FORMAT" \
  '{model: $model, input_audio: {data: $data, format: $fmt}}' > "$PAYLOAD"

# --data-binary @file keeps the base64 payload off argv (avoids E2BIG / ARG_MAX).
http_code=$(curl -sS -X POST https://openrouter.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  --output "$BODY" \
  -w '%{http_code}' \
  --data-binary @"$PAYLOAD")

if [[ "$http_code" != "200" ]]; then
  echo "STT failed (HTTP $http_code):" >&2
  cat "$BODY" >&2
  rm -f "$BODY" "$PAYLOAD"
  exit 1
fi

jq -r '.text' "$BODY"
rm -f "$BODY" "$PAYLOAD"

Discovering STT models

Filter the models endpoint by output modality to list transcription models.
bash
curl -sS "https://openrouter.ai/api/v1/models?output_modalities=transcription" \
  | jq '.data[] | {id, name, pricing}'
Models are provider-namespaced: use the full slug (google/chirp-3, openai/whisper-1, openai/whisper-large-v3), not the short name.
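As a rough sketch of consuming that listing in code, the helper below collects ids that look like full slugs. It assumes the response has the shape the jq filter implies (a data array of objects with an id field). The function name and the sample entries, including their name values, are hypothetical.

```python
def full_slugs(models_payload: dict) -> list[str]:
    """Collect ids that look like provider/model slugs from a models listing."""
    entries = models_payload.get("data", [])
    return [
        entry["id"]
        for entry in entries
        if isinstance(entry, dict) and "/" in str(entry.get("id", ""))
    ]


# Hypothetical listing, shaped like the jq filter above expects.
sample = {
    "data": [
        {"id": "google/chirp-3", "name": "Chirp 3", "pricing": {}},
        {"id": "openai/whisper-1", "name": "Whisper", "pricing": {}},
    ]
}
print(full_slugs(sample))
```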

Parameters

| Field | Required | Notes |
| --- | --- | --- |
| model | yes | Full model slug from /api/v1/models?output_modalities=transcription. |
| input_audio.data | yes | Base64-encoded raw audio bytes. Not a data URI: just the base64 payload, no data:audio/...;base64, prefix. |
| input_audio.format | yes | wav, mp3, flac, m4a, ogg, webm, or aac. Must match the actual bytes. Support varies by provider. |
| language | no | ISO-639-1 code (en, ja, fr). Auto-detected if omitted. |
| temperature | no | 0–1. Lower is more deterministic. |
| provider | no | Provider passthrough; see Provider-specific options. |
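The constraints in the table can be checked locally before sending anything. This is a sketch, not the server's actual validation: validate_request is a hypothetical helper, and the two-letter check for language is a simplification of ISO-639-1.

```python
ALLOWED_FORMATS = {"wav", "mp3", "flac", "m4a", "ogg", "webm", "aac"}


def validate_request(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty means OK)."""
    problems = []
    if "/" not in str(body.get("model", "")):
        problems.append("model: expected a full provider/model slug")

    audio = body.get("input_audio")
    if not isinstance(audio, dict):
        audio = {}
    data = audio.get("data")
    if not isinstance(data, str) or not data:
        problems.append("input_audio.data: required base64 string")
    elif data.startswith("data:"):
        problems.append("input_audio.data: looks like a data URI; send bare base64")
    if audio.get("format") not in ALLOWED_FORMATS:
        problems.append("input_audio.format: not one of the listed formats")

    language = body.get("language")
    if language is not None:
        if not (isinstance(language, str) and len(language) == 2 and language.isalpha()):
            problems.append("language: expected a two-letter ISO-639-1 code")

    temperature = body.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        problems.append("temperature: expected a value between 0 and 1")
    return problems


ok = {"model": "google/chirp-3", "input_audio": {"data": "UklGRg==", "format": "wav"}}
print(validate_request(ok))
```

A check like this only catches shape errors; whether a given provider accepts a given format still has to be confirmed against the API.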

Picking an audio format

  • wav / flac: uncompressed or lossless. Highest quality; largest uploads.
  • mp3 / m4a / aac: compressed. Smaller payloads, which matters because base64 inflates bytes by ~33% on top of whatever the file already weighs.
  • webm / ogg: typical for browser recordings (MediaRecorder).
The format field must match the actual container/codec of the bytes. A file saved as .wav that is actually mp3 will be rejected or mis-decoded. When in doubt, confirm with ffprobe <file>.
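Two quick pre-upload checks follow from the points above: estimating the encoded size, and guessing the container from the file's leading bytes. The sketch below is a heuristic only. Both function names are hypothetical, the signatures cover only containers with an unambiguous magic number (bare MP3 or AAC streams without a tag return None), and ffprobe remains the more reliable check.

```python
from typing import Optional


def base64_length(raw_bytes: int) -> int:
    """Length of standard padded base64 for raw_bytes input bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def sniff_format(head: bytes) -> Optional[str]:
    """Guess a format value from the first bytes of a file, or None if unsure."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"fLaC":
        return "flac"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"\x1a\x45\xdf\xa3":  # EBML header, used by WebM (and Matroska)
        return "webm"
    if head[4:8] == b"ftyp":  # ISO base media file, which covers m4a
        return "m4a"
    if head[:3] == b"ID3":  # MP3 with an ID3v2 tag
        return "mp3"
    return None


# A 3,000,000-byte file becomes 4,000,000 base64 characters (about 33% more).
print(base64_length(3_000_000))
print(sniff_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
```

Reading the first 12 bytes of the file is enough for every signature checked here.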

Provider-specific options

Provider passthrough goes under provider.options.<slug> and is only forwarded when that provider handles the request. Example: Groq's prompt for vocabulary hinting:
json
{
  "model": "openai/whisper-large-v3",
  "input_audio": { "data": "UklGRiQA...", "format": "wav" },
  "provider": {
    "options": {
      "groq": {
        "prompt": "Expected vocabulary: OpenRouter, API, transcription"
      }
    }
  }
}
Options keyed by provider slug are forwarded only when that provider matches; other keys are ignored. Check each provider's upstream docs for available passthrough keys.
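When request bodies are assembled programmatically, the nesting can be handled by a small helper. This is a sketch; with_provider_options is a hypothetical name, and the only option shown is the Groq prompt from the example above.

```python
import copy


def with_provider_options(body: dict, slug: str, options: dict) -> dict:
    """Return a copy of body with options placed under provider.options.<slug>."""
    updated = copy.deepcopy(body)  # leave the caller's dict untouched
    provider = updated.setdefault("provider", {})
    provider.setdefault("options", {})[slug] = dict(options)
    return updated


base = {
    "model": "openai/whisper-large-v3",
    "input_audio": {"data": "UklGRiQA...", "format": "wav"},
}
request_body = with_provider_options(
    base,
    "groq",
    {"prompt": "Expected vocabulary: OpenRouter, API, transcription"},
)
print(request_body["provider"])
```

Calling the helper again with a different slug adds a second key alongside the first, which matches the rule that each key is forwarded only to its own provider.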

TypeScript (fetch)

typescript
import fs from "fs";

const audio = await fs.promises.readFile("audio.wav");
const data = audio.toString("base64");

const res = await fetch("https://openrouter.ai/api/v1/audio/transcriptions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/chirp-3",
    input_audio: { data, format: "wav" },
  }),
});

if (!res.ok) {
  throw new Error(`STT failed (HTTP ${res.status}): ${await res.text()}`);
}

const result = await res.json();
console.log(result.text);

Python (requests)

python
import base64
import os
import requests

with open("audio.wav", "rb") as f:
    data = base64.b64encode(f.read()).decode("utf-8")

res = requests.post(
    "https://openrouter.ai/api/v1/audio/transcriptions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/chirp-3",
        "input_audio": {"data": data, "format": "wav"},
    },
)

if not res.ok:
    raise RuntimeError(f"STT failed (HTTP {res.status_code}): {res.text}")

print(res.json()["text"])

Troubleshooting

  • Garbled or empty text: the format field probably doesn't match the actual bytes, or the audio is silent/corrupted. Confirm with ffprobe audio.wav.
  • 400 with "Invalid base64" or silent failure: data must be just base64, not a data URI (data:audio/wav;base64,...). Strip the prefix if you copied it from a browser FileReader.
  • 400 with a ZodError: a required field is missing or the wrong type. The body looks like {"success":false,"error":{"name":"ZodError","message":"[...]"}}. The nested message JSON string names the bad path (commonly input_audio.data or input_audio.format).
  • 413 / request too large: base64 inflates bytes by ~33%, so a large raw file becomes an even larger JSON payload. Use a smaller source file (compressed format, lower sample rate, or trimmed clip).
  • Model not found: use the full slug from /api/v1/models?output_modalities=transcription (google/chirp-3, not chirp-3).
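Two of the fixes above lend themselves to small helpers: stripping a data URI prefix, and pulling the failing paths out of a ZodError body. The sketch below assumes the nested message decodes to a list of issue objects that each carry a path array, as Zod usually reports them. That shape is not confirmed by the text above, and both function names are hypothetical.

```python
import json


def strip_data_uri(data: str) -> str:
    """Drop a leading data:...;base64, prefix if present."""
    if data.startswith("data:") and "," in data:
        return data.split(",", 1)[1]
    return data


def zod_issue_paths(error_body: str) -> list[str]:
    """Return dotted paths named in a ZodError body, or [] if it cannot be parsed."""
    try:
        outer = json.loads(error_body)
        issues = json.loads(outer["error"]["message"])  # message is itself JSON
    except (ValueError, KeyError, TypeError):
        return []
    if not isinstance(issues, list):
        return []
    paths = []
    for issue in issues:
        if isinstance(issue, dict):
            path = issue.get("path") or []
            paths.append(".".join(str(part) for part in path))
    return paths


# Hypothetical error body built to match the shape shown above.
sample_error = json.dumps({
    "success": False,
    "error": {
        "name": "ZodError",
        "message": json.dumps([{"path": ["input_audio", "data"], "message": "Required"}]),
    },
})
print(zod_issue_paths(sample_error))
print(strip_data_uri("data:audio/wav;base64,UklGRg=="))
```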
