openrouter-stt
OpenRouter Speech-to-Text
Transcribe audio via `POST /api/v1/audio/transcriptions` using `curl`. Requires `OPENROUTER_API_KEY` (get one at https://openrouter.ai/keys). If unset, stop and ask.

This endpoint is not OpenAI-compatible. The body is JSON with base64 audio under `input_audio: { data, format }`, not `multipart/form-data` with a `file` field the way OpenAI's `/v1/audio/transcriptions` works. Do not point the OpenAI SDK at this endpoint; it will send the wrong shape. Use `curl`, `fetch`, or `requests` directly.

One call, JSON back
Both request and response are JSON. The response body carries:
- `text`: the transcript.
- `usage`: always includes `cost`. Providers additionally report either `seconds` of audio billed or a token breakdown (`total_tokens`, `input_tokens`, `output_tokens`), depending on how they price the request. Don't assume both are present.

Sample response (duration-priced provider, e.g. `google/chirp-3`):
```json
{
  "text": "I used to rule the world.",
  "usage": {
    "seconds": 20,
    "cost": 0.005333
  }
}
```
Sample response (token-priced provider):
```json
{
  "text": "Hello, this is a test of speech-to-text transcription.",
  "usage": {
    "total_tokens": 113,
    "input_tokens": 83,
    "output_tokens": 30,
    "cost": 0.000508
  }
}
```
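Because `usage` may carry either `seconds` or a token breakdown, client code should probe for each field rather than index it directly. The following is a minimal Python sketch of that idea; the helper name `summarize_usage` and the output wording are illustrative assumptions, not part of the API.

```python
from typing import Any, Dict


def summarize_usage(usage: Dict[str, Any]) -> str:
    """Describe a `usage` object without assuming which billing fields exist."""
    parts = []
    # Duration-priced providers report billed audio seconds.
    if "seconds" in usage:
        parts.append(f"{usage['seconds']} s of audio")
    # Token-priced providers report a token breakdown instead.
    if "total_tokens" in usage:
        parts.append(
            f"{usage['total_tokens']} tokens "
            f"({usage.get('input_tokens', '?')} in, "
            f"{usage.get('output_tokens', '?')} out)"
        )
    cost = usage.get("cost")
    if cost is not None:
        parts.append(f"cost {cost}")
    return ", ".join(parts) if parts else "no usage reported"


# The two sample responses shown in this section:
print(summarize_usage({"seconds": 20, "cost": 0.005333}))
print(summarize_usage({"total_tokens": 113, "input_tokens": 83,
                       "output_tokens": 30, "cost": 0.000508}))
```

Using `in` and `.get()` keeps the helper from raising `KeyError` when a provider omits one of the two breakdowns.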

Drop-in workflow
```bash
#!/usr/bin/env bash
set -euo pipefail

MODEL="google/chirp-3"
FORMAT="wav"   # wav, mp3, flac, m4a, ogg, webm, aac
AUDIO="audio.wav"
BODY=$(mktemp)      # response body
PAYLOAD=$(mktemp)   # JSON request body

# Build the JSON body. The base64 text is piped to jq on stdin (-Rs reads all of
# stdin as one raw string), so it never appears on a command line. Passing it
# via --arg would put it on jq's argv and hit the same E2BIG limit for large files.
base64 < "$AUDIO" | tr -d '\n' \
  | jq -Rs --arg model "$MODEL" --arg fmt "$FORMAT" \
      '{model: $model, input_audio: {data: ., format: $fmt}}' > "$PAYLOAD"

# --data-binary @file keeps the base64 payload off argv (avoids E2BIG / ARG_MAX).
http_code=$(curl -sS -X POST https://openrouter.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  --output "$BODY" \
  -w '%{http_code}' \
  --data-binary @"$PAYLOAD")

if [[ "$http_code" != "200" ]]; then
  echo "STT failed (HTTP $http_code):" >&2
  cat "$BODY" >&2
  rm -f "$BODY" "$PAYLOAD"
  exit 1
fi

jq -r '.text' "$BODY"
rm -f "$BODY" "$PAYLOAD"
```

Discovering STT models
Filter the models endpoint by output modality to list transcription models.
```bash
curl -sS "https://openrouter.ai/api/v1/models?output_modalities=transcription" \
  | jq '.data[] | {id, name, pricing}'
```
Models are provider-namespaced: use the full slug (`google/chirp-3`, `openai/whisper-1`, `openai/whisper-large-v3`), not the short name.
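The same lookup can be sketched in Python with only the standard library. The helper names below (`extract_slugs`, `resolve_slug`, `fetch_transcription_slugs`) are hypothetical, and the sketch assumes the response has the `data[].id` shape that the `jq` filter above relies on.

```python
import json
import urllib.request
from typing import Any, Dict, List

MODELS_URL = "https://openrouter.ai/api/v1/models?output_modalities=transcription"


def extract_slugs(models_response: Dict[str, Any]) -> List[str]:
    """Return the `id` (full slug) of every model in a models-endpoint response."""
    return [m["id"] for m in models_response.get("data", []) if "id" in m]


def resolve_slug(name: str, slugs: List[str]) -> List[str]:
    """Map a full slug or a short name to the matching full slugs."""
    if name in slugs:
        return [name]
    # Compare against the part after the provider namespace, e.g. "chirp-3".
    return [s for s in slugs if s.split("/", 1)[-1] == name]


def fetch_transcription_slugs(timeout: float = 30.0) -> List[str]:
    """Fetch the filtered models list (performs a network call)."""
    with urllib.request.urlopen(MODELS_URL, timeout=timeout) as resp:
        return extract_slugs(json.load(resp))


# Offline demonstration with a hand-written response fragment:
sample = {"data": [{"id": "google/chirp-3"}, {"id": "openai/whisper-1"}]}
print(resolve_slug("chirp-3", extract_slugs(sample)))
```

Calling `fetch_transcription_slugs()` and passing the result to `resolve_slug` turns a short name into a full slug before the transcription request is sent.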

Parameters
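| Field | Required | Notes |
|---|---|---|
| `model` | yes | Full model slug from `/api/v1/models?output_modalities=transcription`. |
| `input_audio.data` | yes | Base64-encoded raw audio bytes. Not a data URI: just the base64 payload, no `data:audio/wav;base64,` prefix. |
| `input_audio.format` | yes | `wav`, `mp3`, `flac`, `m4a`, `ogg`, `webm`, or `aac`. |
| `language` | no | ISO-639-1 code (e.g. `en`). |
| `temperature` | no | 0–1. Lower is more deterministic. |
| `provider.options.<slug>` | no | Provider passthrough, see below. |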
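A small builder makes the required/optional split explicit. This is a sketch only: the function name `build_payload` is hypothetical, and the optional field names `language` and `temperature` are assumptions inferred from the notes column rather than confirmed by this page.

```python
import base64
from typing import Any, Dict, Optional


def build_payload(
    model: str,
    audio: bytes,
    fmt: str,
    language: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the JSON request body; optional fields are omitted when unset."""
    payload: Dict[str, Any] = {
        "model": model,
        "input_audio": {
            # Plain base64 of the raw bytes, with no data-URI prefix.
            "data": base64.b64encode(audio).decode("ascii"),
            "format": fmt,
        },
    }
    if language is not None:
        payload["language"] = language  # assumed field name
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        payload["temperature"] = temperature  # assumed field name
    return payload


print(build_payload("google/chirp-3", b"RIFF", "wav", language="en"))
```

Leaving unset options out of the body, rather than sending `null`, avoids tripping type validation on the server side.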
Picking an audio format
- `wav` / `flac`: uncompressed or lossless. Highest quality; largest uploads.
- `mp3` / `m4a` / `aac`: compressed. Smaller payloads, which matters because base64 inflates bytes by ~33% on top of whatever the file already weighs.
- `webm` / `ogg`: typical for browser recordings (`MediaRecorder`).

The `format` field must match the actual container/codec of the bytes. A file saved as `.wav` that is actually mp3 will be rejected or mis-decoded. When in doubt, confirm with `ffprobe <file>`.
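When `ffprobe` is not available, a rough check of the file's leading magic bytes can catch the most common mismatch between extension and content. The sketch below is a heuristic only, not a substitute for `ffprobe`: the function name `sniff_audio_format` is hypothetical, and the signatures cover only the usual header layouts of each container.

```python
from typing import Optional


def sniff_audio_format(head: bytes) -> Optional[str]:
    """Guess the `format` value from the first bytes of a file, or None."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"fLaC":
        return "flac"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "webm"  # EBML header; also matches Matroska (.mkv)
    if head[4:8] == b"ftyp":
        return "m4a"   # ISO base media file; also matches .mp4 / .mov
    if head[:3] == b"ID3":
        return "mp3"   # MP3 with an ID3v2 tag
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        # Bare frame sync: ADTS AAC sets 12 sync bits and layer bits 00,
        # while MPEG audio (MP3) uses a non-zero layer field.
        layer = (head[1] >> 1) & 0x03
        if layer == 0 and (head[1] & 0xF0) == 0xF0:
            return "aac"
        return "mp3"
    return None


print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
print(sniff_audio_format(b"ID3\x03\x00\x00"))
```

Reading the first 12 bytes of the file (`open(path, "rb").read(12)`) is enough input for every check above; a `None` result means the bytes should be inspected with `ffprobe` before upload.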

Provider-specific options
Provider passthrough goes under `provider.options.<slug>` and is only forwarded when that provider handles the request. Example: Groq's `prompt` for vocabulary hinting.
```json
{
  "model": "openai/whisper-large-v3",
  "input_audio": { "data": "UklGRiQA...", "format": "wav" },
  "provider": {
    "options": {
      "groq": {
        "prompt": "Expected vocabulary: OpenRouter, API, transcription"
      }
    }
  }
}
```
Options under any other provider slug are ignored. Check each provider's upstream docs for available passthrough keys.
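In client code, the passthrough block can be attached to an existing request body without mutating it. A minimal sketch, with `with_provider_options` as a hypothetical helper name:

```python
import copy
from typing import Any, Dict


def with_provider_options(
    payload: Dict[str, Any], slug: str, options: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of `payload` with `options` stored under provider.options.<slug>."""
    out = copy.deepcopy(payload)
    # Create the nested "provider" -> "options" dicts only if they are missing.
    out.setdefault("provider", {}).setdefault("options", {})[slug] = dict(options)
    return out


base = {
    "model": "openai/whisper-large-v3",
    "input_audio": {"data": "UklGRiQA...", "format": "wav"},
}
body = with_provider_options(
    base, "groq", {"prompt": "Expected vocabulary: OpenRouter, API, transcription"}
)
print(body["provider"])
```

Calling the helper again with a different slug adds a second entry under `options`, so one body can carry hints for several providers.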

TypeScript (fetch)
```typescript
import fs from "fs";

// Read the audio file and base64-encode its raw bytes (no data-URI prefix).
const audio = await fs.promises.readFile("audio.wav");
const data = audio.toString("base64");

const res = await fetch("https://openrouter.ai/api/v1/audio/transcriptions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/chirp-3",
    input_audio: { data, format: "wav" },
  }),
});

// Surface the status code and error body on failure.
if (!res.ok) {
  throw new Error(`STT failed (HTTP ${res.status}): ${await res.text()}`);
}

const result = await res.json();
console.log(result.text);
```

Python (requests)
```python
import base64
import os

import requests

# Read the audio file and base64-encode its raw bytes (no data-URI prefix).
with open("audio.wav", "rb") as f:
    data = base64.b64encode(f.read()).decode("utf-8")

res = requests.post(
    "https://openrouter.ai/api/v1/audio/transcriptions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/chirp-3",
        "input_audio": {"data": data, "format": "wav"},
    },
)

# Surface the status code and error body on failure.
if not res.ok:
    raise RuntimeError(f"STT failed (HTTP {res.status_code}): {res.text}")

print(res.json()["text"])
```

Troubleshooting
Garbled or empty `text`: the `format` field probably doesn't match the actual bytes, or the audio is silent/corrupted. Confirm with `ffprobe audio.wav`.

400 with `"Invalid base64"` or silent failure: `data` must be just base64, not a data URI (`data:audio/wav;base64,...`). Strip the prefix if you copied it from a browser `FileReader`.

400 with a `ZodError`: a required field is missing or the wrong type. The body looks like `{"success":false,"error":{"name":"ZodError","message":"[...]"}}`; the nested `message` JSON string names the bad path (commonly `input_audio.data` or `input_audio.format`).

413 / request too large: base64 inflates bytes by ~33%, so a large raw file becomes an even larger JSON payload. Use a smaller source file (compressed format, lower sample rate, or trimmed clip).

Model not found: use the full slug from `/api/v1/models?output_modalities=transcription` (`google/chirp-3`, not `chirp-3`).
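Two of these failures lend themselves to small helpers: stripping a data-URI prefix before sending, and pulling the offending field path out of a `ZodError` body. The Python sketch below uses hypothetical helper names and assumes each entry in the nested `message` array carries a `path` list, which is the usual shape of a Zod issue but is not spelled out on this page.

```python
import json
from typing import List


def strip_data_uri(data: str) -> str:
    """Return only the base64 payload, dropping a leading data-URI prefix if present."""
    # Base64 text never contains ":" or ",", so the first comma ends the prefix.
    if data.startswith("data:") and "," in data:
        return data.split(",", 1)[1]
    return data


def zod_issue_paths(error_body: str) -> List[str]:
    """Extract dotted field paths from a ZodError response body."""
    err = json.loads(error_body).get("error", {})
    if err.get("name") != "ZodError":
        return []
    # `message` is itself a JSON string holding an array of issues (assumed shape).
    issues = json.loads(err.get("message", "[]"))
    return [".".join(str(p) for p in issue.get("path", [])) for issue in issues]


print(strip_data_uri("data:audio/wav;base64,UklGRiQA"))
```

Running `zod_issue_paths(res.text)` on a 400 response gives a quick list of the fields to re-check, such as `input_audio.format`.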