openrouter-stt

OpenRouter Speech-to-Text

Transcribe audio via `POST /api/v1/audio/transcriptions` using `curl`. Requires `OPENROUTER_API_KEY` (get one at https://openrouter.ai/keys). If unset, stop and ask.

This endpoint is not OpenAI-compatible. The body is JSON with base64 audio under `input_audio: { data, format }`, not `multipart/form-data` with a `file` field the way OpenAI's `/v1/audio/transcriptions` works. Do not point the OpenAI SDK at this endpoint; it will send the wrong shape. Use `curl`, `fetch`, or `requests` directly.

One call, JSON back

Both request and response are JSON. The response body carries:

- `text`: the transcript.
- `usage`: always includes `cost`. Providers additionally report either `seconds` of audio billed or a token breakdown (`total_tokens`, `input_tokens`, `output_tokens`), depending on how they price the request. Don't assume both are present.

Sample response (duration-priced provider, e.g. `google/chirp-3`):

```json
{
  "text": "I used to rule the world.",
  "usage": {
    "seconds": 20,
    "cost": 0.005333
  }
}
```

Sample response (token-priced provider):

```json
{
  "text": "Hello, this is a test of speech-to-text transcription.",
  "usage": {
    "total_tokens": 113,
    "input_tokens": 83,
    "output_tokens": 30,
    "cost": 0.000508
  }
}
```
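Because `usage` takes one of two shapes, client code is safer branching on which keys are present than indexing both. A minimal Python sketch, assuming the response has already been parsed into a dict; the helper name `describe_usage` is hypothetical and not part of any OpenRouter SDK:

```python
def describe_usage(usage: dict) -> str:
    """Summarize a transcription `usage` object without assuming its shape."""
    cost = usage["cost"]  # per the text above, `cost` is always present
    if "seconds" in usage:
        # Duration-priced provider: billed by seconds of audio.
        return f"{usage['seconds']}s of audio, cost {cost}"
    if "total_tokens" in usage:
        # Token-priced provider: billed by input/output tokens.
        return (
            f"{usage['total_tokens']} tokens "
            f"({usage.get('input_tokens')} in, {usage.get('output_tokens')} out), "
            f"cost {cost}"
        )
    # Neither breakdown was reported; fall back to cost alone.
    return f"cost {cost}"


print(describe_usage({"seconds": 20, "cost": 0.005333}))
print(describe_usage(
    {"total_tokens": 113, "input_tokens": 83, "output_tokens": 30, "cost": 0.000508}
))
```

The fallback branch is defensive: it keeps the caller working if a provider reports only `cost`.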

Drop-in workflow

```bash
#!/usr/bin/env bash
set -euo pipefail

MODEL="google/chirp-3"
FORMAT="wav"                          # wav, mp3, flac, m4a, ogg, webm, aac
AUDIO="audio.wav"
BODY=$(mktemp)
PAYLOAD=$(mktemp)

# Pipe the base64 into jq on stdin (-R raw input, -s slurp into one string)
# rather than passing it with --arg, so it never lands on jq's argv either.
base64 < "$AUDIO" | tr -d '\n' \
  | jq -Rs --arg model "$MODEL" --arg fmt "$FORMAT" \
      '{model: $model, input_audio: {data: ., format: $fmt}}' > "$PAYLOAD"

# --data-binary @file keeps the base64 payload off argv (avoids E2BIG / ARG_MAX).
http_code=$(curl -sS -X POST https://openrouter.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  --output "$BODY" \
  -w '%{http_code}' \
  --data-binary @"$PAYLOAD")

if [[ "$http_code" != "200" ]]; then
  echo "STT failed (HTTP $http_code):" >&2
  cat "$BODY" >&2
  rm -f "$BODY" "$PAYLOAD"
  exit 1
fi

jq -r '.text' "$BODY"
rm -f "$BODY" "$PAYLOAD"
```

Discovering STT models

Filter the models endpoint by output modality to list transcription models.

```bash
curl -sS "https://openrouter.ai/api/v1/models?output_modalities=transcription" \
  | jq '.data[] | {id, name, pricing}'
```

Models are provider-namespaced: use the full slug (`google/chirp-3`, `openai/whisper-1`, `openai/whisper-large-v3`), not the short name.
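The same listing can be consumed in code to catch a short name before it reaches the API. The sketch below assumes the parsed response has the `data[].id` shape implied by the `jq` filter above and that `id` holds the full slug; the helper names and the sample payload (including its `name` and empty `pricing` values) are illustrative placeholders, not real API output:

```python
def transcription_slugs(models_response: dict) -> list[str]:
    """Return the model slugs from a parsed /api/v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def resolve_slug(name: str, slugs: list[str]) -> str:
    """Map a full slug or a bare short name to exactly one full slug, or raise."""
    if name in slugs:
        return name
    # Accept a short name only when exactly one namespaced slug ends with it.
    matches = [s for s in slugs if s.split("/", 1)[-1] == name]
    if len(matches) == 1:
        return matches[0]
    raise ValueError(f"{name!r} is not a known transcription model slug")


# Placeholder payload for illustration only.
sample = {
    "data": [
        {"id": "google/chirp-3", "name": "Chirp 3", "pricing": {}},
        {"id": "openai/whisper-1", "name": "Whisper", "pricing": {}},
    ]
}
slugs = transcription_slugs(sample)
print(resolve_slug("chirp-3", slugs))
```

Raising on an ambiguous or unknown short name, rather than guessing, keeps a "model not found" error on the client side where it is cheaper to diagnose.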

Parameters

| Field | Required | Notes |
| --- | --- | --- |
| `model` | yes | Full model slug from `/api/v1/models?output_modalities=transcription`. |
| `input_audio.data` | yes | Base64-encoded raw audio bytes. Not a data URI: just the base64 payload, no `data:audio/...;base64,` prefix. |
| `input_audio.format` | yes | `wav`, `mp3`, `flac`, `m4a`, `ogg`, `webm`, or `aac`. Must match the actual bytes. Support varies by provider. |
| `language` | no | ISO-639-1 code (`en`, `ja`, `fr`). Auto-detected if omitted. |
| `temperature` | no | 0–1. Lower is more deterministic. |
| `provider` | no | Provider passthrough (see below). |
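The table translates fairly directly into a small body builder that rejects obviously bad input before any network call. This is a sketch under assumptions: `build_request_body` and `ALLOWED_FORMATS` are hypothetical names, and the checks only mirror the constraints listed above rather than whatever validation the server actually performs.

```python
import base64

# Format values listed in the parameters table; provider support may be narrower.
ALLOWED_FORMATS = {"wav", "mp3", "flac", "m4a", "ogg", "webm", "aac"}


def build_request_body(model, audio_bytes, fmt, language=None, temperature=None):
    """Build the JSON body for POST /api/v1/audio/transcriptions as a dict."""
    if "/" not in model:
        raise ValueError("model must be a full provider-namespaced slug")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    body = {
        "model": model,
        "input_audio": {
            # Plain base64 of the raw bytes, with no data-URI prefix.
            "data": base64.b64encode(audio_bytes).decode("ascii"),
            "format": fmt,
        },
    }
    # Optional fields are omitted entirely when unset.
    if language is not None:
        body["language"] = language
    if temperature is not None:
        body["temperature"] = temperature
    return body


body = build_request_body("google/chirp-3", b"RIFF", "wav", language="en")
print(body)
```

Omitting `language` rather than sending `null` leaves auto-detection in effect, as the table describes.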

Picking an audio format

- `wav` / `flac`: uncompressed or lossless. Highest quality; largest uploads.
- `mp3` / `m4a` / `aac`: compressed. Smaller payloads, which matters because base64 inflates bytes by ~33% on top of whatever the file already weighs.
- `webm` / `ogg`: typical for browser recordings (`MediaRecorder`).

The `format` field must match the actual container/codec of the bytes. A file saved as `.wav` that is actually mp3 will be rejected or mis-decoded. When in doubt, confirm with `ffprobe <file>`.
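When `ffprobe` is not available, a rough check of the file's leading bytes can catch the common case of a misleading extension. The sketch below is a heuristic based on well-known container signatures, not a substitute for `ffprobe`; the function name `sniff_audio_format` is hypothetical, and mapping each signature onto one of the `format` values is an assumption (for instance, an MP4-family file is reported as `m4a` without inspecting its codec).

```python
from typing import Optional


def sniff_audio_format(head: bytes) -> Optional[str]:
    """Guess a `format` value from the first bytes of a file; None if unrecognized."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"fLaC":
        return "flac"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"\x1a\x45\xdf\xa3":  # EBML header, shared by WebM and Matroska
        return "webm"
    if head[4:8] == b"ftyp":  # ISO base media file (MP4 family)
        return "m4a"
    if head[:3] == b"ID3":  # ID3v2 tag, most often prepended to mp3
        return "mp3"
    if len(head) >= 2 and head[0] == 0xFF and head[1] & 0xE0 == 0xE0:
        # MPEG audio frame sync. ADTS AAC uses a 12-bit sync with layer bits 00.
        layer_bits = (head[1] >> 1) & 0x03
        if head[1] & 0xF0 == 0xF0 and layer_bits == 0:
            return "aac"
        return "mp3"
    return None


print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
```

In practice this would be called with the first dozen or so bytes of the file, for example `open(path, "rb").read(12)`, and compared against the `format` value about to be sent.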

Provider-specific options

Provider passthrough goes under `provider.options.<slug>` and is only forwarded when that provider handles the request. Example: Groq's `prompt` for vocabulary hinting:

```json
{
  "model": "openai/whisper-large-v3",
  "input_audio": { "data": "UklGRiQA...", "format": "wav" },
  "provider": {
    "options": {
      "groq": {
        "prompt": "Expected vocabulary: OpenRouter, API, transcription"
      }
    }
  }
}
```

Options keyed by provider slug are forwarded only when that provider matches; other keys are ignored. Check each provider's upstream docs for available passthrough keys.
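Since the nesting is three levels deep, a tiny helper can attach options for one or more providers without hand-writing the structure each time. A sketch, assuming the request body is held as a plain dict; `with_provider_options` is a hypothetical helper name, and the `groq` / `prompt` pair is taken from the example above:

```python
import copy


def with_provider_options(body: dict, slug: str, options: dict) -> dict:
    """Return a copy of `body` with `options` merged under provider.options.<slug>."""
    out = copy.deepcopy(body)  # leave the caller's dict untouched
    per_provider = out.setdefault("provider", {}).setdefault("options", {})
    per_provider.setdefault(slug, {}).update(options)
    return out


base = {
    "model": "openai/whisper-large-v3",
    "input_audio": {"data": "UklGRiQA...", "format": "wav"},
}
request_body = with_provider_options(
    base, "groq", {"prompt": "Expected vocabulary: OpenRouter, API, transcription"}
)
print(request_body["provider"])
```

Calling the helper again with a different slug adds a sibling key under `options`, which matches the rule that only the matching provider's options are forwarded.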

TypeScript (fetch)

```typescript
import fs from "fs";

const audio = await fs.promises.readFile("audio.wav");
const data = audio.toString("base64");

const res = await fetch("https://openrouter.ai/api/v1/audio/transcriptions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/chirp-3",
    input_audio: { data, format: "wav" },
  }),
});

if (!res.ok) {
  throw new Error(`STT failed (HTTP ${res.status}): ${await res.text()}`);
}

const result = await res.json();
console.log(result.text);
```

Python (requests)

```python
import base64
import os
import requests

with open("audio.wav", "rb") as f:
    data = base64.b64encode(f.read()).decode("utf-8")

res = requests.post(
    "https://openrouter.ai/api/v1/audio/transcriptions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/chirp-3",
        "input_audio": {"data": data, "format": "wav"},
    },
)

if not res.ok:
    raise RuntimeError(f"STT failed (HTTP {res.status_code}): {res.text}")

print(res.json()["text"])
```

Troubleshooting

- Garbled or empty `text`: the `format` field probably doesn't match the actual bytes, or the audio is silent/corrupted. Confirm with `ffprobe audio.wav`.
- 400 with `"Invalid base64"` or silent failure: `data` must be just base64, not a data URI (`data:audio/wav;base64,...`). Strip the prefix if you copied it from a browser `FileReader`.
- 400 with a `ZodError`: a required field is missing or the wrong type. The body looks like `{"success":false,"error":{"name":"ZodError","message":"[...]"}}`; the nested `message` JSON string names the bad path (commonly `input_audio.data` or `input_audio.format`).
- 413 / request too large: base64 inflates bytes by ~33%, so a large raw file becomes an even larger JSON payload. Use a smaller source file (compressed format, lower sample rate, or trimmed clip).
- Model not found: use the full slug from `/api/v1/models?output_modalities=transcription` (`google/chirp-3`, not `chirp-3`).
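Three of these failure modes lend themselves to small client-side helpers. The sketch below is illustrative only: the function names are hypothetical, and `zod_error_paths` assumes the nested `message` string decodes to a list of issue objects that each carry a `path` list, which is how Zod normally serializes its errors but is not confirmed here for this endpoint.

```python
import json


def strip_data_uri(data: str) -> str:
    """Drop a leading `data:<mime>;base64,` prefix if one is present."""
    if data.startswith("data:") and ";base64," in data:
        return data.split(";base64,", 1)[1]
    return data


def zod_error_paths(error_body: dict) -> list[str]:
    """List the dotted field paths named in a ZodError response body."""
    issues = json.loads(error_body["error"]["message"])  # nested JSON string
    return [".".join(str(part) for part in issue.get("path", [])) for issue in issues]


def base64_length(raw_bytes: int) -> int:
    """Length of padded base64 for `raw_bytes` bytes: 4 characters per 3-byte group."""
    return 4 * ((raw_bytes + 2) // 3)


print(strip_data_uri("data:audio/wav;base64,UklGRg=="))
print(base64_length(3_000_000))
```

`base64_length` makes the ~33% figure concrete: every 3 raw bytes become 4 characters, so the JSON body is always at least a third larger than the audio file itself.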
