text-to-speech

Text-to-Speech — Bulbul

[!IMPORTANT] Auth: use the `api-subscription-key` header, NOT `Authorization: Bearer`. Base URL: `https://api.sarvam.ai/v1`
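As a minimal sketch of the auth rule, the snippet below builds (but does not send) a request with Python's standard library and puts the key in the `api-subscription-key` header. The `/text-to-speech` path, the JSON field names (mirrored from the SDK parameters), and the `SARVAM_API_KEY` environment variable are assumptions for illustration. Check them against the official API reference before you rely on them.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.sarvam.ai/v1"  # base URL as stated above


def build_tts_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a TTS POST request using Sarvam-style auth."""
    # ASSUMPTION: endpoint path and body fields mirror the SDK parameters.
    url = f"{BASE_URL}/text-to-speech"
    body = {
        "text": text,
        "target_language_code": "hi-IN",
        "model": "bulbul:v3",
        "speaker": "shubh",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The key goes in this header, NOT in "Authorization: Bearer ...".
            "api-subscription-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# SARVAM_API_KEY is a hypothetical environment variable name.
request = build_tts_request("नमस्ते", os.environ.get("SARVAM_API_KEY", "YOUR_SARVAM_API_KEY"))
# urllib.request.urlopen(request) would send it; omitted here to avoid a live call.
```

HTTP header names are case-insensitive. `urllib` normalizes this one to `Api-subscription-key`, which the server treats the same way.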

Model

`bulbul:v3`: 11 languages, 30+ voices (default: `shubh`), available over REST, HTTP streaming, and WebSocket.

Quick Start (Python)

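```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI()

# REST: the response carries base64-encoded audio in response.audios[0]
response = client.text_to_speech.convert(
    text="नमस्ते, आप कैसे हैं?",
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="shubh",
)
# save() decodes the audio and writes it to disk
save(response, "output.wav")
```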
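REST responses carry base64-encoded audio in `response.audios[0]`. If you would rather not use `save()`, a minimal alternative is to decode the string yourself with `base64.b64decode()`. The helper name `write_base64_audio` is hypothetical. The `.wav` extension assumes the default WAV output used in the quick start.

```python
import base64
from pathlib import Path


def write_base64_audio(audio_b64: str, path: str) -> int:
    """Decode one base64 audio string, write the raw bytes, and return the byte count."""
    raw = base64.b64decode(audio_b64)
    Path(path).write_bytes(raw)
    return len(raw)


# Hypothetical usage with the SDK response above:
# write_base64_audio(response.audios[0], "output.wav")
```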

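HTTP Stream (lower latency, binary audio)

```python
from sarvamai import SarvamAI

client = SarvamAI()

# convert_stream yields raw (binary) audio chunks as they are generated
chunks = []
for chunk in client.text_to_speech.convert_stream(
    text="Hello from Sarvam AI",
    target_language_code="en-IN",
    speaker="shubh",
    model="bulbul:v3",
):
    chunks.append(chunk)
audio = b"".join(chunks)  # full audio as a single bytes object
```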
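Buffering every chunk in a list works for short clips. For longer text, a sketch like the one below writes each chunk as it arrives, so memory use stays flat. `write_chunks` is a hypothetical helper. It accepts any iterable of `bytes`, such as the iterator returned by `convert_stream`. The file extension should match whatever output codec you request.

```python
from pathlib import Path
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks straight to disk as they arrive; return total bytes written."""
    total = 0
    with Path(path).open("wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


# Hypothetical usage with the streaming call above:
# write_chunks(client.text_to_speech.convert_stream(...), "output.wav")
```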

Quick Start (JavaScript/TypeScript)

```typescript
import { SarvamAIClient } from "sarvamai";
import { writeFile } from "fs/promises";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// REST
const response = await client.textToSpeech.convert({
    text: "नमस्ते, आप कैसे हैं?",
    target_language_code: "hi-IN",
    model: "bulbul:v3",
    speaker: "shubh"
});

// HTTP Stream (lower latency, returns BinaryResponse)
const streamResponse = await client.textToSpeech.convertStream({
    text: "Hello from Sarvam AI",
    target_language_code: "en-IN",
    speaker: "shubh",
    model: "bulbul:v3"
});
const bytes = await streamResponse.bytes();
await writeFile("output.wav", bytes);
```

WebSocket Streaming

```python
import asyncio
from sarvamai import AsyncSarvamAI

async def tts_stream():
    client = AsyncSarvamAI()
    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        await ws.configure(target_language_code="hi-IN", speaker="shubh")
        await ws.convert("Your text here")
        await ws.flush()
        async for message in ws:
            pass  # messages carry base64-encoded audio chunks

asyncio.run(tts_stream())
```
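The loop above only marks where the base64 audio chunks arrive. The sketch below shows the decode step. How you extract the base64 string from each `message` depends on the SDK's message types, so the hypothetical `collect_audio` takes the base64 strings directly. Concatenation is only safe when the chunks are fragments of one continuous stream. If each chunk carries its own container header, it needs different handling.

```python
import base64
from typing import Iterable


def collect_audio(b64_chunks: Iterable[str]) -> bytes:
    """Decode base64 audio chunks in arrival order and concatenate the raw bytes."""
    return b"".join(base64.b64decode(chunk) for chunk in b64_chunks)


# Hypothetical usage: inside the loop, append each message's base64 payload to a
# list, then call collect_audio(payloads) once the stream has finished.
```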

Character Limits

| Method | Max text length |
|---|---|
| REST (`convert`) | 2,500 chars |
| HTTP stream (`convert_stream`) | 3,500 chars |
| WebSocket | 2,500 chars per message |
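Text longer than these limits must be split before it is sent. The sketch below assumes sentence boundaries are a reasonable place to split. The hypothetical `split_for_tts` greedily packs whole sentences into chunks of at most `limit` characters. Any single sentence that is still too long gets a hard split. Pass `limit=3500` for `convert_stream`.

```python
import re
from typing import List

# Sentence-ending punctuation followed by whitespace, including the Devanagari danda (।).
_SENTENCE_END = re.compile(r"(?<=[.!?।])\s+")


def split_for_tts(text: str, limit: int = 2500) -> List[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        # Hard-split any single sentence that alone exceeds the limit.
        pieces = [sentence[i:i + limit] for i in range(0, len(sentence), limit)] or [""]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```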

Gotchas

| Gotcha | Detail |
|---|---|
| JS method names | `client.textToSpeech.convert({...})` and `.convertStream({...})` are camelCase. The stream call returns a `BinaryResponse` with `.stream()`, `.bytes()`, and `.blob()`. |
| `pitch` / `loudness` rejected | The SDK accepts these parameters, but the API returns 400 for v3. Only `pace` (0.5–2.0) works. |
| v2 voices incompatible | `anushka`, `abhilash`, `arya`, etc. don't work with v3. Use `shubh` (the default). |
| Sample rate >24 kHz | 32 kHz, 44.1 kHz, and 48 kHz are available only via REST, not streaming. |
| REST response | Audio is base64-encoded in `response.audios[0]`. Use `sarvamai.play.save()` or `base64.b64decode()`. |
| Pronunciation dictionary | The `dict_id` parameter applies custom word pronunciations. Create a dictionary via `client.pronunciation_dictionary.create(file=f)`. |
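You can check the gotchas above before a request ever leaves your code. `check_v3_params` is a hypothetical pre-flight helper, and its parameter names are illustrative rather than the SDK's own. It encodes only the rules in this table. The v2 voice set covers just the three names listed there and is not a complete catalog.

```python
from typing import List, Optional

V2_ONLY_VOICES = {"anushka", "abhilash", "arya"}  # partial list, from the table above


def check_v3_params(
    speaker: str = "shubh",
    pace: Optional[float] = None,
    sample_rate: Optional[int] = None,
    streaming: bool = False,
    pitch: Optional[float] = None,
    loudness: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means no known gotcha applies."""
    problems: List[str] = []
    if pitch is not None or loudness is not None:
        problems.append("pitch/loudness are rejected by bulbul:v3 (HTTP 400)")
    if pace is not None and not 0.5 <= pace <= 2.0:
        problems.append("pace must be between 0.5 and 2.0")
    if speaker.lower() in V2_ONLY_VOICES:
        problems.append(f"voice '{speaker}' is v2-only; use e.g. 'shubh'")
    if streaming and sample_rate is not None and sample_rate > 24000:
        problems.append(f"{sample_rate} Hz is only available via REST")
    return problems
```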

Full Docs

Fetch the voice catalog, streaming protocol details, pronunciation dictionary CRUD operations, and codec options from:

- https://docs.sarvam.ai/llms.txt: comprehensive docs index
- TTS Overview
- Voice Catalog
- HTTP Stream
- Pronunciation Dictionary
- Rate Limits
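`llms.txt` is a plain-text docs index, conventionally a Markdown file of `[title](url)` links. The sketch below assumes that format, which has not been verified against the live file. It fetches the index with the standard library and filters links by keyword. `find_doc_links` and `fetch_index` are hypothetical helpers.

```python
import re
import urllib.request
from typing import List, Tuple

LLMS_TXT_URL = "https://docs.sarvam.ai/llms.txt"
_MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def find_doc_links(index_text: str, keyword: str) -> List[Tuple[str, str]]:
    """Return (title, url) pairs whose title or URL contains `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return [
        (title, url)
        for title, url in _MD_LINK.findall(index_text)
        if kw in title.lower() or kw in url.lower()
    ]


def fetch_index(url: str = LLMS_TXT_URL) -> str:
    """Download the docs index (performs a live network request)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


# Example: find_doc_links(fetch_index(), "text-to-speech")
```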
