
Text-to-Speech — Bulbul


> [!IMPORTANT]
> Auth: use the `api-subscription-key` header, not `Authorization: Bearer`. Base URL: `https://api.sarvam.ai/v1`
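The same header rule applies to raw HTTP calls made outside the SDK. The sketch below uses only the Python standard library to build a request without sending it. It is not confirmed by this page: the `/text-to-speech` path, the JSON body shape (reusing the SDK's parameter names), the `build_tts_request` helper and the `SARVAM_API_KEY` environment variable are all assumptions.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.sarvam.ai/v1"


def build_tts_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request authenticated via api-subscription-key.

    Assumption: the endpoint path and JSON field names are illustrative only;
    check the API reference before relying on them.
    """
    body = {
        "text": text,
        "target_language_code": "hi-IN",
        "model": "bulbul:v3",
        "speaker": "shubh",
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/text-to-speech",  # hypothetical path
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Sarvam expects this header; do NOT send "Authorization: Bearer ..."
            "api-subscription-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # SARVAM_API_KEY is a hypothetical env var name; nothing is sent here.
    req = build_tts_request("नमस्ते", os.environ.get("SARVAM_API_KEY", "YOUR_SARVAM_API_KEY"))
    print(req.get_method(), req.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen` once the path is confirmed.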

Model


`bulbul:v3`: 11 languages, 30+ voices (default: `shubh`), available over REST, HTTP streaming, and WebSocket.

Quick Start (Python)


```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI()

# REST: synthesize Hindi speech with the default v3 voice
response = client.text_to_speech.convert(
    text="नमस्ते, आप कैसे हैं?",
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="shubh",
)
save(response, "output.wav")  # write the returned audio to disk
```
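The REST `convert` call returns base64-encoded audio in `response.audios` (see Gotchas). If you want to skip `sarvamai.play.save()`, you can decode it yourself with the standard library, as in the minimal sketch below. `save_first_audio` is a hypothetical helper. It assumes `audios[0]` is a base64 string holding a complete audio file.

```python
import base64
from pathlib import Path


def save_first_audio(audios: list[str], path: str) -> int:
    """Decode the first base64 audio string and write it to `path`.

    Hypothetical helper; assumes audios[0] is a complete base64-encoded
    audio payload. Returns the number of bytes written.
    """
    if not audios:
        raise ValueError("response contained no audio")
    raw = base64.b64decode(audios[0], validate=True)  # reject non-base64 input
    Path(path).write_bytes(raw)
    return len(raw)


# Usage with the REST response from the Quick Start:
# save_first_audio(response.audios, "output.wav")
```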

HTTP Stream (lower latency, binary audio)


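```python
from sarvamai import SarvamAI

client = SarvamAI()

# convert_stream yields binary audio chunks as they are generated
chunks = []
for chunk in client.text_to_speech.convert_stream(
    text="Hello from Sarvam AI",
    target_language_code="en-IN",
    speaker="shubh",
    model="bulbul:v3",
):
    chunks.append(chunk)
audio = b"".join(chunks)
```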
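Buffering every chunk in memory works for short clips. For longer text, a minimal alternative is to write each chunk to disk as it arrives. `stream_to_file` is a hypothetical helper that accepts any iterable of `bytes`, such as the chunks yielded by `convert_stream`.

```python
from pathlib import Path
from typing import Iterable


def stream_to_file(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to `path` as they arrive; return total bytes written."""
    total = 0
    with Path(path).open("wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks defensively
                f.write(chunk)
                total += len(chunk)
    return total


# Usage (client as in the example above):
# stream_to_file(
#     client.text_to_speech.convert_stream(
#         text="Hello from Sarvam AI",
#         target_language_code="en-IN",
#         speaker="shubh",
#         model="bulbul:v3",
#     ),
#     "output.wav",
# )
```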

Quick Start (JavaScript/TypeScript)


```typescript
import { SarvamAIClient } from "sarvamai";
import { writeFile } from "fs/promises";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// REST
const response = await client.textToSpeech.convert({
    text: "नमस्ते, आप कैसे हैं?",
    target_language_code: "hi-IN",
    model: "bulbul:v3",
    speaker: "shubh"
});

// HTTP Stream (lower latency, returns BinaryResponse)
const streamResponse = await client.textToSpeech.convertStream({
    text: "Hello from Sarvam AI",
    target_language_code: "en-IN",
    speaker: "shubh",
    model: "bulbul:v3"
});
const bytes = await streamResponse.bytes();
await writeFile("output.wav", bytes);
```

WebSocket Streaming


```python
import asyncio
from sarvamai import AsyncSarvamAI

async def tts_stream():
    client = AsyncSarvamAI()
    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        await ws.configure(target_language_code="hi-IN", speaker="shubh")
        await ws.convert("Your text here")
        await ws.flush()
        async for message in ws:
            pass  # handle base64-encoded audio chunks here

asyncio.run(tts_stream())
```
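The loop above leaves the base64 audio chunks undecoded. The minimal sketch below assembles them. It assumes you have already pulled the base64 string out of each audio message; which field holds that string is not stated here, so check the streaming protocol docs. `AudioCollector` is a hypothetical helper. Whether the concatenated bytes form a playable file depends on the output codec, so treat the joined result as raw output to verify.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class AudioCollector:
    """Accumulates decoded audio from base64-encoded WebSocket chunks (hypothetical helper)."""

    parts: list[bytes] = field(default_factory=list)

    def add(self, b64_chunk: str) -> int:
        """Decode one base64 chunk, store it, and return its decoded size."""
        data = base64.b64decode(b64_chunk)
        self.parts.append(data)
        return len(data)

    def audio(self) -> bytes:
        """Return all decoded chunks joined in arrival order."""
        return b"".join(self.parts)


# Usage inside the loop (the field name on `message` is an assumption):
#     collector = AudioCollector()
#     async for message in ws:
#         collector.add(<base64 string taken from message>)
```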

Character Limits


| Method | Max text |
|---|---|
| REST (`convert`) | 2,500 chars |
| HTTP Stream (`convert_stream`) | 3,500 chars |
| WebSocket | 2,500 chars per message |
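Text longer than these limits has to be split on the client side. The minimal sketch below copies the limits from the table above. It makes three assumptions: the API counts Python `str` characters (code points), splitting after `.`, `?`, `!` or the Devanagari danda `।` is acceptable, and normalizing inter-sentence whitespace to a single space is harmless. `split_for_tts` is a hypothetical helper.

```python
import re

# Limits from the table above (characters per request / per WebSocket message).
MAX_CHARS = {"rest": 2500, "http_stream": 3500, "websocket": 2500}

# Split on whitespace that follows sentence-ending punctuation (kept with the sentence).
_SENTENCE_END = re.compile(r"(?<=[.?!।])\s+")


def split_for_tts(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Sentences are rejoined with a single space. A sentence longer than
    max_chars on its own is hard-split at the limit.
    """
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        # Hard-split any sentence that alone exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Usage: one REST call per chunk
# for part in split_for_tts(long_text, MAX_CHARS["rest"]): ...
```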

Gotchas


| Gotcha | Detail |
|---|---|
| JS method name | `client.textToSpeech.convert({...})` and `.convertStream({...})` are camelCase. The stream call returns a `BinaryResponse` with `.stream()`, `.bytes()`, and `.blob()`. |
| `pitch` / `loudness` rejected | The SDK accepts these, but the API returns 400 for v3. Only `pace` (0.5–2.0) works. |
| v2 voices incompatible | `anushka`, `abhilash`, `arya`, etc. don't work with v3. Use `shubh` (the default). |
| Sample rate >24 kHz | 32 kHz, 44.1 kHz, and 48 kHz are available only via REST, not streaming. |
| REST response | Audio comes back base64-encoded in `response.audios[0]`. Use `sarvamai.play.save()` or `base64.b64decode()`. |
| Pronunciation dictionary | The `dict_id` param teaches custom word pronunciations. Create a dictionary via `client.pronunciation_dictionary.create(file=f)`. |
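Several of these gotchas can be caught before a request leaves your machine. The minimal pre-flight check below encodes only the rules stated in the table. `check_v3_request` and its `transport` labels are hypothetical illustrations, not part of the SDK. The v2 voice set is partial and lists only the three voices named above.

```python
from typing import Optional

V2_ONLY_VOICES = {"anushka", "abhilash", "arya"}  # partial list, from the table above
STREAMING_MAX_SAMPLE_RATE = 24000  # rates above 24 kHz are REST-only


def check_v3_request(
    transport: str,  # "rest", "http_stream", or "websocket" (hypothetical labels)
    speaker: str = "shubh",
    pace: Optional[float] = None,
    sample_rate: Optional[int] = None,
    pitch: Optional[float] = None,
    loudness: Optional[float] = None,
) -> list[str]:
    """Return problems found in a bulbul:v3 request; an empty list means none found."""
    problems: list[str] = []
    if pitch is not None or loudness is not None:
        problems.append("pitch/loudness are rejected by v3 (HTTP 400); use pace instead")
    if pace is not None and not 0.5 <= pace <= 2.0:
        problems.append(f"pace {pace} is outside 0.5-2.0")
    if speaker in V2_ONLY_VOICES:
        problems.append(f"voice {speaker!r} is a v2 voice; try 'shubh'")
    if (
        sample_rate is not None
        and sample_rate > STREAMING_MAX_SAMPLE_RATE
        and transport != "rest"
    ):
        problems.append(f"{sample_rate} Hz is only available via REST")
    return problems


# Usage:
# issues = check_v3_request("websocket", sample_rate=48000)
# if issues: raise ValueError("; ".join(issues))
```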

Full Docs


Fetch voice catalog, streaming protocol, pronunciation dictionary CRUD, and codec options from: