byted-text-to-speech

Byted-Text-to-Speech Skill

Converts text to speech with the VolcEngine Doubao speech synthesis service (HTTP Chunked/SSE one-way streaming, V3) and saves the result as an audio file.

When to Use

Prefer this skill when the user needs any of the following:

- Converting a passage of text into speech or read-aloud audio
- Generating dubbing, narration, announcements, or audiobook clips
- Turning code comments, documents, or articles into audio for listening
- Generating speech in multiple languages (Chinese, English, and others)
- The user mentions "text-to-speech", "TTS", "speech synthesis", "read aloud", "dubbing", "read it out", or "read it to me"
- The user does not say "speech synthesis" explicitly, but the task in essence requires turning text into playable audio

Pre-Use Checks

First check whether the following credential is configured:

```
MODEL_SPEECH_API_KEY
```

If it is missing, open `references/setup-guide.md` for the activation, application, and configuration steps, and advise the user on how to enable the service.
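The credential check can be scripted before the skill is invoked. The following is a minimal sketch, assuming the key is visible in the process environment; the helper name `has_speech_api_key` is hypothetical and not part of the skill, and the skill's own script may also read the key from a workspace environment file that this check does not inspect.

```python
import os
from typing import Mapping, Optional


def has_speech_api_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if MODEL_SPEECH_API_KEY is set to a non-blank value."""
    # Default to the real process environment; a plain dict can be passed for testing.
    source = os.environ if env is None else env
    return bool(source.get("MODEL_SPEECH_API_KEY", "").strip())


if __name__ == "__main__":
    if not has_speech_api_key():
        print("MODEL_SPEECH_API_KEY is not set; see references/setup-guide.md")
```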

Script Parameters

| Parameter | Short | Required | Description |
| --- | --- | --- | --- |
| `--text` | `-t` | Yes | Text to synthesize |
| `--output` | `-o` | No | Output audio file path (auto-generated by default) |
| `--speaker` | `-s` | No | Speaker (voice) ID, default `zh_female_vv_uranus_bigtts`; see the voice list |
| `--format` | | No | Audio format: `mp3` (default), `pcm`, `ogg_opus` |
| `--sample-rate` | | No | Sample rate, e.g., 16000 or 24000 (default 24000) |
| `--speech-rate` | | No | Speech rate in [-50, 100]; 100 means 2.0x speed, -50 means 0.5x speed; default 0 |
| `--pitch-rate` | | No | Pitch in [-12, 12]; default 0 |
| `--loudness-rate` | | No | Loudness in [-50, 100]; 100 means 2.0x volume, -50 means 0.5x volume; default 0 |
| `--bit-rate` | | No | Bit rate, applies to the `mp3` and `ogg_opus` formats (e.g., 64000, 128000); default 64000 |
| `--filter-markdown` | | No | Filter Markdown syntax (e.g., `Hello` is read as "Hello"); off by default |
| `--enable-latex` | | No | Enable reading of LaTeX formulas (uses latex_parser v2 and automatically turns on Markdown filtering); off by default |
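The `--speech-rate` and `--loudness-rate` rows give two reference points each (-50 for 0.5x and 100 for 2.0x) plus a default of 0. A minimal sketch of converting a desired multiplier into the parameter value follows. It assumes the scale is linear between those points and that 0 corresponds to 1.0x, which the table implies but does not state; the function name `multiplier_to_rate` is hypothetical.

```python
def multiplier_to_rate(multiplier: float) -> int:
    """Map a multiplier in [0.5, 2.0] to a parameter value in [-50, 100].

    Assumption: the scale is linear, rate = (multiplier - 1) * 100. This agrees
    with the documented points 0.5x -> -50 and 2.0x -> 100, and treats the
    default 0 as 1.0x.
    """
    if not 0.5 <= multiplier <= 2.0:
        raise ValueError(f"multiplier must be within [0.5, 2.0], got {multiplier}")
    return round((multiplier - 1.0) * 100)


if __name__ == "__main__":
    # A 1.25x speaking speed under the linear assumption.
    print(multiplier_to_rate(1.25))
```

The returned integer can be passed as the value of `--speech-rate` or `--loudness-rate`.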

Return Value Description

The script prints JSON containing:

- `status`: `"success"` or `"error"`
- `local_path`: path of the local audio file
- `format`: audio format
- `error`: error message on failure

Return `local_path`, or an accessible audio URL, to the user so they can play or download the audio.
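A caller can turn this output into either a usable path or an exception. The sketch below assumes the script writes exactly one JSON object to standard output and nothing else; `parse_tts_result` is a hypothetical helper, and the sample payload is illustrative rather than captured from a real run.

```python
import json
from typing import Any, Dict


def parse_tts_result(stdout: str) -> Dict[str, Any]:
    """Parse the script's JSON output; raise RuntimeError if status is not success."""
    result = json.loads(stdout)
    if result.get("status") != "success":
        # `error` holds the failure message according to the field list.
        raise RuntimeError(result.get("error", "unknown error"))
    return result


if __name__ == "__main__":
    # Illustrative payload using only the documented fields.
    sample = '{"status": "success", "local_path": "output.mp3", "format": "mp3"}'
    print(parse_tts_result(sample)["local_path"])
```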

Error Handling

- If the error `PermissionError: MODEL_SPEECH_API_KEY ... 需在环境变量中配置` ("needs to be configured in environment variables") occurs: prompt the user to obtain `MODEL_SPEECH_API_KEY` from API Key Management, write it to the environment variable file under the workspace, and retry.
- If a 4xx/5xx status or a business error code is returned: use the error message to prompt the user to check the text content, the speaker ID, and whether the account has activated the Doubao Speech Service.
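These two cases can be reduced to a small dispatch on the error text. This is a sketch only: `suggest_fix` is a hypothetical helper, and matching on the substring `MODEL_SPEECH_API_KEY` is an assumption about how the missing-key error can be recognized.

```python
def suggest_fix(error_message: str) -> str:
    """Return a user-facing hint for an error message reported by the script."""
    if "MODEL_SPEECH_API_KEY" in error_message:
        # Missing or unconfigured credential.
        return (
            "Obtain MODEL_SPEECH_API_KEY in API Key Management, write it to the "
            "workspace environment variable file, and retry."
        )
    # Any other failure (4xx/5xx or a business error code) gets the general checklist.
    return (
        "Check the text content, the speaker ID, and whether the account has "
        "activated the Doubao Speech Service. Details: " + error_message
    )
```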

Troubleshooting

- Missing credentials: open `references/setup-guide.md`.
- Need to look up API parameters, fields, or error codes: open `references/docs-index.md`.
- If the script returns a permission error, first check that the service is activated and the credential is valid, then give the user clear steps to follow.

Reference Materials

Open the following files as needed; there is no need to load them all by default:

- `references/setup-guide.md`: service activation, credential application, environment variable configuration
- `references/docs-index.md`: API documentation index, parameter descriptions, voice list, error code quick reference

Examples

```bash
# Basic usage
python scripts/text_to_speech.py -t "Welcome to the VolcEngine Speech Synthesis Service."

# Specify speaker and output format
python scripts/text_to_speech.py -t "This is a test speech." -s zh_female_vv_uranus_bigtts -o output.mp3 --format mp3

# Specify speech rate and sample rate
python scripts/text_to_speech.py -t "Speech rate and pitch are adjustable." --speech-rate 10 --sample-rate 16000
```
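The same invocations can be assembled from Python when the script is driven by another program. The sketch below is an assumption-laden illustration: `build_tts_command` is a hypothetical helper, it covers only a few of the documented flags, and actually running the command requires `MODEL_SPEECH_API_KEY` and the skill's `scripts/text_to_speech.py` to be present.

```python
import subprocess
import sys
from typing import List, Optional


def build_tts_command(
    text: str,
    output: Optional[str] = None,
    speaker: Optional[str] = None,
    speech_rate: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list for scripts/text_to_speech.py."""
    cmd = [sys.executable, "scripts/text_to_speech.py", "--text", text]
    if output is not None:
        cmd += ["--output", output]
    if speaker is not None:
        cmd += ["--speaker", speaker]
    if speech_rate is not None:
        # Range taken from the parameter table: [-50, 100].
        if not -50 <= speech_rate <= 100:
            raise ValueError("speech_rate must be within [-50, 100]")
        cmd += ["--speech-rate", str(speech_rate)]
    return cmd


if __name__ == "__main__":
    # Runs the real script, so the credential and script file must exist.
    completed = subprocess.run(
        build_tts_command("This is a test speech.", output="output.mp3"),
        capture_output=True,
        text=True,
    )
    print(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces.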