byted-text-to-speech

Byted-Text-to-Speech Skill

Converts text to speech and saves the result as an audio file, using VolcEngine Doubao speech synthesis (HTTP Chunked/SSE one-way streaming, V3).

When to Use

Prefer this skill when the user needs any of the following:
  • Converting a piece of text to speech or read-aloud audio
  • Generating dubbing, narration, broadcasts, or audiobook clips
  • Converting code comments, documents, articles, and similar content to audio for listening
  • Generating multilingual speech (Chinese, English, etc.)
  • The user mentions terms such as "text-to-speech", "TTS", "speech synthesis", "reading aloud", "dubbing", "read it out", or "read to me"
  • The user does not explicitly mention "speech synthesis", but the task essentially requires converting text content to playable audio

Pre-Use Checks

First check whether the following credential is configured:
  • MODEL_SPEECH_API_KEY
If it is missing, open references/setup-guide.md for the activation, application, and configuration steps, and advise the user on how to activate the service.
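
The check above can be sketched as follows. This is a minimal illustration, assuming the key is exposed to the process as an environment variable named MODEL_SPEECH_API_KEY (the error message quoted later in this document suggests so). The helper name has_speech_api_key is hypothetical and not part of the skill.

```python
import os


def has_speech_api_key(env=None):
    """Return True if MODEL_SPEECH_API_KEY is set to a non-blank value.

    `env` defaults to os.environ; passing a dict makes the check testable.
    (Hypothetical helper, not part of the skill's scripts.)
    """
    env = os.environ if env is None else env
    return bool(env.get("MODEL_SPEECH_API_KEY", "").strip())


if __name__ == "__main__":
    if not has_speech_api_key():
        # Point the user at the setup guide named in this document.
        print("MODEL_SPEECH_API_KEY is missing; see references/setup-guide.md")
```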

Script Parameters

| Parameter | Shortcut | Required | Description |
|---|---|---|---|
| --text | -t | Yes | The text content to synthesize |
| --output | -o | No | Output audio file path (auto-generated by default) |
| --speaker | -s | No | Speaker ID, default zh_female_vv_uranus_bigtts (see the voice timbre list) |
| --format | | No | Audio format: mp3 (default), pcm, ogg_opus |
| --sample-rate | | No | Sample rate, e.g., 16000, 24000 (default 24000) |
| --speech-rate | | No | Speech rate in [-50, 100]; 100 means 2.0x speed, -50 means 0.5x speed; default 0 |
| --pitch-rate | | No | Pitch in [-12, 12]; default 0 |
| --loudness-rate | | No | Loudness in [-50, 100]; 100 means 2.0x volume, -50 means 0.5x volume; default 0 |
| --bit-rate | | No | Bit rate, effective for the mp3 and ogg_opus formats (e.g., 64000, 128000); default 64000 |
| --filter-markdown | | No | Filter Markdown syntax (e.g., **Hello** is read as "Hello"); disabled by default |
| --enable-latex | | No | Enable reading of LaTeX formulas (uses latex_parser v2 and automatically enables Markdown filtering); disabled by default |
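
To make the rate ranges concrete, here is a small sketch that validates a few of the flags above and builds an argument list for the script. Two things are assumptions: the mapping from a rate value to a multiplier is taken to be linear (which matches the three documented points: -50 gives 0.5x, 0 gives 1.0x, 100 gives 2.0x), and the helper names rate_to_multiplier and build_tts_args are hypothetical. The script itself may validate or scale differently.

```python
def rate_to_multiplier(rate):
    """Map a rate in [-50, 100] to a multiplier, ASSUMING a linear scale."""
    if not -50 <= rate <= 100:
        raise ValueError(f"rate must be in [-50, 100], got {rate}")
    return 1.0 + rate / 100.0


def build_tts_args(text, output=None, speaker=None, audio_format=None,
                   speech_rate=None, pitch_rate=None):
    """Build an argv list for scripts/text_to_speech.py (hypothetical helper).

    Only flags listed in the parameter table are used; omitted options are
    left to the script's own defaults.
    """
    if not text:
        raise ValueError("--text is required")
    if audio_format is not None and audio_format not in ("mp3", "pcm", "ogg_opus"):
        raise ValueError(f"unsupported format: {audio_format}")
    if speech_rate is not None:
        rate_to_multiplier(speech_rate)  # reuse the range check
    if pitch_rate is not None and not -12 <= pitch_rate <= 12:
        raise ValueError(f"pitch rate must be in [-12, 12], got {pitch_rate}")

    args = ["python", "scripts/text_to_speech.py", "--text", text]
    optional = [
        ("--output", output),
        ("--speaker", speaker),
        ("--format", audio_format),
        ("--speech-rate", speech_rate),
        ("--pitch-rate", pitch_rate),
    ]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args
```

The returned list can be passed to subprocess.run without a shell, which avoids quoting problems when the text contains spaces or quotation marks.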

Return Value

The script outputs JSON containing:
  • status: "success" or "error"
  • local_path: local audio file path
  • format: audio format
  • error: error message on failure
Return the local_path or an accessible audio URL to the user so they can play or download the audio.
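
A caller might consume this output as sketched below. The sketch assumes the JSON object is the entire text passed in (for example, the script's standard output) and that it contains only the fields listed above; parse_tts_result is a hypothetical helper name.

```python
import json


def parse_tts_result(output_text):
    """Parse the script's JSON output and return (local_path, format).

    Raises RuntimeError with the script's error message when status is
    not "success". (Hypothetical helper; field names are from the list above.)
    """
    result = json.loads(output_text)
    if result.get("status") != "success":
        raise RuntimeError(result.get("error") or "unknown TTS error")
    return result["local_path"], result.get("format")
```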

Error Handling

  • If the error "PermissionError: MODEL_SPEECH_API_KEY ... needs to be configured in environment variables" occurs: prompt the user to obtain MODEL_SPEECH_API_KEY in API Key Management, write it to the environment variable file under the workspace, and retry.
  • If a 4xx/5xx status or a business error code is returned: based on the error message, prompt the user to check the text content, the speaker ID, and whether the account has activated the Doubao Speech Service.
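
The two cases above could be routed as in the following sketch. It assumes that a missing-key error can be recognized by the presence of the string MODEL_SPEECH_API_KEY in the error message; the exact message wording is elided in this document, so the match is deliberately loose. hint_for_error is a hypothetical helper.

```python
def hint_for_error(message):
    """Return a user-facing hint for an error message from the script.

    ASSUMPTION: a missing-credential error mentions MODEL_SPEECH_API_KEY;
    every other error is treated as an HTTP or business error.
    """
    if "MODEL_SPEECH_API_KEY" in message:
        return ("Obtain MODEL_SPEECH_API_KEY in API Key Management, write it to "
                "the environment variable file under the workspace, and retry.")
    return ("Check the text content, the speaker ID, and whether the account "
            "has activated the Doubao Speech Service.")
```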

Troubleshooting

  • Missing credentials: open references/setup-guide.md
  • Need to look up API parameters, fields, or error codes: open references/docs-index.md
  • If the script returns a permission error, first check whether the service is activated and the credentials are valid, then give the user clear instructions on what to do

Reference Materials

Open the following files as needed; there is no need to load them all by default:
  • references/setup-guide.md: service activation, credential application, environment variable configuration
  • references/docs-index.md: API documentation index, parameter descriptions, voice timbre list, quick reference for error codes

Examples

Basic usage:
python scripts/text_to_speech.py -t "Welcome to the VolcEngine Speech Synthesis Service."

Specify speaker and output format:
python scripts/text_to_speech.py -t "This is a test speech." -s zh_female_vv_uranus_bigtts -o output.mp3 --format mp3

Specify speech rate and sample rate:
python scripts/text_to_speech.py -t "Speech rate and pitch are adjustable." --speech-rate 10 --sample-rate 16000