tts
When to Use
- User wants to convert text to spoken audio
- User asks for "read aloud", "TTS", "text to speech", "voice narration"
- User says "朗读", "配音", "语音合成"
- User wants multi-speaker scripted audio or dialogue
When NOT to Use
- User wants a podcast-style discussion with topic exploration (use `/podcast`)
- User wants an explainer video with visuals (use `/explainer`)
- User wants to generate an image (use `/image-gen`)
Purpose
Convert text into natural-sounding speech audio. Two paths:
- Quick mode (`/v1/tts`): Single voice, low-latency, sync MP3 stream. For casual chat, reading snippets, instant audio.
- Script mode (`/v1/speech`): Multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read `shared/authentication.md` for API key and headers
- Follow `shared/common-patterns.md` for errors and interaction patterns
- Never hardcode speaker IDs; always fetch from the speakers API
- Always read config following `shared/config-pattern.md` before any interaction
- Always follow `shared/speaker-selection.md` for speaker selection (text table + free-text input)
- Never save files to `~/Downloads/` or `/tmp/` as primary output; use `.listenhub/tts/`
Mode Detection
Determine the mode from the user's input automatically before asking any questions:
| Signal | Mode |
|---|---|
| "多角色", "脚本", "对话", "script", "dialogue", "multi-speaker" | Script |
| Multiple characters mentioned by name or role | Script |
| Input contains structured segments (A: ..., B: ...) | Script |
| Single paragraph of text, no character markers | Quick |
| "读一下", "read this", "TTS", "朗读" with plain text | Quick |
| Ambiguous | Quick (default) |
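The signal table can be approximated with plain pattern matching. The sketch below is illustrative only: `detect_mode` is a hypothetical helper name, the keyword match is a naive substring test (so a word like "description" would also trigger Script mode), and the "multiple characters mentioned by name or role" signal is not covered because it needs real language understanding.

```shell
# detect_mode TEXT -> prints "script" or "quick" (hypothetical helper)
detect_mode() {
  # Lowercase ASCII letters so "Script" and "script" both match.
  text=$(printf '%s' "$1" | LC_ALL=C tr 'A-Z' 'a-z')
  case "$text" in
    *多角色*|*脚本*|*对话*|*script*|*dialogue*|*multi-speaker*)
      echo script; return 0 ;;
  esac
  # Count lines that look like "Name: ..." segments (ASCII colon only).
  segments=$(printf '%s\n' "$text" \
    | grep -c -E '^[[:space:]]*[^[:space:]:]+[[:space:]]*:' || true)
  # Two or more structured segments -> Script; everything else defaults to Quick.
  if [ "$segments" -ge 2 ]; then echo script; else echo quick; fi
}
```

Anything that matches no signal falls through to `quick`, which mirrors the "Ambiguous" row of the table.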
Interaction Flow
Step -1: API Key Check
Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow `shared/config-pattern.md` Step 0.
**If file doesn't exist**: ask location, then create immediately:
```bash
mkdir -p ".listenhub/tts"
echo '{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultSpeakers":{}}' > ".listenhub/tts/config.json"
CONFIG_PATH=".listenhub/tts/config.json"
# (or $HOME/.listenhub/tts/config.json for global)
```
Then run **Setup Flow** below.
**If file exists**: read config, display summary, and confirm:
当前配置 (tts):
输出方式:{inline / download / both}
语言偏好:{zh / en / 未设置}
默认主播:{speakerName / 未设置}
Ask: "使用已保存的配置?" → **确认,直接继续** / **重新配置**
Setup Flow (first run or reconfigure)
Ask these questions in order, then save all answers to config at once:
- outputMode: Follow `shared/output-mode.md` § Setup Flow Question.
- Language (optional): "默认语言?"
  - "中文 (zh)"
  - "English (en)"
  - "每次手动选择" → keep `null`
After collecting answers, save immediately:
```bash
# Save outputMode; only update language if user picked one
# Follow shared/output-mode.md § Save to Config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
# If language was chosen (not "每次手动选择"):
NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "zh" '. + {"language": $lang}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
```
Note: `defaultSpeakers` are saved after speaker selection in Step 3, not here.
Quick Mode: `POST /v1/tts`
Step 1: Extract text
Get the text to convert. If the user hasn't provided it, ask:
"What text would you like me to read aloud?"
Step 2: Determine voice
- If `config.defaultSpeakers.{language}[0]` is set → use it silently (skip to Step 4)
- Otherwise: `GET /speakers/list?language={detected-language}`, then follow `shared/speaker-selection.md` (text table + free-text input)
Step 3: Save preference
Question: "Save {voice name} as your default voice for {language}?"
Options:
- "Yes": update .listenhub/tts/config.json
- "No": use for this session only
Step 4: Confirm
Ready to generate:
Text: "{first 80 chars}..."
Voice: {voice name}
Proceed?
Step 5: Generate
```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "...", "voice": "..."}' \
  --output /tmp/tts-output.mp3
```
Step 6: Present result
Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior.
Use a timestamped jobId: `$(date +%s)`
**`inline` or `both`**:
```bash
JOB_ID=$(date +%s)
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "...", "voice": "..."}' \
  --output /tmp/tts-${JOB_ID}.mp3
```
Then use the Read tool on `/tmp/tts-{jobId}.mp3`.
Present:
Audio generated!
**`download` or `both`**:
```bash
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/tts/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "...", "voice": "..."}' \
  --output "${JOB_DIR}/${JOB_ID}.mp3"
```
Present:
Audio generated!
已下载到 .listenhub/tts/{YYYY-MM-DD}-{jobId}/:
{jobId}.mp3
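The `-d '{"input": "...", "voice": "..."}'` bodies in the Quick mode commands are placeholders. Real text often contains quotes or newlines, which must be JSON-escaped before it is sent. A minimal sketch, assuming `jq` is installed (the config steps already rely on it); `build_tts_body` is a hypothetical helper name, not part of the API:

```shell
# build_tts_body TEXT VOICE -> prints a JSON request body for /v1/tts
# (hypothetical helper; jq handles all string escaping)
build_tts_body() {
  jq -n --arg input "$1" --arg voice "$2" '{input: $input, voice: $voice}'
}
```

The result can be passed to curl in place of the literal body, for example `-d "$(build_tts_body "$TEXT" "$VOICE")"`, where `$TEXT` and `$VOICE` are assumed to hold the user's text and the selected voice.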
Script Mode: `POST /v1/speech`
Step 1: Get scripts
Determine whether the user already has a scripts array:
- Already provided (JSON or clear segments): parse and display for confirmation
- Not yet provided: help the user structure segments. Ask: "Please provide the script with speaker assignments. Format: each line as `SpeakerName: text content`. I'll convert it."
Once the user provides the script, parse it into the `scripts` JSON format.
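As a rough illustration of the first half of that parsing step, the sketch below splits `SpeakerName: text content` lines into tab-separated name/content pairs, which can then be matched to speaker IDs in Step 2. `parse_script` is a hypothetical helper; it splits on the first ASCII colon only, so lines that use a full-width colon (:) or have no colon are skipped.

```shell
# parse_script < script.txt -> prints "name<TAB>content" per segment
# (hypothetical helper; splits each line on its first ASCII colon)
parse_script() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *:*)
        name=${line%%:*}   # everything before the first colon
        text=${line#*:}    # everything after the first colon
        text=${text# }     # drop one leading space, if present
        printf '%s\t%s\n' "$name" "$text"
        ;;
    esac
  done
}
```

Running `printf 'A: hello\nB: thanks\n' | parse_script | cut -f1 | sort -u` lists the unique characters that need a voice.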
Step 2: Assign voices per character
For each unique character in the script:
- If `config.defaultSpeakers.{language}` has saved voices → auto-assign silently (one per character in order)
- Otherwise: fetch `GET /speakers/list?language={detected-language}` and follow `shared/speaker-selection.md` for each character
Step 3: Save preferences
After all voices are assigned (if any were new):
Question: "Save these voice assignments for future sessions?"
Options:
- "Yes": update defaultSpeakers in .listenhub/tts/config.json
- "No": use for this session only
Step 4: Confirm
Ready to generate:
Characters:
{name}: {voice}
{name}: {voice}
Segments: {count}
Title: (auto-generated)
Proceed?
Step 5: Generate
Write the request body to a temp file, then submit:
```bash
# Write request to temp file
cat > /tmp/lh-speech-request.json << 'ENDJSON'
{
  "scripts": [
    {"content": "...", "speakerId": "..."},
    {"content": "...", "speakerId": "..."}
  ]
}
ENDJSON
# Submit
curl -sS -X POST "https://api.marswave.ai/openapi/v1/speech" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/lh-speech-request.json
# Clean up the temp request file
rm /tmp/lh-speech-request.json
```
Step 6: Present result
Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior.
**`inline` or `both`**: Display the `audioUrl` and `subtitlesUrl` as clickable links.
Present:
Audio generated!
在线收听:{audioUrl}
字幕:{subtitlesUrl}
时长:{audioDuration / 1000}s
消耗积分:{credits}
**`download` or `both`**: Also download the file.
```bash
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/tts/${DATE}-{jobId}"
mkdir -p "$JOB_DIR"
curl -sS -o "${JOB_DIR}/{jobId}.mp3" "{audioUrl}"
```
Present the download path in addition to the above summary.
Updating Config
When saving preferences, merge into `.listenhub/tts/config.json`; do not overwrite unchanged keys. Follow the merge pattern in `shared/config-pattern.md`.
- Quick voice: set `defaultSpeakers.{language}[0]` to the selected `speakerId`
- Script voices: set `defaultSpeakers.{language}` to the full array assigned this session
- Language: set `language` if the user explicitly specifies it
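One way to perform such a merge is a `jq` path assignment, which rewrites a single nested value and leaves every other key as it was. This is a sketch under assumptions, not the pattern from `shared/config-pattern.md`: `set_default_voice` is a hypothetical helper, and `jq` and `mktemp` are assumed to be available.

```shell
# set_default_voice CONFIG_PATH LANGUAGE SPEAKER_ID
# Hypothetical helper: sets defaultSpeakers.<language>[0] and keeps all other keys.
set_default_voice() {
  config_path=$1
  lang=$2
  speaker_id=$3
  tmp=$(mktemp)
  # Write to a temp file first so a jq failure cannot truncate the config.
  jq --arg lang "$lang" --arg id "$speaker_id" \
    '.defaultSpeakers[$lang][0] = $id' "$config_path" > "$tmp" \
    && mv "$tmp" "$config_path"
}
```

For Script mode the same idea applies with an array value, for example `jq --argjson ids '["id-a","id-b"]' --arg lang zh '.defaultSpeakers[$lang] = $ids'`, where the IDs shown are placeholders.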
API Reference
- TTS & Speech endpoints: `shared/api-tts.md`
- Speaker list: `shared/api-speakers.md`
- Speaker selection guide: `shared/speaker-selection.md`
- Error handling: `shared/common-patterns.md` § Error Handling
- Long text input: `shared/common-patterns.md` § Long Text Input
Composability
- Invokes: speakers API (for speaker selection)
- Invoked by: explainer (for voiceover)
Examples
Quick mode:
"TTS this: The server will be down for maintenance at midnight."
- Detect: Quick mode (plain text, "TTS this")
- Read config: `quickVoice` is `null`
- Fetch speakers, user picks "Yuanye"
- Ask to save → yes → update config
- `POST /v1/tts` with `input` + `voice`
- Present: `/tmp/tts-output.mp3`
Script mode:
"帮我做一段双人对话配音,A说:欢迎大家,B说:谢谢邀请"
- Detect: Script mode ("双人对话")
- Parse segments: A → "欢迎大家", B → "谢谢邀请"
- Read config: `scriptVoices` empty
- Fetch `zh` speakers, assign A and B voices
- Ask to save → yes → update config
- `POST /v1/speech` with scripts array
- Present: `audioUrl`, `subtitlesUrl`, duration