tts

When to Use

User wants to convert text to spoken audio
User asks for "read aloud", "TTS", "text to speech", "voice narration"
User says "朗读", "配音", "语音合成"
User wants multi-speaker scripted audio or dialogue

When NOT to Use

User wants a podcast-style discussion with topic exploration (use `/podcast`)
User wants an explainer video with visuals (use `/explainer`)
User wants to generate an image (use `/image-gen`)

Purpose

Convert text into natural-sounding speech audio. Two paths:

Quick mode (`/v1/tts`): Single voice, low-latency, synchronous MP3 stream. For casual chat, reading snippets, instant audio.
Script mode (`/v1/speech`): Multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content.

Hard Constraints

No shell scripts. Construct curl commands from the API reference files listed under API Reference
Always read `shared/authentication.md` for the API key and headers
Follow `shared/common-patterns.md` for errors and interaction patterns
Never hardcode speaker IDs; always fetch them from the speakers API
Always read config following `shared/config-pattern.md` before any interaction
Always follow `shared/speaker-selection.md` for speaker selection (text table + free-text input)
Never save files to `~/Downloads/` or `/tmp/` as primary output; use `.listenhub/tts/`

<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation API until the user has explicitly confirmed. </HARD-GATE>

Mode Detection

Determine the mode from the user's input automatically before asking any questions:

| Signal | Mode |
| --- | --- |
| "多角色", "脚本", "对话", "script", "dialogue", "multi-speaker" | Script |
| Multiple characters mentioned by name or role | Script |
| Input contains structured segments (A: ..., B: ...) | Script |
| Single paragraph of text, no character markers | Quick |
| "读一下", "read this", "TTS", "朗读" with plain text | Quick |
| Ambiguous | Quick (default) |
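
The signals above can be approximated with a small classifier. This is only a sketch: `detect_mode` is a hypothetical helper name, the keyword match is a plain substring test (so a word like "description" would also trigger the "script" keyword), and the "multiple characters mentioned by name or role" signal is not covered because it needs real language understanding.

```bash
# Hypothetical helper: classify a request as "script" or "quick".
detect_mode() {
  input="$1"
  # Explicit keywords from the signal table (English ones matched case-insensitively).
  if printf '%s' "$input" | grep -qiE '多角色|脚本|对话|script|dialogue|multi-speaker'; then
    echo "script"
    return
  fi
  # Two or more lines shaped like "Name: text" suggest structured segments.
  labelled=$(printf '%s\n' "$input" \
    | grep -cE '^[[:space:]]*[^:[:space:]][^:]{0,30}:[[:space:]]*[^[:space:]]' || true)
  if [ "$labelled" -ge 2 ]; then
    echo "script"
    return
  fi
  # Anything else, including ambiguous input, falls back to quick mode.
  echo "quick"
}
```

A single "TTS this: ..." line contains only one labelled line, so it stays in quick mode, which matches the default in the table.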

Interaction Flow

Step -1: API Key Check

Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately.

Step 0: Config Setup

Follow `shared/config-pattern.md` Step 0.

If the config file doesn't exist, ask where to store it, then create it immediately:

```bash
mkdir -p ".listenhub/tts"
echo '{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultSpeakers":{}}' > ".listenhub/tts/config.json"
CONFIG_PATH=".listenhub/tts/config.json"
```

(or $HOME/.listenhub/tts/config.json for global)

Then run **Setup Flow** below.

**If file exists** — read config, display summary, and confirm:

当前配置 (tts)：
  输出方式：{inline / download / both}
  语言偏好：{zh / en / 未设置}
  默认主播：{speakerName / 未设置}

Ask: "使用已保存的配置？" → **确认，直接继续** / **重新配置**
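
The two branches of Step 0 (file missing vs. file exists) can be sketched as a single check. `ensure_config` is a hypothetical helper name; the default JSON is copied from the create command above, and the path argument is whatever location the user chose.

```bash
# Hypothetical helper: create the config with defaults only when it is missing.
# Prints "created" (run the Setup Flow next) or "exists" (show summary and confirm).
ensure_config() {
  config_path="${1:-.listenhub/tts/config.json}"
  if [ ! -f "$config_path" ]; then
    mkdir -p "$(dirname "$config_path")"
    printf '%s\n' '{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultSpeakers":{}}' > "$config_path"
    echo "created"
  else
    echo "exists"
  fi
}
```

Checking for the file first avoids overwriting saved preferences when the skill runs a second time.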

Setup Flow (first run or reconfigure)

Ask these questions in order, then save all answers to config at once:

outputMode: Follow `shared/output-mode.md` § Setup Flow Question.
Language (optional): "默认语言？"
- "中文 (zh)"
- "English (en)"
- "每次手动选择" → keep `null`

After collecting answers, save immediately:

```bash
# Save outputMode; only update language if the user picked one
# Follow shared/output-mode.md § Save to Config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')

# If a language was chosen (not "每次手动选择"):
NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "zh" '. + {"language": $lang}')

echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
```


Note: `defaultSpeakers` are saved after speaker selection in Step 3 — not here.

Quick Mode: `POST /v1/tts`

Step 1: Extract text

Get the text to convert. If the user hasn't provided it, ask:

"What text would you like me to read aloud?"

Step 2: Determine voice

If `config.defaultSpeakers.{language}[0]` is set → use it silently (skip to Step 4)

Otherwise: `GET /speakers/list?language={detected-language}`, then follow `shared/speaker-selection.md` (text table + free-text input)

Step 3: Save preference

Question: "Save {voice name} as your default voice for {language}?"
Options:
  - "Yes" — update .listenhub/tts/config.json
  - "No" — use for this session only

Step 4: Confirm

Ready to generate:

  Text: "{first 80 chars}..."
  Voice: {voice name}

Proceed?

Step 5: Generate

```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "...", "voice": "..."}' \
  --output /tmp/tts-output.mp3
```
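
User text often contains quotes or newlines, which would break a hand-written `-d '{...}'` body. One possible way to build the body safely is to let `jq` do the escaping. This is a sketch, not part of the documented flow: `build_tts_payload` is a hypothetical helper, and the `input` and `voice` field names are simply the ones used in the request above.

```bash
# Hypothetical helper: emit a JSON body with properly escaped text.
#   $1 = text to read aloud, $2 = speaker ID fetched from the speakers API
build_tts_payload() {
  jq -n --arg input "$1" --arg voice "$2" '{input: $input, voice: $voice}'
}
```

The result could then be piped to curl with `-d @-`, for example `build_tts_payload "$TEXT" "$SPEAKER_ID" | curl ... -d @-`, where `$TEXT` and `$SPEAKER_ID` are placeholder variables.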

Step 6: Present result

Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior.

Use a timestamped jobId: `$(date +%s)`

`inline` or `both` (TTS quick mode returns a sync audio stream, so there is no `audioUrl`):

```bash
JOB_ID=$(date +%s)
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "...", "voice": "..."}' \
  --output /tmp/tts-${JOB_ID}.mp3
```

Then use the Read tool on `/tmp/tts-{jobId}.mp3`.

Present:

Audio generated!

`download` or `both`:

```bash
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/tts/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "...", "voice": "..."}' \
  --output "${JOB_DIR}/${JOB_ID}.mp3"
```

Present:

Audio generated!

已下载到 .listenhub/tts/{YYYY-MM-DD}-{jobId}/：
  {jobId}.mp3
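
Because the quick endpoint streams the response straight into the output file, an error response would also end up saved as `.mp3`. A quick sanity check before presenting the result might look like the sketch below. It assumes error responses are JSON bodies that start with `{`, which the source does not confirm (see `shared/common-patterns.md` § Error Handling for the actual behavior); `looks_like_audio` is a hypothetical helper name.

```bash
# Hypothetical check: succeeds if the file is non-empty and does not start with "{".
looks_like_audio() {
  file="$1"
  [ -s "$file" ] || return 1          # missing or empty file
  first_byte=$(head -c 1 "$file")     # read only the first byte
  [ "$first_byte" != "{" ]            # a JSON error body would start with "{"
}
```

If the check fails, printing the file contents with `cat` would show the error message instead of presenting a broken audio file.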

Script Mode: `POST /v1/speech`

Step 1: Get scripts

Determine whether the user already has a scripts array:

Already provided (JSON or clear segments): parse and display for confirmation
Not yet provided: help the user structure segments. Ask:
"Please provide the script with speaker assignments. Format: each line as `SpeakerName: text content`. I'll convert it."
Once the user provides the script, parse it into the `scripts` JSON format.
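
As an illustration of the parsing step, the sketch below splits `SpeakerName: text content` lines into tab-separated name and content pairs, which is enough to list the unique characters and count segments for the confirmation step. `parse_script_lines` is a hypothetical helper; it only recognizes the ASCII colon, so lines using the full-width `：` would need extra handling.

```bash
# Hypothetical helper: read script lines on stdin, print "name<TAB>content" per segment.
parse_script_lines() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *:*)
        name=${line%%:*}        # text before the first colon
        content=${line#*:}      # text after the first colon
        content=${content# }    # drop one leading space, if any
        if [ -n "$name" ] && [ -n "$content" ]; then
          printf '%s\t%s\n' "$name" "$content"
        fi
        ;;
    esac
  done
}
```

For example, `parse_script_lines < script.txt | cut -f1 | sort -u` lists the characters that still need a voice, and `parse_script_lines < script.txt | wc -l` gives the segment count (`script.txt` is a placeholder file name).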

Step 2: Assign voices per character

For each unique character in the script:

If `config.defaultSpeakers.{language}` has saved voices → auto-assign silently (one per character, in order)

Otherwise: fetch `GET /speakers/list?language={detected-language}` and follow `shared/speaker-selection.md` for each character

Step 3: Save preferences

After all voices are assigned (if any were new):

Question: "Save these voice assignments for future sessions?"
Options:
  - "Yes" — update defaultSpeakers in .listenhub/tts/config.json
  - "No" — use for this session only

Step 4: Confirm

Ready to generate:

  Characters:
    {name}: {voice}
    {name}: {voice}
  Segments: {count}
  Title: (auto-generated)

Proceed?

Step 5: Generate

Write the request body to a temp file, then submit:

```bash
# Write request to temp file
cat > /tmp/lh-speech-request.json << 'ENDJSON'
{
  "scripts": [
    {"content": "...", "speakerId": "..."},
    {"content": "...", "speakerId": "..."}
  ]
}
ENDJSON

# Submit
curl -sS -X POST "https://api.marswave.ai/openapi/v1/speech" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/lh-speech-request.json

rm /tmp/lh-speech-request.json
```


Step 6: Present result

Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior.

`inline` or `both`: Display the `audioUrl` and `subtitlesUrl` as clickable links.

Present:

Audio generated!

在线收听：{audioUrl}
字幕：{subtitlesUrl}
时长：{audioDuration / 1000}s
消耗积分：{credits}
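
The `{audioDuration / 1000}s` placeholder suggests the API reports the duration in milliseconds. Under that assumption, which the source does not state outright, the value could be formatted with integer arithmetic. `format_duration_ms` is a hypothetical helper name.

```bash
# Hypothetical helper: convert an integer millisecond count to seconds with three decimals.
format_duration_ms() {
  ms="$1"
  printf '%d.%03ds\n' "$((ms / 1000))" "$((ms % 1000))"
}
```

Integer arithmetic keeps the helper free of external tools such as `bc` or `awk`.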


`download` or `both`: Also download the file.

```bash
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/tts/${DATE}-{jobId}"
mkdir -p "$JOB_DIR"
curl -sS -o "${JOB_DIR}/{jobId}.mp3" "{audioUrl}"
```

Present the download path in addition to the above summary.

Updating Config

When saving preferences, merge into `.listenhub/tts/config.json`; do not overwrite unchanged keys. Follow the merge pattern in `shared/config-pattern.md`.

Quick voice: set `defaultSpeakers.{language}[0]` to the selected `speakerId`
Script voices: set `defaultSpeakers.{language}` to the full array assigned this session
Language: set `language` if the user explicitly specifies it
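
A merge that touches only one key could be sketched with a `jq` path assignment, as below. This is one possible shape, not the pattern from `shared/config-pattern.md` (which is not shown here); `set_default_voice` is a hypothetical helper, and the speaker ID must come from the speakers API rather than being hardcoded.

```bash
# Hypothetical helper: set defaultSpeakers[lang][0] and leave every other key untouched.
#   $1 = config path, $2 = language code, $3 = speaker ID from the speakers API
set_default_voice() {
  config_path="$1"
  lang="$2"
  speaker_id="$3"
  tmp_file="${config_path}.tmp"
  # Write to a temp file first so a jq failure cannot truncate the config.
  jq --arg lang "$lang" --arg id "$speaker_id" \
    '.defaultSpeakers[$lang][0] = $id' \
    "$config_path" > "$tmp_file" && mv "$tmp_file" "$config_path"
}
```

Writing through a temporary file matters because redirecting `jq` output directly onto its own input file would empty the file before `jq` reads it.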

API Reference

TTS & Speech endpoints: `shared/api-tts.md`
Speaker list: `shared/api-speakers.md`
Speaker selection guide: `shared/speaker-selection.md`
Error handling: `shared/common-patterns.md` § Error Handling
Long text input: `shared/common-patterns.md` § Long Text Input

Composability

Invokes: speakers API (for speaker selection)
Invoked by: explainer (for voiceover)

Examples

Quick mode:

"TTS this: The server will be down for maintenance at midnight."

Detect: Quick mode (plain text, "TTS this")
Read config: no default voice saved (`defaultSpeakers.{language}[0]` is unset)
Fetch speakers, user picks "Yuanye"
Ask to save → yes → update config
`POST /v1/tts` with `input` + `voice`
Present: `/tmp/tts-output.mp3`

Script mode:

"帮我做一段双人对话配音，A说：欢迎大家，B说：谢谢邀请"

Detect: Script mode ("双人对话")
Parse segments: A → "欢迎大家", B → "谢谢邀请"
Read config: `defaultSpeakers.zh` is empty
Fetch `zh` speakers, assign A and B voices
Ask to save → yes → update config
`POST /v1/speech` with scripts array
Present: `audioUrl`, `subtitlesUrl`, duration
