alicloud-ai-audio-tts
Category: provider
Model Studio Qwen TTS
Critical model name
Use the recommended model: qwen3-tts-flash
Prerequisites
- Install SDK (recommended in a venv to avoid PEP 668 limits):

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
```

- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials (env takes precedence).
Normalized interface (tts.generate)

Request
- text (string, required)
- voice (string, required)
- language_type (string, optional; default Auto)
- stream (bool, optional; default false)
Response
- audio_url (string, when stream=false)
- audio_base64_pcm (string, when stream=true)
- sample_rate (int, 24000)
- format (string, wav or pcm depending on mode)
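The request and response shapes above can be written down as type hints. The field names come from this page; the TypedDict wrappers themselves are an illustrative sketch, not SDK types.

```python
from typing import TypedDict


class TTSRequest(TypedDict, total=False):
    text: str           # required
    voice: str          # required
    language_type: str  # optional; defaults to "Auto"
    stream: bool        # optional; defaults to False


class TTSResponse(TypedDict, total=False):
    audio_url: str         # present when stream=False
    audio_base64_pcm: str  # present when stream=True
    sample_rate: int       # 24000
    format: str            # "wav" or "pcm" depending on mode


req: TTSRequest = {
    "text": "Hello, this is a short voice line.",
    "voice": "Cherry",
    "language_type": "English",
    "stream": False,
}
```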
Quick start (Python + DashScope SDK)

```python
import os
import dashscope

# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].
# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    stream=False,
)
audio_url = response.output.audio.url
print(audio_url)
```
Streaming notes
- stream=True returns Base64-encoded PCM chunks at 24kHz.
- Decode chunks and play or concatenate to a PCM buffer.
- The response contains finish_reason == "stop" when the stream ends.
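Decoding and concatenating the streamed chunks can be sketched as below. The helper name and the synthetic chunk data are illustrative; the actual chunks come from the audio_base64_pcm field described above.

```python
import base64


def assemble_pcm(chunks: list[str]) -> bytes:
    """Decode Base64 PCM chunks and concatenate them into one raw buffer.

    The result is headerless PCM at 24kHz; wrap it in a WAV header
    (e.g. with the wave module) before playback if your player needs one.
    """
    return b"".join(base64.b64decode(chunk) for chunk in chunks)


# Synthetic example: two tiny "chunks" standing in for streamed audio.
chunks = [
    base64.b64encode(b"\x00\x01").decode(),
    base64.b64encode(b"\x02\x03").decode(),
]
pcm = assemble_pcm(chunks)
```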
Operational guidance
- Keep requests concise; split long text into multiple calls if you hit size or timeout errors.
- Use a language_type consistent with the text to improve pronunciation.
- Cache by (text, voice, language_type) to avoid repeat costs.
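A minimal in-memory cache keyed by (text, voice, language_type), as suggested above. The synth callable is a stand-in for the real API call; a production cache would likely persist to disk instead of a dict.

```python
from typing import Callable

_cache: dict[tuple[str, str, str], str] = {}


def cached_tts(text: str, voice: str, language_type: str,
               synth: Callable[[str, str, str], str]) -> str:
    """Return the cached audio URL when an identical request was made before."""
    key = (text, voice, language_type)
    if key not in _cache:
        _cache[key] = synth(text, voice, language_type)
    return _cache[key]
```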
Output location
- Default output: output/ai-audio-tts/audio/
- Override the base dir with OUTPUT_DIR.
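Resolving the output directory with the OUTPUT_DIR override can be as simple as the sketch below; the path layout comes from this page, the helper itself is illustrative.

```python
import os
from pathlib import Path


def output_dir() -> Path:
    """Base dir from OUTPUT_DIR if set, else the default 'output'."""
    base = Path(os.environ.get("OUTPUT_DIR", "output"))
    return base / "ai-audio-tts" / "audio"
```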
References
- references/api_reference.md for parameter mapping and streaming example.
- Source list: references/sources.md