alicloud-ai-audio-tts

Category: provider

Model Studio Qwen TTS

Critical model name

Use the recommended model:
  • qwen3-tts-flash

Prerequisites

  • Install the SDK (a venv is recommended to avoid PEP 668 restrictions):

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
```

  • Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials (the environment variable takes precedence).
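
The lookup order described above can be sketched as follows. This is a minimal illustration, not part of the SDK: the helper name resolve_api_key is hypothetical, and it assumes the credentials file is INI-formatted with the key stored under a [default] section.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(credentials_path=None, env=None):
    """Return the DashScope API key, preferring the environment variable.

    Hypothetical helper: assumes an INI-style credentials file with a
    [default] section containing dashscope_api_key.
    """
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key  # the environment variable takes precedence

    path = Path(credentials_path or Path.home() / ".alibabacloud" / "credentials")
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file simply yields no sections
    return parser.get("default", "dashscope_api_key", fallback=None)
```

The function returns None when neither source provides a key, so callers can fail early with a clear message instead of sending an unauthenticated request.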

Normalized interface (tts.generate)

Request

  • text (string, required)
  • voice (string, required)
  • language_type (string, optional; default Auto)
  • stream (bool, optional; default false)

Response

  • audio_url (string, when stream=false)
  • audio_base64_pcm (string, when stream=true)
  • sample_rate (int, 24000)
  • format (string, wav or pcm depending on mode)
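
One way to pin down the normalized request and response shapes is with small dataclasses. The sketch below is illustrative only: the class names TtsRequest and TtsResponse and the helper to_call_kwargs are hypothetical, and the mapping onto SDK keyword arguments simply mirrors the parameter names listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TtsRequest:
    """Normalized tts.generate request (hypothetical container)."""
    text: str
    voice: str
    language_type: str = "Auto"
    stream: bool = False

    def __post_init__(self):
        # text and voice are the two required fields
        if not self.text:
            raise ValueError("text is required")
        if not self.voice:
            raise ValueError("voice is required")


@dataclass(frozen=True)
class TtsResponse:
    """Normalized tts.generate response (hypothetical container)."""
    audio_url: Optional[str] = None         # set when stream=false
    audio_base64_pcm: Optional[str] = None  # set when stream=true
    sample_rate: int = 24000
    format: str = "wav"                     # wav or pcm depending on mode


def to_call_kwargs(request: TtsRequest, model: str = "qwen3-tts-flash") -> dict:
    """Map the normalized request onto keyword arguments for the SDK call."""
    return {
        "model": model,
        "text": request.text,
        "voice": request.voice,
        "language_type": request.language_type,
        "stream": request.stream,
    }
```

Validating required fields at construction time keeps malformed requests from reaching the network call.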

Quick start (Python + DashScope SDK)

```python
import os
import dashscope

# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].

# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    stream=False,
)

audio_url = response.output.audio.url
print(audio_url)
```

Streaming notes

  • stream=True returns Base64-encoded PCM chunks at 24 kHz.
  • Decode the chunks and either play them or concatenate them into a PCM buffer.
  • The response contains finish_reason == "stop" when the stream ends.
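
The decode-and-concatenate step can be sketched without touching the network. The helpers below are hypothetical and take the Base64 strings as plain input; how each string is pulled out of a streamed SDK chunk is left to references/api_reference.md. The WAV writer assumes mono 16-bit samples, which the notes above do not state (only the 24 kHz rate is given), so adjust channels and sample_width if the actual stream differs.

```python
import base64
import wave


def pcm_from_chunks(base64_chunks):
    """Decode Base64 PCM chunks and concatenate them into one bytes buffer."""
    buffer = bytearray()
    for chunk in base64_chunks:
        if chunk:  # skip empty payloads, e.g. a final chunk with no audio
            buffer.extend(base64.b64decode(chunk))
    return bytes(buffer)


def write_wav(path, pcm, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM in a WAV container (mono 16-bit is an assumption)."""
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
```

Collecting into a bytearray avoids repeated bytes concatenation when a long utterance arrives as many small chunks.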

Operational guidance

  • Keep requests concise; split long text into multiple calls if you hit size or timeout errors.
  • Use a language_type consistent with the text to improve pronunciation.
  • Cache by (text, voice, language_type) to avoid repeat costs.
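
The splitting and caching advice above might look like this in practice. Both helpers are hypothetical, and the max_chars default of 300 is a placeholder rather than a documented service limit. The splitter breaks only at whitespace that follows '.', '!' or '?', so it suits English-style punctuation and keeps a single over-long sentence whole.

```python
import hashlib
import json
import re


def split_text(text, max_chars=300):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def cache_key(text, voice, language_type="Auto"):
    """Derive a stable cache key from (text, voice, language_type)."""
    payload = json.dumps([text, voice, language_type], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Hashing the JSON-encoded tuple gives a fixed-length key that is safe to use as a filename, so a cached result can be looked up before issuing a new call.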

Output location

  • Default output: output/ai-audio-tts/audio/
  • Override the base directory with OUTPUT_DIR.
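
A small path helper can make the override explicit. The function name resolve_output_dir is hypothetical, and the sketch assumes that OUTPUT_DIR replaces only the leading output component, leaving the ai-audio-tts/audio suffix unchanged; the text above does not spell out that detail.

```python
import os
from pathlib import Path


def resolve_output_dir(env=None):
    """Return the audio output directory, honoring OUTPUT_DIR if set.

    Assumption: OUTPUT_DIR replaces the base "output" directory and the
    "ai-audio-tts/audio" suffix is always appended.
    """
    env = os.environ if env is None else env
    base = Path(env.get("OUTPUT_DIR") or "output")
    return base / "ai-audio-tts" / "audio"
```

The helper only computes the path; call mkdir(parents=True, exist_ok=True) on the result before writing files.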

References

  • references/api_reference.md for parameter mapping and a streaming example.
  • Source list: references/sources.md