aliyun-cosyvoice-voice-design

Category: provider

Model Studio CosyVoice Voice Design

Use the CosyVoice voice enrollment API to create designed voices from a natural-language voice description.

Critical model names

Use model="voice-enrollment" and one of these target_model values:
  • cosyvoice-v3.5-plus
  • cosyvoice-v3.5-flash
  • cosyvoice-v3-plus
  • cosyvoice-v3-flash
Recommended default in this repo:
  • target_model="cosyvoice-v3.5-plus"

Region and compatibility

  • cosyvoice-v3.5-plus and cosyvoice-v3.5-flash are available only in China mainland deployment mode (Beijing endpoint).
  • In international deployment mode (Singapore endpoint), cosyvoice-v3-plus and cosyvoice-v3-flash do not support voice clone/design.
  • The target_model must match the model used later for speech synthesis.
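
The compatibility rules above can be checked before any request is sent. The following is a minimal sketch, not part of any SDK: the region labels "cn" and "intl", the table, and both function names are hypothetical, and "must match" is assumed here to mean an exact string match.

```python
# Hypothetical lookup table built from the notes above.
# "cn" = China mainland deployment mode (Beijing endpoint),
# "intl" = international deployment mode (Singapore endpoint).
VOICE_DESIGN_MODELS = {
    "cn": frozenset({
        "cosyvoice-v3.5-plus",
        "cosyvoice-v3.5-flash",
        "cosyvoice-v3-plus",
        "cosyvoice-v3-flash",
    }),
    # v3.5 models are mainland-only, and v3 models do not support
    # voice clone/design in international mode, so nothing qualifies here.
    "intl": frozenset(),
}


def check_voice_design_target(target_model: str, region: str = "cn") -> None:
    """Raise ValueError if target_model cannot be used for voice design in region."""
    if region not in VOICE_DESIGN_MODELS:
        raise ValueError(f"unknown region label: {region!r}")
    if target_model not in VOICE_DESIGN_MODELS[region]:
        raise ValueError(
            f"{target_model!r} does not support voice design in region {region!r}"
        )


def check_synthesis_model(target_model: str, synthesis_model: str) -> None:
    """Raise ValueError if the synthesis model differs from the enrollment target.

    Assumption: matching means the two model names are identical strings.
    """
    if target_model != synthesis_model:
        raise ValueError(
            f"voice was designed for {target_model!r} but synthesis uses {synthesis_model!r}"
        )
```

Calling check_voice_design_target("cosyvoice-v3.5-plus", "cn") returns silently, while the same model with "intl" raises ValueError.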

Endpoint

  • Domestic: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  • International: https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
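
Selecting between the two endpoints can be kept in one place. This is a small sketch; the URLs come from the list above, while the "cn"/"intl" labels and the function name are hypothetical.

```python
# Endpoint URLs as listed above; the dictionary keys are illustrative labels.
CUSTOMIZATION_ENDPOINTS = {
    "cn": "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization",
    "intl": "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
}


def customization_endpoint(region: str = "cn") -> str:
    """Return the customization endpoint for a region label, or raise ValueError."""
    try:
        return CUSTOMIZATION_ENDPOINTS[region]
    except KeyError:
        raise ValueError(f"unknown region label: {region!r}") from None
```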

Prerequisites

  • Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
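
One way to resolve the key in that order is sketched below. It rests on assumptions that the source does not confirm: that the credentials file is INI-style with section headers, and that dashscope_api_key may sit in any section. The function name is hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    credentials_path: Optional[str] = None,
) -> Optional[str]:
    """Return the API key from the environment, else from the credentials file."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key

    path = (
        Path(credentials_path)
        if credentials_path
        else Path.home() / ".alibabacloud" / "credentials"
    )
    if not path.is_file():
        return None

    # Assumption: INI format. Interpolation is disabled so '%' in a key is safe.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
    except configparser.Error:
        return None  # not parseable as INI; treat as "no key found"

    for section in parser.sections():
        value = parser.get(section, "dashscope_api_key", fallback=None)
        if value and value.strip():
            return value.strip()
    return None
```

The env and credentials_path parameters exist only to make the lookup testable without touching the real environment.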

Normalized interface (cosyvoice.voice_design)

Request

  • model (string, optional): fixed to voice-enrollment
  • target_model (string, optional): default cosyvoice-v3.5-plus
  • prefix (string, required): letters/digits only, max 10 chars
  • voice_prompt (string, required): max 500 chars, Chinese or English only
  • preview_text (string, required): max 200 chars, Chinese or English
  • language_hints (array[string], optional): zh or en, and should match preview_text
  • sample_rate (int, optional): e.g. 24000
  • response_format (string, optional): e.g. wav
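
The field constraints above can be enforced when the normalized request is assembled. The sketch below is illustrative only: the function name is hypothetical, "letters/digits" is assumed to mean ASCII letters and digits, and lengths are counted in Python characters. It does not check the language of the text itself.

```python
import re
from typing import Any, Dict, List, Optional


def build_design_request(
    prefix: str,
    voice_prompt: str,
    preview_text: str,
    target_model: str = "cosyvoice-v3.5-plus",
    language_hints: Optional[List[str]] = None,
    sample_rate: Optional[int] = None,
    response_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented limits and return a normalized request dict."""
    # Assumption: "letters/digits only" means ASCII [A-Za-z0-9].
    if not re.fullmatch(r"[A-Za-z0-9]{1,10}", prefix):
        raise ValueError("prefix must be 1-10 letters or digits")
    if not voice_prompt or len(voice_prompt) > 500:
        raise ValueError("voice_prompt is required, max 500 chars")
    if not preview_text or len(preview_text) > 200:
        raise ValueError("preview_text is required, max 200 chars")
    if language_hints is not None:
        bad = [hint for hint in language_hints if hint not in ("zh", "en")]
        if bad:
            raise ValueError(f"unsupported language_hints: {bad}")

    request: Dict[str, Any] = {
        "model": "voice-enrollment",  # fixed value
        "target_model": target_model,
        "prefix": prefix,
        "voice_prompt": voice_prompt,
        "preview_text": preview_text,
    }
    # Optional fields are included only when the caller sets them.
    if language_hints is not None:
        request["language_hints"] = list(language_hints)
    if sample_rate is not None:
        request["sample_rate"] = sample_rate
    if response_format is not None:
        request["response_format"] = response_format
    return request
```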

Response

  • voice_id (string)
  • request_id (string)
  • status (string, optional)
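
A typed view of the normalized response might look like the sketch below. It models only the three fields listed above; the class and function names are hypothetical, and the shape of the raw API payload is not assumed.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class VoiceDesignResult:
    """Normalized response fields as listed above."""
    voice_id: str
    request_id: str
    status: Optional[str] = None


def parse_design_response(payload: Mapping[str, Any]) -> VoiceDesignResult:
    """Build a VoiceDesignResult from an already-normalized mapping."""
    missing = [
        name for name in ("voice_id", "request_id")
        if not isinstance(payload.get(name), str)
    ]
    if missing:
        raise ValueError(f"response is missing string fields: {missing}")
    status = payload.get("status")
    if status is not None and not isinstance(status, str):
        raise ValueError("status must be a string when present")
    return VoiceDesignResult(
        voice_id=payload["voice_id"],
        request_id=payload["request_id"],
        status=status,
    )
```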

Operational guidance

  • Keep voice_prompt concrete: timbre, age range, pace, emotion, articulation, and scenario.
  • If language_hints is used, it should match the language of preview_text.
  • Designed voice names include a -vd- marker in the generated backend naming convention.
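
To keep prompts concrete, one option is to require all six facets before composing the description. The sketch below is only an illustration: the facet keys, the comma-joined output, and both function names are hypothetical, and the -vd- check is a simple substring test based on the note above.

```python
# Facets named in the guidance above; the key spellings are illustrative.
VOICE_FACETS = ("timbre", "age_range", "pace", "emotion", "articulation", "scenario")


def compose_voice_prompt(**facets: str) -> str:
    """Join the six facets into one prompt, rejecting empty or missing ones."""
    missing = [name for name in VOICE_FACETS if not facets.get(name, "").strip()]
    if missing:
        raise ValueError(f"voice_prompt is missing facets: {missing}")
    prompt = ", ".join(facets[name].strip() for name in VOICE_FACETS)
    if len(prompt) > 500:  # documented voice_prompt limit
        raise ValueError("composed voice_prompt exceeds 500 chars")
    return prompt


def looks_like_designed_voice(voice_name: str) -> bool:
    """Return True if the name carries the -vd- marker described above."""
    return "-vd-" in voice_name
```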

Local helper script

Prepare a normalized request JSON:
bash
python skills/ai/audio/aliyun-cosyvoice-voice-design/scripts/prepare_cosyvoice_design_request.py \
  --target-model cosyvoice-v3.5-plus \
  --prefix announcer \
  --voice-prompt "沉稳的中年男性播音员,低沉有磁性,语速平稳,吐字清晰。" \
  --preview-text "各位听众朋友,大家好,欢迎收听晚间新闻。" \
  --language-hint zh

Validation

bash
mkdir -p output/aliyun-cosyvoice-voice-design
# Track failures so that one bad file makes the whole check fail.
ok=1
for f in skills/ai/audio/aliyun-cosyvoice-voice-design/scripts/*.py; do
  python3 -m py_compile "$f" || ok=0
done
[ "$ok" -eq 1 ] && echo "py_compile_ok" > output/aliyun-cosyvoice-voice-design/validate.txt
Pass criteria: command exits 0 and output/aliyun-cosyvoice-voice-design/validate.txt is generated.

Output And Evidence

  • Save artifacts, command outputs, and API response summaries under output/aliyun-cosyvoice-voice-design/.
  • Include target_model, prefix, voice_prompt, and preview_text in the evidence file.
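
Writing the evidence file could be done along these lines. This is a sketch under assumptions: the file name evidence.json, the JSON layout, and the function name are hypothetical; only the output directory and the four required fields come from the list above.

```python
import json
from pathlib import Path
from typing import Any, Mapping

REQUIRED_EVIDENCE_FIELDS = ("target_model", "prefix", "voice_prompt", "preview_text")


def write_evidence(
    request: Mapping[str, Any],
    response_summary: Mapping[str, Any],
    out_dir: str = "output/aliyun-cosyvoice-voice-design",
) -> Path:
    """Write the required request fields and a response summary as JSON."""
    missing = [name for name in REQUIRED_EVIDENCE_FIELDS if name not in request]
    if missing:
        raise ValueError(f"request is missing evidence fields: {missing}")

    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    evidence = {
        "request": {name: request[name] for name in REQUIRED_EVIDENCE_FIELDS},
        "response": dict(response_summary),
    }
    path = directory / "evidence.json"  # hypothetical file name
    # ensure_ascii=False keeps Chinese prompt text readable in the file.
    path.write_text(json.dumps(evidence, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```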

References

  • references/api_reference.md
  • references/sources.md