alicloud-ai-audio-cosyvoice-voice-clone

Category: provider

Model Studio CosyVoice Voice Clone

Use the CosyVoice voice enrollment API to create cloned voices from public reference audio.

Critical model names

Use model="voice-enrollment" and one of these target_model values:
  • cosyvoice-v3.5-plus
  • cosyvoice-v3.5-flash
  • cosyvoice-v3-plus
  • cosyvoice-v3-flash
  • cosyvoice-v2
Recommended default in this repo:
  • target_model="cosyvoice-v3.5-plus"
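
A minimal sketch of guarding against typos in these names before a request is built. The model names are copied from the list above; the constant and function names are illustrative and are not taken from the repo's helper script.

```python
# Values copied from the list above; all identifiers here are illustrative.
ENROLLMENT_MODEL = "voice-enrollment"
TARGET_MODELS = (
    "cosyvoice-v3.5-plus",
    "cosyvoice-v3.5-flash",
    "cosyvoice-v3-plus",
    "cosyvoice-v3-flash",
    "cosyvoice-v2",
)
DEFAULT_TARGET_MODEL = "cosyvoice-v3.5-plus"


def resolve_target_model(name=None):
    """Return a valid target_model, falling back to the recommended default."""
    if name is None:
        return DEFAULT_TARGET_MODEL
    if name not in TARGET_MODELS:
        raise ValueError(
            f"unknown target_model {name!r}; expected one of {TARGET_MODELS}"
        )
    return name
```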

Region and compatibility

  • cosyvoice-v3.5-plus and cosyvoice-v3.5-flash are available only in China mainland deployment mode (Beijing endpoint).
  • In international deployment mode (Singapore endpoint), cosyvoice-v3-plus and cosyvoice-v3-flash do not support voice clone/design.
  • The target_model used during enrollment must match the model used later in speech synthesis; otherwise, synthesis fails.
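
The restrictions above can be checked before any API call is made. The following is a sketch only: the mode labels "cn" and "intl" and the function name are hypothetical, and an empty result means only that none of the restrictions listed here apply, not that the service will accept the request.

```python
# Rules transcribed from the bullets above. The mode labels "cn" and "intl"
# are hypothetical names chosen for this sketch.
CN_ONLY_MODELS = {"cosyvoice-v3.5-plus", "cosyvoice-v3.5-flash"}
NO_CLONE_ON_INTL = {"cosyvoice-v3-plus", "cosyvoice-v3-flash"}


def clone_blockers(deployment_mode, target_model, synthesis_model=None):
    """Return the documented reasons this combination is known not to work."""
    if deployment_mode not in ("cn", "intl"):
        raise ValueError("deployment_mode must be 'cn' or 'intl'")
    reasons = []
    if deployment_mode == "intl":
        if target_model in CN_ONLY_MODELS:
            reasons.append(
                f"{target_model} is available only in China mainland mode"
            )
        if target_model in NO_CLONE_ON_INTL:
            reasons.append(
                f"{target_model} does not support voice clone in international mode"
            )
    # Enrollment and synthesis must use the same model.
    if synthesis_model is not None and synthesis_model != target_model:
        reasons.append("synthesis model must match the enrollment target_model")
    return reasons
```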

Endpoint

  • Domestic: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  • International: https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
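
One way to keep the two URLs in a single place is a small lookup, sketched below. The URLs are copied from the list above; the "cn"/"intl" keys and the function name are assumptions made for illustration.

```python
# URLs copied from the list above; the "cn"/"intl" keys are hypothetical labels.
CUSTOMIZATION_ENDPOINTS = {
    "cn": "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization",
    "intl": "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
}


def customization_endpoint(deployment_mode="cn"):
    """Return the enrollment endpoint for the given deployment mode."""
    try:
        return CUSTOMIZATION_ENDPOINTS[deployment_mode]
    except KeyError:
        raise ValueError(f"unknown deployment mode {deployment_mode!r}") from None
```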

Prerequisites

  • Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
  • Provide a public audio URL for the enrollment sample.
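
The key lookup order described above might be implemented roughly as follows. The layout of the credentials file is not specified here, so this sketch assumes an INI-style line such as `dashscope_api_key = sk-...`; the function name and its parameters are likewise hypothetical.

```python
import os
import re
from pathlib import Path

# Assumption: the credentials file contains a "dashscope_api_key = VALUE"
# (or "dashscope_api_key: VALUE") line, as in an INI-style file.
_KEY_LINE = re.compile(r"^\s*dashscope_api_key\s*[=:]\s*(\S+)\s*$")


def load_dashscope_api_key(credentials_path=None, environ=None):
    """Return the API key from the environment, else from the credentials file."""
    environ = os.environ if environ is None else environ
    key = environ.get("DASHSCOPE_API_KEY", "").strip()
    if key:
        return key

    if credentials_path is None:
        credentials_path = Path.home() / ".alibabacloud" / "credentials"
    path = Path(credentials_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            match = _KEY_LINE.match(line)
            if match:
                return match.group(1)
    raise RuntimeError(
        "Set DASHSCOPE_API_KEY or add dashscope_api_key to the credentials file"
    )
```

The environment variable takes precedence in this sketch, so a key exported for a single shell session overrides whatever is stored on disk.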

Normalized interface (cosyvoice.voice_clone)

Request

  • model (string, optional): fixed to voice-enrollment
  • target_model (string, optional): default cosyvoice-v3.5-plus
  • prefix (string, required): letters/digits only, max 10 chars
  • voice_sample_url (string, required): public audio URL
  • language_hints (array[string], optional): only first item is used
  • max_prompt_audio_length (float, optional): only for cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash
  • enable_preprocess (bool, optional): only for cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash
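
The field rules above can be enforced when the request is assembled. This is a sketch under stated assumptions, not the repo's helper script: the function name is hypothetical, "letters/digits" is read as ASCII letters and digits, and "public audio URL" is checked only as far as requiring an http(s) URL with a host.

```python
import re
from urllib.parse import urlparse

# Models that accept max_prompt_audio_length and enable_preprocess (from the list above).
EXTRA_OPTION_MODELS = {"cosyvoice-v3.5-plus", "cosyvoice-v3.5-flash", "cosyvoice-v3-flash"}


def build_clone_request(prefix, voice_sample_url, target_model="cosyvoice-v3.5-plus",
                        language_hints=None, max_prompt_audio_length=None,
                        enable_preprocess=None):
    """Build a cosyvoice.voice_clone request dict that follows the rules above."""
    # Assumption: "letters/digits" means ASCII letters and digits.
    if not re.fullmatch(r"[A-Za-z0-9]{1,10}", prefix):
        raise ValueError("prefix must be 1-10 letters or digits")
    parsed = urlparse(voice_sample_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("voice_sample_url must be an http(s) URL")

    request = {
        "model": "voice-enrollment",  # fixed value
        "target_model": target_model,
        "prefix": prefix,
        "voice_sample_url": voice_sample_url,
    }
    if language_hints:
        # Only the first item is used, so drop the rest up front.
        request["language_hints"] = [language_hints[0]]

    extras = {
        "max_prompt_audio_length": max_prompt_audio_length,
        "enable_preprocess": enable_preprocess,
    }
    for name, value in extras.items():
        if value is None:
            continue
        if target_model not in EXTRA_OPTION_MODELS:
            raise ValueError(f"{name} is not supported for {target_model}")
        request[name] = value
    return request
```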

Response

  • voice_id (string): use this as the voice parameter in later TTS calls
  • request_id (string)
  • usage.count (number, optional)
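
A short sketch of reading these fields and carrying voice_id into a later synthesis call. It assumes usage.count is nested as a "usage" object with a "count" key; the function name, the sample values, and the shape of tts_args are made up for illustration.

```python
def extract_voice(response):
    """Pull the fields described above out of a normalized response dict."""
    voice_id = response.get("voice_id")
    if not voice_id:
        raise ValueError(
            f"response has no voice_id (request_id={response.get('request_id')!r})"
        )
    return {
        "voice": voice_id,
        "request_id": response.get("request_id"),
        # usage.count is optional; assumed to be nested under "usage".
        "usage_count": (response.get("usage") or {}).get("count"),
    }


# Example payload with made-up values, for illustration only.
sample = {"voice_id": "myvoice-example-id", "request_id": "req-123", "usage": {"count": 1}}

# The synthesis model must be the same target_model used at enrollment.
tts_args = {"model": "cosyvoice-v3.5-plus", "voice": extract_voice(sample)["voice"]}
```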

Operational guidance

  • For Chinese dialect reference audio, keep language_hints=["zh"]; control dialect style later in synthesis via text or instruct.
  • For cosyvoice-v3.5-plus, supported language_hints include zh, en, fr, de, ja, ko, ru, pt, th, id, vi.
  • Avoid frequent enrollment calls; each call creates a new custom voice and consumes quota.
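
One possible way to avoid repeated enrollment for the same sample is a small local cache, sketched below. Nothing here is part of the repo: the cache file, the function name, and the `enroll` callback (which stands in for whatever code performs the real API call and returns a voice_id) are all hypothetical.

```python
import json
from pathlib import Path


def enroll_once(request, enroll, cache_path):
    """Return a cached voice_id for an identical request, else call `enroll`.

    `enroll` is a caller-supplied placeholder that performs the real
    enrollment and returns the resulting voice_id.
    """
    cache_file = Path(cache_path)
    cache = {}
    if cache_file.is_file():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))

    # The same model, prefix, and sample URL identify the same enrollment.
    key = "|".join(
        [request["target_model"], request["prefix"], request["voice_sample_url"]]
    )
    if key not in cache:
        cache[key] = enroll(request)
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache[key]
```

Because target_model is part of the cache key, a voice enrolled for one model is never reused for another, which keeps the enrollment and synthesis models aligned.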

Local helper script

Prepare a normalized request JSON:
```bash
python skills/ai/audio/alicloud-ai-audio-cosyvoice-voice-clone/scripts/prepare_cosyvoice_clone_request.py \
  --target-model cosyvoice-v3.5-plus \
  --prefix myvoice \
  --voice-sample-url https://example.com/voice.wav \
  --language-hint zh
```

Validation

```bash
# Stop at the first script that fails to compile, so validate.txt
# is written only when every script compiles.
set -e
mkdir -p output/alicloud-ai-audio-cosyvoice-voice-clone
for f in skills/ai/audio/alicloud-ai-audio-cosyvoice-voice-clone/scripts/*.py; do
  python3 -m py_compile "$f"
done
echo "py_compile_ok" > output/alicloud-ai-audio-cosyvoice-voice-clone/validate.txt
```
Pass criteria: the command exits 0 and output/alicloud-ai-audio-cosyvoice-voice-clone/validate.txt is generated.
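
The same check can be sketched in Python with the standard-library py_compile module, for environments where a shell loop is inconvenient. The function name is hypothetical; the directory layout and the marker text follow the commands above.

```python
import py_compile
from pathlib import Path


def validate_scripts(scripts_dir, output_dir):
    """Byte-compile every *.py in scripts_dir, then write validate.txt."""
    for script in sorted(Path(scripts_dir).glob("*.py")):
        # doraise=True turns a compile error into py_compile.PyCompileError.
        py_compile.compile(str(script), doraise=True)
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    marker = out / "validate.txt"
    marker.write_text("py_compile_ok\n", encoding="utf-8")
    return marker
```

Called as `validate_scripts("skills/ai/audio/alicloud-ai-audio-cosyvoice-voice-clone/scripts", "output/alicloud-ai-audio-cosyvoice-voice-clone")`, this raises on the first script that fails to compile, so the marker file is written only when every script compiles.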

Output And Evidence

  • Save artifacts, command outputs, and API response summaries under output/alicloud-ai-audio-cosyvoice-voice-clone/.
  • Include target_model, prefix, and the sample URL in the evidence file.
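
A minimal sketch of writing such an evidence file. The file name evidence.json, the function name, and the JSON layout are assumptions; only the output directory and the three required fields come from the text above.

```python
import json
from pathlib import Path

OUTPUT_DIR = Path("output/alicloud-ai-audio-cosyvoice-voice-clone")


def write_evidence(request, response_summary, output_dir=OUTPUT_DIR,
                   filename="evidence.json"):
    """Record target_model, prefix, sample URL, and a response summary."""
    evidence = {
        "target_model": request["target_model"],
        "prefix": request["prefix"],
        "voice_sample_url": request["voice_sample_url"],
        "response": response_summary,
    }
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / filename
    path.write_text(
        json.dumps(evidence, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return path
```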

References

  • references/api_reference.md
  • references/sources.md