aliyun-wan-digital-human

Category: provider

Model Studio Digital Human


Validation


bash
mkdir -p output/aliyun-wan-digital-human
python -m py_compile skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py && echo "py_compile_ok" > output/aliyun-wan-digital-human/validate.txt
Pass criteria: the command exits 0 and output/aliyun-wan-digital-human/validate.txt is generated.

Output And Evidence


  • Save normalized request payloads, the chosen resolution, and task polling snapshots under output/aliyun-wan-digital-human/.
  • Record the image/audio URLs and whether the input image passed detection.
Use this skill for image + audio-driven speaking, singing, or presenting characters.

Critical model names


Use these exact model strings:
  • wan2.2-s2v-detect
  • wan2.2-s2v
Selection guidance:
  • Run wan2.2-s2v-detect first to validate the input image.
  • Use wan2.2-s2v for the actual video generation job.
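The detect-before-generate ordering can be sketched as a small planner. Only the model strings and the ordering come from this skill; the function name and trimmed payload shapes below are illustrative assumptions.

```python
# Illustrative sketch: plan the ordered calls for one digital-human job.
# Only the model strings and detect-before-generate ordering are taken from
# this skill; the payload shapes here are trimmed for brevity.

DETECT_MODEL = "wan2.2-s2v-detect"
GENERATE_MODEL = "wan2.2-s2v"

def plan_requests(image_url, audio_url):
    """Return the ordered (model, payload) pairs: detect first, then generate."""
    return [
        (DETECT_MODEL, {"image_url": image_url}),
        (GENERATE_MODEL, {"image_url": image_url, "audio_url": audio_url}),
    ]
```

In practice, only submit the second call if the detect step reports the image as valid.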

Prerequisites


  • Available in the China mainland (Beijing) region only.
  • Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
  • The input audio should contain clear speech or singing, and the input image should depict a clear subject.

Normalized interface (video.digital_human)


Detect Request


  • model (string, optional): defaults to wan2.2-s2v-detect
  • image_url (string, required)
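A minimal builder for this detect payload might look like the following; the function name is hypothetical, while the field names and default model come from the interface above.

```python
def build_detect_request(image_url, model="wan2.2-s2v-detect"):
    """Build a normalized video.digital_human detect payload."""
    if not image_url:
        raise ValueError("image_url is required")
    return {"model": model, "image_url": image_url}
```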

Generate Request


  • model (string, optional): defaults to wan2.2-s2v
  • image_url (string, required)
  • audio_url (string, required)
  • resolution (string, optional): 480P or 720P
  • scenario (string, optional): talk, sing, or perform
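The generate payload and its allowed optional values can be enforced with a small builder; the helper name is an assumption, while the field names, defaults, and allowed values come from the interface above.

```python
ALLOWED_RESOLUTIONS = {"480P", "720P"}
ALLOWED_SCENARIOS = {"talk", "sing", "perform"}

def build_generate_request(image_url, audio_url, resolution=None,
                           scenario=None, model="wan2.2-s2v"):
    """Build a normalized generate payload, rejecting unsupported values."""
    if not image_url or not audio_url:
        raise ValueError("image_url and audio_url are required")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if scenario is not None and scenario not in ALLOWED_SCENARIOS:
        raise ValueError(f"scenario must be one of {sorted(ALLOWED_SCENARIOS)}")
    payload = {"model": model, "image_url": image_url, "audio_url": audio_url}
    if resolution is not None:
        payload["resolution"] = resolution
    if scenario is not None:
        payload["scenario"] = scenario
    return payload
```

Optional fields are omitted rather than sent as null, so the service's own defaults apply.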

Response


  • task_id (string)
  • task_status (string)
  • video_url (string, present when the task has finished)
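Since video_url only appears once the task finishes, callers typically poll the task until a terminal status. A sketch, where fetch_task is a caller-supplied placeholder returning the response shape above, and the terminal status strings (SUCCEEDED / FAILED) are assumptions not stated in this document:

```python
import time

def wait_for_video(task_id, fetch_task, interval=5.0, timeout=600.0):
    """Poll fetch_task(task_id) until a terminal status; return video_url.

    fetch_task is a caller-supplied placeholder; the SUCCEEDED/FAILED
    status strings below are assumptions, not taken from this skill.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = fetch_task(task_id)
        status = resp.get("task_status")
        if status == "SUCCEEDED":
            return resp.get("video_url")
        if status == "FAILED":
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```

Save each polled response as a snapshot under the output directory to satisfy the evidence requirements above.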

Quick start


bash
python skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py \
  --image-url "https://example.com/anchor.png" \
  --audio-url "https://example.com/voice.mp3" \
  --resolution 720P \
  --scenario talk

Operational guidance


  • Use a portrait, half-body, or full-body image with a clear face and stable framing.
  • Match audio length to the desired output duration; the output follows the audio length up to the model limit.
  • Keep image and audio as public HTTP/HTTPS URLs.
  • If the image fails detection, do not proceed directly to video generation.
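Because the inputs must be public HTTP/HTTPS URLs, a cheap pre-flight shape check can catch mistakes before submitting. This only validates the scheme and host, not that the resource is actually reachable from the service:

```python
from urllib.parse import urlparse

def is_public_http_url(url):
    """Return True if url has an http/https scheme and a host component.

    Shape check only; it does not verify the resource is reachable.
    """
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```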

Output location


  • Default output: output/aliyun-wan-digital-human/request.json
  • Override the base directory with the OUTPUT_DIR environment variable.
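A minimal writer honoring the OUTPUT_DIR override might look like this; the function name is hypothetical, while the default path and environment variable come from this section.

```python
import json
import os
from pathlib import Path

def save_request(payload, base_dir=None):
    """Write payload to <base>/aliyun-wan-digital-human/request.json.

    base_dir falls back to the OUTPUT_DIR environment variable, then to
    "output", matching the override described above.
    """
    base = Path(base_dir or os.environ.get("OUTPUT_DIR", "output"))
    out_dir = base / "aliyun-wan-digital-human"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "request.json"
    out_path.write_text(json.dumps(payload, indent=2, ensure_ascii=False))
    return out_path
```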

References


  • references/sources.md