aliyun-wan-digital-human
Category: provider
Model Studio Digital Human
Validation
```bash
mkdir -p output/aliyun-wan-digital-human
python -m py_compile skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py && echo "py_compile_ok" > output/aliyun-wan-digital-human/validate.txt
```

Pass criteria: the command exits 0 and `output/aliyun-wan-digital-human/validate.txt` is generated.

Output And Evidence
- Save normalized request payloads, the chosen resolution, and task polling snapshots under `output/aliyun-wan-digital-human/`.
- Record the image/audio URLs and whether the input image passed detection.

Use this skill for image + audio driven speaking, singing, or presenting characters.
Critical model names
Use these exact model strings: `wan2.2-s2v-detect` and `wan2.2-s2v`

Selection guidance:
- Run `wan2.2-s2v-detect` first to validate the image.
- Use `wan2.2-s2v` for the actual video generation job.
Prerequisites
- China mainland (Beijing) region only.
- Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
- Input audio should contain clear speech or singing, and the input image should depict a clear subject.
Normalized interface (video.digital_human)
Detect Request
- `model` (string, optional): default `wan2.2-s2v-detect`
- `image_url` (string, required)
Generate Request
- `model` (string, optional): default `wan2.2-s2v`
- `image_url` (string, required)
- `audio_url` (string, required)
- `resolution` (string, optional): `480P` or `720P`
- `scenario` (string, optional): `talk`, `sing`, or `perform`
Response
- `task_id` (string)
- `task_status` (string)
- `video_url` (string, when finished)
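A polling loop over this response shape might look like the sketch below. `fetch_status` is a hypothetical callable standing in for whatever HTTP call your integration uses, and the terminal status values are assumptions rather than documented strings:

```python
# Hedged sketch of polling the normalized response until the task finishes.
# fetch_status(task_id) is a hypothetical callable returning the normalized
# dict ({"task_id", "task_status", "video_url"}); the terminal status
# values are assumed, not documented.
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def poll_task(fetch_status, task_id: str,
              interval_s: float = 5.0, timeout_s: float = 600.0) -> dict:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = fetch_status(task_id)
        if resp["task_status"] in TERMINAL_STATUSES:
            return resp
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} still pending after {timeout_s}s")
```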
Quick start
```bash
python skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py \
  --image-url "https://example.com/anchor.png" \
  --audio-url "https://example.com/voice.mp3" \
  --resolution 720P \
  --scenario talk
```

Operational guidance
- Use a portrait, half-body, or full-body image with a clear face and stable framing.
- Match audio length to the desired output duration; the output follows the audio length up to the model limit.
- Keep image and audio as public HTTP/HTTPS URLs.
- If the image fails detection, do not proceed directly to video generation.
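The URL requirement in the guidance above is easy to pre-check before submitting a job. This is a small illustrative helper, not part of the skill's script; it only verifies the scheme and host, not actual public reachability:

```python
# Sketch of a pre-flight check for the "public HTTP/HTTPS URL" rule.
# Hypothetical helper: validates the URL shape only, not reachability.
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```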
Output location
- Default output: `output/aliyun-wan-digital-human/request.json`
- Override the output base directory with the `OUTPUT_DIR` environment variable.
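The path resolution can be sketched as below, assuming `OUTPUT_DIR` replaces the default `output` base directory (an assumption about the prepare script's behavior, not confirmed by the source):

```python
# Hedged sketch of output-path resolution: OUTPUT_DIR is assumed to
# replace the default "output" base directory.
import os
from pathlib import Path


def request_path() -> Path:
    base = os.environ.get("OUTPUT_DIR", "output")
    return Path(base) / "aliyun-wan-digital-human" / "request.json"
```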
References
references/sources.md