luma-digital-human

Luma Digital Human


Use this skill when an agent needs to create a digital-human spoken video from script text, a voice, and an avatar.

Read `../luma-shared/SKILL.md` first for common auth, project, output, and artifact rules.


Asset First


Inspect available voices and avatars:

```bash
luma-cli asset list voice
luma-cli asset list roles
```

If the user provides a reference voice sample, clone it first:

```bash
luma-cli voice clone ./voice.wav --name my_voice
```

If the user provides a video and says they want the sound, voice, tone, or audio from it, treat the video as a voice source and use voice clone. Do not upload that video as a `roles` avatar/source-role asset unless the user explicitly says they want the person or visual in the video to appear.

If the user provides a local avatar video, upload it:

```bash
luma-cli asset upload avatar.mp4 --group roles
```
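
The choice between voice clone and avatar upload described above can be sketched as a small dispatcher. This is a minimal illustration, not part of `luma-cli`: the function name `route_media` and the intent labels `voice` and `avatar` are hypothetical, and the function only prints the command it would pick. Whether `voice clone` accepts a video container directly or needs extracted audio is not stated in this skill, so treat the voice branch as an assumption.

```bash
# Print (do not run) the command an agent would pick for a user-supplied file.
# "voice" and "avatar" are hypothetical intent labels used only in this sketch.
route_media() {
  intent=$1
  file=$2
  case "$intent" in
    voice)
      # The user wants the sound, tone, or audio: treat the file as a voice source.
      echo "luma-cli voice clone $file --name my_voice" ;;
    avatar)
      # The user explicitly wants the person in the video to appear.
      echo "luma-cli asset upload $file --group roles" ;;
    *)
      # Purpose unclear: ask the user instead of guessing.
      echo "ASK: is this file for voice clone or for the avatar?" ;;
  esac
}

route_media voice ./interview.mp4
route_media avatar ./avatar.mp4
route_media unknown ./clip.mp4
```

Printing the command instead of running it keeps the decision reviewable before any asset is created.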


Standard Flow


Create or select a project:

```bash
luma-cli project create demo
luma-cli project use demo
```

Generate voice:

```bash
luma-cli --json tts "script text" --voice my_voice --speech-rate 1.1 --output step2_tts.wav
```

The `--json` flag returns `audio_object_key` in the output envelope. Use this key in the lip-sync step (step 3) to skip a redundant upload.
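
One way to pull the key out of the JSON output is sketched below. The envelope shape in `sample` is hypothetical: this skill only says that `--json` returns `audio_object_key`, so the surrounding fields and the example key value are made up for illustration. The helper `extract_audio_key` is likewise not part of `luma-cli`.

```bash
# Read JSON on stdin and print the value of "audio_object_key".
# A plain sed match avoids extra dependencies; it assumes the value is a
# simple string without escaped quotes.
extract_audio_key() {
  sed -n 's/.*"audio_object_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical envelope; the real field layout may differ.
sample='{"ok": true, "data": {"audio_object_key": "tts/demo/step2.wav"}}'
key=$(printf '%s\n' "$sample" | extract_audio_key)

# Show the lip-sync command that would reuse the key.
echo "luma-cli lipsync --avatar 数字人男 --audio-key $key --output step3_lipsync.mp4"
```

In a real run the `printf` would be replaced by the `luma-cli --json tts ...` call, with its standard output piped into `extract_audio_key`.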

Generate lip-sync video (prefer `--audio-key` over `--audio`):

```bash
luma-cli lipsync --avatar 数字人男 --audio-key <audio_object_key> --output step3_lipsync.mp4
```

If `--audio-key` is omitted, lipsync falls back to the project's `latest_tts_key`, then to `--audio` file upload.
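
The fallback order above can be modeled in a few lines. This is one reading of the documented precedence, not the CLI's actual implementation; the function name `resolve_audio_source` and the `key:`/`upload:` output prefixes are hypothetical.

```bash
# Model of the documented precedence only: --audio-key first, then the
# project's latest_tts_key, then an --audio file upload.
resolve_audio_source() {
  audio_key=$1       # value passed via --audio-key, may be empty
  latest_tts_key=$2  # the project's latest_tts_key, may be empty
  audio_file=$3      # path passed via --audio, may be empty
  if [ -n "$audio_key" ]; then
    echo "key:$audio_key"
  elif [ -n "$latest_tts_key" ]; then
    echo "key:$latest_tts_key"
  elif [ -n "$audio_file" ]; then
    echo "upload:$audio_file"
  else
    echo "error:no audio source"
    return 1
  fi
}

# No explicit key, so the project's latest TTS key is chosen over the file.
resolve_audio_source "" "tts/demo/latest.wav" "./step2_tts.wav"
```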

Add subtitles:

```bash
luma-cli subtitle step3_lipsync.mp4 --output step5_subtitle.mp4
```

Optionally enhance:

```bash
luma-cli enhance step5_subtitle.mp4 --scale 2
```


Agent Notes


- Script must come from research, not imagination. If the script is for a short-video production, the text source must be backed by `luma-cli research run` data or a known viral reference. Never invent a script topic without data support. See `../luma-workflow-viral-remix/SKILL.md` for the full research → rewrite flow.
- Use `voice.clone` when a user provides a voice sample.
- Uploaded videos are ambiguous. If the user did not clearly say whether the video is for voice clone, avatar/source role, PIP material, ASR/rewrite, or video processing, ask for a short confirmation before choosing a workflow.
- Do not assume a user-uploaded video is a digital-human avatar. If the user asks to use "the voice/audio/sound inside this video", extract or use the audio for voice clone instead.
- Use `asset.list voice` and `asset.list roles` when the user asks what is available.
- Use the latest project TTS output for lip-sync unless the user explicitly provides `--audio`.
- Do not enhance every draft; enhance only the selected final render.
- Keep script revisions outside the media commands: the CLI should receive only the text that is final for each generation attempt.

Use advanced backend parameters only when the user asks for them:

- TTS: `--trim-long-silence`
- Lip-sync: `--random-start`, `--guidance-scale`, `--num-inference-steps`, `--no-superres`, `--superres-scale`, `--multi-shot-json`
