luma-digital-human
Luma Digital Human
Use this skill when an agent needs to create a digital-human spoken video from script text, a voice, and an avatar.
Read `../luma-shared/SKILL.md` first for common auth, project, output, and artifact rules.
Asset First
Inspect available voices and avatars:
```bash
luma-cli asset list voice
luma-cli asset list roles
```
If the user provides a reference voice sample, clone it first:
```bash
luma-cli voice clone ./voice.wav --name my_voice
```
If the user provides a video and says they want the sound, voice, tone, or audio from it, treat the video as a voice source and use voice clone. Do not upload that video as a `roles` avatar/source-role asset unless the user explicitly says they want the person/visual in the video to appear.
If the user provides a local avatar video, upload it:
```bash
luma-cli asset upload avatar.mp4 --group roles
```
Standard Flow
1. Create or select a project:
   ```bash
   luma-cli project create demo
   luma-cli project use demo
   ```
2. Generate voice:
   ```bash
   luma-cli --json tts "script text" --voice my_voice --speech-rate 1.1 --output step2_tts.wav
   ```
   The `--json` flag returns `audio_object_key` in the output envelope. Use this key in step 3 to skip a redundant upload.
3. Generate lip-sync video (prefer `--audio-key` over `--audio`):
   ```bash
   luma-cli lipsync --avatar 数字人男 --audio-key <audio_object_key> --output step3_lipsync.mp4
   ```
   If `--audio-key` is omitted, lipsync falls back to the project's `latest_tts_key`, then to `--audio` file upload.
4. Add subtitles:
   ```bash
   luma-cli subtitle step3_lipsync.mp4 --output step5_subtitle.mp4
   ```
5. Optionally enhance:
   ```bash
   luma-cli enhance step5_subtitle.mp4 --scale 2
   ```
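Steps 2 and 3 above can be chained in a script by passing `audio_object_key` from the TTS envelope to lip-sync. The following is a minimal sketch, not the CLI's documented behavior: it assumes the `--json` envelope is JSON text containing an `"audio_object_key": "..."` string pair, and the helper names `extract_audio_key` and `tts_then_lipsync` are hypothetical. Only flags that appear in the steps above are used.

```bash
# Hypothetical helper: print the first "audio_object_key" string value
# found in the JSON text on stdin (prints nothing if the key is absent).
# Assumption: the key and its value sit on one line as "audio_object_key": "...".
extract_audio_key() {
    sed -n 's/.*"audio_object_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: run step 2 (TTS) and step 3 (lip-sync) for a final script text.
tts_then_lipsync() {
    script_text=$1
    voice=$2
    avatar=$3

    # Step 2: capture the JSON envelope printed by the tts command.
    envelope=$(luma-cli --json tts "$script_text" --voice "$voice" --output step2_tts.wav) || return 1
    audio_key=$(printf '%s\n' "$envelope" | extract_audio_key)

    if [ -z "$audio_key" ]; then
        echo "audio_object_key not found in TTS output" >&2
        return 1
    fi

    # Step 3: reuse the key so the audio is not uploaded a second time.
    luma-cli lipsync --avatar "$avatar" --audio-key "$audio_key" --output step3_lipsync.mp4
}
```

A call would look like `tts_then_lipsync "final script text" my_voice 数字人男`. If the real envelope nests or formats the key differently, the `sed` pattern is the part to adjust, or a JSON parser can replace it.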
Agent Notes
- Script must come from research, not imagination. If the script is for a short-video production, the text source must be backed by `luma-cli research run` data or a known viral reference. Never invent a script topic without data support. See `../luma-workflow-viral-remix/SKILL.md` for the full research → rewrite flow.
- Use `voice.clone` when a user provides a voice sample.
- Uploaded videos are ambiguous. If the user did not clearly say whether the video is for voice clone, avatar/source role, PIP material, ASR/rewrite, or video processing, ask a short confirmation before choosing a workflow.
- Do not assume a user-uploaded video is a digital-human avatar. If the user asks to use "the voice/audio/sound inside this video", extract or use audio for voice clone instead.
- Use `asset.list voice` and `asset.list roles` when the user asks what is available.
- Use the latest project TTS output for lip-sync unless the user explicitly provides `--audio`.
- Keep script drafting and revisions outside the media commands. The CLI should receive the final text for each generation attempt.
- Do not enhance every draft; enhance only the selected final render.
- Use advanced backend parameters only when the user asks for them:
  - TTS: `--trim-long-silence`
  - Lip-sync: `--random-start`, `--guidance-scale`, `--num-inference-steps`, `--no-superres`, `--superres-scale`, `--multi-shot-json`
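The notes on uploaded files can be read as a small routing rule. The sketch below is illustrative only: the intent labels (`voice`, `avatar`, anything else) and the function name `route_upload` are hypothetical and not luma-cli values, and it prints the command to run rather than running it. Whether `voice clone` accepts a video path directly, or needs the audio extracted first, is not stated here, so treat the first branch as an assumption.

```bash
# Hypothetical helper: map the user's stated intent for an uploaded file
# to the command the notes call for. Prints the command; does not run it.
route_upload() {
    intent=$1   # hypothetical label: voice | avatar | anything else
    file=$2
    name=$3     # voice name, used only for the voice intent

    case "$intent" in
        voice)
            # The user wants the sound, voice, tone, or audio from the file.
            echo "luma-cli voice clone $file --name $name"
            ;;
        avatar)
            # The user explicitly wants the person/visual in the video to appear.
            echo "luma-cli asset upload $file --group roles"
            ;;
        *)
            # Intent not clearly stated: ask a short confirmation first.
            echo "ASK: is this file for voice clone, avatar, PIP material, ASR/rewrite, or video processing?"
            ;;
    esac
}
```

For example, `route_upload voice ./clip.mp4 my_voice` prints the voice clone command, while an unclear intent yields the confirmation prompt instead of a media command.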