luma-digital-human

Luma Digital Human

Use this skill when an agent needs to create a digital-human spoken video from script text, a voice, and an avatar.
Read ../luma-shared/SKILL.md first for common auth, project, output, and artifact rules.

Asset First

Inspect available voices and avatars:
bash
luma-cli asset list voice
luma-cli asset list roles
If the user provides a reference voice sample, clone it first:
bash
luma-cli voice clone ./voice.wav --name my_voice
If the user provides a video and says they want the sound, voice, tone, or audio from it, treat the video as a voice source and use voice clone. Do not upload that video as a roles avatar/source-role asset unless the user explicitly says they want the person/visual in the video to appear.
If the user provides a local avatar video, upload it:
bash
luma-cli asset upload avatar.mp4 --group roles
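
The voice-versus-avatar decision above can be sketched as a small shell helper. This is a minimal illustration rather than part of luma-cli: the function name route_uploaded_video and the intent labels voice and avatar are hypothetical, and the helper only prints the command it would pick instead of running it.

```shell
# Hypothetical helper (not part of luma-cli): map the user's stated intent
# for an uploaded video to the command this section recommends.
# It prints the command only; nothing is executed.
route_uploaded_video() {
  ruv_intent=$1
  ruv_file=$2
  ruv_voice_name=${3:-my_voice}
  case "$ruv_intent" in
    voice)
      # The user wants the sound, voice, tone, or audio from the video.
      printf 'luma-cli voice clone %s --name %s\n' "$ruv_file" "$ruv_voice_name"
      ;;
    avatar)
      # The user explicitly wants the person/visual in the video to appear.
      printf 'luma-cli asset upload %s --group roles\n' "$ruv_file"
      ;;
    *)
      # Intent is unclear: ask before choosing a workflow.
      printf 'ASK: should this video be used for its voice or as the avatar?\n'
      return 1
      ;;
  esac
}
```

For example, route_uploaded_video voice ./clip.mp4 my_voice prints the voice clone command, while an unrecognized intent prints a confirmation prompt and returns a non-zero status so the caller knows to ask the user first.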

Standard Flow

  1. Create or select a project:
    bash
    luma-cli project create demo
    luma-cli project use demo
  2. Generate voice:
    bash
    luma-cli --json tts "script text" --voice my_voice --speech-rate 1.1 --output step2_tts.wav
    The --json flag returns audio_object_key in the output envelope. Use this key in step 3 to skip a redundant upload.
  3. Generate lip-sync video (prefer --audio-key over --audio):
    bash
    luma-cli lipsync --avatar 数字人男 --audio-key <audio_object_key> --output step3_lipsync.mp4
    If --audio-key is omitted, lipsync falls back to the project's latest_tts_key, and only then to uploading the file passed with --audio.
  4. Add subtitles:
    bash
    luma-cli subtitle step3_lipsync.mp4 --output step5_subtitle.mp4
  5. Optionally enhance:
    bash
    luma-cli enhance step5_subtitle.mp4 --scale 2
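
Steps 2 and 3 hand the audio_object_key from the TTS output to the lip-sync command. The sketch below shows one possible way to do that hand-off in shell. The exact layout of the --json envelope is not described here, so the sample JSON is made up, and the hypothetical extract_audio_key helper simply looks for the first "audio_object_key" string field wherever it is nested. The step 3 command is printed, not executed.

```shell
# Hypothetical helper: pull audio_object_key out of JSON text such as the
# output of `luma-cli --json tts ...`. The envelope layout is an assumption;
# the pattern matches the first "audio_object_key": "..." pair at any depth.
extract_audio_key() {
  printf '%s\n' "$1" |
    grep -o '"audio_object_key"[[:space:]]*:[[:space:]]*"[^"]*"' |
    head -n 1 |
    sed 's/^"audio_object_key"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/'
}

# Made-up sample envelope standing in for real CLI output.
sample='{"ok": true, "data": {"audio_object_key": "tts/demo/step2.wav"}}'
key=$(extract_audio_key "$sample")

# Print the step 3 command with the extracted key filled in.
printf 'luma-cli lipsync --avatar 数字人男 --audio-key %s --output step3_lipsync.mp4\n' "$key"
```

In a real run the sample string would be replaced by the captured output of the step 2 command. A dedicated JSON parser is more robust than pattern matching when one is available.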

Agent Notes

  • Script must come from research, not imagination. If the script is for a short-video production, the text source must be backed by luma-cli research run data or a known viral reference. Never invent a script topic without data support. See ../luma-workflow-viral-remix/SKILL.md for the full research → rewrite flow.
  • Use voice.clone when a user provides a voice sample.
  • Uploaded videos are ambiguous. If the user did not clearly say whether the video is for voice clone, avatar/source role, PIP material, ASR/rewrite, or video processing, ask a short confirmation before choosing a workflow.
  • Do not assume a user-uploaded video is a digital-human avatar. If the user asks to use "the voice/audio/sound inside this video", extract or use audio for voice clone instead.
  • Use asset.list voice and asset.list roles when the user asks what is available.
  • Use the latest project TTS output for lip-sync unless the user explicitly provides --audio.
  • Keep script revisions outside the media commands. The CLI should receive only the text that is final for each generation attempt.
  • Do not enhance every draft; enhance only the selected final render.
  • Use advanced backend parameters only when the user asks for them:
    • TTS: --trim-long-silence
    • Lip-sync: --random-start, --guidance-scale, --num-inference-steps, --no-superres, --superres-scale, --multi-shot-json
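
The audio fallback order given in step 3 of the Standard Flow can be sketched as a small shell function. This is an illustration under stated assumptions, not the CLI's implementation: the name resolve_audio_source and its output labels are hypothetical. How an explicitly supplied --audio interacts with an existing latest_tts_key is not spelled out here, so the sketch simply follows the order stated in step 3.

```shell
# Hypothetical sketch of the audio fallback order described in step 3:
#   1. an explicit --audio-key,
#   2. the project's latest_tts_key,
#   3. a local file passed with --audio (which has to be uploaded).
# Each argument is a plain string; an empty string means "not provided".
resolve_audio_source() {
  ras_audio_key=$1
  ras_latest_tts_key=$2
  ras_audio_file=$3
  if [ -n "$ras_audio_key" ]; then
    printf 'audio-key:%s\n' "$ras_audio_key"
  elif [ -n "$ras_latest_tts_key" ]; then
    printf 'latest_tts_key:%s\n' "$ras_latest_tts_key"
  elif [ -n "$ras_audio_file" ]; then
    printf 'audio-file:%s\n' "$ras_audio_file"
  else
    # Nothing to lip-sync against: generate TTS first or ask the user.
    printf 'none\n'
    return 1
  fi
}
```

Calling resolve_audio_source "" "" ./voice.wav reports the local file, which is the only case that needs an upload. Passing an explicit key or relying on the project's latest TTS output avoids that upload.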