muapi-ai-fight-scene

AI Fight Scene Generator

Generate a high-cut-density action / fight scene by first composing a 16-cell storyboard image, then driving Seedance 2.0 image-to-video off that storyboard.
Estimated credits: ~250 per run.
The core idea (from the AtlasCloud Seedance 2 + GPT Image 2 tutorial): action tension comes from cut density, not single-shot quality. Forcing the video model to follow a pre-drawn 4×4 storyboard grid gives you 16 distinct shots in a 15-second clip — landing punches, reverse angles, ECUs, whip-pans — that no t2v prompt could choreograph on its own.

Inputs

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| character_description | text | yes | | Full physical description of the fighter(s). Asymmetric details (eye colour, scar side, holster on left hip) help the model preserve identity across panels. |
| environment_description | text | yes | | The scene setting, e.g. "cyberpunk wet back-alley, neon kanji signage, Stray-game aesthetic, rain on chrome." |
| action_script | text | yes | | The action beat, as prose or numbered beats. E.g. "Hero is cornered → blocks first punch → counter-elbow → throw opponent into trash cans → finisher." |
| style_direction | text | no | cinematic action film, anamorphic lens, high contrast, motion blur on hits | Aesthetic / look tags applied to every frame. |
| duration | int | no | 15 | Final video length in seconds. The storyboard's 16 cells map roughly 1 shot per second at default. |
| aspect_ratio | text | no | 16:9 | Output aspect: 16:9 cinematic, 9:16 vertical, 1:1 square. |
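
As a rough illustration of the inputs listed above, the sketch below collects them in a small Python dataclass with the documented defaults and a few sanity checks. The class name `FightSceneInputs` and the validation rules are assumptions made for this example; the recipe itself only defines the field names, types, and defaults.

```python
from dataclasses import dataclass

# Aspect ratios named in the inputs table.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}

# Default style tags, copied from the inputs table.
DEFAULT_STYLE = (
    "cinematic action film, anamorphic lens, high contrast, motion blur on hits"
)


@dataclass(frozen=True)
class FightSceneInputs:
    """Hypothetical container for the recipe inputs (not part of muapi)."""

    character_description: str
    environment_description: str
    action_script: str
    style_direction: str = DEFAULT_STYLE
    duration: int = 15
    aspect_ratio: str = "16:9"

    def __post_init__(self) -> None:
        # The three required text inputs must not be empty.
        for name in ("character_description", "environment_description", "action_script"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")
        if self.duration <= 0:
            raise ValueError("duration must be a positive number of seconds")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(
                f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}"
            )


inputs = FightSceneInputs(
    character_description="Lean fighter, scar over the right eyebrow",
    environment_description="cyberpunk wet back-alley, neon kanji signage",
    action_script="Hero is cornered, blocks first punch, counter-elbow, finisher",
)
print(inputs.duration, inputs.aspect_ratio)
```

Validating up front lets the agent ask the user for missing inputs before any credits are spent.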

Steps

Phase A — Character Sheet

Generate a clean turnaround-style character sheet using muapi image generate (model=gpt-image-2-text-to-image):
  • Prompt:
    Character reference sheet of {{character_description}}. Three views — front, 3/4, profile — on a neutral grey backdrop. Studio lighting, full body, no text overlays, photoreal. Asymmetric identifying details preserved on the correct side. {{style_direction}}.
  • Aspect ratio: 3:2
Present the character sheet and confirm identity details look right before proceeding. This image becomes reference #1 for later phases.

Phase B — Environment Concept

Use muapi image generate (model=nano-banana-2) to design the scene/world:
  • Prompt:
    Wide establishing shot of {{environment_description}}. No characters in frame — environment only. Strong perspective lines, depth, atmospheric haze. {{style_direction}}. Production-design concept art.
  • Aspect ratio: {{aspect_ratio}}
Nano-Banana-2 is chosen here for its reasoning-driven composition — it's better than text-to-image-only models at producing locations with believable spatial logic (chokepoints, cover, sightlines) that an action scene can use. Present for approval. This becomes reference #2.

Phase C — 16-Cell Storyboard

Compose the action onto a single 4×4 storyboard image using muapi image edit (model=gpt-image-2-image-to-image):
  • Reference Images: the character sheet from Phase A and the environment plate from Phase B.
  • Prompt:
    Compose a 4×4 storyboard grid (16 numbered cells) for the following action sequence:
    {{action_script}}
    
    CHARACTER (use reference image 1 identity throughout, asymmetric details preserved):
    {{character_description}}
    
    LOCATION (use reference image 2 spatial layout):
    {{environment_description}}
    
    Each cell labels: SHOT # (1–16) · SIZE (WIDE / MS / CU / ECU) · CAMERA-MOVE arrow (push, pull, whip, dolly, crash-zoom, handheld) · 1-word RHYTHM note (BEAT / IMPACT / RECOVERY / RESET).
    
    Vary shot size aggressively — never two WIDEs in a row. Land every IMPACT on a CU or ECU.
    Hand-drawn comic-book ink-and-wash style, monochrome with selective red accents on hits.
    Numbered cells, clear gutters between panels.
    
    Aesthetic: {{style_direction}}.
  • Aspect ratio: 1:1 (square works best for a 4×4 grid)
Present the storyboard to the user. Confirm:
  • The 16 shots read clearly
  • Identity stays consistent cell-to-cell
  • Cut density / shot-size variation looks aggressive enough
If a panel reads poorly, regenerate just the storyboard with that cell's note bolded ("CELL 7 must be an ECU on the right fist").
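
The review checklist above can be partly mechanised once the cell labels have been transcribed from the storyboard. The sketch below is a minimal example under that assumption: the `Cell` type and `check_storyboard` function are hypothetical helpers, and the labels still have to be read off the image by the user or the agent. It checks the two rules stated in the storyboard prompt (never two WIDEs in a row, every IMPACT on a CU or ECU) and builds a regeneration note in the style suggested above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One storyboard cell as labelled in the grid (hypothetical type)."""

    shot: int    # 1-16, reading order
    size: str    # WIDE / MS / CU / ECU
    rhythm: str  # BEAT / IMPACT / RECOVERY / RESET


def check_storyboard(cells: list[Cell]) -> list[str]:
    """Return human-readable problems; an empty list means the rules hold."""
    problems = []
    if len(cells) != 16:
        problems.append(f"expected 16 cells, got {len(cells)}")
    # Rule 1: never two WIDE shots in a row.
    for prev, cur in zip(cells, cells[1:]):
        if prev.size == "WIDE" and cur.size == "WIDE":
            problems.append(f"cells {prev.shot} and {cur.shot} are both WIDE")
    # Rule 2: every IMPACT lands on a close-up or extreme close-up.
    for cell in cells:
        if cell.rhythm == "IMPACT" and cell.size not in {"CU", "ECU"}:
            problems.append(
                f"cell {cell.shot} is an IMPACT on {cell.size}; use CU or ECU"
            )
    return problems


def regeneration_note(shot: int, requirement: str) -> str:
    """Build an emphasised per-cell note to append to the storyboard prompt."""
    return f"CELL {shot} must be {requirement}"


# Example: a 16-cell board where cell 7 lands an IMPACT on a medium shot.
sizes = ["WIDE", "MS", "CU", "ECU"] * 4
board = [Cell(i + 1, sizes[i], "BEAT") for i in range(16)]
board[6] = Cell(7, "MS", "IMPACT")
for problem in check_storyboard(board):
    print(problem)
print(regeneration_note(7, "an ECU on the right fist"))
```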

Phase D — Storyboard → Video (Seedance 2.0)

Hand the storyboard to muapi video from-image (model=seedance-v2.0-i2v):
  • Reference Image: the 16-cell storyboard from Phase C.
  • Prompt:
    Generate a {{duration}}-second action sequence that strictly follows the 16-cell storyboard reference image, cell-by-cell, top-left to bottom-right.
    
    - Honour each cell's labelled SHOT SIZE and CAMERA-MOVE — match cuts to the storyboard's rhythm notes.
    - Strong cinematic feel and shot language. Exaggerated dynamics. Hits land hard with motion blur and impact frames.
    - Camera language: anamorphic, handheld where the storyboard calls for it, locked-off where it doesn't.
    - Native audio: impact sfx on every IMPACT cell, footsteps, fabric/Foley, restrained low score under the action.
    
    Action being rendered: {{action_script}}.
    Aesthetic: {{style_direction}}.
  • Duration: {{duration}} (default 15)
  • Aspect ratio: {{aspect_ratio}}
After generation, present the final video. If the cut density feels too low or shots don't match the storyboard, regenerate Phase D first (cheaper than rebuilding the storyboard) with the prompt emphasising "strict cell-by-cell adherence" more aggressively.
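
To judge whether the generated cuts match the storyboard, it can help to know roughly where each cell should fall on the timeline. The sketch below assumes the duration is split evenly across the 16 cells in reading order (top-left to bottom-right); the video model is not guaranteed to pace its cuts this way, so treat the numbers as a review aid, not as a description of Seedance's behaviour. The function name `cell_schedule` is made up for this example.

```python
def cell_schedule(duration_s: float, rows: int = 4, cols: int = 4) -> list[dict]:
    """Split the clip evenly across the grid, in reading order.

    Returns one dict per cell with its 1-based number, grid position,
    and the nominal start/end time in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total = rows * cols
    step = duration_s / total
    schedule = []
    for i in range(total):
        schedule.append(
            {
                "cell": i + 1,
                "row": i // cols + 1,  # top row is 1
                "col": i % cols + 1,   # left column is 1
                "start": i * step,
                "end": (i + 1) * step,
            }
        )
    return schedule


for entry in cell_schedule(15)[:4]:
    print(entry)
```

At the default 15 seconds each cell gets a little under one second, which matches the "roughly 1 shot per second" note in the inputs.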

Notes

  • Why the storyboard image and not a text storyboard? Seedance 2.0 i2v anchors its motion plan to the visual reference. A grid of 16 drawn cells gives it 16 visual targets to hit — text descriptions of shots get averaged into mush.
  • Asymmetric character details matter. Without something like "scar over the right eyebrow" or "leather glove on the left hand only", identity drift between cells is the #1 failure mode.
  • Use seedance-2.0-i2v-480p to draft: a cheaper preview pass before committing to the full-res seedance-v2.0-i2v run.
  • For longer fights, chain two runs: the first run uses storyboard A (cells 1–16, covering 0–15s); the second run uses storyboard B (cells 17–32, covering 15–30s), with the last cell of A reused as a continuity anchor in B's first cell.
  • Language: Both English and Chinese prompts work in all four models, so the storyboard cell labels can be in either language.
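
The chaining note above can be sketched as a small planning helper. This is only an illustration: the function name `plan_chained_runs` and the output shape are assumptions, and it generalises the two-run example to any length by assuming each run covers 15 seconds and 16 cells.

```python
import math


def plan_chained_runs(total_s: int, run_s: int = 15, cells_per_run: int = 16) -> list[dict]:
    """Plan consecutive storyboard/video runs for a fight longer than one clip.

    Each run gets its own storyboard; from the second run onward, the last
    cell of the previous storyboard is the continuity anchor.
    """
    if total_s <= 0 or run_s <= 0 or cells_per_run <= 0:
        raise ValueError("all arguments must be positive")
    plan = []
    for r in range(math.ceil(total_s / run_s)):
        first_cell = r * cells_per_run + 1
        start = r * run_s
        plan.append(
            {
                "run": r + 1,
                "cells": (first_cell, first_cell + cells_per_run - 1),
                "seconds": (start, min(total_s, start + run_s)),
                # Cell to reuse as the anchor in this run's first cell.
                "anchor_cell": first_cell - 1 if r > 0 else None,
            }
        )
    return plan


for run in plan_chained_runs(30):
    print(run)
```

For a 30-second fight this yields cells 1 to 16 for 0 to 15 s, then cells 17 to 32 for 15 to 30 s anchored on cell 16.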

Trigger Keywords

fight scene, action sequence, storyboard to video, cut density, cinematic action, combat choreography, seedance 2 storyboard

Pipeline at a Glance

character_description ──► [GPT-Image-2 t2i]   ─► character sheet ──┐
environment_description ─► [Nano-Banana-2 t2i] ─► environment plate ┼─► [GPT-Image-2 i2i] ─► 16-cell storyboard ─► [Seedance 2.0 i2v] ─► 15s action video
action_script + style_direction ───────────────────────────────────►┘


Notes for the Executing Agent

  • This recipe is LLM-orchestrated: read each phase, gather any missing inputs from the user, then call muapi CLI commands. Use muapi auth configure first if MUAPI_API_KEY is unset.
  • For model IDs without a CLI alias yet, fall back to the raw endpoint via
    curl -X POST https://api.muapi.ai/api/v1/<endpoint> -H "x-api-key: $MUAPI_API_KEY" -H 'content-type: application/json' -d '{...}'
    and poll with muapi predict wait <request_id>.
  • Phase C uses TWO reference images (character sheet + environment plate). When calling gpt-image-2-image-to-image, pass them as a list under images_list (or the model's documented multi-ref field).
  • Substitute {{input_name}} placeholders with the user's actual inputs before issuing each call.
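
The placeholder substitution and the two-image Phase C call described above might look like the following sketch. The helper names `render` and `storyboard_payload` are made up for this example. Of the JSON fields, only `images_list` comes from the notes above (and the model's documentation may name it differently); `prompt` and `aspect_ratio` are assumed field names, so check the endpoint's schema before sending anything. The sketch only builds the request body and does not call the API.

```python
import json
import re

# Matches {{input_name}} placeholders as used in the phase prompts.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, inputs: dict) -> str:
    """Replace every {{name}} with inputs[name]; fail loudly if one is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"missing input: {name}")
        return str(inputs[name])

    return PLACEHOLDER.sub(substitute, template)


def storyboard_payload(prompt: str, character_sheet_url: str, environment_url: str) -> str:
    """Build a JSON body for the Phase C call (field names partly assumed)."""
    body = {
        "prompt": prompt,  # assumed field name
        # Order matters: the prompt refers to reference image 1 (character)
        # and reference image 2 (environment).
        "images_list": [character_sheet_url, environment_url],
        "aspect_ratio": "1:1",  # assumed field name
    }
    return json.dumps(body, ensure_ascii=False)


prompt = render(
    "Wide establishing shot of {{environment_description}}. {{style_direction}}.",
    {
        "environment_description": "cyberpunk wet back-alley",
        "style_direction": "cinematic action film",
    },
)
print(prompt)
print(storyboard_payload(prompt, "https://example.com/character.png", "https://example.com/env.png"))
```

Raising on a missing input, instead of leaving the braces in place, keeps an unfilled {{placeholder}} from reaching the model as literal text.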