muapi-storyboard-to-cooking-video

Storyboard to Cooking Video

Turn a single photo of a person into a polished 15-second cinematic cooking tutorial. The skill first generates a high-end production reference sheet — character look, kitchen environment, and a 9-panel action board — then drives a continuous reference-to-video render that keeps the subject's face, outfit, and kitchen consistent across every frame.
Estimated credits: ~120 per run (1 image edit + 1 video at 720p / 15s with audio).

Inputs

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| person_image | image_url | yes | | URL of the person photo. Used as identity reference in BOTH the reference sheet and the final video. |
| dish | text | no | fresh pasta | The cooking subject (e.g. "fresh pasta", "sushi rolls", "wood-fired pizza", "matcha latte"). Drives the 9-step action board. |
| kitchen_style | text | no | warm rustic-modern Italian | The kitchen aesthetic (e.g. "warm rustic-modern Italian", "minimalist Tokyo", "bright Scandinavian", "moody industrial"). |
| outfit | text | no | white t-shirt, olive green apron, dark trousers | What the person wears throughout the video. |
| duration_seconds | int | no | 15 | Final video duration. Use 15 for the full 9-step arc; 10 collapses to ~6 beats. |
| aspect_ratio | text | no | 16:9 | Output aspect ratio. Use 9:16 for vertical/Reels. |
| resolution | text | no | 720p | Video resolution. Options: 480p, 720p. |
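As a minimal sketch of how an executing agent might apply these defaults in a shell session before issuing any call, the snippet below fills unset inputs with the documented default values and checks the two constraints the inputs list states outright. The helper name validate_inputs is hypothetical and not part of the muapi CLI; the variable names simply mirror the input names.

```bash
# Fill any input the user left unset with the documented default.
# person_image has no default, so it is treated as required below.
dish="${dish:-fresh pasta}"
kitchen_style="${kitchen_style:-warm rustic-modern Italian}"
outfit="${outfit:-white t-shirt, olive green apron, dark trousers}"
duration_seconds="${duration_seconds:-15}"
aspect_ratio="${aspect_ratio:-16:9}"
resolution="${resolution:-720p}"

# validate_inputs is a hypothetical helper (assumption, not a muapi command).
# It only checks what the inputs list states: person_image is required and
# resolution is one of 480p / 720p.
validate_inputs() {
  if [ -z "${person_image:-}" ]; then
    echo "person_image is required" >&2
    return 1
  fi
  case "$resolution" in
    480p|720p) ;;
    *) echo "resolution must be 480p or 720p" >&2; return 1 ;;
  esac
}
```

Calling validate_inputs before Step 1 lets the agent ask the user for a missing photo URL instead of spending credits on a call that cannot succeed.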

Steps

Submit the plan with TWO sequential steps. Step 2 depends on the output of Step 1.

Step 1 — Reference Sheet (Composite Storyboard)

Generate the composite "production reference board" image. This is a single image, NOT a video frame — it bundles character sheet + location reference + 9-panel action board.
Endpoint: gpt-image-v2-edit
CLI (bash):
muapi image edit \
  --model gpt-image-v2-edit \
  --image "{{person_image}}" \
  --image-size "3840x2160" \
  --quality auto \
  --background auto \
  --moderation low \
  --output-format png \
  --prompt "Create one single composite reference sheet for a {{duration_seconds}}-second realistic {{dish}}-making tutorial video. The image should be a clean, high-end production reference board, not a poster with heavy text. Format: {{aspect_ratio}} wide reference sheet, elegant white margins, clean grid layout, realistic cinematic photography style. Concept: {{dish}} tutorial in a {{kitchen_style}} kitchen.

Top row: motion / choreography guide with 9 numbered cinematic action panels showing the {{dish}} process step-by-step from raw ingredients to final plated dish.

Middle-left: realistic character reference sheet of the uploaded person — preserve their exact face, hair color, hair texture, eye color, skin tone, and all facial features with 100% accuracy. Show the same person in: face close-up, full-body front view, side/action working pose, and back view. Dress them in {{outfit}}. Keep them grounded, approachable, skilled, and cinematic.

Middle-right / background: location reference sheet of an elegant {{kitchen_style}} kitchen with tactile surfaces, natural daylight from a large window, hanging cookware, herbs, and premium cooking atmosphere appropriate to the cuisine.

Style: realistic, cinematic, warm natural light, shallow depth of field, tactile food photography, premium cooking show aesthetic, rich surface textures.

Bottom strip: simple visual icons only for {{duration_seconds}} seconds, {{aspect_ratio}}, realistic, cinematic, tasty, natural camera. Minimal text, no dense paragraphs. Let the visuals do the heavy lifting."
Wait for completion and capture the output URL as {{reference_sheet_url}}. Show it to the user and confirm the character likeness and kitchen mood before moving to Step 2, because Step 2 is the expensive call.
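The confirmation gate described here could be sketched as a small shell function. This is only an illustration under assumptions: confirm_reference_sheet is a hypothetical helper, not a muapi command, and it assumes the agent already holds the Step 1 output URL in a variable.

```bash
# confirm_reference_sheet is a hypothetical helper (assumption).
# It prints the Step 1 URL and succeeds only on an explicit yes,
# so anything else (including empty input) blocks the costly Step 2.
confirm_reference_sheet() {
  printf 'Reference sheet: %s\n' "$1"
  printf 'Likeness and kitchen mood OK? Proceed to Step 2? [y/N] '
  read -r answer || return 1
  case "$answer" in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical use would be `confirm_reference_sheet "$reference_sheet_url" || exit 0`, so that declining simply stops the run before the video call.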

Step 2 — Cooking Video (Reference-to-Video)

Animate the full sequence using both the original person photo (identity anchor) and the reference sheet (narrative + environment guide) as dual references.
Endpoint: bytedance-seedance-2-0-reference-to-video-fast
CLI (bash):
muapi video generate \
  --model bytedance-seedance-2-0-reference-to-video-fast \
  --image "{{person_image}}" \
  --image "{{reference_sheet_url}}" \
  --aspect-ratio "{{aspect_ratio}}" \
  --duration "{{duration_seconds}}" \
  --resolution "{{resolution}}" \
  --generate-audio true \
  --prompt "The person in @Image1 is the subject — preserve their exact face, hair, eye color, skin tone, and all facial features with 100% accuracy throughout the entire video.
Use @Image2 as the visual and narrative guide — follow the cooking steps, kitchen setting, outfit, and atmosphere shown in the reference sheet exactly.
A single continuous cinematic video of the person from @Image1 making {{dish}} in the {{kitchen_style}} kitchen shown in @Image2. They wear {{outfit}} throughout.

VIDEO STRUCTURE
Follow the exact 9-step sequence as shown in @Image2, beat by beat, from raw ingredients through preparation to a final plated close-up.

MOTION STYLE
- Slow, deliberate, satisfying transitions between each step
- Natural hand and body movement with clear culinary intent
- Continuous flow with no jump cuts
- Warm and immersive pacing

CAMERA & CINEMATOGRAPHY
- Close-up shots for hands during mixing, kneading, cutting, plating
- Medium shots showing the person working at the counter
- Pull back slightly for the final plating to reveal the full kitchen
- Shallow depth of field — focus on hands and food, soft background blur
- No abrupt cuts — smooth match cuts and fluid transitions

VISUAL STYLE
- Warm natural daylight from a large kitchen window
- Rich tactile textures matching @Image2's environment
- Full color, warm cinematic color grading

CONSISTENCY RULES
- Same character throughout — face of @Image1 in every frame
- Same outfit across entire video
- Same kitchen environment as shown in @Image2

AUDIO
- Soft kitchen ambience, gentle culinary SFX (chopping, sizzling, pouring), light cinematic underscore
- No dialogue, no narration

OUTPUT STYLE
- Duration: exactly {{duration_seconds}} seconds
- Polished, cinematic, premium cooking show quality
- Ends with a beautiful close-up of the finished plated {{dish}}"
After generation:
  • Present the final video URL to the user.
  • Offer follow-ups: vertical 9:16 re-render for Reels, a longer 30s extended cut, or swap {{dish}} for a different cuisine using the same person image.
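Because Step 2 relies on both references arriving in a fixed order, a wrapper along the following lines may help guard the call. It is a sketch, not the skill's own implementation: video_step is a hypothetical function, and MUAPI_BIN is an assumed override (defaulting to muapi) that allows a dry run with echo. The flags are the ones used in the Step 2 command; the mapping of the first --image to @Image1 and the second to @Image2 is inferred from the prompt wording rather than from CLI documentation.

```bash
# video_step is a hypothetical wrapper (assumption).
# Arguments: person photo URL, reference sheet URL, rendered prompt text.
video_step() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Step 2 needs both the person photo and the reference sheet" >&2
    return 1
  fi
  # Order matters: the first --image is presumed to be @Image1 (identity),
  # the second @Image2 (choreography and environment).
  "${MUAPI_BIN:-muapi}" video generate \
    --model bytedance-seedance-2-0-reference-to-video-fast \
    --image "$1" \
    --image "$2" \
    --aspect-ratio "${aspect_ratio:-16:9}" \
    --duration "${duration_seconds:-15}" \
    --resolution "${resolution:-720p}" \
    --generate-audio true \
    --prompt "$3"
}
```

Setting MUAPI_BIN=echo prints the assembled command line instead of submitting it, which is a cheap way to check the argument order before paying for the render.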

Notes

  • Two-image reference is the whole trick. @Image1 locks identity; @Image2 locks choreography and environment. Never drop one: single-reference runs lose either the face or the kitchen.
  • The reference sheet from Step 1 must be wide (3840x2160). Smaller resolutions blur the 9 action panels, and the video model can't read them.
  • bytedance-seedance-2-0-reference-to-video-fast natively generates audio when generate_audio=true (--generate-audio true on the CLI). Always include an audio direction in the prompt; otherwise the soundtrack is random.
  • Real human faces ARE supported here because the person photo is the user's own subject and we route through the reference-to-video endpoint (not the restricted i2v variants).
  • If the user wants a non-cooking sequence (e.g., latte art, plating tutorial, mixology), keep the same two-step structure; only {{dish}} and the 9-step description change.
  • For shorter pieces (<= 8s), reduce the action board to 5–6 panels in Step 1; cramming 9 beats into 8s degrades motion quality (single-beat rule).
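The pacing behind the last note can be checked with simple arithmetic. The sketch below computes the time available per action panel in milliseconds; ms_per_beat is a hypothetical helper, and the observation that the documented pairings all land near 1.6 s per beat is an inference from the numbers, not a threshold stated by the skill.

```bash
# ms_per_beat is a hypothetical helper (assumption): integer milliseconds
# of screen time per action panel for a given duration and panel count.
ms_per_beat() {
  [ "$2" -gt 0 ] || return 1
  echo $(( $1 * 1000 / $2 ))
}

ms_per_beat 15 9   # 1666  (full 9-step arc)
ms_per_beat 10 6   # 1666  (10s collapsed to ~6 beats)
ms_per_beat 8 5    # 1600  (short piece, 5 panels)
ms_per_beat 8 9    # 888   (9 beats crammed into 8s)
```

The recommended combinations keep each beat at roughly 1.6 s, while 9 panels in 8 s leaves under 0.9 s per beat, which is consistent with the warning about degraded motion.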

Trigger Keywords

cooking video, cooking tutorial, pasta video, recipe video, food video, chef video, cooking storyboard, kitchen tutorial, cooking reel, tutorial video from photo, storyboard to video

Notes for the Executing Agent

  • This recipe is LLM-orchestrated: read each phase, gather any missing inputs from the user, then call muapi CLI commands. Use muapi auth configure first if MUAPI_API_KEY is unset.
  • For model IDs without a CLI alias yet, fall back to the raw endpoint via
    curl -X POST https://api.muapi.ai/api/v1/<endpoint> -H "x-api-key: $MUAPI_API_KEY" -H 'content-type: application/json' -d '{...}'
    and poll with muapi predict wait <request_id>.
  • Substitute {{input_name}} placeholders with the user's actual inputs before issuing each call.
  • Step 1 must complete and return an output image URL before Step 2 fires; pass that URL as the second --image to the video step.
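One possible way to perform the placeholder substitution in a shell is sketched below. Both function names are hypothetical, and the approach (sed with escaped replacement text) is just one option; it assumes input values contain no newlines and that defaults have already been applied, since an unset variable is substituted as an empty string.

```bash
# escape_sed (hypothetical): escape characters that are special in a sed
# replacement when "|" is the delimiter, e.g. the "&" common in URLs.
escape_sed() { printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'; }

# fill_placeholders (hypothetical): read a template on stdin and replace the
# {{name}} tokens used by this recipe with same-named shell variables.
# Fails if any "{{" is left over, e.g. from a misspelled placeholder.
fill_placeholders() {
  filled="$(sed \
    -e "s|{{person_image}}|$(escape_sed "${person_image-}")|g" \
    -e "s|{{reference_sheet_url}}|$(escape_sed "${reference_sheet_url-}")|g" \
    -e "s|{{dish}}|$(escape_sed "${dish-}")|g" \
    -e "s|{{kitchen_style}}|$(escape_sed "${kitchen_style-}")|g" \
    -e "s|{{outfit}}|$(escape_sed "${outfit-}")|g" \
    -e "s|{{duration_seconds}}|$(escape_sed "${duration_seconds-}")|g" \
    -e "s|{{aspect_ratio}}|$(escape_sed "${aspect_ratio-}")|g" \
    -e "s|{{resolution}}|$(escape_sed "${resolution-}")|g")" || return 1
  case "$filled" in
    *'{{'*) echo "unfilled placeholder remains" >&2; return 1 ;;
  esac
  printf '%s\n' "$filled"
}
```

For example, with dish set to "sushi rolls", piping `Concept: {{dish}} tutorial` through fill_placeholders yields `Concept: sushi rolls tutorial`; the rendered text can then be passed to --prompt.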