muapi-storyboard-to-cooking-video

Storyboard to Cooking Video


Turn a single photo of a person into a polished 15-second cinematic cooking tutorial. The skill first generates a high-end production reference sheet — character look, kitchen environment, and a 9-panel action board — then drives a continuous reference-to-video render that keeps the subject's face, outfit, and kitchen consistent across every frame.

Estimated credits: ~120 per run (1 image edit + 1 video at 720p / 15s with audio).


Inputs


| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `person_image` | image_url | yes | (none) | URL of the person photo. Used as identity reference in BOTH the reference sheet and the final video. |
| `dish` | text | no | fresh pasta | The cooking subject (e.g. "fresh pasta", "sushi rolls", "wood-fired pizza", "matcha latte"). Drives the 9-step action board. |
| `kitchen_style` | text | no | warm rustic-modern Italian | The kitchen aesthetic (e.g. "warm rustic-modern Italian", "minimalist Tokyo", "bright Scandinavian", "moody industrial"). |
| `outfit` | text | no | white t-shirt, olive green apron, dark trousers | What the person wears throughout the video. |
| `duration_seconds` | int | no | 15 | Final video duration in seconds. Use 15 for the full 9-step arc; 10 collapses to ~6 beats. |
| `aspect_ratio` | text | no | 16:9 | Output aspect ratio. Use `9:16` for vertical/Reels. |
| `resolution` | text | no | 720p | Video resolution. Options: `480p`, `720p`. |
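
The defaults and constraints in the table can be applied before any call is made. The following is a minimal sketch, not part of the recipe itself: the function names `apply_defaults` and `validate_inputs` are hypothetical, and the shell variables are assumed to carry the same names as the inputs above.

```shell
# Hypothetical helpers (not part of muapi). Shell variables are assumed to
# be named after the inputs in the table above.

# Fill in the documented default for every optional input that is unset or empty.
apply_defaults() {
  : "${dish:=fresh pasta}"
  : "${kitchen_style:=warm rustic-modern Italian}"
  : "${outfit:=white t-shirt, olive green apron, dark trousers}"
  : "${duration_seconds:=15}"
  : "${aspect_ratio:=16:9}"
  : "${resolution:=720p}"
}

# Reject inputs that the table does not allow. Returns non-zero on failure.
validate_inputs() {
  # person_image is the only required input.
  if [ -z "${person_image:-}" ]; then
    echo "person_image is required" >&2
    return 1
  fi
  # The table lists exactly two resolutions.
  case "${resolution:-}" in
    480p|720p) ;;
    *) echo "resolution must be 480p or 720p" >&2; return 1 ;;
  esac
  # duration_seconds is typed as int: digits only.
  case "${duration_seconds:-}" in
    ''|*[!0-9]*) echo "duration_seconds must be a whole number" >&2; return 1 ;;
  esac
  return 0
}
```

Calling `apply_defaults` and then `validate_inputs` before Step 1 catches a missing photo URL or an unsupported resolution before any credits are spent.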


Steps


Submit the plan with TWO sequential steps. Step 2 depends on the output of Step 1.


Step 1 — Reference Sheet (Composite Storyboard)


Generate the composite "production reference board" image. This is a single image, NOT a video frame — it bundles character sheet + location reference + 9-panel action board.

Endpoint: `gpt-image-v2-edit`

CLI:

```bash
muapi image edit \
  --model gpt-image-v2-edit \
  --image "{{person_image}}" \
  --image-size "3840x2160" \
  --quality auto \
  --background auto \
  --moderation low \
  --output-format png \
  --prompt "Create one single composite reference sheet for a {{duration_seconds}}-second realistic {{dish}}-making tutorial video. The image should be a clean, high-end production reference board, not a poster with heavy text. Format: {{aspect_ratio}} wide reference sheet, elegant white margins, clean grid layout, realistic cinematic photography style. Concept: {{dish}} tutorial in a {{kitchen_style}} kitchen.

Top row: motion / choreography guide with 9 numbered cinematic action panels showing the {{dish}} process step-by-step from raw ingredients to final plated dish.

Middle-left: realistic character reference sheet of the uploaded person — preserve their exact face, hair color, hair texture, eye color, skin tone, and all facial features with 100% accuracy. Show the same person in: face close-up, full-body front view, side/action working pose, and back view. Dress them in {{outfit}}. Keep them grounded, approachable, skilled, and cinematic.

Middle-right / background: location reference sheet of an elegant {{kitchen_style}} kitchen with tactile surfaces, natural daylight from a large window, hanging cookware, herbs, and premium cooking atmosphere appropriate to the cuisine.

Style: realistic, cinematic, warm natural light, shallow depth of field, tactile food photography, premium cooking show aesthetic, rich surface textures.

Bottom strip: simple visual icons only for {{duration_seconds}} seconds, {{aspect_ratio}}, realistic, cinematic, tasty, natural camera. Minimal text, no dense paragraphs. Let the visuals do the heavy lifting."
```

Wait for completion and capture the output URL as `{{reference_sheet_url}}`. Show it to the user and confirm the character likeness and kitchen mood before moving to Step 2, because Step 2 is the expensive call.
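
The confirmation gate between the two steps can be sketched as a small shell function. This is an illustration under assumptions: `confirm_reference_sheet` is a hypothetical name, and the y/N prompt on standard input is one possible way to ask the user, not something the recipe prescribes.

```shell
# Hypothetical gate (not part of muapi): returns 0 only when the user
# explicitly approves the reference sheet, so Step 2 is never fired by default.
confirm_reference_sheet() {
  sheet_url="$1"
  echo "Reference sheet: $sheet_url" >&2
  echo "Do the character likeness and kitchen mood look right? [y/N]" >&2
  # A closed or empty stdin counts as "no".
  read -r answer || return 1
  case "$answer" in
    y|Y|yes|Yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}
```

Anything other than an explicit yes returns non-zero, so a caller can write `confirm_reference_sheet "$url" || exit 1` ahead of the video call and regenerate the sheet instead of paying for a video built on a poor likeness.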


Step 2 — Cooking Video (Reference-to-Video)


Animate the full sequence using both the original person photo (identity anchor) and the reference sheet (narrative + environment guide) as dual references.

Endpoint: `bytedance-seedance-2-0-reference-to-video-fast`

CLI:

```bash
muapi video generate \
  --model bytedance-seedance-2-0-reference-to-video-fast \
  --image "{{person_image}}" \
  --image "{{reference_sheet_url}}" \
  --aspect-ratio "{{aspect_ratio}}" \
  --duration "{{duration_seconds}}" \
  --resolution "{{resolution}}" \
  --generate-audio true \
  --prompt "The person in @Image1 is the subject — preserve their exact face, hair, eye color, skin tone, and all facial features with 100% accuracy throughout the entire video.
Use @Image2 as the visual and narrative guide — follow the cooking steps, kitchen setting, outfit, and atmosphere shown in the reference sheet exactly.
A single continuous cinematic video of the person from @Image1 making {{dish}} in the {{kitchen_style}} kitchen shown in @Image2. They wear {{outfit}} throughout.

VIDEO STRUCTURE
Follow the exact 9-step sequence as shown in @Image2, beat by beat, from raw ingredients through preparation to a final plated close-up.

MOTION STYLE
- Slow, deliberate, satisfying transitions between each step
- Natural hand and body movement with clear culinary intent
- Continuous flow with no jump cuts
- Warm and immersive pacing

CAMERA & CINEMATOGRAPHY
- Close-up shots for hands during mixing, kneading, cutting, plating
- Medium shots showing the person working at the counter
- Pull back slightly for the final plating to reveal the full kitchen
- Shallow depth of field — focus on hands and food, soft background blur
- No abrupt cuts — smooth match cuts and fluid transitions

VISUAL STYLE
- Warm natural daylight from a large kitchen window
- Rich tactile textures matching @Image2's environment
- Full color, warm cinematic color grading

CONSISTENCY RULES
- Same character throughout — face of @Image1 in every frame
- Same outfit across entire video
- Same kitchen environment as shown in @Image2

AUDIO
- Soft kitchen ambience, gentle culinary SFX (chopping, sizzling, pouring), light cinematic underscore
- No dialogue, no narration

OUTPUT STYLE
- Duration: exactly {{duration_seconds}} seconds
- Polished, cinematic, premium cooking show quality
- Ends with a beautiful close-up of the finished plated {{dish}}"
```

After generation:

- Present the final video URL to the user.
- Offer follow-ups: a vertical 9:16 re-render for Reels, a longer 30s extended cut, or swapping `{{dish}}` for a different cuisine using the same person image.


Notes


- Two-image reference is the whole trick. `@Image1` locks identity, `@Image2` locks choreography + environment. Never drop one: single-reference runs lose either the face or the kitchen.
- The reference sheet at Step 1 must be wide (3840x2160). Smaller resolutions blur the 9 action panels and the video model can't read them.
- `bytedance-seedance-2-0-reference-to-video-fast` natively generates audio when `generate_audio=true`. Always include an audio direction in the prompt; otherwise the soundtrack is random.
- Real human faces ARE supported here because the person photo is the user's own subject and we route through the reference-to-video endpoint (not the restricted i2v variants).
- If the user wants a non-cooking sequence (e.g., latte art, plating tutorial, mixology), keep the same two-step structure; only `{{dish}}` and the 9-step description change.
- For shorter pieces (<= 8s), reduce the action board to 5–6 panels in Step 1; cramming 9 beats into 8s degrades motion quality (single-beat rule).
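
Two of these notes lend themselves to simple pre-flight checks. The sketch below is illustrative only: both function names are hypothetical, and the cut-offs between the documented data points (5–6 panels at 8s or less, about 6 beats at 10s, 9 panels at 15s) are assumptions rather than values stated by the recipe.

```shell
# Hypothetical helpers (not part of muapi).

# Suggest an action-board panel count for a duration in seconds.
# Documented points: <= 8s -> 5-6 panels, 10s -> ~6 beats, 15s -> 9 panels.
# The choice of 5 for short clips and the 9..14s band are assumptions.
panel_count_for_duration() {
  if [ "$1" -le 8 ]; then
    echo 5
  elif [ "$1" -lt 15 ]; then
    echo 6
  else
    echo 9
  fi
}

# Succeed only when both reference URLs are non-empty:
# $1 = person photo (identity), $2 = reference sheet (choreography + environment).
require_both_references() {
  [ -n "${1:-}" ] && [ -n "${2:-}" ]
}
```

For example, `panel_count_for_duration 8` suggests a 5-panel board, and `require_both_references "$person_image" "$reference_sheet_url"` fails if either reference was dropped.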


Trigger Keywords


cooking video

cooking tutorial

pasta video

recipe video

food video

chef video

cooking storyboard

kitchen tutorial

cooking reel

tutorial video from photo

storyboard to video


Notes for the Executing Agent


This recipe is LLM-orchestrated: read each phase, gather any missing inputs from the user, then call `muapi` CLI commands. Run `muapi auth configure` first if `MUAPI_API_KEY` is unset.
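
The credential check can be made explicit before any call is issued. A minimal sketch, assuming the key is read from the `MUAPI_API_KEY` environment variable as described above; `require_api_key` is a hypothetical helper name, not a muapi command.

```shell
# Hypothetical pre-flight check (not part of muapi): fail early with a hint
# when MUAPI_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${MUAPI_API_KEY:-}" ]; then
    echo "MUAPI_API_KEY is unset; run: muapi auth configure" >&2
    return 1
  fi
}
```

A caller would run `require_api_key || exit 1` before Step 1 so that a missing credential surfaces before any generation request is made.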

For model IDs without a CLI alias yet, fall back to the raw endpoint via

`curl -X POST https://api.muapi.ai/api/v1/<endpoint> -H "x-api-key: $MUAPI_API_KEY" -H 'content-type: application/json' -d '{...}'`

and poll with `muapi predict wait <request_id>`.

Substitute `{{input_name}}` placeholders with the user's actual inputs before issuing each call.

Step 1 must complete and return an output image URL before Step 2 fires. Pass that URL as the second `--image` to the video step.
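
Placeholder substitution can be sketched in plain shell. This is one possible approach under assumptions, not the recipe's own tooling: `render_template` is a hypothetical helper that takes a template string followed by `name=value` pairs and replaces every `{{name}}` literally, so URLs containing `/`, `&`, or `=` pass through unchanged.

```shell
# Hypothetical helper (not part of muapi): replace each {{name}} in a template
# with its value. Usage: render_template "<template>" name=value [name=value ...]
# The body runs in a subshell, so its variables do not leak into the caller.
render_template() (
  text="$1"
  shift
  for pair in "$@"; do
    name=${pair%%=*}       # text before the first '='
    value=${pair#*=}       # text after the first '=' (may itself contain '=')
    token="{{${name}}}"
    out=""
    # Replace every occurrence, scanning left to right.
    while :; do
      case "$text" in
        *"$token"*)
          head=${text%%"$token"*}     # everything before the first token
          out="$out$head$value"
          text=${text#*"$token"}      # everything after the first token
          ;;
        *) break ;;
      esac
    done
    text="$out$text"
  done
  printf '%s' "$text"
)
```

For instance, `render_template 'Make {{dish}} in a {{kitchen_style}} kitchen' 'dish=sushi rolls' 'kitchen_style=minimalist Tokyo'` prints `Make sushi rolls in a minimalist Tokyo kitchen`. Placeholders with no matching pair are left as they are, which makes a missed input easy to spot before the call goes out.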
