fix-my-look

Edit the source's first usable frame with `gpt-image-2` from the user's prompt, then propagate that look across the clip with `kling` reference-video while locking the original face, motion and audio via the original video + audio as references. All prep happens in one `mcp__plugin_pika_pika__normalize_video` call for short clips, or one normalize call per segment for longer clips. The output ratio uses the normalized clip's closest supported output ratio; this skill does NOT reframe the source video.

Inputs

`<source>`: path or URL to a video file with audio
`<change_prompt>`: what to change (e.g. "make it night with neon lights", "change my shirt to a leather jacket", "put me on a beach in Hawaii")

Empty-args menu

"What's the source video path?"
"What do you want to change? (e.g. 'put me on a beach', 'make it night')"

Workflow

Working dir: `~/Downloads/fix-my-look/<run-id>/`

Step 0 — Cost, timer and task IDs

Use the `mcp__plugin_pika_pika__*` names below as the canonical plugin namespace. If the host exposes the same tools under a local namespace such as `mcp__pika-mcp__*` or `mcp__pika-prod__*`, map by tool suffix and keep the same arguments.

Start a timer when the source and change prompt are known. Before paid generation, call `mcp__plugin_pika_pika__estimate_cost` for the planned `mcp__plugin_pika_pika__generate_image`, `mcp__plugin_pika_pika__generate_reference_video`, any multi-segment `mcp__plugin_pika_pika__edit_concat`, and any optional audio/lipsync repair call. If cost is not surfaced by the host, say `Cost not surfaced by this harness` in the final report instead of guessing. When any tool returns a `task_id`, copy the exact value into the run notes and reuse it verbatim; do not hand-type long JWT-style task IDs.
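
The suffix mapping described in this step can be sketched as follows. This is a minimal illustration, assuming the host can list the tool names it exposes; `resolve_tool` and the sample host list are hypothetical helpers, not part of the plugin.

```python
CANONICAL_PREFIX = "mcp__plugin_pika_pika__"


def resolve_tool(canonical_name: str, host_tools: list[str]) -> str:
    """Map a canonical tool name to the host's name by matching the tool suffix."""
    if not canonical_name.startswith(CANONICAL_PREFIX):
        raise ValueError(f"not a canonical pika tool name: {canonical_name}")
    if canonical_name in host_tools:
        return canonical_name  # host already uses the canonical namespace
    suffix = canonical_name[len(CANONICAL_PREFIX):]
    # A local name looks like mcp__pika-mcp__<suffix>; compare the final segment.
    matches = [name for name in host_tools if name.split("__")[-1] == suffix]
    if len(matches) != 1:
        raise LookupError(f"expected one host tool for {suffix!r}, found {matches}")
    return matches[0]
```

Failing loudly when zero or several host tools share a suffix avoids silently calling the wrong tool before a paid step.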

Step 1 — Prepare the clip

For a local file, call `mcp__plugin_pika_pika__upload_asset` first; an HTTPS media URL passes directly. Decide the source windows before normalizing: use one 14.8s window for sources <=15s, and split longer sources into ordered 14.8s windows. Call `mcp__plugin_pika_pika__normalize_video(video_url=<source>, start_s=<offset>, max_duration_s=14.8, extract_audio=true, extract_face_frame=true)` once per window. Use the first window's `face_frame_url` for the edited still; use each window's `video_url` as that segment's motion/identity reference. For multi-window clips, also call `mcp__plugin_pika_pika__extract_audio_from_video(video_url=<source>)` so the final merged output can be restored to one continuous source audio track.
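
The windowing rule can be sketched like this. It is a rough illustration that assumes the source duration is already known to the caller (the step does not say how it is obtained); `plan_windows` is a hypothetical helper that only builds the per-window `normalize_video` arguments.

```python
import math

WINDOW_S = 14.8           # window length used by this step
SINGLE_WINDOW_MAX_S = 15  # sources at or under this length use one window


def plan_windows(source_duration_s: float) -> list[dict]:
    """Return ordered normalize_video arguments, one dict per source window."""
    if source_duration_s <= 0:
        raise ValueError("source duration must be positive")
    if source_duration_s <= SINGLE_WINDOW_MAX_S:
        count = 1
    else:
        count = math.ceil(source_duration_s / WINDOW_S)
    return [
        {
            "start_s": round(i * WINDOW_S, 3),  # rounded to hide float noise
            "max_duration_s": WINDOW_S,
            "extract_audio": True,
            "extract_face_frame": True,
        }
        for i in range(count)
    ]
```

For a 40s source this yields three windows starting at 0, 14.8 and 29.6 seconds; the last window is simply shorter than 14.8s.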

Wire the result into the remaining steps: `face_frame_url` is the Step 2 edit target; each normalized `video_url` is Kling's reference for that segment in Step 4; set `aspect_ratio = result.aspect_ratio ?? result.closest_aspect_ratio` for each normalize result, then carry that local `aspect_ratio` through the image and video calls. If neither field is present, stop and report that normalization output is missing an aspect label. Compute `duration = max(4, min(15, round(duration_s)))` per segment, and use `resolution="720p"` unless the user asked for high res. If `face_found` is false, no clear face was found and `face_frame_url` fell back to the t=0 frame; proceed but warn that identity may drift, or re-run with a `start_s` at a section where the subject faces the camera.
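
The aspect fallback and duration clamp can be sketched as below, assuming the normalize result arrives as a plain dictionary with the field names used in this step; the two function names are hypothetical.

```python
def pick_aspect_ratio(result: dict) -> str:
    """Prefer aspect_ratio, fall back to closest_aspect_ratio, else stop."""
    aspect = result.get("aspect_ratio")
    if aspect is None:
        aspect = result.get("closest_aspect_ratio")
    if aspect is None:
        raise ValueError("normalization output is missing an aspect label")
    return aspect


def segment_duration(duration_s: float) -> int:
    """Clamp the rounded segment length to the 4..15 second range."""
    return max(4, min(15, round(duration_s)))
```

Note that Python's `round` sends exact halves to the nearest even integer, which may differ from the rounding a given host applies.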

Reference-video providers can reject oversized reference assets. If the normalize result or the downstream provider error shows a normalized video is over the provider limit, retry `mcp__plugin_pika_pika__normalize_video` once with `crf=28` and the same `start_s`, `max_duration_s`, `extract_audio`, and `extract_face_frame` values. If the reference is still too large, stop before another paid video attempt and report that `mcp__plugin_pika_pika__normalize_video` needs a worker-side 1080-edge / reference-size cap. Do not patch this with local shell media commands.
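
The retry-once policy can be sketched as follows. Both callables are assumptions for illustration: `normalize` stands in for however the host invokes `normalize_video`, and `is_too_large` for whatever check reveals an oversized reference.

```python
from typing import Callable


class ReferenceTooLarge(Exception):
    """Raised when the normalized reference still exceeds the provider limit."""


def normalize_with_one_retry(
    normalize: Callable[..., dict],
    is_too_large: Callable[[dict], bool],
    **window_args,
) -> dict:
    """Normalize once; if the result is oversized, retry a single time with crf=28."""
    result = normalize(**window_args)
    if not is_too_large(result):
        return result
    # Same start_s, max_duration_s, extract_audio and extract_face_frame; only crf changes.
    retry = normalize(**window_args, crf=28)
    if is_too_large(retry):
        raise ReferenceTooLarge(
            "normalize_video needs a worker-side 1080-edge / reference-size cap"
        )
    return retry
```

Raising instead of looping keeps the caller from spending on another paid video attempt with a reference that is known to be too large.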

Step 2 — Edit the frame with gpt-image-2 (the "change" stage)

Call `mcp__plugin_pika_pika__generate_image` with `provider="gpt-image-2"`, `aspect_ratio=<aspect_ratio>`, `resolution="2K"`, `reference_images=[<face_frame_url>]`, `quality="high"`, and this prompt:

"Modify the reference photograph as follows:
<change_prompt>
. Keep the person's face, identity, hair, body and pose EXACTLY as in the reference. CRITICAL: preserve every object the subject is holding or touching — phones, products, drinks, bags, props, jewelry — in the exact same hand, position, orientation and scale; never remove, replace or restyle them. Change only the requested scene, background, clothing, lighting or environment, not who the person is."

Keep the "preserve held objects" clause verbatim on every re-render — without it gpt-image-2 silently drops products/phones the subject is holding.
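
One way to guard the verbatim rule is a small check run before every re-render. This is only a sketch: the two fragments are copied from the locked prompt above, while `missing_locked_fragments` is a hypothetical helper.

```python
LOCKED_FRAGMENTS = (
    "preserve every object the subject is holding or touching",
    "never remove, replace or restyle them",
)


def missing_locked_fragments(prompt: str) -> list[str]:
    """Return the locked fragments that a re-render prompt no longer contains."""
    return [fragment for fragment in LOCKED_FRAGMENTS if fragment not in prompt]
```

An empty list means the held-objects clause survived the tweak; anything else should block the call until the clause is restored.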

If gpt-image-2 returns a content-policy false positive for fashion, glam, or beauty prompts, retry once with the same intent but a modest / editorial wording such as "polished event styling, opaque clothing, natural pose, non-sexual fashion portrait". For makeup prompts, explicitly preserve the original eye shape, eyelids, iris color and gaze; heavy eyeliner/eye shadow is a high-risk identity-drift source.

Step 3 — Show the edited frame and wait for approval

Surface the edited frame and STOP. Ask "Approve for video generation, or tweak and re-render?" Do NOT call video generation until approved. For tweaks, re-run Step 2 (locked clauses verbatim) and loop.

Step 4 — Propagate via Kling reference-video

For each normalized segment, call `mcp__plugin_pika_pika__generate_reference_video` with `provider="kling"`, `reference_videos=[<segment video_url>]`, `reference_images=[<edited_frame_url>]`, `aspect_ratio=<aspect_ratio>`, `duration=<segment duration>`, `sound=false`, `video_keep_sounds=[true]`, and this prompt:

"Apply the change shown in <<<image_1>>> to <<<video_1>>>. Keep the person in <<<video_1>>> with the EXACT same face, identity, expressions, motion and timing; preserve the original video's kept sound track. The new scene/background/clothing/lighting should match <<<image_1>>>. CRITICAL: preserve every object the subject is holding or touching in <<<video_1>>> — phones, products, drinks, bags, props — in the same hand and orientation every frame. Keep mouth motion active through the final frame when the person is speaking. Do not alter the person's identity."

Append any extra creative direction (e.g. "very cinematic, soft golden light") after the locked text — never replace it.

Do not pass `sound=true` to Kling with a video input. Kling rejects that combination with `error:1201 sound on is not supported with video input`; use `sound=false` plus `video_keep_sounds=[true]` to keep the source video's audio.
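
The per-segment arguments can be assembled as in the sketch below. The argument names mirror the ones listed in this step; the `prompt` key and the `build_kling_args` helper itself are assumptions for illustration, and the locked prompt text is passed in unchanged.

```python
def build_kling_args(
    segment_video_url: str,
    edited_frame_url: str,
    aspect_ratio: str,
    duration: int,
    locked_prompt: str,
    extra_direction: str = "",
) -> dict:
    """Assemble generate_reference_video arguments for one segment."""
    prompt = locked_prompt
    if extra_direction:
        # Extra creative direction goes after the locked text, never in place of it.
        prompt = f"{locked_prompt} {extra_direction}"
    return {
        "provider": "kling",
        "reference_videos": [segment_video_url],
        "reference_images": [edited_frame_url],
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "sound": False,               # sound=true with a video input triggers error:1201
        "video_keep_sounds": [True],  # keeps the source video's audio instead
        "prompt": prompt,
    }
```

Hard-coding the two audio flags removes the one combination Kling is known to reject from the caller's hands.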

If the source was split into multiple windows, call `mcp__plugin_pika_pika__edit_concat(video_urls=[<segment outputs in order>])`. After concat, run `mcp__plugin_pika_pika__edit_audio_replace(video_url=<concat_url>, audio_url=<full_source_audio_url>, duration_policy="video")` when the merged output audio is missing, drifted, or discontinuous.

Only try Seedance if the user explicitly asks for it, or if Kling fails and a second provider attempt is useful. Use the same segmenting rule and record the provider error plainly if Seedance rejects the input or drops speech/action.

Async handling: if any call returns a `{task_id, status}` envelope, poll `mcp__plugin_pika_pika__task_status({task_id})` in a tight loop until the task reaches a terminal status.
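
A polling loop for this envelope might look like the sketch below. The terminal status names, the poll interval and the timeout are all assumptions (the step does not list them), and `get_status` is a hypothetical stand-in for the `task_status` call.

```python
import time
from typing import Callable

# Assumed terminal status names; the step above does not list them.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_task(
    task_id: str,
    get_status: Callable[[str], dict],
    interval_s: float = 1.0,
    timeout_s: float = 900.0,
) -> dict:
    """Poll get_status(task_id) until the status is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        envelope = get_status(task_id)
        if envelope.get("status") in TERMINAL_STATUSES:
            return envelope
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
        time.sleep(interval_s)
```

The `task_id` is passed through untouched, which matches the rule about reusing the exact value a tool returned.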

Step 5 — Audio, duration and identity QA

Before reporting success, verify the generated video against the source:

Duration must not be meaningfully cut off. If output duration differs from the intended source window or merged source duration by more than 0.5s, mark the run as failed / needs follow-up.
If the source has speech, audio must be present through the tail and mouth movement must not freeze before the spoken content ends. If words are missing, garbled, silent, or visibly out of sync, do not call the run `PASS`.
The approved frame corrections must persist into the video. If the provider reintroduces a removed artifact such as eyeglass glare, mark it as a propagation caveat or re-render from a stronger approved frame.
Compare identity at start, middle, segment boundaries, and end. If Kling preserved motion but changed the face, call that out as a provider limitation instead of a pass.
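
The duration check in the list above reduces to a tolerance comparison. A minimal sketch, assuming both durations are already measured in seconds; `duration_ok` is a hypothetical helper name.

```python
DURATION_TOLERANCE_S = 0.5  # threshold from the duration check above


def duration_ok(output_duration_s: float, expected_duration_s: float) -> bool:
    """True when the output length is within 0.5s of the intended source length."""
    return abs(output_duration_s - expected_duration_s) <= DURATION_TOLERANCE_S
```

For a single window, compare against that window's normalized length; for a merged clip, compare against the merged source duration.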

If the video is visually acceptable but speech audio is missing, incomplete, or drifted, offer one paid repair pass:

mcp__plugin_pika_pika__edit_audio_replace(video_url=<generated_video_url>, audio_url=<full_source_audio_url or segment_audio_url>, duration_policy="video")

mcp__plugin_pika_pika__edit_lipsync(video_url=<audio_restored_url>, audio_url=<full_source_audio_url or segment_audio_url>, variant="v2-pro")

If the model froze the mouth near the end, do not keep escalating to `sync-3` automatically; lip-sync cannot reliably recover a face track with no mouth motion. Offer trim / regenerate instead.

Step 6 — Download + return

Download the result to `~/Downloads/fix-my-look/<run-id>/result.mp4` and return that path plus the final report fields: source, edited frame URL, final video URL, provider, job/task IDs, cost estimate or `not surfaced`, elapsed time, QA notes, and follow-up issue.

Failure modes

| Symptom | Cause | Fix |
| --- | --- | --- |
| Output face drifts from the original | gpt-image-2 over-edited the face OR the provider under-weighted the source video | Re-run Step 2 with a stronger "keep the face the same" clause; soften `change_prompt`. |
| Output looks like the original (no change) | Edited image too similar, OR you passed the raw frame instead of the edited output | Re-run Step 2 with a more dramatic prompt; confirm the edited frame URL. |
| Output aspect doesn't match source | Source aspect not in {16:9, 9:16, 1:1, 4:3, 3:4} | Step 1 returns `aspect_ratio`, or `closest_aspect_ratio` on older worker payloads; use it as the closest supported output label and check with the user when the source aspect is exotic. |
| Provider rejects the normalized video as too large | Normalize output can remain too large for 4K/iPhone sources | Retry normalize once with `crf=28`; if still too large, stop and file worker follow-up for a 1080-edge / reference-size cap. |
| Long source only returns the first short window | The caller normalized once with `max_duration_s=14.8` and skipped segmenting | Split into 14.8s windows, generate each segment, then `mcp__plugin_pika_pika__edit_concat` in order and restore full source audio if needed. |
| Speaking clip loses sound, drops words, or freezes mouth at the tail | Provider regenerated speech/audio instead of preserving the source, or the face track has no mouth motion to drive | Mark as not pass. Offer one `mcp__plugin_pika_pika__edit_audio_replace` + `mcp__plugin_pika_pika__edit_lipsync` repair pass; if tail mouth motion is frozen, offer trim/regenerate instead. |
| Approved frame fix disappears in the video | Provider propagation reintroduced the original artifact | Re-render from a stronger approved frame or mark provider propagation caveat; do not claim the frame correction shipped. |
| Kling rejects with `error:1201 sound on is not supported with video input` | `sound=true` was passed with a video reference | Retry the Kling call with `sound=false` and `video_keep_sounds=[true]`; do not use `reference_audio` for Kling video input. |
| Kling output is shorter than the normalized source | Provider returned a shorter render, or the caller accidentally passed a trimmed reference | Do not mark pass. Compare output duration to the normalized source, then regenerate that segment or ask the user for a shorter window. |