fix-my-look
Edit the source's first usable frame with `gpt-image-2` from the user's prompt,
then propagate that look across the clip with `kling` reference-video while
locking the original face, motion and audio via the original video + audio as
references. All prep happens in one `mcp__plugin_pika_pika__normalize_video`
call for short clips, or one normalize call per segment for longer clips. The
output ratio is the supported ratio closest to the normalized clip's; this
skill does NOT reframe the source video.
Inputs
- `<source>`: path or URL to a video file with audio
- `<change_prompt>`: what to change (e.g. "make it night with neon lights", "change my shirt to a leather jacket", "put me on a beach in Hawaii")
Empty-args menu
- "What's the source video path?"
- "What do you want to change? (e.g. 'put me on a beach', 'make it night')"
Workflow
Working dir: `~/Downloads/fix-my-look/<run-id>/`.
Step 0 — Cost, timer and task IDs
Use the `mcp__plugin_pika_pika__*` names below as the canonical plugin
namespace. If the host exposes the same tools under a local namespace such as
`mcp__pika-mcp__*` or `mcp__pika-prod__*`, map by tool suffix and keep the same
arguments.

Start a timer when the source and change prompt are known. Before paid
generation, call `mcp__plugin_pika_pika__estimate_cost` for the planned
`mcp__plugin_pika_pika__generate_image`,
`mcp__plugin_pika_pika__generate_reference_video`, any multi-segment
`mcp__plugin_pika_pika__edit_concat`, and any optional audio/lipsync repair
call. If cost is not surfaced by the host, say
`Cost not surfaced by this harness` in the final report instead of guessing.
When any tool returns a `task_id`, copy the exact value into the run notes and
reuse it verbatim; do not hand-type long JWT-style task IDs.
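The suffix-mapping rule can be sketched as below. This is a minimal illustration, assuming the host's tool names are available as a list of strings; the helper `resolve_tool` and the `LOCAL_PREFIXES` tuple are hypothetical names, not part of the plugin.

```python
CANONICAL_PREFIX = "mcp__plugin_pika_pika__"
# Local namespaces named in the text; a host may expose others (assumption).
LOCAL_PREFIXES = ("mcp__pika-mcp__", "mcp__pika-prod__")


def resolve_tool(canonical_name: str, available: list[str]) -> str:
    """Return the host's name for a canonical tool, matching by suffix."""
    if not canonical_name.startswith(CANONICAL_PREFIX):
        raise ValueError(f"not a canonical tool name: {canonical_name}")
    if canonical_name in available:
        return canonical_name
    suffix = canonical_name[len(CANONICAL_PREFIX):]
    for prefix in LOCAL_PREFIXES:
        candidate = prefix + suffix
        if candidate in available:
            return candidate
    raise LookupError(f"no tool with suffix {suffix!r} exposed by host")
```

Arguments are passed through unchanged; only the tool name is remapped.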
Step 1 — Prepare the clip
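Local file? `mcp__plugin_pika_pika__upload_asset` it first; an HTTPS media URL
passes directly. Decide the source windows before normalizing: use one 14.8s
window for sources <=15s, and split longer sources into ordered 14.8s windows.
Call
`mcp__plugin_pika_pika__normalize_video(video_url=<source>, start_s=<offset>, max_duration_s=14.8, extract_audio=true, extract_face_frame=true)`
once per window. Use the first window's `face_frame_url` for the edited still;
use each window's `video_url` as that segment's motion/identity reference. For
multi-window clips, also call
`mcp__plugin_pika_pika__extract_audio_from_video(video_url=<source>)` so the
final merged output can be restored to one continuous source audio track.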
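The windowing rule can be sketched as follows. This is a minimal illustration, assuming the source duration is already known in seconds; `plan_windows` is a hypothetical helper, and it only builds the per-window arguments for the normalize call without invoking any tool.

```python
WINDOW_S = 14.8             # window length from the text
SINGLE_WINDOW_MAX_S = 15.0  # sources up to this length use one window


def plan_windows(source_duration_s: float) -> list[dict]:
    """Return ordered normalize_video arguments, one dict per window."""
    if source_duration_s <= 0:
        raise ValueError("source duration must be positive")
    if source_duration_s <= SINGLE_WINDOW_MAX_S:
        starts = [0.0]
    else:
        starts = []
        i = 0
        while i * WINDOW_S < source_duration_s:
            starts.append(round(i * WINDOW_S, 3))
            i += 1
    return [
        {
            "start_s": start,
            "max_duration_s": WINDOW_S,
            "extract_audio": True,
            "extract_face_frame": True,
        }
        for start in starts
    ]
```

A 40s source yields three windows starting at 0.0, 14.8 and 29.6. The sketch does not special-case a very short trailing window, which the text leaves unspecified.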
Wire the result into the rest: `face_frame_url` is the Step 2 edit target;
each normalized `video_url` is Kling's reference for that segment in Step 4;
set `aspect_ratio = result.aspect_ratio ?? result.closest_aspect_ratio` for
each normalize result, then carry that local `aspect_ratio` through the image
and video calls. If neither field is present, stop and report that normalization
output is missing an aspect label. Compute
`duration = max(4, min(15, round(duration_s)))` per segment, and use
`resolution="720p"` unless the user asked for high res. If `face_found` is
false, no clear face was found and `face_frame_url` fell back to the t=0 frame;
proceed but warn that identity may drift, or re-run with a `start_s` at a
section where the subject faces camera.
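The aspect-ratio fallback and the duration clamp can be sketched as below. This is a minimal illustration, assuming the normalize result is a plain dict; both helper names are hypothetical. One assumption to note: Python's built-in `round()` rounds halves to even, so the sketch rounds half up instead, which may or may not match the rounding the author intended.

```python
import math


def pick_aspect_ratio(result: dict) -> str:
    """Prefer aspect_ratio, fall back to closest_aspect_ratio, else stop."""
    aspect = result.get("aspect_ratio")
    if aspect is None:
        aspect = result.get("closest_aspect_ratio")
    if aspect is None:
        raise RuntimeError("normalization output is missing an aspect label")
    return aspect


def segment_duration(duration_s: float) -> int:
    """Clamp the rounded segment length to the 4..15 second range."""
    rounded = math.floor(duration_s + 0.5)  # round half up (assumption)
    return max(4, min(15, rounded))
```

A full 14.8s window therefore requests a 15s render, and a 2s clip is raised to the 4s minimum.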
Reference-video providers can reject oversized reference assets. If the
normalize result or the downstream provider error shows a normalized video is
over the provider limit, retry `mcp__plugin_pika_pika__normalize_video` once
with `crf=28` and the same `start_s`, `max_duration_s`, `extract_audio`, and
`extract_face_frame` values. If the reference is still too large, stop before
another paid video attempt and report that
`mcp__plugin_pika_pika__normalize_video` needs a worker-side 1080-edge /
reference-size cap. Do not patch this with local shell media commands.
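The retry-once rule can be sketched as follows. This is a minimal illustration, assuming the normalize tool is wrapped in a Python callable and that an oversized result can be detected from the result itself; `normalize_with_size_retry` and the `is_too_large` predicate are hypothetical, and detection from a downstream provider error is not covered here.

```python
from typing import Callable


def normalize_with_size_retry(
    normalize: Callable[..., dict],
    is_too_large: Callable[[dict], bool],
    **window_args,
) -> dict:
    """Call normalize once, then at most once more with crf=28 if oversized."""
    result = normalize(**window_args)
    if not is_too_large(result):
        return result
    # Same window arguments, only crf changes on the single retry.
    retry = normalize(**window_args, crf=28)
    if is_too_large(retry):
        # Stop before any further paid video attempt.
        raise RuntimeError(
            "normalize_video needs a worker-side 1080-edge / reference-size cap"
        )
    return retry
```

The caller should pass the original window arguments without `crf`, so the retry differs from the first call in that one value only.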
Step 2 — Edit the frame with gpt-image-2 (the "change" stage)
Call `mcp__plugin_pika_pika__generate_image` with `provider="gpt-image-2"`,
`aspect_ratio=<aspect_ratio>`, `resolution="2K"`,
`reference_images=[<face_frame_url>]`, `quality="high"`, and this prompt:
"Modify the reference photograph as follows: <change_prompt>. Keep the person's face, identity, hair, body and pose EXACTLY as in the reference. CRITICAL: preserve every object the subject is holding or touching — phones, products, drinks, bags, props, jewelry — in the exact same hand, position, orientation and scale; never remove, replace or restyle them. Change only the requested scene, background, clothing, lighting or environment, not who the person is."
Keep the "preserve held objects" clause verbatim on every re-render — without
it gpt-image-2 silently drops products/phones the subject is holding.
If gpt-image-2 returns a content-policy false positive for fashion, glam, or
beauty prompts, retry once with the same intent but a modest / editorial wording
such as "polished event styling, opaque clothing, natural pose, non-sexual
fashion portrait". For makeup prompts, explicitly preserve the original eye
shape, eyelids, iris color and gaze; heavy eyeliner/eye shadow is a high-risk
identity-drift source.
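Building the Step 2 request from a fixed template keeps the locked clauses identical on every re-render. The sketch below is a minimal illustration; `build_edit_request` is a hypothetical helper, the argument names mirror the call described in this step, and the dict is only the payload, not a tool invocation.

```python
# Locked prompt text from this step; {change} is the only variable part.
# "\u2014" is the dash character used in the locked text.
LOCKED_EDIT_PROMPT = (
    "Modify the reference photograph as follows: {change}. "
    "Keep the person's face, identity, hair, body and pose EXACTLY as in the "
    "reference. CRITICAL: preserve every object the subject is holding or "
    "touching \u2014 phones, products, drinks, bags, props, jewelry \u2014 in the "
    "exact same hand, position, orientation and scale; never remove, replace "
    "or restyle them. Change only the requested scene, background, clothing, "
    "lighting or environment, not who the person is."
)


def build_edit_request(change_prompt: str, face_frame_url: str, aspect_ratio: str) -> dict:
    """Return generate_image arguments with the locked clauses intact."""
    change = change_prompt.strip().rstrip(".")
    if not change:
        raise ValueError("change_prompt must not be empty")
    return {
        "provider": "gpt-image-2",
        "aspect_ratio": aspect_ratio,
        "resolution": "2K",
        "reference_images": [face_frame_url],
        "quality": "high",
        "prompt": LOCKED_EDIT_PROMPT.format(change=change),
    }
```

For a tweak in Step 3, only `change_prompt` changes; the template is reused as-is.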
Step 3 — Show the edited frame and wait for approval
Surface the edited frame and STOP. Ask "Approve for video generation, or tweak
and re-render?" Do NOT call video generation until approved. For tweaks, re-run Step 2
(locked clauses verbatim) and loop.
Step 4 — Propagate via Kling reference-video
For each normalized segment, call
`mcp__plugin_pika_pika__generate_reference_video` with `provider="kling"`,
`reference_videos=[<segment video_url>]`,
`reference_images=[<edited_frame_url>]`, `aspect_ratio=<aspect_ratio>`,
`duration=<segment duration>`, `sound=false`, `video_keep_sounds=[true]`, and
this prompt:
"Apply the change shown in <<<image_1>>> to <<<video_1>>>. Keep the person in <<<video_1>>> with the EXACT same face, identity, expressions, motion and timing; preserve the original video's kept sound track. The new scene/background/clothing/lighting should match <<<image_1>>>. CRITICAL: preserve every object the subject is holding or touching in <<<video_1>>> — phones, products, drinks, bags, props — in the same hand and orientation every frame. Keep mouth motion active through the final frame when the person is speaking. Do not alter the person's identity."
Append any extra creative direction (e.g. "very cinematic, soft golden light")
after the locked text — never replace it.
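The per-segment request, with optional creative direction appended after the locked text, can be sketched as below. This is a minimal illustration; `build_kling_request` is a hypothetical helper and the dict is only the argument payload for the call described in this step.

```python
# Locked prompt text from this step. "\u2014" is the dash used in the locked text.
LOCKED_VIDEO_PROMPT = (
    "Apply the change shown in <<<image_1>>> to <<<video_1>>>. Keep the person "
    "in <<<video_1>>> with the EXACT same face, identity, expressions, motion "
    "and timing; preserve the original video's kept sound track. The new "
    "scene/background/clothing/lighting should match <<<image_1>>>. CRITICAL: "
    "preserve every object the subject is holding or touching in <<<video_1>>> "
    "\u2014 phones, products, drinks, bags, props \u2014 in the same hand and "
    "orientation every frame. Keep mouth motion active through the final frame "
    "when the person is speaking. Do not alter the person's identity."
)


def build_kling_request(
    segment_video_url: str,
    edited_frame_url: str,
    aspect_ratio: str,
    duration: int,
    extra_direction: str = "",
) -> dict:
    """Return generate_reference_video arguments for one segment."""
    prompt = LOCKED_VIDEO_PROMPT
    extra = extra_direction.strip()
    if extra:
        prompt = f"{prompt} {extra}"  # append only, never replace the locked text
    return {
        "provider": "kling",
        "reference_videos": [segment_video_url],
        "reference_images": [edited_frame_url],
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "sound": False,
        "video_keep_sounds": [True],
        "prompt": prompt,
    }
```

The `sound` and `video_keep_sounds` values are fixed in the helper so a caller cannot change them by accident.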
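Do not pass `sound=true` to Kling with a video input. Kling rejects that
combination with `error:1201 sound on is not supported with video input`; use
`sound=false` plus `video_keep_sounds=[true]` to keep the source video's audio.

If the source was split into multiple windows, call
`mcp__plugin_pika_pika__edit_concat(video_urls=[<segment outputs in order>])`.
After concat, run
`mcp__plugin_pika_pika__edit_audio_replace(video_url=<concat_url>, audio_url=<full_source_audio_url>, duration_policy="video")`
when the merged output audio is missing, drifted, or discontinuous.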
Only try Seedance if the user explicitly asks for it, or if Kling fails and a
second provider attempt is useful. Use the same segmenting rule and record the
provider error plainly if Seedance rejects the input or drops speech/action.

Async handling: if any call returns a `{task_id, status}` envelope, poll
`mcp__plugin_pika_pika__task_status({task_id})` in a tight loop until the task
reaches a terminal status.
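The polling loop can be sketched as follows. This is a minimal illustration, assuming the status tool is wrapped in a Python callable that returns the `{task_id, status}` envelope. The terminal status names, the one-second interval, and the timeout are all assumptions, since the text does not specify them; `wait_for_task` is a hypothetical helper.

```python
import time
from typing import Callable

# Assumed terminal status names; the real values depend on the host's task API.
ASSUMED_TERMINAL = frozenset({"succeeded", "failed", "cancelled"})


def wait_for_task(
    get_status: Callable[[str], dict],
    task_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 900.0,
    terminal: frozenset = ASSUMED_TERMINAL,
) -> dict:
    """Poll get_status(task_id) until the envelope reports a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        envelope = get_status(task_id)
        if envelope.get("status") in terminal:
            return envelope
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not terminal after {timeout_s}s")
        time.sleep(interval_s)
```

The `task_id` is passed through exactly as returned by the tool, in line with the Step 0 rule against hand-typing IDs.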
Step 5 — Audio, duration and identity QA
Before reporting success, verify the generated video against the source:
- Duration must not be meaningfully cut off. If output duration differs from the intended source window or merged source duration by more than 0.5s, mark the run as failed / needs follow-up.
- If the source has speech, audio must be present through the tail and mouth movement must not freeze before the spoken content ends. If words are missing, garbled, silent, or visibly out of sync, do not call the run `PASS`.
- The approved frame corrections must persist into the video. If the provider reintroduces a removed artifact such as eyeglass glare, mark it as a propagation caveat or re-render from a stronger approved frame.
- Compare identity at start, middle, segment boundaries, and end. If Kling preserved motion but changed the face, call that out as a provider limitation instead of a pass.
If the video is visually acceptable but speech audio is missing, incomplete, or
drifted, offer one paid repair pass:
- `mcp__plugin_pika_pika__edit_audio_replace(video_url=<generated_video_url>, audio_url=<full_source_audio_url or segment_audio_url>, duration_policy="video")`
- `mcp__plugin_pika_pika__edit_lipsync(video_url=<audio_restored_url>, audio_url=<full_source_audio_url or segment_audio_url>, variant="v2-pro")`

If the model froze the mouth near the end, do not keep escalating to `sync-3`
automatically; lip-sync cannot reliably recover a face track with no mouth motion.
Offer trim / regenerate instead.
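The QA checks in this step can be collected into one verdict, sketched below. This is a minimal illustration; `QAObservations` and `qa_failures` are hypothetical names, and every boolean field is a judgment made by whoever reviews the video, not something the sketch measures.

```python
from dataclasses import dataclass


@dataclass
class QAObservations:
    output_duration_s: float
    expected_duration_s: float   # source window or merged source duration
    source_has_speech: bool
    audio_intact_to_tail: bool
    mouth_moves_to_tail: bool
    frame_fix_persists: bool
    identity_preserved: bool


def qa_failures(obs: QAObservations, tolerance_s: float = 0.5) -> list[str]:
    """Return the failed checks; an empty list means the run may be called PASS."""
    failures = []
    if abs(obs.output_duration_s - obs.expected_duration_s) > tolerance_s:
        failures.append("duration differs from source by more than tolerance")
    if obs.source_has_speech and not (
        obs.audio_intact_to_tail and obs.mouth_moves_to_tail
    ):
        failures.append("speech audio or mouth motion incomplete")
    if not obs.frame_fix_persists:
        failures.append("approved frame correction did not persist")
    if not obs.identity_preserved:
        failures.append("identity changed (provider limitation)")
    return failures
```

The returned strings can go straight into the QA notes of the final report.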
Step 6 — Download + return
Download the result to `~/Downloads/fix-my-look/<run-id>/result.mp4` and return
that path plus the final report fields: source, edited frame URL, final video
URL, provider, job/task IDs, cost estimate or `not surfaced`, elapsed
time, QA notes, and follow-up issue.
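The report fields listed above could be held in a small record so none is forgotten. The sketch below is a minimal illustration; `RunReport` and its field names are hypothetical, chosen to mirror the list in this step.

```python
from dataclasses import asdict, dataclass, field

# Wording from Step 0, used when the host does not surface a cost.
COST_NOT_SURFACED = "Cost not surfaced by this harness"


@dataclass
class RunReport:
    source: str
    edited_frame_url: str
    final_video_url: str
    provider: str
    result_path: str
    task_ids: list[str] = field(default_factory=list)  # copied verbatim from tools
    cost: str = COST_NOT_SURFACED
    elapsed_s: float = 0.0
    qa_notes: list[str] = field(default_factory=list)
    follow_up_issue: str = ""

    def to_dict(self) -> dict:
        """Return the report as a plain dict for display or logging."""
        return asdict(self)
```

Leaving `cost` at its default records the absence of an estimate instead of a guessed number.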
Failure modes
| Symptom | Cause | Fix |
|---|---|---|
| Output face drifts from the original | gpt-image-2 over-edited the face OR the provider under-weighted the source video | Re-run Step 2 with a stronger "keep the face the same" clause and soften the requested change. |
| Output looks like the original (no change) | Edited image too similar, OR you passed the raw frame not the edited output | Re-run Step 2 with a more dramatic prompt; confirm the edited frame URL. |
| Output aspect doesn't match source | Source aspect not in {16:9, 9:16, 1:1, 4:3, 3:4} | Step 1 returns the closest supported ratio (`closest_aspect_ratio`); the output uses it and this skill does not reframe the source. |
| Provider rejects the normalized video as too large | normalize output can remain too large for 4K/iPhone sources | Retry normalize once with `crf=28`; if it is still too large, stop and report that `mcp__plugin_pika_pika__normalize_video` needs a worker-side size cap. |
| Long source only returns the first short window | The caller normalized once with a single `max_duration_s=14.8` window | Split into 14.8s windows, generate each segment, then merge them in order with `mcp__plugin_pika_pika__edit_concat`. |
| Speaking clip loses sound, drops words, or freezes mouth at the tail | Provider regenerated speech/audio instead of preserving the source, or the face track has no mouth motion to drive | Mark as not pass. Offer one paid `mcp__plugin_pika_pika__edit_audio_replace` / `mcp__plugin_pika_pika__edit_lipsync` repair pass, or trim / regenerate if the mouth froze. |
| Approved frame fix disappears in the video | Provider propagation reintroduced the original artifact | Re-render from a stronger approved frame or mark provider propagation caveat; do not claim the frame correction shipped. |
| Kling rejects with `error:1201 sound on is not supported with video input` | `sound=true` was passed with a video input | Retry the Kling call with `sound=false` plus `video_keep_sounds=[true]`. |
| Kling output is shorter than the normalized source | Provider returned a shorter render, or the caller accidentally passed a trimmed reference | Do not mark pass. Compare output duration to the normalized source, then regenerate that segment or ask the user for a shorter window. |