language-swap
Translate and dub a video into another language. One worker call preserves each speaker's voice, translates the speech, and returns a fully A/V-synced video. Lipsync ON by default. Use when the user says "translate this video", "dub this in <language>", "make this Spanish/French/Japanese", "translate the audio", or asks for bilingual subtitles on a dubbed/language-swap output. NOT for: subtitles/captions only (use add_captions or video-captions), transcription only (use transcribe_audio directly), or translating on-screen text overlays.
6 installs
Source: pika-labs/pika-plugins
NPX Install
npx skill4agent add pika-labs/pika-plugins language-swap
SKILL.md Content
<!-- source-of-truth: pika-claude-plugin/skills/language-swap -->
/pika:language-swap
Translate and dub a video into another language while preserving the original speaker's voice. Pipeline: dub (one worker call) → lipsync (default ON) → burn target-language captions or bilingual captions.
The dubbing worker does the heavy lifting in a single call: it transcribes, translates, preserves each speaker's voice server-side (no separate clone step), and returns a fully A/V-synced video — so there is no manual transcribe/clone/TTS/replace chain to manage and no duration-drift handling to do by hand.
Segmented / multi-language dub (per-range languages)
Use this when the user wants different languages on different parts of one video (e.g. first half Spanish, second half Japanese), or wants to translate only some sections and keep the rest in the original voice. Both are the same thing: a timeline of segments, each tagged with a language; any uncovered range keeps the original audio.
`mcp__plugin_pika_pika__dub_video` takes a `segments` plan instead of `target_language` (pass exactly one; they are mutually exclusive):
    mcp__plugin_pika_pika__dub_video(source_video_url=<video_url>, segments=[
      {start_s: 0, end_s: 30, target_language: "es"},
      {start_s: 30, end_s: 60, target_language: "ja"}
    ])
How to build the plan: the user needs to know where the content is before they can pick ranges, so transcribe first. Extract the audio with `mcp__claude_ai_pika__extract_audio_from_video`, then call `mcp__claude_ai_pika__transcribe_audio(audio=<audio_url>, timestamps=true)`, show the user the timestamped segments, and let them say which time range goes to which language. Then assemble `segments[]` (seconds, ordered, non-overlapping) and make ONE `dub_video` call. There is no separate "video understanding" tool; the timestamped transcript is the understanding step.
Behavior of the segmented path:
- Shared voice across all segments. The source speaker is cloned once and every segment, in every language, is spoken in that same cloned voice. The clone is then recycled, all inside the one `dub_video` call. You never clone or delete a voice yourself.
- Keep-original. Any time range NOT covered by a segment plays the original audio (voice + background) untouched. To translate only parts of a video, list only the parts you want translated.
- Length-locked. Output stays exactly the source length (each dubbed range is speed-fit to its window), so boundaries line up with the original timeline.
- Provider. Mixed-language-per-range always uses the voice-cloning route automatically; the single-call whole-video dubbing route can't mix languages per range, so don't force a single-call provider for a segmented plan. Every covered language must be supported on the voice-cloning route; if one isn't, surface the error and consult `references/language-coverage.md`.
- Result. Same dubbed-video result; `target_language` echoes the covered languages comma-joined (e.g. `"spa,jpn"`), and no single `transcript_language` is returned (the track is multi-language). Lipsync (Step 2, default ON, ≤5 min) still runs on the whole dubbed video. For captions (Step 3), use the returned multi-language `subtitles[]` in `caption_mode="manual"`; auto re-transcription can't pick a single language for a mixed track.
If `mcp__plugin_pika_pika__dub_video` rejects `segments` (older deployment without segmented support), fall back to dubbing each range single-language and concatenating, but prefer the one-call segmented path when available.
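The plan constraints described in this section (times in seconds, ordered, non-overlapping, one language per range) can be checked before making the single call. The following is a minimal Python sketch, not part of the skill itself: `validate_segments` and `uncovered_ranges` are hypothetical helper names, and only the `start_s`, `end_s`, and `target_language` keys come from the example plan.

```python
from typing import List, Tuple, TypedDict


class Segment(TypedDict):
    start_s: float
    end_s: float
    target_language: str


def validate_segments(segments: List[Segment]) -> List[Segment]:
    """Return the plan unchanged if it is ordered and non-overlapping, else raise ValueError."""
    if not segments:
        raise ValueError("segments plan is empty")
    previous_end = 0.0
    for index, seg in enumerate(segments):
        if seg["start_s"] < 0 or seg["end_s"] <= seg["start_s"]:
            raise ValueError(f"segment {index} has an invalid time range")
        if seg["start_s"] < previous_end:
            raise ValueError(f"segment {index} overlaps or is out of order")
        if not seg["target_language"]:
            raise ValueError(f"segment {index} has no target_language")
        previous_end = seg["end_s"]
    return segments


def uncovered_ranges(segments: List[Segment], duration_s: float) -> List[Tuple[float, float]]:
    """Ranges not covered by any segment; per the text these keep the original audio."""
    gaps = []
    cursor = 0.0
    for seg in segments:
        if seg["start_s"] > cursor:
            gaps.append((cursor, seg["start_s"]))
        cursor = max(cursor, seg["end_s"])
    if cursor < duration_s:
        gaps.append((cursor, duration_s))
    return gaps
```

Showing the uncovered ranges back to the user before dubbing is one way to confirm which parts will stay in the original voice.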
Behavior defaults
- Target language: required via `--to <language>`. Prefer language codes: `es`, `fr`, `ja`, `de`, `pt-BR`, `zh-Hans`. The dubbing worker accepts ISO/BCP-47-like tags and normalizes script/region subtags before calling ElevenLabs (for example `zh-Hans` → `zh`; `zh-Hant-TW` → `zh`).
- Lipsync: ON by default. It re-matches the speaker's mouth to the translated audio (fal sync-lipsync; the full-video lip-matcher, distinct from the portrait-image animator). Pass `--no-lipsync` to skip it when the source has no on-camera face or to avoid the significant cost (~$4/min on the sync-2-pro tier). Applies only to videos ≤5 min: `edit_lipsync` hard-caps at 300 s upstream, so longer sources auto-skip lipsync (see Step 2); the dub itself has no length limit.
- BGM / background music: kept by default. The dub lays the translated voice over the original music / SFX bed. Pass `--no-bgm` for a translate-only output: the worker drops the original music and keeps only the translated speech (`drop_background_audio=true`).
- Captions: target-language captions are burned by default. When the user asks for bilingual / dual subtitles, burn the target-language (translated) row on top and the source-language (original) row below it. After dubbing, the translated speech is what's actually being said, so it's the primary row; the original is the secondary reference.
- Bilingual captions: enable when the user passes `--bilingual-subtitles` or asks for "bilingual subtitles", "dual subtitles", "two-language captions", "original + translated subtitles", "双语字幕", or "原文+译文字幕".
- Language coverage: if language support is questioned or a language-related upstream error occurs, consult `references/language-coverage.md`. Do not proactively surface provider-specific language-list details in normal user replies.
State variables produced and consumed
- `video_url`: input, from positional arg
- `source_input_url`: original positional URL, preserved for diagnostics if `video_url` is rehosted
- `target_language`: text, from `--to <language>`
- `with_lipsync`: boolean, defaults true; false only when `--no-lipsync`
- `no_bgm`: boolean, true when `--no-bgm` (maps to `drop_background_audio=true`)
- `bilingual_subtitles`: boolean, true when the user asks for bilingual / dual subtitles
- `dubbed_video_url`: dubbed, A/V-synced video, produced by Step 1
- `dub_subtitles`: optional target-language timed subtitles from the dub result, consumed by Step 3
- `source_subtitles`: optional source-language timed subtitles from the dub result, consumed by Step 3 for bilingual captions
- `dub_transcript_srt`: optional target-language SRT from the dub result, returned for review/debugging
- `source_transcript_srt`: optional source-language SRT from the dub result, returned for review/debugging
- `source_transcript_language`: optional source-language code from the dub result
- `lipsynced_video_url`: dubbed video with mouth re-matched, produced by Step 2 (when lipsync runs)
- `caption_target_video_url`: final visual video URL before captions are burned
- `final_video_url`: video with target-language captions burned in, produced by Step 3
Step 0: Parse input
Required:
- Positional `video_url`: MUST be `https://...`
- `--to <language>`: target language (free-text or BCP-47 code)
Optional:
- `--no-lipsync`: skip the default mouth-matching step.
- `--no-bgm`: translate-only output; drop the original music/SFX bed.
- `--bilingual-subtitles`: burn source-language + target-language subtitle rows.
Infer `bilingual_subtitles=true` from user wording even if the explicit flag is absent.
If `--to` is missing, STOP and prompt the user, UNLESS the user wants different languages on different parts, or wants to translate only some sections: that is the per-range segmented path (see "Segmented / multi-language dub" above), which uses a `segments` plan instead of `--to`.
For the segmented path, first build the time-range plan: extract the audio with `mcp__claude_ai_pika__extract_audio_from_video`, then transcribe it with timestamps via `mcp__claude_ai_pika__transcribe_audio(audio=<audio_url>, timestamps=true)`, show the user the timestamped segments, and capture which time range maps to which language into `segments[]`.
Outputs: `video_url`, `target_language`, `with_lipsync` (default true), `no_bgm` (default false), `bilingual_subtitles` (default false).
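The flag-to-state mapping in this step can be illustrated with a small parser. This is only a sketch: the skill is interpreted by an agent rather than a real command-line program, so the use of `argparse` and the function name `parse_language_swap_args` are assumptions made for illustration. The flag names and defaults are the ones listed above.

```python
import argparse
from typing import List


def parse_language_swap_args(argv: List[str]) -> dict:
    """Map the documented positional arg and flags onto the Step 0 state variables."""
    parser = argparse.ArgumentParser(prog="language-swap")
    parser.add_argument("video_url")
    parser.add_argument("--to", dest="target_language")
    parser.add_argument("--no-lipsync", action="store_true")
    parser.add_argument("--no-bgm", action="store_true")
    parser.add_argument("--bilingual-subtitles", action="store_true")
    args = parser.parse_args(argv)
    if not args.video_url.startswith("https://"):
        raise ValueError("video_url must be an https:// URL")
    return {
        "video_url": args.video_url,
        # None means: prompt the user, or build a segments plan instead.
        "target_language": args.target_language,
        "with_lipsync": not args.no_lipsync,
        "no_bgm": args.no_bgm,
        "bilingual_subtitles": args.bilingual_subtitles,
    }
```

Inferring `bilingual_subtitles=true` from free-form wording is left out here, since that depends on the user's phrasing rather than on a flag.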
Step 1: Dub the video (state: `dubbed_video_url`)
Call `mcp__plugin_pika_pika__dub_video` with:
- `source_video_url`: `<video_url>`
- `target_language`: `<target_language>` (ISO/BCP-47-like tag, e.g. `es`, `pt-BR`, `zh-Hans`)
- `source_language`: `"auto"`
- `drop_background_audio`: `true` only when `no_bgm` is set; otherwise omit (keeps the original music bed)
In Claude plugin installs the tool is exposed as `mcp__plugin_pika_pika__dub_video`. If your host exposes the same Pika server under a different local namespace, call that fully-qualified local tool with the same arguments. The Claude.ai connector surface may lag this plugin-only tool, so do not assume the connector prefix has it.
`mcp__plugin_pika_pika__dub_video` is worker-backed: if the response comes back as `{task_id, status}`, poll `mcp__plugin_pika_pika__task_status` until `completed`, then read the dubbed video from the result (`video_url` for a video source; `audio_url` for an audio source). Also capture optional `subtitles[]`, `transcript_srt`, and `transcript_language`: these are target-language transcript metadata the dub worker produced, consumed in Step 3.
For bilingual captions, also capture optional `source_subtitles[]`, `source_transcript_srt`, and `source_transcript_language`. These source-language transcript fields are best-effort. The dubbed media is still valid when transcript fields are absent.
Source not worker-fetchable: if `mcp__plugin_pika_pika__dub_video` fails because the source URL cannot be fetched, do not keep retrying the same call. This applies especially to HTTP `403` / `4xx`, hotlink protection, UA-gated hosts (Wikimedia/news CDNs), and "Access Denied" errors. Rehost first:
- Download the source bytes in the client/host environment using a normal browser/download path or an HTTP client with a real user-agent.
- Call `mcp__claude_ai_pika__upload_asset` with the downloaded filename, MIME type, and exact byte size, then upload the bytes to the returned presigned URL.
- Set `source_input_url = <original URL>` and replace `video_url` with the returned Pika CDN `public_url`. Do not construct CDN URLs manually.
- Retry Step 1 once against the Pika CDN URL. All later steps must use the updated `video_url`.
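The rehost procedure above amounts to "try once, rehost on a fetch failure, retry exactly once". A minimal Python sketch of that control flow follows. The `dub` and `rehost` callables stand in for the real tool calls and the download/upload steps, and `SourceFetchError` is a hypothetical error type; how a fetch failure is actually reported by the worker is not specified here.

```python
from typing import Callable, Optional, Tuple


class SourceFetchError(Exception):
    """Hypothetical error: the worker could not fetch the source URL."""


def dub_with_rehost(
    video_url: str,
    dub: Callable[[str], dict],
    rehost: Callable[[str], str],
) -> Tuple[dict, str, Optional[str]]:
    """Try the dub once; on a fetch failure, rehost and retry exactly once.

    Returns (dub_result, video_url_for_later_steps, source_input_url).
    """
    try:
        return dub(video_url), video_url, None
    except SourceFetchError:
        source_input_url = video_url      # keep the original URL for diagnostics
        rehosted_url = rehost(video_url)  # download + upload; returns the CDN public_url
        # A second failure propagates to the caller instead of looping.
        return dub(rehosted_url), rehosted_url, source_input_url
```

Returning the URL that finally worked makes it harder for later steps to use the stale original by accident.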
If the client/host also cannot download the source bytes, stop and tell the user the host blocks direct fetch; ask them to upload the file or provide a different URL.
Outputs: `dubbed_video_url`, `dub_subtitles`, `source_subtitles`, `dub_transcript_srt`, `source_transcript_srt`, `source_transcript_language`.
Step 2: Lipsync (state: `lipsynced_video_url`)
Default ON. Skip entirely when `--no-lipsync` is passed (then Step 3 captions `dubbed_video_url` directly).
Hard 5-minute cap: check duration before calling. `mcp__claude_ai_pika__edit_lipsync` enforces a 300-second (5-minute) audio limit upstream (sync.so) and rejects anything longer with `invalid_input` before billing; every `variant` tier shares the same cap, so falling back through tiers does NOT help. If the dubbed video's `duration_seconds` (returned by Step 1) is > 300, skip lipsync entirely, go straight to Step 3 captioning `dubbed_video_url`, and tell the user lipsync isn't available past 5 minutes (the dub itself works at any length). Only run the lipsync call below when `duration_seconds ≤ 300`.
Cost heads-up first. Lipsync is the dominant cost (~$4/min on the v2-pro tier). Before calling it, estimate the cost from the dubbed video's `duration_seconds` (returned by Step 1) as `ceil(duration_seconds / 60) × $4`, and send the user a one-line heads-up, e.g. "Lipsync on: ~2 min video, est. ~$8 (pass `--no-lipsync` to skip). Starting now." Then proceed straight into the call; this is a heads-up, not an approval gate.
Call `mcp__claude_ai_pika__edit_lipsync(video_url=<dubbed_video_url>)` with no `audio_url`: the worker syncs to the dubbed video's own embedded translated audio. Do not extract the audio just to feed it back in. (`variant` defaults to `v2-pro`, with `sync-3` / `v2` as fallbacks.)
Outputs: `lipsynced_video_url` (read from `url` of response). When this step runs, Step 3 captions this video, not `dubbed_video_url`; otherwise the lip-matching is dropped.
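The duration gate and the cost heads-up in this step combine into one small decision. The sketch below assumes the 300-second cap and the approximate $4/min rate quoted above; `plan_lipsync` and its return shape are hypothetical names chosen for illustration.

```python
import math

LIPSYNC_MAX_SECONDS = 300    # upstream cap stated in the text
LIPSYNC_USD_PER_MINUTE = 4   # approximate rate stated in the text


def plan_lipsync(duration_seconds: float, with_lipsync: bool = True) -> dict:
    """Decide whether to run lipsync and estimate its cost in USD."""
    if not with_lipsync:
        return {"run": False, "reason": "--no-lipsync", "estimated_usd": 0}
    if duration_seconds > LIPSYNC_MAX_SECONDS:
        return {"run": False, "reason": "over 5-minute cap", "estimated_usd": 0}
    minutes = math.ceil(duration_seconds / 60)  # partial minutes round up
    return {"run": True, "reason": None, "estimated_usd": minutes * LIPSYNC_USD_PER_MINUTE}
```

For a 120-second dub this yields an estimate of 8, matching the "~2 min video, est. ~$8" example message; a 61-second dub also rounds up to 2 minutes.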
Step 3: Burn target-language captions (state: `final_video_url`)
Caption the final video so the output carries readable subtitles (matches the common "translate + subtitle" expectation). Set `caption_target_video_url` to `lipsynced_video_url` when lipsync ran (the default), or to `dubbed_video_url` when lipsync was skipped (`--no-lipsync` or the 5-minute cap).
If this request is part of a Double video / split-screen comparison flow, build that Double video first and set `caption_target_video_url` to the final composed video URL. Do not burn captions onto only one panel before the Double video is composed; the bilingual caption burn should happen once, on the final visual output.
Call `mcp__claude_ai_pika__add_captions` once on `caption_target_video_url`.
When `bilingual_subtitles=true`, use manual bilingual mode if both tracks are available: call `mcp__claude_ai_pika__add_captions(video_url=<caption_target_video_url>, caption_mode="manual", subtitles=<dub_subtitles>, secondary_subtitles=<source_subtitles>, language=<target_language>, secondary_language=<source_transcript_language if available>, secondary_subtitles_position="below", style="branded-space-mono", position="bottom")`. The target-language (translated) row is the primary `subtitles` and renders on top; the source-language (original) row is the secondary reference and renders below it (`secondary_subtitles_position="below"`). After dubbing, the translated speech is what's actually spoken, so it leads. This works for every dub worker provider branch as long as `mcp__plugin_pika_pika__dub_video` returns both subtitle tracks.
If `bilingual_subtitles=true` but `source_subtitles` is missing, fall back to target-language captions only and tell the user the source transcript was unavailable from the dubbing provider. Do not invent a source-language row by retranscribing the final dubbed audio; that audio is already in the target language.
For target-language-only captions, prefer the target-language subtitles the dub worker already returned: if `dub_subtitles` is non-empty, call `mcp__claude_ai_pika__add_captions(video_url=<caption_target_video_url>, caption_mode="manual", subtitles=<dub_subtitles>, language=<target_language>, style="branded-space-mono", position="bottom")`. Manual mode skips a duplicate transcription pass and preserves the dubbing provider's target-language text.
If `dub_subtitles` is missing, empty, or rejected by `mcp__claude_ai_pika__add_captions`, fall back to auto: call `mcp__claude_ai_pika__add_captions(video_url=<caption_target_video_url>, caption_mode="auto", language=<target_language>, style="branded-space-mono", position="bottom")`. Auto mode re-transcribes the dubbed audio; use it only as the fallback because it costs extra time and can introduce CJK/proper-noun drift.
Use `style="branded-space-mono"` unless the user asks for a punchier style (`tiktok` / `hormozi` / `karaoke`). Skip this step only if the user explicitly asked for audio-only dubbing with no captions.
Outputs: `final_video_url` (read from `url` of response).
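The choice between bilingual manual, target-only manual, and auto captions can be summarised as a single argument builder. This is a sketch under assumptions: `build_caption_args` is a hypothetical helper, the argument names mirror the `add_captions` calls quoted above, and treating a bilingual request with no `dub_subtitles` as an auto-mode fallback is my reading of "if both tracks are available". The "rejected by `add_captions`" fallback would need a retry around the real call and is not shown.

```python
from typing import Optional


def build_caption_args(
    caption_target_video_url: str,
    target_language: str,
    bilingual_subtitles: bool = False,
    dub_subtitles: Optional[list] = None,
    source_subtitles: Optional[list] = None,
    source_transcript_language: Optional[str] = None,
    style: str = "branded-space-mono",
) -> dict:
    """Pick caption_mode and the subtitle tracks for one add_captions call."""
    args = {
        "video_url": caption_target_video_url,
        "language": target_language,
        "style": style,
        "position": "bottom",
    }
    if not dub_subtitles:
        # No usable target-language track: fall back to auto re-transcription.
        args["caption_mode"] = "auto"
        return args
    args["caption_mode"] = "manual"
    args["subtitles"] = dub_subtitles
    if bilingual_subtitles and source_subtitles:
        # Translated row on top, original row below it.
        args["secondary_subtitles"] = source_subtitles
        args["secondary_subtitles_position"] = "below"
        if source_transcript_language:
            args["secondary_language"] = source_transcript_language
    return args
```

When a bilingual request comes back without `secondary_subtitles` in the result, the caller still needs to tell the user that the source transcript was unavailable.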
Step 4: Return
Reply with `final_video_url` + the translated transcript (from `dub_transcript_srt` / the dub result) for user review.
Offer a bilingual-subtitle version. When this run burned target-language-only captions (`bilingual_subtitles=false`) and a source transcript is available (`source_subtitles` is non-empty), close the reply by asking whether the user also wants a dual-subtitle version, e.g. "Want a bilingual version with the original + translated subtitles stacked? I can add it." If they say yes, re-run Step 3 in bilingual manual mode on the same `caption_target_video_url` (the pre-caption visual video) and return the new `final_video_url`; no re-dub or re-lipsync is needed, since only the caption burn changes. Skip the offer when bilingual captions were already burned (`bilingual_subtitles=true`), or when `source_subtitles` is missing: without a source transcript a bilingual version can't be produced (the dubbed audio is already in the target language), so do not offer what can't be delivered.
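The offer rule reduces to a two-condition check. The helper below is a hypothetical sketch of it, using the state variable names from this document.

```python
def should_offer_bilingual(bilingual_subtitles: bool, source_subtitles) -> bool:
    """Offer a dual-subtitle re-burn only if it was not already done and can be delivered."""
    return (not bilingual_subtitles) and bool(source_subtitles)
```

An empty or missing `source_subtitles` suppresses the offer, which keeps the reply from promising a version that cannot be produced.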
Failure modes
| Class | Trigger | Mitigation | Fallback |
|---|---|---|---|
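| Source URL not worker-fetchable | Source fetch fails (HTTP `403` / `4xx`, hotlink protection, UA-gated host, "Access Denied") | Download source bytes in the client/host environment, upload them via `mcp__claude_ai_pika__upload_asset`, then retry Step 1 once against the returned Pika CDN `public_url` | If local download also fails, ask the user to upload the file or provide a different URL |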
| Extra target language | Target is Cantonese ( | Supported — call | Background music isn't preserved for these languages (dubbed speech only) |
| Dub call fails (not fetchability) | | Surface the error to the user; if the message points at the language, check `references/language-coverage.md` | None: return the error, do not silently produce a non-dubbed video |
| Dub returns no speech | Silent video — nothing to translate | Surface to user: "no detectable speech in video — nothing to translate" | None |
| Original voice can't be kept | For the languages above, the source is too short or noisy to keep the original speaker's voice | Surface the error and ask the user for a cleaner / longer source clip | None — the dub fails rather than using a different voice |
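| Lipsync source too long | Dubbed video >5 min; `mcp__claude_ai_pika__edit_lipsync` rejects it with `invalid_input` | Check `duration_seconds` before calling and skip lipsync when it exceeds 300 | Dubbed video, no lip-match |
| Lipsync step fails | | Fall back through the `variant` tiers (`sync-3` / `v2`) | Audio-replaced video, no lip-match |
| Captions wrong language | Step 3 auto-transcription mis-detects language | Pass explicit `language=<target_language>` | Manual |
| Bilingual source row unavailable | User asked for bilingual subtitles but `source_subtitles` is missing | Use target-language captions and explain the source transcript was unavailable | Target-language captions only |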
Compatibility
Primary target: Claude Code. Uses standard MCP tools only. Works on Codex / Cursor / Claude Desktop.