image-to-video
Animate any still image on RunComfy — this skill is a smart router that matches the user's intent to the right i2v model in the RunComfy catalog. Picks HappyHorse 1.0 I2V (Arena #1, native audio, identity preservation) for general animations, Wan 2.7 with `audio_url` for custom-voiceover lip-sync, or Seedance 2.0 Pro for multi-modal animation from image + reference video + reference audio. Bundles each model's documented prompting patterns so the caller gets sharper output without burning iterations on the wrong model. Calls `runcomfy run <vendor>/<model>/image-to-video` (or endpoint variant) through the local RunComfy CLI. Triggers on "image to video", "image-to-video", "i2v", "animate image", "make this move", or any explicit ask to turn a still into video.
## NPX Install

```shell
npx skill4agent add agentspace-so/runcomfy-agent-skills image-to-video
```

# Image-to-Video — Pro Pack on RunComfy

```shell
npx skills add agentspace-so/runcomfy-skills --skill image-to-video -g
```

## Pick the right model for the user's intent
| User intent | Model | Why |
|---|---|---|
| Animate a portrait — keep identity stable | HappyHorse 1.0 I2V | #1 on Artificial Analysis Arena (Elo 1392); strong facial fidelity |
| Product reveal / 360 / macro motion | HappyHorse 1.0 I2V | Geometry preservation + smooth camera moves |
| Native synchronized ambient audio in one pass | HappyHorse 1.0 I2V | In-pass audio synthesis |
| Animate and lip-sync to a custom voiceover track | Wan 2.7 + `audio_url` | Accepts your own MP3/WAV (3–30s, ≤15MB) and drives lip-sync to it |
| Multi-language dub variants (same image, different audio per call) | Wan 2.7 + `audio_url` | Same shot, swap the `audio_url` per call |
| Multi-modal — image + reference video + reference audio together | Seedance 2.0 Pro | Up to 9 image refs, 3 video refs (2–15s each), 3 audio refs |
| Brand-consistent narrative with character ref + scene ref + voice ref | Seedance 2.0 Pro | Image holds identity, video holds scene, audio holds voice |
| Default if unspecified | HappyHorse 1.0 I2V | Best all-round quality + native audio |
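The routing rule in the table above can be sketched as a keyword heuristic. The model IDs are the ones documented in the routes below; the trigger keywords themselves are illustrative assumptions, not the skill's actual matcher:

```shell
# route_i2v: map a free-text intent to a model ID (sketch, not the real matcher).
route_i2v() {
  intent=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$intent" in
    *voiceover*|*lip-sync*|*dub*)
      echo "wan-ai/wan-2-7/text-to-video" ;;              # custom audio drives lip-sync
    *"reference video"*|*"reference audio"*|*multi-modal*)
      echo "bytedance/seedance-v2/pro" ;;                 # multi-modal reference inputs
    *)
      echo "happyhorse/happyhorse-1-0/image-to-video" ;;  # default route
  esac
}
```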
## Prerequisites

- RunComfy CLI — `npm i -g @runcomfy/cli`
- RunComfy account — `runcomfy login` opens a browser device-code flow.
- CI / containers — set `RUNCOMFY_TOKEN=<token>`.
- A source image URL — JPEG/PNG/WebP, min 300px, ≤10MB; aspect 1:2.5 to 2.5:1 (HappyHorse) — other models have similar specs.
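A minimal sketch of the auth precedence the prerequisites describe — `RUNCOMFY_TOKEN` bypasses the file entirely (CI / containers), otherwise the token file written by `runcomfy login` is used. `auth_source` is a hypothetical helper, not part of the CLI:

```shell
# auth_source: report where the CLI would find credentials (sketch).
auth_source() {
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    echo "env"     # token from environment, no browser flow needed
  elif [ -f "$HOME/.config/runcomfy/token.json" ]; then
    echo "file"    # token stored by `runcomfy login`
  else
    echo "none"    # run `runcomfy login` first
  fi
}
```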
## Route 1: HappyHorse 1.0 I2V — default for portrait / product / general animation

Model ID: `happyhorse/happyhorse-1-0/image-to-video`

### Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `image_url` | string | yes | — | JPEG/JPG/PNG/WEBP. Min 300px. Aspect 1:2.5–2.5:1. ≤10MB. |
| `prompt` | string | yes | — | ≤5000 non-CJK or 2500 CJK chars. Motion / camera / lighting description. |
| | enum | no | | |
| `duration` | int | no | 5 | 3–15 seconds. |
| `seed` | int | no | 0 | Reuse for variant comparisons. |
| `watermark` | bool | no | true | Provider watermark toggle. |
### Invoke

```shell
runcomfy run happyhorse/happyhorse-1-0/image-to-video \
  --input '{
    "image_url": "https://.../portrait.jpg",
    "prompt": "Gentle camera drift around the subject'\''s face, subtle breathing motion, identity-stable features, soft natural light."
  }' \
  --output-dir <absolute/path>
```

### Prompting tips
- Lead with motion verbs: "drift", "dolly in", "orbit", "tilt up", "reveal", "blink", "breathe". Front-load what's MOVING.
- Don't restate the image — the model sees it. Focus tokens on what changes.
- Make preservation goals explicit: "identity-stable features", "packaging unchanged", "background geometry stable".
- Describe lighting evolution: "rim light intensifying", "shadows shortening as camera rises".
- One beat per clip — single primary motion (orbit OR dolly OR tilt OR character action).
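Before spending an iteration, the schema's hard limits can be checked locally. `hh_preflight` is a hypothetical helper (not part of the RunComfy CLI) that enforces the documented duration range (3–15s) and prompt cap (≤5000 non-CJK chars):

```shell
# hh_preflight: validate HappyHorse inputs against the schema above (sketch).
hh_preflight() {
  prompt="$1"; duration="${2:-5}"       # 5 is the documented default
  if [ "$duration" -lt 3 ] || [ "$duration" -gt 15 ]; then
    echo "duration must be 3-15s"; return 1
  fi
  if [ "${#prompt}" -gt 5000 ]; then    # non-CJK cap; CJK cap is 2500
    echo "prompt over 5000 chars"; return 1
  fi
  echo "ok"
}
```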
## Route 2: Wan 2.7 + `audio_url` — when the user has a custom voiceover

Model ID: `wan-ai/wan-2-7/text-to-video` (or the `image-to-video` endpoint variant)

### Schema (Wan 2.7 t2v with audio)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Up to ~5000 chars. Describe the talking-head shot: framing, lighting, motion. |
| `audio_url` | string | yes (for lip-sync) | — | WAV/MP3, 3–30s, ≤15MB. Drives lip-sync. |
| `aspect_ratio` | enum | no | | |
| | enum | no | | |
| `duration` | enum | no | | 2–15 (whole seconds). Match your audio length. |
| `negative_prompt` | string | no | — | Concrete issues to avoid (e.g. "no subtitles, no flicker"). |
| `seed` | int | no | — | Reproducibility. |
### Invoke

```shell
runcomfy run wan-ai/wan-2-7/text-to-video \
  --input '{
    "prompt": "Medium close-up of a confident spokesperson in a softly-lit recording booth, leaning slightly toward the camera, locked tripod, shallow DOF, warm key light from camera-left.",
    "audio_url": "https://.../voiceover-en.mp3",
    "duration": 12,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>
```

### Prompting tips

- Describe the talking-head shot — framing, lighting, lens feel. The audio drives the lip-sync; the prompt builds the visual frame around it.
- Match `duration` to the audio length — the clip goes silent past the end of the audio if it runs long.
- Use `negative_prompt` for concrete issues: `"no subtitles, no flicker, no distorted hands"`.
- For multi-language dubs — same prompt, swap `audio_url` per call. Lock the seed for visual consistency across languages.
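The "match `duration` to the audio length" tip can be sketched as a small helper: round a measured audio length (in seconds; measuring it, e.g. with ffprobe, is not shown) up to whole seconds and clamp to the schema's 2–15s range. `wan_duration_for` is a hypothetical helper:

```shell
# wan_duration_for: derive a Wan `duration` from an audio length in seconds.
wan_duration_for() {
  audio_s="$1"                 # measured audio length, e.g. "11.4"
  d="${audio_s%.*}"            # integer part
  case "$audio_s" in
    *.*[1-9]*) d=$((d + 1)) ;; # round up any nonzero fractional remainder
  esac
  [ "$d" -lt 2 ] && d=2        # schema floor
  [ "$d" -gt 15 ] && d=15      # schema cap
  echo "$d"
}
```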
## Route 3: Seedance 2.0 Pro — multi-modal animation (image + ref video + ref audio)

Model ID: `bytedance/seedance-v2/pro`

### Schema (Seedance 2.0 Pro, i2v-relevant fields)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | CN ≤500 chars OR EN ≤1000 words. |
| `image_url` | array | yes (for i2v) | | 0–9 images. First is the primary subject. |
| `video_url` | array | no | | 0–3 reference clips (MP4/MOV), 2–15s each. |
| `audio_url` | array | no | | 0–3 reference audio (WAV/MP3), 2–15s, < 15MB each. |
| | enum | no | | |
| `duration` | int | no | 5 | 4–15 (whole seconds). |
| | enum | no | | |
| | bool | no | true | In-pass synchronized speech / SFX / music. |
| `seed` | int | no | — | Reproducibility. |
### Invoke

```shell
runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Subject from image 1 walks through the café in video 1, voice tone matches audio 1. Medium close-up, slow push-in, warm light, gentle ambience.",
    "image_url": ["https://.../subject.jpg"],
    "video_url": ["https://.../cafe-locked-shot.mp4"],
    "audio_url": ["https://.../voice-tone.mp3"],
    "duration": 8
  }' \
  --output-dir <absolute/path>
```

### Prompting tips

- Image vs. text division — use `image_url` for what must stay stable (face, costume, brand); use `prompt` for what should evolve (action, mood, lighting).
- Number the refs in the prompt: `"subject from image 1, lighting from video 1, voice from audio 1"`. Seedance routes cues correctly.
- Reference media specs — videos and audio must be 2–15s; audio < 15MB.
- Don't mix radically different aesthetics — if image 1 is a watercolor and video 1 is photoreal, the output drifts.
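The reference-count limits above (1–9 images for i2v, ≤3 video refs, ≤3 audio refs) are easy to check before submitting. `seedance_ref_check` is a hypothetical helper; per-file duration and size checks are not shown:

```shell
# seedance_ref_check: validate reference counts against the schema (sketch).
seedance_ref_check() {
  n_img="$1"; n_vid="$2"; n_aud="$3"
  if [ "$n_img" -lt 1 ] || [ "$n_img" -gt 9 ]; then
    echo "image refs: need 1-9"; return 1
  fi
  if [ "$n_vid" -gt 3 ]; then echo "video refs: max 3"; return 1; fi
  if [ "$n_aud" -gt 3 ]; then echo "audio refs: max 3"; return 1; fi
  echo "ok"
}
```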
## Limitations

- Each route inherits its model's limits. HappyHorse: 15s cap, output aspect = input aspect. Wan 2.7: 15s cap, audio 3–30s / ≤15MB. Seedance: 720p ceiling on this template, 15s cap.
- No multi-route blending. This skill picks one model per call. If the user wants HappyHorse animation + Wan-style lip-sync in the same clip, that's two calls + a stitch (out of scope here).
- Brand-specific overrides — if the user named a specific model variant not listed (e.g. Wan 2.6, Seedance 1.5), route to the corresponding brand skill (`wan-2-7`, `seedance-v2`) instead of forcing it through here.
## Exit codes
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
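The exit-code table implies a retry policy: only 75 (timeout / 429) is worth retrying; every other code should surface immediately. A sketch, where `"$@"` stands in for the actual `runcomfy run …` command line and the backoff schedule is an assumption:

```shell
# run_with_retry: retry only exit code 75, up to 3 attempts (sketch).
run_with_retry() {
  max=3; tries=0
  while :; do
    "$@" && rc=0 || rc=$?
    [ "$rc" -ne 75 ] && return "$rc"         # success or non-retryable failure
    tries=$((tries + 1))
    [ "$tries" -ge "$max" ] && return "$rc"  # give up after max attempts
    sleep "$tries"                           # simple linear backoff
  done
}
```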
## How it works

`runcomfy run <model_id>` submits the request to `model-api.runcomfy.net`, then downloads the generated files from the `*.runcomfy.net` / `*.runcomfy.com` whitelist into `--output-dir`; `Ctrl-C` interrupts the local wait.

## Security & Privacy
- Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set the `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.