Loading...
Loading...
Use when the user wants to convert a video between horizontal and vertical orientations while preserving the inverted aspect ratio (16:9 ↔ 9:16, 4:3 ↔ 3:4, 21:9 ↔ 9:21). The skill crops a narrow band from the source and tracks the active speaker — the person whose mouth is moving — via MediaPipe face landmarks and mouth-aspect-ratio variance, so the talker stays in frame even when other people are visible. Triggers — "横转竖", "竖转横", "做成竖屏发抖音/视频号/小红书", "16:9 to 9:16", "make this vertical for Reels / TikTok / YouTube Shorts", "crop to portrait", "convert to landscape".
npx skill4agent add jianshuo/claude-skills wjs-reframing-video.crop.json| Is | Is not |
|---|---|
| Visual active-speaker detection via MAR (mouth-aspect-ratio) variance | Audio-visual fusion (audio energy + lip motion cross-correlated) |
| Stable face tracking across frames by center-distance matching | Re-identification across long gaps / occlusions |
| Speaker-aligned segments with hysteresis to prevent flicker | Frame-by-frame switching on every flicker |
| |
Hard cuts between segments, fixed crop within each segment ( | Smooth panning that drifts during a speaker's turn (opt-in |
| Audio stream-copy (bit-exact) | Audio reprocessing / re-encoding |
MediaPipe Tasks | Per-frame neural inpainting / out-painting |
One | Frame-by-frame Python compositor |
pip install mediapipe opencv-python numpyPATHface_landmarker.taskcrop.py~/.claude/skills/wjs-reframing-video/models/W / HH / W| Source orientation | Crop window |
|---|---|
| Horizontal (W > H) → Portrait | |
| Portrait (W < H) → Horizontal | |
W_crop = 608H_crop = 1080W_crop = 1080H_crop = 608--output-size 1080x1920--target portrait|landscape--sample-fpsFaceLandmarkerface_id--mar-var-window-sec--mar-var-threshold--min-segment-sec--motion cut--motion smoothcrop=W:H:x='expr':y='expr', scale=OUT_W:OUT_Hxyscripts/crop.py<input>.crop.json<input>_cropped.mp4<input>.crop.json{
"_about": "wjs-reframing-video crop plan for cam_a.MOV. Active-speaker detected via MAR variance.",
"_help": {
"source_size": "[width, height] in pixels.",
"target_size": "[width, height] of the final rendered output.",
"crop_window": "[width, height] of the moving crop in source coords.",
"chunks": "Speaker-aligned segments: {t0, t1, cx, cy, speaker_id}.",
"face_pick_mode": "speaker = MAR-variance active-speaker; largest = old behavior.",
"speaker_id": "Stable face track id. null means no face / silence fallback."
},
"schema_version": 2,
"source": "cam_a.MOV",
"source_size": [1920, 1080],
"target": "portrait",
"target_size": [1080, 1920],
"crop_window": [608, 1080],
"face_pick_mode": "speaker",
"sample_fps": 5.0,
"mar_var_window_sec": 1.0,
"mar_var_threshold": 1.5e-4,
"min_segment_sec": 1.5,
"chunks": [
{"t0": 0.0, "t1": 4.2, "cx": 808, "cy": 540, "speaker_id": 0},
{"t0": 4.2, "t1": 11.6, "cx": 1182, "cy": 540, "speaker_id": 1},
{"t0": 11.6, "t1": 14.0, "cx": 808, "cy": 540, "speaker_id": 0}
],
"face_sample_count": 1234,
"track_count": 2
}--sample-fpshevc_videotoolboxface#N: Xs on screen (Y%)--mar-var-threshold--face-pick largestffmpeg lenscorrection608×1080 → 1080×1920--output-size 608x1080--bitrate 12M--bitrate 8M