wjs-segmenting-video


Cut a long video + SRT into multiple stand-alone short clips, each oriented for the target platform. This skill stops after cutting + cropping: it hands off the raw clips to `/wjs-overlaying-video` for covers, captions, illustrations, CTA, and final render.


When to use


Long-form video (≥10 min) with an existing SRT transcript.
Goal is stand-alone short clips (each viewable without context).
The user will (or you will) drive post-production separately in `/wjs-overlaying-video`.


When NOT to use


Single-topic trimming → just use `ffmpeg -ss A -to B`.
No transcript yet → run `/wjs-transcribing-audio` first (then `/wjs-translating-subtitles` if the segments need a non-source language).
Multicam editing → use `/wjs-editing-multicam`.
Highlight reel with multiple cuts inside a single topic → that's editing, not segmentation.


What this skill IS — and IS NOT


| Is | Is not |
|---|---|
| You (the agent) read the full SRT and decide the topic boundaries | A script that runs NLP topic modeling, silence detection, or "viral moment" scoring. Topic boundaries are semantic; competing tools (Descript, OpusClip, Riverside Magic Clips) all get this wrong by automating it. |
| `segment.py` cuts; `/wjs-reframing-video` reorients | An end-to-end "magic" pipeline |
| Accurate-seek cuts by default (re-encode): clip starts EXACTLY at requested timestamp | Stream-copy cuts (those produce keyframe-snap drift up to GOP duration) |
| Hands off raw cropped clips + per-clip SRTs | Burned subtitles, covers, intros, CTAs (those live in `/wjs-overlaying-video`) |


The pipeline


long video + SRT
   ↓     (agent reads SRT, decides topics — judgment, not parsing)
segments.json
   ↓     segment.py --reencode (accurate seek; clip starts exactly at requested t)
clip_NN.mp4 + frame_NN.jpg
   ↓     ASK: target platform orientation match source?
   ↓     /wjs-reframing-video on each clip (if 16:9 → 9:16, etc.)
   ↓     re-extract frames from cropped clips
clip_NN.mp4 (now in target orientation) + clip_NN.zh-CN.burn.srt
   ↓
HAND OFF → /wjs-overlaying-video
   (does covers + captions + illustrations + CTA + final render)


Step 1 — Read SRT, write `segments.json`


Don't outsource topic identification to a script. For each candidate segment, judge:

Self-contained? A cold viewer must understand it without prior context.
Single thread? One central question / insight; if the speaker pivots mid-clip, that's two segments.
Length fits platform? 60–180s for 视频号 / 30–60s for 抖音&Shorts. <30s feels truncated; >4min loses retention.
Hook + payoff? Open on a claim / question / vivid image; close on a takeaway. Never end mid-sentence.
Snap to SRT cue boundaries — never cut mid-word.

3–6 strong segments from a 10-minute source is normal. Drop boring middles. Quality > quantity.
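Choosing the topic stays a judgment call, but the "snap to SRT cue boundaries" rule is mechanical. A minimal sketch, assuming a standard SRT timing line; `cue_boundaries` and `snap` are hypothetical helper names and are not part of `segment.py`:

```python
import re

# Matches an SRT timing line such as "00:00:43,460 --> 00:00:46,100".
CUE_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert timestamp fields to integer milliseconds (avoids float drift)."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def cue_boundaries(srt_text):
    """Return (starts, ends): every cue start and cue end, in milliseconds."""
    starts, ends = [], []
    for match in CUE_TIME.finditer(srt_text):
        fields = match.groups()
        starts.append(to_ms(*fields[:4]))
        ends.append(to_ms(*fields[4:]))
    return starts, ends


def snap(t_ms, boundaries):
    """Return the boundary closest to the rough timestamp t_ms."""
    return min(boundaries, key=lambda b: abs(b - t_ms))


SAMPLE_SRT = """1
00:00:43,460 --> 00:00:46,100
first line

2
00:00:46,300 --> 00:00:49,900
second line
"""

starts, ends = cue_boundaries(SAMPLE_SRT)
print(snap(43_000, starts))  # 43460
print(snap(49_500, ends))    # 49900
```

Snap a segment's `start` against cue starts and its `end` against cue ends, so the clip never opens or closes inside a cue.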

Schema (full spec in `references/segments_schema.json`, example in `references/example_segments.json`):

```json
{
  "source_video": "input.mp4",
  "source_srt": "input.zh-CN.srt",
  "platform": "wechat_channels",
  "segments": [{
    "id": 1, "slug": "intent-not-code",
    "title": "AI 时代不是写代码\n而是写意图",
    "summary": "Two-sentence pitch — what's the insight, what's at stake.",
    "start": "00:00:43.460", "end": "00:02:35.220",
    "cover_prompt": "Visual concept for gpt-image-2 (style anchor, not literal scene)"
  }]
}
```

`slug` = kebab-case English (used in filenames).

`title` uses `\n` for line break, 2 lines max, 8–12 Chinese chars per line.

`cover_prompt` is consumed downstream by `/wjs-overlaying-video`'s cover-generation step; keep it written here so the overlay skill can pick it up without re-asking.
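Before cutting, it can help to lint each segment against the rules above. The sketch below is an illustration, not the shipped schema validator: `check_segment` is a hypothetical helper, only the `wechat_channels` platform key is taken from the example (other keys are unknown here), and the title check enforces only the 12-character upper bound, which is an assumption.

```python
import re

SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
TS_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3})$")

# Duration range in seconds; only the key that appears in the example is listed.
PLATFORM_RANGE_S = {"wechat_channels": (60, 180)}
MAX_TITLE_LINES = 2
MAX_CHARS_PER_LINE = 12  # assumption: upper bound only, no minimum enforced


def ts_to_ms(ts):
    """Parse "HH:MM:SS.mmm" into integer milliseconds."""
    match = TS_RE.match(ts)
    if match is None:
        raise ValueError(f"bad timestamp: {ts!r}")
    h, m, s, ms = map(int, match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def check_segment(seg, platform):
    """Return a list of human-readable problems; empty means the segment passes."""
    problems = []
    if not SLUG_RE.match(seg["slug"]):
        problems.append("slug is not kebab-case")
    lines = seg["title"].split("\n")
    if len(lines) > MAX_TITLE_LINES:
        problems.append("title has more than 2 lines")
    if any(len(line) > MAX_CHARS_PER_LINE for line in lines):
        problems.append("title line longer than 12 characters")
    duration_s = (ts_to_ms(seg["end"]) - ts_to_ms(seg["start"])) / 1000
    if duration_s <= 0:
        problems.append("end is not after start")
    elif platform in PLATFORM_RANGE_S:
        low, high = PLATFORM_RANGE_S[platform]
        if not low <= duration_s <= high:
            problems.append(f"duration {duration_s:.1f}s outside {low}-{high}s")
    return problems


example = {
    "id": 1, "slug": "intent-not-code",
    "title": "AI 时代不是写代码\n而是写意图",
    "start": "00:00:43.460", "end": "00:02:35.220",
}
print(check_segment(example, "wechat_channels"))  # []
```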


Step 2 — Accurate-seek cut


```bash
python3 ~/.claude/skills/wjs-segmenting-video/scripts/segment.py \
    --segments segments.json --out output/ --reencode
```

`--reencode` is the default recommended mode. It cuts with `ffmpeg -ss N -i src -c:v libx264 -c:a aac` so the output starts EXACTLY at the requested timestamp. ~30s per clip on CPU. Also extracts a midpoint frame per segment to `output/frame_NN_slug.jpg`.

Why default to `--reencode` and not stream-copy:

Stream-copy via `ffmpeg -ss N -c copy` seeks to the nearest keyframe before N (it can't re-encode). The output's t=0 then maps to source t=keyframe, so the clip plays a fraction of a second of "lead-in" content before the requested speech. Captions sliced from the master SRT at boundary N appear AHEAD of the audio by exactly that GOP fraction, so listeners feel "subtitles lead the voice."

In practice on H.264 source with GOP=2s: every clip is off by 0.6–1.5s. Looks like a synchronization bug downstream; it's actually a cut-time bug upstream.
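The size of the lead-in follows directly from where the previous keyframe sits. A minimal sketch under a simplifying assumption: keyframes fall at exact multiples of the GOP length starting from 0 (real encodes may be offset, so probe the file for actual positions). `lead_in_ms` is a hypothetical name.

```python
def lead_in_ms(requested_ms, gop_ms):
    """Lead-in a stream-copy cut adds, assuming keyframes at 0, gop, 2*gop, ..."""
    return requested_ms % gop_ms


GOP_MS = 2000  # GOP = 2s, as in the case described above
for requested in (43_460, 361_000):
    print(requested, lead_in_ms(requested, GOP_MS))
# 43460 1460
# 361000 1000
```

A cut requested exactly on a keyframe gets a lead-in of 0, which is why forcing keyframes at the cut points makes stream-copy accurate.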


Stream-copy variant (only if you control the source encode)


If the source has been re-encoded with `-force_key_frames` at every requested cut boundary, stream-copy IS accurate. Workflow:

```bash
# Build the comma-separated keyframe list from segments.json
KF=$(python3 -c "
import json
s = json.load(open('segments.json'))
ts = []
for seg in s['segments']:
    ts += [seg['start'], seg['end']]
print(','.join(ts))
")

# Re-encode master once, forcing keyframes at all segment boundaries
ffmpeg -i master.mp4 \
  -c:v libx264 -preset medium -crf 18 \
  -force_key_frames "$KF" \
  -c:a copy master_kf.mp4

# Now stream-copy cuts land exactly:
python3 segment.py --segments segments.json --source master_kf.mp4 --out output/
```

Use this only when iterating on segment boundaries (you'll re-cut the same source many times). For one-shot work, `--reencode` is simpler and just as correct.

Diagnosing keyframe-snap on already-cut clips


```bash
ffprobe -v error -select_streams v:0 -read_intervals "$((N-2))%$((N+5))" \
  -show_entries packet=pts_time,flags -of csv=p=0 master.mp4 | grep "K_"
```

Output like `360.023,K__   362.023,K__` → GOP=2s. A `-c copy` cut at 361.000 actually starts at 360.023, so captions are 0.977s ahead of audio. The retroactive fix is a per-clip SRT offset shim (`requested_start − nearest_preceding_keyframe`) added to every cue's start/end, but the root fix is to re-cut with `--reencode`.
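The offset shim can be computed from the probe output itself. This is a sketch of the arithmetic only, using the two keyframe lines quoted above as input; `keyframes_ms`, `srt_offset_ms`, and `fmt` are hypothetical helpers, not functions from the skill's scripts.

```python
from decimal import Decimal


def keyframes_ms(ffprobe_csv):
    """Parse `pts_time,flags` entries, keeping packets whose flags start with K."""
    found = []
    for entry in ffprobe_csv.split():
        pts, _, flags = entry.partition(",")
        if flags.startswith("K"):
            found.append(int(Decimal(pts) * 1000))
    return sorted(found)


def srt_offset_ms(requested_start_ms, keyframes):
    """requested_start minus nearest preceding keyframe, in milliseconds."""
    preceding = [k for k in keyframes if k <= requested_start_ms]
    if not preceding:
        raise ValueError("no keyframe at or before the requested start")
    return requested_start_ms - max(preceding)


def fmt(ms):
    """Format milliseconds as an SRT timestamp."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


probe = "360.023,K__   362.023,K__"
offset = srt_offset_ms(361_000, keyframes_ms(probe))
print(offset)  # 977

# Shift one cue of the per-clip SRT (originally 0.000 to 2.500) by the offset.
print(fmt(0 + offset), "-->", fmt(2_500 + offset))
# 00:00:00,977 --> 00:00:03,477
```

Every cue in that clip's SRT gets the same offset; each clip needs its own value because each cut lands on a different keyframe.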


Step 3 — Orientation check (ask before continuing)


Compare source video aspect ratio to the target platform:

| Platform | Native orientation | Aspect |
|---|---|---|
| 视频号 (WeChat Channels) | vertical | 9:16 |
| 抖音 / TikTok / Reels | vertical | 9:16 |
| 小红书 (Xiaohongshu video) | vertical | 9:16 |
| YouTube Shorts | vertical | 9:16 |
| YouTube (regular) | horizontal | 16:9 |
| B站 (Bilibili) | horizontal | 16:9 |

Probe with `ffprobe`:

```bash
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height -of csv=p=0 clip_01_*.mp4
```

If source aspect already matches the platform → skip this step.

If mismatch → ASK THE USER before converting. Sample phrasing:

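源视频是横屏 (1920×1080)，平台视频号需要竖屏 (9:16)。是否对每段调用 `/wjs-reframing-video` 转成竖屏？(crop 会用 MediaPipe 跟踪正在说话的人的脸，保持说话人始终在画面中)

In English: "The source video is horizontal (1920×1080), but 视频号 needs vertical (9:16). Should I run `/wjs-reframing-video` on each clip to convert it to vertical? (The crop uses MediaPipe to track the face of whoever is speaking, so the speaker stays in frame.)"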

Never silently skip the check — finding out at upload time that your horizontal clip needs to be vertical is a frustrating failure mode the skill exists to prevent.
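The comparison itself is small enough to sketch. The following is illustrative only: the platform keys other than `wechat_channels` are hypothetical names for the rows of the table above, and the check ignores rotation metadata, so a phone recording stored sideways may need its display dimensions instead.

```python
# Native orientation per platform, taken from the table above.
# Only "wechat_channels" appears in the example segments.json; the rest are hypothetical keys.
NATIVE_ORIENTATION = {
    "wechat_channels": "vertical",
    "douyin": "vertical",
    "xiaohongshu": "vertical",
    "youtube_shorts": "vertical",
    "youtube": "horizontal",
    "bilibili": "horizontal",
}


def parse_probe(csv_line):
    """Parse ffprobe's `width,height` CSV output, e.g. "1920,1080"."""
    width, height = csv_line.strip().split(",")[:2]
    return int(width), int(height)


def orientation(width, height):
    if width > height:
        return "horizontal"
    if height > width:
        return "vertical"
    return "square"


def needs_reframing(width, height, platform):
    """True when the clip's orientation differs from the platform's native one."""
    return orientation(width, height) != NATIVE_ORIENTATION[platform]


w, h = parse_probe("1920,1080\n")
print(needs_reframing(w, h, "wechat_channels"))  # True
print(needs_reframing(w, h, "bilibili"))         # False
```

A `True` result is the cue to ask the user, not to start cropping.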


Calling `/wjs-reframing-video`


The crop script needs `mediapipe`, `opencv`, and `numpy` in a Python 3.12 venv (mediapipe doesn't ship wheels for 3.14+). One-time setup:

```bash
# --seed installs pip into the venv; a plain `uv venv` does not include it
uv venv --seed --python 3.12 /tmp/_crop_venv
/tmp/_crop_venv/bin/python -m pip install mediapipe opencv-python numpy
```

Per-clip invocation:

```bash
for n in 01 02 03 04 05; do
  slug=$(ls clip_${n}_*.mp4 | grep -v -E "_intro|_burned|_vert" | head -1 | sed -E "s/clip_${n}_(.+)\.mp4/\1/")
  /tmp/_crop_venv/bin/python ~/.claude/skills/wjs-reframing-video/scripts/crop.py \
    "clip_${n}_${slug}.mp4" \
    --out "clip_${n}_${slug}_vert.mp4" \
    --target portrait \
    --bitrate 8M    # 视频号 caps at 10Mbps
done
```

After cropping, swap the cropped versions to canonical names so downstream pipelines find them:

```bash
mkdir -p _horizontal_archive
for n in 01 02 03 04 05; do
  base=$(ls clip_${n}_*_vert.mp4 | sed -E "s/_vert\.mp4$//")
  mv "${base}.mp4" "_horizontal_archive/"
  mv "${base}_vert.mp4" "${base}.mp4"
  # Re-extract midpoint frame:
  mid=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "${base}.mp4" | awk '{print $1/2}')
  slug=$(echo "$base" | sed -E "s/^clip_${n}_//")
  ffmpeg -hide_banner -loglevel error -ss "$mid" -i "${base}.mp4" \
    -frames:v 1 -q:v 3 "frame_${n}_${slug}.jpg" -y
done
```

Sanity check: face-on-screen detection rate in the crop log can read low (e.g. `face#0: 9.6s on screen (9%)`) when speakers sit further than ~2 m from the camera. That number being low is OK: the active-speaker hysteresis + fallback-to-largest-face still produces well-centered crops. Verify visually by extracting a midpoint frame and confirming the speaker is centered before committing.


Step 4 — Slice per-clip SRTs


```bash
python3 ~/.claude/skills/wjs-segmenting-video/scripts/burn_subs.py \
    --segments segments.json --out output/ --no-burn
```

The `--no-burn` flag emits per-clip SRTs (`clip_NN_slug.zh-CN.burn.srt`) with timestamps already shifted to start at 0, which is exactly the input `/wjs-overlaying-video` captions expect (its compositions start the body at t=cover_duration, not the master clock).

Despite the legacy name `burn_subs.py`, this step does NOT burn pixels in `--no-burn` mode; it's just an SRT slicer. (The burn-pixels mode exists for the legacy "Path A" workflow but is deprecated in favor of `/wjs-overlaying-video`'s HTML/CSS caption rendering.)
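To make "sliced and shifted to start at 0" concrete, here is a minimal sketch of the idea. It is not the code in `burn_subs.py`; the cue tuples, `slice_cues`, and `render_srt` are hypothetical, and the choice to clamp cues that straddle a boundary is an assumption.

```python
def fmt(ms):
    """Format milliseconds as an SRT timestamp."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def slice_cues(cues, seg_start_ms, seg_end_ms):
    """Keep cues overlapping the segment and rebase them so the segment starts at 0."""
    sliced = []
    for start, end, text in cues:
        if end <= seg_start_ms or start >= seg_end_ms:
            continue  # cue lies entirely outside the segment
        sliced.append((
            max(start, seg_start_ms) - seg_start_ms,  # clamp, then shift
            min(end, seg_end_ms) - seg_start_ms,
            text,
        ))
    return sliced


def render_srt(cues):
    """Serialize cues as SRT text with renumbered indices."""
    blocks = [
        f"{i}\n{fmt(start)} --> {fmt(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)


master = [
    (40_000, 43_000, "before the segment"),
    (43_460, 46_100, "first cue"),
    (46_300, 49_900, "second cue"),
    (155_220, 158_000, "after the segment"),
]
clip = slice_cues(master, 43_460, 155_220)
print(render_srt(clip))
```

With segment boundaries snapped to cue boundaries in Step 1, the clamping branch never changes a cue; it only guards against a boundary that falls inside one.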


Hand-off package — what to deliver to `/wjs-overlaying-video`


After Steps 1–4, deliver EXACTLY these per-segment artifacts:

output/
  clip_NN_slug.mp4                  # raw cropped clip (target orientation, no subs, no cover)
  clip_NN_slug.zh-CN.burn.srt       # per-clip SRT, timestamps shifted to start at 0
  frame_NN_slug.jpg                 # midpoint frame (cover reference)
  segments.json                     # for slug/title/summary/cover_prompt metadata

Then invoke `/wjs-overlaying-video` to add covers, captions, illustrations, CTA, and produce the upload-ready MP4 per clip. The overlay skill generates ONE final composition per clip and renders it in a single encode (no cascade of re-encodes).
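A quick completeness check before handing off can catch a missing SRT or frame. The sketch below assumes that `NN` is the segment `id` zero-padded to two digits and that the SRT language tag is `zh-CN`, as in the listing above; `expected_artifacts` and `missing_artifacts` are hypothetical helpers.

```python
from pathlib import Path


def expected_artifacts(segments, lang="zh-CN"):
    """List the file names the hand-off package should contain."""
    names = ["segments.json"]
    for seg in segments:
        stem = f"{seg['id']:02d}_{seg['slug']}"  # assumption: NN = zero-padded id
        names += [
            f"clip_{stem}.mp4",
            f"clip_{stem}.{lang}.burn.srt",
            f"frame_{stem}.jpg",
        ]
    return names


def missing_artifacts(out_dir, segments):
    """Return expected file names that do not exist under out_dir."""
    root = Path(out_dir)
    return [name for name in expected_artifacts(segments) if not (root / name).exists()]


segments = [{"id": 1, "slug": "intent-not-code"}]
for name in expected_artifacts(segments):
    print(name)
# segments.json
# clip_01_intent-not-code.mp4
# clip_01_intent-not-code.zh-CN.burn.srt
# frame_01_intent-not-code.jpg
```

An empty list from `missing_artifacts("output/", segments)` means the package is complete for every segment.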


Quick reference


| Task | Command |
|---|---|
| Cut clips (accurate, default) | `segment.py --segments S.json --out output/ --reencode` |
| Probe source aspect | `ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=p=0 IN.mp4` |
| Convert orientation (ask first) | invoke `/wjs-reframing-video` per clip |
| Slice per-clip SRTs | `burn_subs.py --segments S.json --out output/ --no-burn` |
| Diagnose keyframe positions | `ffprobe -v error -select_streams v:0 -read_intervals A%B -show_entries packet=pts_time,flags -of csv=p=0 src.mp4 \| grep K_` |


Common mistakes


Cutting mid-sentence — always snap to SRT cue boundaries.
Trying to use 100% of the video — 3–6 strong clips from 10 min is normal. Boring middle = drop.
Letting the LLM write the title — the title is judgment, not summary. Review and rewrite before passing to make_cover.
Stream-copy without `-force_key_frames` preprocessing: produces clips whose captions run ahead of the audio by up to 1 GOP. Use `--reencode` (the recommended default) unless the source was specifically prepared.
Skipping the orientation check: getting a horizontal podcast on 视频号 and finding out at upload time is preventable. Probe aspect and ask the user before cropping.
Burning subs / generating covers in THIS skill: those moved to `/wjs-overlaying-video`. This skill stops after Step 4.


Integration with other skills


`/wjs-transcribing-audio`: produce the source SRT first if missing. The word-level Whisper output (or Volcano/豆包 ASR output) is preferred for accurate cue timing. If the segments need translating, chain into `/wjs-translating-subtitles`.
`/wjs-reframing-video`: call in Step 3 when source orientation doesn't match target platform. Face-tracked active-speaker following keeps the talker in frame.
`/wjs-editing-multicam`: if the source is multi-cam, render the synced single MP4 first, then segment.
`/wjs-overlaying-video`: the default downstream for everything after Step 4. Covers, captions, illustrations, CTA, and final render all happen there. Don't add post-production in this skill.


Files & references


`scripts/segment.py`: accurate-seek + stream-copy cutting
`scripts/burn_subs.py`: SRT slicer (`--no-burn` mode); legacy libass burn-in mode is deprecated in favor of `/wjs-overlaying-video`
`references/segments_schema.json`: JSON Schema for `segments.json`
`references/example_segments.json`: worked example
