wjs-segmenting-video


Cut a long video + SRT into multiple stand-alone short clips, each oriented for the target platform. This skill stops after cutting + cropping; it hands off the raw clips to /wjs-overlaying-video for covers, captions, illustrations, CTA, and final render.

When to use

  • Long-form video (≥10 min) with an existing SRT transcript.
  • Goal is stand-alone short clips (each viewable without context).
  • The user will (or you will) drive post-production separately in /wjs-overlaying-video.

When NOT to use

  • Single-topic trimming → just use ffmpeg -ss A -to B.
  • No transcript yet → run /wjs-transcribing-audio first (then /wjs-translating-subtitles if the segments need a non-source language).
  • Multicam editing → use /wjs-editing-multicam.
  • Highlight reel with multiple cuts inside a single topic → that's editing, not segmentation.

What this skill IS — and IS NOT

| Is | Is not |
| --- | --- |
| You (the agent) read the full SRT and decide the topic boundaries | A script that runs NLP topic modeling, silence detection, or "viral moment" scoring. Topic boundaries are semantic; competing tools (Descript, OpusClip, Riverside Magic Clips) all get this wrong by automating it. |
| segment.py cuts; /wjs-reframing-video reorients | An end-to-end "magic" pipeline |
| Accurate-seek cuts by default (re-encode): clip starts EXACTLY at requested timestamp | Stream-copy cuts (those produce keyframe-snap drift up to GOP duration) |
| Hands off raw cropped clips + per-clip SRTs | Burned subtitles, covers, intros, CTAs (those live in /wjs-overlaying-video) |

The pipeline

long video + SRT
   ↓     (agent reads SRT, decides topics — judgment, not parsing)
segments.json
   ↓     segment.py --reencode (accurate seek; clip starts exactly at requested t)
clip_NN.mp4 + frame_NN.jpg
   ↓     ASK: target platform orientation match source?
   ↓     /wjs-reframing-video on each clip (if 16:9 → 9:16, etc.)
   ↓     re-extract frames from cropped clips
clip_NN.mp4 (now in target orientation) + clip_NN.zh-CN.burn.srt
HAND OFF → /wjs-overlaying-video
   (does covers + captions + illustrations + CTA + final render)

Step 1: Read SRT, write segments.json

Don't outsource topic identification to a script. For each candidate segment, judge:
  • Self-contained? A cold viewer must understand it without prior context.
  • Single thread? One central question / insight; if the speaker pivots mid-clip, that's two segments.
  • Length fits platform? 60–180s for 视频号 / 30–60s for 抖音&Shorts. <30s feels truncated; >4min loses retention.
  • Hook + payoff? Open on a claim / question / vivid image; close on a takeaway. Never end mid-sentence.
  • Snap to SRT cue boundaries — never cut mid-word.
3–6 strong segments from a 10-minute source is normal. Drop boring middles. Quality > quantity.
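The "snap to SRT cue boundaries" rule can be checked mechanically once you have picked rough timestamps by judgment. The following is a minimal sketch, not part of segment.py: the helper names (to_ms, cue_times, snap_start, snap_end) are hypothetical, and it assumes a well-formed SRT whose timing lines look like `HH:MM:SS,mmm --> HH:MM:SS,mmm`. Times are kept as integer milliseconds to avoid float rounding.

```python
import re
from bisect import bisect_left, bisect_right

TS = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def to_ms(ts: str) -> int:
    """Convert 'HH:MM:SS,mmm' (or with '.') to integer milliseconds."""
    h, m, s, ms = map(int, TS.match(ts.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def cue_times(srt_text: str) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) for every timing line in an SRT string."""
    cues = []
    for line in srt_text.splitlines():
        if "-->" in line:
            start, end = line.split("-->")
            cues.append((to_ms(start), to_ms(end)))
    return cues

def snap_start(t_ms: int, cues: list[tuple[int, int]]) -> int:
    """Move a proposed start back to the latest cue start at or before it."""
    starts = sorted(c[0] for c in cues)
    i = bisect_right(starts, t_ms)
    return starts[i - 1] if i else starts[0]

def snap_end(t_ms: int, cues: list[tuple[int, int]]) -> int:
    """Move a proposed end forward to the earliest cue end at or after it."""
    ends = sorted(c[1] for c in cues)
    i = bisect_left(ends, t_ms)
    return ends[i] if i < len(ends) else ends[-1]
```

Snapping the start backwards and the end forwards means a cue is never cut in half; whether the resulting clip still opens on a hook and closes on a takeaway remains a judgment call.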
Schema (full spec in references/segments_schema.json, example in references/example_segments.json):
json
{
  "source_video": "input.mp4",
  "source_srt": "input.zh-CN.srt",
  "platform": "wechat_channels",
  "segments": [{
    "id": 1, "slug": "intent-not-code",
    "title": "AI 时代不是写代码\n而是写意图",
    "summary": "Two-sentence pitch — what's the insight, what's at stake.",
    "start": "00:00:43.460", "end": "00:02:35.220",
    "cover_prompt": "Visual concept for gpt-image-2 (style anchor, not literal scene)"
  }]
}
slug = kebab-case English (used in filenames). title uses \n for line break, 2 lines max, 8–12 Chinese chars per line. cover_prompt is consumed downstream by /wjs-overlaying-video's cover-generation step; keep it written here so the overlay skill can pick it up without re-asking.
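Before cutting, it can help to lint each segment entry against the field rules above. This is a rough sketch under stated assumptions: check_segment and to_ms are hypothetical helpers, not part of the skill's scripts, and the authoritative definition remains references/segments_schema.json. The title check only enforces the upper bound of 12 characters per line, since the lower bound reads as a guideline.

```python
import re

SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")            # kebab-case
TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})")    # HH:MM:SS.mmm

def to_ms(ts):
    """Return integer milliseconds, or None if ts is not HH:MM:SS.mmm."""
    m = TIME.fullmatch(ts)
    if m is None:
        return None
    h, mi, s, ms = map(int, m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms

def check_segment(seg):
    """Return a list of human-readable problems for one segments.json entry."""
    problems = []
    if not SLUG.fullmatch(seg["slug"]):
        problems.append("slug is not kebab-case")
    lines = seg["title"].split("\n")
    if len(lines) > 2:
        problems.append("title has more than 2 lines")
    if any(len(line.replace(" ", "")) > 12 for line in lines):
        problems.append("title line longer than 12 characters")
    start, end = to_ms(seg["start"]), to_ms(seg["end"])
    if start is None or end is None:
        problems.append("timestamp is not HH:MM:SS.mmm")
    elif end <= start:
        problems.append("end is not after start")
    return problems
```

An empty list means the entry passes these mechanical checks; it says nothing about whether the segment is self-contained or well titled.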

Step 2: Accurate-seek cut

bash
python3 ~/.claude/skills/wjs-segmenting-video/scripts/segment.py \
    --segments segments.json --out output/ --reencode
--reencode is the recommended default mode. It cuts with ffmpeg -ss N -i src -c:v libx264 -c:a aac so the output starts EXACTLY at the requested timestamp. Expect ~30s per clip on CPU. It also extracts a midpoint frame per segment to output/frame_NN_slug.jpg.
Why default to --reencode and not stream-copy:
Stream-copy via ffmpeg -ss N -c copy seeks to the nearest keyframe at or before N (it can't re-encode). The output's t=0 then maps to source t=keyframe, so the clip plays up to one GOP of "lead-in" content before the requested speech. Captions sliced from the master SRT at boundary N appear AHEAD of the audio by exactly that GOP fraction, so listeners feel "subtitles lead the voice."
In practice on H.264 source with GOP=2s: every clip is off by 0.6–1.5s. It looks like a synchronization bug downstream; it's actually a cut-time bug upstream.
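The drift described above is simple arithmetic: the clip really starts at the last keyframe at or before the requested time, and the difference is how far captions lead the audio. A minimal sketch, assuming you already have the source's keyframe times as a sorted list of integer milliseconds (stream_copy_drift is a hypothetical helper, not part of segment.py):

```python
from bisect import bisect_right

def stream_copy_drift(requested_ms, keyframes_ms):
    """Return (actual_start_ms, drift_ms) for a stream-copy cut.

    keyframes_ms must be sorted and contain a keyframe at or before
    requested_ms; drift_ms is how far captions would lead the audio.
    """
    i = bisect_right(keyframes_ms, requested_ms)
    if i == 0:
        raise ValueError("no keyframe at or before the requested time")
    actual = keyframes_ms[i - 1]
    return actual, requested_ms - actual

# Hypothetical source with a keyframe every 2 s (GOP=2s), first minute only.
keyframes = list(range(0, 60_000, 2_000))
actual, drift = stream_copy_drift(43_460, keyframes)
print(actual, drift)  # 42000 1460
```

With the example segment start of 00:00:43.460, a stream-copy cut would begin at 42.000 s, so captions would run 1.46 s ahead of the speech.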

Stream-copy variant (only if you control the source encode)

If the source has been re-encoded with -force_key_frames at every requested cut boundary, stream-copy IS accurate. Workflow:
bash
# Build the comma-separated keyframe list from segments.json
KF=$(python3 -c "
import json
s = json.load(open('segments.json'))
ts = []
for seg in s['segments']:
    ts += [seg['start'], seg['end']]
print(','.join(ts))
")

# Re-encode master once, forcing keyframes at all segment boundaries
ffmpeg -i master.mp4 \
    -c:v libx264 -preset medium -crf 18 \
    -force_key_frames "$KF" \
    -c:a copy master_kf.mp4

# Now stream-copy cuts land exactly:
python3 segment.py --segments segments.json --source master_kf.mp4 --out output/
Use this only when iterating on segment boundaries (you'll re-cut the same source many times). For one-shot work, --reencode is simpler and just as correct.

Diagnosing keyframe-snap on already-cut clips

bash
ffprobe -v error -select_streams v:0 -read_intervals "$((N-2))%$((N+5))" \
  -show_entries packet=pts_time,flags -of csv=p=0 master.mp4 | grep "K_"
Output like
360.023,K__
362.023,K__
→ GOP=2s. A -c copy cut at 361.000 actually starts at 360.023, so captions are 0.977s ahead of audio. The retroactive fix is a per-clip SRT offset shim (requested_start − nearest_preceding_keyframe) added to every cue's start/end, but the root fix is to re-cut with --reencode.
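The retroactive offset shim can be sketched as a small text transform over a per-clip SRT. This is an illustration only: shift_srt is a hypothetical helper, not something burn_subs.py is known to provide, and it assumes standard `HH:MM:SS,mmm` timing lines.

```python
import re

TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(srt_text, offset_ms):
    """Add offset_ms to every cue start/end; only timing lines are touched."""
    def bump(match):
        h, mi, s, ms = map(int, match.groups())
        total = max(0, ((h * 60 + mi) * 60 + s) * 1000 + ms + offset_ms)
        h, rem = divmod(total, 3_600_000)
        mi, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{mi:02d}:{s:02d},{ms:03d}"

    lines = [TS.sub(bump, line) if "-->" in line else line
             for line in srt_text.splitlines()]
    return "\n".join(lines)

# Offset from the example above: 361.000 - 360.023 = 0.977 s
clip_srt = "1\n00:00:00,000 --> 00:00:02,500\nhello"
print(shift_srt(clip_srt, 977))
```

Restricting the substitution to lines containing `-->` keeps any timestamp-like text inside a caption from being rewritten.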

Step 3: Orientation check (ask before continuing)

Compare source video aspect ratio to the target platform:
| Platform | Native orientation | Aspect |
| --- | --- | --- |
| 视频号 (WeChat Channels) | vertical | 9:16 |
| 抖音 / TikTok / Reels | vertical | 9:16 |
| 小红书 (Xiaohongshu video) | vertical | 9:16 |
| YouTube Shorts | vertical | 9:16 |
| YouTube (regular) | horizontal | 16:9 |
| B站 (Bilibili) | horizontal | 16:9 |
Probe with ffprobe:
bash
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height -of csv=p=0 clip_01_*.mp4
If source aspect already matches the platform → skip this step.
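If mismatch → ASK THE USER before converting. Sample phrasing (in Chinese, for a Chinese-speaking user):
源视频是横屏 (1920×1080),平台视频号需要竖屏 (9:16)。是否对每段调用 /wjs-reframing-video 转成竖屏?(crop 会用 MediaPipe 跟踪正在说话的人的脸,保持说话人始终在画面中)
In English: "The source video is horizontal (1920×1080), but the platform 视频号 needs vertical (9:16). Should I call /wjs-reframing-video on each clip to convert it to vertical? (The crop uses MediaPipe to track the face of whoever is speaking, so the speaker stays in frame.)"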
Never silently skip the check — finding out at upload time that your horizontal clip needs to be vertical is a frustrating failure mode the skill exists to prevent.
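The comparison itself is mechanical, so it is easy to script the "does the source match the platform?" part and leave only the question to the user. A minimal sketch, assuming the probe output is a single `width,height` line as produced by the ffprobe command above. Apart from wechat_channels (which appears in the segments.json example), the platform keys are hypothetical, as are the helper names.

```python
# Native orientation per platform, taken from the table above.
NATIVE = {
    "wechat_channels": "vertical",
    "douyin": "vertical",          # hypothetical key
    "xiaohongshu": "vertical",     # hypothetical key
    "youtube_shorts": "vertical",  # hypothetical key
    "youtube": "horizontal",       # hypothetical key
    "bilibili": "horizontal",      # hypothetical key
}

def parse_probe(line):
    """Parse 'width,height' from ffprobe -of csv=p=0 output."""
    width, height = line.strip().split(",")[:2]
    return int(width), int(height)

def orientation(width, height):
    """Classify a frame; square video is treated as horizontal here."""
    return "vertical" if height > width else "horizontal"

def needs_reframing(width, height, platform):
    """True when the source orientation differs from the platform's native one."""
    return orientation(width, height) != NATIVE[platform]

w, h = parse_probe("1920,1080\n")
print(needs_reframing(w, h, "wechat_channels"))  # True
```

A True result is the cue to ask the user, not to start converting.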

Calling /wjs-reframing-video

The crop script needs mediapipe + opencv + numpy in a Python 3.12 venv (mediapipe doesn't ship wheels for 3.14+). One-time setup:
bash
# --seed installs pip into the venv; uv venv omits it by default
uv venv --seed --python 3.12 /tmp/_crop_venv
/tmp/_crop_venv/bin/python -m pip install mediapipe opencv-python numpy
Per-clip invocation:
bash
for n in 01 02 03 04 05; do
  slug=$(ls clip_${n}_*.mp4 | grep -v -E "_intro|_burned|_vert" | head -1 | sed -E "s/clip_${n}_(.+)\.mp4/\1/")
  /tmp/_crop_venv/bin/python ~/.claude/skills/wjs-reframing-video/scripts/crop.py \
    "clip_${n}_${slug}.mp4" \
    --out "clip_${n}_${slug}_vert.mp4" \
    --target portrait \
    --bitrate 8M    # 视频号 caps at 10Mbps
done
After cropping, swap the cropped versions to canonical names so downstream pipelines find them:
bash
mkdir -p _horizontal_archive
for n in 01 02 03 04 05; do
  base=$(ls clip_${n}_*_vert.mp4 | sed -E "s/_vert\.mp4$//")
  mv "${base}.mp4" "_horizontal_archive/"
  mv "${base}_vert.mp4" "${base}.mp4"
  # Re-extract midpoint frame:
  mid=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "${base}.mp4" | awk '{print $1/2}')
  slug=$(echo "$base" | sed -E "s/^clip_${n}_//")
  ffmpeg -hide_banner -loglevel error -ss "$mid" -i "${base}.mp4" \
    -frames:v 1 -q:v 3 "frame_${n}_${slug}.jpg" -y
done
Sanity check: the face-on-screen detection rate in the crop log can read low (e.g. face#0: 9.6s on screen (9%)) when speakers sit further than ~2 m from the camera. A low number is OK: the active-speaker hysteresis + fallback-to-largest-face still produces well-centered crops. Verify visually by extracting a midpoint frame and confirming the speaker is centered before committing.

Step 4: Slice per-clip SRTs

bash
python3 ~/.claude/skills/wjs-segmenting-video/scripts/burn_subs.py \
    --segments segments.json --out output/ --no-burn
The --no-burn flag emits per-clip SRTs (clip_NN_slug.zh-CN.burn.srt) with timestamps already shifted to start at 0, exactly the input /wjs-overlaying-video captions expect (its compositions start the body at t=cover_duration, not the master clock).
Despite the legacy name burn_subs.py, this step does NOT burn pixels in --no-burn mode; it's just an SRT slicer. (The burn-pixels mode exists for the legacy "Path A" workflow but is deprecated in favor of /wjs-overlaying-video's HTML/CSS caption rendering.)
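To make the "shifted to start at 0" behaviour concrete, here is a rough sketch of what slicing one clip's cues out of the master SRT involves. It is not the actual burn_subs.py implementation; slice_srt and its helpers are hypothetical, and it assumes segment boundaries were snapped to cue boundaries in Step 1, so only cues lying fully inside the window are kept.

```python
import re

TS = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def to_ms(ts):
    """Convert 'HH:MM:SS,mmm' to integer milliseconds."""
    h, m, s, ms = map(int, TS.match(ts.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def fmt(ms):
    """Format integer milliseconds as 'HH:MM:SS,mmm'."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def slice_srt(srt_text, start_ms, end_ms):
    """Keep cues inside [start_ms, end_ms], renumber them, rebase times to 0."""
    out = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        timing = next((i for i, l in enumerate(lines) if "-->" in l), None)
        if timing is None:
            continue
        a, b = (to_ms(part) for part in lines[timing].split("-->"))
        if a < start_ms or b > end_ms:
            continue  # cue falls outside this clip
        text = "\n".join(lines[timing + 1:])
        out.append(f"{len(out) + 1}\n{fmt(a - start_ms)} --> {fmt(b - start_ms)}\n{text}")
    return "\n\n".join(out) + "\n"
```

Because the clip was cut with accurate seek, subtracting the requested start is enough; no keyframe correction is needed.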

Hand-off package: what to deliver to /wjs-overlaying-video

After Steps 1–4, deliver EXACTLY these per-segment artifacts:
output/
  clip_NN_slug.mp4                  # raw cropped clip (target orientation, no subs, no cover)
  clip_NN_slug.zh-CN.burn.srt       # per-clip SRT, timestamps shifted to start at 0
  frame_NN_slug.jpg                 # midpoint frame (cover reference)
  segments.json                     # for slug/title/summary/cover_prompt metadata
Then invoke /wjs-overlaying-video to add covers, captions, illustrations, CTA, and produce the upload-ready MP4 per clip. The overlay skill generates ONE final composition per clip and renders it in a single encode (no cascade of re-encodes).
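A quick pre-flight check before the hand-off can catch a missing SRT or frame early. The sketch below is an assumption-laden illustration: expected_files and missing_artifacts are hypothetical helpers, NN is assumed to be the segment id zero-padded to two digits, and the zh-CN language tag is taken from the example filenames above.

```python
from pathlib import Path

def expected_files(seg):
    """File names one segment should contribute to the hand-off package."""
    stem = f"{seg['id']:02d}_{seg['slug']}"
    return [
        f"clip_{stem}.mp4",
        f"clip_{stem}.zh-CN.burn.srt",
        f"frame_{stem}.jpg",
    ]

def missing_artifacts(out_dir, segments):
    """Return the names of expected files that are absent from out_dir."""
    out = Path(out_dir)
    missing = [name for seg in segments for name in expected_files(seg)
               if not (out / name).exists()]
    if not (out / "segments.json").exists():
        missing.append("segments.json")
    return missing
```

Typical use would be to load segments.json, call missing_artifacts("output", data["segments"]), and only invoke /wjs-overlaying-video when the returned list is empty.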

Quick reference

| Task | Command |
| --- | --- |
| Cut clips (accurate, default) | segment.py --segments S.json --out output/ --reencode |
| Probe source aspect | ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=p=0 IN.mp4 |
| Convert orientation (ask first) | invoke /wjs-reframing-video per clip |
| Slice per-clip SRTs | burn_subs.py --segments S.json --out output/ --no-burn |
| Diagnose keyframe positions | ffprobe -v error -select_streams v:0 -read_intervals A%B -show_entries packet=pts_time,flags -of csv=p=0 src.mp4 \| grep K_ |

Common mistakes

  • Cutting mid-sentence: always snap to SRT cue boundaries.
  • Trying to use 100% of the video: 3–6 strong clips from 10 min is normal. Boring middle = drop.
  • Letting the LLM write the title: the title is judgment, not summary. Review and rewrite before passing to make_cover.
  • Stream-copy without -force_key_frames preprocessing: produces clips with captions ahead of audio by up to 1 GOP. Use --reencode (default) unless the source was specifically prepared.
  • Skipping the orientation check: getting a horizontal podcast on 视频号 and finding out at upload time is preventable. Probe aspect and ask the user before cropping.
  • Burning subs / generating covers in THIS skill: those moved to /wjs-overlaying-video. This skill stops after Step 4.

Integration with other skills

  • /wjs-transcribing-audio: produce the source SRT first if missing. The word-level Whisper output (or Volcano/豆包 ASR output) is preferred for accurate cue timing. If the segments need translating, chain into /wjs-translating-subtitles.
  • /wjs-reframing-video: call in Step 3 when source orientation doesn't match target platform. Face-tracked active-speaker following keeps the talker in frame.
  • /wjs-editing-multicam: if the source is multi-cam, render the synced single MP4 first, then segment.
  • /wjs-overlaying-video: the default downstream for everything after Step 4. Covers, captions, illustrations, CTA, and final render all happen there. Don't add post-production in this skill.

Files & references

  • scripts/segment.py: accurate-seek + stream-copy cutting
  • scripts/burn_subs.py: SRT slicer (--no-burn mode); legacy libass burn-in mode is deprecated in favor of /wjs-overlaying-video
  • references/segments_schema.json: JSON Schema for segments.json
  • references/example_segments.json: worked example