wjs-syncing-multicam

Compute a single time offset for each of several recordings of the same event using audio cross-correlation, and emit a `.sync.json` sidecar next to each original. Originals are never modified, copied, or re-encoded. Downstream tools use ffmpeg's `-itsoffset` to apply the offset at consume time.

Design principle — sidecar over re-encode

Earlier versions of this skill produced `*_synced.MOV` files by trimming and re-encoding to bake the offset into the file. We removed that for four reasons:
  • Disk: a 75-min 4K shoot from 3 cameras is 60+ GB. Re-encoded synced copies double that for no information gain.
  • Quality: every re-encode is lossy. The originals are the source of truth; sidecars are reversible metadata.
  • Speed: `_synced.MOV` generation took 10+ min per file on Apple Silicon; sidecar emission takes seconds.
  • Composability: any downstream tool (autoedit.py, NLE import, ffmpeg one-liners) reads the sidecar and applies the offset itself. No tool-specific file format lock-in.

When NOT to use

  • Single-camera footage — nothing to sync to. For splitting one source into clips, use video-segmentation.
  • Sources already aligned in an NLE timeline — don't fight the editor.
  • For the auto-edit / cut / PiP rendering step that comes AFTER sync, use wjs-editing-multicam (consumes these sidecars).

Why envelope-based, not raw waveform

Raw PCM cross-correlation gives weak peaks and false matches when the two mics have different gain / room response — i.e., almost always with a secondary cam. The log-energy envelope captures dialogue and music dynamics, which both mics hear regardless of frequency response. Don't skip the envelope step — it's the entire reason this skill is robust at low SNR.
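The gain part of this argument can be checked numerically. This is a minimal NumPy sketch and not the code in `scripts/sync.py`. `log_energy_envelope` and `centered` are hypothetical helpers, the input is assumed to be mono float PCM at 8 kHz, and plain mean removal stands in for the 0.05 Hz Butterworth high-pass used by the real pipeline. Scaling a signal by a gain `g` adds the constant `2·ln(g)` to every log-energy frame. Once that constant is removed, a loud copy and a quiet copy of the same signal yield the same envelope. The sketch only illustrates broadband gain, not differing room response.

```python
import numpy as np

def log_energy_envelope(pcm: np.ndarray, sr: int = 8000, hop_s: float = 0.010,
                        win_s: float = 0.050, eps: float = 1e-10) -> np.ndarray:
    """Log-energy envelope: one value per hop, i.e. 100 Hz for a 10 ms hop."""
    hop = int(round(hop_s * sr))   # 80 samples at 8 kHz
    win = int(round(win_s * sr))   # 400 samples at 8 kHz
    n_frames = 1 + (len(pcm) - win) // hop
    # Index matrix of overlapping frames, shape (n_frames, win).
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    energy = np.mean(pcm[idx].astype(np.float64) ** 2, axis=1)
    return np.log(energy + eps)   # eps guards against log(0) on digital silence

def centered(env: np.ndarray) -> np.ndarray:
    """Crude stand-in for the 0.05 Hz high-pass: removes only the constant term."""
    return env - env.mean()

# Synthetic "dialogue": noise with a slow amplitude modulation, 10 s at 8 kHz.
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(10 * sr) / sr
x = rng.standard_normal(t.size) * (1.2 + np.sin(2 * np.pi * 0.7 * t))

loud = log_energy_envelope(x, sr)
quiet = log_energy_envelope(0.1 * x, sr)   # same content, 20 dB less gain
# The raw envelopes differ by the constant ln(100); after centering they coincide.
gain_gap = float(np.mean(loud - quiet))
max_diff = float(np.max(np.abs(centered(loud) - centered(quiet))))
```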

Algorithm

  1. Extract mono PCM at 8 kHz, 16-bit, from each input.
  2. Compute a log-energy envelope at 100 Hz (10 ms hop, 50 ms window). High-pass it with a 2nd-order Butterworth filter (0.05 Hz cutoff) applied with filtfilt. This removes slow drift and gain offsets.
  3. FFT cross-correlate the envelopes end-to-end → coarse offset (~10 ms resolution).
  4. Refine at sample level. Take a 60 s probe from B near the coarse-aligned position in A, search a ±2 s window, and apply parabolic peak interpolation.
  5. Multi-probe drift check: repeat step 4 every ~3 min. A linear fit `delta(t) = slope·t + intercept` reveals real clock drift (5–50 ppm typical). Use the midpoint-canonical offset (`slope · midpoint + intercept`) so residual error is symmetric around zero.
  6. Compute the overlap window in the reference timeline: `overlap = [max(0, delta), min(ref_dur, delta + src_dur)]`.
  7. Emit a `.sync.json` sidecar next to each input. No file is copied, trimmed, or re-encoded. The reference input's own sidecar has `delta_seconds: 0`, so downstream code can treat all inputs uniformly.

`scripts/sync.py` is the implementation. Note that the current script still emits `_synced.MOV` files alongside the sidecar. That path is deprecated, and the sidecar is the only authoritative output.
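Step 3 can be sketched with NumPy as follows. This is an illustration under assumptions, not `scripts/sync.py`. Both envelopes are assumed to be already computed at 100 Hz and high-passed, and `coarse_delta` and `parabolic_peak` are hypothetical names. The parabolic interpolation from step 4 is reused here at envelope resolution to get a sub-frame estimate. The sign convention follows `delta_seconds`: a positive result means the source starts after the reference.

```python
import numpy as np

ENV_RATE = 100.0  # envelope frames per second (10 ms hop)

def parabolic_peak(y: np.ndarray, i: int) -> float:
    """Sub-bin peak position from a parabola through y[i-1], y[i], y[i+1]."""
    if i <= 0 or i >= len(y) - 1:
        return float(i)  # no neighbour on one side: keep the integer peak
    y0, y1, y2 = y[i - 1], y[i], y[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        return float(i)
    return i + 0.5 * (y0 - y2) / denom

def coarse_delta(ref_env: np.ndarray, src_env: np.ndarray,
                 rate: float = ENV_RATE) -> float:
    """Seconds at which src's t=0 sits on ref's timeline (positive: src starts later)."""
    ref = ref_env - ref_env.mean()
    src = src_env - src_env.mean()
    n = len(ref) + len(src) - 1
    nfft = 1 << (n - 1).bit_length()  # power of two >= n, so no circular wrap-around
    c = np.fft.irfft(np.fft.rfft(ref, nfft) * np.conj(np.fft.rfft(src, nfft)), nfft)
    # c[k] = sum_n ref[n + k] * src[n]; negative lags live at the end of the buffer.
    full = np.concatenate((c[nfft - (len(src) - 1):], c[:len(ref)]))
    lags = np.arange(-(len(src) - 1), len(ref))
    i = int(np.argmax(full))
    frac = parabolic_peak(full, i) - i
    return float((lags[i] + frac) / rate)
```

Zero-padding to at least `len(ref) + len(src) - 1` keeps the FFT's circular correlation from wrapping, so each lag maps to exactly one output bin.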

Sidecar schema (`<input>.sync.json`)

One sidecar per original input, written next to it. Pure JSON, no comments in-file — the field reference below is canonical.
```json
{
  "_about": "Sync metadata for cam_b.MOV. Apply via ffmpeg -itsoffset. See wjs-syncing-multicam SKILL.md for full schema.",
  "schema_version": 1,
  "source": "cam_b.MOV",
  "reference": "cam_a.MOV",
  "delta_seconds": 12.345,
  "drift_slope": 1.8e-5,
  "overlap_in_reference": [12.345, 4512.180],
  "overlap_in_source":    [0.000,   4499.835],
  "verification": {
    "median_residual_ms": 4.2,
    "residual_spread_ms": 11.8,
    "probe_count": 24
  }
}
```
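Here is a minimal sketch of steps 6 and 7, covering the overlap window and sidecar emission, written against the schema above. `overlap_windows` and `write_sidecar` are hypothetical helpers. The durations are assumed to come from an earlier probe of each file. `verification` is left out because `verify.py` fills it in later.

```python
import json
import os
from pathlib import Path

def overlap_windows(delta: float, ref_dur: float, src_dur: float):
    """Step 6: mutual-coverage window, in reference and in source timelines."""
    start = max(0.0, delta)
    end = min(ref_dur, delta + src_dur)
    if end <= start:
        raise ValueError("source and reference never overlap")
    # Source-local time = reference time - delta.
    return [start, end], [start - delta, end - delta]

def write_sidecar(source: Path, reference: Path, delta: float, drift_slope: float,
                  ref_dur: float, src_dur: float) -> Path:
    """Step 7: write <source>.sync.json next to the original; the original is untouched."""
    in_ref, in_src = overlap_windows(delta, ref_dur, src_dur)
    sidecar = source.with_name(source.name + ".sync.json")  # cam_b.MOV -> cam_b.MOV.sync.json
    data = {
        "_about": (f"Sync metadata for {source.name}. Apply via ffmpeg -itsoffset. "
                   "See wjs-syncing-multicam SKILL.md for full schema."),
        "schema_version": 1,
        "source": source.name,  # relative to the sidecar's own directory
        "reference": os.path.relpath(reference, start=source.parent),
        "delta_seconds": delta,
        "drift_slope": drift_slope,  # kept separate from delta_seconds on purpose
        "overlap_in_reference": in_ref,
        "overlap_in_source": in_src,
        # "verification" is added later by verify.py.
    }
    sidecar.write_text(json.dumps(data, indent=2) + "\n")
    return sidecar
```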

Field reference

| Field | Type | Meaning |
|---|---|---|
| `_about` | string | Human-readable one-liner. Includes a pointer back to this SKILL.md. Always present. |
| `schema_version` | int | Bumps on any breaking change to this schema. Current: `1`. |
| `source` | string | Filename of the original this sidecar describes, relative to the sidecar's directory. Never points to a re-encoded file. |
| `reference` | string | The input whose timeline we're aligned to. The reference's own sidecar lists itself here. |
| `delta_seconds` | float | The source's `t=0` expressed in the reference's timeline. If positive, the source starts after the reference; pass it to ffmpeg as `-itsoffset <delta>`. Can be negative (the source starts before the reference, e.g. an early-rolling camera). |
| `drift_slope` | float | Linear clock-drift slope (dimensionless, ~10⁻⁵). `0.0` means no measurable drift. Downstream applies `atempo = 1 + drift_slope` to the source ONLY for sync-sound / long-form lip-sync; for camera-cut editing, ignore it. |
| `overlap_in_reference` | `[start, end]` (seconds) | The window during which both source and reference have coverage, expressed in the reference's timeline. Use it to trim outputs to mutually valid time ranges. |
| `overlap_in_source` | `[start, end]` (seconds) | The same window expressed in the source's local timeline: `overlap_in_source[i] = overlap_in_reference[i] - delta_seconds` for both ends. |
| `verification` | object | Output of running `verify.py`; drives a "did sync converge?" gate. `median_residual_ms` should be a few ms; a `residual_spread_ms` above 1 frame at delivery fps means drift correction was needed but skipped. |
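The field reference implies a few invariants that a consumer can check before trusting a sidecar. Below is a sketch of a hypothetical `check_sidecar` helper. It assumes `verification` is optional until `verify.py` has run.

```python
import math

REQUIRED = ("_about", "source", "reference", "delta_seconds", "drift_slope",
            "overlap_in_reference", "overlap_in_source")

def check_sidecar(s: dict, tol: float = 1e-6) -> None:
    """Raise ValueError if a loaded sidecar breaks the documented invariants."""
    if s.get("schema_version") != 1:
        raise ValueError(f"unsupported schema_version: {s.get('schema_version')!r}")
    for key in REQUIRED:
        if key not in s:
            raise ValueError(f"missing field: {key}")
    delta = s["delta_seconds"]
    ref_win, src_win = s["overlap_in_reference"], s["overlap_in_source"]
    if not ref_win[0] <= ref_win[1]:
        raise ValueError("overlap_in_reference is reversed")
    # Same window on two timelines: source time = reference time - delta_seconds.
    for r, src in zip(ref_win, src_win):
        if not math.isclose(r - delta, src, abs_tol=tol):
            raise ValueError("overlap_in_source != overlap_in_reference - delta_seconds")
    # The reference's own sidecar lists itself and carries a zero offset.
    if s["source"] == s["reference"] and delta != 0:
        raise ValueError("reference sidecar must have delta_seconds == 0")
```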

How downstream consumes the sidecar

`-itsoffset` is a per-input option in ffmpeg and must be placed before the `-i` it applies to. Always read the source's `delta_seconds` from the sidecar:

```bash
# Play cam_b aligned to cam_a's timeline
ffmpeg -itsoffset "$(jq -r .delta_seconds cam_b.MOV.sync.json)" -i cam_b.MOV \
       -i cam_a.MOV \
       -filter_complex "[0:v][1:v]hstack" out.mp4

# Trim to mutual overlap window (read from cam_b.MOV.sync.json)
# <overlap_dur> = overlap_in_source[1] - overlap_in_source[0]
ffmpeg -ss <overlap_in_source[0]> -i cam_b.MOV -t <overlap_dur> ...
```

For `wjs-editing-multicam`, the EDL builder in `autoedit.py` ingests every `<input>.sync.json` automatically; you don't compose these flags by hand.
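The same two invocations can be built as argument lists from Python. This is a sketch and not `autoedit.py`'s EDL builder; `aligned_input_args` and `trimmed_input_args` are hypothetical. Times are formatted with six fixed decimals. This assumes plain decimal notation is safer to hand to ffmpeg than the scientific notation Python uses for tiny floats (for example `1e-05`). In the trimmed variant, `-t` is placed before `-i`, where ffmpeg treats it as an input option that limits how much of that file is read.

```python
import json
from pathlib import Path

def _fmt(seconds: float) -> str:
    # Plain decimal notation; str(1e-05) would give scientific notation.
    return f"{seconds:.6f}"

def _load(sidecar: Path):
    s = json.loads(sidecar.read_text())
    src = sidecar.parent / s["source"]  # relative to the sidecar, not the CWD
    return s, src

def aligned_input_args(sidecar: Path) -> list[str]:
    """Input args that place the source on the reference timeline."""
    s, src = _load(sidecar)
    # -itsoffset is an input option: it must precede the -i it applies to.
    return ["-itsoffset", _fmt(s["delta_seconds"]), "-i", str(src)]

def trimmed_input_args(sidecar: Path) -> list[str]:
    """Input args that read only the mutual-overlap window of the source."""
    s, src = _load(sidecar)
    start, end = s["overlap_in_source"]
    # -ss and -t before -i act as input options: seek, then limit the read duration.
    return ["-ss", _fmt(start), "-t", _fmt(end - start), "-i", str(src)]
```

For example, passing `["ffmpeg", *aligned_input_args(Path("cam_b.MOV.sync.json")), "-i", "cam_a.MOV", "-filter_complex", "[0:v][1:v]hstack", "out.mp4"]` to `subprocess.run` mirrors the first bash command without shell-quoting concerns.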

Partial-coverage clips

Common case: the main cams cover 75 min, while a Riverside / phone / lavalier recorder covers only the middle 30 min. Run `scripts/sync_partial.py REF.MOV NEW.mp4`, which:
  1. Cross-correlates the new input against the reference.
  2. Finds where the new clip's `t=0` sits in the reference timeline (`delta_seconds` may be large, e.g. 1842.5).
  3. Writes the sidecar, and that's it: no black padding, no audio padding, no re-encode. `overlap_in_reference` tells consumers exactly when this input has coverage; outside that window, fall back to the main cams.

The `--audio-only` flag only hints to downstream tools that this source has no video stream; there's no encoding step to skip anymore.
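Deciding which inputs are usable at a given moment only needs `overlap_in_reference`. Here is a minimal sketch with hypothetical names (`Coverage`, `covering`). Treating each window as half-open `[start, end)` is an assumption. How `wjs-editing-multicam` ranks the covering inputs is not specified here; the list order below simply puts the partial recorder first.

```python
from dataclasses import dataclass

@dataclass
class Coverage:
    name: str
    start: float  # overlap_in_reference[0], seconds on the reference timeline
    end: float    # overlap_in_reference[1]

def covering(t: float, inputs: list[Coverage]) -> list[str]:
    """Names of inputs with coverage at reference time t, in the given order."""
    # Half-open window [start, end): an assumption of this sketch.
    return [c.name for c in inputs if c.start <= t < c.end]

inputs = [
    Coverage("riverside.mp4", 1842.5, 1842.5 + 30 * 60),  # partial: middle 30 min
    Coverage("cam_a.MOV", 0.0, 75 * 60.0),                # main cams: full 75 min
    Coverage("cam_b.MOV", 0.0, 75 * 60.0),
]
```

Outside the recorder's window, `covering` returns only the main cams, which is the fallback described above.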

When to skip drift correction

For camera-cut editing (the common case), a ±25 ms residual across an hour is below human perception. Ignore drift (treat `drift_slope` as `0.0`) and use only the midpoint `delta_seconds`.

For sync-sound / lip-sync at long durations (>30 min and `verification.residual_spread_ms > 40`), downstream applies `atempo = 1 + drift_slope` to the source. Source files are still not modified; the `atempo` filter runs at consume time.
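The drift arithmetic behind this section is sketched below using only the standard library. `fit_drift`, `midpoint_delta`, `max_residual_ms`, `needs_drift_correction`, and `atempo_filter` are hypothetical helpers. The probe times and offsets would come from step 4 of the algorithm, and taking the midpoint over the overlap span is an assumption. With the midpoint-canonical offset, ignoring a slope `s` over a span `T` leaves a worst-case residual of `|s|·T/2` at either end. That is how an hour at roughly 14 ppm lands near the ±25 ms quoted above. `atempo_filter` copies this section's `1 + drift_slope` convention as-is; `atempo` only retimes audio.

```python
def fit_drift(times: list[float], deltas: list[float]) -> tuple[float, float]:
    """Least-squares line delta(t) = slope*t + intercept over the probes."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two probes")
    mt, md = sum(times) / n, sum(deltas) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (d - md) for t, d in zip(times, deltas))
    slope = sxy / sxx
    return slope, md - slope * mt

def midpoint_delta(slope: float, intercept: float, start: float, end: float) -> float:
    """Offset evaluated mid-span, so the ignored drift is symmetric about zero."""
    return slope * (start + end) / 2.0 + intercept

def max_residual_ms(slope: float, span_s: float) -> float:
    """Worst-case error at either end when drift is ignored around the midpoint."""
    return abs(slope) * span_s / 2.0 * 1000.0

def needs_drift_correction(duration_s: float, residual_spread_ms: float) -> bool:
    """This section's sync-sound rule: longer than 30 min AND spread above 40 ms."""
    return duration_s > 30 * 60 and residual_spread_ms > 40

def atempo_filter(drift_slope: float) -> str:
    # Audio-only ffmpeg filter string, using the 1 + drift_slope convention above.
    return f"atempo={1.0 + drift_slope:.8f}"

# Probes every 3 min over 72 min, with 18 ppm of drift on a 12.3 s offset.
probe_times = [180.0 * i for i in range(25)]
probe_deltas = [12.3 + 1.8e-5 * t for t in probe_times]
slope, intercept = fit_drift(probe_times, probe_deltas)
```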

Verification (always run)

`scripts/verify.py REF.MOV SRC.MOV SRC.MOV.sync.json` re-extracts audio from BOTH originals (with `-itsoffset` applied to the source per the sidecar) and runs the multi-probe correlation again. It writes the results back into the sidecar's `verification` field.

Pass criteria: `median_residual_ms < 15` and `residual_spread_ms` < 1 frame at delivery fps. On failure, retry with drift correction enabled.
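Here is the pass/fail gate, sketched in Python. `summarize` and `sync_passes` are hypothetical. In particular, the definitions are an assumption about what `verify.py` reports, not something this document specifies: `median_residual_ms` is taken as the median absolute per-probe residual, and `residual_spread_ms` as max minus min. One frame is `1000 / fps` ms, so the spread threshold is 40 ms at 25 fps but only about 16.7 ms at 60 fps.

```python
import statistics

def frame_ms(fps: float) -> float:
    """Duration of one frame at the delivery frame rate, in milliseconds."""
    return 1000.0 / fps

def summarize(residuals_ms: list[float]) -> dict:
    """Assumed definitions: median of |residual| and max - min spread."""
    return {
        "median_residual_ms": statistics.median([abs(r) for r in residuals_ms]),
        "residual_spread_ms": max(residuals_ms) - min(residuals_ms),
        "probe_count": len(residuals_ms),
    }

def sync_passes(v: dict, delivery_fps: float) -> bool:
    """Pass criteria from this section: median < 15 ms and spread < 1 frame."""
    return v["median_residual_ms"] < 15 and v["residual_spread_ms"] < frame_ms(delivery_fps)
```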

Common pitfalls

  • Raw waveform cross-correlation gives false peaks under low SNR. Always envelope first; this is not a tunable, it's the entire premise.
  • `-itsoffset` semantics differ for audio vs video, and for sync-correctness it must come before the `-i` of the input it shifts. `ffmpeg -i src -itsoffset X` is wrong (an option placed after `-i src` applies to the next file on the command line, not to `src`); `ffmpeg -itsoffset X -i src` is right.
  • Sidecar paths must be relative to the sidecar file's directory, not the working directory of the consuming process. Resolve `source` / `reference` against `Path(sidecar).parent`.
  • Don't bake `drift_slope` into the sidecar's `delta_seconds`. They're separate fields for a reason: naive consumers can ignore drift, and sync-sound consumers can apply it. Mixing them loses information.
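Here is a minimal sketch of resolving the sidecar's relative paths; `resolve_inputs` is a hypothetical helper. The base directory is taken from the sidecar file itself, so the result does not depend on where the consuming process was started.

```python
import json
from pathlib import Path

def resolve_inputs(sidecar) -> tuple[Path, Path]:
    """Return (source, reference) resolved against the sidecar's directory."""
    sidecar = Path(sidecar)
    s = json.loads(sidecar.read_text())
    base = sidecar.resolve().parent  # never Path.cwd()
    return base / s["source"], base / s["reference"]
```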