wjs-syncing-multicam


Compute a single time offset for each recording in a multi-source capture of the same event using audio cross-correlation, and emit a `.sync.json` sidecar next to each original. Originals are never modified, copied, or re-encoded. Downstream tools use ffmpeg's `-itsoffset` to apply the offset at consume time.

Design principle — sidecar over re-encode

Earlier versions of this skill produced `*_synced.MOV` files by trimming and re-encoding to bake the offset into the file. We removed that, for four reasons:

- Disk: a 75-min 4K shoot from 3 cameras is 60+ GB. Re-encoded synced copies double that for no information gain.
- Quality: every re-encode is lossy. The originals are the source of truth; sidecars are reversible metadata.
- Speed: `_synced.MOV` generation took 10+ min per file on Apple Silicon; sidecar emission takes seconds.
- Composability: any downstream tool (autoedit.py, NLE import, ffmpeg one-liners) reads the sidecar and applies the offset itself. No tool-specific file format lock-in.

When NOT to use

- Single-camera footage: nothing to sync to. For splitting one source into clips, use `video-segmentation`.
- Sources already aligned in an NLE timeline: don't fight the editor.
- For the auto-edit / cut / PiP rendering step that comes AFTER sync, use `wjs-editing-multicam` (it consumes these sidecars).

Why envelope-based, not raw waveform

Raw PCM cross-correlation gives weak peaks and false matches when the two mics have different gain or room response, which is almost always the case with a secondary cam. The log-energy envelope captures dialogue and music dynamics, which both mics hear regardless of frequency response. Don't skip the envelope step: it's the entire reason this skill is robust at low SNR.
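
To make the gain argument concrete, here is a minimal numpy sketch of the envelope described in step 2 of the algorithm (8 kHz PCM, 50 ms window, 10 ms hop). The names `log_energy_envelope` and `remove_gain_offset` and the 1e-10 floor are illustrative assumptions, not taken from `scripts/sync.py`. A mic gain `g` scales energy by `g²`, so in the log domain it only adds the constant `2·ln g` to every frame, and any high-pass removes it. For brevity the sketch uses plain mean removal as a stand-in for the 0.05 Hz Butterworth `filtfilt`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def log_energy_envelope(pcm: np.ndarray, sr: int = 8000,
                        win_s: float = 0.050, hop_s: float = 0.010) -> np.ndarray:
    """Log of mean energy per 50 ms window, one value every 10 ms (100 Hz)."""
    win = int(round(win_s * sr))      # 400 samples at 8 kHz
    hop = int(round(hop_s * sr))      # 80 samples at 8 kHz
    x = np.asarray(pcm, dtype=np.float64)   # float avoids int16 overflow when squaring
    frames = sliding_window_view(x, win)[::hop]   # shape (n_frames, win), a view, no copy
    energy = np.mean(frames ** 2, axis=1)
    return np.log(energy + 1e-10)     # tiny floor so digital silence doesn't give -inf

def remove_gain_offset(env: np.ndarray) -> np.ndarray:
    """Stand-in for the 0.05 Hz Butterworth high-pass: subtract the mean only."""
    return env - env.mean()
```

For example, `log_energy_envelope(0.3 * pcm)` differs from `log_energy_envelope(pcm)` by `2·ln 0.3` in every frame, so after the offset is removed the two envelopes are identical.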

Algorithm

1. Extract mono PCM at 8 kHz, 16-bit, from each input.
2. Compute a log-energy envelope at 100 Hz (10 ms hop, 50 ms window). High-pass it with a 2nd-order Butterworth filter, 0.05 Hz cutoff, applied with `filtfilt` (zero-phase); this removes slow drift and gain offsets.
3. FFT cross-correlate the envelopes end-to-end → coarse offset (~10 ms resolution, i.e. one envelope hop).
4. Refine at sample level with a 60 s probe from B near the coarse-aligned position in A, using a ±2 s search window and parabolic peak interpolation.
5. Multi-probe drift check: repeat step 4 every ~3 min. A linear fit `delta(t) = slope·t + intercept` reveals real clock drift (5–50 ppm is typical). Use the midpoint-canonical offset (`slope · midpoint + intercept`) so residual error is symmetric around zero.
6. Compute the overlap window in the reference timeline: `overlap = [max(0, delta), min(ref_dur, delta + src_dur)]`.
7. Emit a `.sync.json` sidecar next to each non-reference input. No file is copied, trimmed, or re-encoded. The reference input gets a sidecar too (with `delta_seconds: 0`) so downstream code can treat all inputs uniformly.
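
Steps 3 and 4 can be sketched with numpy alone. This is an illustrative version under stated assumptions, not the code in `scripts/sync.py`: `coarse_offset_seconds` and `parabolic_peak` are hypothetical names, the envelopes are assumed to be already high-passed and sampled at 100 Hz, and the parabolic refinement is shown on the envelope correlation peak, whereas step 4 applies it to a sample-level PCM probe. The sign convention matches `delta_seconds`: a positive result means the source starts after the reference.

```python
import numpy as np

def parabolic_peak(y: np.ndarray, i: int) -> float:
    """Sub-sample offset of the peak at index i from a parabola through 3 points."""
    y_m1, y_0, y_p1 = y[i - 1], y[i], y[(i + 1) % len(y)]   # circular neighbours
    denom = y_m1 - 2.0 * y_0 + y_p1
    return 0.0 if denom == 0 else 0.5 * (y_m1 - y_p1) / denom

def coarse_offset_seconds(ref_env: np.ndarray, src_env: np.ndarray,
                          env_rate: float = 100.0) -> float:
    """Estimate the source's t=0 in the reference timeline (seconds)."""
    ref = ref_env - ref_env.mean()
    src = src_env - src_env.mean()
    n = len(ref) + len(src) - 1            # length of the full linear correlation
    nfft = 1 << (n - 1).bit_length()       # next power of two >= n
    # corr[k] = sum_i ref[i + k] * src[i]; zero-padding keeps it linear, not circular
    corr = np.fft.irfft(np.fft.rfft(ref, nfft) * np.conj(np.fft.rfft(src, nfft)), nfft)
    peak = int(np.argmax(corr))
    # Indices past len(ref) hold negative lags (source rolled before the reference).
    lag = peak if peak < len(ref) else peak - nfft
    return (lag + parabolic_peak(corr, peak)) / env_rate
```

Negative lags land at the top of the output array, which is why the peak index is unwrapped by subtracting `nfft` before converting to seconds.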

`scripts/sync.py` is the implementation. Note: the current script still emits `_synced.MOV` files alongside the sidecar. That path is deprecated; the sidecar is the only authoritative output.
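
Steps 5 and 6 of the algorithm are plain arithmetic once the probe offsets exist. A minimal sketch, assuming probe times are measured in seconds on the reference timeline and that "midpoint" means the midpoint of the probed span (the text does not pin either down); `fit_drift`, `midpoint_delta`, and `overlap_windows` are hypothetical names.

```python
import numpy as np
from typing import Sequence, Tuple

def fit_drift(probe_times_s: Sequence[float],
              probe_deltas_s: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line delta(t) = slope*t + intercept through the probe offsets."""
    slope, intercept = np.polyfit(np.asarray(probe_times_s, dtype=float),
                                  np.asarray(probe_deltas_s, dtype=float), deg=1)
    return float(slope), float(intercept)

def midpoint_delta(slope: float, intercept: float, t_start: float, t_end: float) -> float:
    """Midpoint-canonical offset: residual error is symmetric around zero."""
    midpoint = 0.5 * (t_start + t_end)
    return slope * midpoint + intercept

def overlap_windows(delta: float, ref_dur: float, src_dur: float):
    """overlap = [max(0, delta), min(ref_dur, delta + src_dur)], in both timelines.

    The window is empty if start >= end (the recordings never overlap).
    """
    ref_window = (max(0.0, delta), min(ref_dur, delta + src_dur))
    src_window = (ref_window[0] - delta, ref_window[1] - delta)
    return ref_window, src_window
```

With `delta = 12.345` and a 4499.835 s source, `overlap_windows` reproduces the overlap pair shown in the sidecar example (`[12.345, 4512.18]` in the reference, `[0.0, 4499.835]` in the source).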

Sidecar schema (`<input>.sync.json`)

One sidecar per original input, written next to it. Pure JSON, no comments in-file — the field reference below is canonical.

```json
{
  "_about": "Sync metadata for cam_b.MOV. Apply via ffmpeg -itsoffset. See wjs-syncing-multicam SKILL.md for full schema.",
  "schema_version": 1,
  "source": "cam_b.MOV",
  "reference": "cam_a.MOV",
  "delta_seconds": 12.345,
  "drift_slope": 1.8e-5,
  "overlap_in_reference": [12.345, 4512.180],
  "overlap_in_source":    [0.000,   4499.835],
  "verification": {
    "median_residual_ms": 4.2,
    "residual_spread_ms": 11.8,
    "probe_count": 24
  }
}
```

Field reference

| Field | Type | Meaning |
|---|---|---|
| `_about` | string | Human-readable one-liner. Includes a pointer back to this SKILL.md. Always present. |
| `schema_version` | int | Bumped on any breaking change to this schema. Current: `1`. |
| `source` | string | Filename of the original this sidecar describes, relative to the sidecar's directory. Never points to a re-encoded file. |
| `reference` | string | The input whose timeline we're aligned to. The reference's own sidecar lists itself here. |
| `delta_seconds` | float | The source's `t=0` expressed in the reference's timeline. If positive, the source starts after the reference; pass it to ffmpeg as `-itsoffset <delta>`. Can be negative (the source starts before the reference, e.g. an early-rolling camera). |
| `drift_slope` | float | Linear clock-drift slope (dimensionless, ~10⁻⁵). `0.0` means no measurable drift. Downstream applies `atempo = 1 + drift_slope` to the source ONLY for sync-sound / long-form lip-sync; for camera-cut editing, ignore it. |
| `overlap_in_reference` | `[start, end]` (seconds) | The window during which both source and reference have coverage, expressed in the reference's timeline. Use it to trim outputs to mutually valid time ranges. |
| `overlap_in_source` | `[start, end]` (seconds) | The same window expressed in the source's local timeline: `overlap_in_reference[0] - delta_seconds = overlap_in_source[0]`. |
| `verification` | object | Output of running `verify.py`; drives a "did sync converge?" gate. `median_residual_ms` should be a few ms; `residual_spread_ms` > 1 frame at delivery fps means drift correction was needed but skipped. |
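
A consumer can load a sidecar and enforce the field reference's rules in a few lines. This is a hedged sketch, not the loader used by `autoedit.py`: `SyncSidecar`, `parse_sidecar`, and `load_sidecar` are hypothetical names, the 1 ms tolerance is an assumption, and `verification` is treated as optional on the assumption that it may be absent until `verify.py` has run.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Tuple

@dataclass
class SyncSidecar:
    source: Path                          # resolved against the sidecar's directory
    reference: Path
    delta_seconds: float
    drift_slope: float
    overlap_in_reference: Tuple[float, float]
    overlap_in_source: Tuple[float, float]
    verification: Dict[str, Any] = field(default_factory=dict)

def parse_sidecar(data: Dict[str, Any], sidecar_path, tol_s: float = 1e-3) -> SyncSidecar:
    """Validate a decoded sidecar dict and resolve its paths."""
    if data.get("schema_version") != 1:
        raise ValueError(f"unsupported schema_version: {data.get('schema_version')!r}")
    base = Path(sidecar_path).parent      # NOT the consuming process's working directory
    sc = SyncSidecar(
        source=base / data["source"],
        reference=base / data["reference"],
        delta_seconds=float(data["delta_seconds"]),
        drift_slope=float(data["drift_slope"]),
        overlap_in_reference=tuple(map(float, data["overlap_in_reference"])),
        overlap_in_source=tuple(map(float, data["overlap_in_source"])),
        verification=dict(data.get("verification", {})),
    )
    # Documented relation: overlap_in_reference[0] - delta_seconds = overlap_in_source[0]
    expected = sc.overlap_in_reference[0] - sc.delta_seconds
    if abs(expected - sc.overlap_in_source[0]) > tol_s:
        raise ValueError("overlap_in_source[0] disagrees with delta_seconds")
    return sc

def load_sidecar(sidecar_path) -> SyncSidecar:
    """Read <input>.sync.json from disk and parse it."""
    path = Path(sidecar_path)
    return parse_sidecar(json.loads(path.read_text(encoding="utf-8")), path)
```

Splitting parsing from file I/O keeps the validation testable without touching disk.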

How downstream consumes the sidecar

`-itsoffset` is a per-input option in ffmpeg and must appear BEFORE the `-i` it applies to. Always read the source's `delta_seconds` from the sidecar:

```bash
# Play cam_b aligned to cam_a's timeline
ffmpeg -itsoffset "$(jq -r .delta_seconds cam_b.MOV.sync.json)" -i cam_b.MOV \
       -i cam_a.MOV \
       -filter_complex "[0:v][1:v]hstack" out.mp4

# Trim to mutual overlap window (read from cam_b.MOV.sync.json)
# overlap_dur = overlap_in_source[1] - overlap_in_source[0]
ffmpeg -ss <overlap_in_source[0]> -i cam_b.MOV -t <overlap_dur> ...
```


For `wjs-editing-multicam`, the EDL builder in `autoedit.py` ingests every `<input>.sync.json` automatically; you don't compose these flags by hand.
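
If you do need to compose these flags outside `autoedit.py`, a small builder keeps `-itsoffset` in front of its `-i` and resolves paths against the sidecar's directory. A hedged sketch: `aligned_hstack_cmd` and `overlap_trim_prefix` are hypothetical helpers that only assemble argument lists mirroring the two shell commands in this section (they do not run ffmpeg), and the trim prefix deliberately stops where the original command has `...`.

```python
import shlex
from pathlib import Path
from typing import Any, Dict, List

def aligned_hstack_cmd(meta: Dict[str, Any], sidecar_dir, out_path: str) -> List[str]:
    """Source shifted onto the reference timeline, stacked next to the reference."""
    base = Path(sidecar_dir)
    return [
        "ffmpeg",
        # -itsoffset is an input option: it must precede the -i it applies to.
        "-itsoffset", f"{float(meta['delta_seconds']):.6f}", "-i", str(base / meta["source"]),
        "-i", str(base / meta["reference"]),
        "-filter_complex", "[0:v][1:v]hstack",
        out_path,
    ]

def overlap_trim_prefix(meta: Dict[str, Any], sidecar_dir) -> List[str]:
    """Seek the source to the start of the mutual overlap and limit to its length."""
    start, end = (float(v) for v in meta["overlap_in_source"])
    return [
        "ffmpeg",
        "-ss", f"{start:.6f}", "-i", str(Path(sidecar_dir) / meta["source"]),
        "-t", f"{end - start:.6f}",   # overlap_dur; append the remaining output options
    ]

if __name__ == "__main__":
    meta = {"source": "cam_b.MOV", "reference": "cam_a.MOV", "delta_seconds": 12.345,
            "overlap_in_source": [0.0, 4499.835]}
    print(shlex.join(aligned_hstack_cmd(meta, ".", "out.mp4")))
```

Building an argument list (rather than a shell string) avoids quoting bugs with filenames that contain spaces; `shlex.join` is only used to display it.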

Partial-coverage clips

Common case: the main cams cover 75 min, while a Riverside / phone / lavalier recorder covers only the middle 30 min. Run `scripts/sync_partial.py REF.MOV NEW.mp4`, which:

- Cross-correlates the new input against the reference.
- Finds where the new clip's `t=0` sits in the reference timeline (`delta_seconds` may be large, e.g. 1842.5).
- Writes the sidecar, and that's it: no black padding, no audio padding, no re-encode.

`overlap_in_reference` tells consumers exactly when this input has coverage; outside that window, fall back to the main cams.

The `--audio-only` flag is meaningful only as a hint to downstream tools that this source has no video stream; there is no longer an encoding step to skip.
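
The fallback rule reduces to a range check plus the offset. A minimal sketch that ignores drift (which the drift-correction section treats as acceptable for camera-cut editing); `source_time_for` is a hypothetical helper, and treating the window as half-open `[start, end)` is an assumption.

```python
from typing import Optional, Sequence

def source_time_for(ref_t: float, delta_seconds: float,
                    overlap_in_reference: Sequence[float]) -> Optional[float]:
    """Source-local time showing reference time ref_t, or None if this input has no coverage."""
    start, end = overlap_in_reference
    if not (start <= ref_t < end):
        return None                  # outside coverage: fall back to the main cams
    return ref_t - delta_seconds     # t_source = t_reference - delta
```

For a 30-min phone clip whose `delta_seconds` is 1842.5, reference time 2000.0 s maps to 157.5 s in the phone file, and reference time 100.0 s returns `None`.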

When to skip drift correction

For camera-cut editing (the common case), a ±25 ms residual across an hour is below human perception: pass `drift_slope: 0.0` and use only the midpoint `delta_seconds`.

For sync-sound / lip-sync at long durations (>30 min and `verification.residual_spread_ms > 40`), downstream applies `atempo = 1 + drift_slope` to the source. Source files are still not modified; the `atempo` filter runs at consume time.
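
The arithmetic behind this trade-off follows from the linear drift model in the algorithm: with a constant midpoint-canonical offset, the error at time `t` is `slope·(t - midpoint)`, so the worst case sits at either end of the span and equals `slope · span / 2`. A minimal sketch; the helper names are hypothetical, the thresholds are the ones stated above, and the code only builds the `atempo` filter string, leaving the source untouched.

```python
def worst_case_residual_ms(drift_slope: float, span_s: float) -> float:
    """Largest |error| at either end of span_s with a constant midpoint offset."""
    # error(t) = slope * (t - midpoint), maximised at the span's ends: slope * span / 2
    return abs(drift_slope) * span_s / 2.0 * 1000.0

def needs_drift_correction(span_s: float, residual_spread_ms: float) -> bool:
    """Sync-sound rule from this section: longer than 30 min AND spread above 40 ms."""
    return span_s > 30 * 60 and residual_spread_ms > 40.0

def atempo_filter(drift_slope: float) -> str:
    """ffmpeg audio filter string applied at consume time; the file is not modified."""
    return f"atempo={1.0 + drift_slope:.9f}"
```

For instance, a slope of 1.8e-5 over about 4500 s of overlap gives roughly 40.5 ms of error at each end if drift is ignored.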

Verification (always run)

`scripts/verify.py REF.MOV SRC.MOV SRC.sync.json` re-extracts audio from BOTH originals (with `-itsoffset` applied to the source per the sidecar), runs the multi-probe correlation again, and writes the results back into the sidecar's `verification` field.

Pass criteria: `median_residual_ms < 15` and `residual_spread_ms` < 1 frame at delivery fps. On failure, retry with drift correction enabled.
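
The pass gate is easy to automate. A hedged sketch of a check a consumer might run on the `verification` object; `frame_ms` and `sync_passes` are hypothetical helpers, and one frame is taken as `1000 / fps` ms (40 ms at 25 fps, about 41.7 ms at 23.976 fps).

```python
from typing import Any, Mapping

def frame_ms(fps: float) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps

def sync_passes(verification: Mapping[str, Any], delivery_fps: float) -> bool:
    """median_residual_ms < 15 and residual_spread_ms < 1 frame at delivery fps."""
    return (float(verification["median_residual_ms"]) < 15.0
            and float(verification["residual_spread_ms"]) < frame_ms(delivery_fps))
```

The example sidecar's values (4.2 ms median, 11.8 ms spread) pass at 25 fps; a 45 ms spread at the same rate would send you back to rerun with drift correction.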

Common pitfalls

- Raw waveform cross-correlation gives false peaks under low SNR. Always envelope first; this is not a tunable, it's the entire premise.
- `-itsoffset` semantics differ for audio vs video; for sync-correctness it must be placed before that input's `-i`. `ffmpeg -i src -itsoffset X` is wrong; `ffmpeg -itsoffset X -i src` is right.
- Sidecar paths are relative to the sidecar file's directory, not the working directory of the consuming process. Resolve `source` / `reference` against `Path(sidecar).parent`.
- Don't bake `drift_slope` into the sidecar's `delta_seconds`. They're separate fields for a reason: naive consumers can ignore drift, and sync-sound consumers can apply it. Mixing them loses information.
