wjs-syncing-multicam
Use audio cross-correlation to compute a single time offset for each recording in a multi-source capture of the same event, and emit a `.sync.json` sidecar next to each original. Originals are never modified, copied, or re-encoded. Downstream tools use `-itsoffset` to apply the offset at consume time.
Design principle: sidecar over re-encode
Earlier versions of this skill produced `*_synced.MOV` files by trimming and re-encoding to bake the offset into the file. We removed that:
- Disk: a 75-min 4K shoot from 3 cameras is 60+ GB. Re-encoded synced copies double that for no information gain.
- Quality: every re-encode is lossy. The originals are the source of truth; sidecars are reversible metadata.
- Speed: `_synced.MOV` generation took 10+ min per file on Apple Silicon; sidecar emission takes seconds.
- Composability: any downstream tool (autoedit.py, NLE import, ffmpeg one-liners) reads the sidecar and applies the offset itself. No tool-specific file format lock-in.
When NOT to use
- Single-camera footage — nothing to sync to. For splitting one source into clips, use video-segmentation.
- Sources already aligned in an NLE timeline — don't fight the editor.
- For the auto-edit / cut / PiP rendering step that comes AFTER sync, use wjs-editing-multicam (consumes these sidecars).
Why envelope-based, not raw waveform
Raw PCM cross-correlation gives weak peaks and false matches when the two mics have different gain / room response — i.e., almost always with a secondary cam. The log-energy envelope captures dialogue and music dynamics, which both mics hear regardless of frequency response. Don't skip the envelope step — it's the entire reason this skill is robust at low SNR.
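A minimal NumPy/SciPy sketch of the envelope step, assuming the input has already been decoded to a mono float array at 8 kHz (decoding is not shown). `log_energy_envelope` is a hypothetical name, not the function in the skill's scripts. The log turns a constant gain ratio between the two mics into an additive offset, which the 0.05 Hz high-pass then removes.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def log_energy_envelope(pcm, sr=8000, hop_s=0.010, win_s=0.050, cutoff_hz=0.05):
    """High-passed log-energy envelope sampled at 1/hop_s Hz (100 Hz by default).

    Hypothetical helper: assumes `pcm` is mono, already decoded, sampled at `sr` Hz,
    and at least one window long.
    """
    x = np.asarray(pcm, dtype=np.float64)
    hop = int(round(sr * hop_s))  # 80 samples at 8 kHz
    win = int(round(sr * win_s))  # 400 samples at 8 kHz
    starts = np.arange(0, len(x) - win + 1, hop)
    # Mean energy per 50 ms window via a cumulative sum (no per-frame copies)
    csum = np.concatenate(([0.0], np.cumsum(x * x)))
    energy = np.maximum((csum[starts + win] - csum[starts]) / win, 0.0)
    # Log: a constant gain ratio between mics becomes a constant additive offset
    env = np.log(energy + 1e-10)
    # 2nd-order Butterworth high-pass at 0.05 Hz, run forward and backward (zero phase)
    b, a = butter(2, cutoff_hz, btype="highpass", fs=1.0 / hop_s)
    return filtfilt(b, a, env)
```

Run it on both inputs and cross-correlate the two envelopes; because `filtfilt` is zero-phase, the high-pass does not shift either envelope in time.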
Algorithm
1. Extract mono PCM at 8 kHz, 16-bit from each input.
2. Compute a log-energy envelope at 100 Hz (10 ms hop, 50 ms window). High-pass it with a 2nd-order Butterworth, 0.05 Hz cutoff, applied with `filtfilt`; this removes slow drift and gain offsets.
3. FFT cross-correlate the envelopes end-to-end → coarse offset (~10 ms).
4. Refine at sample level with a 60 s probe from B near the coarse-aligned position in A, a ±2 s search window, and parabolic peak interpolation.
5. Multi-probe drift check: repeat step 4 every ~3 min. A linear fit `delta(t) = slope·t + intercept` reveals real clock drift (5–50 ppm typical). Use the midpoint-canonical offset (`slope · midpoint + intercept`) so residual error is symmetric around zero.
6. Compute the overlap window in the reference timeline: `overlap = [max(0, delta), min(ref_dur, delta + src_dur)]`.
7. Emit a `.sync.json` sidecar next to each non-reference input. No file is copied, trimmed, or re-encoded. The reference input gets a sidecar too (with `delta_seconds: 0`) so downstream code can treat all inputs uniformly.
Implementation: `scripts/sync.py`.
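The cross-correlation in steps 3 and 4 can be sketched with NumPy as below. This is an illustration under stated assumptions, not the code in `scripts/sync.py`: `xcorr_offset` is a hypothetical name, it searches all lags instead of restricting to the ±2 s window of step 4, and it returns offsets with the `delta_seconds` sign convention (positive means the source started after the reference).

```python
import numpy as np


def xcorr_offset(ref, src, rate):
    """Seconds at which src's t=0 sits in ref's timeline (hypothetical helper).

    Positive = src started after ref. Full-lag search; a refinement pass would
    restrict lags to the +/-2 s window around the coarse estimate.
    """
    ref = np.asarray(ref, dtype=np.float64) - np.mean(ref)
    src = np.asarray(src, dtype=np.float64) - np.mean(src)
    n = len(ref) + len(src) - 1
    nfft = 1 << (n - 1).bit_length()  # power of two >= n, so no circular wrap-around
    corr = np.fft.irfft(np.fft.rfft(ref, nfft) * np.conj(np.fft.rfft(src, nfft)), nfft)
    # corr[k] = sum_m ref[m + k] * src[m]; negative lags live at the end of the buffer
    corr = np.concatenate((corr[nfft - (len(src) - 1):], corr[:len(ref)]))
    lags = np.arange(-(len(src) - 1), len(ref))
    k = int(np.argmax(corr))
    frac = 0.0
    if 0 < k < len(corr) - 1:
        # Parabolic interpolation through the peak and its two neighbours
        y0, y1, y2 = corr[k - 1], corr[k], corr[k + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0.0:
            frac = 0.5 * (y0 - y2) / denom
    return (lags[k] + frac) / rate
```

For the coarse pass, call it on the two envelopes with `rate=100`. For a refinement probe, pass a slice of A's PCM (starting at `slice_start`, covering the probe plus ±2 s) as `ref` and the 60 s probe from B (starting at `probe_start` in B) as `src` with `rate=8000`; then `delta = slice_start + result - probe_start`.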
Sidecar schema (`<input>.sync.json`)
One sidecar per original input, written next to it. Pure JSON with no in-file comments; the field reference below is canonical.
```json
{
  "_about": "Sync metadata for cam_b.MOV. Apply via ffmpeg -itsoffset. See wjs-syncing-multicam SKILL.md for full schema.",
  "schema_version": 1,
  "source": "cam_b.MOV",
  "reference": "cam_a.MOV",
  "delta_seconds": 12.345,
  "drift_slope": 1.8e-5,
  "overlap_in_reference": [12.345, 4512.180],
  "overlap_in_source": [0.000, 4499.835],
  "verification": {
    "median_residual_ms": 4.2,
    "residual_spread_ms": 11.8,
    "probe_count": 24
  }
}
```
Field reference
| Field | Type | Meaning |
|---|---|---|
| `_about` | string | Human-readable one-liner. Includes a pointer back to this SKILL.md. Always present. |
| `schema_version` | int | Bumps on any breaking change to this schema. Current: `1`. |
| `source` | string | Filename of the original this sidecar describes, relative to the sidecar's directory. Never points to a re-encoded file. |
| `reference` | string | The input whose timeline we're aligned to. The reference's own sidecar lists itself here. |
| `delta_seconds` | float | The source's `t=0` position in the reference timeline, in seconds (midpoint-canonical). This is the value passed to `-itsoffset`. |
| `drift_slope` | float | Linear clock-drift slope (dimensionless, ~10⁻⁵). |
| `overlap_in_reference` | `[float, float]` | The window during which both source and reference have coverage, expressed in the reference's timeline. Use this to trim outputs to mutually valid time ranges. |
| `overlap_in_source` | `[float, float]` | The same window expressed in the source's local timeline. |
| `verification` | object | Output of running verify.py; drives a "did sync converge?" gate. |
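One way to derive these fields from per-probe offsets, sketched under two assumptions the schema does not pin down: probe positions are measured in the source timeline, and "midpoint" means the midpoint of the probed span. `build_sidecar` and `write_sidecar` are hypothetical helpers; `verification` is omitted because it comes from verify.py.

```python
import json
from pathlib import Path

import numpy as np


def build_sidecar(source, reference, probe_times, probe_deltas, src_dur, ref_dur):
    """Assemble schema_version 1 fields from per-probe offsets (hypothetical helper)."""
    t = np.asarray(probe_times, dtype=np.float64)
    d = np.asarray(probe_deltas, dtype=np.float64)
    if len(t) > 1:
        # Linear fit delta(t) = slope * t + intercept; polyfit returns [slope, intercept]
        slope, intercept = np.polyfit(t, d, 1)
    else:
        slope, intercept = 0.0, d[0]
    midpoint = 0.5 * (t.min() + t.max())
    delta = float(slope * midpoint + intercept)  # midpoint-canonical offset
    ov_ref = [max(0.0, delta), min(ref_dur, delta + src_dur)]
    return {
        "_about": f"Sync metadata for {source}. Apply via ffmpeg -itsoffset. "
                  "See wjs-syncing-multicam SKILL.md for full schema.",
        "schema_version": 1,
        "source": source,
        "reference": reference,
        "delta_seconds": delta,
        "drift_slope": float(slope),  # kept separate from delta_seconds on purpose
        "overlap_in_reference": ov_ref,
        "overlap_in_source": [ov_ref[0] - delta, ov_ref[1] - delta],
    }


def write_sidecar(src_path, sidecar):
    """Write <input>.sync.json next to the original; the original is untouched."""
    out = Path(str(src_path) + ".sync.json")
    out.write_text(json.dumps(sidecar, indent=2) + "\n")
    return out
```

With constant probe offsets of 12.345 s, a 4499.835 s source and a 4600 s reference, this reproduces the overlap windows of the example sidecar.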
How downstream consumes the sidecar
Pass `delta_seconds` to `-itsoffset`, placed immediately before the source's `-i`:
```bash
# Play cam_b aligned to cam_a's timeline
ffmpeg -itsoffset "$(jq -r .delta_seconds cam_b.MOV.sync.json)" -i cam_b.MOV \
       -i cam_a.MOV \
       -filter_complex "[0:v][1:v]hstack" out.mp4

# Trim to mutual overlap window (read from cam_b.MOV.sync.json)
ffmpeg -ss <overlap_in_source[0]> -i cam_b.MOV -t <overlap_dur> ...
```
For `wjs-editing-multicam`, the EDL builder in `autoedit.py` ingests every `<input>.sync.json` automatically; you don't compose these flags by hand.
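If the consumer is Python rather than a shell, the same invocations can be assembled as argument lists. A minimal sketch: `aligned_hstack_cmd` and `overlap_trim_cmd` are hypothetical helpers, and the sidecar is assumed to sit next to the source as `<input>.sync.json`. Building an argv list keeps `-itsoffset` in front of the source's `-i` and sidesteps shell quoting.

```python
import json
from pathlib import Path


def aligned_hstack_cmd(src, ref, out="out.mp4"):
    """argv for the side-by-side example: src shifted onto ref's timeline.

    Hypothetical helper; assumes the sidecar sits next to src as <src>.sync.json.
    """
    meta = json.loads(Path(str(src) + ".sync.json").read_text())
    return [
        "ffmpeg",
        "-itsoffset", f"{meta['delta_seconds']:.6f}",  # input option: must precede src's -i
        "-i", str(src),
        "-i", str(ref),
        "-filter_complex", "[0:v][1:v]hstack",
        out,
    ]


def overlap_trim_cmd(src, meta):
    """argv prefix trimming src to its mutual-overlap window (output args left to caller)."""
    start, end = meta["overlap_in_source"]
    return ["ffmpeg", "-ss", f"{start:.6f}", "-i", str(src), "-t", f"{end - start:.6f}"]
```

Usage: `subprocess.run(aligned_hstack_cmd("cam_b.MOV", "cam_a.MOV"), check=True)`.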
Partial-coverage clips
Common case: main cams cover 75 min, while a Riverside / phone / lavalier recorder covers only the middle 30 min. Run `scripts/sync_partial.py REF.MOV NEW.mp4`, which:
- Cross-correlates the new input against the reference.
- Finds where the new clip's `t=0` sits in the reference timeline (`delta_seconds` may be large, e.g. 1842.5).
- Writes the sidecar, and that's it. No black padding, no audio padding, no re-encode. `overlap_in_reference` tells consumers exactly when this input has coverage; outside that window, fall back to the main cams.
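A minimal sketch of the fallback rule, assuming sidecars have already been parsed into dicts (for example with `json.load`). `covering_inputs` and `to_source_time` are hypothetical helpers, and the time conversion ignores `drift_slope`, which matches the camera-cut default.

```python
def covering_inputs(sidecars, t_ref):
    """Sources whose overlap_in_reference contains reference time t_ref (seconds)."""
    hits = []
    for sc in sidecars:
        start, end = sc["overlap_in_reference"]
        if start <= t_ref <= end:
            hits.append(sc["source"])
    return hits


def to_source_time(sidecar, t_ref):
    """Map a reference-timeline time to the source's local time (drift ignored)."""
    return t_ref - sidecar["delta_seconds"]
```

For a phone clip with `delta_seconds` of 1842.5, reference time 2000 s maps to 157.5 s into the phone file; at reference time 100 s only the main cams are returned.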
When to skip drift correction
For camera-cut editing (the common case), a ±25 ms residual across an hour is below human perception: pass `drift_slope: 0.0` and use only the midpoint `delta_seconds`.
For sync-sound / lip-sync at long durations (>30 min and `verification.residual_spread_ms > 40`), downstream applies `atempo = 1 + drift_slope` to the source. Source files are still not modified; the `atempo` filter runs at consume time.
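The decision rule above, and the error budget it trades off, can be sketched as follows. `drift_filter` and `midpoint_error_ms` are hypothetical helpers; the thresholds are the ones stated in this section and the `atempo` value follows the document's `1 + drift_slope` convention. `midpoint_error_ms` assumes the probes span the whole clip, so the residual at either end is `slope · duration / 2`.

```python
def midpoint_error_ms(drift_slope, duration_s):
    """Worst-case residual at either end when only the midpoint delta is used."""
    return abs(drift_slope) * duration_s / 2.0 * 1000.0


def drift_filter(sidecar, duration_s):
    """ffmpeg audio filter for sync-sound use, or None when the midpoint delta suffices."""
    spread = sidecar.get("verification", {}).get("residual_spread_ms", 0.0)
    if duration_s > 30 * 60 and spread > 40:
        return f"atempo={1.0 + sidecar['drift_slope']:.9f}"
    return None  # camera-cut editing: ignore drift, use delta_seconds alone
```

With the example sidecar's `drift_slope` of 1.8e-5, an hour-long clip sits about ±32.4 ms off at its ends when only the midpoint offset is used. `atempo` is an audio filter, so this covers the sync-sound case only.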
Verification (always run)
Run `scripts/verify.py REF.MOV SRC.MOV SRC.sync.json`. It checks alignment with the sidecar offset applied via `-itsoffset`, and its output fills the sidecar's `verification` object.
Pass criteria: `median_residual_ms < 15` and `residual_spread_ms` < 1 frame at delivery fps. Fail = retry with drift correction enabled.
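A minimal sketch of the pass/fail gate, reading the `verification` object from a parsed sidecar. `sync_passes` is a hypothetical helper, and "1 frame" is taken as `1000 / fps` milliseconds.

```python
def sync_passes(sidecar, delivery_fps):
    """Apply the documented pass criteria to a parsed sidecar."""
    v = sidecar["verification"]
    frame_ms = 1000.0 / delivery_fps  # one frame at the delivery frame rate
    return v["median_residual_ms"] < 15.0 and v["residual_spread_ms"] < frame_ms
```

The example sidecar (median 4.2 ms, spread 11.8 ms) passes for 25 or 30 fps deliveries, where one frame is 40 ms or about 33.3 ms.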
Common pitfalls
- Raw waveform cross-correlation gives false peaks under low SNR. Always envelope first — this is not a tunable, it's the entire premise.
- `-itsoffset` semantics differ for audio vs video; for sync-correctness it must be the FIRST flag for that input. `ffmpeg -i src -itsoffset X` is wrong; `ffmpeg -itsoffset X -i src` is right.
- Sidecar paths must be relative to the sidecar file's directory, not the working directory of the consuming process. Resolve `source` / `reference` against `Path(sidecar).parent`.
- Don't bake `drift_slope` into the sidecar's `delta_seconds`. They're separate fields for a reason: naive consumers can ignore drift, sync-sound consumers can apply it. Mixing them loses information.
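The path-resolution pitfall, sketched with `pathlib`. `load_sidecar` is a hypothetical helper, and the added `source_path` / `reference_path` keys are illustrative, not part of the schema.

```python
import json
from pathlib import Path


def load_sidecar(sidecar_path):
    """Parse a sidecar and resolve source/reference against the sidecar's own directory.

    Hypothetical helper; source_path / reference_path are illustrative extra keys.
    """
    sidecar_path = Path(sidecar_path)
    meta = json.loads(sidecar_path.read_text())
    base = sidecar_path.parent  # not Path.cwd(): the consumer may run from anywhere
    meta["source_path"] = base / meta["source"]
    meta["reference_path"] = base / meta["reference"]
    return meta
```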