wjs-burning-subtitles
Video + SRT → video with subtitles. Also the final-encode stage for the localization pipeline: takes a video, an optional dub track from `/wjs-dubbing-video`, and an optional SRT to burn, and produces the upload-ready MP4 in one ffmpeg pass, with no cascade of decodes and re-encodes.
When to use
- User has an SRT and wants it always visible on the video (burn-in for 微信视频号 / 抖音 / WeChat, whose players won't honor embedded subtitle tracks).
- User wants a togglable subtitle track (soft-mux) for QuickTime / VLC / IINA / mobile players that support `mov_text`.
- Final composite after `/wjs-dubbing-video`: burn target-language subs + mix dub over original-as-bed in one encode.
When NOT to use
- No SRT yet → run `/wjs-transcribing-audio` then `/wjs-translating-subtitles` first.
- HTML/CSS captions (kinetic, per-word highlights, custom fonts) on a clip composed in HyperFrames → use `/wjs-overlaying-video` instead. Don't mix libass burn-in with HyperFrames captions on the same output.
- The "subtitles" are actually motion graphics (animated callouts, lower-thirds with logos, kinetic typography) → that's `/wjs-overlaying-video`, not this skill.
The 3 modes of render.py
`scripts/render.py` supports three modes:
- Subtitles only: `--video + --srt` → re-encodes video with burned subs, original audio passes through.
- Dub only: `--video + --dub` → keeps original video stream; replaces or mixes the audio track.
- Full localized cut: `--video + --srt + --dub` → burns subs AND mixes dub. By default keeps original audio at low volume as a "bed" under the dub (set `--bed-volume 0` or `--no-original-audio` to drop it).
Burn-in requires an ffmpeg built with libass. If the default ffmpeg lacks libass, the script auto-downloads a static libass-enabled build from evermeet.cx into `/tmp/ff_bin/` on first use.
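The flag combinations above amount to a small decision table. The sketch below is one way to express it, not the actual logic inside `render.py`: `pick_mode` and the mode labels it returns are hypothetical names chosen for illustration, and the `soft_mux` argument stands for the `--soft-mux` switch described in the soft-mux section.

```python
from typing import Optional


def pick_mode(srt: Optional[str], dub: Optional[str], soft_mux: bool = False) -> str:
    """Map a flag combination to a mode label.

    Hypothetical helper: the labels are invented for this sketch and are not
    identifiers taken from render.py.
    """
    if soft_mux:
        if not srt:
            raise ValueError("--soft-mux needs --srt")
        return "soft-mux"   # stream-copy with a togglable mov_text track
    if srt and dub:
        return "full-cut"   # burn subs and mix dub over the original bed
    if srt:
        return "burn"       # re-encode video, original audio passes through
    if dub:
        return "dub-only"   # keep the video stream, replace or mix audio
    raise ValueError("need at least one of --srt or --dub")
```

Only the burn and full-cut branches need a libass-enabled ffmpeg; the other two can run on any build.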
Soft-mux (togglable subtitle track)
Player apps can show or hide the track. Works with any `ffmpeg` build; does not need libass:
```bash
ffmpeg -i input.mp4 -i input.zh-CN.srt \
  -map 0:v -map 0:a -map 1:0 \
  -c:v copy -c:a copy -c:s mov_text \
  -metadata:s:s:0 language=zho -metadata:s:s:0 title="中文" \
  output.mp4
```
This is fast (stream-copy) and reversible. Use it when:
- Target platform supports embedded subs (YouTube auto-detects; VLC and QuickTime honor them).
- User wants viewers to be able to toggle subtitles off.
- You don't want to re-encode the video.
Running `render.py --video IN.mp4 --srt SUB.srt --soft-mux` performs this flow.
Hardcoded burn-in (always visible, libass)
Required for WeChat/抖音/朋友圈 etc. where the player will not honor embedded subtitle tracks.
Verify libass is available BEFORE promising burn-in
```bash
ffmpeg -filters 2>&1 | grep -E "subtitles|^.. ass "
```
If neither `subtitles` nor `ass` shows up, the build lacks libass. Homebrew's default `ffmpeg` formula is often stripped (no `--enable-libass`, no `--enable-libfreetype`, no `drawtext`). Don't waste time fighting the comma-escaping inside `force_style`; it will fail with `No such filter: 'subtitles'` no matter how the shell quotes it.
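The same check can be scripted. This is a minimal sketch, assuming `ffmpeg -filters` prints one filter per row as flags, name, I/O signature, description; the function names `has_libass_filters` and `ffmpeg_has_libass` are hypothetical and not taken from `render.py`.

```python
import subprocess


def has_libass_filters(filters_output: str) -> bool:
    """Return True if the listing contains the `subtitles` or `ass` filter."""
    for line in filters_output.splitlines():
        fields = line.split()
        # Assumed row layout: "<flags> <name> <io> <description...>"
        if len(fields) >= 3 and fields[1] in {"subtitles", "ass"}:
            return True
    return False


def ffmpeg_has_libass(ffmpeg_path: str = "ffmpeg") -> bool:
    """Run `<ffmpeg> -filters` and check the listing for the libass filters."""
    result = subprocess.run(
        [ffmpeg_path, "-hide_banner", "-filters"],
        capture_output=True,
        text=True,
    )
    # Some builds print the banner to stderr, so scan both streams.
    return has_libass_filters(result.stdout + result.stderr)
```

Matching on the filter-name column rather than on any substring avoids false positives from descriptions that merely mention subtitles.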
Fastest fix on macOS: drop in a static build, no system changes
```bash
curl -fsSL -o /tmp/ff.zip https://evermeet.cx/ffmpeg/getrelease/zip
unzip -o /tmp/ff.zip -d /tmp/ff_bin >/dev/null
FF=/tmp/ff_bin/ffmpeg
$FF -version | grep -oE -- "--enable-(libass|libfreetype)"
```
Then use `$FF` instead of `ffmpeg` for the render. The brew binary is fine for everything else (probe, audio extraction, soft-mux). `render.py` does this auto-fallback if its default ffmpeg lacks libass.
Burn-in render with style overrides
🛑 Checkpoint: confirm before the full render. Burn-in re-encodes the entire video (minutes of CPU on a 5-min clip). Before kicking it off:
- Render only the first 30s with `-t 30` for a fast preview.
- Extract a frame from the longest-line cue (see Fontsize calibration below) and Read it.
- Show the user the preview frame + the cue text, ask: "字号/字体/边距 OK 吗?OK 才跑全片。" ("Are the font size, font, and margins OK? The full render only runs once you confirm.") Wait for explicit confirmation.
Skip the checkpoint only if the user has already approved a full render of this exact video at this exact font config in the same conversation.
```bash
$FF -i input.mp4 \
  -vf "subtitles=input.zh-CN.srt:force_style='Fontname=PingFang SC\,Fontsize=12\,PrimaryColour=&H00FFFFFF\,OutlineColour=&H00000000\,BorderStyle=1\,Outline=2\,Shadow=1\,MarginL=20\,MarginR=20\,MarginV=40'" \
  -c:v libx264 -crf 18 -preset medium -pix_fmt yuv420p \
  -c:a copy output.mp4
```
Inside `force_style`, escape every comma as `\,` (the filter graph parser eats the bare comma as a chain separator). The other characters used in these style values need no escaping.
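Hand-writing the escaped commas is error-prone, so the filter argument can be assembled from a dictionary instead. A minimal sketch: `build_subtitles_filter` is a hypothetical helper, and it assumes the SRT path and style values contain no characters that need filter-graph escaping themselves (colons, quotes, backslashes).

```python
def build_subtitles_filter(srt_path: str, style: dict) -> str:
    """Build a `subtitles=` filter argument with escaped commas in force_style.

    Assumes srt_path and the style values need no escaping of their own.
    """
    # Join Key=Value pairs with backslash-comma so the filter graph parser
    # does not read the commas as chain separators.
    force_style = r"\,".join(f"{key}={value}" for key, value in style.items())
    return f"subtitles={srt_path}:force_style='{force_style}'"
```

Passing the result as one element of an argument list (for example to `subprocess.run([ff, "-i", video, "-vf", vf, ...])`) sidesteps shell quoting, leaving only the filter-level escaping to get right.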
Fontsize calibration: critical
libass scales its internal PlayRes up to the actual video resolution, so the `Fontsize` number you pass is not pixels in the output. As a starting calibration on a 544×960 vertical phone video, `Fontsize=22` rendered each Chinese character at ~55px wide and overflowed the frame, while `Fontsize=12` rendered at ~30–35px wide and fit cleanly with 15-char lines.
Rule of thumb: start at `Fontsize=12`, render, then always extract a frame and look:
```bash
$FF -ss 30 -i output.mp4 -frames:v 1 /tmp/frame.png -y
# then Read /tmp/frame.png to verify the longest-line cue fits
```
Pick a timestamp that lands on the cue with the most characters per line; short lines won't expose overflow. Add `MarginL=20` and `MarginR=20` as a safety inset; never trust default left/right margins.
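Finding that timestamp by eye is tedious on long files. The sketch below scans an SRT and returns the midpoint of the cue holding the longest single line. `longest_line_cue` is a hypothetical helper, and it measures length in characters, which is a simplification because CJK and Latin glyphs differ in width.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def longest_line_cue(srt_text: str):
    """Return (midpoint_seconds, line) for the cue containing the longest line."""
    best = (0.0, "")
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Locate the "start --> end" timing line; skip malformed blocks.
        timing = next((i for i, text in enumerate(lines) if "-->" in text), None)
        if timing is None:
            continue
        stamps = TIMESTAMP.findall(lines[timing])
        if len(stamps) < 2:
            continue
        start, end = _seconds(*stamps[0]), _seconds(*stamps[1])
        for text in lines[timing + 1:]:
            if len(text) > len(best[1]):
                best = ((start + end) / 2, text)
    return best
```

The returned seconds value can replace the `30` in `-ss 30` when extracting the check frame.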
Style cheatsheet
Keys that matter (libass `force_style`):
- `Fontname=PingFang SC`: macOS default CJK; alternates: `Songti SC`, `Heiti SC`, `STHeiti`, `Hiragino Sans GB`.
- `Fontsize=12`: start small, scale up only after frame check.
- `PrimaryColour=&H00FFFFFF`: white text (`&HAABBGGRR`: alpha first, then blue, green, red; alpha `00` is opaque).
- `OutlineColour=&H00000000`: black outline.
- `BorderStyle=1`: outline only (clean over varied backgrounds). Use `BorderStyle=3` for an opaque box behind text when the background is busy.
- `Outline=2`: outline thickness of 2 (script units that scale with the video like `Fontsize`, so not exactly 2 output pixels).
- `Shadow=1`: subtle drop shadow.
- `MarginL=20 MarginR=20`: keep text inside the frame.
- `MarginV=40`: vertical distance from the bottom edge.
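The reversed byte order in the colour values is easy to get wrong by hand. As an illustration, this sketch converts a familiar `RRGGBB` hex colour plus an opacity into the `&HAABBGGRR` form; `ass_colour` is a hypothetical helper, not a flag or function of `render.py`.

```python
def ass_colour(rgb_hex: str, opacity: float = 1.0) -> str:
    """Convert 'RRGGBB' plus opacity (1.0 = solid) to an ASS '&HAABBGGRR' value."""
    rgb_hex = rgb_hex.lstrip("#")
    if len(rgb_hex) != 6:
        raise ValueError("expected RRGGBB")
    int(rgb_hex, 16)  # raises ValueError on non-hex input
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    rr, gg, bb = rgb_hex[0:2], rgb_hex[2:4], rgb_hex[4:6]
    # ASS alpha is inverted: 00 is fully opaque, FF is fully transparent.
    alpha = round((1.0 - opacity) * 255)
    return f"&H{alpha:02X}{bb}{gg}{rr}".upper()
```

For example, `ass_colour("FFFFFF")` gives the white used for `PrimaryColour` above, and pure red `FF0000` becomes `&H000000FF` because the red byte moves to the end.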
SRT line-length discipline for burn-in
Even with correct `Fontsize`, lines that are too long will wrap or overflow. Keep each on-screen line ≤ ~15 Chinese characters (~42 Latin chars). Use explicit `\n` line breaks inside the SRT block; do not rely on auto-wrapping. Two short lines beat one long one every time. (This is upstream discipline: `/wjs-translating-subtitles` should already cap cues at these limits.)
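A quick lint pass can catch overlong lines before any render time is spent. The sketch below uses the two limits from the paragraph above and assumes, for mixed-script lines, that each wide (CJK) character costs 1/15 of the line and every other character costs 1/42. That weighting and the names `line_too_long` and `overlong_lines` are assumptions of this example.

```python
import unicodedata

MAX_CJK = 15    # limit for Chinese characters per line, from the text above
MAX_LATIN = 42  # limit for Latin characters per line, from the text above


def line_too_long(line: str) -> bool:
    """True if a subtitle line exceeds the on-screen budget."""
    wide = sum(1 for ch in line if unicodedata.east_asian_width(ch) in ("W", "F"))
    narrow = len(line) - wide
    # Integer form of wide/15 + narrow/42 > 1, which avoids float rounding.
    return wide * MAX_LATIN + narrow * MAX_CJK > MAX_CJK * MAX_LATIN


def overlong_lines(srt_text: str) -> list:
    """Return the text lines of an SRT that exceed the budget."""
    flagged = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Skip blanks, cue indices, and timing lines. A caption made only of
        # digits would also be skipped; acceptable for a lint sketch.
        if not stripped or stripped.isdigit() or "-->" in stripped:
            continue
        if line_too_long(stripped):
            flagged.append(stripped)
    return flagged
```

Any line this flags is a candidate for a manual break into two shorter lines.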
Audio mixing: keep the original as a low-volume bed
A pure dub-only track sounds dubbed (because it is). Mixing the original audio at low volume under the dub gives the "professional translation" feel: you still hear the speaker's breath, emphasis, and laughter, just under the new voice.
```bash
$FF -i original.mp4 -i dub.mp4 \
  -filter_complex "[0:a]volume=0.18[orig];\
[1:a]volume=1.0[dub];\
[orig][dub]amix=inputs=2:duration=longest:normalize=0[a]" \
  -map 0:v -map "[a]" \
  -c:v copy -c:a aac -b:a 192k mixed.mp4
```
Reasonable starting volumes:
- Original bed at `0.15`–`0.25` (≈ −16 to −12 dB)
- Dub at `1.0`
- Use `normalize=0` so amix doesn't auto-attenuate when both are active.
To drop the original entirely: `--no-original-audio` (equivalent to `--bed-volume 0`).
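The decibel figures quoted for the bed follow from the standard amplitude relation dB = 20·log10(gain). A small sketch for converting in either direction when tuning the bed by ear; the function names are illustrative only.

```python
import math


def gain_to_db(gain: float) -> float:
    """Convert a linear volume multiplier to decibels."""
    if gain <= 0:
        return float("-inf")  # silence
    return 20.0 * math.log10(gain)


def db_to_gain(db: float) -> float:
    """Convert decibels back to a linear volume multiplier."""
    return 10.0 ** (db / 20.0)
```

With this, `0.15` works out to about −16.5 dB and `0.25` to about −12.0 dB, while the `0.18` used in the command above sits near −14.9 dB.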
Combining dub + burn-in + bed (the full job)
One ffmpeg call does all three. It burns the target subtitle onto the video stream and mixes the dub over the original bed:
```bash
$FF -i original.mp4 -i dub.mp4 \
  -filter_complex "[0:v]subtitles=input.zh-CN.srt:force_style='Fontname=PingFang SC\,Fontsize=12\,PrimaryColour=&H00FFFFFF\,OutlineColour=&H00000000\,BorderStyle=1\,Outline=2\,Shadow=1\,MarginL=20\,MarginR=20\,MarginV=40'[v];\
[0:a]volume=0.18[orig];[1:a]volume=1.0[dub];\
[orig][dub]amix=inputs=2:duration=longest:normalize=0[a]" \
  -map "[v]" -map "[a]" \
  -c:v libx264 -crf 18 -preset medium -pix_fmt yuv420p \
  -c:a aac -b:a 192k final.mp4
```
This is the "ship to social media" final cut. `render.py --video original.mp4 --dub dub.mp4 --srt input.zh-CN.srt` runs this exact pipeline.
Running render.py
```bash
# Subtitles only (burn):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
  --video IN.mp4 --srt SUB.srt --out OUT.mp4

# Dub only (replace audio, no subs):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
  --video IN.mp4 --dub IN_zh_dub.mp4 --out OUT.mp4

# Full localized cut (burn + dub + original bed):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
  --video IN.mp4 --srt IN.zh-CN.srt --dub IN_zh_dub.mp4 --out OUT.mp4

# Soft-mux (no re-encode):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
  --video IN.mp4 --srt SUB.srt --soft-mux --out OUT.mp4
```
See `render.py --help` for the full style/audio flag list (`--font`, `--fontsize`, `--color`, `--outline-color`, `--margin-v`, `--bed-volume`, `--no-original-audio`).
Output
- Burn mode: `<source>_burned.mp4` (re-encoded, libass-rendered subs)
- Soft-mux mode: `<source>_softsub.mp4` (stream-copy, `mov_text` track)
- Full cut: `<source>_final.mp4` (re-encoded video with burned subs + mixed audio)
Anti-patterns
- ❌ Promising burn-in without verifying libass. Check `ffmpeg -filters | grep subtitles` first; auto-fall back to evermeet static build if missing.
- ❌ Committing a burn render without a frame check. Always extract a frame at the longest-line cue and Read it before kicking off the full render.
- ❌ Bare commas inside `force_style`. The filter graph parser eats them. Escape every internal comma as `\,`.
- ❌ Mixing libass burn-in with HyperFrames captions. Pick ONE caption system per output video. If you're using HTML/CSS captions in `/wjs-overlaying-video`, don't burn here too.
- ❌ Using period milliseconds in the SRT. Whisper local writes `.mmm`; libass tolerates it but other downstream tools choke. Normalize to `,mmm`.
- ❌ Defaulting to `BorderStyle=3` (opaque box). Use `BorderStyle=1` (outline only) unless the background is genuinely busy; the box looks heavy and dated.
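The millisecond-separator fix from the list above can be applied mechanically. This sketch rewrites `.mmm` to `,mmm` on timing lines only, so a caption that happens to contain something like "1.000" is left alone; `normalize_srt_timestamps` is a hypothetical name, not part of `render.py`.

```python
import re

# Matches a whole SRT timing line, accepting "." or "," before the milliseconds.
TIMING_LINE = re.compile(
    r"^(\d{1,2}:\d{2}:\d{2})[.,](\d{3})(\s*-->\s*)(\d{1,2}:\d{2}:\d{2})[.,](\d{3})(.*)$",
    re.MULTILINE,
)


def normalize_srt_timestamps(srt_text: str) -> str:
    """Rewrite HH:MM:SS.mmm to HH:MM:SS,mmm on timing lines only."""
    return TIMING_LINE.sub(r"\1,\2\3\4,\5\6", srt_text)
```

Running it on an already-normalized file is a no-op, so it is safe to apply unconditionally before burn-in or soft-mux.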
Upstream
- `/wjs-transcribing-audio` + `/wjs-translating-subtitles`: produce the SRT input.
- `/wjs-dubbing-video`: produces the `*_<lang>_dub.mp4` input for full-localized-cut mode. The dub-only file is technically a finished video; this skill is what mixes the original underneath and burns the subs to make it shippable.
Common pitfalls
- Fontsize that worked on one video looks tiny / huge on another. libass scales by PlayRes ratio, not pixels. Recalibrate per video resolution; don't trust a hardcoded value.
- Margin defaults clip text on vertical phone videos. Always set `MarginL=20 MarginR=20` and `MarginV=40` (or higher) explicitly.
- `mov_text` track shows up in QuickTime but not in some Android players. If the target audience is mobile users in China, soft-mux is unreliable; burn instead.
- Busy background / contrast issues. Increase `Outline=2` → `Outline=3`, or switch to `BorderStyle=3` for a translucent box (`BackColour=&H80000000` for 50% black).