wjs-translating-subtitles
Source-language SRT in → target-language (or bilingual) SRT out. This skill is text-only. Burn-in lives in `/wjs-burning-subtitles`; voice dub in `/wjs-dubbing-video`.
When to use
- User has an SRT in language A and wants it in language B.
- User pasted a transcript (with or without timestamps) and wants a translation that becomes an SRT.
- User has an SRT but cues end mid-sentence — this skill's re-segmentation step fixes that.
When NOT to use
- No source-language SRT yet → run `/wjs-transcribing-audio` first.
- User wants burned-in subtitles → finish translation here, then `/wjs-burning-subtitles`.
- User wants voice dub → finish translation here, then `/wjs-dubbing-video`.
Pick the target
Resolve the target from the user's phrasing once; don't re-ask:
- "翻成中文 / 中文字幕 / 中文配音" → `zh-CN`.
- "translate to English / English subs / English dub" → `en`.
- "bilingual" / "双语" → produce both `.<source>.srt` and `.<target>.srt` (and optionally a combined `.<source>-<target>.srt`).
- Ambiguous → default to whichever the user has historically chosen in the project.
Simplified Chinese and English are fully validated. Other targets (Japanese, Korean, French, etc.) work via the same rules; the bottleneck is TTS-voice availability if dubbing follows, so check `/wjs-dubbing-video` before promising.
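The phrase-to-target mapping above can be sketched as a small lookup. This is a minimal illustration, not part of the skill itself: the function name `resolve_target`, the `project_default` parameter, and the keyword patterns are all assumptions chosen to match the example phrases listed above.

```python
import re
from typing import Optional

# Hypothetical keyword rules, checked in order; "bilingual" wins over a single language.
_RULES = [
    ("bilingual", re.compile(r"bilingual|双语", re.IGNORECASE)),
    ("zh-CN", re.compile(r"中文")),
    ("en", re.compile(r"english", re.IGNORECASE)),
]


def resolve_target(request: str, project_default: Optional[str] = None) -> Optional[str]:
    """Return 'bilingual', 'zh-CN', or 'en'; fall back to the project default if ambiguous."""
    for target, pattern in _RULES:
        if pattern.search(request):
            return target
    return project_default
```

Real requests are messier than keyword matching can handle, so treat this only as a picture of the decision order: explicit phrasing first, project history second.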
Shared translation principles
- Prioritize meaning over literal wording.
- Use concise subtitle-style language. Viewers read at roughly 3 characters per second for Chinese and 3–4 words per second for English; lines that exceed that go off-screen before they can be read.
- Preserve the tone of the speaker. Casual source → casual target; formal source → formal target.
- Do not over-translate names, brands, cultural references, or technical terms.
- Keep numbers, dates, names, and places accurate.
- If a phrase has no exact equivalent, translate the meaning naturally. No literal/word-for-word constructions.
- Avoid stiff, machine-translated output.
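The reading-speed guideline above can be turned into a rough check. The sketch below assumes 3 characters per second for Chinese and 4 words per second for English (the upper end of the stated range); the function name `is_readable` and the counting method are illustrative assumptions.

```python
def is_readable(text: str, duration_s: float, lang: str) -> bool:
    """Rough check that a cue can be read within its on-screen duration."""
    if lang == "zh-CN":
        # Count CJK ideographs only; punctuation is not counted as reading load.
        units = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
        rate = 3.0  # assumed characters per second
    else:
        units = len(text.split())
        rate = 4.0  # assumed words per second
    return units <= duration_s * rate
```

A cue that fails this check is a candidate for tightening the translation or for splitting, not for silently dropping meaning.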
Translating into Simplified Chinese (zh-CN)
- Use natural spoken Mandarin for casual speech, formal Mandarin for formal speech.
- Use Simplified characters only (do NOT use Traditional Hanzi unless the user explicitly asks).
- Subtitle lines should be roughly 15 Chinese characters or fewer per line, max 2 lines per cue (3 only when unavoidable for very long cues).
- Use Chinese punctuation: 「,」「。」「;」「:」「、」「——」. Never mix English commas/periods into Chinese subtitles.
- Minimize filler demonstratives 「这」「那」「这个」「那个」「那份」「那种」「那里」「那样」. Spanish-to-Chinese (and English-to-Chinese) MT routinely inserts these because the source has overt demonstratives that Chinese usually drops. Examples:
  - "这把我们带入二元世界的载体" → "把我们带入二元的载体"
  - "运用那份能量" → "运用这股能量" if needed, or just "运用能量"
  - "正是在这合一里" → "正是在合一中"
  - "像罪人那样翻滚" → "像罪人翻滚" / "像罪人般翻滚"
  - "那份精微的觉知" → "精微的觉知"
  Keep them only when they carry real meaning (deixis, contrast, or a fixed phrase such as the spiritual "我就是那" / "tat tvam asi"). Default is to delete; add back only if the sentence becomes ambiguous.
Examples (Spanish → Chinese):
```text
Spanish: No pasa nada. → Chinese: 没关系。
Spanish: Vamos a ver qué pasa. → Chinese: 我们看看会发生什么。
Spanish: Me parece una locura. → Chinese: 我觉得这太疯狂了。
Spanish: ¿Qué quieres decir? → Chinese: 你是什么意思?
Spanish: La verdad es que no lo esperaba.
→ Chinese: 说实话,我没想到会这样。
```
Translating into English (en)
- Use natural conversational English. Avoid translationese ("It is precisely through entering the body…" → "It's by entering the body…").
- Lines should be roughly 40–42 characters or fewer (about 7–9 words), max 2 lines per cue. Hard cap 50 chars per line.
- Use ASCII punctuation `,` `.` `;` `:` plus `—` (em-dash, which is not itself ASCII). Avoid Unicode curly quotes; this keeps the `.srt` portable.
- For contemplative/spiritual content, prefer plain words over Latinate jargon: "presence" over "manifestation," "wholeness" over "totality," "wake up" over "awaken to consciousness."
Examples (Spanish → English):
```text
Spanish: No pasa nada. → English: It's nothing.
Spanish: Vamos a ver qué pasa. → English: Let's see what happens.
Spanish: Me parece una locura. → English: This feels crazy to me.
Spanish: ¿Qué quieres decir? → English: What do you mean?
Spanish: La verdad es que no lo esperaba.
→ English: Honestly, I wasn't expecting this.
```
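The English line limits can be applied with Python's standard `textwrap` module. The sketch below tries the preferred 42-character width first and falls back to the 50-character hard cap; the function name `wrap_en_cue` and the fallback order are assumptions, not something the skill prescribes.

```python
import textwrap
from typing import List

TARGET_WIDTH = 42  # preferred line length
HARD_CAP = 50      # absolute per-line limit
MAX_LINES = 2


def wrap_en_cue(text: str) -> List[str]:
    """Wrap one English cue into at most two lines, or raise if it cannot fit."""
    for width in (TARGET_WIDTH, HARD_CAP):
        lines = textwrap.wrap(text, width=width)
        if len(lines) <= MAX_LINES:
            return lines
    raise ValueError("cue does not fit in two lines; split it into more cues")
```

The `ValueError` is the signal to split the cue at a punctuation mark instead of squeezing more text onto the screen.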
Re-segment at punctuation boundaries (mandatory)
Whisper segments by silence/breath, not grammar. The result almost always has cues that end mid-sentence (e.g., "...es una forma de aterrizar," next cue starts "el espíritu en el cuerpo..."). Any TTS that processes one cue at a time will then insert an unnatural pause exactly where the original speaker did not. The fix is mandatory before dubbing — and improves on-screen reading too.
Punctuation set differs:
- Chinese cues must end at `,` `。` `;` `:` `——` or `、`.
- English cues must end at `,` `.` `;` `:` `—` (em-dash) or, in practice for subtitles, occasionally a single dash. Never end an English cue on a comma-less clause break, and never split inside a phrase like "kind of" or "in order to".
Rules:
- Every cue must end at a real punctuation mark. Never let a cue end on a noun, verb, conjunction, or article that flows into the next cue.
- It is fine (and often necessary) to split a single source cue into 2–4 shorter cues, with timestamps interpolated by character position within the original cue's duration.
- It is fine to merge the tail of one source cue with the head of the next when they form one clause — the merged cue inherits the start of the first and the end of the second.
- Target 3–8 seconds per cue. Cues shorter than ~1.5s feel choppy on screen; cues longer than ~10s usually contain a missed punctuation break.
A typical 2–3 minute talk yields roughly 25–40 punct-bounded cues from 12–18 raw source cues. Don't try to keep the original cue count.
When TTS dubbing follows: the punctuation-bounded structure means each TTS clip is a complete utterance with proper end-intonation, and concatenating clips sounds natural because every join is at a real pause point.
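The splitting rule, with timestamps interpolated by character position, can be sketched as follows. This is an illustration under simplifying assumptions: the `Cue` dataclass and `split_at_punctuation` are hypothetical names, times are plain seconds, and every listed punctuation mark is treated as a break, so decimals such as "3.5" or abbreviations would be split incorrectly. It covers only splitting; merging the tail of one cue with the head of the next, and re-joining pieces shorter than about 1.5 seconds, are left out.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


_P = ",。;:、,.;:—"  # cue-final punctuation for Chinese and English
# A run of text followed by a run of punctuation, or trailing text with none.
_SEGMENT = re.compile(rf"[^{_P}]*[{_P}]+|[^{_P}]+")


def split_at_punctuation(cue: Cue) -> List[Cue]:
    """Split one cue at punctuation, sharing its duration by character count."""
    parts = [p.strip() for p in _SEGMENT.findall(cue.text)]
    parts = [p for p in parts if p]
    total = sum(len(p) for p in parts)
    if total == 0:
        return [cue]
    duration = cue.end - cue.start
    pieces, consumed = [], 0
    for part in parts:
        start = cue.start + duration * consumed / total
        consumed += len(part)
        end = cue.start + duration * consumed / total
        pieces.append(Cue(round(start, 3), round(end, 3), part))
    return pieces
```

For example, a 10-second cue reading "aaaa, bbbb." becomes two 5-second cues, because each half carries five of the ten characters.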
SRT output rules
```text
1
00:00:01,200 --> 00:00:04,800
中文字幕内容

2
00:00:04,800 --> 00:00:08,500
中文字幕内容
```
- Number subtitles sequentially starting from `1`.
- Timestamp format: `HH:MM:SS,mmm`. Comma milliseconds, never period milliseconds.
- Do not overlap timestamps.
- Preserve the original timing unless adjustment is necessary.
- Each subtitle should usually be 1–2 lines.
- If one subtitle is too long, split it into shorter subtitles when timing allows.
- Do not add commentary inside the subtitle file.
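The numbering and timestamp rules above can be sketched as a small serializer. It assumes cues are held as `(start_seconds, end_seconds, text)` tuples; that representation and the names `format_timestamp` and `to_srt` are illustrative choices.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (comma before the milliseconds)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Serialize cues as SRT text, numbered from 1, with a blank line between cues."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    return "\n".join(blocks)
```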
Bilingual output
When the user asks for bilingual output, put the source on the first line and the target on the second:
```text
1
00:00:01,200 --> 00:00:04,800
No pasa nada.
没关系。
```
Rules:
- Keep source first, target second.
- Preserve timing.
- Avoid adding extra explanations unless requested.
- Keep both lines short enough to read.
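Combining two cue lists into bilingual cues is a simple pairing once both share the same segmentation. The sketch assumes `(start_seconds, end_seconds, text)` tuples and identical timing on both sides; `make_bilingual` is a hypothetical helper name.

```python
from typing import List, Tuple

CueTuple = Tuple[float, float, str]


def make_bilingual(source: List[CueTuple], target: List[CueTuple]) -> List[CueTuple]:
    """Pair cues one-to-one: source text on the first line, target on the second."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of cues")
    combined = []
    for (s_start, s_end, s_text), (t_start, t_end, t_text) in zip(source, target):
        if (s_start, s_end) != (t_start, t_end):
            raise ValueError("cue timings differ; re-segment both sides together")
        combined.append((s_start, s_end, f"{s_text}\n{t_text}"))
    return combined
```

Failing loudly on mismatched timing is a deliberate choice here: it catches the case where only one side was re-segmented.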
Output formats
Depending on the user request, provide one or more:
- Target-only `.srt`
- Bilingual `.srt` (source line + target line)
- Target transcript without timestamps
- Side-by-side source/target table
Default output for "translate this SRT" with no other modifiers: target-only `.srt` plus a short uncertainty note if needed.
File naming
```text
input.srt # source (e.g., from /wjs-transcribing-audio)
translated outputs:
input.zh-CN.srt # Simplified Chinese only
input.en.srt # English only
input.es-zh.srt # Spanish + Chinese bilingual
input.es-en.srt # Spanish + English bilingual
input.es-zh-en.srt # three-language
```
BCP-47-style suffixes make the target language obvious at a glance and keep multiple target-language outputs side-by-side.
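The naming pattern above amounts to inserting a language suffix before the `.srt` extension. A minimal sketch using `pathlib`, with `output_name` as a hypothetical helper; the language codes are passed through exactly as given.

```python
from pathlib import Path


def output_name(source: str, *langs: str) -> str:
    """Insert a hyphen-joined language suffix before the file extension."""
    path = Path(source)
    suffix = "-".join(langs)
    return str(path.with_name(f"{path.stem}.{suffix}{path.suffix}"))
```

For instance, `output_name("input.srt", "es", "zh")` gives `input.es-zh.srt`, matching the bilingual row in the listing.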
Handling unclear audio markers
If the source SRT contains `[inaudible]` or `[unclear]`:
- Translate the surrounding context naturally.
- Keep the bracketed marker in the target SRT (don't invent content).
- If an `[unclear]` chunk makes a cue ungrammatical in the target language, leave it bracketed and add a note in the response (not in the SRT file).
Quality gate before handoff
- Subtitle numbers are sequential
- Timestamps are valid (`HH:MM:SS,mmm`, no overlap)
- Milliseconds use commas
- Translation is natural; speaker tone preserved
- Line length within platform/cue caps
- Proper nouns accurate
- No cue ends mid-clause / mid-phrase
- No invented content
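The mechanical items on this checklist (sequential numbers, timestamp format, no overlap) can be verified automatically; naturalness, tone, and proper nouns still need a human read. The sketch below assumes cues are separated by blank lines, and `check_srt` is a hypothetical name.

```python
import re
from typing import List

_TIME = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
_TIMING = re.compile(rf"^{_TIME} --> {_TIME}$")


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(srt_text: str) -> List[str]:
    """Return structural problems found in SRT text (empty list if none)."""
    problems = []
    prev_end = 0
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b]
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if lines[0].strip() != str(expected):
            problems.append(f"cue {expected}: number is {lines[0]!r}")
        match = _TIMING.match(lines[1].strip()) if len(lines) > 1 else None
        if not match:
            problems.append(f"cue {expected}: bad timing line")
            continue
        groups = match.groups()
        start, end = _to_ms(*groups[:4]), _to_ms(*groups[4:])
        if end <= start:
            problems.append(f"cue {expected}: end is not after start")
        if start < prev_end:
            problems.append(f"cue {expected}: overlaps previous cue")
        prev_end = end
    return problems
```

A cue that starts exactly when the previous one ends is accepted, which matches the back-to-back timing in the SRT example earlier in this document.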
Downstream
- `/wjs-burning-subtitles`: burn this SRT onto the video, or soft-mux as a togglable track.
- `/wjs-dubbing-video`: generate a TTS voice dub from this SRT, time-aligned to the original timing.
- For bilingual playback: most platforms can soft-mux multiple subtitle tracks, but if you need bilingual visible at once, burn the `*.source-target.srt` directly via `/wjs-burning-subtitles`.
Common pitfalls
- Letting the cue end mid-sentence after translation. The source's silence-aligned cues are unsafe boundaries; always re-segment at punctuation.
- Filler demonstratives in Chinese output. MT inserts 「这」/「那」 because the source had `eso/that`. Delete them aggressively.
- Period milliseconds. Whisper local writes `.mmm`; SRT spec is `,mmm`. Always normalize.
- Translating proper nouns. For brand names, place names, and technical terms, leave them as-is or use the conventional target-language version (e.g., "OpenAI" stays, "New York" → "纽约").
- Over-shortening for cue caps. If a line is genuinely longer than the cap, split it into two cues with interpolated timestamps; don't drop meaning to fit the cap.
- Forgetting to re-segment when no dub is requested. The punctuation-bounded SRT is also better for reading, because line endings at natural pauses match how viewers scan. Re-segment even when burn-only.
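The period-milliseconds pitfall is easy to normalize mechanically. This sketch rewrites `.mmm` to `,mmm` only on timing lines (those containing `-->`), so a timestamp quoted inside subtitle text is left alone; `normalize_milliseconds` is a hypothetical name.

```python
import re

_PERIOD_MS = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3})")


def normalize_milliseconds(srt_text: str) -> str:
    """Replace period milliseconds with comma milliseconds on timing lines only."""
    fixed = []
    for line in srt_text.split("\n"):
        if "-->" in line:
            line = _PERIOD_MS.sub(r"\1,\2", line)
        fixed.append(line)
    return "\n".join(fixed)
```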