wjs-translating-subtitles


Source-language SRT in → target-language (or bilingual) SRT out. This skill is text-only. Burn-in lives in `/wjs-burning-subtitles`; voice dub in `/wjs-dubbing-video`.

When to use

User has an SRT in language A and wants it in language B.
User pasted a transcript (with or without timestamps) and wants a translation that becomes an SRT.
User has an SRT but cues end mid-sentence — this skill's re-segmentation step fixes that.

When NOT to use

No source-language SRT yet → run `/wjs-transcribing-audio` first.
User wants burned-in subtitles → finish translation here, then `/wjs-burning-subtitles`.
User wants voice dub → finish translation here, then `/wjs-dubbing-video`.

Pick the target

Resolve the target from the user's phrasing once; don't re-ask:

"翻成中文 / 中文字幕 / 中文配音" ("translate to Chinese / Chinese subtitles / Chinese dub") → `zh-CN`.
"translate to English / English subs / English dub" → `en`.
"bilingual" / "双语" → produce both `.<source>.srt` and `.<target>.srt` (and optionally a combined `.<source>-<target>.srt`).
Ambiguous → default to whichever the user has historically chosen in the project.

Simplified Chinese and English are fully validated. Other targets (Japanese, Korean, French, etc.) work via the same rules; the bottleneck is TTS-voice availability if dubbing follows, so check `/wjs-dubbing-video` before promising.

Shared translation principles

Prioritize meaning over literal wording.
Use concise subtitle-style language: viewers read at roughly 3 characters per second for Chinese and 3–4 words per second for English, and lines that exceed that go off-screen before they can be read.
Preserve the tone of the speaker. Casual source → casual target; formal source → formal target.
Do not over-translate names, brands, cultural references, or technical terms.
Keep numbers, dates, names, and places accurate.
If a phrase has no exact equivalent, translate the meaning naturally. No literal/word-for-word constructions.
Avoid stiff, machine-translated output.
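
The reading-speed guideline above can be turned into a rough check. This is a minimal sketch, not part of the skill itself: the function names are hypothetical, and using 4 words per second (the upper end of the range) for English and counting only CJK ideographs for Chinese are assumptions.

```python
import re

# Approximate reading speeds taken from the principles above.
ZH_CHARS_PER_SEC = 3.0
EN_WORDS_PER_SEC = 4.0  # assumption: upper end of the 3-4 words/second range


def min_display_seconds(text: str, lang: str) -> float:
    """Estimate how long a cue must stay on screen to be readable."""
    if lang.startswith("zh"):
        # Count CJK ideographs only; punctuation and spaces are ignored.
        units = len(re.findall(r"[\u4e00-\u9fff]", text))
        return units / ZH_CHARS_PER_SEC
    # Treat whitespace-separated tokens as words for English.
    units = len(text.split())
    return units / EN_WORDS_PER_SEC


def is_readable(text: str, duration_s: float, lang: str) -> bool:
    """True when the cue duration covers the estimated reading time."""
    return duration_s >= min_display_seconds(text, lang)
```

A cue that fails this check is a candidate for tighter wording or for splitting, not for dropping meaning.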

Translating into Simplified Chinese (zh-CN)

Use natural spoken Mandarin for casual speech, formal Mandarin for formal speech.
Use Simplified characters only (do NOT use Traditional Hanzi unless the user explicitly asks).
Subtitle lines should be roughly 15 Chinese characters or fewer per line, max 2 lines per cue (3 only when unavoidable for very long cues).
Use Chinese punctuation: 「，」「。」「；」「：」「、」「——」. Never mix English commas/periods into Chinese subtitles.
Minimize filler demonstratives 「这」「那」「这个」「那个」「那份」「那种」「那里」「那样」. Spanish-to-Chinese (and English-to-Chinese) MT routinely inserts these because the source has overt demonstratives that Chinese usually drops. Examples:
- "这把我们带入二元世界的载体" → "把我们带入二元的载体"
- "运用那份能量" → "运用这股能量" if needed, or just "运用能量"
- "正是在这合一里" → "正是在合一中"
- "像罪人那样翻滚" → "像罪人翻滚" / "像罪人般翻滚"
- "那份精微的觉知" → "精微的觉知"

Keep them only when they carry real meaning (deixis, contrast, or a fixed phrase like the spiritual "我就是那" / "tat tvam asi"). Default is to delete; add back only if the sentence becomes ambiguous.

Examples (Spanish → Chinese):

Spanish: No pasa nada.            → Chinese: 没关系。
Spanish: Vamos a ver qué pasa.    → Chinese: 我们看看会发生什么。
Spanish: Me parece una locura.    → Chinese: 我觉得这太疯狂了。
Spanish: ¿Qué quieres decir?      → Chinese: 你是什么意思？
Spanish: La verdad es que no lo esperaba.
                                  → Chinese: 说实话，我没想到会这样。

Translating into English (en)

Use natural conversational English. Avoid translationese ("It is precisely through entering the body…" → "It's by entering the body…").
Lines should be roughly 40–42 characters or fewer (about 7–9 words), max 2 lines per cue. Hard cap 50 chars per line.
Use ASCII punctuation (`,` `.` `;` `:`), plus the em-dash `—` as the one non-ASCII exception. Avoid Unicode curly quotes; this keeps the `.srt` portable.
For contemplative/spiritual content, prefer plain words over Latinate jargon: "presence" over "manifestation," "wholeness" over "totality," "wake up" over "awaken to consciousness."
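
The line-length rules lend themselves to a mechanical check. The sketch below is illustrative only: `LINE_CAPS` and `check_cue_lines` are hypothetical names, the English caps are the 42/50 figures above, and reusing the 15-character zh-CN guideline as both soft and hard cap is an assumption.

```python
# Per-line caps as (soft cap, hard cap) in characters.
# Assumption: zh-CN has no separate hard cap, so 15 is used for both.
LINE_CAPS = {"en": (42, 50), "zh-CN": (15, 15)}
MAX_LINES = 2


def check_cue_lines(text: str, lang: str) -> list[str]:
    """Return human-readable problems for one cue's text (empty if none)."""
    soft, hard = LINE_CAPS[lang]
    lines = text.split("\n")
    problems = []
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")
    for i, line in enumerate(lines, start=1):
        # len() counts every code point, punctuation included.
        n = len(line)
        if n > hard:
            problems.append(f"line {i}: {n} chars exceeds hard cap {hard}")
        elif n > soft:
            problems.append(f"line {i}: {n} chars exceeds soft cap {soft}")
    return problems
```

Soft-cap hits are warnings worth a second look; hard-cap hits usually mean the cue should be split.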

Examples (Spanish → English):

Spanish: No pasa nada.            → English: It's nothing.
Spanish: Vamos a ver qué pasa.    → English: Let's see what happens.
Spanish: Me parece una locura.    → English: This feels crazy to me.
Spanish: ¿Qué quieres decir?      → English: What do you mean?
Spanish: La verdad es que no lo esperaba.
                                  → English: Honestly, I wasn't expecting this.

Re-segment at punctuation boundaries (mandatory)

Whisper segments by silence/breath, not grammar. The result almost always has cues that end mid-sentence (e.g., "...es una forma de aterrizar," next cue starts "el espíritu en el cuerpo..."). Any TTS that processes one cue at a time will then insert an unnatural pause exactly where the original speaker did not. The fix is mandatory before dubbing — and improves on-screen reading too.

The punctuation set differs by language:

Chinese cues must end at `，` `。` `；` `：` `——` or `、`.
English cues must end at `,` `.` `;` `:` `—` (em-dash) or, in practice for subtitles, occasionally a single dash. Never end an English cue on a comma-less clause break, and never split inside a phrase like "kind of" or "in order to".
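
A cue-ending check along these lines might look like the following. It is a sketch under stated assumptions: `CUE_END_MARKS` and `ends_at_punctuation` are hypothetical names, and question and exclamation marks are added even though the lists above do not mention them.

```python
# Allowed cue-final marks, copied from the lists above.
# Assumption: "?" / "!" and their full-width forms are also accepted,
# since questions and exclamations end at a real pause too.
CUE_END_MARKS = {
    "zh-CN": ("，", "。", "；", "：", "——", "、", "？", "！"),
    "en": (",", ".", ";", ":", "—", "-", "?", "!"),
}


def ends_at_punctuation(text: str, lang: str) -> bool:
    """True when the cue's last visible characters are an allowed mark."""
    # Closing quotes or brackets may legitimately follow the final mark.
    stripped = text.rstrip().rstrip("\"'」』)）")
    return stripped.endswith(CUE_END_MARKS[lang])
```

This catches cues that end on a bare word; it cannot tell whether a comma sits at a sensible clause break, which still needs a human or model read.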

Rules:

Every cue must end at a real punctuation mark. Never let a cue end on a noun, verb, conjunction, or article that flows into the next cue.
It is fine (and often necessary) to split a single source cue into 2–4 shorter cues, with timestamps interpolated by character position within the original cue's duration.
It is fine to merge the tail of one source cue with the head of the next when they form one clause — the merged cue inherits the start of the first and the end of the second.
Target 3–8 seconds per cue. Cues shorter than ~1.5s feel choppy on screen; cues longer than ~10s usually contain a missed punctuation break.
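
Splitting a cue with timestamps interpolated by character position can be sketched as below. The `Cue` dataclass, `split_cue`, and the exact set of break characters are assumptions for illustration; merging too-short results back together (the ~1.5s floor) is deliberately left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


# Split right after a punctuation mark, swallowing any following spaces.
# The character set is an assumption; "——" is not handled in this sketch.
_BREAK = re.compile(r"(?<=[,.;:?!，。；：、？！])\s*")


def split_cue(cue: Cue) -> list[Cue]:
    """Split one cue at punctuation, interpolating times by character count."""
    pieces = [p for p in _BREAK.split(cue.text) if p]
    total = sum(len(p) for p in pieces)
    if len(pieces) < 2 or total == 0:
        return [cue]
    duration = cue.end_ms - cue.start_ms
    out, consumed, start = [], 0, cue.start_ms
    for piece in pieces:
        consumed += len(piece)
        # Each boundary sits at the same fraction of the duration as the
        # fraction of characters consumed so far.
        end = cue.start_ms + round(duration * consumed / total)
        out.append(Cue(start, end, piece))
        start = end
    return out
```

Because each piece starts where the previous one ends, the output never overlaps and always spans exactly the original cue's duration.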

A typical 2–3 minute talk yields roughly 25–40 punct-bounded cues from 12–18 raw source cues. Don't try to keep the original cue count.

When TTS dubbing follows: the punctuation-bounded structure means each TTS clip is a complete utterance with proper end-intonation, and concatenating clips sounds natural because every join is at a real pause point.

SRT output rules

1
00:00:01,200 --> 00:00:04,800
中文字幕内容

2
00:00:04,800 --> 00:00:08,500
中文字幕内容

Number subtitles sequentially starting from `1`.
Timestamp format: `HH:MM:SS,mmm`. Comma milliseconds, never period milliseconds.
Do not overlap timestamps.
Preserve the original timing unless adjustment is necessary.
Each subtitle should usually be 1–2 lines.
If one subtitle is too long, split it into shorter subtitles when timing allows.
Do not add commentary inside the subtitle file.
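
The numbering and timestamp rules can be made concrete with a small serializer. This is a minimal sketch; `format_timestamp` and `render_srt` are hypothetical helper names, and cues are assumed to be `(start_ms, end_ms, text)` tuples.

```python
def format_timestamp(ms: int) -> str:
    """Render milliseconds as HH:MM:SS,mmm (comma before milliseconds)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def render_srt(cues: list[tuple[int, int, str]]) -> str:
    """Serialize (start_ms, end_ms, text) tuples as SRT, numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive cue blocks.
    return "\n".join(blocks)
```

Generating the numbers and timestamps in code, instead of editing them by hand, keeps the sequence and the comma-millisecond format correct after cues are split or merged.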

Bilingual output

When the user asks for bilingual output, put the source on the first line and the target on the second:

1
00:00:01,200 --> 00:00:04,800
No pasa nada.
没关系。

Rules:

Keep source first, target second.
Preserve timing.
Avoid adding extra explanations unless requested.
Keep both lines short enough to read.
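
Combining a source and a target cue list into bilingual cues might be sketched like this. `merge_bilingual` is a hypothetical helper; it assumes both lists were re-segmented identically, so cues pair up one to one with the same timing.

```python
def merge_bilingual(
    source: list[tuple[int, int, str]],
    target: list[tuple[int, int, str]],
) -> list[tuple[int, int, str]]:
    """Pair (start_ms, end_ms, text) cues: source line first, target second."""
    if len(source) != len(target):
        raise ValueError("source and target have different cue counts")
    merged = []
    for (s_start, s_end, s_text), (t_start, t_end, t_text) in zip(source, target):
        if (s_start, s_end) != (t_start, t_end):
            raise ValueError(f"timing mismatch at {s_start}ms")
        # Timing is preserved; only the text is combined.
        merged.append((s_start, s_end, f"{s_text}\n{t_text}"))
    return merged
```

Failing loudly on a count or timing mismatch is a design choice here: silently pairing misaligned cues would put the wrong translation under a source line.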

Output formats

Depending on the user request, provide one or more:

Target-only `.srt`
Bilingual `.srt` (source line + target line)
Target transcript without timestamps
Side-by-side source/target table

Default output for "translate this SRT" with no other modifiers: target-only `.srt` plus a short uncertainty note if needed.

File naming

input.srt                          # source (e.g., from /wjs-transcribing-audio)

translated outputs:
  input.zh-CN.srt                  # Simplified Chinese only
  input.en.srt                     # English only
  input.es-zh.srt                  # Spanish + Chinese bilingual
  input.es-en.srt                  # Spanish + English bilingual
  input.es-zh-en.srt               # three-language

BCP-47-style suffixes make the target language obvious at a glance and keep multiple target-language outputs side-by-side.
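
The naming scheme above can be derived from the input path. A minimal sketch, assuming a hypothetical `output_path` helper that inserts the language codes between the stem and the extension:

```python
from pathlib import Path


def output_path(source_srt: str, *langs: str) -> Path:
    """Build the output name: one code for target-only, several for multilingual."""
    if not langs:
        raise ValueError("at least one language code is required")
    src = Path(source_srt)
    suffix = "-".join(langs)
    # input.srt + ("es", "zh") -> input.es-zh.srt, in the same directory.
    return src.with_name(f"{src.stem}.{suffix}{src.suffix}")
```

For example, `output_path("input.srt", "zh-CN")` gives `input.zh-CN.srt`, and `output_path("input.srt", "es", "zh")` gives `input.es-zh.srt`.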

Handling unclear audio markers

If the source SRT contains `[inaudible]` or `[unclear]`:

Translate the surrounding context naturally.
Keep the bracketed marker in the target SRT (don't invent content).
If an `[unclear]` chunk makes a cue ungrammatical in the target language, leave it bracketed and add a note in the response (not in the SRT file).

Quality gate before handoff

Subtitle numbers are sequential
Timestamps are valid (`HH:MM:SS,mmm`, no overlap)
Milliseconds use commas
Translation is natural; speaker tone preserved
Line length within platform/cue caps
Proper nouns accurate
No cue ends mid-clause / mid-phrase
No invented content
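
The mechanical items in this checklist (numbering, timestamp format, overlap) can be automated; tone, naturalness, and invented content still need a read-through. The sketch below uses a hypothetical `check_srt` function and assumes cues are separated by blank lines.

```python
import re

# Strict SRT timing line: comma milliseconds on both sides.
_TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(srt_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    prev_end = 0
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b]
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if lines[0].strip() != str(expected):
            problems.append(f"cue {expected}: number is {lines[0].strip()!r}")
        match = _TIMING.match(lines[1].strip()) if len(lines) > 1 else None
        if match is None:
            problems.append(f"cue {expected}: bad timing line")
            continue
        start = _to_ms(*match.groups()[:4])
        end = _to_ms(*match.groups()[4:])
        if end <= start:
            problems.append(f"cue {expected}: end is not after start")
        if start < prev_end:
            problems.append(f"cue {expected}: overlaps previous cue")
        prev_end = end
    return problems
```

A cue that starts exactly where the previous one ends is treated as valid, matching the back-to-back timing in the SRT example earlier in this document.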

Downstream

`/wjs-burning-subtitles`: burn this SRT onto the video, or soft-mux as a togglable track.
`/wjs-dubbing-video`: generate a TTS voice dub from this SRT, time-aligned to the original timing.
For bilingual playback: most platforms can soft-mux multiple subtitle tracks, but if you need both languages visible at once, burn the `*.source-target.srt` directly via `/wjs-burning-subtitles`.

Common pitfalls

Letting the cue end mid-sentence after translation. The source's silence-aligned cues are unsafe boundaries; re-segment at punctuation, always.
Filler demonstratives in Chinese output. MT inserts 「这」/「那」 because the source had `eso/that`. Delete them aggressively.
Period milliseconds. Local Whisper output may write `.mmm`; the SRT format uses `,mmm`. Always normalize.
Translating proper nouns. Brand names, place names, technical terms — leave as-is or use the conventional target-language version (e.g., "OpenAI" stays, "New York" → "纽约").
Over-shortening for cue caps. If a line is genuinely longer than the cap, split into two cues with interpolated timestamps; don't drop meaning to fit the cap.
Forgetting to do re-segmentation when no dub is requested. The punct-bounded SRT is also better for reading — line endings at natural pauses match how viewers scan. Re-segment even when burn-only.
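
For the period-milliseconds pitfall, normalization can be a small text pass. This is a sketch with a hypothetical `normalize_milliseconds` function; it assumes timing lines are the only lines containing `-->`, so periods inside cue text are left alone.

```python
import re

# A timestamp whose milliseconds are introduced by "." instead of ",".
_TS_PERIOD = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3})")


def normalize_milliseconds(srt_text: str) -> str:
    """Rewrite HH:MM:SS.mmm as HH:MM:SS,mmm on timing lines only."""
    out = []
    for line in srt_text.split("\n"):
        if "-->" in line:
            line = _TS_PERIOD.sub(r"\1,\2", line)
        out.append(line)
    return "\n".join(out)
```

The function is idempotent, so running it on an SRT that already uses commas leaves the file unchanged.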
