wjs-translating-subtitles

Source-language SRT in → target-language (or bilingual) SRT out. This skill is text-only. Burn-in lives in /wjs-burning-subtitles; voice dub in /wjs-dubbing-video.

When to use

  • User has an SRT in language A and wants it in language B.
  • User pasted a transcript (with or without timestamps) and wants a translation that becomes an SRT.
  • User has an SRT but cues end mid-sentence — this skill's re-segmentation step fixes that.

When NOT to use

  • No source-language SRT yet → run /wjs-transcribing-audio first.
  • User wants burned-in subtitles → finish translation here, then run /wjs-burning-subtitles.
  • User wants voice dub → finish translation here, then run /wjs-dubbing-video.

Pick the target

Resolve the target from the user's phrasing once; don't re-ask:
  • "翻成中文 / 中文字幕 / 中文配音" → zh-CN.
  • "translate to English / English subs / English dub" → en.
  • "bilingual" / "双语" → produce both .<source>.srt and .<target>.srt (and optionally a combined .<source>-<target>.srt).
  • Ambiguous → default to whichever target the user has historically chosen in the project.
Simplified Chinese and English are fully validated. Other targets (Japanese, Korean, French, etc.) work via the same rules; the bottleneck is TTS-voice availability if dubbing follows, so check /wjs-dubbing-video before promising.
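The phrase-to-target mapping above could be sketched as a small keyword lookup. The function name `resolve_target`, the hint lists, and the `project_default` parameter are illustrative assumptions, not part of the skill. The match is deliberately naive: a request that names both languages (for example "translate this Chinese SRT to English") would need real parsing.

```python
from typing import Optional

# Hypothetical keyword lists, based on the phrasings listed above.
ZH_HINTS = ("中文", "chinese")
EN_HINTS = ("英文", "english")
BILINGUAL_HINTS = ("双语", "bilingual")


def resolve_target(request: str, project_default: Optional[str] = None) -> dict:
    """Guess the target language from the user's wording, once."""
    text = request.lower()
    bilingual = any(hint in text for hint in BILINGUAL_HINTS)
    if any(hint in text for hint in ZH_HINTS):
        target = "zh-CN"
    elif any(hint in text for hint in EN_HINTS):
        target = "en"
    else:
        # Ambiguous: fall back to the project's historical choice (may be None).
        target = project_default
    return {"target": target, "bilingual": bilingual}
```

A `None` target here means there is no project history to fall back on, which is the one case where asking the user is reasonable.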

Shared translation principles

  • Prioritize meaning over literal wording.
  • Use concise subtitle-style language. Viewers read at roughly 3 characters per second for Chinese and 3–4 words per second for English; lines that exceed that rate leave the screen before they can be read.
  • Preserve the tone of the speaker. Casual source → casual target; formal source → formal target.
  • Do not over-translate names, brands, cultural references, or technical terms.
  • Keep numbers, dates, names, and places accurate.
  • If a phrase has no exact equivalent, translate the meaning naturally. No literal/word-for-word constructions.
  • Avoid stiff, machine-translated output.
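The reading-speed guideline above lends itself to a quick mechanical check. The sketch below is an illustration only: the names `reading_units` and `readable` are hypothetical, and using the upper English bound (4 words per second) as the limit is an assumption.

```python
import re

# Rates taken from the guideline above; the English value is the upper bound.
MAX_RATE = {"zh-CN": 3.0, "en": 4.0}


def reading_units(text: str, lang: str) -> int:
    """Count what the viewer reads: CJK characters for Chinese, words for English."""
    if lang == "zh-CN":
        return len(re.findall(r"[\u4e00-\u9fff]", text))
    return len(text.split())


def readable(text: str, duration_s: float, lang: str) -> bool:
    """True if the cue can be read within its on-screen duration."""
    return reading_units(text, lang) <= MAX_RATE[lang] * duration_s
```

For example, `readable("没关系。", 1.0, "zh-CN")` passes (3 characters in 1 second), while a 10-character line shown for 2 seconds does not and should be shortened or given more time.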

Translating into Simplified Chinese (zh-CN)

  • Use natural spoken Mandarin for casual speech, formal Mandarin for formal speech.
  • Use Simplified characters only (do NOT use Traditional Hanzi unless the user explicitly asks).
  • Subtitle lines should be roughly 15 Chinese characters or fewer per line, max 2 lines per cue (3 only when unavoidable for very long cues).
  • Use Chinese punctuation: 「,」「。」「;」「:」「、」「——」. Never mix English commas/periods into Chinese subtitles.
  • Minimize filler demonstratives 「这」「那」「这个」「那个」「那份」「那种」「那里」「那样」. Spanish-to-Chinese (and English-to-Chinese) MT routinely inserts these because the source has overt demonstratives that Chinese usually drops. Examples:
    • "这把我们带入二元世界的载体" → "把我们带入二元的载体"
    • "运用那份能量" → "运用这股能量" if needed, or just "运用能量"
    • "正是在这合一里" → "正是在合一中"
    • "像罪人那样翻滚" → "像罪人翻滚" / "像罪人般翻滚"
    • "那份精微的觉知" → "精微的觉知"
    Keep them only when they carry real meaning (deixis, contrast, or a fixed phrase such as the spiritual "我就是那" / "tat tvam asi"). The default is to delete; add one back only if the sentence becomes ambiguous.
Examples (Spanish → Chinese):
```text
Spanish: No pasa nada.            → Chinese: 没关系。
Spanish: Vamos a ver qué pasa.    → Chinese: 我们看看会发生什么。
Spanish: Me parece una locura.    → Chinese: 我觉得这太疯狂了。
Spanish: ¿Qué quieres decir?      → Chinese: 你是什么意思?
Spanish: La verdad es que no lo esperaba.
                                  → Chinese: 说实话,我没想到会这样。
```

Translating into English (en)

  • Use natural conversational English. Avoid translationese ("It is precisely through entering the body…" → "It's by entering the body…").
  • Lines should be roughly 40–42 characters or fewer (about 7–9 words), max 2 lines per cue. Hard cap 50 chars per line.
  • Use ASCII punctuation: comma (,), period (.), semicolon (;), colon (:), plus the em-dash. Avoid Unicode curly quotes; this keeps the .srt file portable.
  • For contemplative/spiritual content, prefer plain words over Latinate jargon: "presence" over "manifestation," "wholeness" over "totality," "wake up" over "awaken to consciousness."
Examples (Spanish → English):
```text
Spanish: No pasa nada.            → English: It's nothing.
Spanish: Vamos a ver qué pasa.    → English: Let's see what happens.
Spanish: Me parece una locura.    → English: This feels crazy to me.
Spanish: ¿Qué quieres decir?      → English: What do you mean?
Spanish: La verdad es que no lo esperaba.
                                  → English: Honestly, I wasn't expecting this.
```
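The English line caps can be applied with the standard library's `textwrap`. The helper name `layout_en_cue` is hypothetical, and wrapping at the soft width of 42 (rather than allowing lines up to the 50-character hard cap) is a choice made for this sketch.

```python
import textwrap

SOFT_WIDTH = 42  # target line length from the rule above
MAX_LINES = 2

# Curly quotes mapped to their ASCII equivalents.
ASCII_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def layout_en_cue(text: str) -> list[str]:
    """Wrap one English cue into subtitle lines; raise if it needs splitting."""
    lines = textwrap.wrap(text.translate(ASCII_QUOTES), width=SOFT_WIDTH)
    if len(lines) > MAX_LINES:
        raise ValueError("cue too long: split it into two cues instead")
    return lines
```

Raising instead of truncating matches the guidance elsewhere in this skill: an over-long cue should be split into two cues, not trimmed until it fits.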

Re-segment at punctuation boundaries (mandatory)

Whisper segments by silence/breath, not grammar. The result almost always has cues that end mid-sentence (e.g., "...es una forma de aterrizar," next cue starts "el espíritu en el cuerpo..."). Any TTS that processes one cue at a time will then insert an unnatural pause exactly where the original speaker did not. The fix is mandatory before dubbing — and improves on-screen reading too.
The punctuation set differs by language:
  • Chinese cues must end at 「,」「。」「;」「:」「——」 or 「、」.
  • English cues must end at a comma, period, semicolon, colon, or em-dash (in practice for subtitles, occasionally a single dash). Never end an English cue on a comma-less clause break, and never split inside a phrase like "kind of" or "in order to".
Rules:
  • Every cue must end at a real punctuation mark. Never let a cue end on a noun, verb, conjunction, or article that flows into the next cue.
  • It is fine (and often necessary) to split a single source cue into 2–4 shorter cues, with timestamps interpolated by character position within the original cue's duration.
  • It is fine to merge the tail of one source cue with the head of the next when they form one clause — the merged cue inherits the start of the first and the end of the second.
  • Target 3–8 seconds per cue. Cues shorter than ~1.5s feel choppy on screen; cues longer than ~10s usually contain a missed punctuation break.
A typical 2–3 minute talk yields roughly 25–40 punctuation-bounded cues from 12–18 raw source cues. Don't try to keep the original cue count.
When TTS dubbing follows: the punctuation-bounded structure means each TTS clip is a complete utterance with proper end-intonation, and concatenating clips sounds natural because every join is at a real pause point.
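The split and merge rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Cue` dataclass, `split_cue`, and `merge_cues` are hypothetical names, times are plain seconds, the break set covers only the comma, period, semicolon, colon and 「、」 marks (dashes are left out for brevity), and ASCII marks only count as breaks when followed by whitespace so that decimals like 3.5 stay intact. It does not enforce the 3–8 second target; very short pieces would still need merging.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


# Break after a Chinese mark, or after an ASCII mark followed by whitespace.
BREAK = re.compile(r"(?<=[,。;:、])\s*|(?<=[,.;:])\s+")


def split_cue(cue: Cue) -> list[Cue]:
    """Split one cue at punctuation, interpolating times by character position."""
    parts = [part for part in BREAK.split(cue.text) if part]
    total = sum(len(part) for part in parts)
    pieces, consumed = [], 0
    for part in parts:
        start = cue.start + (cue.end - cue.start) * consumed / total
        consumed += len(part)
        end = cue.start + (cue.end - cue.start) * consumed / total
        pieces.append(Cue(round(start, 3), round(end, 3), part))
    return pieces


def merge_cues(first: Cue, second: Cue, sep: str = " ") -> Cue:
    """Join two cues that form one clause: first cue's start, second cue's end."""
    return Cue(first.start, second.end, first.text + sep + second.text)
```

Each piece starts exactly where the previous one ends, so the output never overlaps. Pass `sep=""` when merging Chinese cues, which take no space between fragments.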

SRT output rules

```text
1
00:00:01,200 --> 00:00:04,800
中文字幕内容

2
00:00:04,800 --> 00:00:08,500
中文字幕内容
```
  • Number subtitles sequentially starting from 1.
  • Timestamp format: HH:MM:SS,mmm. Comma milliseconds, never period milliseconds.
  • Do not overlap timestamps.
  • Preserve the original timing unless adjustment is necessary.
  • Each subtitle should usually be 1–2 lines.
  • If one subtitle is too long, split it into shorter subtitles when timing allows.
  • Do not add commentary inside the subtitle file.
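A writer that follows the numbering and timestamp rules above might look like the sketch below. The function names and the `(start_seconds, end_seconds, text)` tuple shape are assumptions made for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS,mmm (comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def render_srt(cues: list[tuple[float, float, str]]) -> str:
    """Number cues from 1 and separate them with blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Working in integer milliseconds before formatting avoids carrying float rounding into the minutes and seconds fields.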

Bilingual output

When the user asks for bilingual output, put the source on the first line and the target on the second:
```text
1
00:00:01,200 --> 00:00:04,800
No pasa nada.
没关系。
```
Rules:
  • Keep source first, target second.
  • Preserve timing.
  • Avoid adding extra explanations unless requested.
  • Keep both lines short enough to read.
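Stacking the two tracks is straightforward when both were re-segmented identically. The sketch below assumes that condition and raises otherwise; the `Cue` tuple shape and the name `to_bilingual` are hypothetical.

```python
Cue = tuple[str, str, str]  # (start, end, text), timestamps kept as SRT strings


def to_bilingual(source: list[Cue], target: list[Cue]) -> list[Cue]:
    """Stack each source line above its translation, keeping the shared timing."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of cues")
    merged = []
    for (s_start, s_end, s_text), (t_start, t_end, t_text) in zip(source, target):
        if (s_start, s_end) != (t_start, t_end):
            raise ValueError(f"timing mismatch at {s_start}")
        merged.append((s_start, s_end, f"{s_text}\n{t_text}"))
    return merged
```

Failing loudly on a count or timing mismatch is deliberate: pairing cues from differently segmented tracks would silently misalign source and translation.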

Output formats

Depending on the user request, provide one or more:
  1. Target-only .srt
  2. Bilingual .srt (source line + target line)
  3. Target transcript without timestamps
  4. Side-by-side source/target table
Default output for "translate this SRT" with no other modifiers: target-only .srt, plus a short uncertainty note if needed.

File naming

```text
input.srt                          # source (e.g., from /wjs-transcribing-audio)

translated outputs:
  input.zh-CN.srt                  # Simplified Chinese only
  input.en.srt                     # English only
  input.es-zh.srt                  # Spanish + Chinese bilingual
  input.es-en.srt                  # Spanish + English bilingual
  input.es-zh-en.srt               # three-language
```
BCP-47-style suffixes make the target language obvious at a glance and keep multiple target-language outputs side-by-side.
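The naming scheme above can be derived from the source path with `pathlib`. The helper name `output_path` is hypothetical, and the sketch assumes the source file carries no language suffix of its own (`input.srt`, not `input.es.srt`).

```python
from pathlib import Path


def output_path(source_srt: str, *langs: str) -> Path:
    """Build e.g. input.es-zh.srt from input.srt and the codes ("es", "zh")."""
    src = Path(source_srt)
    suffix = "-".join(langs)
    return src.with_name(f"{src.stem}.{suffix}.srt")
```

For example, `output_path("input.srt", "zh-CN")` gives `input.zh-CN.srt`, and passing several codes produces the bilingual or three-language names in the same directory as the source.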

Handling unclear audio markers

If the source SRT contains [inaudible] or [unclear]:
  • Translate the surrounding context naturally.
  • Keep the bracketed marker in the target SRT (don't invent content).
  • If an [unclear] chunk makes a cue ungrammatical in the target language, leave it bracketed and add a note in the response (not in the SRT file).

Quality gate before handoff

  • Subtitle numbers are sequential
  • Timestamps are valid (HH:MM:SS,mmm, no overlap)
  • Milliseconds use commas
  • Translation is natural; speaker tone preserved
  • Line length within platform/cue caps
  • Proper nouns accurate
  • No cue ends mid-clause / mid-phrase
  • No invented content
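The structural items in this checklist (numbering, timestamp format, overlaps) can be verified mechanically; naturalness, tone, proper nouns and invented content still need a read-through. The sketch below is one possible checker, with `check_srt` as a hypothetical name, and it assumes Unix line endings with cues separated by a single blank line.

```python
import re

TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timestamp fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(srt_text: str) -> list[str]:
    """Return structural problems: numbering, timestamp format, overlaps."""
    problems, previous_end = [], 0
    blocks = [b for b in srt_text.strip().split("\n\n") if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.strip().split("\n")
        if lines[0].strip() != str(expected):
            problems.append(f"cue {expected}: number is {lines[0]!r}")
        match = TIMING.match(lines[1].strip()) if len(lines) > 1 else None
        if match is None:
            problems.append(f"cue {expected}: bad timing line")
            continue
        start, end = to_ms(*match.groups()[:4]), to_ms(*match.groups()[4:])
        if end <= start:
            problems.append(f"cue {expected}: end is not after start")
        if start < previous_end:
            problems.append(f"cue {expected}: overlaps previous cue")
        previous_end = end
    return problems
```

Because the timing pattern only accepts a comma before the milliseconds, a file with period milliseconds is reported as having bad timing lines.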

Downstream

  • /wjs-burning-subtitles: burn this SRT onto the video, or soft-mux it as a toggleable track.
  • /wjs-dubbing-video: generate a TTS voice dub from this SRT, time-aligned to the original timing.
  • For bilingual playback: most platforms can soft-mux multiple subtitle tracks, but if both languages must be visible at once, burn the *.<source>-<target>.srt directly via /wjs-burning-subtitles.

Common pitfalls

  • Letting the cue end mid-sentence after translation. The source's silence-aligned cues are unsafe boundaries; re-segment at punctuation, always.
  • Filler demonstratives in Chinese output. MT inserts 「这」/「那」 because the source had eso/that. Delete them aggressively.
  • Period milliseconds. Local Whisper output can use .mmm; SRT requires ,mmm. Always normalize.
  • Translating proper nouns. Brand names, place names, technical terms — leave as-is or use the conventional target-language version (e.g., "OpenAI" stays, "New York" → "纽约").
  • Over-shortening for cue caps. If a line is genuinely longer than the cap, split into two cues with interpolated timestamps; don't drop meaning to fit the cap.
  • Forgetting to do re-segmentation when no dub is requested. The punct-bounded SRT is also better for reading — line endings at natural pauses match how viewers scan. Re-segment even when burn-only.
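The period-milliseconds pitfall is easy to fix in one pass. A minimal sketch, with `normalize_ms` as a hypothetical name; it only touches timing lines (those containing `-->`) so that numbers inside subtitle text are left alone.

```python
import re

# A timestamp whose milliseconds are separated by a period.
PERIOD_MS = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3})")


def normalize_ms(srt_text: str) -> str:
    """Rewrite HH:MM:SS.mmm as HH:MM:SS,mmm on timing lines only."""
    fixed = []
    for line in srt_text.split("\n"):
        if "-->" in line:
            line = PERIOD_MS.sub(r"\1,\2", line)
        fixed.append(line)
    return "\n".join(fixed)
```

Running this before translation keeps every later step working from a consistently formatted file.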