Captions

Analyze the spoken content to determine caption style. If the user specifies a style, use that. Otherwise, detect tone from the transcript.

Transcript Source

The project's `transcript.json` contains word-level timestamps from whisper.cpp (`--output-json-full` with `--dtw`):

```json
{
  "transcription": [
    {
      "offsets": { "from": 0, "to": 5000 },
      "text": " Hello world.",
      "tokens": [
        { "text": " Hello", "offsets": { "from": 0, "to": 1000 }, "p": 0.98 },
        { "text": " world", "offsets": { "from": 1000, "to": 2000 }, "p": 0.95 }
      ]
    }
  ]
}
```
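Normalize tokens into a word array before grouping:

```js
// `transcript` is the parsed contents of transcript.json.
const words = [];
for (const segment of transcript.transcription) {
  // Segments without a token list contribute no words.
  for (const token of segment.tokens || []) {
    // Token text carries a leading space; skip tokens that are only whitespace.
    const text = token.text.trim();
    if (!text) continue;
    words.push({
      text,
      // Offsets are in milliseconds; store seconds.
      start: token.offsets.from / 1000,
      end: token.offsets.to / 1000,
    });
  }
}
```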
If no `transcript.json` exists, check for `.srt` or `.vtt` files. If no transcript is available, ask the user to provide one or run `hyperframes transcribe` (when available).
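
When only an `.srt` file is present, cue-level timings can still be recovered. The following is a minimal sketch, assuming the standard SubRip cue layout (index line, timing line, text lines, blank separator). `toSeconds` and `parseSrt` are hypothetical helper names, not part of hyperframes. An SRT file carries no word-level timestamps, so the result is one entry per cue rather than per word.

```js
// Hypothetical helper: convert an SRT timestamp "HH:MM:SS,mmm" to seconds.
function toSeconds(stamp) {
  const [h, m, s] = stamp.trim().replace(",", ".").split(":");
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}

// Hypothetical helper: parse SRT text into cue objects { text, start, end }.
function parseSrt(srt) {
  const cues = [];
  // Cues are separated by one or more blank lines.
  for (const block of srt.replace(/\r/g, "").trim().split(/\n{2,}/)) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // not a cue block
    const [from, to] = lines[timingIndex].split("-->");
    // Multi-line cue text is joined into a single line.
    const text = lines.slice(timingIndex + 1).join(" ").trim();
    if (!text) continue;
    cues.push({ text, start: toSeconds(from), end: toSeconds(to) });
  }
  return cues;
}
```

Each cue can then be treated as a ready-made caption group, skipping per-word effects that need word timing.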

Style Detection (Default — When No Style Is Specified)

Read the full transcript before choosing a style. The style comes from the content, not a template.

Four Dimensions

1. Visual feel — the overall aesthetic personality:
  • Corporate/professional scripts → clean, minimal, restrained
  • Energetic/marketing scripts → bold, punchy, high-impact
  • Storytelling/narrative scripts → elegant, warm, cinematic
  • Technical/educational scripts → precise, high-contrast, structured
  • Social media/casual scripts → playful, dynamic, friendly
2. Color palette — driven by the content's mood:
  • Dark backgrounds with bright accents for high energy
  • Muted/neutral tones for professional or calm content
  • High contrast (white on black, black on white) for clarity
  • One accent color for emphasis — not multiple
3. Font mood — typography character, not specific font names:
  • Heavy/condensed for impact and energy
  • Clean sans-serif for modern and professional
  • Rounded for friendly and approachable
  • Serif for elegance and storytelling
4. Animation character — how words enter and exit:
  • Scale-pop/slam for punchy energy
  • Gentle fade/slide for calm or professional
  • Word-by-word reveal for emphasis
  • Typewriter for technical or narrative pacing

Per-Word Styling

Scan the script for words that deserve distinct visual treatment. Not every word is equal — some carry the message.

What to Detect

  • Brand names / product names — larger size, unique color, distinct entrance
  • ALL CAPS words — the author emphasized them intentionally. Scale boost, flash, or accent color.
  • Numbers / statistics — bold weight, accent color. Numbers are the payload in data-driven content.
  • Emotional keywords — "incredible", "insane", "amazing", "revolutionary" → exaggerated animation (overshoot, bounce)
  • Proper nouns — names of people, places, events → distinct accent or italic
  • Call-to-action phrases — "sign up", "get started", "try it now" → highlight, underline, or color pop
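
Several of the categories above can be detected mechanically. The sketch below is one possible approach, not a prescribed implementation: `classifyWords`, the `kind` tag, and the keyword lists are hypothetical names, and the brand list is assumed to be supplied by the caller. Proper nouns are left out because they usually need context to identify reliably.

```js
// Hypothetical keyword lists; extend them per project.
const EMOTIONAL = new Set(["incredible", "insane", "amazing", "revolutionary"]);
const CTA_PHRASES = ["sign up", "get started", "try it now"];

// Strip surrounding punctuation so "AMAZING!" and "amazing" compare equal.
const bare = (text) => text.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, "");

// Returns a copy of `words` where each word has a `kind` tag (null = ordinary).
function classifyWords(words, brandNames = []) {
  const brands = new Set(brandNames.map((name) => name.toLowerCase()));
  const tagged = words.map((word) => {
    const core = bare(word.text);
    const lower = core.toLowerCase();
    const upper = core.toUpperCase();
    let kind = null;
    if (brands.has(lower)) kind = "brand";
    else if (/\d/.test(core)) kind = "number";
    else if (EMOTIONAL.has(lower)) kind = "emotional";
    // ALL CAPS: at least two characters and contains letters.
    else if (core.length >= 2 && core === upper && core !== lower) kind = "caps";
    return { ...word, kind };
  });

  // Call-to-action phrases span several words, so scan for consecutive matches.
  for (const phrase of CTA_PHRASES) {
    const parts = phrase.split(" ");
    for (let i = 0; i + parts.length <= tagged.length; i++) {
      const matches = parts.every(
        (part, j) => bare(tagged[i + j].text).toLowerCase() === part
      );
      if (matches) {
        for (let j = 0; j < parts.length; j++) tagged[i + j].kind = "cta";
      }
    }
  }
  return tagged;
}
```

The check order is a design choice: brand names win over everything else, and a call-to-action match overrides any single-word tag.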

How to Apply

For each detected word, specify:
  • Font size multiplier (e.g., 1.3x for emphasis, 1.5x for hero moments)
  • Color override (specific hex value)
  • Weight/style change (bolder, italic)
  • Animation variant (overshoot entrance, glow pulse, scale pop)
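
One way to record these four properties is an override table merged onto the base caption style. This is a sketch under assumptions: each word is presumed to carry a hypothetical `kind` label (such as `"brand"` or `"number"`), the hex value is a placeholder, and `WORD_OVERRIDES` and `resolveWordStyle` are illustrative names. The 1.3x and 1.5x multipliers come from the list above.

```js
// Hypothetical override table. null means "keep the base value".
// A single accent color is reused so emphasis stays consistent.
const ACCENT = "#FFD60A"; // placeholder hex value
const WORD_OVERRIDES = new Map([
  ["brand", { sizeMultiplier: 1.5, color: ACCENT, fontWeight: 800, animation: "scale-pop" }],
  ["number", { sizeMultiplier: 1.3, color: ACCENT, fontWeight: 800, animation: null }],
  ["emotional", { sizeMultiplier: 1.3, color: null, fontWeight: null, animation: "overshoot" }],
  ["cta", { sizeMultiplier: 1.3, color: ACCENT, fontWeight: null, animation: "glow-pulse" }],
]);

// Merge the override for `kind` onto the base style; unknown kinds return the base unchanged.
function resolveWordStyle(base, kind) {
  const override = WORD_OVERRIDES.get(kind);
  if (!override) return { ...base };
  return {
    fontSize: Math.round(base.fontSize * override.sizeMultiplier),
    color: override.color ?? base.color,
    fontWeight: override.fontWeight ?? base.fontWeight,
    animation: override.animation ?? base.animation,
  };
}
```

For example, with a base of `{ fontSize: 64, color: "#FFFFFF", fontWeight: 600, animation: "fade" }`, a `"brand"` word resolves to a 96px font size in the accent color.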

Script-to-Style Mapping

| Script tone | Font mood | Animation | Color | Size |
| --- | --- | --- | --- | --- |
| Hype/launch | Heavy condensed, 800-900 weight | Scale-pop, back.out(1.7), fast 0.1-0.2s | Bright accent on dark (cyan, yellow, lime) | Large 72-96px |
| Corporate/pitch | Clean sans-serif, 600-700 weight | Fade + slide-up, power3.out, 0.3s | White/neutral on dark, single muted accent | Medium 56-72px |
| Tutorial/educational | Mono or clean sans, 500-600 weight | Typewriter or gentle fade, 0.4-0.5s | High contrast, minimal color | Medium 48-64px |
| Storytelling/brand | Serif or elegant sans, 400-500 weight | Slow fade, power2.out, 0.5-0.6s | Warm muted tones, low opacity (0.85-0.9) | Smaller 44-56px |
| Social/casual | Rounded sans, 700-800 weight | Bounce, elastic.out, word-by-word | Playful colors, colored backgrounds on pills | Medium-large 56-80px |
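
The mapping can also be kept as data so a detected tone resolves to concrete numbers. The sketch below transcribes the table's values. The object shape, `STYLE_PRESETS`, `presetFor`, and `midpoint` are hypothetical names, the ease strings are copied verbatim from the table, and `null` marks a value the table does not specify.

```js
// Presets transcribed from the mapping table. Ranges are [min, max].
const STYLE_PRESETS = {
  "hype/launch": {
    font: "heavy condensed", fontWeight: [800, 900], animation: "scale-pop",
    ease: "back.out(1.7)", duration: [0.1, 0.2], fontSize: [72, 96],
  },
  "corporate/pitch": {
    font: "clean sans-serif", fontWeight: [600, 700], animation: "fade + slide-up",
    ease: "power3.out", duration: [0.3, 0.3], fontSize: [56, 72],
  },
  "tutorial/educational": {
    font: "mono or clean sans", fontWeight: [500, 600], animation: "typewriter or gentle fade",
    ease: null, duration: [0.4, 0.5], fontSize: [48, 64],
  },
  "storytelling/brand": {
    font: "serif or elegant sans", fontWeight: [400, 500], animation: "slow fade",
    ease: "power2.out", duration: [0.5, 0.6], fontSize: [44, 56],
  },
  "social/casual": {
    font: "rounded sans", fontWeight: [700, 800], animation: "bounce, word-by-word",
    ease: "elastic.out", duration: null, fontSize: [56, 80],
  },
};

// Pick a concrete value from the middle of a [min, max] range.
const midpoint = ([min, max]) => (min + max) / 2;

// Look up a preset by tone name (case-insensitive); throws on unknown tones.
function presetFor(tone) {
  const key = tone.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(STYLE_PRESETS, key)) {
    const known = Object.keys(STYLE_PRESETS).join(", ");
    throw new Error(`Unknown tone "${tone}"; expected one of: ${known}`);
  }
  return STYLE_PRESETS[key];
}
```

Throwing on an unknown tone, rather than silently falling back to a default, keeps a mistyped tone from producing an unintended style.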

Word Grouping by Tone

Group size affects pacing. Fast content needs fast caption turnover.
  • High energy: 2-3 words per group. Quick turnover matches rapid delivery.
  • Conversational: 3-5 words per group. Natural phrase length.
  • Measured/calm: 4-6 words per group. Longer groups match slower pace.
Break groups on sentence boundaries (`.` `?` `!`), pauses (>150ms gap), or max word count, whichever comes first.
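
The three break rules can be sketched as a single pass over the word array. This assumes words shaped like `{ text, start, end }` with times in seconds. `groupWords`, the tone keys, and the choice of each range's upper bound as the word limit are assumptions made for illustration.

```js
// Max words per group, using the upper bound of each range above.
const MAX_WORDS = { high: 3, conversational: 5, calm: 6 };
const PAUSE_SECONDS = 0.15; // a gap longer than 150ms counts as a pause

function groupWords(words, tone = "conversational") {
  const maxWords = MAX_WORDS[tone] ?? MAX_WORDS.conversational;
  const groups = [];
  let current = [];

  // Close the group being built and start a new one.
  const flush = () => {
    if (current.length === 0) return;
    groups.push({
      words: current,
      text: current.map((w) => w.text).join(" "),
      start: current[0].start,
      end: current[current.length - 1].end,
    });
    current = [];
  };

  words.forEach((word, i) => {
    current.push(word);
    const next = words[i + 1];
    // Sentence end: ., ? or ! optionally followed by closing quotes or brackets.
    const endsSentence = /[.?!]["')\]]*$/.test(word.text);
    const pauseFollows = next !== undefined && next.start - word.end > PAUSE_SECONDS;
    if (endsSentence || pauseFollows || current.length >= maxWords) flush();
  });
  flush(); // emit any trailing words
  return groups;
}
```

Abbreviations such as "U.S." would trigger a sentence break under this regex, so a real pipeline may want an exception list.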

Positioning

  • Landscape (1920x1080): Bottom 80-120px, centered
  • Portrait (1080x1920): Lower middle ~600-700px from bottom, centered
  • Never cover the subject's face
  • Use `position: absolute`, never `relative` (causes overflow)
  • One caption group visible at a time
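
As an illustration, the two layouts can be derived from the composition size. This is a sketch, not a fixed rule: `captionPosition` is a hypothetical helper, and the offsets are simply the midpoints of the ranges above (100px for landscape, 650px for portrait).

```js
// Returns inline style properties for the caption container.
function captionPosition(width, height) {
  const portrait = height > width;
  // Midpoints of the documented ranges: 80-120px landscape, ~600-700px portrait.
  const bottom = portrait ? 650 : 100;
  return {
    position: "absolute", // never relative (causes overflow)
    left: "50%",
    bottom: `${bottom}px`,
    transform: "translateX(-50%)", // horizontal centering
    textAlign: "center",
  };
}

// Usage in a browser context:
//   Object.assign(captionElement.style, captionPosition(1920, 1080));
```

If the entrance animation also writes to `transform` on the same element, centering on a wrapper element and animating an inner element avoids the two transforms overwriting each other.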

Constraints

  • Deterministic. No `Math.random()`, no `Date.now()`.
  • Sync to transcript timestamps. Words appear when spoken.
  • One group visible at a time. No overlapping caption groups.
  • Check project root for font files before defaulting to Google Fonts.
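
Two of these constraints lend themselves to small helpers. The sketch below assumes caption groups shaped like `{ start, end }`, in seconds and sorted by `start`. `enforceSingleGroup` and `pickVariant` are hypothetical names.

```js
// Clamp each group's end so it never runs past the next group's start,
// guaranteeing that at most one group is visible at any time.
function enforceSingleGroup(groups) {
  return groups.map((group, i) => {
    const next = groups[i + 1];
    const end = next ? Math.min(group.end, next.start) : group.end;
    return { ...group, end };
  });
}

// Deterministic alternative to Math.random(): derive variation from the
// group or word index, so every render produces the same result.
function pickVariant(index, variants) {
  return variants[index % variants.length];
}
```

For example, `pickVariant(i, ["slide-left", "slide-right"])` alternates entrances by index and yields identical output on every run.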