hyperframes-captions
Captions
Analyze the spoken content to determine caption style. If the user specifies a style, use that. Otherwise, detect tone from the transcript.
Transcript Source
The project's `transcript.json` contains word-level timestamps from whisper.cpp (`--output-json-full` with `--dtw`):

```json
{
  "transcription": [
    {
      "offsets": { "from": 0, "to": 5000 },
      "text": " Hello world.",
      "tokens": [
        { "text": " Hello", "offsets": { "from": 0, "to": 1000 }, "p": 0.98 },
        { "text": " world", "offsets": { "from": 1000, "to": 2000 }, "p": 0.95 }
      ]
    }
  ]
}
```

Normalize tokens into a word array before grouping:

```js
// `transcript` is the parsed contents of transcript.json.
const words = [];
for (const segment of transcript.transcription) {
  for (const token of segment.tokens || []) {
    const text = token.text.trim();
    if (!text) continue; // skip whitespace-only tokens
    words.push({
      text,
      start: token.offsets.from / 1000, // offsets are in milliseconds
      end: token.offsets.to / 1000,
    });
  }
}
```

If no `transcript.json` exists, check for `.srt` or `.vtt` files. If no transcript is available, ask the user to provide one or run `hyperframes transcribe` (when available).
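When only an `.srt` or `.vtt` file is available, the cues carry phrase-level timing rather than word-level timing. The following is a minimal sketch of one way to get back to the same `{ text, start, end }` word shape. It assumes word timing can be approximated by spreading each cue's duration evenly across its words, which the source does not specify. `parseCues` and `cuesToWords` are hypothetical helper names.

```javascript
// Matches both SRT ("00:00:01,000") and VTT ("00:01.000") timestamps.
const CUE_TIME =
  /(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})\s*-->\s*(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})/;

function toSeconds(h, m, s, ms) {
  return Number(h || 0) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Hypothetical helper: parse SRT or VTT text into { start, end, text } cues.
function parseCues(source) {
  const cues = [];
  // Cues are separated by blank lines in both formats.
  for (const block of source.replace(/\r/g, "").split(/\n{2,}/)) {
    const lines = block.split("\n");
    const timeIndex = lines.findIndex((line) => CUE_TIME.test(line));
    if (timeIndex === -1) continue; // e.g. the WEBVTT header block
    const m = lines[timeIndex].match(CUE_TIME);
    const text = lines
      .slice(timeIndex + 1)
      .join(" ")
      .replace(/<[^>]+>/g, "") // drop inline markup tags
      .trim();
    if (!text) continue;
    cues.push({
      start: toSeconds(m[1], m[2], m[3], m[4]),
      end: toSeconds(m[5], m[6], m[7], m[8]),
      text,
    });
  }
  return cues;
}

// Hypothetical helper: approximate word timing by dividing each cue evenly.
function cuesToWords(cues) {
  const words = [];
  for (const cue of cues) {
    const parts = cue.text.split(/\s+/);
    const step = (cue.end - cue.start) / parts.length;
    parts.forEach((text, i) => {
      words.push({ text, start: cue.start + i * step, end: cue.start + (i + 1) * step });
    });
  }
  return words;
}
```

Evenly spaced timing drifts from the actual speech inside long cues, so a word-level transcript is still preferable when one can be produced.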
Style Detection (Default — When No Style Is Specified)
Read the full transcript before choosing a style. The style comes from the content, not a template.
Four Dimensions
1. Visual feel — the overall aesthetic personality:
- Corporate/professional scripts → clean, minimal, restrained
- Energetic/marketing scripts → bold, punchy, high-impact
- Storytelling/narrative scripts → elegant, warm, cinematic
- Technical/educational scripts → precise, high-contrast, structured
- Social media/casual scripts → playful, dynamic, friendly
2. Color palette — driven by the content's mood:
- Dark backgrounds with bright accents for high energy
- Muted/neutral tones for professional or calm content
- High contrast (white on black, black on white) for clarity
- One accent color for emphasis — not multiple
3. Font mood — typography character, not specific font names:
- Heavy/condensed for impact and energy
- Clean sans-serif for modern and professional
- Rounded for friendly and approachable
- Serif for elegance and storytelling
4. Animation character — how words enter and exit:
- Scale-pop/slam for punchy energy
- Gentle fade/slide for calm or professional
- Word-by-word reveal for emphasis
- Typewriter for technical or narrative pacing
Per-Word Styling
Scan the script for words that deserve distinct visual treatment. Not every word is equal — some carry the message.
What to Detect
- Brand names / product names — larger size, unique color, distinct entrance
- ALL CAPS words — the author emphasized them intentionally. Scale boost, flash, or accent color.
- Numbers / statistics — bold weight, accent color. Numbers are the payload in data-driven content.
- Emotional keywords — "incredible", "insane", "amazing", "revolutionary" → exaggerated animation (overshoot, bounce)
- Proper nouns — names of people, places, events → distinct accent or italic
- Call-to-action phrases — "sign up", "get started", "try it now" → highlight, underline, or color pop
How to Apply
For each detected word, specify:
- Font size multiplier (e.g., 1.3x for emphasis, 1.5x for hero moments)
- Color override (specific hex value)
- Weight/style change (bolder, italic)
- Animation variant (overshoot entrance, glow pulse, scale pop)
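
The detection and override steps above can be sketched as a single pass over each word. The keyword list, multipliers, animation names, and hex value below are illustrative assumptions, and `styleWord` is a hypothetical helper. Brand names cannot be inferred reliably from text, so this sketch takes them as an argument.

```javascript
const EMOTIONAL = new Set(["incredible", "insane", "amazing", "revolutionary"]);
const ACCENT = "#FFD400"; // assumed accent color; choose one per project

// Hypothetical helper: attach styling overrides to a { text, start, end } word.
function styleWord(word, brandNames = []) {
  // Strip surrounding punctuation but keep "$" and "%" so "$5," reads as "$5".
  const bare = word.text.replace(/^[^\p{L}\p{N}$]+|[^\p{L}\p{N}%]+$/gu, "");
  const lower = bare.toLowerCase();
  const brands = brandNames.map((name) => name.toLowerCase());

  if (brands.includes(lower)) {
    return { ...word, kind: "brand", sizeMultiplier: 1.5, color: ACCENT, animation: "scale-pop" };
  }
  if (/\d/.test(bare)) {
    return { ...word, kind: "number", sizeMultiplier: 1.3, color: ACCENT, fontWeight: 800 };
  }
  if (EMOTIONAL.has(lower)) {
    return { ...word, kind: "emotional", sizeMultiplier: 1.3, animation: "overshoot" };
  }
  // Two or more letters, all uppercase: treated as intentional emphasis.
  if (/^\p{Lu}{2,}$/u.test(bare)) {
    return { ...word, kind: "caps", sizeMultiplier: 1.3, color: ACCENT };
  }
  return { ...word, kind: "default", sizeMultiplier: 1 };
}
```

Multi-word call-to-action phrases and proper nouns need a pass over adjacent words or a curated list, which this sketch leaves out.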
Script-to-Style Mapping
| Script tone | Font mood | Animation | Color | Size |
|---|---|---|---|---|
| Hype/launch | Heavy condensed, 800-900 weight | Scale-pop, back.out(1.7), fast 0.1-0.2s | Bright accent on dark (cyan, yellow, lime) | Large 72-96px |
| Corporate/pitch | Clean sans-serif, 600-700 weight | Fade + slide-up, power3.out, 0.3s | White/neutral on dark, single muted accent | Medium 56-72px |
| Tutorial/educational | Mono or clean sans, 500-600 weight | Typewriter or gentle fade, 0.4-0.5s | High contrast, minimal color | Medium 48-64px |
| Storytelling/brand | Serif or elegant sans, 400-500 weight | Slow fade, power2.out, 0.5-0.6s | Warm muted tones, low opacity (0.85-0.9) | Smaller 44-56px |
| Social/casual | Rounded sans, 700-800 weight | Bounce, elastic.out, word-by-word | Playful colors, colored backgrounds on pills | Medium-large 56-80px |
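
One way to make the table usable from code is a preset lookup keyed by tone. The key names, the object shape, and the fallback to the corporate preset are assumptions. The numeric values and easing strings are copied from the table (the easing names follow GSAP's ease naming), and `null` marks a value the table does not give.

```javascript
// Ranges are [min, max]; durations are in seconds, font sizes in px.
const CAPTION_PRESETS = {
  hype: { fontWeight: [800, 900], ease: "back.out(1.7)", duration: [0.1, 0.2], fontSize: [72, 96] },
  corporate: { fontWeight: [600, 700], ease: "power3.out", duration: [0.3, 0.3], fontSize: [56, 72] },
  tutorial: { fontWeight: [500, 600], ease: null, duration: [0.4, 0.5], fontSize: [48, 64] },
  storytelling: { fontWeight: [400, 500], ease: "power2.out", duration: [0.5, 0.6], fontSize: [44, 56] },
  social: { fontWeight: [700, 800], ease: "elastic.out", duration: null, fontSize: [56, 80] },
};

// Hypothetical helper: look up a preset, falling back to "corporate"
// (an assumed default, not one the table prescribes).
function getPreset(tone) {
  return CAPTION_PRESETS[tone] ?? CAPTION_PRESETS.corporate;
}
```

Keeping ranges rather than single values leaves room to pick a size within the range based on the longest caption group.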
Word Grouping by Tone
Group size affects pacing. Fast content needs fast caption turnover.
- High energy: 2-3 words per group. Quick turnover matches rapid delivery.
- Conversational: 3-5 words per group. Natural phrase length.
- Measured/calm: 4-6 words per group. Longer groups match slower pace.
Break groups on sentence boundaries (`.` `?` `!`), pauses (>150ms gap), or max word count, whichever comes first.
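
The three break rules can be sketched as one loop over the normalized word array. `groupWords` is a hypothetical helper, and `maxWords` would come from the tone (for example 3 for high energy, 5 for conversational, 6 for measured). The sketch assumes sentence punctuation is attached to the word text; if the transcript emits punctuation as separate tokens, merge them into the preceding word first.

```javascript
// Hypothetical helper: split { text, start, end } words into caption groups.
function groupWords(words, maxWords, pauseSeconds = 0.15) {
  const groups = [];
  let current = [];

  const flush = () => {
    if (current.length === 0) return;
    groups.push({
      words: current,
      text: current.map((w) => w.text).join(" "),
      start: current[0].start,
      end: current[current.length - 1].end,
    });
    current = [];
  };

  words.forEach((word, i) => {
    current.push(word);
    const next = words[i + 1];
    // Sentence end, allowing a trailing quote or bracket after the mark.
    const endsSentence = /[.?!]["')\]]*$/.test(word.text);
    const pauseFollows = next !== undefined && next.start - word.end > pauseSeconds;
    if (endsSentence || pauseFollows || current.length >= maxWords) flush();
  });

  flush(); // emit any trailing words
  return groups;
}
```

Each group's `start` and `end` come straight from its first and last word, so captions stay tied to the transcript timestamps.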
Positioning
- Landscape (1920x1080): Bottom 80-120px, centered
- Portrait (1080x1920): Lower middle ~600-700px from bottom, centered
- Never cover the subject's face
- Use `position: absolute`, never `relative` (causes overflow)
- One caption group visible at a time
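
As a rough sketch, the positioning rules can be expressed as a style object chosen by orientation. `captionContainerStyle` is a hypothetical helper, and the 100px and 650px offsets are assumed midpoints of the ranges above. It does not check for the subject's face, which still needs a manual look at the footage.

```javascript
// Hypothetical helper: inline styles for the caption container.
function captionContainerStyle(width, height) {
  const portrait = height > width;
  return {
    position: "absolute", // never "relative"
    left: "0",
    right: "0",
    bottom: portrait ? "650px" : "100px", // assumed midpoints of the ranges
    textAlign: "center",
  };
}
```

Centering with `left`/`right` and `textAlign` keeps `transform` free for entrance animations.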
Constraints
- Deterministic. No `Math.random()`, no `Date.now()`.
- Sync to transcript timestamps. Words appear when spoken.
- One group visible at a time. No overlapping caption groups.
- Check project root for font files before defaulting to Google Fonts.
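
The determinism and no-overlap constraints can be enforced with pure functions of the group list and the playback time. This is a sketch under the assumption that groups are sorted by `start` and shaped as `{ start, end, ... }`. `clampGroupEnds` and `activeGroupAt` are hypothetical names.

```javascript
// Hypothetical helper: trim each group so it ends no later than the next begins.
function clampGroupEnds(groups) {
  return groups.map((group, i) => {
    const next = groups[i + 1];
    const end = next ? Math.min(group.end, next.start) : group.end;
    return { ...group, end };
  });
}

// Hypothetical helper: the single group visible at `time` (seconds), or null.
// Depends only on its arguments, so the same time always yields the same result.
function activeGroupAt(groups, time) {
  return groups.find((g) => time >= g.start && time < g.end) ?? null;
}
```

Because the interval is half-open (`start` inclusive, `end` exclusive), a frame that falls exactly on a boundary shows only the incoming group.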