wjs-converting-text-to-video
Convert a Wang Jianshuo-style WeChat Official Account article (`article.md`) into a 1080×1920 vertical, 30-90 second Chinese narrated short video: TTS voiceover + HyperFrames CSS/GSAP animations + abstract watercolor background + transition SFX. The output is an MP4 for WeChat Channels / Douyin / Xiaohongshu / Reels.
What this skill produces
| Dimension | Default |
|---|---|
| Resolution | 1080×1920 vertical (9:16) |
| Duration | 30-90 seconds |
| Number of scenes | 5-10 |
| Voiceover | Volcano Engine (火山引擎) TTS, default "Ahu Conversation" (阿虎对话) male voice |
| Background | Abstract watercolor generated by GPT Image 2 |
| Font | Noto Sans SC, hero weight 900, main text in warm cream white |
| Output | |
| Publishing | Auto-upload to YouTube: portrait → Shorts, landscape → regular video; re-rendering replaces the old video (uploads do not accumulate) |
When this skill fires
- The user already has an `article.md` and says 「做成视频」 ("make it a video"), 「做一个解说」 ("make a narrated explainer"), or 「讲一遍」 ("talk it through")
- The user runs `/wjs-converting-text-to-video <article-folder>`
- The user makes a batch request such as "turn all X articles I posted yesterday into videos"
When NOT to use
- No article draft, only an idea → first use `/wjs-publishing-wechat` to write `article.md`, then come back
- The user wants subtitle burning / translation / voiceover replacement → use `/wjs-burning-subtitles` / `/wjs-dubbing-video` / `/wjs-localizing-video`
- The video must be in English, Spanish, or another non-Chinese language → this skill focuses on Chinese TTS (Volcano Engine); for other languages use hyperframes' built-in tts command (Kokoro is acceptable for English)
- Landscape 16:9 → this skill defaults to vertical; switch to landscape only when the user explicitly asks for it
Core Principle
A video is not a read-aloud visualization of the article; it is a visual reconstruction of it.
Each scene is an independent visual moment: a contrast, a parallelism, a number, a metaphor. Text fills the screen in heavy bold type, with key words highlighted in orange. The background is an abstract watercolor (softened with blur), and the overall tone is steady, restrained, and impactful.
Rhythm > templates. A 5-10 scene video that uses the same "two-line comparison" layout from start to finish is not a video, it is a slideshow. A modern feel comes from contrast: extreme font-size differences, asymmetric layouts, short scenes alternating with long ones, text-only scenes alternating with geometric-element scenes, watercolor-background scenes alternating with bright punch scenes.
The default is mediocre. If you only pick the few easiest templates from the top of the table, the result is guaranteed to be a flat "two-line" format. Always enforce the ratios in the Step 1b Scene Mix Rule.
Workflow
Step 1: Design 5-10 visual moments
Read `<article-folder>/article.md` and split it into 5-10 scenes following the structure of its argument (keep the total duration within 30-90 seconds). Short articles (1-2 core points) get 5-6 scenes / 30-50s; long articles get 8-10 scenes / 60-90s. Each scene has one narration segment plus one clear visual skeleton.
Template library: 6 categories, 16 templates in total, mix as needed:
A. Hero / Punch (high-contrast climax, ≥1 per video, duration ≤4s)
| Template | Suitable for |
|---|---|
| A1. Full-screen single-word hero | A 1-3 character climax word filling the screen, font size 280-400px |
| A2. Outline hero | Hollow (outlined) text |
| A3. Color-flip punch | Whole-screen background switches to a bright color (orange / red / gold / emerald green, etc.) with reversed-out text |
| A4. Gradient text hero | Large gradient-filled text |
B. Contrast / Comparison (contrast structure, 1-2 per video, duration 5-8s)
| Template | Suitable for |
|---|---|
| B1. Two-line comparison + strikethrough | "Previously X, now Y" / "Not A, but B"; at most 2 per video |
| B2. Left-right split-screen comparison | Screen divided into two halves (optionally with a vertical separator line) |
| B3. Diagonal comparison | Top-left ↔ bottom-right, with generous blank space in between |
C. List / Structure (parallel items, 1-2 per video, duration 6-10s)
| Template | Suitable for |
|---|---|
| C1. Row of N cards | 3-5 parallel items, dark warm black fill + single-color border |
| C2. Vertically stacked keywords | 6-8 parallel items, optionally with large numbering 01-08 |
| C3. True grid | 2×2 / 3×2 grid, each cell with icon + label (vertical width is limited; a 4-column row gets crowded) |
| C4. Stepped / staggered list | Each item has an increasing offset |
D. Stat / Data (number climax, ≥1 per video, duration 4-6s)
| Template | Suitable for |
|---|---|
| D1. Number ticker | 0 → N rolling-count animation |
| D2. Number + label | Main number at 200-400px + explanation at 60-80px |
| D3. Progress bar / timeline | Horizontal progress bar + nodes |
E. Quote / Climax (landing point for a key line, 1-2 per video, duration 6-10s)
| Template | Suitable for |
|---|---|
| E1. Paragraph-level hero text | One key line at 60-100px, left-aligned, with an emphasis bar on the left |
| E2. Large quotation mark + body text | A huge semi-transparent opening quotation mark as background decoration |
F. Decoration / Geometry (rhythm seasoning, optional)
| Template | Suitable for |
|---|---|
| F1. Grid + spinner / progress bar | Scenes showing many things running concurrently |
| F2. Dialogue bubble ↔ response | Character A speaks → character B acts |
Keep each scene's narration to 3-12 seconds (short punch scenes 3-4s, long breathing scenes 10-12s; do not make every scene 5-7s). All scenes together should total 30-90 seconds, never more than 90. A short article gets a short video: 5 scenes × 6s = 30s is perfectly acceptable.
Step 1b: Scene Mix Rule (Mandatory)
After designing the 5-10 scenes, check them against the checklist below. If any item fails, go back and adjust.
Ratio Hard Rules
- ≥1 scene from each of categories A, C, D, and E
- ≤2 scenes using the B1 template (two-line strikethrough, historically the most overused)
- ≥1 A3 color-flip scene (bright background with reversed-out text)
- ≥4 different template categories (at least 4 of A/B/C/D/E/F)
- No more than 2 consecutive scenes from the same category
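The ratio rules above are mechanical enough to lint before any HTML is written. The following is a minimal sketch, not part of the skill's scripts: `checkRatioRules` is a hypothetical helper, and it assumes the scene plan is given as an array of template IDs such as `"A3"` or `"B1"`, whose first letter is the category.

```javascript
// Hypothetical helper: lint a scene plan against the Ratio Hard Rules.
// `templates` is an assumed input shape, e.g. ["A1", "B1", "C2", "D2", "A3", "E1"].
function checkRatioRules(templates) {
  const problems = [];
  const categories = templates.map((t) => t[0]); // "A3" -> "A"
  const count = (pred) => templates.filter(pred).length;

  // At least one scene from each of categories A, C, D, E
  for (const cat of ["A", "C", "D", "E"]) {
    if (!categories.includes(cat)) problems.push(`missing category ${cat}`);
  }
  // At most two B1 scenes
  if (count((t) => t === "B1") > 2) problems.push("more than 2 B1 scenes");
  // At least one A3 color-flip scene
  if (count((t) => t === "A3") < 1) problems.push("no A3 color-flip scene");
  // At least four distinct categories
  if (new Set(categories).size < 4) problems.push("fewer than 4 categories");
  // No more than two consecutive scenes from the same category
  for (let i = 2; i < categories.length; i++) {
    if (categories[i] === categories[i - 1] && categories[i] === categories[i - 2]) {
      problems.push(`3 consecutive ${categories[i]} scenes starting at scene ${i - 1}`);
      break;
    }
  }
  return problems; // empty array means the plan passes
}
```

An empty result means the plan satisfies every ratio rule; otherwise each string names one rule to revisit.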
Rhythm Hard Rules
- Scene duration span ≥ 6s (shortest ≤ 4s, longest ≥ 9s)
- ≥2 rhythm switches such as "short → long → short" or "long → short"
- Font size span ≥ 240px (largest hero ≥ 320px, smallest ≤ 80px)
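The numeric rhythm rules can be checked the same way. This is a sketch under stated assumptions: `checkRhythmRules` is a hypothetical helper, each scene is assumed to carry a `duration` in seconds and a `fontSizes` array in px, and the "rhythm switch" rule is left to human judgment because the text does not define it precisely.

```javascript
// Hypothetical helper: lint durations and font sizes against the Rhythm Hard Rules.
// Assumed scene shape: { duration: seconds, fontSizes: [px, ...] }.
function checkRhythmRules(scenes) {
  const problems = [];
  const durations = scenes.map((s) => s.duration);
  const fonts = scenes.flatMap((s) => s.fontSizes);

  const minD = Math.min(...durations);
  const maxD = Math.max(...durations);
  if (minD > 4) problems.push("shortest scene is longer than 4s");
  if (maxD < 9) problems.push("longest scene is shorter than 9s");
  if (maxD - minD < 6) problems.push("duration span is under 6s");

  const minF = Math.min(...fonts);
  const maxF = Math.max(...fonts);
  if (maxF < 320) problems.push("largest font is under 320px");
  if (minF > 80) problems.push("smallest font is over 80px");
  if (maxF - minF < 240) problems.push("font size span is under 240px");

  return problems; // empty array means the plan passes
}
```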
Layout Hard Rules
- ≥2 scenes with a non-centered layout (corner-anchored, diagonal, left-aligned, stepped, etc.)
- ≥1 scene where blank space covers ≥ 60% of the screen (breathing room)
- ≥1 scene containing geometric decoration (thick lines, color blocks, arrows, dots, large numbering)
Color Hard Rules
- Most scenes set no `background:` color, so the watercolor bg-image shows through; only A3 color-flip scenes use a solid-color background
- Color-flip scenes are not limited to orange / blue / white (deep red, deep gold, emerald green, pine green, dark purple, etc. all work)
- Use at least 2-3 emphasis colors (blue for technical terms, gold for value terms, green for growth terms, red for warning terms)
Anti-Monotony Self-Check
- Screenshot every scene, shrink the screenshots to thumbnails, and line them up side by side. Can you tell them apart at a glance? If 8 of them look the same, redo them
- Do scenes 1, 4, and 7 differ in visual density? Some should be dense and some extremely minimal
- Is there a "meta-rhythm"? For example: A opening → 3 B/C scenes of development → D climax → E closing, which has more of a dramatic arc than a linear sequence
Step 2: Write narration_chunks.json

```json
[
  {"id": "s01", "text": "我们以前,是 AI 的领导。现在,我们就是它的维修工。"},
  {"id": "s02", "text": "..."}
]
```

Narration writing details:
- Make it more colloquial and more clipped than article.md; use plenty of commas and periods so the TTS pauses naturally
- Mixed numbers and English are fine ("Claude Code", "100 倍"); Volcano reads both
- Do not write parenthetical comments, `...`, or dashes `——` (the TTS reads the dash aloud as the word "破折号")
- Remove `**bold markdown**` carried over from article.md and keep plain text only
- Remove Baixing.com (百姓网) facts: if article.md mentions 「百姓网」, 「百姓网现在 X 人」, 「百姓网员工」 and the like, strip or generalize them ("百姓网现在 158 个人" → "现实里没几个真人"). This information is outdated and must not enter the video. Likewise, visuals must not show a "百姓网" label or a "158 人" stat. See [[no-baixing-facts]]
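The mechanical parts of these rules (bold markers, parenthetical comments, dashes, ellipses) can be stripped with a small pre-pass before the text goes to TTS. A minimal sketch, assuming a hypothetical helper named `cleanNarration`; replacing a dash or ellipsis with a full-width comma is an illustrative choice, not something the skill prescribes, and the Baixing rule still needs manual rewriting.

```javascript
// Hypothetical pre-pass for narration text. Non-ASCII punctuation is written
// as Unicode escapes: \u2014 em dash, \u2026 ellipsis, \uFF08/\uFF09 full-width
// parentheses, \uFF0C full-width comma.
function cleanNarration(text) {
  return text
    .replace(/\*\*(.+?)\*\*/g, "$1")                      // **bold** -> bold
    .replace(/[(\uFF08][^()\uFF08\uFF09]*[)\uFF09]/g, "") // drop parenthetical comments
    .replace(/\u2014+|\u2026+|\.{3,}/g, "\uFF0C")         // dash / ellipsis -> comma pause (assumed choice)
    .replace(/[,\uFF0C]{2,}/g, "\uFF0C")                  // collapse doubled commas
    .replace(/^[,\uFF0C]+|[,\uFF0C]+$/g, "")              // no leading or trailing comma
    .trim();
}
```

Running every `text` field of `narration_chunks.json` through such a function before Step 3 keeps the TTS from reading markup aloud.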
Step 3: Generate TTS narration

```bash
cd <article-folder>/video
python3 tts_narration.py
```

The script defaults to `zh_male_ahu_conversation_wvae_bigtts` (Ahu Conversation), inserts 0.35s of silence between segments, and outputs `narration.mp3` + `timing.json`.
Volcano TTS notes (lessons learned the hard way):
- Use resource `volc.service_type.10029` and pick a speaker matching `zh_*_*_bigtts`
- Never pass `emotion` / `emotion_scale`: most `_bigtts` voices return `data: null` and fail silently
- Never use Kokoro (hyperframes' built-in tts) for Chinese: the quality is poor and the user explicitly does not accept it
- Avoid `zh_male_jieshuonansheng_mars_bigtts`: text containing English proper nouns (such as "Claude Code") makes it loop and hallucinate

Alternative voices (in recommended order):
- `zh_male_ahu_conversation_wvae_bigtts` (Ahu Conversation): default, natural spoken style
- `zh_male_M392_conversation_wvae_bigtts`: same wvae series
- `zh_male_wennuanahu_moon_bigtts` (Warm Ahu): warmer, broadcast-like
- `zh_male_silang_mars_bigtts` (Silang): calm and thoughtful, strongly dramatic
- `zh_male_baqiqingshu_mars_bigtts` (Domineering): more forceful

Switch voices:

```bash
python3 tts_narration.py --voice zh_male_silang_mars_bigtts
```
Step 4: Generate watercolor background image
The bg-image sets the main visual tone (a softened abstract watercolor). Do not use the article's `illustration.png`: hand-drawn diagrams carry too much detail and turn into uniform dark mud once blurred (visually still pure black). Always use a purpose-generated abstract watercolor.

```bash
~/.claude/skills/wjs-converting-text-to-video/scripts/generate-bg.sh <article-folder> <theme>
```

Choose `<theme>` based on the article's topic:
| theme | Color palette | Suitable for |
|---|---|---|
| | bright warm yellow, soft coral pink, terracotta, sage green, cream | Personal, handmade, warm |
| | cool teal, electric blue, deep purple, mint, white | AI, technology, data |
| | sage green, dusty blue, lavender, pearl, cream | Reflection, calm |
| | burnt orange, deep red, mustard, charcoal | Warning, tension |
| | fresh green, gold, soft yellow, sky blue | Growth, compounding |
| | lavender, dusty rose, sage, soft amber | Abstract, philosophical |

Output: `<article-folder>/video/bg.png` (1088×1920, ~3MB).
⚠️ The image must live inside the `video/` directory. Do not reference `../illustration.png`: hyperframes render does not resolve cross-directory relative paths, and the result renders as pure black.
Step 5: Write HyperFrames composition (`index.html`)
Read `timing.json` and design each scene around its chunk's start/end times. Structure for a 1080×1920 vertical composition:

```html
<html>
<head>
<meta charset="utf-8">
<script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
<style>
  html, body {
    width: 1080px; height: 1920px; margin: 0; overflow: hidden;
    background: #0e0b08; /* fallback, normally hidden by bg-image + overlay */
    font-family: 'Noto Sans SC', 'PingFang SC', 'Heiti SC', sans-serif;
    font-weight: 900;
    color: #f5efe5;
    letter-spacing: -0.02em;
    -webkit-font-smoothing: antialiased;
  }
  #bg-image {
    position: absolute; inset: 0;
    background-image: url('bg.png'); /* must sit in the same video/ directory */
    background-size: cover;
    background-position: center;
    filter: blur(30px) brightness(0.65) saturate(0.85);
    z-index: 0;
    transform: scale(1.1); /* hides the soft edges produced by the blur */
  }
  #bg-overlay {
    position: absolute; inset: 0;
    background: rgba(14, 11, 8, 0.28);
    z-index: 1;
  }
  .scene { position: absolute; inset: 0; overflow: hidden; opacity: 0; z-index: 2; }
  #s1 { opacity: 1; } /* first scene is visible at t=0 */
  /* ... scene-specific styles ... */
</style>
</head>
<body>
  <div id="root" data-composition-id="main" data-start="0" data-duration="<total+2>" data-width="1080" data-height="1920">
    <div id="bg-image"></div>
    <div id="bg-overlay"></div>
    <!-- scene divs s1..sN -->
    <!-- audio: narration + ticks + chimes + bell -->
  </div>
  <script>
    /* GSAP timeline: paused + register to window.__timelines['main'] */
  </script>
</body>
</html>
```
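Turning `timing.json` into scene windows is simple arithmetic, sketched below. The schema of `timing.json` is not shown in this document, so the sketch assumes an array of `{ id, start, end }` objects in seconds; `planScenes` is a hypothetical helper. Each scene is held until the next chunk starts (covering the silence gap), and the composition length follows the `<total+2>` placeholder used in the template.

```javascript
// Hypothetical helper. Assumed timing.json shape: [{ id, start, end }, ...] in seconds.
function planScenes(timing, tailSeconds = 2) {
  const scenes = timing.map((chunk, i) => {
    const next = timing[i + 1];
    // Hold the scene through the inter-chunk silence until the next chunk starts.
    const end = next ? next.start : chunk.end;
    return {
      id: chunk.id,
      start: chunk.start,
      end,
      duration: Number((end - chunk.start).toFixed(3)),
    };
  });
  const total = timing.length ? timing[timing.length - 1].end : 0;
  // data-duration in the template is written as <total+2>.
  return { scenes, total, compositionDuration: total + tailSeconds };
}
```

The resulting `start` values are what the GSAP tweens and the `data-start` attributes of the SFX tags would be positioned against.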
🎬 First Frame Rule (Mandatory)
At t=0 the video must show:
- The bg-image, fully visible: always opacity 1, never faded in (it is visible by default in CSS; do not touch its opacity in GSAP)
- The title element: the main title of s1 may enter with `tl.from({y:30, scale:0.95})`, but not with `tl.from({opacity:0})`, otherwise the frame at t=0 is a black screen
- s1 must not be an A3 color-flip: it would cover the bg-image and hide the watercolor in the first frame. Save color-flip for s2 and later
Color System
Main text / anchor colors (design system, consistent across the whole video):
| Role | Value | Usage |
|---|---|---|
| Main text | | Hero / main content |
| Secondary text (subtitle, caption) | | Do not use gray |
| Strikethrough text itself | | Do not use |
| Decorative large numbering (01-08) | | Do not use |
| Outline stroke | | A2 hollow text |
| Default fallback bg | | Covered by bg-image + overlay; not used for color-flip |

Core principle: all text uses cream `#f5efe5` or the orange family `#e87a3e` (accent palette). Build hierarchy with opacity and size, not with hue changes. Gray is a relic of the black-background era and is never used on a watercolor background. See [[no-low-contrast-text]]
Color-flip background palette (A3, not limited to orange / blue / white):
| hex | Suitable for |
|---|---|
| | Warning, emphasis, climax punch |
| | Data, technology climax |
| | Closing, quiet contrast |
| | Warning, error climax |
| | Achievement, value climax |
| | Growth, compounding, vitality |
| | Calm, long-termism |
| | Wisdom, mysterious climax |
| | Soft, human |

Text on a color-flip background uses `#0e0b08` or `#f5efe5`, whichever inverts against the background.
Emphasis / accent palette (not limited to orange):
| hex | Suitable for |
|---|---|
| | Default emphasis |
| | Data, technical terms, AI |
| | Value, achievement |
| | Growth, good outcomes |
| | Long-term, stable |
| | Warning, contrast |
| | Abstract, wisdom |
| | Soft, humanizing |

Use at least 2-3 emphasis colors across the video, choosing the accent by scene theme.
Font System (vertical screen, 1080px wide)
| Item | Value |
|---|---|
| Font weight | hero 900 / main text 800 / secondary 600-700 / caption 500 |
| Letter spacing | hero |
| Punch hero (A1/A2, 1-3 characters) | 280-400px |
| Short-phrase hero (4-6 characters) | 160-240px |
| Long-phrase hero (7-10 characters) | 100-150px |
| Card content | 56-130px |
| Subtitle | 40-72px |
| Caption / numbering / label | 20-40px |
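The hero rows of this table can be combined with the 1080px canvas width to pick a concrete size. The sketch below is illustrative only: `heroFontSize` is a hypothetical helper, and it assumes a single-line hero where each CJK glyph is roughly 1em wide, which ignores the negative letter spacing and any Latin characters.

```javascript
// Hero size ranges copied from the font table (px), keyed by character count.
const HERO_SIZE_RANGES = [
  { maxChars: 3, min: 280, max: 400 },
  { maxChars: 6, min: 160, max: 240 },
  { maxChars: 10, min: 100, max: 150 },
];

// Hypothetical helper: largest size in the table's range that fits one line.
// Assumption: each glyph is about 1em wide (reasonable for CJK, not for Latin).
function heroFontSize(text, { canvasWidth = 1080, padding = 40 } = {}) {
  const chars = Array.from(text.replace(/\s/g, "")).length;
  const range = HERO_SIZE_RANGES.find((r) => chars <= r.maxChars);
  if (!range || chars === 0) return null; // over 10 characters: not a hero line
  const fit = Math.floor((canvasWidth - 2 * padding) / chars);
  return Math.max(range.min, Math.min(range.max, fit));
}
```

With the default 40px padding, a three-character word such as 维修工 lands at 333px, inside the 280-400px band, while a single character gets the full 400px.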
Layout System (against the habit of centering everything)
| Layout | CSS key points | Suitable for |
|---|---|---|
| Centered | | Category A hero scenes, but ≤50% of all scenes |
| Left-aligned, pinned to top | | Category E key lines, long quotes |
| Anchored bottom-right | | Sign-off, climax word |
| Diagonal | top-left / bottom-right | B3 diagonal comparison |
| Grid | | C3 (2×N rather than 3×N on a vertical screen) |
| Stepped | Each item has | C4 staggered list |
| Pinned to bottom + blank space above | | Breathing scenes |
| Small corner element | Small text pinned to one corner, everything else empty | Minimal / blank-space punch |

Padding: 40-80px for screen-filling scenes, 120-200px for breathing scenes. Do not give every scene the same padding.
Geometric Decorative Elements
Use one every few scenes:
- Thick short line, 8-16px × 40-200px, as an emphasis bar, orange
- Left-side emphasis bar, 6px × 100%, paired with a long quote
- Large numbering 01-08 as list item indices (light gray, huge, decorative)
- Large quotation mark character `"`, semi-transparent and oversized, placed top-left
- Horizontal separator line, 2-4px, cream white at 30% opacity
- Dot / square, 12-20px, orange, as a list bullet
- Arrow ➜ or a hand-drawn SVG
Scene Transitions (4 types + mixing rules)
Do not use blur crossfade for the whole video. Every run of 4 transitions must contain at least 2 types.
T1. Blur crossfade (default, soft)
- 0.6s, `sine.inOut`
- Incoming scene: `opacity: 0, filter: blur(24px)` → `opacity: 1, filter: blur(0)`
- Outgoing scene fades out and blurs at the same time

T2. White flash cut (punch cut, the most modern)
- 0.18s in total: 60ms white flash → cut → 40ms new scene scale 1.05 → 1
- Suitable for: entering a Category A hero, a Category D stat, or a climax switch

```js
// T = cut time in seconds; prevScene / nextScene = the two scene elements
tl.to('.flash', { opacity: 1, duration: 0.06, ease: 'none' }, T - 0.06)   // flash in
  .set(prevScene, { opacity: 0 }, T)                                      // hard cut
  .set(nextScene, { opacity: 1 }, T)
  .to('.flash', { opacity: 0, duration: 0.12, ease: 'power2.out' }, T)    // flash out
  .from(nextScene, { scale: 1.05, duration: 0.25, ease: 'expo.out' }, T); // settle
```

T3. Scale push (a sense of pushing in)
- 0.55s; outgoing scene `scale: 1 → 0.85`, incoming scene `scale: 1.15 → 1`
- Suitable for: moving from an overview into a detail

T4. Color flash cut (a quick orange / blue flash, strong rhythm)
- 0.22s in total: 80ms full-screen orange → cut → 40ms release
- Suitable for: entering an A3 color-flip or a key turning point
- At most 2 per video

For the flash overlay, add a `<div class="flash">` to the HTML: positioned full-screen, opacity 0 by default, z-index 100.
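The two mixing rules can be checked on the planned transition sequence before building the timeline. This is a sketch, assuming a hypothetical helper `checkTransitions` that receives the transition types in playback order; it reads "every 4 transitions" as every sliding window of 4 consecutive transitions, which is one possible interpretation of the rule.

```javascript
// Hypothetical helper: `types` lists transitions in order, e.g. ["T1", "T2", "T1", "T3"].
function checkTransitions(types) {
  const problems = [];
  // Every window of 4 consecutive transitions needs at least 2 distinct types.
  for (let i = 0; i + 4 <= types.length; i++) {
    const run = types.slice(i, i + 4);
    if (new Set(run).size < 2) {
      problems.push(`transitions ${i + 1}-${i + 4} all use ${run[0]}`);
    }
  }
  // T4 (color flash cut) may appear at most twice per video.
  if (types.filter((t) => t === "T4").length > 2) {
    problems.push("T4 used more than 2 times");
  }
  return problems; // empty array means the sequence passes
}
```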
Entrance Animation Rules
- Every element in every scene enters with `tl.from(...)` (y / opacity / scale)
- Entrance stagger 0.1-0.3s; the first element starts at t = scene.start + 0.3
- Use at least 3 different eases (`power3.out` / `back.out(1.3)` / `expo.out` / `elastic.out(1, 0.5)`)
- Do not use `gsap.to({opacity: 0})` for exits, since the transition already handles them. Only the last scene may fade to black
- The video as a whole must use at least 3 of the Modern Motion Techniques
Modern Motion Techniques
Half the difference between a mediocre video and a modern one is layout; the other half is motion. Each video must use at least 3 of the 7 techniques below (apply them in specific scenes; do not pile them on throughout).
1. Kinetic Typography (character stagger entrance), Category A hero

```html
<!-- one <span> per character so each can be animated separately -->
<h1 class="kinetic"><span>维</span><span>修</span><span>工</span></h1>
```

```js
// Spans need `display: inline-block` in CSS for transforms to apply.
tl.from('.kinetic span', {
  y: 180, opacity: 0, rotateX: -90,
  duration: 0.7, stagger: 0.06,
  ease: 'back.out(1.4)',
  transformOrigin: '50% 100%',
}, T);
```
2. Camera Punch (Push in / Pull out) — A3, Category D
```js
// scene = the scene element; sceneStart = its start time in seconds
tl.from(scene, { scale: 1.15, opacity: 0, duration: 0.5, ease: 'expo.out' }, sceneStart);
```
3. Mask Reveal (clip-path reveal) — Category E quote
```css
.reveal { clip-path: inset(0 100% 0 0); } /* fully clipped from the right */
```

```js
tl.to('.reveal', { clipPath: 'inset(0 0% 0 0)', duration: 0.9, ease: 'expo.inOut' }, T);
```
4. Number Ticker (Number scrolling) — D1
```html
<div class="ticker" data-end="3600">0</div>
```

```js
const ticker = document.querySelector('.ticker');
const obj = { val: 0 }; // proxy object that GSAP tweens
tl.to(obj, {
  val: parseInt(ticker.dataset.end, 10),
  duration: 1.8, ease: 'power2.out',
  // write the rounded, locale-formatted value on every tick
  onUpdate: () => { ticker.textContent = Math.round(obj.val).toLocaleString(); },
}, T);
```
5. Outline → Fill (Hollow text to solid) — A2
```css
.morph { -webkit-text-stroke: 4px #f5efe5; color: transparent; }
```

```js
tl.to('.morph', { color: '#e87a3e', webkitTextStrokeColor: '#e87a3e', duration: 0.5, ease: 'power2.out' }, T);
```
6. Letter Highlight Sweep (Keyword sweep highlight) — Category E climax word
```html
<span class="sweep"><span class="sweep-bg"></span>搭档</span>
```

```css
.sweep { position: relative; display: inline-block; padding: 0 8px; }
.sweep-bg { position: absolute; inset: 0; background: #e87a3e; transform: scaleX(0); transform-origin: left; z-index: -1; }
```

```js
tl.to('.sweep-bg', { scaleX: 1, duration: 0.5, ease: 'power3.inOut' }, T);
tl.to('.sweep', { color: '#0e0b08', duration: 0.1 }, T + 0.25); // darken text mid-sweep
```
7. Background Color Punch (Background flash change) — 1-2 times per video
```js
// scene = the scene element; T = time of the punch in seconds
tl.to(scene, { backgroundColor: '#e87a3e', duration: 0.08 }, T)
  .to(scene, { backgroundColor: '#0e0b08', duration: 0.4, ease: 'power2.out' }, T + 0.1);
```

Strike-through animation: use a real DOM element `<span class="strike-line">` rather than `::after`. Pseudo-elements combined with CSS variables do not work on some hyperframes rendering paths.

```html
<span class="strike">领导<span class="strike-line"></span></span>
```

```css
.strike { position: relative; display: inline-block; } /* positioning context for the line */
.strike-line { position: absolute; left: -10px; right: -10px; top: 56%; height: 10px; background: #e87a3e; transform: scaleX(0); transform-origin: left; }
```

```js
tl.to('.strike .strike-line', { scaleX: 1, duration: 0.55, ease: 'power2.inOut' }, T);
```
Step 6: Add SFX
bash
~/.claude/skills/wjs-converting-text-to-video/scripts/synth-sfx.sh <article-folder>/video
Generates `video/sfx/{tick,chime,bell}.mp3`:
- `tick.mp3`: 80 ms, 1.2 kHz sine; used for transitions (0.3 s before each scene switch)
- `chime.mp3`: 220 ms, 880 + 1320 Hz dual tone; used when a dialogue line or list item lights up (optional)
- `bell.mp3`: 1.5 s low-frequency bell; used when the final climax word appears (at most once per video)
Integrate into the timeline:
html
<audio id="aud-narration" src="narration.mp3" data-start="0" data-duration="<total>" data-track-index="0" data-volume="1"></audio>
<audio id="aud-tick-s02" src="sfx/tick.mp3" data-start="<scene2.start - 0.3>" data-duration="0.1" data-track-index="2" data-volume="0.55"></audio>
<!-- Repeat for each scene switch; T2/T4 flash transitions can skip the tick -->
<audio id="aud-chime-s08-1" src="sfx/chime.mp3" data-start="<T>" data-duration="0.3" data-track-index="3" data-volume="0.45"></audio>
<audio id="aud-bell-s12" src="sfx/bell.mp3" data-start="<climax-T>" data-duration="1.6" data-track-index="4" data-volume="0.55"></audio>
⚠️ Every `<audio>` must have an `id`; otherwise the render comes out silent (a hard hyperframes requirement). Different `track-index` values do not conflict, but clips on the same track must not overlap in time.
SFX discipline: transition ticks are mandatory; chime and bell are decorative, so leave them out when the scene content is simple; bell appears at most once per video.
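The tick placement rule (0.3 s before each scene switch) and the same-track overlap rule can be checked before rendering. The following is a minimal Python sketch, assuming scene start times have already been read into a list of floats (for example from `timing.json`). The function names `tick_starts` and `find_overlaps` and the clip dict shape are hypothetical and not part of the skill's scripts.

```python
def tick_starts(scene_starts, lead=0.3):
    """Return tick start times: `lead` seconds before every scene switch.

    The first scene has no incoming transition, so it gets no tick.
    """
    return [round(max(0.0, start - lead), 3) for start in scene_starts[1:]]


def find_overlaps(clips):
    """Return adjacent (by start time) clip pairs that overlap on one track.

    Each clip is a dict with keys: id, track, start, duration (hypothetical shape).
    An empty list means no two clips on the same track overlap.
    """
    by_track = {}
    for clip in clips:
        by_track.setdefault(clip["track"], []).append(clip)

    overlaps = []
    for track_clips in by_track.values():
        track_clips.sort(key=lambda c: c["start"])
        for prev, cur in zip(track_clips, track_clips[1:]):
            # Overlap: the next clip starts before the previous one ends.
            if cur["start"] < prev["start"] + prev["duration"]:
                overlaps.append((prev["id"], cur["id"]))
    return overlaps
```

Clips on different tracks are never compared, which mirrors the rule that different `track-index` values do not conflict.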
Step 7: Lint + Inspect + Render (must run in this order)
bash
cd <article-folder>/video
# Mandatory 1: linter (must report 0 errors)
npx hyperframes lint
# Mandatory 2: layout inspect to find overflow (must report 0 errors)
npx hyperframes inspect --at 1,8,15,25,35,45,55,65
# Recommended: snapshot to check the layout
npx hyperframes snapshot --at <t1>,<t2>,<t3> .
# Render (only after lint and inspect both pass)
# ⚠️ Output to the parent directory, alongside video/; the final MP4 does not go inside video/
npx hyperframes render --quality standard --fps 30 --output ../<slug>.mp4
**Why inspect is mandatory**: the 1080 px portrait canvas is narrow, and a 3-4 character hero at a 280-400 px font size is already close to overflowing. Run inspect every time, and **render only when it reports 0 errors**.
**Fix overflow**:
- Reduce the font size (inspect gives specific suggestions)
- Break a long hero across lines ("没法积累" → two lines, "没法" / "积累")
- Use `white-space: nowrap` only after confirming that (number of characters × font size) < screen width
- If `.em` overflows inside `reveal-wrap`, add `line-height: 1` to `.em`
**Render quality**:
- `--quality draft`, ~30 s render: for iteration
- `--quality standard`, ~1.5 min: the default, good enough for publishing
- `--quality high`, ~3 min: for large screens / business use
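The (number of characters × font size) < screen width rule can be estimated before running inspect. This is a rough Python sketch, assuming every character is a full-width CJK glyph about one font size wide and ignoring padding and letter spacing. The helper names are hypothetical, and the estimate does not replace `npx hyperframes inspect`.

```python
SCREEN_WIDTH = 1080  # px, width of the portrait canvas used by this skill


def fits_on_one_line(text, font_size_px, screen_width=SCREEN_WIDTH):
    """Rough check: treat each character as a square of font_size_px."""
    return len(text) * font_size_px < screen_width


def max_font_size(text, screen_width=SCREEN_WIDTH):
    """Largest integer font size (px) that keeps the text strictly inside the width."""
    if not text:
        raise ValueError("text must not be empty")
    return (screen_width - 1) // len(text)
```

Under these assumptions a 4-character hero such as "没法积累" already fails at 280 px, which is why splitting it into two 2-character lines is one of the listed fixes.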
Step 8: Preview
Output: `<article-folder>/<slug>.mp4` (alongside `video/`, not inside it; `video/` is reserved for intermediate files).
Use `open <article-folder>/<slug>.mp4` to let the user preview. Do not auto-upload to WeChat Channels (the user may want to edit or adjust first).
Step 9: Publish to YouTube (automatic cron, not part of the render workflow)
Do not upload right after a new video finishes rendering. YouTube enforces a daily quota (by default 6 uploads per day at 1600 quota units per upload), so rendering many videos at once would exhaust it.
Approach: a cron job runs `daily-upload-batch.sh` every day at 10:00, picks up to 5 MP4s that have not been uploaded yet (sorted by article date, ascending), and writes a `.youtube.json` record after each upload.
The cron job is already registered (one-time setup; no need to register it again):
0 10 * * * /Users/jianshuo/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh
Manual trigger (do not run this inside the wjs-converting-text-to-video workflow; let cron handle it):
bash
~/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh
# Or upload a single article immediately
~/.claude/skills/wjs-converting-text-to-video/scripts/publish-to-youtube.py <article-folder>
What the upload script does for each video:
1. Detects whether the MP4 is portrait or landscape: portrait gets `#shorts` appended to the title, landscape is uploaded as a regular video
2. Takes the title from the article.md H1 and the description from the first few paragraphs
3. Checks `<article-folder>/.youtube.json`: if it exists, tries to delete the old video and upload the new one (deleting requires the `youtube.force-ssl` scope; the current token lacks it, so the delete is skipped and the new video is uploaded anyway)
4. Writes the record to `.youtube.json`
See memory: [[auto-publish-youtube]]
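The batch selection and title rules described in this step can be sketched in Python. This is an illustration only, not the contents of `daily-upload-batch.sh` or `publish-to-youtube.py`. The 10,000-unit daily quota is an assumption chosen to be consistent with the 6 uploads per day stated above, the `uploaded` flag stands in for the presence of `.youtube.json`, and all function names are hypothetical.

```python
DAILY_QUOTA_UNITS = 10_000  # assumption: consistent with "6 uploads/day" in the text
UNITS_PER_UPLOAD = 1600     # quota cost per upload, as stated in the text


def uploads_per_day(daily_units=DAILY_QUOTA_UNITS, units_per_upload=UNITS_PER_UPLOAD):
    """How many whole uploads fit into one day's quota."""
    return daily_units // units_per_upload


def pick_batch(videos, limit=5):
    """Pick up to `limit` not-yet-uploaded videos, oldest article date first.

    Each video is a dict with keys: slug, date (ISO 'YYYY-MM-DD'), uploaded (bool).
    """
    pending = [v for v in videos if not v["uploaded"]]
    pending.sort(key=lambda v: v["date"])  # ISO dates sort correctly as strings
    return [v["slug"] for v in pending[:limit]]


def youtube_title(h1, width, height):
    """Append #shorts for portrait videos; leave landscape titles unchanged."""
    return f"{h1} #shorts" if height > width else h1
```

Capping the batch at 5 leaves headroom below the 6 uploads that the quota allows, so one manual upload per day still fits.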
Directory Structure
<article-folder>/
├── article.md
├── illustration.png              # User's original illustration, not used directly as bg
├── <slug>.mp4                    # ⭐ Final video (alongside video/, not inside it)
└── video/                        # All intermediate artifacts
    ├── narration_chunks.json     # Narration text for 5-10 scenes
    ├── tts_narration.py          # Copied in by bootstrap
    ├── narration.mp3             # Merged full TTS track
    ├── narration/                # Per-segment mp3 files (s01..sN)
    ├── timing.json               # start/end/duration of each segment
    ├── bg.png                    # Abstract watercolor background generated by GPT Image 2
    ├── sfx/{tick,chime,bell}.mp3
    ├── index.html                # HyperFrames composition
    ├── hyperframes.json
    ├── meta.json
    ├── package.json
    └── snapshots/                # Pre-render snapshots
The Skill's Own Files
~/.claude/skills/wjs-converting-text-to-video/
├── SKILL.md
└── scripts/
    ├── bootstrap-project.sh      # Initialize the video/ directory + copy helpers + generate sfx
    ├── generate-bg.sh            # Call GPT Image 2 to generate the abstract watercolor bg.png
    ├── tts.py                    # Generate the Volcano TTS narration
    ├── synth-sfx.sh              # Synthesize tick/chime/bell (ffmpeg)
    ├── retrofit-bg-image.py      # Add a bg-image layer to existing videos
    ├── strip-dark-scene-bgs.py   # Strip scene-level dark backgrounds so bg-image shows through
    └── publish-to-youtube.py     # Auto-upload MP4 to YouTube (portrait → Shorts); can replace existing uploads
Anti-Patterns
Anti-Monotony (most important: the root cause of "flat narration")
| Do NOT | Reason |
|---|---|
| Use the B1 two-line strikethrough for every scene | Historically the biggest failure mode. At most 2 B1 scenes per video |
| Center every scene | Lifeless. At least 2 non-centered scenes |
| Use similar font sizes in every scene | The font size span must be ≥240px |
| Make every scene 5-7s long | The duration span must be ≥6s |
| Use only blur crossfade for the whole video | At least 2 transition types per 4 transitions |
| Have no color-flip in the whole video | ≥1 A3 scene is a hard requirement |
| Have no geometric elements in the whole video | ≥1 scene needs thick lines / large numbering / quotation marks |
| Use only […] for the whole video | At least 3 Modern Motion Techniques |
| Fill every scene to the brim | ≥1 scene must be ≥60% blank space |
| Add […] to every scene | It covers bg-image, so the watercolor was generated for nothing. Regular scenes set no background; only A3 color-flip scenes use a solid color |
| Always use orange for color-flip / emphasis | Use at least 2-3 accent colors |
| Use gray for secondary text / strike / decoration | Gray has too little contrast on the watercolor background and disappears. Use […] instead |
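Several of the numeric thresholds in this table can be checked mechanically before rendering. The following is a minimal Python sketch, assuming each scene is described by a hypothetical dict with `template`, `centered`, `font_px`, and `duration` keys. It covers only the B1 count, non-centered count, font size span, duration span, and A3 rules, and expects a non-empty scene list.

```python
def monotony_problems(scenes):
    """Return a list of rule violations; an empty list means all checks pass.

    Each scene is a dict with keys: template (e.g. 'B1', 'A3'), centered (bool),
    font_px (hero font size in px), duration (seconds). Hypothetical shape.
    """
    problems = []
    if sum(s["template"] == "B1" for s in scenes) > 2:
        problems.append("B1 used more than 2 times")
    if sum(not s["centered"] for s in scenes) < 2:
        problems.append("fewer than 2 non-centered scenes")
    sizes = [s["font_px"] for s in scenes]
    if max(sizes) - min(sizes) < 240:
        problems.append("font size span under 240px")
    durations = [s["duration"] for s in scenes]
    if max(durations) - min(durations) < 6:
        problems.append("duration span under 6s")
    if not any(s["template"] == "A3" for s in scenes):
        problems.append("no A3 color-flip scene")
    return problems
```

The remaining rules (transition variety, blank space, accent colors) depend on layout details that this scene description does not capture, so they still need a visual review.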
Content / Engineering
| Do NOT | Reason |
|---|---|
| Use Kokoro for Chinese TTS | Its Chinese quality is poor and the user explicitly rejects it |
| Pass […] to Volcano TTS | […] |
| Use […] | It hallucinates in a loop when the text contains English proper nouns |
| Use serif fonts (Songti / 宋体 / Noto Serif) | Not enough visual impact |
| Paste whole paragraphs of the article on screen | That is a PPT. A video shows one visual moment per screen |
| Exceed 10 scenes or 90 seconds | Attention does not last that long |
| Pad a short article out to 90 seconds | Make a short article 30-50s; stretching it waters the content down |
| Change the font and color style in every scene | Style drift. The design system stays fixed; the templates vary |
| Use […] | It fails in hyperframes rendering paths. Use real DOM |
| Use […] outside the final scene | hyperframes forbids exit animations; the transition is the exit |
| Add a chime to every chunk | Too noisy |
| Use […] | hyperframes render does not resolve cross-directory paths, so the frame renders pure black. bg.png must be inside […] |
| Omit […] | The render comes out silent. Every […] |
| Make s1 an A3 color-flip scene | The first frame does not show bg-image. Put color-flip scenes in s2 or later |
| Give every s1 title element […] | The first frame is a black screen. s1 main elements should default to […] |
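The scene-count and length limits can be verified from the per-scene durations before rendering. This is a small Python sketch, assuming the durations (in seconds) have been read from `timing.json`. The function name is hypothetical, and the lower bounds of 5 scenes and 30 seconds come from the skill's stated defaults.

```python
def length_problems(durations, min_scenes=5, max_scenes=10,
                    min_seconds=30, max_seconds=90):
    """Return a list of scene-count / total-length violations (empty means OK)."""
    problems = []
    if not min_scenes <= len(durations) <= max_scenes:
        problems.append(
            f"{len(durations)} scenes (expected {min_scenes}-{max_scenes})"
        )
    total = sum(durations)
    if not min_seconds <= total <= max_seconds:
        problems.append(
            f"{total:.1f}s total (expected {min_seconds}-{max_seconds}s)"
        )
    return problems
```

A short article that lands at 30-50 seconds passes this check, which matches the advice not to pad it out to 90 seconds.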
Common Pitfalls
- Writing the `——` dash in narration → TTS reads the word "破折号" (dash) aloud. Delete it and use a period or comma
- A chunk comes out abnormally long (>3 chars/s) → Volcano is hallucinating in a loop. Switch the voice, or split the chunk into shorter pieces
- Scene duration < narration duration → the narration gets cut off by the next scene. A scene must cover its whole narration plus a 0.3s buffer
- Large text on a black background is still visible at opacity: 0 → check whether `.scene` has a default `opacity: 0` (except s1)
- `.em` overflows slightly inside `.reveal-wrap` (a few px at top/bottom) → add `line-height: 1` to `.em`
- Snapshot glyphs differ from the render → both now use Noto Sans SC, so they should match
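The first and third pitfalls lend themselves to an automated pre-check. The following is a minimal Python sketch, assuming narration text and per-scene timings are already loaded. The function names and the scene dict shape are hypothetical, and replacing the dash with a full-width comma is just one of the two fixes suggested above.

```python
def clean_narration(text):
    """Replace the Chinese dash (double or single) with a full-width comma
    so that TTS does not read the punctuation aloud."""
    return text.replace("——", ",").replace("—", ",")


def scenes_too_short(scenes, buffer=0.3):
    """Return ids of scenes that do not cover their narration plus the buffer.

    Each scene is a dict with keys: id, duration, narration (seconds).
    """
    return [s["id"] for s in scenes if s["duration"] < s["narration"] + buffer]
```

Running both checks right after TTS generation catches these problems before any time is spent on lint, inspect, or render.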
Dependencies
- HyperFrames CLI (`npx hyperframes`): composition lint / inspect / snapshot / render
- GPT Image 2 (`~/.claude/skills/gpt-image-2-skill/`): generates bg.png; use `--provider codex` for ChatGPT auth
- Volcano TTS: `VOLC_TTS_APPID` / `VOLC_TTS_ACCESS_TOKEN` in `~/code/.env`
- ffmpeg: SFX synthesis, audio concat, aspect-ratio detection
- YouTube uploader (`~/.claude/skills/wjs-uploading-video/`) + OAuth token at `~/.config/youtube/token.json`: Step 9 auto-publishing