# Video Toolkit
Create professional explainer videos from a text brief. The toolkit uses open-source AI models on cloud GPUs (Modal or RunPod) for voiceover, image generation, music, and talking head animation. Remotion (React) handles composition and rendering.
## CRITICAL: Toolkit Path

The toolkit lives at a fixed path. ALWAYS `cd` here before running any tool command.

```bash
TOOLKIT=~/.openclaw/workspace/claude-code-video-toolkit
cd $TOOLKIT
```

NEVER run tool commands from inside a project directory. Tools resolve paths relative to the toolkit root.
## Setup
### Step 1: Check Current State
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/verify_setup.py
```

If everything shows `[x]`, skip to "Quick Test" below. Otherwise continue setup.
### Step 2: Install Python Dependencies
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
pip3 install --break-system-packages -r tools/requirements.txt
```

Note: `--break-system-packages` is needed on Debian/Ubuntu with managed Python (PEP 668). Safe inside containers.
### Step 3: Configure Cloud GPU Endpoints
The toolkit needs cloud GPU endpoint URLs in `.env`. Check if `.env` exists and has Modal endpoints:

```bash
cat ~/.openclaw/workspace/claude-code-video-toolkit/.env | grep MODAL
```

If Modal endpoints are configured, you're ready. If not, ask the user to provide Modal endpoint URLs or set up Modal:

```bash
pip3 install --break-system-packages modal
python3 -m modal setup  # Opens browser for authentication
```
Deploy each tool and capture the endpoint URL from the output:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
modal deploy docker/modal-qwen3-tts/app.py
modal deploy docker/modal-flux2/app.py
modal deploy docker/modal-music-gen/app.py
modal deploy docker/modal-sadtalker/app.py
modal deploy docker/modal-image-edit/app.py
modal deploy docker/modal-upscale/app.py
modal deploy docker/modal-propainter/app.py
modal deploy docker/modal-ltx2/app.py  # Requires: modal secret create huggingface-token HF_TOKEN=hf_...
```
**LTX-2 prerequisite:** Before deploying LTX-2, create a HuggingFace secret and accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized):

```bash
modal secret create huggingface-token HF_TOKEN=hf_your_read_access_token
```

Add each URL to `.env`:

```
MODAL_QWEN3_TTS_ENDPOINT_URL=https://...modal.run
MODAL_FLUX2_ENDPOINT_URL=https://...modal.run
MODAL_MUSIC_GEN_ENDPOINT_URL=https://...modal.run
MODAL_SADTALKER_ENDPOINT_URL=https://...modal.run
MODAL_IMAGE_EDIT_ENDPOINT_URL=https://...modal.run
MODAL_UPSCALE_ENDPOINT_URL=https://...modal.run
MODAL_DEWATERMARK_ENDPOINT_URL=https://...modal.run
MODAL_LTX2_ENDPOINT_URL=https://...modal.run
```

Optional but recommended: Cloudflare R2 for reliable file transfer:

```
R2_ACCOUNT_ID=...
R2_ACCESS_KEY_ID=...
R2_SECRET_ACCESS_KEY=...
R2_BUCKET_NAME=video-toolkit
```
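As a quick sanity check, a short script can report which `MODAL_*` variables are still unset in `.env`. This is a sketch only (`missing_endpoints` is illustrative, not part of the toolkit; `tools/verify_setup.py` remains the authoritative check):

```python
import re

# The endpoint variables listed above
REQUIRED = [
    "MODAL_QWEN3_TTS_ENDPOINT_URL",
    "MODAL_FLUX2_ENDPOINT_URL",
    "MODAL_MUSIC_GEN_ENDPOINT_URL",
    "MODAL_SADTALKER_ENDPOINT_URL",
    "MODAL_IMAGE_EDIT_ENDPOINT_URL",
    "MODAL_UPSCALE_ENDPOINT_URL",
    "MODAL_DEWATERMARK_ENDPOINT_URL",
    "MODAL_LTX2_ENDPOINT_URL",
]

def missing_endpoints(env_text: str) -> list[str]:
    """Return required MODAL_* variables that are absent or empty in .env text."""
    configured = {}
    for line in env_text.splitlines():
        m = re.match(r"^\s*([A-Z0-9_]+)\s*=\s*(\S+)", line)
        if m:
            configured[m.group(1)] = m.group(2)
    return [name for name in REQUIRED if not configured.get(name)]
```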
### Step 4: Verify and Quick Test
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/verify_setup.py
```

All tools should show `[x]`. Then run a quick test to confirm the GPU pipeline works:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/qwen3_tts.py --text "Hello, this is a test." --speaker Ryan --tone warm --output /tmp/video-toolkit-test.mp3 --cloud modal
```

If you get a valid .mp3 file, setup is complete. If it fails, check:
- `.env` has the correct `MODAL_QWEN3_TTS_ENDPOINT_URL`
- Run `python3 tools/verify_setup.py --json` and check `modal_tools` for which endpoints are missing

Cost: Modal includes $30/month free compute. A typical 60s video costs $1-3.
## Creating a Video

### Step 1: Create Project
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
cp -r templates/product-demo projects/PROJECT_NAME
cd projects/PROJECT_NAME
npm install
```

Templates: `product-demo` (marketing/explainer), `sprint-review`, `sprint-review-v2` (composable scenes).
### Step 2: Write Config
Edit `projects/PROJECT_NAME/src/config/demo-config.ts`:

```typescript
export const demoConfig: ProductDemoConfig = {
  product: {
    name: 'My Product',
    tagline: 'What it does in one line',
    website: 'example.com',
  },
  scenes: [
    { type: 'title', durationSeconds: 9, content: { headline: '...', subheadline: '...' } },
    { type: 'problem', durationSeconds: 14, content: { headline: '...', problems: ['...', '...'] } },
    { type: 'solution', durationSeconds: 13, content: { headline: '...', highlights: ['...', '...'] } },
    { type: 'stats', durationSeconds: 12, content: { stats: [{value: '99%', label: '...'}, ...] } },
    { type: 'cta', durationSeconds: 10, content: { headline: '...', links: ['...'] } },
  ],
  audio: {
    backgroundMusicFile: 'audio/bg-music.mp3',
    backgroundMusicVolume: 0.12,
  },
};
```

Scene types: `title`, `problem`, `solution`, `demo`, `feature`, `stats`, `cta`.

Duration rule: Estimate `durationSeconds` as `ceil(word_count / 2.5) + 2`. You will adjust this after generating audio in Step 4.
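The duration rule, and the word budget it implies for the Step 3 script, can be sketched in Python (function names are illustrative, not part of the toolkit):

```python
import math

def estimate_duration_seconds(script_text: str, words_per_second: float = 2.5) -> int:
    """First-pass scene length: speech at ~2.5 words/s, plus 1s audio delay and 1s padding."""
    word_count = len(script_text.split())
    return math.ceil(word_count / words_per_second) + 2

def word_budget(duration_seconds: int, words_per_second: float = 2.5) -> int:
    """Inverse rule used in Step 3: how many words fit a scene of a given length."""
    return int((duration_seconds - 2) * words_per_second)
```

A 9-second title scene therefore budgets about 17 words, matching the sample script in Step 3.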
### Step 3: Write Voiceover Script
Create `projects/PROJECT_NAME/VOICEOVER-SCRIPT.md`:

```markdown
## Scene 1: Title (9s, ~17 words)

Build videos with AI. The product name toolkit makes it easy.

## Scene 2: Problem (14s, ~30 words)

The problem statement goes here. Keep it punchy and relatable.
```

**Word budget per scene:** `(durationSeconds - 2) * 2.5` words. The -2 accounts for 1s audio delay + 1s padding.
### Step 4: Generate Assets
CRITICAL: All commands below MUST be run from the toolkit root, not the project directory.

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
```
#### 4a. Background Music
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/music_gen.py \
  --preset corporate-bg \
  --duration 90 \
  --output projects/PROJECT_NAME/public/audio/bg-music.mp3 \
  --cloud modal
```

Presets: `corporate-bg`, `upbeat-tech`, `ambient`, `dramatic`, `tension`, `hopeful`, `cta`, `lofi`.
#### 4b. Voiceover (per-scene)
Generate ONE .mp3 file PER SCENE. Do NOT generate a single voiceover file.

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit

# Scene 01
python3 tools/qwen3_tts.py \
  --text "The voiceover text for scene one." \
  --speaker Ryan --tone warm \
  --output projects/PROJECT_NAME/public/audio/scenes/01.mp3 \
  --cloud modal

# Scene 02
python3 tools/qwen3_tts.py \
  --text "The voiceover text for scene two." \
  --speaker Ryan --tone warm \
  --output projects/PROJECT_NAME/public/audio/scenes/02.mp3 \
  --cloud modal

# ... repeat for each scene
```
**Speakers:** `Ryan`, `Aiden`, `Vivian`, `Serena`, `Uncle_Fu`, `Dylan`, `Eric`, `Ono_Anna`, `Sohee`
**Tones:** `neutral`, `warm`, `professional`, `excited`, `calm`, `serious`, `storyteller`, `tutorial`

For voice cloning (needs a reference recording):

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/qwen3_tts.py \
  --text "Text to speak" \
  --ref-audio assets/voices/reference.m4a \
  --ref-text "Exact transcript of the reference audio" \
  --output projects/PROJECT_NAME/public/audio/scenes/01.mp3 \
  --cloud modal
```
#### 4c. Scene Images
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/flux2.py \
  --prompt "Dark tech background with blue geometric grid, cinematic lighting" \
  --width 1920 --height 1080 \
  --output projects/PROJECT_NAME/public/images/title-bg.png \
  --cloud modal
```

Image presets (use `--preset` instead of `--prompt --width --height`):
`title-bg`, `problem`, `solution`, `demo-bg`, `stats-bg`, `cta`, `thumbnail`, `portrait-bg`

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/flux2.py \
  --preset title-bg \
  --output projects/PROJECT_NAME/public/images/title-bg.png \
  --cloud modal
```
#### 4d. Video Clips — B-Roll & Animated Backgrounds (optional)
Generate AI video clips for b-roll cutaways, animated slide backgrounds, or intro/outro sequences:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit

# B-roll clip from text
python3 tools/ltx2.py \
  --prompt "Aerial drone shot over a European city at golden hour, cinematic wide angle" \
  --output projects/PROJECT_NAME/public/videos/broll-europe.mp4 \
  --cloud modal

# Animate a slide/screenshot (image-to-video)
python3 tools/ltx2.py \
  --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" \
  --input projects/PROJECT_NAME/public/images/title-bg.png \
  --output projects/PROJECT_NAME/public/videos/animated-title.mp4 \
  --cloud modal

# Abstract intro/outro background
python3 tools/ltx2.py \
  --prompt "Dark moody abstract background with flowing blue light streaks, bokeh particles, cinematic" \
  --output projects/PROJECT_NAME/public/videos/intro-bg.mp4 \
  --cloud modal
```
Use in Remotion compositions with `<OffthreadVideo>`:

```tsx
<OffthreadVideo src={staticFile('videos/broll-europe.mp4')} />
```

LTX-2 rules:
- Max ~8 seconds per clip (193 frames at 24fps). Default is ~5s (121 frames).
- Width/height must be divisible by 64. Default: 768x512.
- ~$0.20-0.25 per clip, ~2.5 min generation time.
- Cold start ~60-90s. Subsequent clips on warm GPU are faster.
- Generated audio is ambient only; use voiceover/music tools for speech and music.
- ~30% of generations may have training data artifacts (logos/text). Re-run with `--seed` to vary.
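The dimension and length rules can be checked before submitting a job. A sketch, assuming frame count follows `24 * seconds + 1` (inferred from the 121/193 figures above; the function name is illustrative):

```python
import math

MAX_FRAMES = 193  # ~8s at 24fps
FPS = 24

def validate_ltx2(width: int, height: int, seconds: float) -> list[str]:
    """Collect rule violations for a prospective LTX-2 clip request."""
    errors = []
    if width % 64 or height % 64:
        errors.append("width and height must be divisible by 64")
    if math.ceil(seconds * FPS) + 1 > MAX_FRAMES:
        errors.append("clip longer than ~8s (193 frames at 24fps)")
    return errors
```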
#### 4e. Talking Head Narrator (optional)
Generate a presenter portrait, then animate per-scene clips:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit

# 1. Generate portrait
python3 tools/flux2.py \
  --prompt "Professional presenter portrait, clean style, dark background, facing camera, upper body" \
  --width 1024 --height 576 \
  --output projects/PROJECT_NAME/public/images/presenter.png \
  --cloud modal

# 2. Generate per-scene narrator clips (one per scene, NOT one long video)
python3 tools/sadtalker.py \
  --image projects/PROJECT_NAME/public/images/presenter.png \
  --audio projects/PROJECT_NAME/public/audio/scenes/01.mp3 \
  --preprocess full --still --expression-scale 0.8 \
  --output projects/PROJECT_NAME/public/narrator-01.mp4 \
  --cloud modal

# Repeat for each scene that needs a narrator
```
**SadTalker rules — follow these exactly:**
- **ALWAYS** use `--preprocess full` (default `crop` outputs a square, wrong aspect ratio)
- **ALWAYS** use `--still` (reduces head movement, looks professional)
- **ALWAYS** generate per-scene clips (6-15s each), NEVER one long video
- Processing: ~3-4 min per 10s of audio on Modal A10G
- `--expression-scale 0.8` keeps expressions subtle (range 0.0-1.5)
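The processing-time figure makes batch planning easy. A rough sketch (the 3.5 min/10s midpoint is an assumption within the stated 3-4 min range; the helper is illustrative):

```python
def sadtalker_minutes(audio_seconds: float, minutes_per_10s: float = 3.5) -> float:
    """Estimated SadTalker wall-clock minutes on a Modal A10G for one clip."""
    return audio_seconds / 10 * minutes_per_10s
```

Five 12-second narrator scenes would therefore take roughly 21 minutes of GPU time in total.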
#### 4f. Image Editing (optional)
Create scene variants from existing images:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/image_edit.py \
  --input projects/PROJECT_NAME/public/images/title-bg.png \
  --prompt "Make it darker with red tones, more ominous" \
  --output projects/PROJECT_NAME/public/images/problem-bg.png \
  --cloud modal
```
#### 4g. Upscaling (optional)
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/upscale.py \
  --input projects/PROJECT_NAME/public/images/some-image.png \
  --output projects/PROJECT_NAME/public/images/some-image-4x.png \
  --scale 4 --cloud modal
```
### Step 5: Sync Timing
ALWAYS do this after generating voiceover. Audio duration differs from estimates.

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
for f in projects/PROJECT_NAME/public/audio/scenes/*.mp3; do
  echo "$(basename $f): $(ffprobe -v error -show_entries format=duration -of csv=p=0 "$f")s"
done
```

Update each scene's `durationSeconds` in `demo-config.ts` to `ceil(actual_audio_duration + 2)`.

Example: if `01.mp3` is 6.8s, set scene 1's `durationSeconds` to `9` (ceil(6.8 + 2) = 9).
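The update rule can be applied mechanically once the ffprobe durations are collected. A sketch, assuming the measurements are gathered into a dict (the function name and dict shape are illustrative):

```python
import math

def synced_durations(audio_seconds: dict[str, float]) -> dict[str, int]:
    """Map each scene's measured audio length to durationSeconds = ceil(length + 2)."""
    return {scene: math.ceil(length + 2) for scene, length in audio_seconds.items()}
```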
### Step 6: Review Still Frames
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit/projects/PROJECT_NAME
npx remotion still src/index.ts ProductDemo --frame=100 --output=/tmp/review-scene1.png
npx remotion still src/index.ts ProductDemo --frame=400 --output=/tmp/review-scene2.png
```

Check: text truncation, animation timing, narrator PiP positioning, background contrast.
### Step 7: Render
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit/projects/PROJECT_NAME
npm run render
```

Output: `out/ProductDemo.mp4`
## Composition Patterns
### Per-Scene Audio
Use per-scene audio with a 1-second delay (`from={30}` = 30 frames = 1s at 30fps):

```tsx
<Sequence from={30}>
  <Audio src={staticFile('audio/scenes/01.mp3')} volume={1} />
</Sequence>
```
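Frame offsets elsewhere in the compositions follow the same arithmetic (these templates run at 30fps; the helper name is illustrative):

```python
FPS = 30  # composition frame rate used in the patterns above

def frames(seconds: float) -> int:
    """Seconds to Remotion frames at 30fps; from={30} above is a 1-second delay."""
    return round(seconds * FPS)
```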
### Per-Scene Narrator PiP
```tsx
<Sequence from={30}>
  <OffthreadVideo
    src={staticFile('narrator-01.mp4')}
    style={{ width: 320, height: 180, objectFit: 'cover' }}
    muted
  />
</Sequence>
```

ALWAYS use `<OffthreadVideo>`, NEVER `<video>`. Remotion requires its own component for frame-accurate rendering.
### Transitions
```tsx
import { TransitionSeries, linearTiming } from '@remotion/transitions';
import { fade } from '@remotion/transitions/fade';
import { glitch } from '../../../lib/transitions/presentations/glitch';
import { lightLeak } from '../../../lib/transitions/presentations/light-leak';
```

NEVER import from the `lib/transitions` barrel; import custom transitions from `lib/transitions/presentations/` directly.
## Error Recovery
| Problem | Solution |
|---|---|
| Tool command fails with "No module named..." | Run `pip3 install --break-system-packages -r tools/requirements.txt` from the toolkit root |
| "MODAL_*_ENDPOINT_URL not configured" | Check `.env` in the toolkit root |
| SadTalker output is square/cropped | You forgot `--preprocess full` |
| Audio too short/long for scene | Re-run Step 5 (sync timing) and update config |
| `npm run render` fails | Make sure you're in the project dir, not toolkit root. Run `npm install` first |
| "Cannot find module" in Remotion | Check import paths. Custom components use relative paths (e.g. `../../../lib/...`) |
| Cold start timeout on Modal | First call after idle takes 30-120s. Retry once — second call uses warm GPU |
| SadTalker client timeout (long audio) | The client HTTP request can time out before Modal finishes. Modal still uploads the result to R2. Check the R2 bucket for the output |
## Cost Estimates (Modal)
| Tool | Typical Cost | Notes |
|---|---|---|
| Qwen3-TTS | ~$0.01/scene | ~20s per scene on warm GPU |
| FLUX.2 | ~$0.01/image | ~3s warm, ~30s cold |
| ACE-Step | ~$0.02-0.05 | Depends on duration |
| SadTalker | ~$0.05-0.20/scene | ~3-4 min per 10s audio |
| Qwen-Edit | ~$0.03-0.15 | ~8 min cold start (25GB model) |
| RealESRGAN | ~$0.005/image | Very fast |
| LTX-2.3 | ~$0.20-0.25/clip | ~2.5 min per 5s clip, A100-80GB |
Total for a 60s video: ~$1-3 depending on scenes and narrator clips.
Modal Starter plan: $30/month free compute. Apps scale to zero when idle.
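A back-of-the-envelope budget follows directly from the table. A sketch using midpoints of the ranges above (`estimate_video_cost` and its parameters are illustrative, not part of the toolkit):

```python
# Midpoints of the per-tool cost ranges above (USD)
COSTS = {
    "tts_per_scene": 0.01,
    "image": 0.01,
    "music_track": 0.035,
    "narrator_per_scene": 0.12,
    "ltx2_per_clip": 0.22,
}

def estimate_video_cost(scenes: int, images: int,
                        narrator_scenes: int = 0, broll_clips: int = 0) -> float:
    """Rough Modal cost for one video from per-asset midpoints; excludes cold-start overhead."""
    return (
        scenes * COSTS["tts_per_scene"]
        + images * COSTS["image"]
        + COSTS["music_track"]
        + narrator_scenes * COSTS["narrator_per_scene"]
        + broll_clips * COSTS["ltx2_per_clip"]
    )
```

Five scenes with five backgrounds, a narrator clip per scene, and two b-roll clips lands near $1.18, consistent with the ~$1-3 total above.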