# Video Toolkit

Create professional explainer videos from a text brief. The toolkit uses open-source AI models on cloud GPUs (Modal or RunPod) for voiceover, image generation, music, and talking head animation. Remotion (React) handles composition and rendering.

## CRITICAL: Toolkit Path

The toolkit lives at a fixed path. ALWAYS `cd` here before running any tool command.

```bash
TOOLKIT=~/.openclaw/workspace/claude-code-video-toolkit
cd $TOOLKIT
```

**NEVER run tool commands from inside a project directory.** Tools resolve paths relative to the toolkit root.

## Setup

### Step 1: Check Current State

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/verify_setup.py
```

If everything shows `[x]`, skip to "Quick Test" below. Otherwise continue setup.

### Step 2: Install Python Dependencies

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
pip3 install --break-system-packages -r tools/requirements.txt
```

**Note:** `--break-system-packages` is needed on Debian/Ubuntu with managed Python (PEP 668). Safe inside containers.

### Step 3: Configure Cloud GPU Endpoints

The toolkit needs cloud GPU endpoint URLs in `.env`. Check whether `.env` exists and has Modal endpoints:

```bash
cat ~/.openclaw/workspace/claude-code-video-toolkit/.env | grep MODAL
```

If Modal endpoints are configured, you're ready. If not, ask the user to provide Modal endpoint URLs or set up Modal:

```bash
pip3 install --break-system-packages modal
python3 -m modal setup   # Opens browser for authentication
```

Deploy each tool and capture the endpoint URL from the output:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
modal deploy docker/modal-qwen3-tts/app.py
modal deploy docker/modal-flux2/app.py
modal deploy docker/modal-music-gen/app.py
modal deploy docker/modal-sadtalker/app.py
modal deploy docker/modal-image-edit/app.py
modal deploy docker/modal-upscale/app.py
modal deploy docker/modal-propainter/app.py
modal deploy docker/modal-ltx2/app.py   # Requires: modal secret create huggingface-token HF_TOKEN=hf_...
```

**LTX-2 prerequisite:** Before deploying LTX-2, create a HuggingFace secret and accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized):

```bash
modal secret create huggingface-token HF_TOKEN=hf_your_read_access_token
```

Add each URL to `.env`:

```
MODAL_QWEN3_TTS_ENDPOINT_URL=https://...modal.run
MODAL_FLUX2_ENDPOINT_URL=https://...modal.run
MODAL_MUSIC_GEN_ENDPOINT_URL=https://...modal.run
MODAL_SADTALKER_ENDPOINT_URL=https://...modal.run
MODAL_IMAGE_EDIT_ENDPOINT_URL=https://...modal.run
MODAL_UPSCALE_ENDPOINT_URL=https://...modal.run
MODAL_DEWATERMARK_ENDPOINT_URL=https://...modal.run
MODAL_LTX2_ENDPOINT_URL=https://...modal.run
```

Optional but recommended: Cloudflare R2 for reliable file transfer:

```
R2_ACCOUNT_ID=...
R2_ACCESS_KEY_ID=...
R2_SECRET_ACCESS_KEY=...
R2_BUCKET_NAME=video-toolkit
```

### Step 4: Verify and Quick Test

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/verify_setup.py
```

All tools should show `[x]`. Then run a quick test to confirm the GPU pipeline works:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/qwen3_tts.py --text "Hello, this is a test." --speaker Ryan --tone warm --output /tmp/video-toolkit-test.mp3 --cloud modal
```

If you get a valid .mp3 file, setup is complete. If it fails, check:

- `.env` has the correct `MODAL_QWEN3_TTS_ENDPOINT_URL`
- Run `python3 tools/verify_setup.py --json` and check `modal_tools` for which endpoints are missing

**Cost:** Modal includes $30/month free compute. A typical 60s video costs $1-3.


## Creating a Video

### Step 1: Create Project

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
cp -r templates/product-demo projects/PROJECT_NAME
cd projects/PROJECT_NAME
npm install
```

**Templates:** `product-demo` (marketing/explainer), `sprint-review`, `sprint-review-v2` (composable scenes).

### Step 2: Write Config

Edit `projects/PROJECT_NAME/src/config/demo-config.ts`:

```typescript
export const demoConfig: ProductDemoConfig = {
  product: {
    name: 'My Product',
    tagline: 'What it does in one line',
    website: 'example.com',
  },
  scenes: [
    { type: 'title', durationSeconds: 9, content: { headline: '...', subheadline: '...' } },
    { type: 'problem', durationSeconds: 14, content: { headline: '...', problems: ['...', '...'] } },
    { type: 'solution', durationSeconds: 13, content: { headline: '...', highlights: ['...', '...'] } },
    { type: 'stats', durationSeconds: 12, content: { stats: [{value: '99%', label: '...'}, ...] } },
    { type: 'cta', durationSeconds: 10, content: { headline: '...', links: ['...'] } },
  ],
  audio: {
    backgroundMusicFile: 'audio/bg-music.mp3',
    backgroundMusicVolume: 0.12,
  },
};
```

**Scene types:** `title`, `problem`, `solution`, `demo`, `feature`, `stats`, `cta`.

**Duration rule:** Estimate `durationSeconds` as `ceil(word_count / 2.5) + 2`. You will adjust this in Step 5, after generating audio in Step 4.
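The duration rule is simple enough to sanity-check as a small Python helper (illustrative only, not part of the toolkit):

```python
import math

def estimate_duration(word_count: int) -> int:
    """First-pass durationSeconds: ceil(word_count / 2.5) + 2 (2.5 words/sec, plus 2s slack)."""
    return math.ceil(word_count / 2.5) + 2

# A ~17-word title scene gets 9s; a ~30-word problem scene gets 14s,
# matching the example config above.
print(estimate_duration(17), estimate_duration(30))  # -> 9 14
```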

### Step 3: Write Voiceover Script

Create `projects/PROJECT_NAME/VOICEOVER-SCRIPT.md`:

```markdown
## Scene 1: Title (9s, ~17 words)

Build videos with AI. The product name toolkit makes it easy.

## Scene 2: Problem (14s, ~30 words)

The problem statement goes here. Keep it punchy and relatable.
```

**Word budget per scene:** `(durationSeconds - 2) * 2.5` words. The -2 accounts for 1s audio delay + 1s padding.
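The word budget is the inverse of the duration rule from Step 2; a tiny illustrative helper (not part of the toolkit):

```python
import math

def word_budget(duration_seconds: int) -> int:
    """Max words for a scene: floor((durationSeconds - 2) * 2.5)."""
    return math.floor((duration_seconds - 2) * 2.5)

# A 9s scene budgets ~17 words; a 14s scene budgets ~30 words,
# matching the scene headers above.
print(word_budget(9), word_budget(14))  # -> 17 30
```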

### Step 4: Generate Assets

**CRITICAL: All commands below MUST be run from the toolkit root, not the project directory.**

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
```

#### 4a. Background Music

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/music_gen.py \
  --preset corporate-bg \
  --duration 90 \
  --output projects/PROJECT_NAME/public/audio/bg-music.mp3 \
  --cloud modal
```

**Presets:** `corporate-bg`, `upbeat-tech`, `ambient`, `dramatic`, `tension`, `hopeful`, `cta`, `lofi`.

#### 4b. Voiceover (per-scene)

**Generate ONE .mp3 file PER SCENE. Do NOT generate a single voiceover file.**

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit

# Scene 01
python3 tools/qwen3_tts.py \
  --text "The voiceover text for scene one." \
  --speaker Ryan --tone warm \
  --output projects/PROJECT_NAME/public/audio/scenes/01.mp3 \
  --cloud modal

# Scene 02
python3 tools/qwen3_tts.py \
  --text "The voiceover text for scene two." \
  --speaker Ryan --tone warm \
  --output projects/PROJECT_NAME/public/audio/scenes/02.mp3 \
  --cloud modal

# ... repeat for each scene
```


**Speakers:** `Ryan`, `Aiden`, `Vivian`, `Serena`, `Uncle_Fu`, `Dylan`, `Eric`, `Ono_Anna`, `Sohee`
**Tones:** `neutral`, `warm`, `professional`, `excited`, `calm`, `serious`, `storyteller`, `tutorial`

For voice cloning (needs a reference recording):

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/qwen3_tts.py \
  --text "Text to speak" \
  --ref-audio assets/voices/reference.m4a \
  --ref-text "Exact transcript of the reference audio" \
  --output projects/PROJECT_NAME/public/audio/scenes/01.mp3 \
  --cloud modal
```


#### 4c. Scene Images
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/flux2.py \
  --prompt "Dark tech background with blue geometric grid, cinematic lighting" \
  --width 1920 --height 1080 \
  --output projects/PROJECT_NAME/public/images/title-bg.png \
  --cloud modal
```

Image presets (use `--preset` instead of `--prompt --width --height`): `title-bg`, `problem`, `solution`, `demo-bg`, `stats-bg`, `cta`, `thumbnail`, `portrait-bg`.

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/flux2.py \
  --preset title-bg \
  --output projects/PROJECT_NAME/public/images/title-bg.png \
  --cloud modal
```

#### 4d. Video Clips — B-Roll & Animated Backgrounds (optional)

Generate AI video clips for b-roll cutaways, animated slide backgrounds, or intro/outro sequences:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit

# B-roll clip from text
python3 tools/ltx2.py \
  --prompt "Aerial drone shot over a European city at golden hour, cinematic wide angle" \
  --output projects/PROJECT_NAME/public/videos/broll-europe.mp4 \
  --cloud modal

# Animate a slide/screenshot (image-to-video)
python3 tools/ltx2.py \
  --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" \
  --input projects/PROJECT_NAME/public/images/title-bg.png \
  --output projects/PROJECT_NAME/public/videos/animated-title.mp4 \
  --cloud modal

# Abstract intro/outro background
python3 tools/ltx2.py \
  --prompt "Dark moody abstract background with flowing blue light streaks, bokeh particles, cinematic" \
  --output projects/PROJECT_NAME/public/videos/intro-bg.mp4 \
  --cloud modal
```

Use in Remotion compositions with `<OffthreadVideo>`:

```tsx
<OffthreadVideo src={staticFile('videos/broll-europe.mp4')} />
```

**LTX-2 rules:**

- Max ~8 seconds per clip (193 frames at 24fps). Default is ~5s (121 frames).
- Width/height must be divisible by 64. Default: 768x512.
- ~$0.20-0.25 per clip, ~2.5 min generation time.
- Cold start ~60-90s. Subsequent clips on warm GPU are faster.
- Generated audio is ambient only — use voiceover/music tools for speech and music.
- ~30% of generations may have training data artifacts (logos/text). Re-run with `--seed` to vary.
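The dimension and frame-count rules can be sketched as small Python helpers (illustrative, not part of the toolkit; the `24*s + 1` frame formula is inferred from the 121/193 counts above):

```python
def snap64(pixels: int) -> int:
    """Round down to the nearest multiple of 64 (LTX-2 requires dims divisible by 64)."""
    return (pixels // 64) * 64

def clip_frames(seconds: int) -> int:
    """Frame count at 24fps; the 121/193 values above fit 24*s + 1 (inferred pattern)."""
    return 24 * seconds + 1

# e.g. a requested 1080px height becomes 1024; 5s -> 121 frames, 8s -> 193 frames.
print(snap64(1080), clip_frames(5), clip_frames(8))  # -> 1024 121 193
```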

#### 4e. Talking Head Narrator (optional)

Generate a presenter portrait, then animate per-scene clips:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit

# 1. Generate portrait
python3 tools/flux2.py \
  --prompt "Professional presenter portrait, clean style, dark background, facing camera, upper body" \
  --width 1024 --height 576 \
  --output projects/PROJECT_NAME/public/images/presenter.png \
  --cloud modal

# 2. Generate per-scene narrator clips (one per scene, NOT one long video)
python3 tools/sadtalker.py \
  --image projects/PROJECT_NAME/public/images/presenter.png \
  --audio projects/PROJECT_NAME/public/audio/scenes/01.mp3 \
  --preprocess full --still --expression-scale 0.8 \
  --output projects/PROJECT_NAME/public/narrator-01.mp4 \
  --cloud modal

# Repeat for each scene that needs a narrator
```


**SadTalker rules — follow these exactly:**
- **ALWAYS** use `--preprocess full` (default `crop` outputs a square, wrong aspect ratio)
- **ALWAYS** use `--still` (reduces head movement, looks professional)
- **ALWAYS** generate per-scene clips (6-15s each), NEVER one long video
- Processing: ~3-4 min per 10s of audio on Modal A10G
- `--expression-scale 0.8` keeps expressions subtle (range 0.0-1.5)


#### 4f. Image Editing (optional)

Create scene variants from existing images:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/image_edit.py \
  --input projects/PROJECT_NAME/public/images/title-bg.png \
  --prompt "Make it darker with red tones, more ominous" \
  --output projects/PROJECT_NAME/public/images/problem-bg.png \
  --cloud modal
```

#### 4g. Upscaling (optional)

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/upscale.py \
  --input projects/PROJECT_NAME/public/images/some-image.png \
  --output projects/PROJECT_NAME/public/images/some-image-4x.png \
  --scale 4 --cloud modal
```

### Step 5: Sync Timing

**ALWAYS do this after generating voiceover.** Audio duration differs from estimates.

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
for f in projects/PROJECT_NAME/public/audio/scenes/*.mp3; do
  echo "$(basename $f): $(ffprobe -v error -show_entries format=duration -of csv=p=0 "$f")s"
done
```

Update each scene's `durationSeconds` in `demo-config.ts` to `ceil(actual_audio_duration + 2)`.

Example: if `01.mp3` is 6.8s, set scene 1 `durationSeconds` to `9` (ceil(6.8 + 2) = 9).
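The update rule as a small Python helper (illustrative; it mirrors the formula above):

```python
import math

def scene_duration(audio_seconds: float) -> int:
    """durationSeconds for a scene: ceil(actual_audio_duration + 2)."""
    return math.ceil(audio_seconds + 2)

# 6.8s of audio -> a 9s scene, as in the example above.
print(scene_duration(6.8))  # -> 9
```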

### Step 6: Review Still Frames

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit/projects/PROJECT_NAME
npx remotion still src/index.ts ProductDemo --frame=100 --output=/tmp/review-scene1.png
npx remotion still src/index.ts ProductDemo --frame=400 --output=/tmp/review-scene2.png
```

Check: text truncation, animation timing, narrator PiP positioning, background contrast.
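To pick frame numbers that land mid-scene, you can compute them from the config's `durationSeconds` list (illustrative helper; assumes the example config's durations and 30fps):

```python
def mid_frame(durations_s, scene_index, fps=30):
    """Frame number at the midpoint of a scene, given per-scene durations in seconds."""
    start = sum(durations_s[:scene_index]) * fps
    return start + (durations_s[scene_index] * fps) // 2

# Example config durations: [9, 14, 13, 12, 10]
print(mid_frame([9, 14, 13, 12, 10], 0), mid_frame([9, 14, 13, 12, 10], 1))  # -> 135 480
```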

### Step 7: Render

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit/projects/PROJECT_NAME
npm run render
```

Output: `out/ProductDemo.mp4`

## Composition Patterns

### Per-Scene Audio

Use per-scene audio with a 1-second delay (`from={30}` = 30 frames = 1s at 30fps):

```tsx
<Sequence from={30}>
  <Audio src={staticFile('audio/scenes/01.mp3')} volume={1} />
</Sequence>
```

### Per-Scene Narrator PiP

```tsx
<Sequence from={30}>
  <OffthreadVideo
    src={staticFile('narrator-01.mp4')}
    style={{ width: 320, height: 180, objectFit: 'cover' }}
    muted
  />
</Sequence>
```

**ALWAYS use `<OffthreadVideo>`, NEVER `<video>`.** Remotion requires its own component for frame-accurate rendering.

### Transitions

```tsx
import { TransitionSeries, linearTiming } from '@remotion/transitions';
import { fade } from '@remotion/transitions/fade';
import { glitch } from '../../../lib/transitions/presentations/glitch';
import { lightLeak } from '../../../lib/transitions/presentations/light-leak';
```

**NEVER import from the `lib/transitions` barrel.** Import custom transitions from `lib/transitions/presentations/` directly.

## Error Recovery

| Problem | Solution |
| --- | --- |
| Tool command fails with "No module named..." | Run `pip3 install --break-system-packages -r tools/requirements.txt` from the toolkit root |
| "MODAL_*_ENDPOINT_URL not configured" | Check `.env` has the endpoint URL. Run `python3 tools/verify_setup.py` |
| SadTalker output is square/cropped | You forgot `--preprocess full`. Re-run with that flag |
| Audio too short/long for scene | Re-run Step 5 (sync timing) and update config |
| `npm run render` fails | Make sure you're in the project dir, not the toolkit root. Run `npm install` first |
| "Cannot find module" in Remotion | Check import paths. Custom components use `../../../lib/` relative paths |
| Cold start timeout on Modal | First call after idle takes 30-120s. Retry once — second call uses warm GPU |
| SadTalker client timeout (long audio) | The client HTTP request can time out before Modal finishes. Modal still uploads the result to R2. Check `sadtalker/results/` in the `video-toolkit` R2 bucket for the output. Use `python3 -c "import boto3; ..."` with the R2 creds from `.env` to list and generate a presigned URL |

## Cost Estimates (Modal)

| Tool | Typical Cost | Notes |
| --- | --- | --- |
| Qwen3-TTS | ~$0.01/scene | ~20s per scene on warm GPU |
| FLUX.2 | ~$0.01/image | ~3s warm, ~30s cold |
| ACE-Step | ~$0.02-0.05 | Depends on duration |
| SadTalker | ~$0.05-0.20/scene | ~3-4 min per 10s audio |
| Qwen-Edit | ~$0.03-0.15 | ~8 min cold start (25GB model) |
| RealESRGAN | ~$0.005/image | Very fast |
| LTX-2 | ~$0.20-0.25/clip | ~2.5 min per 5s clip, A100-80GB |

**Total for a 60s video:** ~$1-3 depending on scenes and narrator clips.

Modal Starter plan: $30/month free compute. Apps scale to zero when idle.
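A back-of-the-envelope estimator using midpoints of the per-tool costs above (illustrative only; real costs vary with GPU time and clip lengths):

```python
def estimate_video_cost(scenes=5, images=5, narrator_scenes=5, broll_clips=2):
    """Rough per-video total from midpoint per-unit costs in the table above."""
    return (scenes * 0.01              # Qwen3-TTS voiceover, per scene
            + images * 0.01            # FLUX.2 backgrounds, per image
            + 0.035                    # ACE-Step music track (midpoint of $0.02-0.05)
            + narrator_scenes * 0.125  # SadTalker clips (midpoint of $0.05-0.20)
            + broll_clips * 0.225)     # LTX-2 clips (midpoint of $0.20-0.25)

print(round(estimate_video_cost(), 2))  # lands inside the ~$1-3 range
```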