# Video Toolkit

Create professional explainer videos from a text brief. The toolkit uses open-source AI models on cloud GPUs (Modal or RunPod) for voiceover, image generation, music, and talking head animation. Remotion (React) handles composition and rendering.

## CRITICAL: Toolkit Path

The toolkit lives at a fixed path. ALWAYS `cd` here before running any tool command.

```bash
TOOLKIT=~/.openclaw/workspace/claude-code-video-toolkit
cd $TOOLKIT
```

**NEVER run tool commands from inside a project directory.** Tools resolve paths relative to the toolkit root.

## Setup

### Step 1: Check Current State

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/verify_setup.py
```

If everything shows `[x]`, skip to "Quick Test" below. Otherwise continue setup.

### Step 2: Install Python Dependencies

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
pip3 install --break-system-packages -r tools/requirements.txt
```

**Note:** `--break-system-packages` is needed on Debian/Ubuntu with managed Python (PEP 668). Safe inside containers.

### Step 3: Configure Cloud GPU Endpoints

The toolkit needs cloud GPU endpoint URLs in `.env`. Check whether `.env` exists and has Modal endpoints:

```bash
cat ~/.openclaw/workspace/claude-code-video-toolkit/.env | grep MODAL
```

If Modal endpoints are configured, you're ready. If not, ask the user to provide Modal endpoint URLs or set up Modal:

```bash
pip3 install --break-system-packages modal
python3 -m modal setup   # Opens browser for authentication
```

Deploy each tool and capture the endpoint URL from the output:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
modal deploy docker/modal-qwen3-tts/app.py
modal deploy docker/modal-flux2/app.py
modal deploy docker/modal-music-gen/app.py
modal deploy docker/modal-sadtalker/app.py
modal deploy docker/modal-image-edit/app.py
modal deploy docker/modal-upscale/app.py
modal deploy docker/modal-propainter/app.py
modal deploy docker/modal-ltx2/app.py   # Requires: modal secret create huggingface-token HF_TOKEN=hf_...
```

**LTX-2 prerequisite:** Before deploying LTX-2, create a HuggingFace secret and accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized):

```bash
modal secret create huggingface-token HF_TOKEN=hf_your_read_access_token
```

Add each URL to `.env`:

```
MODAL_QWEN3_TTS_ENDPOINT_URL=https://...modal.run
MODAL_FLUX2_ENDPOINT_URL=https://...modal.run
MODAL_MUSIC_GEN_ENDPOINT_URL=https://...modal.run
MODAL_SADTALKER_ENDPOINT_URL=https://...modal.run
MODAL_IMAGE_EDIT_ENDPOINT_URL=https://...modal.run
MODAL_UPSCALE_ENDPOINT_URL=https://...modal.run
MODAL_DEWATERMARK_ENDPOINT_URL=https://...modal.run
MODAL_LTX2_ENDPOINT_URL=https://...modal.run
```

Optional but recommended: Cloudflare R2 for reliable file transfer:

```
R2_ACCOUNT_ID=...
R2_ACCESS_KEY_ID=...
R2_SECRET_ACCESS_KEY=...
R2_BUCKET_NAME=video-toolkit
```

### Step 4: Verify and Quick Test

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/verify_setup.py
```

All tools should show `[x]`. Then run a quick test to confirm the GPU pipeline works:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/qwen3_tts.py --text "Hello, this is a test." --speaker Ryan --tone warm --output /tmp/video-toolkit-test.mp3 --cloud modal
```

If you get a valid .mp3 file, setup is complete. If it fails, check:

- `.env` has the correct `MODAL_QWEN3_TTS_ENDPOINT_URL`
- Run `python3 tools/verify_setup.py --json` and check `modal_tools` for which endpoints are missing

**Cost:** Modal includes $30/month free compute. A typical 60s video costs $1-3.


## Creating a Video

### Step 1: Create Project

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
cp -r templates/product-demo projects/PROJECT_NAME
cd projects/PROJECT_NAME
npm install
```

**Templates:** `product-demo` (marketing/explainer), `sprint-review`, `sprint-review-v2` (composable scenes).

### Step 2: Write Config

Edit `projects/PROJECT_NAME/src/config/demo-config.ts`:

```typescript
export const demoConfig: ProductDemoConfig = {
  product: {
    name: 'My Product',
    tagline: 'What it does in one line',
    website: 'example.com',
  },
  scenes: [
    { type: 'title', durationSeconds: 9, content: { headline: '...', subheadline: '...' } },
    { type: 'problem', durationSeconds: 14, content: { headline: '...', problems: ['...', '...'] } },
    { type: 'solution', durationSeconds: 13, content: { headline: '...', highlights: ['...', '...'] } },
    { type: 'stats', durationSeconds: 12, content: { stats: [{value: '99%', label: '...'}, ...] } },
    { type: 'cta', durationSeconds: 10, content: { headline: '...', links: ['...'] } },
  ],
  audio: {
    backgroundMusicFile: 'audio/bg-music.mp3',
    backgroundMusicVolume: 0.12,
  },
};
```

**Scene types:** `title`, `problem`, `solution`, `demo`, `feature`, `stats`, `cta`.

**Duration rule:** Estimate `durationSeconds` as `ceil(word_count / 2.5) + 2`. You will adjust this in Step 5, after generating audio in Step 4.
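The duration rule is simple enough to sanity-check as a small Python helper (illustrative only, not part of the toolkit):

```python
import math

def estimate_duration(word_count: int) -> int:
    """First-pass durationSeconds: ceil(word_count / 2.5) + 2 (2.5 words/sec, plus 2s slack)."""
    return math.ceil(word_count / 2.5) + 2

# A ~17-word title scene gets 9s; a ~30-word problem scene gets 14s,
# matching the example config above.
print(estimate_duration(17), estimate_duration(30))  # -> 9 14
```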

### Step 3: Write Voiceover Script

Create `projects/PROJECT_NAME/VOICEOVER-SCRIPT.md`:

```markdown
## Scene 1: Title (9s, ~17 words)

Build videos with AI. The product name toolkit makes it easy.

## Scene 2: Problem (14s, ~30 words)

The problem statement goes here. Keep it punchy and relatable.
```

**Word budget per scene:** `(durationSeconds - 2) * 2.5` words. The -2 accounts for 1s audio delay + 1s padding.
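The word budget is the inverse of the duration rule from Step 2; a tiny illustrative helper (not part of the toolkit):

```python
import math

def word_budget(duration_seconds: int) -> int:
    """Max words for a scene: floor((durationSeconds - 2) * 2.5)."""
    return math.floor((duration_seconds - 2) * 2.5)

# A 9s scene budgets ~17 words; a 14s scene budgets ~30 words,
# matching the scene headers above.
print(word_budget(9), word_budget(14))  # -> 17 30
```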

### Step 4: Generate Assets

**CRITICAL: All commands below MUST be run from the toolkit root, not the project directory.**

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
```

#### 4a. Background Music

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/music_gen.py \
  --preset corporate-bg \
  --duration 90 \
  --output projects/PROJECT_NAME/public/audio/bg-music.mp3 \
  --cloud modal
```

**Presets:** `corporate-bg`, `upbeat-tech`, `ambient`, `dramatic`, `tension`, `hopeful`, `cta`, `lofi`.

#### 4b. Voiceover (per-scene)

**Generate ONE .mp3 file PER SCENE. Do NOT generate a single voiceover file.**

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit

# Scene 01
python3 tools/qwen3_tts.py \
  --text "The voiceover text for scene one." \
  --speaker Ryan --tone warm \
  --output projects/PROJECT_NAME/public/audio/scenes/01.mp3 \
  --cloud modal

# Scene 02
python3 tools/qwen3_tts.py \
  --text "The voiceover text for scene two." \
  --speaker Ryan --tone warm \
  --output projects/PROJECT_NAME/public/audio/scenes/02.mp3 \
  --cloud modal

# ... repeat for each scene
```


**Speakers:** `Ryan`, `Aiden`, `Vivian`, `Serena`, `Uncle_Fu`, `Dylan`, `Eric`, `Ono_Anna`, `Sohee`
**Tones:** `neutral`, `warm`, `professional`, `excited`, `calm`, `serious`, `storyteller`, `tutorial`

For voice cloning (needs a reference recording):

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/qwen3_tts.py \
  --text "Text to speak" \
  --ref-audio assets/voices/reference.m4a \
  --ref-text "Exact transcript of the reference audio" \
  --output projects/PROJECT_NAME/public/audio/scenes/01.mp3 \
  --cloud modal
```


#### 4c. Scene Images
```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/flux2.py \
  --prompt "Dark tech background with blue geometric grid, cinematic lighting" \
  --width 1920 --height 1080 \
  --output projects/PROJECT_NAME/public/images/title-bg.png \
  --cloud modal
```

Image presets (use `--preset` instead of `--prompt --width --height`): `title-bg`, `problem`, `solution`, `demo-bg`, `stats-bg`, `cta`, `thumbnail`, `portrait-bg`.

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/flux2.py \
  --preset title-bg \
  --output projects/PROJECT_NAME/public/images/title-bg.png \
  --cloud modal
```

#### 4d. Video Clips — B-Roll & Animated Backgrounds (optional)

Generate AI video clips for b-roll cutaways, animated slide backgrounds, or intro/outro sequences:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit

# B-roll clip from text
python3 tools/ltx2.py \
  --prompt "Aerial drone shot over a European city at golden hour, cinematic wide angle" \
  --output projects/PROJECT_NAME/public/videos/broll-europe.mp4 \
  --cloud modal

# Animate a slide/screenshot (image-to-video)
python3 tools/ltx2.py \
  --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" \
  --input projects/PROJECT_NAME/public/images/title-bg.png \
  --output projects/PROJECT_NAME/public/videos/animated-title.mp4 \
  --cloud modal

# Abstract intro/outro background
python3 tools/ltx2.py \
  --prompt "Dark moody abstract background with flowing blue light streaks, bokeh particles, cinematic" \
  --output projects/PROJECT_NAME/public/videos/intro-bg.mp4 \
  --cloud modal
```

Use in Remotion compositions with `<OffthreadVideo>`:

```tsx
<OffthreadVideo src={staticFile('videos/broll-europe.mp4')} />
```

**LTX-2 rules:**

- Max ~8 seconds per clip (193 frames at 24fps). Default is ~5s (121 frames).
- Width/height must be divisible by 64. Default: 768x512.
- ~$0.20-0.25 per clip, ~2.5 min generation time.
- Cold start ~60-90s. Subsequent clips on warm GPU are faster.
- Generated audio is ambient only — use voiceover/music tools for speech and music.
- ~30% of generations may have training data artifacts (logos/text). Re-run with `--seed` to vary.
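The dimension and frame-count rules can be sketched as small Python helpers (illustrative, not part of the toolkit; the `24*s + 1` frame formula is inferred from the 121/193 counts above):

```python
def snap64(pixels: int) -> int:
    """Round down to the nearest multiple of 64 (LTX-2 requires dims divisible by 64)."""
    return (pixels // 64) * 64

def clip_frames(seconds: int) -> int:
    """Frame count at 24fps; the 121/193 values above fit 24*s + 1 (inferred pattern)."""
    return 24 * seconds + 1

# e.g. a requested 1080px height becomes 1024; 5s -> 121 frames, 8s -> 193 frames.
print(snap64(1080), clip_frames(5), clip_frames(8))  # -> 1024 121 193
```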

#### 4e. Talking Head Narrator (optional)

Generate a presenter portrait, then animate per-scene clips:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit

# 1. Generate portrait
python3 tools/flux2.py \
  --prompt "Professional presenter portrait, clean style, dark background, facing camera, upper body" \
  --width 1024 --height 576 \
  --output projects/PROJECT_NAME/public/images/presenter.png \
  --cloud modal

# 2. Generate per-scene narrator clips (one per scene, NOT one long video)
python3 tools/sadtalker.py \
  --image projects/PROJECT_NAME/public/images/presenter.png \
  --audio projects/PROJECT_NAME/public/audio/scenes/01.mp3 \
  --preprocess full --still --expression-scale 0.8 \
  --output projects/PROJECT_NAME/public/narrator-01.mp4 \
  --cloud modal

# Repeat for each scene that needs a narrator
```


**SadTalker rules — follow these exactly:**
- **ALWAYS** use `--preprocess full` (default `crop` outputs a square, wrong aspect ratio)
- **ALWAYS** use `--still` (reduces head movement, looks professional)
- **ALWAYS** generate per-scene clips (6-15s each), NEVER one long video
- Processing: ~3-4 min per 10s of audio on Modal A10G
- `--expression-scale 0.8` keeps expressions subtle (range 0.0-1.5)


#### 4f. Image Editing (optional)

Create scene variants from existing images:

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/image_edit.py \
  --input projects/PROJECT_NAME/public/images/title-bg.png \
  --prompt "Make it darker with red tones, more ominous" \
  --output projects/PROJECT_NAME/public/images/problem-bg.png \
  --cloud modal
```

#### 4g. Upscaling (optional)

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
python3 tools/upscale.py \
  --input projects/PROJECT_NAME/public/images/some-image.png \
  --output projects/PROJECT_NAME/public/images/some-image-4x.png \
  --scale 4 --cloud modal
```

### Step 5: Sync Timing

**ALWAYS do this after generating voiceover.** Audio duration differs from estimates.

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit
for f in projects/PROJECT_NAME/public/audio/scenes/*.mp3; do
  echo "$(basename $f): $(ffprobe -v error -show_entries format=duration -of csv=p=0 "$f")s"
done
```

Update each scene's `durationSeconds` in `demo-config.ts` to `ceil(actual_audio_duration + 2)`.

Example: if `01.mp3` is 6.8s, set scene 1 `durationSeconds` to `9` (ceil(6.8 + 2) = 9).
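The update rule as a small Python helper (illustrative; it mirrors the formula above):

```python
import math

def scene_duration(audio_seconds: float) -> int:
    """durationSeconds for a scene: ceil(actual_audio_duration + 2)."""
    return math.ceil(audio_seconds + 2)

# 6.8s of audio -> a 9s scene, as in the example above.
print(scene_duration(6.8))  # -> 9
```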

### Step 6: Review Still Frames

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit/projects/PROJECT_NAME
npx remotion still src/index.ts ProductDemo --frame=100 --output=/tmp/review-scene1.png
npx remotion still src/index.ts ProductDemo --frame=400 --output=/tmp/review-scene2.png
```

Check: text truncation, animation timing, narrator PiP positioning, background contrast.
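To pick frame numbers that land mid-scene, you can compute them from the config's `durationSeconds` list (illustrative helper; assumes the example config's durations and 30fps):

```python
def mid_frame(durations_s, scene_index, fps=30):
    """Frame number at the midpoint of a scene, given per-scene durations in seconds."""
    start = sum(durations_s[:scene_index]) * fps
    return start + (durations_s[scene_index] * fps) // 2

# Example config durations: [9, 14, 13, 12, 10]
print(mid_frame([9, 14, 13, 12, 10], 0), mid_frame([9, 14, 13, 12, 10], 1))  # -> 135 480
```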

### Step 7: Render

```bash
cd ~/.openclaw/workspace/claude-code-video-toolkit/projects/PROJECT_NAME
npm run render
```

Output: `out/ProductDemo.mp4`

## Composition Patterns

### Per-Scene Audio

Use per-scene audio with a 1-second delay (`from={30}` = 30 frames = 1s at 30fps):

```tsx
<Sequence from={30}>
  <Audio src={staticFile('audio/scenes/01.mp3')} volume={1} />
</Sequence>
```

### Per-Scene Narrator PiP

```tsx
<Sequence from={30}>
  <OffthreadVideo
    src={staticFile('narrator-01.mp4')}
    style={{ width: 320, height: 180, objectFit: 'cover' }}
    muted
  />
</Sequence>
```

**ALWAYS use `<OffthreadVideo>`, NEVER `<video>`.** Remotion requires its own component for frame-accurate rendering.

### Transitions

```tsx
import { TransitionSeries, linearTiming } from '@remotion/transitions';
import { fade } from '@remotion/transitions/fade';
import { glitch } from '../../../lib/transitions/presentations/glitch';
import { lightLeak } from '../../../lib/transitions/presentations/light-leak';
```

**NEVER import from the `lib/transitions` barrel.** Import custom transitions from `lib/transitions/presentations/` directly.

## Error Recovery

| Problem | Solution |
| --- | --- |
| Tool command fails with "No module named..." | Run `pip3 install --break-system-packages -r tools/requirements.txt` from the toolkit root |
| "MODAL_*_ENDPOINT_URL not configured" | Check `.env` has the endpoint URL. Run `python3 tools/verify_setup.py` |
| SadTalker output is square/cropped | You forgot `--preprocess full`. Re-run with that flag |
| Audio too short/long for scene | Re-run Step 5 (sync timing) and update config |
| `npm run render` fails | Make sure you're in the project dir, not the toolkit root. Run `npm install` first |
| "Cannot find module" in Remotion | Check import paths. Custom components use `../../../lib/` relative paths |
| Cold start timeout on Modal | First call after idle takes 30-120s. Retry once — second call uses warm GPU |
| SadTalker client timeout (long audio) | The client HTTP request can time out before Modal finishes. Modal still uploads the result to R2. Check `sadtalker/results/` in the `video-toolkit` R2 bucket for the output. Use `python3 -c "import boto3; ..."` with the R2 creds from `.env` to list and generate a presigned URL |

## Cost Estimates (Modal)

| Tool | Typical Cost | Notes |
| --- | --- | --- |
| Qwen3-TTS | ~$0.01/scene | ~20s per scene on warm GPU |
| FLUX.2 | ~$0.01/image | ~3s warm, ~30s cold |
| ACE-Step | ~$0.02-0.05 | Depends on duration |
| SadTalker | ~$0.05-0.20/scene | ~3-4 min per 10s audio |
| Qwen-Edit | ~$0.03-0.15 | ~8 min cold start (25GB model) |
| RealESRGAN | ~$0.005/image | Very fast |
| LTX-2 | ~$0.20-0.25/clip | ~2.5 min per 5s clip, A100-80GB |

**Total for a 60s video:** ~$1-3 depending on scenes and narrator clips.

Modal Starter plan: $30/month free compute. Apps scale to zero when idle.
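A back-of-the-envelope estimator using midpoints of the per-tool costs above (illustrative only; real costs vary with GPU time and clip lengths):

```python
def estimate_video_cost(scenes=5, images=5, narrator_scenes=5, broll_clips=2):
    """Rough per-video total from midpoint per-unit costs in the table above."""
    return (scenes * 0.01              # Qwen3-TTS voiceover, per scene
            + images * 0.01            # FLUX.2 backgrounds, per image
            + 0.035                    # ACE-Step music track (midpoint of $0.02-0.05)
            + narrator_scenes * 0.125  # SadTalker clips (midpoint of $0.05-0.20)
            + broll_clips * 0.225)     # LTX-2 clips (midpoint of $0.20-0.25)

print(round(estimate_video_cost(), 2))  # lands inside the ~$1-3 range
```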