VoxFlow Video Skill

Generate short-form videos with AI: an LLM writes the script, AI draws cards or scenes, TTS narrates, and FFmpeg / Remotion renders the final MP4.

For PaperSlide / paper-textured article-to-card reels, switch to `voxflow:paper-slide`.

Five entry points. Pick by what the user wants:

| Command | Output | Use when |
| --- | --- | --- |
| `picstory` | Vertical/landscape MP4 with hand-drawn cards or cinematic scenes | "知识卡片视频" (knowledge-card video), 小红书 (Xiaohongshu) / Twitter edu, sketchnote tutorials |
| `present` | 1080×1920 narrated card video, 5 visual schemes | Pitch decks, explainer reels, branded short-form |
| `explain` | MP4 explainer with title / bullets / summary scenes | "What is X?" tutorials, course intros |
| `slides` | Self-contained HTML deck with embedded TTS audio | Product launches, talks, share-as-link |
| `image` | Single PNG | One-off illustrations / thumbnails (Hunyuan TextToImage) |

Prerequisites

  • `npm install -g voxflow` and `voxflow login`
  • `ffmpeg` installed (`brew install ffmpeg` / `sudo apt install ffmpeg`), required for MP4 render
  • For `present` / `explain` (Remotion-backed): the local plugin install includes `remotion-cards/`. If `present` says "Remotion not ready", run `npm install` inside the bundled `remotion-cards/` directory. For `explain`, you can skip local Remotion with `--cloud`.
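
The prerequisites above can be checked before a render is attempted. The following is a minimal sketch, not part of VoxFlow itself; the function names `missing_tools` and `preflight` are made up for illustration, and the check only confirms that the binaries are on `PATH`, not that `voxflow login` has been completed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of every tool that is not on PATH.
# Usage: missing_tools voxflow ffmpeg
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Return 0 when the tools needed for an MP4 render are installed, 1 otherwise.
preflight() {
  local missing
  missing="$(missing_tools voxflow ffmpeg)"
  if [ -z "$missing" ]; then
    return 0
  fi
  printf 'Missing tools:\n%s\n' "$missing" >&2
  return 1
}
```

A script could call `preflight || exit 1` before any `voxflow picstory` invocation.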

🎴 picstory — knowledge-card video

An LLM writes a structured script, AI draws one card per scene, TTS narrates, and FFmpeg assembles the video. Best for 小红书, Twitter edu, and TikTok.

Quick start

```bash
# Default: Chinese, sketchnote style, portrait, 5 scenes
voxflow picstory --topic "AI Agent 入门指南"

# 2-scene quick test (no full video render)
voxflow picstory --topic "AI 入门" --scenes 2 --image-only

# English landscape video
voxflow picstory --topic "How React Hooks Work" --language en --ratio landscape --style photo
```

Output: `picstory-<timestamp>.mp4` + `.json` (script).

Card-type styles (structured heading + key points)

| `--style` | Look | Best for |
| --- | --- | --- |
| `sketchnote` (default) | Colorful hand-drawn bullet journal | Knowledge sharing, tutorials |
| `neon_noir` | Cyberpunk dark with neon glow | Tech, startup |
| `minimal_3d` | Soft 3D clay on pastel gradients | 小红书 lifestyle |
| `chalkboard` | White chalk on dark green | Science, academic |

Scene-type styles (free-form illustration)

| `--style` | Look | Best for |
| --- | --- | --- |
| `photo` | Cinematic / photo-real | Storytelling, travel |
| `manga_panel` | Japanese manga ink linework | Drama, step-by-step guides |
| `vintage_newspaper` | 1940s broadsheet | History, factual stories |

Ratios

| `--ratio` | Pixels | Platform |
| --- | --- | --- |
| `portrait` (default) | 1080×1920 | 小红书, TikTok, Reels, 抖音 (Douyin) |
| `landscape` | 1920×1080 | YouTube, B站 (Bilibili) |
| `square` | 1080×1080 | Instagram, Twitter |

Script-model presets (`--script-model`)

| Preset | Provider | Strength |
| --- | --- | --- |
| omitted | server config | Balanced default (gpt-4o-mini) |
| `gemini-flash` | OpenRouter | Multilingual, good Chinese |
| `deepseek` | DeepSeek | Cheapest, excellent Chinese |
| `hunyuan` | 腾讯混元 (Tencent Hunyuan) | Chinese-native |
| `moonshot` | Moonshot | Chinese long context |

The server enforces an allowlist: only the preset names above work; arbitrary model IDs are rejected.
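
Because arbitrary model IDs are rejected by the server, a wrapper script may want to catch a bad value before making the call. This is a sketch under the assumption that the preset list above is current; the server remains the source of truth, and `is_script_model_preset` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Preset names copied from the table above (assumption: the list is current).
SCRIPT_MODEL_PRESETS="gemini-flash deepseek hunyuan moonshot"

# Return 0 if $1 is one of the documented preset names, 1 otherwise.
is_script_model_preset() {
  local preset
  for preset in $SCRIPT_MODEL_PRESETS; do
    if [ "$preset" = "$1" ]; then
      return 0
    fi
  done
  return 1
}
```

Note that `gpt-4o-mini` fails this check even though it is the default: it is used only when `--script-model` is omitted, and it is not a preset name.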

Image quality (`--quality`)

Image generation is the only meaningful cost (~$0.005–0.08 per image). The LLM script (~2K tokens) is negligible.

| `--quality` | Provider | Strength | 5-image cost |
| --- | --- | --- | --- |
| `fast` (default) | OpenRouter Gemini Flash | Cheapest, balanced | ~$0.025 |
| `hd` | OpenRouter Gemini Pro | Higher detail | mid |
| `ultra` | OpenRouter gpt-5.4-image-2 | Best overall, ~16× cost | ~$0.40 |
| `fast-aiberm` | Aiberm Gemini Flash | Cheap Aiberm tier | low |
| `hd-aiberm` | Aiberm Gemini Pro | Strongest Chinese text rendering; best for 小红书 cards with Chinese headers | mid |

Use `fast` for iteration, `hd-aiberm` when cards must contain accurate Chinese characters, and `ultra` for hero exports.
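
The two tiers with a stated dollar figure can be turned into a rough per-run estimate. The sketch below derives per-image rates from the 5-image costs in the table (about $0.005 for `fast`, about $0.08 for `ultra`); these are approximations, and `image_cost` is a hypothetical helper. Tiers listed only as "low" or "mid" are deliberately rejected rather than guessed.

```bash
#!/usr/bin/env bash
# Estimate image spend in USD for one run.
# Usage: image_cost <tier> <scenes>
# Rates are in thousandths of a dollar so the math stays in integers.
image_cost() {
  local tier="$1" scenes="$2" mills_per_image total
  case "$tier" in
    fast)  mills_per_image=5 ;;    # ~$0.025 per 5 images
    ultra) mills_per_image=80 ;;   # ~$0.40 per 5 images
    *)
      echo "no dollar figure documented for tier: $tier" >&2
      return 1
      ;;
  esac
  total=$(( mills_per_image * scenes ))
  printf '%d.%03d\n' $(( total / 1000 )) $(( total % 1000 ))
}
```

For example, `image_cost ultra 5` prints `0.400`, which matches the 16× ratio over `fast` given in the table.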

Full options

| Flag | Default | Description |
| --- | --- | --- |
| `--topic <text>` | required (or `--text`) | Story topic |
| `--text <content>` | | Paste full article instead of a topic |
| `--style <name>` | `sketchnote` | See styles above |
| `--ratio <name>` | `portrait` | `portrait` \| `landscape` \| `square` |
| `--language <code>` | `zh` | `zh` \| `en` \| `ja` \| `ko` \| ... |
| `--scenes <n>` | `5` | 2–10. Use `2` for quick tests. |
| `--script-model <preset>` | server default | See presets above |
| `--quality <tier>` | `fast` | See table above |
| `--voice <id>` | default | TTS voice from `voxflow voices` |
| `--speed <n>` | `1.0` | TTS speed 0.5–2.0 |
| `--bgm <file>` | | Background music (mp3/wav) mixed under narration |
| `--bgm-volume <n>` | `0.1` | BGM volume 0–1 |
| `--fade <n>` | `0.4` | Per-scene fade in/out seconds (`0` to disable) |
| `--image-only` | `false` | Save images + audio without final video render |
| `--output-dir <dir>` | | Directory for all outputs |
| `--output <path>` | auto | Final MP4 path |
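
The numeric ranges in the table lend themselves to a quick local check before a request is sent. This is an illustrative sketch only: `valid_scenes` and `valid_speed` are hypothetical helpers, and whether the CLI itself rejects or clamps out-of-range values is not stated in this document.

```bash
#!/usr/bin/env bash
# Return 0 if $1 is an integer within the documented --scenes range (2-10).
valid_scenes() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 2 ] && [ "$1" -le 10 ]
}

# Return 0 if $1 is a plain decimal within the documented --speed range (0.5-2.0).
# awk does the floating-point comparison that shell arithmetic cannot.
valid_speed() {
  awk -v s="$1" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s >= 0.5 && s <= 2.0) }'
}
```

A wrapper might run `valid_scenes "$n" || { echo "scenes must be 2-10" >&2; exit 1; }` before calling `voxflow picstory`.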

Quota cost

| Operation | Quota |
| --- | --- |
| LLM script | 100 |
| TTS / scene | 100 |
| Image / scene | 500 |
| 2-scene test | ~1,300 |
| 5-scene full | ~3,100 |

Free tier (10K/month) ≈ 3 full picstory videos.
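
The totals in the table follow from one script charge plus a TTS and an image charge per scene. A small sketch of that arithmetic, with hypothetical helper names `picstory_quota` and `runs_per_quota`:

```bash
#!/usr/bin/env bash
# Quota units for one picstory run:
# 100 (LLM script) + scenes * (100 TTS + 500 image).
picstory_quota() {
  local scenes="$1"
  echo $(( 100 + scenes * (100 + 500) ))
}

# How many runs of a given size fit into a monthly quota (integer division).
runs_per_quota() {
  local monthly="$1" scenes="$2"
  echo $(( monthly / $(picstory_quota "$scenes") ))
}
```

`picstory_quota 5` gives 3100, and `runs_per_quota 10000 5` gives 3, which agrees with the free-tier estimate above.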

Pipeline

Topic / Text
  ├─[1] LLM script → { title, scenes: [{ heading, keyPoints, narration }] }
  ├─[2] TTS per scene → PCM audio
  ├─[3] Image per scene (parallel) → JPEG via /api/image/generate
  └─[4] FFmpeg: image + WAV → Ken Burns zoompan MP4 → concat → +BGM → final.mp4
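
Step [4] can be approximated by hand with FFmpeg's `zoompan` filter, which is useful when debugging a single scene. The sketch below only builds the filter string. The zoom rate (0.0015 per frame) and cap (1.5) are illustrative values taken from common `zoompan` usage, not VoxFlow's actual settings, and `zoompan_filter` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Build a zoompan filter string for one still image.
# Usage: zoompan_filter <WxH> <whole-seconds> [fps]
# d is the number of output frames generated from the single input frame.
zoompan_filter() {
  local size="$1" seconds="$2" fps="${3:-25}"
  local frames=$(( seconds * fps ))
  printf "zoompan=z='min(zoom+0.0015,1.5)':d=%d:s=%s:fps=%d\n" "$frames" "$size" "$fps"
}

# Example (not run here; file names are placeholders): render one scene
# from a still image and its narration track.
# ffmpeg -i scene1.jpg -i scene1.wav \
#   -vf "$(zoompan_filter 1080x1920 6)" \
#   -c:v libx264 -pix_fmt yuv420p -c:a aac -shortest scene1.mp4
```

The per-frame zoom and scale work is what makes this stage CPU-heavy, so it is worth trying on one short scene before rendering a full video.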

Troubleshooting

| Problem | Fix |
| --- | --- |
| `ffmpeg not found` | `brew install ffmpeg` (macOS) / `sudo apt install ffmpeg` (Debian/Ubuntu) |
| `Image generation failed` | Retry; if stuck on `hd`, drop to `--quality fast` |
| Render too slow | Ken Burns is CPU-heavy. Use `--scenes 2 --image-only` to validate content first. |
| `model_not_allowed` | Use a `--script-model` preset name, not a raw model ID |
| Chinese text mangled in cards | Switch to `--quality hd-aiberm` |

📑 present — narrated card video (Remotion local)

Text or URL → LLM cards → TTS → Remotion render. 1080×1920, 5 visual schemes.

```bash
voxflow present --text "Claude Code 是一个 AI 编程工具" --style aurora
voxflow present --url https://example.com/article --style noir
voxflow present --text "2025 AI 芯片格局" --web-search --style neon
voxflow present --cards pre-generated.json --no-audio
```

| Flag | Default | Notes |
| --- | --- | --- |
| `--text` / `--url` / `--cards` | one required | Input source |
| `--style` | `aurora` | `noir` \| `neon` \| `editorial` \| `aurora` \| `brutalist` |
| `--voice <id>` | `v-female-R2s4N9qJ` | TTS voice |
| `--speed <n>` | `1.0` | 0.5–2.0 |
| `--no-audio` | `false` | Silent video only |
| `--web-search` | `false` | Augment LLM with up-to-date web facts |
| `--output <path>` | `./present-<ts>.mp4` | `.mp4` or `.wav` |

If `present` reports "Remotion not ready", run `npm install` inside the bundled `remotion-cards/` directory to set up the local renderer.

🧠 explain — AI explainer video

Title / bullets / summary scene flow. Best for "What is X?" tutorials.

```bash
voxflow explain --topic "What is React?"
voxflow explain --topic demo --output demo.mp4         # built-in demo (no API call)
voxflow explain --topic "区块链入门" --style chalkboard --voice v-male-Bk7vD3xP
voxflow explain --topic "Machine Learning" --audio-only
voxflow explain --topic "AI Agent 入门" --cloud         # render on server
```

| Flag | Default | Notes |
| --- | --- | --- |
| `--topic <text>` | required | Use `demo` for built-in demo script |
| `--style` | `modern` | `modern` \| `playful` \| `corporate` \| `chalkboard` |
| `--language` | `en` | `en` \| `zh` \| `ja` \| `ko` \| ... |
| `--voice <id>` | `v-female-R2s4N9qJ` | TTS voice |
| `--scenes <n>` | `5` | 3–12 |
| `--audio-only` | `false` | Skip render, output WAV only |
| `--cloud` | `false` | Use cloud Remotion instead of local |
| `--output <path>` | `./explain-<ts>.mp4` | Output path |
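
Since `explain` can render either locally or with `--cloud`, a wrapper can fall back to the server when the local Remotion project has not been set up. The sketch below uses the presence of `node_modules/` inside `remotion-cards/` as a readiness heuristic. That heuristic is an assumption, as is the directory path you pass in (its location depends on the plugin install), and `explain_render_flag` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print "--cloud" when the local Remotion project has no installed dependencies,
# and nothing when it looks ready for local rendering.
# $1: path to the bundled remotion-cards/ directory.
explain_render_flag() {
  local remotion_dir="$1"
  if [ -d "$remotion_dir/node_modules" ]; then
    echo ""          # dependencies present: render locally
  else
    echo "--cloud"   # fall back to server-side rendering
  fi
}
```

Usage could look like `voxflow explain --topic "What is React?" $(explain_render_flag ./remotion-cards)`, where the unquoted substitution expands to nothing when local rendering is available.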

📊 slides — HTML presentation with TTS

Generates a self-contained HTML deck with embedded base64 audio per slide. Open in any browser, no server needed.

```bash
voxflow slides "AI in Healthcare"
voxflow slides "Q4 Revenue Report" --template report --theme paper
voxflow slides "React Tutorial" --template tutorial --model balanced
voxflow slides "Startup Pitch" --template pitch --theme ocean --no-audio
```

| Flag | Default | Notes |
| --- | --- | --- |
| `--text <text>` (or positional) | required | Topic |
| `--template` | `free` | `product` \| `report` \| `tutorial` \| `pitch` \| `free` |
| `--theme` | `midnight` | `midnight` \| `paper` \| `ember` \| `forest` \| `ocean` |
| `--model` | `swift` | `swift` \| `balanced` \| `pro` \| `creative` |
| `--voice <id>` | `v-female-R2s4N9qJ` | TTS voice |
| `--no-audio` | `false` | Skip TTS, slides only |
| `--output <path>` | `./slides-<ts>.html` | Output path |

10 layouts: `hero`, `title-bullets`, `two-column`, `three-cards`, `image-left`, `image-right`, `quote`, `timeline`, `stats`, `section`. Templates auto-pick a layout sequence.

🖼 image — single Hunyuan illustration

Synchronous text → image (PNG). Useful for thumbnails or one-off art.

```bash
voxflow image "a sleeping cat in a sunlit window" --resolution 1024:1024 -o cat.png
```

Resolutions: `768:768`, `768:1024`, `1024:768`, `1024:1024`, `720:1280`, `1280:720`, `768:1280`, `1280:768`, `1080:1920`, `1920:1080`.

Prompt max 1000 chars. Output: local PNG + COS URL.
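
Both limits above (the fixed resolution list and the 1000-character prompt cap) can be checked locally before a request is made. A minimal sketch, with `valid_image_request` as a hypothetical helper; note that `${#prompt}` counts characters only in a UTF-8 locale and counts bytes in the C locale.

```bash
#!/usr/bin/env bash
# Resolutions copied from the list above.
IMAGE_RESOLUTIONS="768:768 768:1024 1024:768 1024:1024 720:1280 1280:720 768:1280 1280:768 1080:1920 1920:1080"

# Return 0 if the prompt fits in 1000 characters and the resolution is listed.
# Usage: valid_image_request "<prompt>" <W:H>
valid_image_request() {
  local prompt="$1" resolution="$2" r
  if [ "${#prompt}" -gt 1000 ]; then
    return 1
  fi
  for r in $IMAGE_RESOLUTIONS; do
    if [ "$r" = "$resolution" ]; then
      return 0
    fi
  done
  return 1
}
```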

Pick-the-right-tool checklist

"小红书风格知识卡片"           → picstory
"AI 短视频 + caption"          → picstory --style sketchnote
"explainer / What is X?"        → explain
"branded short with my text"    → present
"already have cards/script"     → present --cards
"shareable HTML deck w/ audio"  → slides
"single illustration"           → image

Rules

  1. Search voices with `voxflow voices` before passing `--voice`. Never guess IDs.
  2. Check quota before video calls (`voxflow status`): picstory ≈ 3K, present/explain ≈ 500–2K.
  3. Test cheap first: `picstory --scenes 2 --image-only` validates the script before paying for full render.
  4. After render finishes, auto-play: `open output.mp4` (macOS).
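
Rules 2 to 4 can be folded into a pair of shell functions. This is one possible arrangement, not a prescribed workflow: `picstory_preview` and `picstory_full` are hypothetical names, and the sketch uses only flags documented above. The output format of `voxflow status` is not described in this document, so it is shown to the user rather than parsed.

```bash
#!/usr/bin/env bash
# Rules 2 and 3: show quota, then run the 2-scene, image-only check.
picstory_preview() {
  local topic="$1"
  voxflow status || return 1
  voxflow picstory --topic "$topic" --scenes 2 --image-only
}

# Full render to a known path, then rule 4: auto-play on macOS only.
picstory_full() {
  local topic="$1" out="${2:-output.mp4}"
  voxflow picstory --topic "$topic" --output "$out" || return 1
  if [ "$(uname)" = "Darwin" ]; then
    open "$out"
  fi
}
```

Run `picstory_preview "AI 入门"` first, inspect the generated images and script, and call `picstory_full "AI 入门"` only once the content looks right.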