VoxFlow Video Skill

Generate short-form videos with AI: LLM writes the script, AI draws cards or scenes, TTS narrates, FFmpeg / Remotion renders the final MP4.

For PaperSlide / paper-textured article-to-card reels, switch to `voxflow:paper-slide`.

Five entry points. Pick by what the user wants:

Command	Output	Use when
`picstory`	Vertical/landscape MP4 with hand-drawn cards or cinematic scenes	"知识卡片视频", 小红书 / Twitter edu, sketchnote tutorials
`present`	1080×1920 narrated card video, 5 visual schemes	Pitch decks, explainer reels, branded short-form
`explain`	MP4 explainer with title / bullets / summary scenes	"What is X?" tutorials, course intros
`slides`	Self-contained HTML deck with embedded TTS audio	Product launches, talks, share-as-link
`image`	Single PNG	One-off illustrations / thumbnails (Hunyuan TextToImage)

Prerequisites

`npm install -g voxflow` and `voxflow login`

`ffmpeg` installed (`brew install ffmpeg` / `sudo apt install ffmpeg`), required for MP4 render

For `present` / `explain` (Remotion-backed): the local plugin install includes `remotion-cards/`. If `present` says "Remotion not ready", run `npm install` inside the bundled `remotion-cards/` directory. For `explain`, you can skip local Remotion with `--cloud`.
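
A quick preflight check can confirm these prerequisites before the first render. The sketch below is illustrative only: `require_cmd` and `preflight` are hypothetical helper names, not part of the VoxFlow CLI, and they only check that the binaries are on `PATH`. They do not verify login state or the `remotion-cards/` install.

```shell
# Hypothetical helpers (not part of voxflow): confirm required tools are on PATH.

# require_cmd NAME: succeed if NAME resolves to a command, otherwise report it on stderr.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing: $1" >&2
  return 1
}

# preflight: check every tool named in the prerequisites.
# Returns non-zero if at least one of them is missing.
preflight() {
  missing=0
  for tool in npm voxflow ffmpeg; do
    require_cmd "$tool" || missing=1
  done
  return "$missing"
}
```

Running `preflight && echo "ready"` prints one `missing:` line per absent tool and prints `ready` only when all three are found.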

🎴 picstory — knowledge-card video

LLM writes a structured script, AI draws one card per scene, TTS narrates, FFmpeg assembles. Best for 小红书 / Twitter edu / TikTok.

Quick start

bash

# Default: Chinese, sketchnote style, portrait, 5 scenes
voxflow picstory --topic "AI Agent 入门指南"

# 2-scene quick test (no full video render)
voxflow picstory --topic "AI 入门" --scenes 2 --image-only

# English landscape video
voxflow picstory --topic "How React Hooks Work" --language en --ratio landscape --style photo

Output: `picstory-<timestamp>.mp4` + `.json` (script).

Card-type styles (structured heading + key points)

`--style`	Look	Best for
`sketchnote` (default)	Colorful hand-drawn bullet journal	Knowledge sharing, tutorials
`neon_noir`	Cyberpunk dark with neon glow	Tech, startup
`minimal_3d`	Soft 3D clay on pastel gradients	小红书 lifestyle
`chalkboard`	White chalk on dark green	Science, academic

Scene-type styles (free-form illustration)

`--style`	Look	Best for
`photo`	Cinematic / photo-real	Storytelling, travel
`manga_panel`	Japanese manga ink linework	Drama, step-by-step guides
`vintage_newspaper`	1940s broadsheet	History, factual stories

Ratios

`--ratio`	Pixels	Platform
`portrait` (default)	1080×1920	小红书, TikTok, Reels, 抖音
`landscape`	1920×1080	YouTube, B站
`square`	1080×1080	Instagram, Twitter

Script-model presets (`--script-model`)

Preset	Provider	Strength
omitted	server config	Balanced default (gpt-4o-mini)
`gemini-flash`	OpenRouter	Multilingual, good Chinese
`deepseek`	DeepSeek	Cheapest, excellent Chinese
`hunyuan`	Tencent Hunyuan	Chinese-native
`moonshot`	Moonshot	Chinese long context

The server enforces an allowlist: only the preset names above work, and arbitrary model IDs are rejected.

Image quality (`--quality`)

Image generation is the only meaningful cost (~$0.005-0.08 per image). LLM script (~2K tokens) is negligible.

`--quality`	Provider	Strength	5-image cost
`fast` (default)	OpenRouter Gemini Flash	Cheapest, balanced	~$0.025
`hd`	OpenRouter Gemini Pro	Higher detail	mid
`ultra`	OpenRouter gpt-5.4-image-2	Best overall, ~16× cost	~$0.40
`fast-aiberm`	Aiberm Gemini Flash	Cheap Aiberm tier	low
`hd-aiberm`	Aiberm Gemini Pro	Strongest Chinese text rendering — best for 小红书 cards with Chinese headers	mid

Use `fast` for iteration; `hd-aiberm` when cards must contain accurate Chinese characters; `ultra` for hero exports.

Full options

Flag	Default	Description
`--topic <text>`	required (or `--text`)	Story topic
`--text <content>`	—	Paste full article instead of a topic
`--style <name>`	`sketchnote`	See styles above
`--ratio <name>`	`portrait`	`portrait` \| `landscape` \| `square`
`--language <code>`	`zh`	`zh` \| `en` \| `ja` \| `ko` \| ...
`--scenes <n>`	`5`	2–10. Use `2` for quick tests.
`--script-model <preset>`	server default	See presets above
`--quality <tier>`	`fast`	See table above
`--voice <id>`	default	TTS voice from `voxflow voices`
`--speed <n>`	`1.0`	TTS speed 0.5–2.0
`--bgm <file>`	—	Background music (mp3/wav) mixed under narration
`--bgm-volume <n>`	`0.1`	BGM volume 0–1
`--fade <n>`	`0.4`	Per-scene fade in/out seconds (`0` to disable)
`--image-only`	false	Save images + audio without final video render
`--output-dir <dir>`	—	Directory for all outputs
`--output <path>`	auto	Final MP4 path

Quota cost

Operation	Quota
LLM script	100
TTS / scene	100
Image / scene	500
2-scene test	~1,300
5-scene full	~3,100

Free tier (10K/month) ≈ 3 full picstory videos.
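
The totals in the table follow from the per-operation costs: 100 for the script plus 600 (TTS 100 + image 500) per scene. A minimal sketch of that arithmetic, where the function names are hypothetical helpers and not part of the VoxFlow CLI:

```shell
# Hypothetical helpers: reproduce the quota arithmetic from the table above.

# estimate_picstory_quota SCENES: 100 for the LLM script,
# plus 100 (TTS) + 500 (image) for each scene.
estimate_picstory_quota() {
  scenes=$1
  echo $((100 + scenes * (100 + 500)))
}

# videos_per_month QUOTA SCENES: how many videos of SCENES scenes fit
# into a monthly quota (integer division, partial videos do not count).
videos_per_month() {
  per_video=$(estimate_picstory_quota "$2")
  echo $(($1 / per_video))
}

estimate_picstory_quota 2    # prints 1300
estimate_picstory_quota 5    # prints 3100
videos_per_month 10000 5     # prints 3
```

The last call matches the free-tier estimate: 10,000 / 3,100 rounds down to 3 full videos.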

Pipeline

Topic / Text
  ├─[1] LLM script → { title, scenes: [{ heading, keyPoints, narration }] }
  ├─[2] TTS per scene → PCM audio
  ├─[3] Image per scene (parallel) → JPEG via /api/image/generate
  └─[4] FFmpeg: image + WAV → Ken Burns zoompan MP4 → concat → +BGM → final.mp4
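
Step 4 can be approximated by hand with stock FFmpeg, which helps when debugging a slow or failed render. The sketch below only prints candidate commands and does not run them. The exact filters and parameters VoxFlow uses are not documented here, so the zoom expression, 25 fps, the 1080x1920 size, and the helper names `scene_cmd` and `concat_cmd` are all assumptions. File names are printed unquoted, so avoid spaces in them.

```shell
# Hypothetical helpers: print (do not run) FFmpeg commands that resemble pipeline step 4.
# Assumed values: 25 fps, portrait 1080x1920, slow zoom capped at 1.2x.

# scene_cmd IMAGE WAV SECONDS OUT: one still image + narration -> one zooming clip.
# SECONDS must be a whole number; zoompan's d option is a frame count.
scene_cmd() {
  image=$1; wav=$2; seconds=$3; out=$4
  fps=25
  frames=$((seconds * fps))
  printf 'ffmpeg -y -i %s -i %s -vf "zoompan=z=%s:d=%s:s=1080x1920:fps=%s" -c:v libx264 -pix_fmt yuv420p -c:a aac -shortest %s\n' \
    "$image" "$wav" "'min(zoom+0.001,1.2)'" "$frames" "$fps" "$out"
}

# concat_cmd LIST OUT: join the per-scene clips listed in LIST
# (concat demuxer format, one "file 'sceneN.mp4'" line per clip).
concat_cmd() {
  printf 'ffmpeg -y -f concat -safe 0 -i %s -c copy %s\n' "$1" "$2"
}
```

For example, `scene_cmd card1.jpg card1.wav 4 scene1.mp4` prints a command that renders 100 frames (4 s at 25 fps). Because `zoompan` runs per frame on the CPU, this step is the likely reason long renders feel slow.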

Troubleshooting

Problem	Fix
`ffmpeg not found`	`brew install ffmpeg` (macOS) / `sudo apt install ffmpeg` (Debian/Ubuntu)
`Image generation failed`	Retry; if stuck on `hd`, drop to `--quality fast`
Render too slow	Ken Burns is CPU-heavy. Use `--scenes 2 --image-only` to validate content first.
`model_not_allowed`	Use a `--script-model` preset name, not a raw model ID
Chinese text mangled in cards	Switch to `--quality hd-aiberm`

📑 present — narrated card video (Remotion local)

Text or URL → LLM cards → TTS → Remotion render. 1080×1920, 5 visual schemes.

bash

voxflow present --text "Claude Code 是一个 AI 编程工具" --style aurora
voxflow present --url https://example.com/article --style noir
voxflow present --text "2025 AI 芯片格局" --web-search --style neon
voxflow present --cards pre-generated.json --no-audio

Flag	Default	Notes
`--text` / `--url` / `--cards`	one required	input source
`--style`	`aurora`	`noir` \| `neon` \| `editorial` \| `aurora` \| `brutalist`
`--voice <id>`	`v-female-R2s4N9qJ`	TTS voice
`--speed <n>`	`1.0`	0.5–2.0
`--no-audio`	false	Silent video only
`--web-search`	false	Augment LLM with up-to-date web facts
`--output <path>`	`./present-<ts>.mp4`	`.mp4` or `.wav`

If `present` reports "Remotion not ready", run `npm install` inside the bundled `remotion-cards/` directory to set up the local renderer.

🧠 explain — AI explainer video

Title / bullets / summary scene flow. Best for "What is X?" tutorials.

bash

voxflow explain --topic "What is React?"
voxflow explain --topic demo --output demo.mp4         # built-in demo (no API call)
voxflow explain --topic "区块链入门" --style chalkboard --voice v-male-Bk7vD3xP
voxflow explain --topic "Machine Learning" --audio-only
voxflow explain --topic "AI Agent 入门" --cloud         # render on server

Flag	Default	Notes
`--topic <text>`	required	Use `demo` for built-in demo script
`--style`	`modern`	`modern` \| `playful` \| `corporate` \| `chalkboard`
`--language`	`en`	`en` \| `zh` \| `ja` \| `ko` \| ...
`--voice <id>`	`v-female-R2s4N9qJ`	TTS voice
`--scenes <n>`	`5`	3–12
`--audio-only`	false	Skip render, output WAV only
`--cloud`	false	Use cloud Remotion instead of local
`--output <path>`	`./explain-<ts>.mp4`	Output path

📊 slides — HTML presentation with TTS

Generates a self-contained HTML deck with embedded base64 audio per slide. Open in any browser, no server needed.

bash

voxflow slides "AI in Healthcare"
voxflow slides "Q4 Revenue Report" --template report --theme paper
voxflow slides "React Tutorial" --template tutorial --model balanced
voxflow slides "Startup Pitch" --template pitch --theme ocean --no-audio

Flag	Default	Notes
`--text <text>` (or positional)	required	Topic
`--template`	`free`	`product` \| `report` \| `tutorial` \| `pitch` \| `free`
`--theme`	`midnight`	`midnight` \| `paper` \| `ember` \| `forest` \| `ocean`
`--model`	`swift`	`swift` \| `balanced` \| `pro` \| `creative`
`--voice <id>`	`v-female-R2s4N9qJ`	TTS voice
`--no-audio`	false	Skip TTS, slides only
`--output <path>`	`./slides-<ts>.html`	Output path

10 layouts: `hero`, `title-bullets`, `two-column`, `three-cards`, `image-left`, `image-right`, `quote`, `timeline`, `stats`, `section`. Templates auto-pick a layout sequence.

🖼 image — single Hunyuan illustration

Synchronous text → image (PNG). Useful for thumbnails or one-off art.

bash

voxflow image "a sleeping cat in a sunlit window" --resolution 1024:1024 -o cat.png

Resolutions: `768:768`, `768:1024`, `1024:768`, `1024:1024`, `720:1280`, `1280:720`, `768:1280`, `1280:768`, `1080:1920`, `1920:1080`.

Prompt max 1000 chars. Output: local PNG + COS URL.
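
Since only the listed resolutions are accepted and the prompt is capped, both can be checked locally before spending a call. A minimal sketch, assuming the limits above are exhaustive; `valid_resolution` and `valid_prompt` are hypothetical helpers, not VoxFlow commands. Note that `${#1}` counts characters in bash but bytes in some other shells, and the source does not say which unit the 1000 limit uses.

```shell
# Hypothetical pre-checks (not part of voxflow) for `voxflow image` arguments.

# valid_resolution WxH: succeed only for the resolutions listed above.
valid_resolution() {
  case "$1" in
    768:768|768:1024|1024:768|1024:1024|720:1280) return 0 ;;
    1280:720|768:1280|1280:768|1080:1920|1920:1080) return 0 ;;
    *) return 1 ;;
  esac
}

# valid_prompt TEXT: succeed for a non-empty prompt of at most 1000 characters.
valid_prompt() {
  [ -n "$1" ] && [ "${#1}" -le 1000 ]
}
```

A guarded call could then look like `valid_resolution 1024:1024 && valid_prompt "$prompt" && voxflow image "$prompt" --resolution 1024:1024 -o cat.png`.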

Pick-the-right-tool checklist

"小红书风格知识卡片"           → picstory
"AI 短视频 + caption"          → picstory --style sketchnote
"explainer / What is X?"        → explain
"branded short with my text"    → present
"already have cards/script"     → present --cards
"shareable HTML deck w/ audio"  → slides
"single illustration"           → image

Rules

Search voices with `voxflow voices` before passing `--voice`. Never guess IDs.
Check quota before video calls (`voxflow status`): picstory ≈ 3K, present/explain ≈ 500–2K.
Test cheap first: `picstory --scenes 2 --image-only` validates the script before paying for full render.
After render finishes, auto-play: `open output.mp4` (macOS).
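
These rules can be chained into one wrapper so the cheap preview always runs before the full render. The sketch below uses only the commands and flags documented above. The function names are hypothetical, and it assumes each `voxflow` command exits non-zero on failure, which the source does not confirm. It does not parse the output of `voxflow status`, so the quota still has to be read by eye.

```shell
# Hypothetical wrappers around documented voxflow commands.

# picstory_preview TOPIC: 2-scene, image-only run to validate the script cheaply.
picstory_preview() {
  voxflow picstory --topic "$1" --scenes 2 --image-only
}

# picstory_full TOPIC OUT: full render with default settings to the given MP4 path.
picstory_full() {
  voxflow picstory --topic "$1" --output "$2"
}

# picstory_checked TOPIC OUT: show quota, preview, then render.
# Stops at the first step that fails.
picstory_checked() {
  voxflow status || return 1
  picstory_preview "$1" || return 1
  picstory_full "$1" "$2"
}
```

For example, `picstory_checked "AI Agent 入门指南" guide.mp4 && open guide.mp4` runs the three steps in order and plays the result on macOS.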
