VoxFlow Video Skill
Generate short-form videos with AI: LLM writes the script, AI draws cards or scenes, TTS narrates, FFmpeg / Remotion renders the final MP4.
For PaperSlide / paper-textured article-to-card reels, switch to `voxflow:paper-slide`.
Five entry points, pick by what the user wants:
| Command | Output | Use when |
|---|---|---|
| `picstory` | Vertical/landscape MP4 with hand-drawn cards or cinematic scenes | "知识卡片视频" (knowledge-card video), 小红书 / Twitter edu, sketchnote tutorials |
| `present` | 1080×1920 narrated card video, 5 visual schemes | Pitch decks, explainer reels, branded short-form |
| `explain` | MP4 explainer with title / bullets / summary scenes | "What is X?" tutorials, course intros |
| `slides` | Self-contained HTML deck with embedded TTS audio | Product launches, talks, share-as-link |
| `image` | Single PNG | One-off illustrations / thumbnails (Hunyuan TextToImage) |
Prerequisites
- `npm install -g voxflow` and `voxflow login`
- `ffmpeg` installed (`brew install ffmpeg` / `sudo apt install ffmpeg`), required for MP4 render
- For `present` / `explain` (Remotion-backed): the local plugin install includes `remotion-cards/`. If `present` says "Remotion not ready", run `npm install` inside the bundled `remotion-cards/` directory. For `explain`, you can skip local Remotion with `--cloud`.
🎴 picstory — knowledge-card video
LLM writes a structured script, AI draws one card per scene, TTS narrates, FFmpeg assembles. Best for 小红书 / Twitter edu / TikTok.
Quick start
```bash
# Default: Chinese, sketchnote style, portrait, 5 scenes
voxflow picstory --topic "AI Agent 入门指南"

# 2-scene quick test (no full video render)
voxflow picstory --topic "AI 入门" --scenes 2 --image-only

# English landscape video
voxflow picstory --topic "How React Hooks Work" --language en --ratio landscape --style photo
```
Output: `picstory-<timestamp>.mp4` + `.json` (script).
Card-type styles (structured heading + key points)
| Look | Best for |
|---|---|
| Colorful hand-drawn bullet journal | Knowledge sharing, tutorials |
| Cyberpunk dark with neon glow | Tech, startup |
| Soft 3D clay on pastel gradients | 小红书 lifestyle |
| White chalk on dark green | Science, academic |
Scene-type styles (free-form illustration)
| Look | Best for |
|---|---|
| Cinematic / photo-real | Storytelling, travel |
| Japanese manga ink linework | Drama, step-by-step guides |
| 1940s broadsheet | History, factual stories |
Ratios
| Pixels | Platform |
|---|---|---|
| 1080×1920 | 小红书, TikTok, Reels, 抖音 |
| 1920×1080 | YouTube, B站 |
| 1080×1080 | Instagram, Twitter |
Script-model presets (`--script-model`)
| Preset | Provider | Strength |
|---|---|---|
| omitted | server config | Balanced default (gpt-4o-mini) |
| | OpenRouter | Multilingual, good Chinese |
| | DeepSeek | Cheapest, excellent Chinese |
| | Tencent Hunyuan | Chinese-native |
| | Moonshot | Chinese long context |
The server enforces an allowlist: only the preset names above work; arbitrary model IDs are rejected.
Image quality (`--quality`)
Image generation is the only meaningful cost (~$0.005–0.08 per image). The LLM script (~2K tokens) is negligible.
| Provider | Strength | 5-image cost |
|---|---|---|
| OpenRouter Gemini Flash | Cheapest, balanced | ~$0.025 |
| OpenRouter Gemini Pro | Higher detail | mid |
| OpenRouter gpt-5.4-image-2 | Best overall, ~16× the cost of `fast` | ~$0.40 |
| Aiberm Gemini Flash | Cheap Aiberm tier | low |
| Aiberm Gemini Pro | Strongest Chinese text rendering; best for 小红书 cards with Chinese headers | mid |
Use `fast` for iteration; `hd-aiberm` when cards must contain accurate Chinese characters; `ultra` for hero exports.
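The per-image range quoted above can be cross-checked against the 5-image costs in the table. The sketch below only redoes that arithmetic with `awk`; the helper name `per_image_cost` is made up for illustration and is not part of the voxflow CLI.

```bash
# Hypothetical helper: derive a per-image price from a 5-image cost in the table.
per_image_cost() {
  # $1 = cost of a 5-image run in USD
  awk -v total="$1" 'BEGIN { printf "%.3f\n", total / 5 }'
}

per_image_cost 0.025   # OpenRouter Gemini Flash row: prints 0.005
per_image_cost 0.40    # OpenRouter gpt-5.4-image-2 row: prints 0.080

# Ratio between those two rows, matching the "~16x" note in the table.
awk 'BEGIN { printf "%.0f\n", 0.40 / 0.025 }'   # prints 16
```

The two results line up with the ~$0.005 and ~$0.08 ends of the per-image range.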
Full options
| Flag | Default | Description |
|---|---|---|
| | required (or | Story topic |
| | — | Paste full article instead of a topic |
| | | See styles above |
| | | |
| | | |
| | | 2–10. Use 2 for quick tests |
| | server default | See presets above |
| | | See table above |
| | default | TTS voice from |
| | | TTS speed 0.5–2.0 |
| | — | Background music (mp3/wav) mixed under narration |
| | | BGM volume 0–1 |
| | | Per-scene fade in/out seconds (0 to disable) |
| | false | Save images + audio without final video render |
| | — | Directory for all outputs |
| | auto | Final MP4 path |
Quota cost
| Operation | Quota |
|---|---|
| LLM script | 100 |
| TTS / scene | 100 |
| Image / scene | 500 |
| 2-scene test | ~1,300 |
| 5-scene full | ~3,100 |
Free tier (10K/month) ≈ 3 full picstory videos.
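The totals in the table follow from one script charge plus a TTS and an image charge per scene. A minimal sketch of that arithmetic, assuming quota is charged exactly as the table lists (the function and variable names are illustrative, not voxflow commands):

```bash
# Quota units taken from the table above.
SCRIPT_QUOTA=100   # one LLM script per video
TTS_QUOTA=100      # per scene
IMAGE_QUOTA=500    # per scene
FREE_TIER=10000    # monthly free quota

# Hypothetical helper: estimated quota for a picstory run with N scenes.
picstory_quota() {
  echo $(( SCRIPT_QUOTA + $1 * (TTS_QUOTA + IMAGE_QUOTA) ))
}

picstory_quota 2                              # prints 1300
picstory_quota 5                              # prints 3100
echo $(( FREE_TIER / $(picstory_quota 5) ))   # prints 3 (integer division)
```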
Pipeline
Topic / Text
├─[1] LLM script → { title, scenes: [{ heading, keyPoints, narration }] }
├─[2] TTS per scene → PCM audio
├─[3] Image per scene (parallel) → JPEG via /api/image/generate
└─[4] FFmpeg: image + WAV → Ken Burns zoompan MP4 → concat → +BGM → final.mp4
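Step 4 can be approximated by hand with stock FFmpeg, which helps when debugging a slow or broken render. This is a sketch under stated assumptions, not voxflow's actual implementation: the file names, the 25 fps rate, the zoom speed, and the 0.15 BGM volume are all illustrative choices.

```bash
FPS=25   # assumed frame rate

# Number of video frames needed to cover an audio clip (rounded to nearest).
frames_for() {
  awk -v d="$1" -v fps="$2" 'BEGIN { print int(d * fps + 0.5) }'
}

# One scene: still image + narration WAV -> slow zoom-in clip (Ken Burns style).
render_scene() {
  # $1 = image, $2 = narration WAV, $3 = output clip
  dur=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$2")
  frames=$(frames_for "$dur" "$FPS")
  ffmpeg -y -i "$1" -i "$2" \
    -filter_complex "[0:v]zoompan=z='min(zoom+0.0005,1.2)':d=${frames}:s=1080x1920:fps=${FPS}[v]" \
    -map "[v]" -map 1:a -c:v libx264 -pix_fmt yuv420p -c:a aac -shortest "$3"
}

# Write the list file that FFmpeg's concat demuxer reads.
write_concat_list() {
  # $1 = list file, remaining args = scene clips in playback order
  list=$1; shift
  : > "$list"
  for clip in "$@"; do
    printf "file '%s'\n" "$clip" >> "$list"
  done
}

# Join the scene clips without re-encoding.
concat_scenes() {
  # $1 = list file, $2 = joined output
  ffmpeg -y -f concat -safe 0 -i "$1" -c copy "$2"
}

# Loop the background music quietly under the narration.
mix_bgm() {
  # $1 = joined video, $2 = music file, $3 = final output
  ffmpeg -y -i "$1" -stream_loop -1 -i "$2" \
    -filter_complex "[1:a]volume=0.15[bgm];[0:a][bgm]amix=inputs=2:duration=first[a]" \
    -map 0:v -map "[a]" -c:v copy -c:a aac -shortest "$3"
}
```

A run would call `render_scene` once per scene, then `write_concat_list`, `concat_scenes`, and finally `mix_bgm`. The `zoompan` stage is the CPU-heavy part, since every output frame is rescaled individually.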
Troubleshooting
| Problem | Fix |
|---|---|
| |
| Retry; if stuck on |
| Render too slow | Ken Burns is CPU-heavy. Use |
| Use a |
| Chinese text mangled in cards | Switch to |
📑 present — narrated card video (Remotion local)
Text or URL → LLM cards → TTS → Remotion render. 1080×1920, 5 visual schemes.
```bash
voxflow present --text "Claude Code 是一个 AI 编程工具" --style aurora
voxflow present --url https://example.com/article --style noir
voxflow present --text "2025 AI 芯片格局" --web-search --style neon
voxflow present --cards pre-generated.json --no-audio
```
| Flag | Default | Notes |
|---|---|---|
| | one required | input source |
| | | |
| | | TTS voice |
| | | 0.5–2.0 |
| | false | Silent video only |
| | false | Augment LLM with up-to-date web facts |
| | | |
If `present` reports "Remotion not ready", run `npm install` inside the bundled `remotion-cards/` directory to set up the local renderer.
🧠 explain — AI explainer video
Title / bullets / summary scene flow. Best for "What is X?" tutorials.
```bash
voxflow explain --topic "What is React?"
voxflow explain --topic demo --output demo.mp4   # built-in demo (no API call)
voxflow explain --topic "区块链入门" --style chalkboard --voice v-male-Bk7vD3xP
voxflow explain --topic "Machine Learning" --audio-only
voxflow explain --topic "AI Agent 入门" --cloud   # render on server
```
| Flag | Default | Notes |
|---|---|---|
| | required | Use |
| | | |
| | | |
| | | TTS voice ID |
| | | 3–12 |
| | false | Skip render, output WAV only |
| | false | Use cloud Remotion instead of local |
| | | Output path |
📊 slides — HTML presentation with TTS
Generates a self-contained HTML deck with embedded base64 audio per slide. Open in any browser, no server needed.
```bash
voxflow slides "AI in Healthcare"
voxflow slides "Q4 Revenue Report" --template report --theme paper
voxflow slides "React Tutorial" --template tutorial --model balanced
voxflow slides "Startup Pitch" --template pitch --theme ocean --no-audio
```
| Flag | Default | Notes |
|---|---|---|
| | required | Topic |
| | | |
| | | |
| | | |
| | | TTS voice ID |
| | false | Skip TTS, slides only |
| | | Output path |
10 layouts: `hero`, `title-bullets`, `two-column`, `three-cards`, `image-left`, `image-right`, `quote`, `timeline`, `stats`, `section`. Templates auto-pick a layout sequence.
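Embedding audio as base64 is what lets the deck open from a single file with no server. One common way to do this is an `<audio>` tag with a `data:` URI, sketched below; whether voxflow emits exactly this markup is an assumption, and the helper name `audio_tag` and the MIME type are illustrative.

```bash
# Hypothetical helper: turn an audio file into an <audio> tag with an inline data URI.
audio_tag() {
  # $1 = audio file, $2 = MIME type (for example audio/wav)
  payload=$(base64 < "$1" | tr -d '\n')   # strip line wraps added by some base64 builds
  printf '<audio controls src="data:%s;base64,%s"></audio>\n' "$2" "$payload"
}
```

Base64 inflates the payload by roughly a third, so decks with long narration can become large files.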
🖼 image — single Hunyuan illustration
Synchronous text → image (PNG). Useful for thumbnails or one-off art.
```bash
voxflow image "a sleeping cat in a sunlit window" --resolution 1024:1024 -o cat.png
```
Resolutions: `768:768`, `768:1024`, `1024:768`, `1024:1024`, `720:1280`, `1280:720`, `768:1280`, `1280:768`, `1080:1920`, `1920:1080`.
Prompt max 1000 chars. Output: local PNG + COS URL.
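Because only the listed resolutions are accepted and prompts are capped at 1000 characters, a local check before calling the CLI can save a failed request. A minimal sketch, assuming the limits are exactly as documented here; `check_image_args` is a hypothetical helper, not a voxflow command.

```bash
# Resolutions documented for `voxflow image --resolution`.
RESOLUTIONS="768:768 768:1024 1024:768 1024:1024 720:1280 1280:720 768:1280 1280:768 1080:1920 1920:1080"
MAX_PROMPT_CHARS=1000

# Hypothetical pre-flight check: returns 0 when both arguments look acceptable.
check_image_args() {
  prompt=$1
  resolution=$2
  if [ "${#prompt}" -gt "$MAX_PROMPT_CHARS" ]; then
    echo "prompt is ${#prompt} chars; limit is $MAX_PROMPT_CHARS" >&2
    return 1
  fi
  # Pad with spaces so only whole entries in the list can match.
  case " $RESOLUTIONS " in
    *" $resolution "*) return 0 ;;
    *) echo "unsupported resolution: $resolution" >&2; return 1 ;;
  esac
}

# Usage: run the real command only when the check passes, for example
#   check_image_args "$prompt" 1024:1024 && voxflow image "$prompt" --resolution 1024:1024 -o cat.png
```

Note that `${#prompt}` may count bytes rather than characters in some shells and locales, so prompts with non-ASCII text can be flagged early.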
Pick-the-right-tool checklist
"小红书-style knowledge cards" → picstory
"AI short video + captions" → picstory --style sketchnote
"explainer / What is X?" → explain
"branded short with my text" → present
"already have cards/script" → present --cards
"shareable HTML deck w/ audio" → slides
"single illustration" → image
Rules
- Search voices with `voxflow voices` before passing `--voice`. Never guess IDs.
- Check quota before video calls (`voxflow status`): picstory ≈ 3K, present/explain ≈ 500–2K.
- Test cheap first: `picstory --scenes 2 --image-only` validates the script before paying for full render.
- After render finishes, auto-play: `open output.mp4` (macOS).
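The first three rules can be chained into one pre-flight routine. The wrapper below is a sketch: `run` and `preflight_and_test` are hypothetical names, and it uses only the commands quoted in the rules. With `DRY_RUN=1` it prints each command instead of executing it, so it can be tried without spending quota.

```bash
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

preflight_and_test() {
  topic=$1
  run voxflow status    # check remaining quota first
  run voxflow voices    # look up a real voice ID instead of guessing
  run voxflow picstory --topic "$topic" --scenes 2 --image-only   # cheap 2-scene test
}

# Usage: DRY_RUN=1; preflight_and_test "AI 入门"
```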