VoxFlow Skill
Use this skill whenever users ask for text-to-speech, voice synthesis, podcast creation, narrated stories, or voice/audio generation tasks.
VoxFlow provides 200+ voices in 40+ languages.
IMPORTANT: How to execute (read this first)
Follow this decision tree EVERY TIME:
Step 1: Try MCP tools
If VoxFlow MCP tools are available (voice_list, tts_synthesize, etc.), use them directly:
- search voices
- text to speech (returns base64; save it to a file, then play it)
- transcribe audio
- generate podcast dialogue
- check remaining quota
Example: the user says "给我讲个故事" ("Tell me a story") → write a story → call the text-to-speech tool → save the mp3 → play it.
If MCP tools work, stop here. Do NOT install or use CLI.
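Because the text-to-speech tool returns base64 rather than a file, the save step might look like the sketch below. The helper name save_b64_audio and the sample payload are illustrative and not part of VoxFlow. It assumes a base64 utility that accepts --decode, as GNU coreutils and macOS do.

```shell
# Hypothetical helper: decode a base64 audio payload into a file.
# $1 = base64 string returned by the TTS tool, $2 = output path.
save_b64_audio() {
  printf '%s' "$1" | base64 --decode > "$2"
}

# Usage with a tiny stand-in payload ("YWJj" is base64 for "abc"):
save_b64_audio "YWJj" "${TMPDIR:-/tmp}/voxflow_demo.bin"
```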
Step 2: MCP not available → install CLI automatically
If MCP tools are not available, install the CLI yourself (do NOT ask the user to do it):
Then log in by running voxflow login.
This opens a browser. After the user logs in, the token is cached and all subsequent commands work.
Step 3: CLI not installable → guide user to add MCP
If the CLI install fails (permissions, no Node.js, etc.), tell the user:
Run this in your terminal, then restart the session:
claude mcp add voxflow https://api.voxflow.studio/api/mcp
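Steps 2 and 3 hinge on whether the CLI is usable. A minimal sketch of that check follows. Whether MCP tools exist cannot be detected from the shell, so that part of the tree stays with the agent. The function name choose_path and its output labels are made up for illustration.

```shell
# Hypothetical helper: report which fallback applies.
# $1 = binary to look for (defaults to voxflow).
choose_path() {
  bin="${1:-voxflow}"
  if command -v "$bin" >/dev/null 2>&1; then
    echo "use-cli"                  # CLI is on PATH: log in and run commands
  else
    echo "install-or-suggest-mcp"   # CLI missing: try the install, else show the MCP command
  fi
}

choose_path
```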
Using CLI (when MCP is not available)
If MCP is not available but CLI is installed:
Quick Reference
| Command | What it does | Example |
|---|---|---|
| say | Text → speech audio | voxflow say "Hello" -o hello.mp3 |
| narrate | File/text → multi-segment TTS | voxflow narrate script.txt -o out.wav |
| podcast | Topic → AI podcast episode | voxflow podcast "AI trends" --duration 3 |
| story | Topic → AI narrated story | voxflow story "太空冒险" -o story.wav |
| voices | Search voice library | voxflow voices --lang zh --gender female |
| asr | Audio → text transcript | voxflow asr recording.mp3 |
| status | Check login & quota | voxflow status |
Authentication
bash
# Login (opens browser for Google/email OTP)
voxflow login
# Check login status and remaining quota
voxflow status
# Logout
voxflow logout
- Login is required before any command that calls the API.
- Token is cached at ~/.config/voxflow/token.json.
- For CI environments, set env var.
Commands
Text-to-Speech (say)
The core command. Convert text to speech audio.
bash
# Basic usage
voxflow say "你好世界" -o hello.mp3
# With specific voice
voxflow say "Hello world" --voice v-female-R2s4N9qJ -o greeting.mp3
# Slow narration speed
voxflow say "慢速朗读" --speed 0.8 -o slow.mp3
# WAV format
voxflow say "高质量音频" --format wav -o output.wav
Narrate (narrate)
Read a file or long text, split into segments, synthesize each with TTS.
bash
# From file
voxflow narrate article.txt -o narration.wav
# From stdin (pipe any text)
cat readme.md | voxflow narrate -o readme_audio.wav
# With voice
voxflow narrate script.txt --voice v-male-Bk7vD3xP -o output.wav
Best for: long documents, articles, README files, email newsletters.
Podcast (podcast)
Generate a multi-speaker podcast episode with AI-written script.
bash
# From topic
voxflow podcast "程序员如何用AI提升效率" --duration 3
# With background music
voxflow podcast "Tech trends" --bgm lofi --duration 5
# From existing script
voxflow podcast --script dialogue.txt
# Control language
voxflow podcast "量子计算" --language zh-CN
Flags: --duration <n> · --bgm <lofi|jazz|ambient> · --script <file> · --language <zh-CN|en-US|ja-JP>
Story (story)
AI writes a short story and narrates it with TTS.
bash
voxflow story "一只会飞的小猫" -o story.wav
voxflow story "space adventure" --lang en -o adventure.wav
Best for: bedtime stories, creative writing demos, content samples.
Voice Search (voices)
Browse the voice library. No login required.
bash
# Chinese female voices
voxflow voices --lang zh --gender female
# English voices
voxflow voices --lang en
# Search by keyword
voxflow voices --search "narrator"
# All voices
voxflow voices --all
Always call voxflow voices first when the user wants a specific voice style.
Speech Recognition (asr)
Transcribe audio to text. Supports Chinese, English, Japanese, Korean.
bash
voxflow asr recording.mp3
voxflow asr meeting.wav --lang en
Note: Requires a publicly accessible audio URL or a local file upload.
Voice Selection Guide
bash
# Step 1: Search for matching voices
voxflow voices --lang zh --gender female
# Step 2: Use the voice ID in synthesis
voxflow say "测试" --voice v-female-R2s4N9qJ -o test.mp3
Popular voices:
- 温柔姐姐 (Gentle Female, Chinese)
- 威严霸总 (Authoritative Male, Chinese)
- 傲娇学姐 (Sassy Female, Chinese)
Common Scenarios
"Read this text aloud" (把这段话念出来)
bash
voxflow say "<text from the user>" -o output.mp3 && open output.mp3
"Read this file in a gentle female voice" (用温柔女声读这个文件)
bash
voxflow voices --lang zh --gender female # find a voice first
voxflow narrate file.txt --voice v-female-R2s4N9qJ -o narration.mp3
"Generate a podcast about XX" (生成一个关于 XX 的播客)
bash
voxflow status # check quota (a podcast costs about 5,000)
voxflow podcast "<topic>" --duration 3 --bgm lofi
"Tell a bedtime story" (讲个睡前故事)
bash
voxflow story "小狐狸的星星种子" -o bedtime.mp3 && open bedtime.mp3
"Transcribe this recording" (转录这段录音)
bash
voxflow asr recording.mp3
Creative Workflows
These workflows combine VoxFlow TTS with the AI agent's own abilities (writing, coding, web fetching). The agent writes the content, then calls voxflow say or voxflow narrate to synthesize each part.
Audio Storybook (有声绘本)
AI writes a children's story, generates SVG illustrations, synthesizes narration per page, and bundles everything into a single offline HTML file.
Steps:
- Write a 6-page children's story
- For each page: generate an inline SVG illustration (400×300)
- Synthesize narration:
voxflow say "page text" --voice v-female-R2s4N9qJ --speed 0.85 -o /tmp/page_N.mp3
- Read the mp3 files, base64 encode, embed inline in HTML
- Output a single self-contained HTML file with audio play buttons per page
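The base64 embedding step can be sketched in shell as follows. The helper name audio_tag and the /tmp/storybook.html path are hypothetical, and the audio/mpeg MIME type assumes the files are mp3, as in the synthesis step.

```shell
# Hypothetical helper: emit an <audio> element with the file inlined as a data URI.
# Newlines are stripped because some base64 implementations wrap their output.
audio_tag() {
  data=$(base64 < "$1" | tr -d '\n')
  printf '<audio controls src="data:audio/mpeg;base64,%s"></audio>\n' "$data"
}

# Usage: append one tag per page to the HTML being built
# audio_tag /tmp/page_1.mp3 >> /tmp/storybook.html
```

Inlining keeps the result a single offline file, at the cost of roughly a third more bytes than the raw audio.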
Audio Presentation (有声演示文稿)
AI creates an HTML slide deck with TTS narration on each slide.
Steps:
- Generate N slides (title + bullet points + narration script)
- For each slide:
voxflow say "narration script" -o /tmp/slide_N.mp3
- Build an HTML file with slide navigation (prev/next) and audio buttons
- Embed audio inline as base64
open /tmp/presentation.html
Best for: product introductions, technical tutorials, course materials.
Article → Audio Briefing (文章有声摘要)
Read a URL or document, summarize, synthesize as audio.
Steps:
- Fetch/read the content
- Summarize into 3-5 key points
voxflow say "summary text" --voice v-male-Bk7vD3xP -o /tmp/briefing.mp3
Document Narration (文档朗读)
Read a README, code comments, or any text file aloud.
bash
voxflow narrate README.md --voice v-female-R2s4N9qJ --speed 0.9 -o /tmp/readme.mp3
open /tmp/readme.mp3
Multi-language Voice (多语言合成)
AI translates text, then synthesizes in each language with matching voices.
Steps:
- Translate user text to target languages (AI does this natively)
voxflow voices --lang en --gender female
→ pick English voice
voxflow voices --lang ja --gender female
→ pick Japanese voice
voxflow say "English text" --voice <en_id> -o /tmp/en.mp3
voxflow say "日本語テキスト" --voice <ja_id> -o /tmp/ja.mp3
Git Daily Report Audio (Git 日报音频)
Steps:
- Read the output of git log --oneline --since="1 day ago"
- Summarize changes into a brief report
voxflow say "today's summary..." -o /tmp/daily_report.mp3
open /tmp/daily_report.mp3
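Before summarizing, it can help to strip the commit hashes so the text reads naturally when spoken. A small sketch follows, with log_to_script as a made-up helper name. The summarizing itself is still left to the agent.

```shell
# Hypothetical helper: turn `git log --oneline` lines into speakable sentences.
# Drops the leading hash field and numbers each change.
log_to_script() {
  awk '{ $1 = ""; sub(/^ +/, ""); printf "Change %d: %s.\n", NR, $0 }'
}

# Usage inside a repository:
# git log --oneline --since="1 day ago" | log_to_script
```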
Code PR Explanation (PR 语音讲解)
Steps:
- Read the PR diff
- Write a plain-language explanation
voxflow say "explanation..." -o /tmp/pr_review.mp3
Mock Interview (模拟面试)
Steps:
- Generate 3 interview questions on the topic
- For each:
voxflow say "question N..." --voice v-male-Bk7vD3xP -o /tmp/q_N.mp3
- Play questions in sequence:
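The playback command for this last step is not given here. One way to play the files in order is a simple loop around a blocking player. The sketch below assumes macOS's afplay, and play_all is a hypothetical helper name.

```shell
# Hypothetical helper: run a player on each file in turn, stopping on the first failure.
# $1 = player command, remaining args = audio files in playback order.
play_all() {
  player="$1"
  shift
  for f in "$@"; do
    "$player" "$f" || return 1
  done
}

# Usage on macOS (afplay blocks until each file finishes):
# play_all afplay /tmp/q_1.mp3 /tmp/q_2.mp3 /tmp/q_3.mp3
```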
Quota
- Free: 10,000 quota/month
- 1 TTS call ≈ 100 quota
- 1 podcast ≈ 5,000 quota
- Check: voxflow status
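As a worked example of the arithmetic, the approximate costs above mean the free tier covers roughly 100 TTS calls, or 2 podcasts, per month. The sketch below estimates the cost of a plan from those figures. estimate_quota is an illustrative helper, not a VoxFlow command, and it assumes each say call counts as one TTS call.

```shell
# Hypothetical helper: estimate quota from the approximate per-call costs.
# $1 = number of TTS calls (about 100 each), $2 = number of podcasts (about 5,000 each).
estimate_quota() {
  echo $(( $1 * 100 + $2 * 5000 ))
}

# A 6-page storybook (6 TTS calls) plus one podcast:
estimate_quota 6 1   # prints 5600
```

Comparing the estimate with the remaining quota before starting avoids running out partway through a multi-file workflow.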
Prerequisites
- Node.js 20+ required
- ffmpeg optional (only for video-related commands)
- Install CLI:
Rules
- Always try MCP first — if MCP tools are available, use them instead of CLI.
- Always search voices before synthesizing — never guess voice IDs.
- Check quota before expensive operations (podcast ≈ 5000 quota).
- After synthesis, auto-play the file: open <file> (macOS).
- Never print tokens or secrets.
- If CLI fails with "not logged in", suggest MCP as alternative:
claude mcp add voxflow https://api.voxflow.studio/api/mcp
- If a command fails, check and correct flags before retrying.
MCP Setup (if not already configured)
If MCP tools are not available and you want the easiest setup:
bash
claude mcp add voxflow https://api.voxflow.studio/api/mcp
After adding, restart the agent session to load the MCP tools. They sign in through OAuth automatically, so no CLI install is needed.