voicebox-voice-synthesis
Voicebox Voice Synthesis Studio
Skill by ara.so — Daily 2026 Skills collection.
Voicebox is a local-first, open-source voice cloning and TTS studio, a self-hosted alternative to ElevenLabs. It runs entirely on your machine (macOS MLX/Metal, Windows/Linux CUDA, CPU fallback), exposes a REST API on `localhost:17493`, and ships with 5 TTS engines, 23 languages, post-processing effects, and a multi-track Stories editor.

Installation

Pre-built Binaries (Recommended)
| Platform | Link |
|---|---|
| macOS Apple Silicon | https://voicebox.sh/download/mac-arm |
| macOS Intel | https://voicebox.sh/download/mac-intel |
| Windows | https://voicebox.sh/download/windows |
| Docker | |
Linux requires building from source: https://voicebox.sh/linux-install
Build from Source
```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

# Install just task runner
brew install just    # macOS
cargo install just   # any platform

# Set up Python venv + all dependencies
just setup

# Start backend + desktop app in dev mode
just dev
```

```bash
# List all available commands
just --list
```

---

Architecture
| Layer | Technology |
|---|---|
| Desktop App | Tauri (Rust) |
| Frontend | React + TypeScript + Tailwind CSS |
| State | Zustand + React Query |
| Backend | FastAPI (Python) on port 17493 |
| TTS Engines | Qwen3-TTS, LuxTTS, Chatterbox, Chatterbox Turbo, TADA |
| Effects | Pedalboard (Spotify) |
| Transcription | Whisper / Whisper Turbo |
| Inference | MLX (Apple Silicon) / PyTorch (CUDA/ROCm/XPU/CPU) |
| Database | SQLite |
The Python FastAPI backend handles all ML inference. The Tauri Rust shell wraps the frontend and manages the backend process lifecycle. The API is accessible directly at `http://localhost:17493` even when using the desktop app.
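Because the API stays reachable while the desktop app is running, a script can probe it before submitting work. The sketch below is one way to do that, under assumptions: `isVoiceboxReachable` and the `FetchLike` type are hypothetical names, and probing the `/docs` route is a choice made for this example rather than a documented health check.

```typescript
// Minimal fetch-like signature so the probe can be exercised without a network.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Resolves to true when the backend answers the docs route with a 2xx status.
// Using /docs as the probe target is an assumption of this sketch.
async function isVoiceboxReachable(
  baseUrl = "http://localhost:17493",
  fetchImpl: FetchLike = (url) => fetch(url)
): Promise<boolean> {
  try {
    const res = await fetchImpl(`${baseUrl}/docs`);
    return res.ok;
  } catch {
    // Connection refused or a similar network error: treat the backend as down.
    return false;
  }
}
```

Passing the fetch implementation as a parameter keeps the helper easy to test and lets callers add their own timeout handling.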
REST API Reference

Base URL: `http://localhost:17493`

Interactive docs: `http://localhost:17493/docs`

Generate Speech
```bash
# Basic generation
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{ "text": "Hello world, this is a voice clone.", "profile_id": "abc123", "language": "en" }'

# With engine selection
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{ "text": "Speak slowly and with gravitas.", "profile_id": "abc123", "language": "en", "engine": "qwen3-tts" }'

# With paralinguistic tags (Chatterbox Turbo only)
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{ "text": "That is absolutely hilarious! [laugh] I cannot believe it.", "profile_id": "abc123", "engine": "chatterbox-turbo", "language": "en" }'
```

Voice Profiles
```bash
# List all profiles

# Create a new profile
curl -X POST http://localhost:17493/profiles \
  -H "Content-Type: application/json" \
  -d '{ "name": "Narrator", "language": "en", "description": "Deep narrative voice" }'

# Upload audio sample to a profile
curl -X POST http://localhost:17493/profiles/{profile_id}/samples \
  -F "file=@/path/to/voice-sample.wav"

# Export a profile
curl http://localhost:17493/profiles/{profile_id}/export \
  --output narrator-profile.zip

# Import a profile
curl -X POST http://localhost:17493/profiles/import \
  -F "file=@narrator-profile.zip"
```

Generation Queue & Status
```bash
# Get generation status (SSE stream)
curl http://localhost:17493/generate/{generation_id}/status

# List recent generations

# Retry a failed generation
curl -X POST http://localhost:17493/generations/{generation_id}/retry

# Download generated audio
curl http://localhost:17493/generations/{generation_id}/audio \
  --output output.wav
```

Models
```bash
# List available models and download status
curl http://localhost:17493/models

# Unload a model from GPU memory (without deleting)
curl -X POST http://localhost:17493/models/{model_id}/unload
```

---

TypeScript/JavaScript Integration
Basic TTS Client
```typescript
const VOICEBOX_URL = process.env.VOICEBOX_API_URL ?? "http://localhost:17493";

interface GenerateRequest {
  text: string;
  profile_id: string;
  language?: string;
  engine?: "qwen3-tts" | "luxtts" | "chatterbox" | "chatterbox-turbo" | "tada";
}

interface GenerateResponse {
  generation_id: string;
  status: "queued" | "processing" | "complete" | "failed";
  audio_url?: string;
}

async function generateSpeech(req: GenerateRequest): Promise<GenerateResponse> {
  const response = await fetch(`${VOICEBOX_URL}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });

  if (!response.ok) {
    throw new Error(`Voicebox API error: ${response.status} ${await response.text()}`);
  }

  return response.json();
}

// Usage
const result = await generateSpeech({
  text: "Welcome to our application.",
  profile_id: "abc123",
  language: "en",
  engine: "qwen3-tts",
});
console.log("Generation ID:", result.generation_id);
```

Poll for Completion
```typescript
async function waitForGeneration(
  generationId: string,
  timeoutMs = 60_000
): Promise<string> {
  const start = Date.now();

  while (Date.now() - start < timeoutMs) {
    const res = await fetch(`${VOICEBOX_URL}/generations/${generationId}`);
    if (!res.ok) {
      // Surface HTTP errors instead of trying to parse an error body as a status
      throw new Error(`Status check failed: ${res.status}`);
    }
    const data = await res.json();

    if (data.status === "complete") {
      return `${VOICEBOX_URL}/generations/${generationId}/audio`;
    }
    if (data.status === "failed") {
      throw new Error(`Generation failed: ${data.error}`);
    }

    // Wait one second between polls
    await new Promise((r) => setTimeout(r, 1000));
  }

  throw new Error("Generation timed out");
}
```

Stream Status with SSE
```typescript
function streamGenerationStatus(
  generationId: string,
  onStatus: (status: string) => void
): () => void {
  const eventSource = new EventSource(
    `${VOICEBOX_URL}/generate/${generationId}/status`
  );

  eventSource.onmessage = (event) => {
    const data = JSON.parse(event.data);
    onStatus(data.status);

    if (data.status === "complete" || data.status === "failed") {
      eventSource.close();
    }
  };

  eventSource.onerror = () => eventSource.close();

  // Return cleanup function
  return () => eventSource.close();
}

// Usage
const cleanup = streamGenerationStatus("gen_abc123", (status) => {
  console.log("Status update:", status);
});
```

Download Audio as Blob
```typescript
async function downloadAudio(generationId: string): Promise<Blob> {
  const response = await fetch(
    `${VOICEBOX_URL}/generations/${generationId}/audio`
  );
  if (!response.ok) {
    throw new Error(`Failed to download audio: ${response.status}`);
  }
  return response.blob();
}

// Play in browser
async function playGeneratedAudio(generationId: string): Promise<void> {
  const blob = await downloadAudio(generationId);
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  // Register the cleanup handler before playback starts
  audio.onended = () => URL.revokeObjectURL(url);
  await audio.play();
}
```

Python Integration
```python
import asyncio

import httpx

VOICEBOX_URL = "http://localhost:17493"


async def generate_speech(
    text: str,
    profile_id: str,
    language: str = "en",
    engine: str = "qwen3-tts",
) -> bytes:
    async with httpx.AsyncClient(timeout=120.0) as client:
        # Submit generation
        resp = await client.post(
            f"{VOICEBOX_URL}/generate",
            json={
                "text": text,
                "profile_id": profile_id,
                "language": language,
                "engine": engine,
            },
        )
        resp.raise_for_status()
        generation_id = resp.json()["generation_id"]

        # Poll until complete (120 attempts, one second apart)
        for _ in range(120):
            status_resp = await client.get(
                f"{VOICEBOX_URL}/generations/{generation_id}"
            )
            status_resp.raise_for_status()
            status_data = status_resp.json()

            if status_data["status"] == "complete":
                audio_resp = await client.get(
                    f"{VOICEBOX_URL}/generations/{generation_id}/audio"
                )
                audio_resp.raise_for_status()
                return audio_resp.content

            if status_data["status"] == "failed":
                raise RuntimeError(f"Generation failed: {status_data.get('error')}")

            await asyncio.sleep(1.0)

        raise TimeoutError("Generation timed out after 120s")


# Usage
audio_bytes = asyncio.run(
    generate_speech(
        text="The quick brown fox jumps over the lazy dog.",
        profile_id="your-profile-id",
        language="en",
        engine="chatterbox",
    )
)
with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```

---

TTS Engine Selection Guide
| Engine | Best For | Languages | VRAM | Notes |
|---|---|---|---|---|
| `qwen3-tts` | Quality + instructions | 10 | Medium | Supports delivery instructions in text |
| `luxtts` | Fast CPU generation | English only | ~1GB | 150x realtime on CPU, 48kHz |
| `chatterbox` | Multilingual coverage | 23 | Medium | Arabic, Hindi, Swahili, CJK + more |
| `chatterbox-turbo` | Expressive/emotion | English only | Low (350M) | Use paralinguistic tags such as `[laugh]` |
| `tada` | Long-form coherence | 10 | High | 700s+ audio, HumeAI model |
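The guide above can be turned into a small selection helper. This is a rough sketch, not part of Voicebox: `pickEngine` and `EngineNeeds` are hypothetical names, the engine identifiers are the ones used in the `engine` field elsewhere in this document, and because the per-engine language lists are not given here, the sketch routes every non-English request to `chatterbox`.

```typescript
type Engine = "qwen3-tts" | "luxtts" | "chatterbox" | "chatterbox-turbo" | "tada";

// Hypothetical description of what a caller needs from the engine.
interface EngineNeeds {
  language: string;             // language code, e.g. "en"
  cpuOnly?: boolean;            // no GPU available
  paralinguisticTags?: boolean; // text contains tags such as [laugh]
  longForm?: boolean;           // multi-minute narration
}

function pickEngine(needs: EngineNeeds): Engine {
  // Assumption: without the exact language lists, only chatterbox (23 languages)
  // is treated as safe for non-English text.
  if (needs.language !== "en") return "chatterbox";
  if (needs.paralinguisticTags) return "chatterbox-turbo";
  if (needs.cpuOnly) return "luxtts";
  if (needs.longForm) return "tada";
  // Default for English: quality plus delivery instructions.
  return "qwen3-tts";
}
```

The order of the checks encodes a priority (tags before CPU-only before long-form); a real application may want a different ordering.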
Delivery Instructions (Qwen3-TTS)
Embed natural language instructions directly in the text:

```typescript
await generateSpeech({
  text: "(whisper) I have a secret to tell you.",
  profile_id: "abc123",
  engine: "qwen3-tts",
});

await generateSpeech({
  text: "(speak slowly and clearly) Step one: open the application.",
  profile_id: "abc123",
  engine: "qwen3-tts",
});
```
Paralinguistic Tags (Chatterbox Turbo)

```typescript
const tags = [
  "[laugh]", "[chuckle]", "[gasp]", "[cough]",
  "[sigh]", "[groan]", "[sniff]", "[shush]", "[clear throat]"
];

await generateSpeech({
  text: "Oh really? [gasp] I had no idea! [laugh] That's incredible.",
  profile_id: "abc123",
  engine: "chatterbox-turbo",
});
```
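Since these tags are specific to Chatterbox Turbo, text written for it may need cleaning before it is sent to another engine. The helper below is a minimal sketch for that case; `stripParalinguisticTags` is a hypothetical name, and the assumption that other engines would not interpret the tags is mine, not something the source states.

```typescript
// Tag names from the list in this section, without the surrounding brackets.
const PARALINGUISTIC_TAGS = [
  "laugh", "chuckle", "gasp", "cough",
  "sigh", "groan", "sniff", "shush", "clear throat",
];

// Removes known tags (and the whitespace before them) and tidies spacing.
// Bracketed text that is not a known tag is left untouched.
function stripParalinguisticTags(text: string): string {
  const pattern = new RegExp(
    `\\s*\\[(?:${PARALINGUISTIC_TAGS.join("|")})\\]`,
    "g"
  );
  return text.replace(pattern, "").replace(/\s{2,}/g, " ").trim();
}
```

A caller could apply it only when the selected engine is not `chatterbox-turbo`, leaving tagged text intact otherwise.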
Environment & Configuration

```bash
# Custom models directory (set before launching)
export VOICEBOX_MODELS_DIR=/path/to/models

# For AMD ROCm GPU (auto-configured, but can override)
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Docker configuration (`docker-compose.yml` override):

```yaml
services:
  voicebox:
    environment:
      - VOICEBOX_MODELS_DIR=/models
    volumes:
      - /host/models:/models
    ports:
      - "17493:17493"
    # For NVIDIA GPU passthrough:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Common Patterns
Voice Profile Creation Flow
```typescript
// 1. Create profile
const profile = await fetch(`${VOICEBOX_URL}/profiles`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ name: "My Voice", language: "en" }),
}).then((r) => r.json());

// 2. Upload audio sample (WAV/MP3, ideally 5–30 seconds clean speech)
// audioBlob is a Blob holding the recording, obtained elsewhere
const formData = new FormData();
formData.append("file", audioBlob, "sample.wav");

await fetch(`${VOICEBOX_URL}/profiles/${profile.id}/samples`, {
  method: "POST",
  body: formData,
});

// 3. Generate with the new profile
const gen = await generateSpeech({
  text: "Testing my cloned voice.",
  profile_id: profile.id,
});
```

Batch Generation with Queue
```typescript
async function batchGenerate(
  items: Array<{ text: string; profileId: string }>,
  // Typed against GenerateRequest so the default is accepted by generateSpeech
  engine: GenerateRequest["engine"] = "qwen3-tts"
): Promise<string[]> {
  // Submit all; Voicebox queues them serially to avoid GPU contention
  const submissions = await Promise.all(
    items.map((item) =>
      generateSpeech({ text: item.text, profile_id: item.profileId, engine })
    )
  );

  // Wait for all completions
  const audioUrls = await Promise.all(
    submissions.map((s) => waitForGeneration(s.generation_id))
  );

  return audioUrls;
}
```

Long-Form Text (Auto-Chunking)
Voicebox auto-chunks at sentence boundaries — just send the full text:
```typescript
// Up to 50,000 characters supported
const longScript = `
Chapter one. The morning fog rolled across the valley floor...
`;

await generateSpeech({
  text: longScript,
  profile_id: "narrator-profile-id",
  engine: "tada", // Best for long-form coherence
  language: "en",
});
```
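To get a feel for what splitting at sentence boundaries means, here is an illustrative client-side sketch. It is not Voicebox's chunker, which the source does not show: `chunkBySentence` and `maxChars` are hypothetical, and the rule (split after `.`, `!` or `?` followed by whitespace, then pack sentences greedily) is an assumption chosen for simplicity.

```typescript
// Greedily packs whole sentences into chunks of at most maxChars characters.
// A single sentence longer than maxChars is kept intact as its own chunk.
function chunkBySentence(text: string, maxChars = 500): string[] {
  // Split after sentence-ending punctuation; the punctuation stays with its sentence.
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length > maxChars && current) {
      // Adding this sentence would overflow: close the current chunk first.
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Something like this can be handy for previewing how a script might divide, but with auto-chunking on the server there is no need to pre-split text before calling the API.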
Troubleshooting
API not responding
```bash
# Check if backend is running

# Restart backend only (dev mode)
just backend

# Check logs
just logs
```

GPU not detected
```bash
# Check detected backend

# Force CPU mode (set before launch)
export VOICEBOX_FORCE_CPU=1
```

Model download fails / slow
```bash
# Set custom models directory with more space
export VOICEBOX_MODELS_DIR=/path/with/space
just dev

# Cancel stuck download via API
curl -X DELETE http://localhost:17493/models/{model_id}/download
```

Out of VRAM: unload models
```bash
# List loaded models
curl http://localhost:17493/models | jq '.[] | select(.loaded == true)'

# Unload specific model
curl -X POST http://localhost:17493/models/{model_id}/unload
```

Audio quality issues
- Use 5–30 seconds of clean, noise-free speech for voice samples
- Multiple samples improve clone quality; upload 3–5 different sentences
- For multilingual cloning, use the `chatterbox` engine
- Ensure sample audio is 16kHz+ mono WAV for best results
- Use `luxtts` for highest output quality (48kHz) in English
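The sample guidelines above are easy to check before uploading. The following is a small sketch under assumptions: `checkVoiceSample` and `SampleInfo` are hypothetical, and the caller is expected to obtain duration, sample rate, and channel count from its own audio tooling.

```typescript
// Basic properties of a candidate voice sample, supplied by the caller.
interface SampleInfo {
  durationSec: number;
  sampleRateHz: number;
  channels: number;
}

// Returns a list of warnings; an empty list means the sample meets the guidelines.
function checkVoiceSample(info: SampleInfo): string[] {
  const warnings: string[] = [];
  if (info.durationSec < 5 || info.durationSec > 30) {
    warnings.push("duration should be between 5 and 30 seconds");
  }
  if (info.sampleRateHz < 16000) {
    warnings.push("sample rate should be at least 16 kHz");
  }
  if (info.channels !== 1) {
    warnings.push("audio should be mono");
  }
  return warnings;
}
```

The checks only cover the measurable guidelines; noise level and speech clarity still need a listen.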
Generation stuck in queue after crash
Voicebox auto-recovers stale generations on startup. If the issue persists:

```bash
curl -X POST http://localhost:17493/generations/{generation_id}/retry
```

Frontend Integration (React Example)
```tsx
import { useState } from "react";

const VOICEBOX_URL = import.meta.env.VITE_VOICEBOX_URL ?? "http://localhost:17493";

export function VoiceGenerator({ profileId }: { profileId: string }) {
  const [text, setText] = useState("");
  const [audioUrl, setAudioUrl] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  const handleGenerate = async () => {
    setLoading(true);
    try {
      const res = await fetch(`${VOICEBOX_URL}/generate`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text, profile_id: profileId, language: "en" }),
      });
      const { generation_id } = await res.json();

      // Poll for completion
      let done = false;
      while (!done) {
        await new Promise((r) => setTimeout(r, 1000));
        const statusRes = await fetch(`${VOICEBOX_URL}/generations/${generation_id}`);
        const { status } = await statusRes.json();
        if (status === "complete") {
          setAudioUrl(`${VOICEBOX_URL}/generations/${generation_id}/audio`);
          done = true;
        } else if (status === "failed") {
          throw new Error("Generation failed");
        }
      }
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleGenerate} disabled={loading}>
        {loading ? "Generating..." : "Generate Speech"}
      </button>
      {audioUrl && <audio controls src={audioUrl} />}
    </div>
  );
}
```