voicebox-voice-synthesis

Voicebox Voice Synthesis Studio

Skill by ara.so — Daily 2026 Skills collection.

Voicebox is a local-first, open-source voice cloning and TTS studio, a self-hosted alternative to ElevenLabs. It runs entirely on your machine (macOS MLX/Metal, Windows/Linux CUDA, CPU fallback), exposes a REST API on `localhost:17493`, and ships with 5 TTS engines, 23 languages, post-processing effects, and a multi-track Stories editor.

Installation

Pre-built Binaries (Recommended)

Platform	Link
macOS Apple Silicon	https://voicebox.sh/download/mac-arm
macOS Intel	https://voicebox.sh/download/mac-intel
Windows	https://voicebox.sh/download/windows
Docker	`docker compose up`

Linux requires building from source: https://voicebox.sh/linux-install

Build from Source

Prerequisites: Bun, Rust, Python 3.11+, Tauri prerequisites

```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

# Install the just task runner
brew install just    # macOS
cargo install just   # any platform

# Set up Python venv + all dependencies
just setup

# Start backend + desktop app in dev mode
just dev

# List all available commands
just --list
```

---

Architecture

Layer	Technology
Desktop App	Tauri (Rust)
Frontend	React + TypeScript + Tailwind CSS
State	Zustand + React Query
Backend	FastAPI (Python) on port 17493
TTS Engines	Qwen3-TTS, LuxTTS, Chatterbox, Chatterbox Turbo, TADA
Effects	Pedalboard (Spotify)
Transcription	Whisper / Whisper Turbo
Inference	MLX (Apple Silicon) / PyTorch (CUDA/ROCm/XPU/CPU)
Database	SQLite

The Python FastAPI backend handles all ML inference. The Tauri Rust shell wraps the frontend and manages the backend process lifecycle. The API is accessible directly at `http://localhost:17493` even when using the desktop app.
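Because the API stays reachable while the desktop app is running, a script can check for the backend before doing any work. The sketch below uses only the Python standard library and assumes that the `/health` endpoint shown under Troubleshooting answers with a 2xx status when the backend is up. The function names are illustrative, not part of Voicebox.

```python
import http.client
import urllib.request

VOICEBOX_URL = "http://localhost:17493"


def health_url(base_url: str = VOICEBOX_URL) -> str:
    """Build the health-check URL, tolerating a trailing slash on the base."""
    return base_url.rstrip("/") + "/health"


def is_backend_up(base_url: str = VOICEBOX_URL, timeout: float = 2.0) -> bool:
    """Return True if the backend answers the health check with a 2xx status."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException, ValueError):
        # Refused connections, timeouts, and HTTP error statuses surface as
        # OSError subclasses; ValueError covers malformed URLs.
        return False
```

Calling `is_backend_up()` before submitting work gives a clear "backend not running" signal instead of a connection error in the middle of a batch.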

REST API Reference

Base URL: `http://localhost:17493`

Interactive docs: `http://localhost:17493/docs`

Generate Speech

bash

# Basic generation
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{ "text": "Hello world, this is a voice clone.", "profile_id": "abc123", "language": "en" }'

# With engine selection
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{ "text": "Speak slowly and with gravitas.", "profile_id": "abc123", "language": "en", "engine": "qwen3-tts" }'

# With paralinguistic tags (Chatterbox Turbo only)
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{ "text": "That is absolutely hilarious! [laugh] I cannot believe it.", "profile_id": "abc123", "engine": "chatterbox-turbo", "language": "en" }'
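The requests above differ only in their JSON body. As a minimal sketch, the helper below builds that body and rejects unknown engine names before anything is sent. The engine identifiers are the ones used in this document's examples; treating `language` and `engine` as optional is an assumption based on the TypeScript interface shown later, and `build_generate_body` is a hypothetical name.

```python
import json
from typing import Optional

# Engine identifiers as they appear in the examples in this document.
KNOWN_ENGINES = {"qwen3-tts", "luxtts", "chatterbox", "chatterbox-turbo", "tada"}


def build_generate_body(
    text: str,
    profile_id: str,
    language: Optional[str] = None,
    engine: Optional[str] = None,
) -> str:
    """Return the JSON body for POST /generate, omitting unset optional fields."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if engine is not None and engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")

    payload = {"text": text, "profile_id": profile_id}
    if language is not None:
        payload["language"] = language
    if engine is not None:
        payload["engine"] = engine
    return json.dumps(payload)
```

Validating on the client side keeps a typo such as `"qwen-tts"` from becoming a queued request that fails later.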

Voice Profiles

bash

# List all profiles
curl http://localhost:17493/profiles

# Create a new profile
curl -X POST http://localhost:17493/profiles \
  -H "Content-Type: application/json" \
  -d '{ "name": "Narrator", "language": "en", "description": "Deep narrative voice" }'

# Upload audio sample to a profile
curl -X POST http://localhost:17493/profiles/{profile_id}/samples \
  -F "file=@/path/to/voice-sample.wav"

# Export a profile
curl http://localhost:17493/profiles/{profile_id}/export \
  --output narrator-profile.zip

# Import a profile
curl -X POST http://localhost:17493/profiles/import \
  -F "file=@narrator-profile.zip"
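The `-F "file=@..."` flag makes curl send the file as `multipart/form-data`. For scripts that cannot pull in an HTTP client library, the sketch below encodes a single file by hand with the standard library. The field name `file` comes from the curl examples; the function name and the generic `application/octet-stream` part type are assumptions, and the filename is expected to contain no quotes or line breaks.

```python
import uuid
from typing import Tuple


def encode_sample_upload(filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one audio file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"
```

The returned pair can be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`.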

Generation Queue & Status

bash

# Get generation status (SSE stream)
curl -N http://localhost:17493/generate/{generation_id}/status

# List recent generations
curl http://localhost:17493/generations

# Retry a failed generation
curl -X POST http://localhost:17493/generations/{generation_id}/retry

# Download generated audio
curl http://localhost:17493/generations/{generation_id}/audio \
  --output output.wav
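The status endpoint streams Server-Sent Events, so a client reads `data:` lines rather than a single JSON document. The sketch below turns such lines into dictionaries and stops at a terminal status. It assumes each event carries a JSON object with a `status` field, as the TypeScript SSE example in this document does; multi-line `data:` events and other SSE fields are deliberately ignored.

```python
import json
from typing import Iterable, Iterator

TERMINAL_STATUSES = {"complete", "failed"}


def iter_status_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines, stopping after a terminal status."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        yield event
        if event.get("status") in TERMINAL_STATUSES:
            return
```

The `lines` argument can be any iterable of text lines, for example the decoded lines of a streaming HTTP response.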

Models

bash

# List available models and download status
curl http://localhost:17493/models

# Unload a model from GPU memory (without deleting)
curl -X POST http://localhost:17493/models/{model_id}/unload

---

TypeScript/JavaScript Integration

Basic TTS Client

typescript

const VOICEBOX_URL = process.env.VOICEBOX_API_URL ?? "http://localhost:17493";

interface GenerateRequest {
  text: string;
  profile_id: string;
  language?: string;
  engine?: "qwen3-tts" | "luxtts" | "chatterbox" | "chatterbox-turbo" | "tada";
}

interface GenerateResponse {
  generation_id: string;
  status: "queued" | "processing" | "complete" | "failed";
  audio_url?: string;
}

async function generateSpeech(req: GenerateRequest): Promise<GenerateResponse> {
  const response = await fetch(`${VOICEBOX_URL}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });

  if (!response.ok) {
    throw new Error(`Voicebox API error: ${response.status} ${await response.text()}`);
  }

  return response.json();
}

// Usage
const result = await generateSpeech({
  text: "Welcome to our application.",
  profile_id: "abc123",
  language: "en",
  engine: "qwen3-tts",
});

console.log("Generation ID:", result.generation_id);

Poll for Completion

typescript

async function waitForGeneration(
  generationId: string,
  timeoutMs = 60_000
): Promise<string> {
  const start = Date.now();

  while (Date.now() - start < timeoutMs) {
    const res = await fetch(`${VOICEBOX_URL}/generations/${generationId}`);
    if (!res.ok) {
      throw new Error(`Voicebox API error: ${res.status}`);
    }
    const data = await res.json();

    if (data.status === "complete") {
      return `${VOICEBOX_URL}/generations/${generationId}/audio`;
    }
    if (data.status === "failed") {
      throw new Error(`Generation failed: ${data.error}`);
    }

    await new Promise((r) => setTimeout(r, 1000));
  }

  throw new Error("Generation timed out");
}

Stream Status with SSE

typescript

function streamGenerationStatus(
  generationId: string,
  onStatus: (status: string) => void
): () => void {
  const eventSource = new EventSource(
    `${VOICEBOX_URL}/generate/${generationId}/status`
  );

  eventSource.onmessage = (event) => {
    const data = JSON.parse(event.data);
    onStatus(data.status);

    if (data.status === "complete" || data.status === "failed") {
      eventSource.close();
    }
  };

  eventSource.onerror = () => eventSource.close();

  // Return cleanup function
  return () => eventSource.close();
}

// Usage
const cleanup = streamGenerationStatus("gen_abc123", (status) => {
  console.log("Status update:", status);
});

Download Audio as Blob

typescript

async function downloadAudio(generationId: string): Promise<Blob> {
  const response = await fetch(
    `${VOICEBOX_URL}/generations/${generationId}/audio`
  );

  if (!response.ok) {
    throw new Error(`Failed to download audio: ${response.status}`);
  }

  return response.blob();
}

// Play in browser
async function playGeneratedAudio(generationId: string): Promise<void> {
  const blob = await downloadAudio(generationId);
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  // Register cleanup before starting playback, then wait for playback to begin
  audio.onended = () => URL.revokeObjectURL(url);
  await audio.play();
}

Python Integration

python

import httpx
import asyncio

VOICEBOX_URL = "http://localhost:17493"

async def generate_speech(
    text: str,
    profile_id: str,
    language: str = "en",
    engine: str = "qwen3-tts"
) -> bytes:
    async with httpx.AsyncClient(timeout=120.0) as client:
        # Submit generation
        resp = await client.post(
            f"{VOICEBOX_URL}/generate",
            json={
                "text": text,
                "profile_id": profile_id,
                "language": language,
                "engine": engine,
            }
        )
        resp.raise_for_status()
        generation_id = resp.json()["generation_id"]

        # Poll until complete
        for _ in range(120):
            status_resp = await client.get(
                f"{VOICEBOX_URL}/generations/{generation_id}"
            )
            status_resp.raise_for_status()
            status_data = status_resp.json()

            if status_data["status"] == "complete":
                audio_resp = await client.get(
                    f"{VOICEBOX_URL}/generations/{generation_id}/audio"
                )
                audio_resp.raise_for_status()
                return audio_resp.content

            if status_data["status"] == "failed":
                raise RuntimeError(f"Generation failed: {status_data.get('error')}")

            await asyncio.sleep(1.0)

        raise TimeoutError("Generation timed out after 120s")

# Usage
audio_bytes = asyncio.run(
    generate_speech(
        text="The quick brown fox jumps over the lazy dog.",
        profile_id="your-profile-id",
        language="en",
        engine="chatterbox",
    )
)

with open("output.wav", "wb") as f:
    f.write(audio_bytes)
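The polling loop inside `generate_speech` can also be factored into a helper that receives the status lookup as a function, which makes it testable without a running backend. This is a sketch under that assumption: `poll_until_done` and `fetch_status` are hypothetical names, and the status strings are the ones used in the examples above.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until it reports "complete"; raise on failure or timeout."""
    for _ in range(max_attempts):
        data = fetch_status()
        if data["status"] == "complete":
            return data
        if data["status"] == "failed":
            raise RuntimeError(f"Generation failed: {data.get('error')}")
        sleep(interval)
    raise TimeoutError(f"Generation not complete after {max_attempts} attempts")
```

In real use, `fetch_status` would wrap a GET on `/generations/{generation_id}`; in tests it can be a stub that returns canned dictionaries, with `sleep` replaced by a no-op.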

---

TTS Engine Selection Guide

Engine	Best For	Languages	VRAM	Notes
`qwen3-tts` (0.6B/1.7B)	Quality + instructions	10	Medium	Supports delivery instructions in text
`luxtts`	Fast CPU generation	English only	~1GB	150x realtime on CPU, 48kHz
`chatterbox`	Multilingual coverage	23	Medium	Arabic, Hindi, Swahili, CJK + more
`chatterbox-turbo`	Expressive/emotion	English only	Low (350M)	Use `[laugh]` , `[sigh]` , `[gasp]` tags
`tada` (1B/3B)	Long-form coherence	10	High	700s+ audio, HumeAI model
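One way to read the table is as a small decision rule. The function below is only a heuristic interpretation of it, not something Voicebox provides: the parameter names are hypothetical, and because the table does not list which 10 languages `qwen3-tts` and `tada` cover, any non-English request falls back to `chatterbox`.

```python
def pick_engine(
    language: str = "en",
    needs_tags: bool = False,
    long_form: bool = False,
    cpu_only: bool = False,
) -> str:
    """Pick an engine identifier from coarse requirements, following the table above."""
    english = language.lower().startswith("en")
    if needs_tags:
        if not english:
            raise ValueError("chatterbox-turbo is listed as English only")
        return "chatterbox-turbo"  # the only engine listed with [laugh]-style tags
    if not english:
        return "chatterbox"  # widest language coverage in the table
    if cpu_only:
        return "luxtts"  # listed as fast on CPU
    if long_form:
        return "tada"  # listed as best for long-form coherence
    return "qwen3-tts"
```

The order of the checks encodes priorities (tags first, then language coverage, then hardware), which is a design choice you may want to change for your own workload.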

Delivery Instructions (Qwen3-TTS)

Embed natural language instructions directly in the text:

typescript

await generateSpeech({
  text: "(whisper) I have a secret to tell you.",
  profile_id: "abc123",
  engine: "qwen3-tts",
});

await generateSpeech({
  text: "(speak slowly and clearly) Step one: open the application.",
  profile_id: "abc123",
  engine: "qwen3-tts",
});

Paralinguistic Tags (Chatterbox Turbo)

typescript

const tags = [
  "[laugh]", "[chuckle]", "[gasp]", "[cough]",
  "[sigh]", "[groan]", "[sniff]", "[shush]", "[clear throat]"
];

await generateSpeech({
  text: "Oh really? [gasp] I had no idea! [laugh] That's incredible.",
  profile_id: "abc123",
  engine: "chatterbox-turbo",
});
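A mistyped tag would presumably be read aloud or ignored rather than performed, so it can help to check text before submitting it. The sketch below flags bracketed tags that are not in the list above. It assumes that list is complete and that tags are matched case-insensitively, neither of which this document confirms.

```python
import re

# Tags listed in the example above.
KNOWN_TAGS = {
    "[laugh]", "[chuckle]", "[gasp]", "[cough]",
    "[sigh]", "[groan]", "[sniff]", "[shush]", "[clear throat]",
}

# Matches any non-nested [bracketed] span.
TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")


def unknown_tags(text: str) -> list:
    """Return bracketed tags in text that are not in KNOWN_TAGS, in order of appearance."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag.lower() not in KNOWN_TAGS]
```

An empty result means every bracketed span in the text is one of the listed tags.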

Environment & Configuration

```bash
# Custom models directory (set before launching)
export VOICEBOX_MODELS_DIR=/path/to/models

# For AMD ROCm GPU (auto-configured, but can override)
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Docker configuration (`docker-compose.yml` override):

```yaml
services:
  voicebox:
    environment:
      - VOICEBOX_MODELS_DIR=/models
    volumes:
      - /host/models:/models
    ports:
      - "17493:17493"
    # For NVIDIA GPU passthrough:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Common Patterns

Voice Profile Creation Flow

typescript

// 1. Create profile
const profile = await fetch(`${VOICEBOX_URL}/profiles`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ name: "My Voice", language: "en" }),
}).then((r) => r.json());

// 2. Upload audio sample (WAV/MP3, ideally 5–30 seconds clean speech)
const formData = new FormData();
formData.append("file", audioBlob, "sample.wav");

await fetch(`${VOICEBOX_URL}/profiles/${profile.id}/samples`, {
  method: "POST",
  body: formData,
});

// 3. Generate with the new profile
const gen = await generateSpeech({
  text: "Testing my cloned voice.",
  profile_id: profile.id,
});

Batch Generation with Queue

typescript

async function batchGenerate(
  items: Array<{ text: string; profileId: string }>,
  engine = "qwen3-tts"
): Promise<string[]> {
  // Submit all — Voicebox queues them serially to avoid GPU contention
  const submissions = await Promise.all(
    items.map((item) =>
      generateSpeech({ text: item.text, profile_id: item.profileId, engine })
    )
  );

  // Wait for all completions
  const audioUrls = await Promise.all(
    submissions.map((s) => waitForGeneration(s.generation_id))
  );

  return audioUrls;
}

Long-Form Text (Auto-Chunking)

Voicebox auto-chunks at sentence boundaries — just send the full text:

typescript

// Up to 50,000 characters supported
const longScript = `
  Chapter one. The morning fog rolled across the valley floor...
`;

await generateSpeech({
  text: longScript,
  profile_id: "narrator-profile-id",
  engine: "tada", // Best for long-form coherence
  language: "en",
});
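To make the idea of chunking at sentence boundaries concrete, here is a minimal illustration of one way it could work. It is not Voicebox's actual algorithm, which this document does not describe, and the `max_chars` limit is a hypothetical parameter.

```python
import re
from typing import List


def chunk_by_sentence(text: str, max_chars: int = 400) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than split.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Since the server already chunks for you, a helper like this is mainly useful for estimating how a script will be divided, or for enforcing the 50,000-character limit across several requests.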

Troubleshooting

API not responding

bash

# Check if backend is running
curl http://localhost:17493/health

# Restart backend only (dev mode)
just backend

# Check logs
just logs

GPU not detected

bash

# Check detected backend
curl http://localhost:17493/system/info

# Force CPU mode (set before launch)
export VOICEBOX_FORCE_CPU=1

Model download fails / slow

bash

# Set custom models directory with more space
export VOICEBOX_MODELS_DIR=/path/with/space
just dev

# Cancel stuck download via API
curl -X DELETE http://localhost:17493/models/{model_id}/download
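Before pointing `VOICEBOX_MODELS_DIR` at a new location, it may be worth confirming the directory exists and has room. The sketch below does that with the standard library. The required size is a placeholder you supply, since this document does not state how large the models are, and both function names are hypothetical.

```python
import os
import shutil


def free_gib(path: str) -> float:
    """Return free space, in GiB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 1024**3


def models_dir_has_room(path: str, required_gib: float) -> bool:
    """True if path is an existing directory with at least required_gib free."""
    return os.path.isdir(path) and free_gib(path) >= required_gib
```

For example, `models_dir_has_room("/path/with/space", 20.0)` returns `False` both when the directory is missing and when it has less than 20 GiB free.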

Out of VRAM — unload models

bash

# List loaded models
curl http://localhost:17493/models | jq '.[] | select(.loaded == true)'

# Unload specific model
curl -X POST http://localhost:17493/models/{model_id}/unload
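The same filter as the `jq` expression can be written in Python for use inside a script. This sketch assumes, as the `jq` filter implies, that `/models` returns a JSON array of objects with a boolean `loaded` field; the `id` field used in the usage note is an assumption.

```python
import json
from typing import List


def loaded_models(models_json: str) -> List[dict]:
    """Return the entries of a /models response whose "loaded" field is true."""
    return [m for m in json.loads(models_json) if m.get("loaded") is True]
```

A script could then loop over the result and POST to `/models/{model_id}/unload` for each entry, using whatever identifier field the response actually provides (for example `m["id"]`).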

Audio quality issues

- Use 5–30 seconds of clean, noise-free speech for voice samples
- Multiple samples improve clone quality: upload 3–5 different sentences
- For multilingual cloning, use the `chatterbox` engine
- Ensure sample audio is 16kHz+ mono WAV for best results
- Use `luxtts` for highest output quality (48kHz) in English
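Some of these recommendations can be checked automatically before a sample is uploaded. The sketch below inspects a WAV file with Python's standard `wave` module and reports violations of the duration, sample-rate, and mono guidelines listed above. It cannot judge noise or clarity, and the function name is hypothetical.

```python
import io
import wave
from typing import List


def check_wav_sample(data: bytes) -> List[str]:
    """Return a list of problems found in a WAV sample; an empty list means it passes."""
    problems = []
    with wave.open(io.BytesIO(data), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
        if not 5.0 <= duration <= 30.0:
            problems.append(f"duration {duration:.1f}s is outside 5-30s")
        if wav.getframerate() < 16000:
            problems.append(f"sample rate {wav.getframerate()} Hz is below 16 kHz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems
```

Pass the raw file contents, for example `check_wav_sample(open("sample.wav", "rb").read())`, and upload only when the returned list is empty.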

Generation stuck in queue after crash

Voicebox auto-recovers stale generations on startup. If the issue persists:

bash

curl -X POST http://localhost:17493/generations/{generation_id}/retry

Frontend Integration (React Example)

tsx

import { useState } from "react";

const VOICEBOX_URL = import.meta.env.VITE_VOICEBOX_URL ?? "http://localhost:17493";

export function VoiceGenerator({ profileId }: { profileId: string }) {
  const [text, setText] = useState("");
  const [audioUrl, setAudioUrl] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  const handleGenerate = async () => {
    setLoading(true);
    try {
      const res = await fetch(`${VOICEBOX_URL}/generate`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text, profile_id: profileId, language: "en" }),
      });
      if (!res.ok) {
        throw new Error(`Voicebox API error: ${res.status}`);
      }
      const { generation_id } = await res.json();

      // Poll for completion
      let done = false;
      while (!done) {
        await new Promise((r) => setTimeout(r, 1000));
        const statusRes = await fetch(`${VOICEBOX_URL}/generations/${generation_id}`);
        const { status } = await statusRes.json();
        if (status === "complete") {
          setAudioUrl(`${VOICEBOX_URL}/generations/${generation_id}/audio`);
          done = true;
        } else if (status === "failed") {
          throw new Error("Generation failed");
        }
      }
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleGenerate} disabled={loading}>
        {loading ? "Generating..." : "Generate Speech"}
      </button>
      {audioUrl && <audio controls src={audioUrl} />}
    </div>
  );
}
