voicebox-voice-synthesis


Voicebox Voice Synthesis Studio


Skill by ara.so, part of the Daily 2026 Skills collection.

Voicebox is a local-first, open-source voice cloning and TTS studio: a self-hosted alternative to ElevenLabs. It runs entirely on your machine (macOS MLX/Metal, Windows/Linux CUDA, CPU fallback), exposes a REST API on `localhost:17493`, and ships with 5 TTS engines, 23 languages, post-processing effects, and a multi-track Stories editor.

Installation


Pre-built Binaries (Recommended)


Linux requires building from source: https://voicebox.sh/linux-install

Build from Source

Prerequisites: Bun, Rust, Python 3.11+, Tauri prerequisites

```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

# Install the just task runner
brew install just    # macOS
cargo install just   # any platform

# Set up Python venv + all dependencies
just setup

# Start backend + desktop app in dev mode
just dev

# List all available commands
just --list
```

---

Architecture


| Layer | Technology |
|---|---|
| Desktop App | Tauri (Rust) |
| Frontend | React + TypeScript + Tailwind CSS |
| State | Zustand + React Query |
| Backend | FastAPI (Python) on port 17493 |
| TTS Engines | Qwen3-TTS, LuxTTS, Chatterbox, Chatterbox Turbo, TADA |
| Effects | Pedalboard (Spotify) |
| Transcription | Whisper / Whisper Turbo |
| Inference | MLX (Apple Silicon) / PyTorch (CUDA/ROCm/XPU/CPU) |
| Database | SQLite |

The Python FastAPI backend handles all ML inference. The Tauri Rust shell wraps the frontend and manages the backend process lifecycle. The API is accessible directly at `http://localhost:17493`, even when you are using the desktop app.

REST API Reference

Base URL: `http://localhost:17493`

Interactive docs: `http://localhost:17493/docs`
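Collecting the endpoint paths used throughout this guide in one place saves client code from repeating the same string templates. The helper below is a hypothetical convenience and not part of Voicebox. Its paths are copied from the curl and TypeScript examples in this document, so check them against `/docs` for your installed version.

```typescript
// Hypothetical URL builder for the endpoints that appear in this guide.
// Paths mirror the examples in this document; verify against /docs.
const DEFAULT_BASE_URL = "http://localhost:17493";

function voiceboxEndpoints(baseUrl: string = DEFAULT_BASE_URL) {
  // Strip trailing slashes so "http://host:17493/" and "http://host:17493" behave the same
  const base = baseUrl.replace(/\/+$/, "");
  // Encode IDs so unexpected characters cannot break the path
  const id = (value: string) => encodeURIComponent(value);

  return {
    generate: () => `${base}/generate`,
    generationStatusStream: (genId: string) => `${base}/generate/${id(genId)}/status`,
    generation: (genId: string) => `${base}/generations/${id(genId)}`,
    generationAudio: (genId: string) => `${base}/generations/${id(genId)}/audio`,
    retryGeneration: (genId: string) => `${base}/generations/${id(genId)}/retry`,
    profiles: () => `${base}/profiles`,
    profileSamples: (profileId: string) => `${base}/profiles/${id(profileId)}/samples`,
    profileExport: (profileId: string) => `${base}/profiles/${id(profileId)}/export`,
    profileImport: () => `${base}/profiles/import`,
    models: () => `${base}/models`,
    docs: () => `${base}/docs`,
  };
}
```

For example, `voiceboxEndpoints().generationAudio("gen_abc123")` yields the same download URL as the `downloadAudio` example further down.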

Generate Speech

```bash
# Basic generation
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{ "text": "Hello world, this is a voice clone.", "profile_id": "abc123", "language": "en" }'

# With engine selection
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{ "text": "Speak slowly and with gravitas.", "profile_id": "abc123", "language": "en", "engine": "qwen3-tts" }'

# With paralinguistic tags (Chatterbox Turbo only)
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{ "text": "That is absolutely hilarious! [laugh] I cannot believe it.", "profile_id": "abc123", "engine": "chatterbox-turbo", "language": "en" }'
```

Voice Profiles

```bash
# List all profiles

# Create a new profile
curl -X POST http://localhost:17493/profiles \
  -H "Content-Type: application/json" \
  -d '{ "name": "Narrator", "language": "en", "description": "Deep narrative voice" }'

# Upload audio sample to a profile
curl -X POST http://localhost:17493/profiles/{profile_id}/samples \
  -F "file=@/path/to/voice-sample.wav"

# Export a profile
curl http://localhost:17493/profiles/{profile_id}/export \
  --output narrator-profile.zip

# Import a profile
curl -X POST http://localhost:17493/profiles/import \
  -F "file=@narrator-profile.zip"
```

Generation Queue & Status

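```bash
# Get generation status (SSE stream)
curl -N http://localhost:17493/generate/{generation_id}/status

# List recent generations

# Retry a failed generation
curl -X POST http://localhost:17493/generations/{generation_id}/retry

# Download generated audio
curl http://localhost:17493/generations/{generation_id}/audio --output output.wav
```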

Models

```bash
# List available models and download status
curl http://localhost:17493/models

# Unload a model from GPU memory (without deleting)
```

TypeScript/JavaScript Integration


Basic TTS Client


```typescript
const VOICEBOX_URL = process.env.VOICEBOX_API_URL ?? "http://localhost:17493";

interface GenerateRequest {
  text: string;
  profile_id: string;
  language?: string;
  engine?: "qwen3-tts" | "luxtts" | "chatterbox" | "chatterbox-turbo" | "tada";
}

interface GenerateResponse {
  generation_id: string;
  status: "queued" | "processing" | "complete" | "failed";
  audio_url?: string;
}

async function generateSpeech(req: GenerateRequest): Promise<GenerateResponse> {
  const response = await fetch(`${VOICEBOX_URL}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });

  if (!response.ok) {
    throw new Error(`Voicebox API error: ${response.status} ${await response.text()}`);
  }

  return response.json();
}

// Usage
const result = await generateSpeech({
  text: "Welcome to our application.",
  profile_id: "abc123",
  language: "en",
  engine: "qwen3-tts",
});

console.log("Generation ID:", result.generation_id);
```

Poll for Completion


```typescript
async function waitForGeneration(
  generationId: string,
  timeoutMs = 60_000
): Promise<string> {
  const start = Date.now();

  while (Date.now() - start < timeoutMs) {
    const res = await fetch(`${VOICEBOX_URL}/generations/${generationId}`);
    if (!res.ok) {
      throw new Error(`Status check failed: ${res.status} ${await res.text()}`);
    }
    const data = await res.json();

    if (data.status === "complete") {
      // URL of the finished audio file
      return `${VOICEBOX_URL}/generations/${generationId}/audio`;
    }
    if (data.status === "failed") {
      throw new Error(`Generation failed: ${data.error}`);
    }

    // Still queued or processing: wait 1 s before checking again
    await new Promise((r) => setTimeout(r, 1000));
  }

  throw new Error("Generation timed out");
}
```
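A fixed 1-second interval is simple, but it makes short generations wait longer than necessary and sends many requests during long ones. A common alternative is exponential backoff: check quickly at first, then double the wait up to a cap. The sketch below is a hypothetical variant of `waitForGeneration`, not part of Voicebox. The status values mirror the `GenerateResponse` type above, and the default delays are arbitrary choices.

```typescript
// Hypothetical backoff poller. getStatus is any function that returns the current
// generation status (for Voicebox, a GET on /generations/{id} parsed as JSON).
type PollStatus = "queued" | "processing" | "complete" | "failed";

interface PollOptions {
  initialDelayMs?: number; // wait before the second check
  maxDelayMs?: number; // upper bound on any single wait
  timeoutMs?: number; // total time budget
}

async function pollWithBackoff(
  getStatus: () => Promise<{ status: PollStatus; error?: string }>,
  { initialDelayMs = 250, maxDelayMs = 4_000, timeoutMs = 60_000 }: PollOptions = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;

  while (true) {
    const { status, error } = await getStatus();
    if (status === "complete") return;
    if (status === "failed") {
      throw new Error(`Generation failed: ${error ?? "unknown error"}`);
    }
    // Give up if the next wait would overrun the time budget
    if (Date.now() + delay > deadline) throw new Error("Generation timed out");

    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxDelayMs); // 250, 500, 1000, ... capped at maxDelayMs
  }
}
```

To use it with Voicebox, pass a function that fetches `${VOICEBOX_URL}/generations/${id}` and returns the parsed JSON, the same request `waitForGeneration` makes. Keeping the fetch outside the poller also makes the timing logic testable without a running server.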

Stream Status with SSE


```typescript
function streamGenerationStatus(
  generationId: string,
  onStatus: (status: string) => void
): () => void {
  const eventSource = new EventSource(
    `${VOICEBOX_URL}/generate/${generationId}/status`
  );

  eventSource.onmessage = (event) => {
    const data = JSON.parse(event.data);
    onStatus(data.status);

    if (data.status === "complete" || data.status === "failed") {
      eventSource.close();
    }
  };

  eventSource.onerror = () => eventSource.close();

  // Return cleanup function
  return () => eventSource.close();
}

// Usage
const cleanup = streamGenerationStatus("gen_abc123", (status) => {
  console.log("Status update:", status);
});
```
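`EventSource` is a browser API and may not be available in a Node.js script. A script can instead read the same stream with `fetch` and parse the Server-Sent Events framing itself. The sketch below is a minimal parser plus a hypothetical `followStatus` consumer. Like the browser example above, it assumes each event's `data` is JSON with a `status` field, and it handles only `\n` and `\r\n` line endings.

```typescript
// Minimal Server-Sent Events parser: splits raw stream text into complete events
// (separated by a blank line) and returns each event's `data` payload. Multi-line
// data fields are joined with "\n"; comment lines (":") and other fields are ignored.
function parseSseEvents(buffer: string): { events: string[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // trailing text may be an incomplete event
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")); // drop one optional leading space
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}

// Hypothetical Node.js 18+ consumer for the status stream used above.
async function followStatus(
  url: string,
  onStatus: (status: string) => void
): Promise<void> {
  const response = await fetch(url, { headers: { Accept: "text/event-stream" } });
  if (!response.ok || !response.body) {
    throw new Error(`SSE request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let pending = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) return; // server closed the stream
    const { events, rest } = parseSseEvents(pending + decoder.decode(value, { stream: true }));
    pending = rest;
    for (const data of events) {
      const { status } = JSON.parse(data);
      onStatus(status);
      if (status === "complete" || status === "failed") {
        await reader.cancel(); // final state reached; stop reading
        return;
      }
    }
  }
}
```

For example, `await followStatus(`${VOICEBOX_URL}/generate/gen_abc123/status`, console.log)`. The parser returns any incomplete trailing event as `rest` so that an event split across two network chunks is not lost.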

Download Audio as Blob


```typescript
async function downloadAudio(generationId: string): Promise<Blob> {
  const response = await fetch(
    `${VOICEBOX_URL}/generations/${generationId}/audio`
  );

  if (!response.ok) {
    throw new Error(`Failed to download audio: ${response.status}`);
  }

  return response.blob();
}

// Play in browser
async function playGeneratedAudio(generationId: string): Promise<void> {
  const blob = await downloadAudio(generationId);
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  // Free the object URL once playback finishes
  audio.onended = () => URL.revokeObjectURL(url);
  try {
    // play() returns a promise that rejects if the browser blocks autoplay
    await audio.play();
  } catch (err) {
    URL.revokeObjectURL(url);
    throw err;
  }
}
```


Python Integration


```python
import asyncio

import httpx

VOICEBOX_URL = "http://localhost:17493"


async def generate_speech(
    text: str,
    profile_id: str,
    language: str = "en",
    engine: str = "qwen3-tts",
) -> bytes:
    async with httpx.AsyncClient(timeout=120.0) as client:
        # Submit generation
        resp = await client.post(
            f"{VOICEBOX_URL}/generate",
            json={
                "text": text,
                "profile_id": profile_id,
                "language": language,
                "engine": engine,
            },
        )
        resp.raise_for_status()
        generation_id = resp.json()["generation_id"]

        # Poll once per second, up to 120 times
        for _ in range(120):
            status_resp = await client.get(
                f"{VOICEBOX_URL}/generations/{generation_id}"
            )
            status_resp.raise_for_status()
            status_data = status_resp.json()

            if status_data["status"] == "complete":
                audio_resp = await client.get(
                    f"{VOICEBOX_URL}/generations/{generation_id}/audio"
                )
                audio_resp.raise_for_status()
                return audio_resp.content

            if status_data["status"] == "failed":
                raise RuntimeError(f"Generation failed: {status_data.get('error')}")

            await asyncio.sleep(1.0)

        raise TimeoutError("Generation did not complete after 120 status checks (~2 minutes)")


# Usage
if __name__ == "__main__":
    audio_bytes = asyncio.run(
        generate_speech(
            text="The quick brown fox jumps over the lazy dog.",
            profile_id="your-profile-id",
            language="en",
            engine="chatterbox",
        )
    )
    with open("output.wav", "wb") as f:
        f.write(audio_bytes)
```

---

TTS Engine Selection Guide


| Engine | Best For | Languages | VRAM | Notes |
|---|---|---|---|---|
| `qwen3-tts` (0.6B/1.7B) | Quality + instructions | 10 | Medium | Supports delivery instructions in text |
| `luxtts` | Fast CPU generation | English only | ~1 GB | 150x realtime on CPU, 48kHz output |
| `chatterbox` | Multilingual coverage | 23 | Medium | Arabic, Hindi, Swahili, CJK + more |
| `chatterbox-turbo` | Expressive/emotion | English only | Low (350M-parameter model) | Use `[laugh]`, `[sigh]`, `[gasp]` tags |
| `tada` (1B/3B) | Long-form coherence | 10 | High | 700s+ audio, HumeAI model |
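An application that chooses engines automatically can turn the table into a small lookup. The helper below is hypothetical. The English-only flags and each engine's strength come from the table, but the order in which needs are checked is an assumption. `chatterbox` has the widest language coverage (23), and the table does not name the 10 languages that `qwen3-tts` and `tada` support, so the helper routes all non-English text to `chatterbox`.

```typescript
// Hypothetical engine chooser based on the engine selection table.
type Engine = "qwen3-tts" | "luxtts" | "chatterbox" | "chatterbox-turbo" | "tada";

interface EngineNeeds {
  language: string; // e.g. "en", "ar", "hi"
  paralinguisticTags?: boolean; // [laugh], [sigh], ... in the text
  longForm?: boolean; // many minutes of continuous narration
  cpuOnly?: boolean; // no usable GPU
}

// Engines the table marks as "English only"
const ENGLISH_ONLY: ReadonlySet<Engine> = new Set<Engine>(["luxtts", "chatterbox-turbo"]);

function checkEngineLanguage(engine: Engine, language: string): void {
  if (ENGLISH_ONLY.has(engine) && language !== "en") {
    throw new Error(`${engine} is English only (got "${language}")`);
  }
}

function pickEngine(needs: EngineNeeds): Engine {
  const english = needs.language === "en";
  if (needs.paralinguisticTags) {
    // Tags are only supported by chatterbox-turbo, which is English only
    checkEngineLanguage("chatterbox-turbo", needs.language);
    return "chatterbox-turbo";
  }
  // Widest coverage; the 10 languages of qwen3-tts and tada are not listed
  if (!english) return "chatterbox";
  if (needs.cpuOnly) return "luxtts"; // fast CPU generation
  if (needs.longForm) return "tada"; // long-form coherence
  return "qwen3-tts"; // quality + delivery instructions
}
```

`cpuOnly` is checked before `longForm` because `tada` is listed with high VRAM use. A caller can still run `checkEngineLanguage` on any engine picked by hand.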

Delivery Instructions (Qwen3-TTS)


Embed natural language instructions directly in the text:

```typescript
await generateSpeech({
  text: "(whisper) I have a secret to tell you.",
  profile_id: "abc123",
  engine: "qwen3-tts",
});

await generateSpeech({
  text: "(speak slowly and clearly) Step one: open the application.",
  profile_id: "abc123",
  engine: "qwen3-tts",
});
```
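When the instruction comes from user input or configuration, a small helper keeps the `(instruction) text` shape consistent. This hypothetical helper only formats the string the way the examples above do. It has no knowledge of which instructions Qwen3-TTS actually follows.

```typescript
// Hypothetical formatter for Qwen3-TTS delivery instructions, producing
// "(instruction) text" as in the examples above.
function withDelivery(instruction: string, text: string): string {
  // Accept "whisper" or "(whisper)" and normalise to a single pair of parentheses
  const cleaned = instruction.trim().replace(/^\(/, "").replace(/\)$/, "").trim();
  if (/[()]/.test(cleaned)) {
    // Nested parentheses would make it unclear where the instruction ends
    throw new Error("Delivery instruction must not contain parentheses");
  }
  // An empty instruction leaves the text unchanged (apart from trimming)
  return cleaned ? `(${cleaned}) ${text.trim()}` : text.trim();
}
```

For example, `withDelivery("speak slowly and clearly", "Step one: open the application.")` produces the `text` value used in the second example above.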

Paralinguistic Tags (Chatterbox Turbo)


```typescript
// Paralinguistic tags supported by Chatterbox Turbo
const tags = [
  "[laugh]", "[chuckle]", "[gasp]", "[cough]",
  "[sigh]", "[groan]", "[sniff]", "[shush]", "[clear throat]"
];

await generateSpeech({
  text: "Oh really? [gasp] I had no idea! [laugh] That's incredible.",
  profile_id: "abc123",
  engine: "chatterbox-turbo",
});
```
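Tags are plain text inside `text`, so a typo such as `[laughs]` may not be rejected by the API and may not be interpreted as a tag. A hypothetical pre-flight check against the list above can flag unknown tags before the request is sent:

```typescript
// Tags listed in this guide for chatterbox-turbo
const SUPPORTED_TAGS: ReadonlySet<string> = new Set([
  "[laugh]", "[chuckle]", "[gasp]", "[cough]",
  "[sigh]", "[groan]", "[sniff]", "[shush]", "[clear throat]",
]);

// Hypothetical check: return every bracketed token in the text that is not
// an exact match for a supported tag (matching is case-sensitive).
function findUnsupportedTags(text: string): string[] {
  const found: string[] = text.match(/\[[^\[\]]+\]/g) ?? [];
  return found.filter((tag) => !SUPPORTED_TAGS.has(tag));
}
```

An empty result means every bracketed token is a known tag. Otherwise the caller can fix the text or warn the user.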


Environment & Configuration

```bash
# Custom models directory (set before launching)
export VOICEBOX_MODELS_DIR=/path/to/models

# For AMD ROCm GPU (auto-configured, but can override)
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Docker configuration (`docker-compose.yml` override):

```yaml
services:
  voicebox:
    environment:
      - VOICEBOX_MODELS_DIR=/models
    volumes:
      - /host/models:/models
    ports:
      - "17493:17493"
    # For NVIDIA GPU passthrough:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Common Patterns


Voice Profile Creation Flow


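```typescript
// Assumed to come from elsewhere in your app, e.g. a file <input> or MediaRecorder
declare const audioBlob: Blob;

// 1. Create profile
const profileRes = await fetch(`${VOICEBOX_URL}/profiles`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ name: "My Voice", language: "en" }),
});
if (!profileRes.ok) throw new Error(`Profile creation failed: ${profileRes.status}`);
const profile = await profileRes.json();

// 2. Upload audio sample (WAV/MP3, ideally 5–30 seconds of clean speech).
// No Content-Type header here: fetch sets the multipart boundary itself.
const formData = new FormData();
formData.append("file", audioBlob, "sample.wav");

const uploadRes = await fetch(`${VOICEBOX_URL}/profiles/${profile.id}/samples`, {
  method: "POST",
  body: formData,
});
if (!uploadRes.ok) throw new Error(`Sample upload failed: ${uploadRes.status}`);

// 3. Generate with the new profile
const gen = await generateSpeech({
  text: "Testing my cloned voice.",
  profile_id: profile.id,
});
```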

Batch Generation with Queue


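```typescript
async function batchGenerate(
  items: Array<{ text: string; profileId: string }>,
  engine: GenerateRequest["engine"] = "qwen3-tts"
): Promise<string[]> {
  // Submit everything up front; Voicebox queues generations serially to avoid
  // GPU contention. Submitting one at a time keeps the queue order equal to the
  // array order (assuming a first-in, first-out queue).
  const submissions: GenerateResponse[] = [];
  for (const item of items) {
    submissions.push(
      await generateSpeech({ text: item.text, profile_id: item.profileId, engine })
    );
  }

  // Wait in queue order. Starting every 60 s timeout at once would let later
  // items time out while earlier ones are still being processed.
  const audioUrls: string[] = [];
  for (const s of submissions) {
    audioUrls.push(await waitForGeneration(s.generation_id));
  }

  return audioUrls;
}
```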

Long-Form Text (Auto-Chunking)


Voicebox auto-chunks at sentence boundaries, so you can send the full text in a single request:

```typescript
// Up to 50,000 characters are supported in a single request
const longScript = `
  Chapter one. The morning fog rolled across the valley floor...
`;

await generateSpeech({
  text: longScript,
  profile_id: "narrator-profile-id",
  engine: "tada", // Best for long-form coherence
  language: "en",
});
```
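Voicebox does the chunking itself, so a single request needs no client-side splitting. A script longer than the 50,000-character limit, however, has to be sent as several requests. The sketch below is a hypothetical client-side splitter that keeps sentences whole. It is not Voicebox's own chunker, and its simple `.`/`!`/`?` heuristic will also split after abbreviations such as "Dr.".

```typescript
// Per-request limit stated in this guide
const MAX_CHARS = 50_000;

// Greedy sentence-boundary splitter: packs whole sentences into parts of at most
// maxChars characters. A single sentence longer than maxChars is hard-split.
function splitForRequests(text: string, maxChars = MAX_CHARS): string[] {
  // Each match is a run of text ending in . ! or ? (plus trailing whitespace),
  // or the unpunctuated remainder at the end of the input.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const parts: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    // Flush the current part if this sentence would push it over the limit
    if (current && current.length + sentence.length > maxChars) {
      parts.push(current.trim());
      current = "";
    }
    if (sentence.length > maxChars) {
      // Last resort: cut an oversized sentence into fixed-size pieces
      for (let i = 0; i < sentence.length; i += maxChars) {
        parts.push(sentence.slice(i, i + maxChars).trim());
      }
      continue;
    }
    current += sentence;
  }
  if (current.trim()) parts.push(current.trim());
  return parts;
}
```

Each returned part can then be passed to `generateSpeech` in order, for example with `batchGenerate`, so that the queue plays them back in sequence.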


Troubleshooting


API not responding

```bash
# Check if backend is running

# Restart backend only (dev mode)
just backend

# Check logs
just logs
```

GPU not detected

```bash
# Check detected backend

# Force CPU mode (set before launch)
export VOICEBOX_FORCE_CPU=1
```

Model download fails / slow

```bash
# Set custom models directory with more space
export VOICEBOX_MODELS_DIR=/path/with/space
just dev

# Cancel stuck download via API
```

Out of VRAM — unload models

```bash
# List loaded models
curl http://localhost:17493/models | jq '.[] | select(.loaded == true)'

# Unload specific model
```

Audio quality issues

  • Use 5–30 seconds of clean, noise-free speech for voice samples
  • Multiple samples improve clone quality: upload 3–5 different sentences
  • For multilingual cloning, use the `chatterbox` engine
  • Ensure sample audio is 16kHz+ mono WAV for best results
  • Use `luxtts` for the highest output sample rate (48kHz) in English
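The sample guidelines above can be checked before uploading. The sketch below is a hypothetical helper, not part of Voicebox. It reads the sample rate, channel count and duration from a WAV file's RIFF header and reports anything outside 16kHz+, mono and 5–30 seconds. It scans chunks so that an extra chunk (such as `LIST`) before the audio data is handled. It assumes uncompressed PCM WAV and does not cover MP3 samples.

```typescript
// Hypothetical pre-upload check for PCM WAV voice samples.
interface WavInfo {
  sampleRate: number;
  channels: number;
  durationSec: number;
}

function readWavInfo(bytes: Uint8Array): WavInfo {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const fourCC = (o: number) =>
    String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);
  if (bytes.length < 12 || fourCC(0) !== "RIFF" || fourCC(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }

  let sampleRate = 0;
  let channels = 0;
  let byteRate = 0;
  // Walk the chunks after the 12-byte RIFF header; all integers are little-endian
  for (let offset = 12; offset + 8 <= bytes.length; ) {
    const id = fourCC(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === "fmt ") {
      channels = view.getUint16(offset + 10, true);
      sampleRate = view.getUint32(offset + 12, true);
      byteRate = view.getUint32(offset + 16, true); // bytes of audio per second
    } else if (id === "data") {
      if (byteRate === 0) throw new Error("data chunk appears before fmt chunk");
      return { sampleRate, channels, durationSec: size / byteRate };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("No data chunk found");
}

// Compare against the guidelines above: 16 kHz or higher, mono, 5–30 seconds
function checkVoiceSample(info: WavInfo): string[] {
  const problems: string[] = [];
  if (info.sampleRate < 16_000) problems.push(`sample rate ${info.sampleRate} Hz is below 16 kHz`);
  if (info.channels !== 1) problems.push(`${info.channels} channels; mono is recommended`);
  if (info.durationSec < 5 || info.durationSec > 30) {
    problems.push(`duration ${info.durationSec.toFixed(1)} s is outside 5–30 s`);
  }
  return problems;
}
```

In a browser the input can be `new Uint8Array(await file.arrayBuffer())`. Only the header is read, so the first few kilobytes of a large file are enough.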

Generation stuck in queue after crash

Voicebox auto-recovers stale generations on startup. If the issue persists, retry the generation manually:

```bash
curl -X POST http://localhost:17493/generations/{generation_id}/retry
```

Frontend Integration (React Example)


```tsx
import { useState } from "react";

// Vite exposes VITE_-prefixed env vars on import.meta.env
const VOICEBOX_URL = import.meta.env.VITE_VOICEBOX_URL ?? "http://localhost:17493";

export function VoiceGenerator({ profileId }: { profileId: string }) {
  const [text, setText] = useState("");
  const [audioUrl, setAudioUrl] = useState<string | null>(null);
  const [error, setError] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  const handleGenerate = async () => {
    setLoading(true);
    setError(null);
    setAudioUrl(null);
    try {
      const res = await fetch(`${VOICEBOX_URL}/generate`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text, profile_id: profileId, language: "en" }),
      });
      if (!res.ok) throw new Error(`Voicebox API error: ${res.status}`);
      const { generation_id } = await res.json();

      // Poll once per second, giving up after 60 attempts (~60 s)
      for (let attempt = 0; attempt < 60; attempt++) {
        await new Promise((r) => setTimeout(r, 1000));
        const statusRes = await fetch(`${VOICEBOX_URL}/generations/${generation_id}`);
        if (!statusRes.ok) throw new Error(`Status check failed: ${statusRes.status}`);
        const { status } = await statusRes.json();
        if (status === "complete") {
          setAudioUrl(`${VOICEBOX_URL}/generations/${generation_id}/audio`);
          return;
        }
        if (status === "failed") throw new Error("Generation failed");
      }
      throw new Error("Generation timed out");
    } catch (err) {
      setError(err instanceof Error ? err.message : String(err));
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleGenerate} disabled={loading || !text.trim()}>
        {loading ? "Generating..." : "Generate Speech"}
      </button>
      {error && <p role="alert">{error}</p>}
      {audioUrl && <audio controls src={audioUrl} />}
    </div>
  );
}
```