elevenlabs

ElevenLabs - Text-to-Speech & Podcast Skill

Overview

This skill converts text and documents into high-quality audio using ElevenLabs TTS API. It supports two modes: single-voice narration and two-host conversational podcast generation.

When to Use This Skill

Activate when the user mentions:

- "create podcast", "generate podcast", "podcast from document"
- "narrate document", "narrate this file", "read aloud"
- "text to speech", "TTS", "convert to audio"
- "audio from document", "audio version of"

Setup

Configuration lives at `skills/elevenlabs/config.json`:

json

{
  "api_key": "your-elevenlabs-api-key",
  "default_voice": "JBFqnCBsd6RMkjVDRZzb",
  "default_model": "eleven_multilingual_v2",
  "podcast_voice1": "JBFqnCBsd6RMkjVDRZzb",
  "podcast_voice2": "EXAVITQu4vr4xnSDxMaL"
}

Only `api_key` is required. Alternatively, set the `ELEVENLABS_API_KEY` environment variable.

Dependencies: `pip install PyPDF2 python-docx` (only needed for PDF/DOCX files).

Requires `ffmpeg` for multi-chunk narration and podcasts.
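
The lookup described above could be sketched roughly as follows. This is an illustration, not the skill's actual code: `load_config` is a hypothetical helper, the fallback values are simply copied from the sample config, and letting the environment variable override the file is an assumption.

```python
import json
import os
from pathlib import Path

# Fallbacks copied from the sample config; only the API key has no fallback.
DEFAULTS = {
    "default_voice": "JBFqnCBsd6RMkjVDRZzb",
    "default_model": "eleven_multilingual_v2",
    "podcast_voice1": "JBFqnCBsd6RMkjVDRZzb",
    "podcast_voice2": "EXAVITQu4vr4xnSDxMaL",
}


def load_config(path="skills/elevenlabs/config.json", env=None):
    """Hypothetical helper: merge fallbacks, the config file, and the env var."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    config_path = Path(path)
    if config_path.is_file():
        config.update(json.loads(config_path.read_text(encoding="utf-8")))
    # Assumption: the environment variable wins when both sources are set.
    if env.get("ELEVENLABS_API_KEY"):
        config["api_key"] = env["ELEVENLABS_API_KEY"]
    if not config.get("api_key"):
        raise ValueError("Set api_key in config.json or ELEVENLABS_API_KEY")
    return config
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real environment.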

Commands

List Voices

bash

python skills/elevenlabs/scripts/elevenlabs.py voices
python skills/elevenlabs/scripts/elevenlabs.py voices --json

Use this to find voice IDs for the user.

Single-Voice TTS

bash

# From text
python skills/elevenlabs/scripts/elevenlabs.py tts --text "Hello world" --output ~/Downloads/hello.mp3

# From document
python skills/elevenlabs/scripts/elevenlabs.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

# With specific voice
python skills/elevenlabs/scripts/elevenlabs.py tts --file doc.md --voice VOICE_ID --output out.mp3

The script handles text extraction, chunking at sentence boundaries (~4000 chars), TTS per chunk with voice continuity, and ffmpeg concatenation automatically.
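
To make the chunking step concrete, here is a minimal sketch of sentence-boundary chunking. It is not taken from the script: `chunk_text` is a hypothetical name, the regex-based sentence splitter is a deliberately naive assumption, and the 4000-character limit is taken from the description above.

```python
import re


def chunk_text(text, max_chars=4000):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting audio files joined with ffmpeg, as described above.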

Podcast Generation

Podcast mode requires a JSON script file with conversation segments:

json

[
  {"speaker": "host1", "text": "Welcome to our podcast! Today we're diving into..."},
  {"speaker": "host2", "text": "That's right! I found the section on..."},
  {"speaker": "host1", "text": "Let's break that down..."}
]

bash

python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/script.json --voice1 ID1 --voice2 ID2 --output ~/Downloads/podcast.mp3
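
Before spending API characters, it may help to sanity-check the script file. The sketch below is a hypothetical pre-flight check, not part of the skill; it assumes `host1` and `host2` are the only accepted speaker labels, since those are the only ones shown in the example.

```python
VALID_SPEAKERS = {"host1", "host2"}  # assumption: the only labels shown above


def validate_script(segments):
    """Return a list of problems; an empty list means the script looks fine."""
    if not isinstance(segments, list) or not segments:
        return ["script must be a non-empty JSON array"]
    problems = []
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            problems.append(f"segment {i}: expected an object")
            continue
        speaker = seg.get("speaker")
        if not isinstance(speaker, str) or speaker not in VALID_SPEAKERS:
            problems.append(f"segment {i}: speaker must be host1 or host2")
        text = seg.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"segment {i}: text must be a non-empty string")
    return problems
```

Load the file with `json.load` and pass the result to `validate_script`; report any returned problems before calling the `podcast` command.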

Podcast Workflow (for Claude)

When the user asks to create a podcast from a document:

Extract the document text:

bash

python skills/elevenlabs/scripts/extract.py /path/to/document.pdf

Generate a two-host conversation script from the extracted text. Follow these guidelines:
- Write as a natural, engaging discussion between two hosts
- Host 1 typically leads/introduces topics, Host 2 adds analysis and reactions
- Start with a brief intro welcoming listeners and stating the topic
- End with a summary/outro
- Keep each turn under 3000 characters
- Vary turn lengths - mix short reactions with longer explanations
- Use conversational language: "That's a great point", "What I found interesting was..."
- Reference specific details from the source document
- Avoid reading the document verbatim - discuss and interpret it

Write the script as a JSON array to a temp file:

python

# Write to /tmp/podcast_script.json
[
  {"speaker": "host1", "text": "Welcome to today's episode..."},
  {"speaker": "host2", "text": "Thanks for having me..."},
  ...
]

Generate the podcast:

bash

python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3

Clean up the temp script file.
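
The write, generate, and clean-up steps above can be sketched as follows. The helper names are hypothetical, and `tempfile.mkstemp` is swapped in for the fixed `/tmp/podcast_script.json` path to avoid collisions between runs; the 3000-character check mirrors the per-turn guideline.

```python
import json
import os
import tempfile

MAX_TURN_CHARS = 3000  # guideline above: keep each turn under this length


def write_script(segments):
    """Write segments to a temporary JSON file and return its path."""
    too_long = [i for i, s in enumerate(segments) if len(s["text"]) >= MAX_TURN_CHARS]
    if too_long:
        raise ValueError(f"turns at or over the length limit: {too_long}")
    fd, path = tempfile.mkstemp(prefix="podcast_script_", suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(segments, handle, ensure_ascii=False, indent=2)
    return path


def with_temp_script(segments, action):
    """Call action(path) with a temp script file, then always delete the file."""
    path = write_script(segments)
    try:
        return action(path)
    finally:
        os.remove(path)
```

Here `action` would be whatever runs the `podcast` command against the given path; the `finally` clause removes the file even if generation fails.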

Tips

- Run `voices` first to let the user pick voices they like
- For podcasts, suggest voice pairs with contrasting qualities (e.g., one deep, one bright)
- Default output to `~/Downloads/` unless the user specifies otherwise
- For large documents, warn the user about character usage on their ElevenLabs plan
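
For the character-usage warning, a rough estimate can be computed before any audio is generated. This sketch assumes usage scales with the number of input characters; `remaining_quota` is a hypothetical figure the user would supply from their ElevenLabs account, not something the skill is known to look up.

```python
def count_characters(chunks):
    """Total characters that would be sent for synthesis."""
    return sum(len(chunk) for chunk in chunks)


def usage_warning(chunks, remaining_quota):
    """Return a warning string if the job exceeds the quota, else None."""
    needed = count_characters(chunks)
    if needed > remaining_quota:
        return (
            f"This job needs about {needed:,} characters "
            f"but only {remaining_quota:,} remain."
        )
    return None
```

For a podcast, pass the `text` field of each segment; for narration, pass the extracted document text as a single-item list.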
