elevenlabs

ElevenLabs - Text-to-Speech & Podcast Skill

Overview

This skill converts text and documents into high-quality audio using the ElevenLabs TTS API. It supports two modes: single-voice narration and two-host conversational podcast generation.

When to Use This Skill

Activate when the user mentions:
  • "create podcast", "generate podcast", "podcast from document"
  • "narrate document", "narrate this file", "read aloud"
  • "text to speech", "TTS", "convert to audio"
  • "audio from document", "audio version of"

Setup

Config at skills/elevenlabs/config.json:
json
{
  "api_key": "your-elevenlabs-api-key",
  "default_voice": "JBFqnCBsd6RMkjVDRZzb",
  "default_model": "eleven_multilingual_v2",
  "podcast_voice1": "JBFqnCBsd6RMkjVDRZzb",
  "podcast_voice2": "EXAVITQu4vr4xnSDxMaL"
}
Only api_key is required. Alternatively, set the ELEVENLABS_API_KEY environment variable.
Dependencies: pip install PyPDF2 python-docx (only needed for PDF/DOCX files).
Requires ffmpeg for multi-chunk narration and podcasts.
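
The key lookup described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the function name `resolve_api_key` is hypothetical, and the precedence (config file first, then environment variable) is an assumption, since the source does not say which one wins.

```python
import json
import os


def resolve_api_key(config_path="skills/elevenlabs/config.json", env=None):
    """Return the API key from config.json, else from ELEVENLABS_API_KEY.

    Assumption: the config file takes precedence over the environment.
    """
    env = os.environ if env is None else env
    config = {}
    if os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
    api_key = config.get("api_key") or env.get("ELEVENLABS_API_KEY")
    if not api_key:
        raise RuntimeError("Set api_key in config.json or ELEVENLABS_API_KEY")
    return api_key
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.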

Commands

List Voices

bash
python skills/elevenlabs/scripts/elevenlabs.py voices
python skills/elevenlabs/scripts/elevenlabs.py voices --json
Use this to find voice IDs for the user.

Single-Voice TTS

bash
# From text
python skills/elevenlabs/scripts/elevenlabs.py tts --text "Hello world" --output ~/Downloads/hello.mp3

# From document
python skills/elevenlabs/scripts/elevenlabs.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

# With specific voice
python skills/elevenlabs/scripts/elevenlabs.py tts --file doc.md --voice VOICE_ID --output out.mp3
The script automatically handles text extraction, chunking at sentence boundaries (~4000 chars), TTS per chunk with voice continuity, and ffmpeg concatenation.
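
To make the chunking step concrete, here is a minimal sketch of splitting text at sentence boundaries under a character limit. It is an illustration only: the function name `chunk_text`, the regex-based sentence splitter, and the hard split for oversized sentences are assumptions, and the actual script may work differently.

```python
import re


def chunk_text(text, max_chars=4000):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Split after ., !, or ? followed by whitespace (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single oversized sentence is hard-split so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to TTS separately, and the resulting audio files joined with ffmpeg.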

Podcast Generation

Podcast mode requires a JSON script file with conversation segments:
json
[
  {"speaker": "host1", "text": "Welcome to our podcast! Today we're diving into..."},
  {"speaker": "host2", "text": "That's right! I found the section on..."},
  {"speaker": "host1", "text": "Let's break that down..."}
]
bash
python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/script.json --voice1 ID1 --voice2 ID2 --output ~/Downloads/podcast.mp3
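
Before calling the podcast command, it may help to sanity-check the script file. The sketch below is hypothetical: `validate_script` is not part of the skill, and it assumes that only the `host1` and `host2` speaker labels shown above are accepted.

```python
import json

# Assumption: only the two speaker labels shown in the example are valid.
ALLOWED_SPEAKERS = {"host1", "host2"}


def validate_script(turns):
    """Return a list of problems; an empty list means the script looks usable."""
    if not isinstance(turns, list) or not turns:
        return ["script must be a non-empty JSON array"]
    problems = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: expected an object")
            continue
        if turn.get("speaker") not in ALLOWED_SPEAKERS:
            problems.append(f"turn {i}: speaker must be host1 or host2")
        text = turn.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"turn {i}: text must be a non-empty string")
    return problems


script = json.loads('[{"speaker": "host1", "text": "Welcome to our podcast!"}]')
print(validate_script(script))
```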

Podcast Workflow (for Claude)

When the user asks to create a podcast from a document:
  1. Extract the document text:
    bash
    python skills/elevenlabs/scripts/extract.py /path/to/document.pdf
  2. Generate a two-host conversation script from the extracted text. Follow these guidelines:
    • Write as a natural, engaging discussion between two hosts
    • Host 1 typically leads/introduces topics, Host 2 adds analysis and reactions
    • Start with a brief intro welcoming listeners and stating the topic
    • End with a summary/outro
    • Keep each turn under 3000 characters
    • Vary turn lengths - mix short reactions with longer explanations
    • Use conversational language: "That's a great point", "What I found interesting was..."
    • Reference specific details from the source document
    • Avoid reading the document verbatim - discuss and interpret it
  3. Write the script as a JSON array to a temp file:
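    python
    import json

    # Write the script to /tmp/podcast_script.json
    script = [
        {"speaker": "host1", "text": "Welcome to today's episode..."},
        {"speaker": "host2", "text": "Thanks for having me..."},
        # ... remaining turns
    ]
    with open("/tmp/podcast_script.json", "w", encoding="utf-8") as f:
        json.dump(script, f, ensure_ascii=False, indent=2)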
  4. Generate the podcast:
    bash
    python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3
  5. Clean up the temp script file.
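
Steps 3 to 5 can be tied together so the temp file is removed even if generation fails. The following is a sketch under assumptions: `run_with_temp_script` and its `runner` parameter are hypothetical helpers, and the command line simply mirrors the one shown in step 4.

```python
import json
import os
import tempfile


def run_with_temp_script(turns, output, runner):
    """Write turns to a temp JSON file, call runner(cmd), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(turns, f, ensure_ascii=False)
        cmd = [
            "python", "skills/elevenlabs/scripts/elevenlabs.py", "podcast",
            "--script", path, "--output", output,
        ]
        runner(cmd)
    finally:
        # Step 5: clean up regardless of whether the runner succeeded.
        if os.path.exists(path):
            os.remove(path)
    return path
```

In real use, `runner` could be `lambda cmd: subprocess.run(cmd, check=True)`, with the output path expanded first via `os.path.expanduser("~/Downloads/podcast.mp3")`, since `~` is not expanded when arguments are passed as a list.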

Tips

  • Run voices first to let the user pick voices they like
  • For podcasts, suggest voice pairs with contrasting qualities (e.g., one deep, one bright)
  • Default output to ~/Downloads/ unless the user specifies otherwise
  • For large documents, warn the user about character usage on their ElevenLabs plan
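
For the character-usage tip, a rough pre-flight estimate might look like the sketch below. Both function names are hypothetical, `remaining_quota` is a number the user would supply (nothing here queries ElevenLabs), and counting with `len()` is only an approximation of how usage may be billed.

```python
def total_characters(turns):
    """Sum the text length of every turn in a podcast script."""
    return sum(len(turn["text"]) for turn in turns)


def usage_warning(turns, remaining_quota):
    """Return a warning string if the script exceeds the quota, else None."""
    total = total_characters(turns)
    if total > remaining_quota:
        return f"Script needs {total} characters but only {remaining_quota} remain."
    return None
```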