podcast-producer-agent

Podcast Producer

Create complete podcast episodes, interviews, and conversation-style audio content.
This is an orchestrator skill that combines:
  • Script/dialogue generation (Claude)
  • Multi-speaker voice synthesis (Gemini TTS)
  • Intro/outro music (Lyria)
  • Audio assembly (FFmpeg via media-utils)

What You Can Create

| Type | Example |
|---|---|
| Podcast episode | Two hosts discussing a topic |
| Interview | Q&A format with host and guest |
| Dialogue | Scripted conversation between characters |
| Audio drama | Story with multiple characters |
| Radio show | Formatted audio program with segments |

Prerequisites

  • GOOGLE_API_KEY - For Gemini TTS (voices) and Lyria (music)
  • FFmpeg installed: brew install ffmpeg (macOS) or apt install ffmpeg (Linux)
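
A preflight check along these lines could confirm both prerequisites before any generation runs. This is a minimal sketch: `missing_prerequisites` is a hypothetical helper, not part of the skill, and it only tests that the variable is set and that an `ffmpeg` executable is on the PATH.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of problems; an empty list means both prerequisites are met."""
    env = os.environ if env is None else env
    problems = []
    # The key is used for both Gemini TTS and Lyria.
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY not set")
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("FFmpeg not found")
    return problems
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.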

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ DO NOT skip this step. Use interactive questioning — ask ONE question at a time.

Question Flow

⚠️ Use the AskUserQuestion tool for each question below. Do not just print questions in your response; use the tool to create interactive prompts with the options shown.
Q1: Topic
"I'll create that podcast episode! First — what's the topic?
(What should the hosts discuss?)"
Wait for response.
Q2: Hosts
"Who are the hosts/speakers?
(Names and brief personality — e.g., 'Sarah, enthusiastic tech expert' — max 2 for TTS)"
Wait for response.
Q3: Duration
"How long should the episode be?
  • 5 minutes
  • 10 minutes
  • 15 minutes
  • Or specify your own"
Wait for response.
Q4: Tone
"What tone?
  • Professional
  • Casual/conversational
  • Funny/entertaining
  • Serious/educational
  • Or describe your own"
Wait for response.
Q5: Music
"What music style for intro/outro?
  • Upbeat pop
  • Chill lo-fi
  • Corporate/professional
  • Electronic
  • Or describe your own"
Wait for response.

Quick Reference

| Question | Determines |
|---|---|
| Topic | Script content and discussion points |
| Hosts | Voice selection and script style |
| Duration | Script length |
| Tone | Writing style and energy |
| Music | Lyria prompt for intro/outro |
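
One way to carry the five answers into the later steps is a small record type. The sketch below is illustrative only: `EpisodeBrief` and its field names are assumptions, not something the skill defines. The host-count check mirrors the two-speaker limit mentioned in Q2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EpisodeBrief:
    """Answers to Q1-Q5, collected one question at a time."""
    topic: str             # Q1: drives script content
    hosts: List[str]       # Q2: e.g. "Sarah, enthusiastic tech expert"
    duration_minutes: int  # Q3: drives script length
    tone: str              # Q4: drives writing style
    music_style: str       # Q5: becomes the Lyria prompt

    def __post_init__(self):
        # Gemini TTS multi-speaker calls take at most two speakers.
        if not 1 <= len(self.hosts) <= 2:
            raise ValueError("expected 1 or 2 hosts")
        if self.duration_minutes <= 0:
            raise ValueError("duration must be positive")
```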

Step 2: Generate the Script

Use Claude to create the dialogue script with speaker labels:
[INTRO MUSIC: 10 seconds, upbeat tech podcast vibe]

Sarah: Welcome back to Tech Talk! I'm Sarah, and today we have something exciting.

Mike: Hey everyone! I'm Mike, and yes - we're diving into AI in healthcare.

Sarah: So Mike, what's the biggest change you've seen this year?

Mike: Great question! The use of AI for diagnostic imaging has exploded...

[Continue dialogue...]

[OUTRO MUSIC: Fade under last lines, 5 seconds after]

Sarah: Thanks for listening! Follow us for more episodes.

Mike: See you next time!
Script Guidelines:
  • Use speaker names exactly as they'll be configured in TTS
  • Include music cues in brackets: [INTRO MUSIC: description]
  • Keep the conversation natural, with back-and-forth between the speakers
  • Aim for ~150 words per minute of audio
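
The 150 words-per-minute guideline can be turned into a quick length check. The sketch below is a rough estimate, not part of the skill: the function names are hypothetical, and it assumes every spoken line starts with a `Name:` label and every music cue sits alone on a bracketed line, as in the example script above.

```python
import re

WORDS_PER_MINUTE = 150  # guideline from the script guidelines above

CUE = re.compile(r"^\[.*\]$")                   # e.g. [INTRO MUSIC: ...]
SPEAKER = re.compile(r"^([^:\[\]]+):\s*(.*)$")  # e.g. "Sarah: Welcome back"


def spoken_word_count(script: str) -> int:
    """Count words that will be spoken, skipping cues and speaker labels."""
    total = 0
    for line in script.splitlines():
        line = line.strip()
        if not line or CUE.match(line):
            continue
        match = SPEAKER.match(line)
        text = match.group(2) if match else line
        total += len(text.split())
    return total


def target_words(minutes: float) -> int:
    """Word budget for an episode of the requested length."""
    return round(minutes * WORDS_PER_MINUTE)


def estimated_minutes(script: str) -> float:
    """Approximate running time of the spoken part of a script."""
    return spoken_word_count(script) / WORDS_PER_MINUTE
```

For a 5-minute episode, `target_words(5)` gives a budget of 750 words. Actual running time still depends on speaking pace, so treat the result as an estimate.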

Step 3: Plan Asset Generation

Create a manifest of what needs to be generated:
json
{
  "project": "tech_talk_ai_healthcare",
  "duration_target": "5 minutes",
  "speakers": [
    {"name": "Sarah", "voice": "Kore", "style": "Enthusiastic, upbeat"},
    {"name": "Mike", "voice": "Puck", "style": "Friendly, knowledgeable"}
  ],
  "assets": [
    {
      "type": "music",
      "name": "intro_music",
      "prompt": "upbeat tech podcast, electronic, modern",
      "duration": 15,
      "script": "lyria"
    },
    {
      "type": "dialogue", 
      "name": "main_content",
      "speakers": ["Sarah", "Mike"],
      "text": "[the dialogue script]",
      "script": "gemini_tts"
    },
    {
      "type": "music",
      "name": "outro_music",
      "prompt": "same as intro, fade out",
      "duration": 10,
      "script": "lyria"
    }
  ],
  "assembly": [
    {"action": "mix", "voice": "intro_with_music", "music": "intro_music", "music_volume": 0.8},
    {"action": "concat", "files": ["intro_with_music", "main_content"]},
    {"action": "mix", "voice": "main_content_end", "music": "outro_music", "fade_out": 5}
  ]
}
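
Before generating anything, the manifest can be given a quick consistency pass. The following is a sketch under assumptions: the manifest is a planning format defined by this document, and `manifest_problems` is a hypothetical helper. It only checks that each asset names a known generator and that dialogue assets reference declared speakers.

```python
import json

# The two generator names that appear in the manifest above.
KNOWN_SCRIPTS = {"lyria", "gemini_tts"}


def manifest_problems(manifest: dict) -> list:
    """Return a list of problems found in the manifest; empty means none found."""
    declared = {speaker["name"] for speaker in manifest.get("speakers", [])}
    problems = []
    for asset in manifest.get("assets", []):
        name = asset.get("name", "<unnamed>")
        script = asset.get("script")
        if script not in KNOWN_SCRIPTS:
            problems.append(f"{name}: unknown script {script!r}")
        # Music assets have no "speakers" key, so this loop is skipped for them.
        for speaker in asset.get("speakers", []):
            if speaker not in declared:
                problems.append(f"{name}: undeclared speaker {speaker!r}")
    return problems


# A trimmed copy of the manifest above, for illustration.
manifest = json.loads("""
{
  "speakers": [{"name": "Sarah", "voice": "Kore"}, {"name": "Mike", "voice": "Puck"}],
  "assets": [
    {"type": "music", "name": "intro_music", "script": "lyria"},
    {"type": "dialogue", "name": "main_content",
     "speakers": ["Sarah", "Mike"], "script": "gemini_tts"}
  ]
}
""")
print(manifest_problems(manifest))  # prints []
```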

Step 4: Generate Assets

Execute each generation step:
Generate intro music (Lyria):
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "upbeat tech podcast, electronic, modern, positive energy" \
  --duration 15 \
  --bpm 120
Generate dialogue (Gemini TTS multi-speaker):
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --multi \
  --speaker "Sarah:Kore" \
  --speaker "Mike:Puck" \
  --text "[dialogue script from Step 2]" \
  --style "Make Sarah sound enthusiastic and upbeat. Mike sounds friendly and knowledgeable."
Generate outro music (same as intro or variation):
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "upbeat tech podcast, electronic, fade out feel" \
  --duration 10 \
  --bpm 120
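
The commands above follow directly from the Step 3 manifest, so they could be built programmatically. This sketch is an illustration, not the skill's implementation: `asset_command` is a hypothetical helper, and it uses only flags that appear in the commands above. `--bpm` and `--style` are left out because the manifest has no matching fields for them.

```python
def asset_command(asset: dict, speakers: list, plugin_root: str) -> list:
    """Build the argv list for one manifest asset."""
    if asset["script"] == "lyria":
        return [
            "python3",
            f"{plugin_root}/skills/music-generation/scripts/lyria.py",
            "--prompt", asset["prompt"],
            "--duration", str(asset["duration"]),
        ]
    if asset["script"] == "gemini_tts":
        # Map each speaker name to its configured voice, e.g. Sarah -> Kore.
        voice_for = {s["name"]: s["voice"] for s in speakers}
        command = [
            "python3",
            f"{plugin_root}/skills/voice-generation/scripts/gemini_tts.py",
            "--multi",
        ]
        for name in asset["speakers"]:
            command += ["--speaker", f"{name}:{voice_for[name]}"]
        command += ["--text", asset["text"]]
        return command
    raise ValueError(f"unsupported script: {asset['script']!r}")
```

Each returned list can be passed to `subprocess.run(command, check=True)`, with `plugin_root` taken from the `CLAUDE_PLUGIN_ROOT` environment variable. Passing a list rather than a shell string avoids quoting problems when the dialogue text contains quotes.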

Step 5: Assemble the Podcast

Use media-utils to stitch everything together:
Mix intro music with the beginning of the dialogue:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice dialogue.wav \
  --music intro_music.wav \
  --music-volume 0.4 \
  --fade-out 3 \
  -o intro_mixed.wav
Concatenate all segments. The third input, outro_mixed.wav, is not produced by the command above; it is expected to come from a second audio_mix.py call that mixes the closing lines with outro_music.wav (the last assembly step in the Step 3 manifest):
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_concat.py \
  -i intro_mixed.wav main_dialogue.wav outro_mixed.wav \
  --crossfade 1.0 \
  -o final_podcast.mp3

Step 6: Deliver the Result

Provide:
  1. The final audio file
  2. A summary of what was created
  3. An offer to make adjustments
Example delivery:
"✅ Your podcast episode is ready!
File: tech_talk_ai_healthcare.mp3 (5:23)
What I created:
  • Dialogue between Sarah (Kore voice) and Mike (Puck voice)
  • 15s upbeat electronic intro music
  • 10s outro music fade
Want me to:
  • Adjust the voices or tone?
  • Change the music style?
  • Extend or shorten any section?
  • Add more topics to the discussion?"

Voice Pairing Suggestions

| Host Type | Suggested Voice | Description |
|---|---|---|
| Main host (energetic) | Kore, Puck, Laomedeia | Firm, upbeat |
| Co-host (calm) | Charon, Algieba | Informative, smooth |
| Guest (expert) | Rasalgethi, Gacrux | Knowledgeable, mature |
| Interviewer | Achird | Friendly |
| Narrator | Charon, Orus | Informative, clear |
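
Because some voices appear under more than one host type, two speakers picked from the table can end up with the same voice. The sketch below shows one way to avoid that: it takes the first suggestion for each role that has not already been used. The role keys and `pick_voices` are hypothetical names; the voice lists are copied from the table.

```python
VOICE_SUGGESTIONS = {
    "main host": ["Kore", "Puck", "Laomedeia"],
    "co-host": ["Charon", "Algieba"],
    "guest": ["Rasalgethi", "Gacrux"],
    "interviewer": ["Achird"],
    "narrator": ["Charon", "Orus"],
}


def pick_voices(roles: list) -> list:
    """Pick one distinct voice per role, in the order the roles are given."""
    used = set()
    chosen = []
    for role in roles:
        for voice in VOICE_SUGGESTIONS[role]:
            if voice not in used:
                used.add(voice)
                chosen.append(voice)
                break
        else:
            # Every suggestion for this role is already taken.
            raise ValueError(f"no unused voice left for role {role!r}")
    return chosen
```

With two energetic main hosts this yields Kore and Puck, the pairing used for Sarah and Mike earlier in this document.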

Music Style Suggestions

| Podcast Type | Music Prompt |
|---|---|
| Tech/Business | "upbeat electronic, modern, corporate, positive" |
| Casual/Comedy | "fun, playful, acoustic guitar, lighthearted" |
| News/Serious | "subtle, professional, ambient, understated" |
| Storytelling | "cinematic, emotional, orchestral, atmospheric" |
| Health/Wellness | "calm, peaceful, ambient, gentle piano" |
| True Crime | "dark, suspenseful, minimal, tension" |

Limitations

  • Max 2 speakers per Gemini TTS call (for more, generate separate files and concatenate)
  • Lyria is instrumental only - no vocals in music
  • Duration estimates may vary - dialogue length depends on speaking pace
  • Music loops if shorter than dialogue

Error Handling

| Error | Solution |
|---|---|
| "GOOGLE_API_KEY not set" | Set up API key per README |
| "FFmpeg not found" | Install: brew install ffmpeg (macOS) or apt install ffmpeg (Linux) |
| "google-genai not installed" | Run: pip install google-genai |
| TTS too long | Split script into segments, generate separately, concat |
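
For the "TTS too long" case, the split can be done at line boundaries so that no speaker turn is cut in half. This is a minimal sketch: `split_script` is a hypothetical helper, and `max_words` is a placeholder, since this document does not state the actual TTS length limit. Speaker labels are counted as words, which keeps the estimate slightly conservative.

```python
def split_script(script: str, max_words: int) -> list:
    """Split a script into segments of at most max_words, breaking only between lines."""
    segments = []
    current = []
    count = 0
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        words = len(line.split())
        # Start a new segment when adding this line would exceed the limit.
        if current and count + words > max_words:
            segments.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += words
    if current:
        segments.append("\n".join(current))
    return segments
```

A single line longer than `max_words` is kept whole in its own segment rather than being cut. Each segment is then generated separately and the resulting files are joined with audio_concat.py.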

Example Prompts

Simple:
"Create a 3-minute podcast episode about remote work tips with two hosts"
Detailed:
"Create a 5-minute tech podcast episode. Hosts: Alex (enthusiastic, tech-savvy) and Jordan (skeptical, asks good questions). Topic: The future of AI assistants. Include upbeat electronic intro/outro music. Casual but informative tone."
With provided script:
"Turn this dialogue into a podcast episode with intro/outro music: Alex: Hey everyone, welcome back! Jordan: Today we're talking about..."