podcast-producer-agent

Podcast Producer

Create complete podcast episodes, interviews, and conversation-style audio content.

This is an orchestrator skill that combines:

Script/dialogue generation (Claude)
Multi-speaker voice synthesis (Gemini TTS)
Intro/outro music (Lyria)
Audio assembly (FFmpeg via media-utils)

What You Can Create

Type	Example
Podcast episode	Two hosts discussing a topic
Interview	Q&A format with host and guest
Dialogue	Scripted conversation between characters
Audio drama	Story with multiple characters
Radio show	Formatted audio program with segments

Prerequisites

`GOOGLE_API_KEY` - For Gemini TTS (voices) and Lyria (music)
FFmpeg installed: `brew install ffmpeg` (macOS) or `apt install ffmpeg` (Linux)
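
Both prerequisites can be checked before any generation starts. The following is a minimal sketch, not part of the skill itself: the function name `missing_prerequisites` and its injectable parameters are assumptions made for illustration, and it only checks that the key is set and that an `ffmpeg` executable is on `PATH`.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list:
    """Return a list of unmet prerequisites (empty when everything is in place)."""
    problems = []
    # The key is used by both Gemini TTS and Lyria.
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

Passing `env` and `which` as parameters keeps the check easy to exercise without touching the real environment.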

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ DO NOT skip this step. Use interactive questioning — ask ONE question at a time.

Question Flow

⚠️ Use the `AskUserQuestion` tool for each question below. Do not just print questions in your response; use the tool to create interactive prompts with the options shown.

Q1: Topic

"I'll create that podcast episode! First — what's the topic?

(What should the hosts discuss?)"

Wait for response.

Q2: Hosts

"Who are the hosts/speakers?

(Names and brief personality — e.g., 'Sarah, enthusiastic tech expert' — max 2 for TTS)"

Wait for response.

Q3: Duration

"How long should the episode be?

5 minutes

10 minutes

15 minutes

Or specify your own"

Wait for response.

Q4: Tone

"What tone?

Professional

Casual/conversational

Funny/entertaining

Serious/educational

Or describe your own"

Wait for response.

Q5: Music

"What music style for intro/outro?

Upbeat pop

Chill lo-fi

Corporate/professional

Electronic

Or describe your own"

Wait for response.

Quick Reference

Question	Determines
Topic	Script content and discussion points
Hosts	Voice selection and script style
Duration	Script length
Tone	Writing style and energy
Music	Lyria prompt for intro/outro
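
The five answers can be held in one small record before the script is written. This is a sketch only: the class names `Host` and `EpisodeRequirements`, the field names, and the `problems` method are hypothetical and do not come from the skill. The two-host limit mirrors the "max 2 for TTS" note in the Hosts question.

```python
from dataclasses import dataclass

MAX_TTS_SPEAKERS = 2  # limit stated in the Hosts question


@dataclass
class Host:
    name: str
    personality: str


@dataclass
class EpisodeRequirements:
    topic: str             # drives script content and discussion points
    hosts: list            # list of Host; drives voice selection and script style
    duration_minutes: float  # drives script length
    tone: str              # drives writing style and energy
    music_style: str       # becomes the Lyria prompt for intro/outro

    def problems(self) -> list:
        """List reasons the answers are not yet usable (empty means ready)."""
        issues = []
        if not self.topic.strip():
            issues.append("topic is empty")
        if not 1 <= len(self.hosts) <= MAX_TTS_SPEAKERS:
            issues.append(
                f"expected 1 to {MAX_TTS_SPEAKERS} hosts, got {len(self.hosts)}"
            )
        if self.duration_minutes <= 0:
            issues.append("duration must be positive")
        return issues
```

An empty result from `problems()` is one possible signal that Step 1 is complete and script generation can begin.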

Step 2: Generate the Script

Use Claude to create the dialogue script with speaker labels:

[INTRO MUSIC: 10 seconds, upbeat tech podcast vibe]

Sarah: Welcome back to Tech Talk! I'm Sarah, and today we have something exciting.

Mike: Hey everyone! I'm Mike, and yes - we're diving into AI in healthcare.

Sarah: So Mike, what's the biggest change you've seen this year?

Mike: Great question! The use of AI for diagnostic imaging has exploded...

[Continue dialogue...]

[OUTRO MUSIC: Fade under last lines, 5 seconds after]

Sarah: Thanks for listening! Follow us for more episodes.

Mike: See you next time!

Script Guidelines:

Use speaker names exactly as they'll be configured in TTS
Include music cues in brackets: `[INTRO MUSIC: description]`
Natural conversation flow with back-and-forth
Aim for ~150 words per minute of audio
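
The 150 words-per-minute guideline translates into a word budget for a requested duration, and into a rough duration estimate for a finished script. The sketch below assumes single-word speaker labels followed by a colon and music cues in square brackets, as in the sample script; the function names are illustrative, not part of the skill.

```python
import re

WORDS_PER_MINUTE = 150  # guideline figure from the script guidelines

CUE_PATTERN = re.compile(r"\[[^\]]*\]")  # e.g. [INTRO MUSIC: 10 seconds]
# A single-word speaker label at the start of a line, e.g. "Sarah: "
LABEL_PATTERN = re.compile(r"^\s*[^\s:]+:\s*", re.MULTILINE)


def spoken_word_count(script: str) -> int:
    """Count words that will be spoken, skipping cues and speaker labels."""
    without_cues = CUE_PATTERN.sub("", script)
    without_labels = LABEL_PATTERN.sub("", without_cues)
    return len(without_labels.split())


def target_word_count(minutes: float) -> int:
    """Word budget for an episode of the requested length."""
    return round(minutes * WORDS_PER_MINUTE)


def estimated_minutes(script: str) -> float:
    """Rough running time of the dialogue, excluding music."""
    return spoken_word_count(script) / WORDS_PER_MINUTE
```

By this estimate a 5-minute episode needs about 750 spoken words. Actual running time depends on the speaking pace of the chosen voices, so the figure is a planning aid rather than a guarantee.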

Step 3: Plan Asset Generation

Create a manifest of what needs to be generated:

```json
{
  "project": "tech_talk_ai_healthcare",
  "duration_target": "5 minutes",
  "speakers": [
    {"name": "Sarah", "voice": "Kore", "style": "Enthusiastic, upbeat"},
    {"name": "Mike", "voice": "Puck", "style": "Friendly, knowledgeable"}
  ],
  "assets": [
    {
      "type": "music",
      "name": "intro_music",
      "prompt": "upbeat tech podcast, electronic, modern",
      "duration": 15,
      "script": "lyria"
    },
    {
      "type": "dialogue",
      "name": "main_content",
      "speakers": ["Sarah", "Mike"],
      "text": "[the dialogue script]",
      "script": "gemini_tts"
    },
    {
      "type": "music",
      "name": "outro_music",
      "prompt": "same as intro, fade out",
      "duration": 10,
      "script": "lyria"
    }
  ],
  "assembly": [
    {"action": "mix", "voice": "intro_with_music", "music": "intro_music", "music_volume": 0.8},
    {"action": "concat", "files": ["intro_with_music", "main_content"]},
    {"action": "mix", "voice": "main_content_end", "music": "outro_music", "fade_out": 5}
  ]
}
```
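
A manifest like this can be sanity-checked before any API calls are made. The sketch below is an illustration under assumptions: `manifest_problems` is a hypothetical helper, and it checks only two things, that every speaker used by a dialogue asset has a declared voice and that no dialogue asset exceeds the two-speaker limit.

```python
MAX_TTS_SPEAKERS = 2  # per-call limit for Gemini TTS noted in this document


def manifest_problems(manifest: dict) -> list:
    """Return inconsistencies between declared speakers and dialogue assets."""
    declared = {speaker["name"] for speaker in manifest.get("speakers", [])}
    problems = []
    for asset in manifest.get("assets", []):
        if asset.get("type") != "dialogue":
            continue  # music assets carry no speakers
        used = asset.get("speakers", [])
        if len(used) > MAX_TTS_SPEAKERS:
            problems.append(
                f"{asset['name']}: {len(used)} speakers exceeds the limit of {MAX_TTS_SPEAKERS}"
            )
        for name in used:
            if name not in declared:
                problems.append(
                    f"{asset['name']}: speaker {name!r} has no voice assigned"
                )
    return problems
```

With the manifest parsed by `json.load`, an empty list means the speaker configuration is consistent with the dialogue assets.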

Step 4: Generate Assets

Execute each generation step:

Generate intro music (Lyria):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "upbeat tech podcast, electronic, modern, positive energy" \
  --duration 15 \
  --bpm 120
```

Generate dialogue (Gemini TTS multi-speaker):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --multi \
  --speaker "Sarah:Kore" \
  --speaker "Mike:Puck" \
  --text "[dialogue script from Step 2]" \
  --style "Make Sarah sound enthusiastic and upbeat. Mike sounds friendly and knowledgeable."
```

Generate outro music (same as intro or variation):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "upbeat tech podcast, electronic, fade out feel" \
  --duration 10 \
  --bpm 120
```
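
When the dialogue call is driven from code rather than typed by hand, building the argument list directly avoids shell-quoting problems with long script text. This sketch uses only the flags shown in the commands above (`--multi`, `--speaker`, `--text`, `--style`); the helper name `tts_command` and the fallback to the current directory when `CLAUDE_PLUGIN_ROOT` is unset are assumptions for illustration.

```python
import os
from pathlib import Path


def tts_command(speakers: dict, text: str, style: str = "") -> list:
    """Build the argv for the multi-speaker TTS call (does not run it)."""
    # Assumption: fall back to "." if the plugin root variable is not set.
    root = Path(os.environ.get("CLAUDE_PLUGIN_ROOT", "."))
    script = root / "skills" / "voice-generation" / "scripts" / "gemini_tts.py"
    argv = ["python3", str(script), "--multi"]
    # One --speaker flag per host, in "Name:Voice" form.
    for name, voice in speakers.items():
        argv += ["--speaker", f"{name}:{voice}"]
    argv += ["--text", text]
    if style:
        argv += ["--style", style]
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which raises if the script exits with a non-zero status.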

Step 5: Assemble the Podcast

Use media-utils to stitch everything together:

Mix intro music with beginning of dialogue:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice dialogue.wav \
  --music intro_music.wav \
  --music-volume 0.4 \
  --fade-out 3 \
  -o intro_mixed.wav
```

Concatenate all segments (`outro_mixed.wav` is assumed to come from an equivalent `audio_mix.py` call that mixes the outro music under the closing dialogue):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_concat.py \
  -i intro_mixed.wav main_dialogue.wav outro_mixed.wav \
  --crossfade 1.0 \
  -o final_podcast.mp3
```

Step 6: Deliver the Result

Provide:

The final audio file
Summary of what was created
An offer to make adjustments

Example delivery:

"✅ Your podcast episode is ready!

File: `tech_talk_ai_healthcare.mp3` (5:23)

What I created:

Dialogue between Sarah (Kore voice) and Mike (Puck voice)
15s upbeat electronic intro music
10s outro music fade

Want me to:

Adjust the voices or tone?
Change the music style?
Extend or shorten any section?
Add more topics to the discussion?"

Voice Pairing Suggestions

Host Type	Suggested Voice	Description
Main host (energetic)	Kore, Puck, Laomedeia	Firm, upbeat
Co-host (calm)	Charon, Algieba	Informative, smooth
Guest (expert)	Rasalgethi, Gacrux	Knowledgeable, mature
Interviewer	Achird	Friendly
Narrator	Charon, Orus	Informative, clear
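
The pairing table can be turned into a lookup that avoids giving two speakers the same voice. This is a sketch: the role keys, the dictionary name, and the "first free voice" rule are illustrative choices, and only the voice names come from the table.

```python
SUGGESTED_VOICES = {
    "main host": ["Kore", "Puck", "Laomedeia"],
    "co-host": ["Charon", "Algieba"],
    "guest": ["Rasalgethi", "Gacrux"],
    "interviewer": ["Achird"],
    "narrator": ["Charon", "Orus"],
}


def assign_voices(roles: dict) -> dict:
    """Map each speaker name to the first suggested voice not already taken."""
    taken = set()
    assignment = {}
    for speaker, role in roles.items():
        candidates = SUGGESTED_VOICES[role]
        # Fall back to the first suggestion if every candidate is in use.
        voice = next((v for v in candidates if v not in taken), candidates[0])
        assignment[speaker] = voice
        taken.add(voice)
    return assignment
```

For two energetic hosts this yields Kore and Puck, the pairing used in the Sarah and Mike example.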

Music Style Suggestions

Podcast Type	Music Prompt
Tech/Business	"upbeat electronic, modern, corporate, positive"
Casual/Comedy	"fun, playful, acoustic guitar, lighthearted"
News/Serious	"subtle, professional, ambient, understated"
Storytelling	"cinematic, emotional, orchestral, atmospheric"
Health/Wellness	"calm, peaceful, ambient, gentle piano"
True Crime	"dark, suspenseful, minimal, tension"

Limitations

Max 2 speakers per Gemini TTS call (for more, generate separate files and concatenate)
Lyria is instrumental only - no vocals in music
Duration estimates may vary - dialogue length depends on speaking pace
Music loops if shorter than dialogue
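
For scripts with more than two speakers, one way to respect the per-call limit is to cut the script into consecutive chunks that each involve at most two distinct voices, then generate and concatenate the chunks. The following sketch assumes the script has already been parsed into `(speaker, line)` pairs; the function name is hypothetical.

```python
def chunk_by_speakers(turns: list, max_speakers: int = 2) -> list:
    """Group consecutive (speaker, line) turns so each chunk has at most
    max_speakers distinct speakers."""
    chunks = []
    current = []
    voices = set()
    for speaker, line in turns:
        # A new speaker that would exceed the limit starts a new chunk.
        if speaker not in voices and len(voices) == max_speakers:
            chunks.append(current)
            current, voices = [], set()
        current.append((speaker, line))
        voices.add(speaker)
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own TTS call with its own pair of `--speaker` flags. The greedy split keeps turn order intact but does not try to minimize the number of calls.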

Error Handling

Error	Solution
"GOOGLE_API_KEY not set"	Set up API key per README
"FFmpeg not found"	Install: `brew install ffmpeg` (macOS) or `apt install ffmpeg` (Linux)
"google-genai not installed"	Run: `pip install google-genai`
TTS too long	Split script into segments, generate separately, then concatenate
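
The "TTS too long" remedy can be sketched as a splitter that breaks the script only between turns. The word limit is a hypothetical parameter, since this document does not state the actual TTS length limit, and a single turn longer than the limit is kept whole in its own segment rather than cut mid-sentence.

```python
def split_script(script: str, max_words: int) -> list:
    """Split a script into segments of roughly max_words or fewer,
    breaking only between turns (one turn per non-empty line)."""
    segments, current, count = [], [], 0
    for turn in (line.strip() for line in script.splitlines()):
        if not turn:
            continue  # skip blank lines between turns
        words = len(turn.split())
        # Close the current segment if this turn would push it over the limit.
        if current and count + words > max_words:
            segments.append("\n".join(current))
            current, count = [], 0
        current.append(turn)
        count += words
    if current:
        segments.append("\n".join(current))
    return segments
```

Each segment can be generated as a separate audio file and the results joined with `audio_concat.py` as in Step 5.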

Example Prompts

Simple:

"Create a 3-minute podcast episode about remote work tips with two hosts"

Detailed:

"Create a 5-minute tech podcast episode. Hosts: Alex (enthusiastic, tech-savvy) and Jordan (skeptical, asks good questions). Topic: The future of AI assistants. Include upbeat electronic intro/outro music. Casual but informative tone."

With provided script:

"Turn this dialogue into a podcast episode with intro/outro music: Alex: Hey everyone, welcome back! Jordan: Today we're talking about..."
