podcast-generation
Podcast Generation Skill
Overview
This skill generates high-quality podcast audio from text content. The workflow includes creating a structured JSON script (conversational dialogue) and executing audio generation through text-to-speech synthesis.
Core Capabilities
- Convert any text content (articles, reports, documentation) into podcast scripts
- Generate natural two-host conversational dialogue (male and female hosts)
- Synthesize speech audio using text-to-speech
- Mix audio chunks into a final podcast MP3 file
- Support both English and Chinese content
Workflow
Step 1: Understand Requirements
When a user requests podcast generation, identify:
- Source content: The text/article/report to convert into a podcast
- Language: English or Chinese (based on content)
- Output location: Where to save the generated podcast
  - You don't need to check the folder under /mnt/user-data
Step 2: Create Structured Script JSON
Generate a structured JSON script file in /mnt/user-data/workspace/ with naming pattern: {descriptive-name}-script.json
The JSON structure:
json
{
"locale": "en",
"lines": [
{"speaker": "male", "paragraph": "dialogue text"},
{"speaker": "female", "paragraph": "dialogue text"}
]
}
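As a minimal sketch, a file in this shape could be written with Python's standard json module. The helper name write_script is a hypothetical illustration and not part of the skill. In the skill's environment the target path would sit under /mnt/user-data/workspace/.

```python
import json
from pathlib import Path


def write_script(path, locale, lines):
    """Write a podcast script JSON file with the structure shown above.

    `lines` is a list of (speaker, paragraph) tuples.
    write_script is a hypothetical helper, not part of the skill.
    """
    script = {
        "locale": locale,
        "lines": [{"speaker": s, "paragraph": p} for s, p in lines],
    }
    path = Path(path)
    # Create the parent folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Chinese dialogue readable in the file.
    path.write_text(json.dumps(script, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

A call such as write_script("/mnt/user-data/workspace/demo-script.json", "en", [("male", "Hello Deer!")]) would produce a file matching the structure above.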
Step 3: Execute Generation
Call the Python script:
bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
--script-file /mnt/user-data/workspace/script-file.json \
--output-file /mnt/user-data/outputs/generated-podcast.mp3 \
--transcript-file /mnt/user-data/outputs/generated-podcast-transcript.md
Parameters:
- --script-file: Absolute path to JSON script file (required)
- --output-file: Absolute path to output MP3 file (required)
- --transcript-file: Absolute path to output transcript markdown file (optional, but recommended)
[!IMPORTANT]
- Execute the script in one complete call. Do NOT split the workflow into separate steps.
- The script handles all TTS API calls and audio generation internally.
- Do NOT read the Python file, just call it with the parameters.
- Always include --transcript-file to generate a readable transcript for the user.
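If the call is issued from Python rather than a shell, the same single invocation might be sketched as follows. The helper names build_command and run_generation are hypothetical. Only the script path and the three flags come from the command above, and the sketch assumes generate.py signals failure with a non-zero exit code.

```python
import subprocess
import sys

# Script path as given in the command above.
GENERATE_SCRIPT = "/mnt/skills/public/podcast-generation/scripts/generate.py"


def build_command(script_file, output_file, transcript_file=None, python=sys.executable):
    """Assemble the argument list for a single generate.py invocation."""
    cmd = [python, GENERATE_SCRIPT, "--script-file", script_file, "--output-file", output_file]
    if transcript_file is not None:
        cmd += ["--transcript-file", transcript_file]
    return cmd


def run_generation(script_file, output_file, transcript_file):
    """Run the whole pipeline in one call; raises CalledProcessError on failure."""
    # Assumption: generate.py exits with a non-zero status when something goes wrong.
    subprocess.run(build_command(script_file, output_file, transcript_file), check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.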
Script JSON Format
The script JSON file must follow this structure:
json
{
"title": "The History of Artificial Intelligence",
"locale": "en",
"lines": [
{"speaker": "male", "paragraph": "Hello Deer! Welcome back to another episode."},
{"speaker": "female", "paragraph": "Hey everyone! Today we have an exciting topic to discuss."},
{"speaker": "male", "paragraph": "That's right! We're going to talk about..."}
]
}
Fields:
- title: Title of the podcast episode (optional, used as heading in transcript)
- locale: Language code, "en" for English or "zh" for Chinese
- lines: Array of dialogue lines
  - speaker: Either "male" or "female"
  - paragraph: The dialogue text for this speaker
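The field rules above can be checked before generation with a small validator. This is only a sketch: validate_script is a hypothetical helper, and the requirements that lines be non-empty and that each paragraph contain text are assumptions rather than rules stated by the skill.

```python
ALLOWED_LOCALES = ("en", "zh")
ALLOWED_SPEAKERS = ("male", "female")


def validate_script(script):
    """Return a list of problems found in a parsed script dict (empty if none)."""
    problems = []
    # title is optional, but should be text when present.
    if "title" in script and not isinstance(script["title"], str):
        problems.append("title must be a string when present")
    if script.get("locale") not in ALLOWED_LOCALES:
        problems.append('locale must be "en" or "zh"')
    lines = script.get("lines")
    if not isinstance(lines, list) or not lines:
        problems.append("lines must be a non-empty array")
        return problems
    for i, line in enumerate(lines):
        if not isinstance(line, dict):
            problems.append(f"lines[{i}] must be an object")
            continue
        if line.get("speaker") not in ALLOWED_SPEAKERS:
            problems.append(f'lines[{i}].speaker must be "male" or "female"')
        paragraph = line.get("paragraph")
        if not isinstance(paragraph, str) or not paragraph.strip():
            problems.append(f"lines[{i}].paragraph must be non-empty text")
    return problems
```

An empty result means the script matches the documented structure; any other result lists what to fix before calling the generator.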
Script Writing Guidelines
When creating the script JSON, follow these guidelines:
Format Requirements
- Only two hosts: male and female, alternating naturally
- Target runtime: approximately 10 minutes of dialogue (around 40-60 lines)
- Start with the male host saying a greeting that includes "Hello Deer"
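These format requirements could be checked mechanically before generation. The sketch below uses a hypothetical helper, check_format, and treats the 40-60 line range as a soft target, since the guideline says "around". It does not enforce strict alternation, because the hosts are only asked to alternate naturally.

```python
def check_format(lines, min_lines=40, max_lines=60):
    """Check a list of {"speaker", "paragraph"} dicts against the format requirements."""
    if not lines:
        return ["script has no lines"]
    issues = []
    first = lines[0]
    # The opening line belongs to the male host and carries the greeting.
    if first["speaker"] != "male" or "Hello Deer" not in first["paragraph"]:
        issues.append('first line should be the male host greeting with "Hello Deer"')
    # Only the two documented hosts may appear.
    extra = {line["speaker"] for line in lines} - {"male", "female"}
    if extra:
        issues.append(f"unexpected speakers: {sorted(extra)}")
    if not min_lines <= len(lines) <= max_lines:
        issues.append(f"{len(lines)} lines; target is about {min_lines}-{max_lines}")
    return issues
```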
Tone & Style
- Natural, conversational dialogue - like two friends chatting
- Use casual expressions and conversational transitions
- Avoid overly formal language or academic tone
- Include reactions, follow-up questions, and natural interjections
Content Guidelines
- Frequent back-and-forth between hosts
- Keep sentences short and easy to follow when spoken
- Plain text only - no markdown formatting in the output
- Translate technical concepts into accessible language
- No mathematical formulas, code, or complex notation
- Make content engaging and accessible for audio-only listeners
- Exclude meta information like dates, author names, or document structure
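The plain-text rule can be spot-checked with a few regular expressions. The patterns below are illustrative heuristics chosen for this sketch, not rules defined by the skill, and find_markup is a hypothetical helper. It will miss some markup and may flag harmless text, so treat its output as a hint for manual review.

```python
import re

# Heuristic patterns for markup that does not belong in spoken dialogue.
# These are illustrative assumptions, not rules taken from the skill.
MARKUP_PATTERNS = {
    "markdown emphasis": re.compile(r"\*\*|__"),
    "inline code or code fence": re.compile(r"`"),
    "markdown heading": re.compile(r"^\s*#{1,6}\s", re.MULTILINE),
    "markdown link": re.compile(r"\[[^\]]+\]\([^)]+\)"),
    "LaTeX math delimiter": re.compile(r"\$\$|\\\(|\\\["),
}


def find_markup(paragraph):
    """Return the names of markup patterns found in one paragraph of dialogue."""
    return [name for name, pattern in MARKUP_PATTERNS.items() if pattern.search(paragraph)]
```

Running find_markup over every paragraph in the script gives a quick list of lines to rewrite in plain spoken language.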
Podcast Generation Example
User request: "Generate a podcast about the history of artificial intelligence"
Step 1: Create script file /mnt/user-data/workspace/ai-history-script.json:
json
{
"title": "The History of Artificial Intelligence",
"locale": "en",
"lines": [
{"speaker": "male", "paragraph": "Hello Deer! Welcome back to another fascinating episode. Today we're diving into something that's literally shaping our future - the history of artificial intelligence."},
{"speaker": "female", "paragraph": "Oh, I love this topic! You know, AI feels so modern, but it actually has roots going back over seventy years."},
{"speaker": "male", "paragraph": "Exactly! It all started back in the 1950s. The term artificial intelligence was actually coined by John McCarthy in 1956 at a famous conference at Dartmouth."},
{"speaker": "female", "paragraph": "Wait, so they were already thinking about machines that could think back then? That's incredible!"},
{"speaker": "male", "paragraph": "Right? The early pioneers were so optimistic. They thought we'd have human-level AI within a generation."},
{"speaker": "female", "paragraph": "But things didn't quite work out that way, did they?"},
{"speaker": "male", "paragraph": "No, not at all. The 1970s brought what's called the first AI winter..."}
]
}
Step 2: Execute generation:
bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
--script-file /mnt/user-data/workspace/ai-history-script.json \
--output-file /mnt/user-data/outputs/ai-history-podcast.mp3 \
--transcript-file /mnt/user-data/outputs/ai-history-transcript.md
This will generate:
- ai-history-podcast.mp3: The audio podcast file
- ai-history-transcript.md: A readable markdown transcript of the podcast
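The three paths in this example share one descriptive name, so they could be derived together. The sketch below assumes the output naming seen in this example ({name}-podcast.mp3 and {name}-transcript.md); the skill only fixes the {descriptive-name}-script.json pattern for the script file, and podcast_paths is a hypothetical helper.

```python
from pathlib import PurePosixPath

WORKSPACE_DIR = PurePosixPath("/mnt/user-data/workspace")
OUTPUTS_DIR = PurePosixPath("/mnt/user-data/outputs")


def podcast_paths(name):
    """Derive script, audio, and transcript paths from a descriptive name."""
    return {
        "script": str(WORKSPACE_DIR / f"{name}-script.json"),
        # Output naming below follows the example, which is an assumption.
        "audio": str(OUTPUTS_DIR / f"{name}-podcast.mp3"),
        "transcript": str(OUTPUTS_DIR / f"{name}-transcript.md"),
    }
```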
Specific Templates
Read the following template file only when it matches the user request.
- Tech Explainer - For converting technical documentation and tutorials
Output Format
The generated podcast follows the "Hello Deer" format:
- Two hosts: one male, one female
- Natural conversational dialogue
- Starts with "Hello Deer" greeting
- Target duration: approximately 10 minutes
- Alternating speakers for engaging flow
Output Handling
After generation:
- Podcasts and transcripts are saved in /mnt/user-data/outputs/
- Share both the podcast MP3 and transcript MD with the user using the present_files tool
- Provide a brief description of the generation result (topic, duration, hosts)
- Offer to regenerate if adjustments are needed
Requirements
The following environment variables must be set:
- VOLCENGINE_TTS_APPID: Volcengine TTS application ID
- VOLCENGINE_TTS_ACCESS_TOKEN: Volcengine TTS access token
- VOLCENGINE_TTS_CLUSTER: Volcengine TTS cluster (optional, defaults to "volcano_tts")
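A pre-flight check for these variables might look like the sketch below. The helper name tts_settings is hypothetical, and how generate.py itself reads the variables is not documented here; the sketch only mirrors the list above, including the default for the optional cluster.

```python
import os

REQUIRED_VARS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")


def tts_settings(env=os.environ):
    """Collect the TTS settings, raising if a required variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "appid": env["VOLCENGINE_TTS_APPID"],
        "access_token": env["VOLCENGINE_TTS_ACCESS_TOKEN"],
        # The cluster is optional and falls back to the documented default.
        "cluster": env.get("VOLCENGINE_TTS_CLUSTER", "volcano_tts"),
    }
```

Failing early with a clear message is cheaper than discovering a missing token partway through speech synthesis.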
Notes
- Always execute the full pipeline in one call - no need to test individual steps or worry about timeouts
- The locale in the script JSON should match the content language (en or zh)
- Technical content should be simplified for audio accessibility in the script
- Complex notations (formulas, code) should be translated to plain language in the script
- Long content may result in longer podcasts