Blog Audio -- Gemini TTS Narration for Blog Posts

Generate professional audio narration of blog content using Google's Gemini TTS. Three modes are available: summary (a 200-300 word spoken overview), full article read-aloud, or a two-speaker podcast dialogue. Supports 30 voices and 80+ languages, and outputs an HTML5 embed snippet.

Quick Reference

| Command | What it does |
|---|---|
| `/blog audio generate <file>` | Generate audio narration of a blog post |
| `/blog audio voices` | Show available voices with their characteristics |
| `/blog audio setup` | Check/configure the API key for Gemini TTS |

Prerequisites

  • Python 3.11+ (venv managed automatically by `run.py`)
  • `GOOGLE_AI_API_KEY` environment variable (same key used by blog-image)
  • FFmpeg (for WAV-to-MP3 conversion; falls back to WAV if missing)

Always Use run.py Wrapper

```bash
# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json

# WRONG:
python3 scripts/generate_audio.py --text "..."  # Fails without venv
```

API Key Check (Gate Pattern)

Before generating audio, check for the API key:

```bash
# Reports whether the key is set without printing the secret itself
[ -n "$GOOGLE_AI_API_KEY" ] && echo "GOOGLE_AI_API_KEY is set" || echo "GOOGLE_AI_API_KEY is not set"
```

  • If set: proceed with generation.
  • If not set: guide the user: "Audio generation requires a Google AI API key. Get one free at https://aistudio.google.com/apikey, then set it: `export GOOGLE_AI_API_KEY=your-key`. This is the same key used by `/blog image` -- if image generation works, audio works too."
  • When called internally (from blog-write): return silently if the key is missing. Never block the writing workflow.
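In Python, the same gate might look like the sketch below. `check_api_key` and its `internal` flag are hypothetical, not part of the skill's scripts; the point is the split behavior: user-facing calls print setup guidance, internal calls return `None` without any output.

```python
import os
from typing import Optional

SETUP_MESSAGE = (
    "Audio generation requires a Google AI API key. "
    "Get one free at https://aistudio.google.com/apikey, then set it:\n"
    "  export GOOGLE_AI_API_KEY=your-key\n"
    "This is the same key used by /blog image."
)


def check_api_key(internal: bool = False) -> Optional[str]:
    """Return the API key, or None when it is missing.

    Hypothetical helper: user-facing calls print setup guidance, while
    internal calls (from blog-write) return None silently so the writing
    workflow is never blocked.
    """
    key = os.environ.get("GOOGLE_AI_API_KEY", "").strip()
    if key:
        return key
    if not internal:
        print(SETUP_MESSAGE)
    return None
```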

Setup

For `/blog audio setup`:
  1. Check whether `GOOGLE_AI_API_KEY` is set in the environment.
  2. If blog-image is configured (check `.mcp.json`), the key is already available.
  3. If not, guide the user to https://aistudio.google.com/apikey.
  4. Verify with a dry run: `python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json`

Voice Selection

For `/blog audio voices`: load `references/voices.md` and present the voice catalog to the user.
Ask which voice they prefer, or recommend one based on content type:
  • Article narration: Charon (Informative) or Sadaltager (Knowledgeable)
  • Tutorial/how-to: Achird (Friendly) or Sulafat (Warm)
  • News/analysis: Rasalgethi (Informative) or Schedar (Even)
  • Lifestyle/wellness: Aoede (Breezy) or Vindemiatrix (Gentle)
  • Dialogue host: Puck (Upbeat) or Laomedeia (Upbeat)
  • Dialogue expert: Kore (Firm) or Charon (Informative)
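The recommendations above amount to a small lookup table. In this sketch the content-type keys, the `recommend_voice` name, and the fallback to Charon for unrecognized types are assumptions; the voice names come from the list above.

```python
# Content-type keys are hypothetical labels; voice pairs are (first choice, alternate).
VOICE_RECOMMENDATIONS: dict[str, tuple[str, str]] = {
    "article": ("Charon", "Sadaltager"),
    "tutorial": ("Achird", "Sulafat"),
    "news": ("Rasalgethi", "Schedar"),
    "lifestyle": ("Aoede", "Vindemiatrix"),
    "dialogue_host": ("Puck", "Laomedeia"),
    "dialogue_expert": ("Kore", "Charon"),
}


def recommend_voice(content_type: str, prefer_alternate: bool = False) -> str:
    """Return a recommended voice; unknown types fall back to Charon (an assumption)."""
    primary, alternate = VOICE_RECOMMENDATIONS.get(content_type, ("Charon", "Charon"))
    return alternate if prefer_alternate else primary
```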

Generation Workflow

For `/blog audio generate <file>`:

Step 1: Read the Blog Post

Read the file and extract:
  • Title (from H1 or frontmatter)
  • Full content (markdown body)
  • Approximate word count
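A minimal sketch of this step, assuming optional `---`-delimited frontmatter with a one-line `title:` entry. No YAML parser is used, so only that simple case is handled; `read_post` and `BlogPost` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class BlogPost:
    title: str
    body: str
    word_count: int


def read_post(markdown: str) -> BlogPost:
    """Extract title, markdown body, and an approximate word count.

    Assumes optional '---'-delimited frontmatter with a one-line 'title:'
    entry; otherwise the first H1 ('# ...') is used as the title.
    """
    title, body = "", markdown
    frontmatter = re.match(r"\A---\s*\n(.*?)\n---\s*\n", markdown, re.DOTALL)
    if frontmatter:
        body = markdown[frontmatter.end():]
        found = re.search(r"^title:[ \t]*[\"']?(.*?)[\"']?[ \t]*$",
                          frontmatter.group(1), re.MULTILINE)
        if found:
            title = found.group(1)
    if not title:
        h1 = re.search(r"^#[ \t]+(.+)$", body, re.MULTILINE)
        if h1:
            title = h1.group(1).strip()
    # Whitespace split over the raw body, so markdown symbols count too.
    return BlogPost(title=title, body=body.strip(), word_count=len(body.split()))
```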

Step 2: Choose Mode

Ask the user which mode to use (or auto-select if they specified `--mode`):

| Mode | When to use | Output |
|---|---|---|
| Summary | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| Full | Complete read-aloud (5-15 min) | Full article as natural speech |
| Dialogue | Podcast-style (3-8 min) | Two-person conversation about the article |
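To sanity-check prepared text against these targets, a rough estimate helps. The sketch assumes an average pace of about 150 spoken words per minute (an assumption, not a Gemini TTS figure); at that pace a 200-300 word summary comes out at roughly 1.3-2 minutes. `estimated_minutes` and `within_target` are hypothetical helpers, not part of `generate_audio.py`.

```python
# Assumed narration pace; actual TTS speed varies with voice and style,
# so these are rough planning estimates, not measurements.
WORDS_PER_MINUTE = 150

# Target durations in minutes, taken from the mode table.
MODE_MINUTES = {"summary": (1, 2), "full": (5, 15), "dialogue": (3, 8)}


def estimated_minutes(word_count: int, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate spoken length in minutes for a given number of words."""
    return word_count / wpm


def within_target(word_count: int, mode: str, wpm: int = WORDS_PER_MINUTE) -> bool:
    """Check whether prepared text of this length fits the mode's target duration."""
    low, high = MODE_MINUTES[mode]
    return low <= estimated_minutes(word_count, wpm) <= high
```

Note that `word_count` here means the prepared text (summary, cleaned article, or dialogue script), not the original post.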

Step 3: Prepare Text

CRITICAL: Claude prepares the text. The script does TTS only.
Summary mode: Write a 200-300 word spoken summary of the article. Rules:
  • Write as natural speech, not written text
  • Open with the article's key finding or answer
  • Cover 3-5 main takeaways
  • Close with actionable advice
  • No markdown, no "In this article...", no meta-commentary
  • Use conversational transitions ("Here's what matters...", "The key finding is...")
Full mode: Convert the markdown content into clean spoken text:
  • Headings become natural transitions ("Next, let's look at...")
  • Links become plain text (remove URLs, keep anchor text)
  • Images and charts: omit or briefly describe ("As the data shows...")
  • Code blocks: describe verbally ("The code uses a for-loop to...")
  • Lists: convert to natural sentences
  • Remove frontmatter, schema markup, HTML tags
  • Add brief intro: "This is [title], published on [date]."
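The mechanical part of these rules can be sketched with regular expressions. This is a rough sketch, not a markdown parser: `markdown_to_speech` is a hypothetical helper; it assumes `---`-delimited frontmatter and JSON-LD schema inside `<script>` tags, leaves a placeholder where Claude should describe each code block, and only strips list markers rather than rewriting lists as sentences. Headings all become the same fixed transition, which Claude would normally vary.

```python
import re


def markdown_to_speech(md: str, title: str, date: str) -> str:
    """Apply the mechanical full-mode rules; Claude still polishes the result."""
    text = re.sub(r"\A---\s*\n.*?\n---\s*\n", "", md, flags=re.DOTALL)  # frontmatter
    text = re.sub(r"<script\b.*?</script>", "", text,
                  flags=re.DOTALL | re.IGNORECASE)          # JSON-LD schema markup
    # Fenced code: leave a marker where a short verbal description belongs.
    text = re.sub(r"`{3}.*?`{3}", "[Describe the code example here.]", text,
                  flags=re.DOTALL)
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)        # images: omit
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)     # links: keep anchor text
    text = re.sub(r"<[^>]+>", "", text)                      # remaining HTML tags
    text = re.sub(r"^#[ \t]+.*$", "", text, flags=re.MULTILINE)  # H1 repeats the title
    text = re.sub(r"^#{2,6}[ \t]+(.+?)[ \t]*$", r"Next, let's look at \1.", text,
                  flags=re.MULTILINE)                        # headings -> transitions
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text,
                  flags=re.MULTILINE)                        # list markers
    text = re.sub(r"\*\*|__|`", "", text)                    # bold / inline-code markers
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    return f"This is {title}, published on {date}.\n\n{text}"
```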
Dialogue mode: Write a 2-person conversation script about the article:
  • Speaker1 = Host (curious, asks good questions)
  • Speaker2 = Expert (knowledgeable, gives clear answers)
  • Format each line with a speaker tag, e.g. `[Speaker1] What's the key takeaway here?`
  • Cover the article's main points conversationally
  • 15-25 exchanges (produces ~3-8 minutes)
  • Keep it natural, not stilted: prefer "That's a great point" over "Indeed, as the research indicates"
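A quick format check for dialogue scripts might look like this. It assumes one tagged line per turn and counts an exchange as a `[Speaker1]` line immediately followed by a `[Speaker2]` reply; both the counting rule and the `validate_dialogue` name are assumptions for illustration.

```python
import re

# One turn per line: "[Speaker1] text" or "[Speaker2] text".
LINE_PATTERN = re.compile(r"^\[(Speaker1|Speaker2)\]\s+\S")


def validate_dialogue(script: str, min_exchanges: int = 15,
                      max_exchanges: int = 25) -> list[str]:
    """Return a list of problems found in a dialogue script (empty if none)."""
    problems: list[str] = []
    speakers: list[str] = []
    for n, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines between turns are allowed
        match = LINE_PATTERN.match(line)
        if not match:
            problems.append(f"line {n}: expected '[Speaker1] ...' or '[Speaker2] ...'")
            continue
        speakers.append(match.group(1))
    # Count host-question -> expert-answer pairs as exchanges.
    exchanges = sum(1 for a, b in zip(speakers, speakers[1:])
                    if (a, b) == ("Speaker1", "Speaker2"))
    if not min_exchanges <= exchanges <= max_exchanges:
        problems.append(f"{exchanges} exchanges; aim for {min_exchanges}-{max_exchanges}")
    return problems
```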

Step 4: Select Voice

If the user chose a voice, use it. Otherwise, recommend based on mode:
  • Summary/Full: default to Charon (Informative)
  • Dialogue: default to Puck (Host) + Kore (Expert)

Step 5: Generate Audio

Write the prepared text to a temp file, then call:

```bash
# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_prepared.txt \
  --voice Charon \
  --model flash \
  --output /path/to/audio/post-slug.mp3 \
  --json

# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_dialogue.txt \
  --voice Puck \
  --voice2 Kore \
  --model pro \
  --output /path/to/audio/post-slug-dialogue.mp3 \
  --json
```

**Model selection:**
  • `flash` (default): Fast and cheap. Good for summaries and standard narration.
  • `pro`: Higher quality. Use for dialogue mode or premium content.
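When driving the script from Python rather than the shell, Steps 4 and 5 can be combined as in the sketch below. It assumes the working directory is the skill root (so `scripts/run.py` resolves), picks `pro` for dialogue and `flash` otherwise as one reading of the model guidance, and returns the raw `--json` output unparsed because its schema is not documented here. `build_command` and `generate` are hypothetical helpers.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Step 4 defaults per mode: (voice, voice2). Mode labels are this sketch's own.
DEFAULT_VOICES = {
    "summary": ("Charon", None),
    "full": ("Charon", None),
    "dialogue": ("Puck", "Kore"),
}


def build_command(text_file: str, output: str, mode: str,
                  voice: Optional[str] = None, voice2: Optional[str] = None,
                  model: Optional[str] = None) -> list[str]:
    """Assemble the run.py invocation, filling in per-mode defaults."""
    default_voice, default_voice2 = DEFAULT_VOICES[mode]
    cmd = ["python3", "scripts/run.py", "generate_audio.py",
           "--text-file", text_file, "--voice", voice or default_voice]
    second = voice2 or default_voice2
    if second:
        cmd += ["--voice2", second]
    # Assumption: pro for dialogue, flash (the default) for everything else.
    cmd += ["--model", model or ("pro" if mode == "dialogue" else "flash"),
            "--output", output, "--json"]
    return cmd


def generate(prepared_text: str, output: str, mode: str, **options) -> str:
    """Write the text to a temp file, run the script, and return its raw stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as f:
        f.write(prepared_text)
    try:
        result = subprocess.run(build_command(f.name, output, mode, **options),
                                capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        Path(f.name).unlink(missing_ok=True)
```

Keeping command assembly separate from execution makes the defaults easy to check without calling the API.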

Step 6: Deliver

Present the result to the user:
  1. File path -- where the audio was saved
  2. Duration -- human-readable (e.g., "3:42")
  3. Embed code -- ready-to-paste HTML5 audio tag
  4. Cost -- estimated API cost
  5. Placement suggestion -- where to insert the embed in the blog post
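For the human-readable duration, a small formatter is enough. The sketch assumes the length in seconds is already known; for a WAV output (the FFmpeg fallback) Python's standard `wave` module can read it, while MP3 files would need another tool such as `ffprobe`. Both function names are hypothetical.

```python
import wave


def format_duration(seconds: float) -> str:
    """Format seconds as m:ss, or h:mm:ss for an hour or more."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def wav_duration(path: str) -> float:
    """Duration of a WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()
```

For example, `format_duration(222)` gives the "3:42" shown above.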

Embedding Guide

Standard HTML (Hugo, Jekyll, static sites)

```html
<audio controls preload="metadata">
  <source src="audio/post-slug.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
```

MDX (Next.js, Gatsby)

```jsx
<audio controls preload="metadata">
  <source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>
```

WordPress

```
[audio src="audio/post-slug.mp3"]
```
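The three snippets differ only in syntax, so they can be generated from one source path. This sketch (a hypothetical `embed_snippet` helper) HTML-escapes the path and, as an assumption beyond the guide above, switches the MIME type to `audio/wav` when the FFmpeg fallback produced a WAV file.

```python
from html import escape


def embed_snippet(src: str, platform: str = "html") -> str:
    """Return an audio embed for 'html', 'mdx', or 'wordpress'.

    Assumption: a .wav path (the FFmpeg fallback) gets type="audio/wav".
    """
    mime = "audio/wav" if src.lower().endswith(".wav") else "audio/mpeg"
    safe = escape(src)  # escapes &, <, > and quotes in the path
    if platform == "wordpress":
        return f'[audio src="{safe}"]'
    if platform == "mdx":
        return (
            '<audio controls preload="metadata">\n'
            f'  <source src="{safe}" type="{mime}" />\n'
            "</audio>"
        )
    if platform == "html":
        return (
            '<audio controls preload="metadata">\n'
            f'  <source src="{safe}" type="{mime}">\n'
            "  Your browser does not support the audio element.\n"
            "</audio>"
        )
    raise ValueError(f"unknown platform: {platform!r}")
```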

Placement

Insert the audio player after the introduction (below the first H2), or at the very top of the article with a label such as "Listen to this article" or "Audio version".
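Both placements can be automated. The sketch below reads "below the first H2" as directly under the first `## ` heading line, and puts the labelled player after any `---` frontmatter for the top option; both readings, the bold label markup, and the `insert_player` name are assumptions. It does not skip `## ` lines inside fenced code blocks.

```python
def _body_start(lines: list[str]) -> int:
    """Index of the first line after '---' frontmatter (0 if there is none)."""
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return i + 1
    return 0


def insert_player(markdown: str, embed: str, position: str = "after_first_h2",
                  label: str = "Listen to this article") -> str:
    """Insert the embed below the first '## ' heading, or at the top with a label."""
    lines = markdown.splitlines()
    if position == "after_first_h2":
        for i, line in enumerate(lines):
            if line.startswith("## "):
                return "\n".join(lines[: i + 1] + ["", embed] + lines[i + 1:])
    # Top placement, also used when the post has no H2: after any frontmatter.
    start = _body_start(lines)
    return "\n".join(lines[:start] + [f"**{label}**", "", embed, ""] + lines[start:])
```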

Internal API (for blog-write)

When invoked internally from blog-write:

Input:
  • `text`: Prepared text (already cleaned by Claude)
  • `voice`: Voice name (default: Charon)
  • `voice2`: Second voice for dialogue (optional)
  • `model`: `flash` or `pro`
  • `output_path`: Where to save the file

Output:

```markdown
## Audio Narration
- Path: /path/to/audio/post-slug.mp3
- Duration: 3:42
- Voice: Charon
- Embed: `<audio controls preload="metadata"><source src="audio/post-slug.mp3" type="audio/mpeg"></audio>`
```

**Graceful fallback:** If `GOOGLE_AI_API_KEY` is not set, return immediately
with no error. The writing workflow continues without audio. Never block
blog-write because audio generation is unavailable.
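The contract above can be sketched as a small wrapper. `AudioRequest` mirrors the documented inputs; `narrate`, `format_result`, and the injected `generate` callable (which would produce the audio and return the saved path and a human-readable duration) are hypothetical, and listing both voices for dialogue is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class AudioRequest:
    """Mirrors the documented inputs; defaults follow the text above."""
    text: str
    output_path: str
    voice: str = "Charon"
    voice2: Optional[str] = None
    model: str = "flash"


def format_result(path: str, duration: str, voices: str, src: str) -> str:
    """Render the 'Audio Narration' block returned to blog-write."""
    embed = (f'<audio controls preload="metadata"><source src="{src}" '
             'type="audio/mpeg"></audio>')
    return "\n".join([
        "## Audio Narration",
        f"- Path: {path}",
        f"- Duration: {duration}",
        f"- Voice: {voices}",
        f"- Embed: `{embed}`",
    ])


def narrate(request: AudioRequest, src: str,
            generate: Callable[[AudioRequest], Tuple[str, str]]) -> Optional[str]:
    """Return the result block, or None when no API key is set (graceful fallback)."""
    if not os.environ.get("GOOGLE_AI_API_KEY"):
        return None  # no error: blog-write simply continues without audio
    path, duration = generate(request)
    voices = request.voice + (f" + {request.voice2}" if request.voice2 else "")
    return format_result(path, duration, voices, src)
```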

Error Handling

| Error | Resolution |
|---|---|
| `GOOGLE_AI_API_KEY` not set | Get a key at https://aistudio.google.com/apikey |
| FFmpeg not found | Install it (e.g. `sudo apt install ffmpeg` on Debian/Ubuntu). Until then, output falls back to WAV. |
| Rate limited | Wait and retry. Check limits at https://aistudio.google.com/rate-limit |
| Text too long (>32k tokens) | Split into sections and generate each separately |
| Unknown voice name | Run `/blog audio voices` to see valid options |
| API error | Check key validity and model availability (preview models) |
| API key missing (internal call) | Return silently -- the writing workflow continues |
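Splitting at paragraph boundaries keeps each request under the limit without cutting sentences. The sketch estimates tokens at about four characters each, a rough heuristic for English and an assumption here; the Gemini API's token-counting method would give exact numbers. The 20% safety margin and the `split_for_tts` name are also assumptions. Dialogue scripts put one turn per line, so they would need `separator="\n"`.

```python
CHARS_PER_TOKEN = 4   # rough heuristic for English text, not a Gemini guarantee
MAX_TOKENS = 32_000   # limit from the table above


def split_for_tts(text: str, max_tokens: int = MAX_TOKENS, margin: float = 0.8,
                  separator: str = "\n\n") -> list[str]:
    """Split text at separator boundaries into chunks under an estimated token budget."""
    budget = int(max_tokens * margin * CHARS_PER_TOKEN)
    chunks: list[str] = []
    current = ""
    for part in text.split(separator):
        candidate = f"{current}{separator}{part}" if current else part
        if len(candidate) <= budget or not current:
            # Start or extend the chunk; a single oversized part is kept whole.
            current = candidate
        else:
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then generated separately, as the table suggests.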

Reference Documentation

Load on demand -- do NOT load everything at startup:
  • `references/voices.md` -- Full 30-voice catalog, recommendations by content type, and dialogue pairings