Blog Audio -- Gemini TTS Narration for Blog Posts

Generate professional audio narration of blog content using Google's Gemini TTS. Three modes: summary (200-300 word spoken overview), full article read-aloud, or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.

Quick Reference

| Command | What it does |
|---|---|
| `/blog audio generate <file>` | Generate audio narration of a blog post |
| `/blog audio voices` | Show available voices with their characteristics |
| `/blog audio setup` | Check/configure the API key for Gemini TTS |

Prerequisites

- Python 3.11+ (venv managed automatically by `run.py`)
- `GOOGLE_AI_API_KEY` environment variable (same key used by blog-image)
- FFmpeg (for WAV-to-MP3 conversion; falls back to WAV if missing)
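
The prerequisites above can be checked before any generation attempt. The following is a minimal Python sketch that assumes nothing about the internals of `run.py`. `check_prerequisites` and `audio_format` are hypothetical helpers, not functions the skill ships.

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a list of missing prerequisites; an empty list means ready.

    Hypothetical helper: run.py manages the venv itself, so this only
    reports on the interpreter version and the API key.
    """
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ is required")
    if not env.get("GOOGLE_AI_API_KEY"):
        problems.append("GOOGLE_AI_API_KEY is not set")
    return problems


def audio_format():
    """Return 'mp3' when FFmpeg is on PATH, otherwise the 'wav' fallback."""
    return "mp3" if shutil.which("ffmpeg") else "wav"


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
    print("output format:", audio_format())
```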

Always Use the run.py Wrapper

```bash
# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json

# WRONG:
python3 scripts/generate_audio.py --text "..."  # Fails without venv
```

API Key Check (Gate Pattern)

Before generating audio, check for the API key without printing its value:

```bash
if [ -n "$GOOGLE_AI_API_KEY" ]; then echo "GOOGLE_AI_API_KEY is set"; else echo "GOOGLE_AI_API_KEY is not set"; fi
```

- If set: proceed with generation.
- If not set: guide the user: "Audio generation requires a Google AI API key. Get one free at https://aistudio.google.com/apikey, then set it with `export GOOGLE_AI_API_KEY=your-key`. This is the same key used by `/blog image` -- if image generation works, audio works too."
- When called internally (from blog-write): return silently if the key is missing. Never block the writing workflow.
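
The gate can be expressed in a few lines. This Python sketch is illustrative only. `api_key_gate` and its `internal` flag are hypothetical names, and `internal=True` stands in for a call from blog-write. With that flag set, a missing key yields no message, so the caller can skip audio silently.

```python
import os

SETUP_MESSAGE = (
    "Audio generation requires a Google AI API key. "
    "Get one free at https://aistudio.google.com/apikey, then run: "
    "export GOOGLE_AI_API_KEY=your-key"
)


def api_key_gate(internal=False, env=None):
    """Return (ok, message) for the GOOGLE_AI_API_KEY gate.

    An empty value counts as unset. For internal calls (blog-write) a
    missing key returns (False, None): skip audio, do not error.
    """
    env = os.environ if env is None else env
    if env.get("GOOGLE_AI_API_KEY"):
        return True, None
    if internal:
        return False, None
    return False, SETUP_MESSAGE
```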

Setup

For `/blog audio setup`:

- Check whether `GOOGLE_AI_API_KEY` is set in the environment.
- If blog-image is configured (check `.mcp.json`), the key is already available.
- If not, guide the user to https://aistudio.google.com/apikey.

Verify with a dry run:

```bash
python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json
```

Voice Selection

For `/blog audio voices`: load `references/voices.md` and present the voice catalog to the user.

Ask the user which voice they prefer, or recommend one based on content type:

- Article narration: Charon (Informative) or Sadaltager (Knowledgeable)
- Tutorial/how-to: Achird (Friendly) or Sulafat (Warm)
- News/analysis: Rasalgethi (Informative) or Schedar (Even)
- Lifestyle/wellness: Aoede (Breezy) or Vindemiatrix (Gentle)
- Dialogue host: Puck (Upbeat) or Laomedeia (Upbeat)
- Dialogue expert: Kore (Firm) or Charon (Informative)
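
The recommendations above map directly onto a lookup table, which can be useful if you want them in machine-readable form. In this sketch the voice names come from the list above. The content-type keys (`article`, `tutorial`, and so on) and `recommend_voices` are hypothetical labels chosen for illustration.

```python
# Content type -> (first choice, alternative), taken from the list above.
VOICE_RECOMMENDATIONS = {
    "article": ("Charon", "Sadaltager"),
    "tutorial": ("Achird", "Sulafat"),
    "news": ("Rasalgethi", "Schedar"),
    "lifestyle": ("Aoede", "Vindemiatrix"),
    "dialogue-host": ("Puck", "Laomedeia"),
    "dialogue-expert": ("Kore", "Charon"),
}


def recommend_voices(content_type):
    """Return the recommended voice pair, defaulting to article narration."""
    return VOICE_RECOMMENDATIONS.get(content_type, VOICE_RECOMMENDATIONS["article"])
```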

Generation Workflow

For `/blog audio generate <file>`:

Step 1: Read the Blog Post

Read the file and extract:

- Title (from the H1 or frontmatter)
- Full content (markdown body)
- Approximate word count
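
The extraction step can be sketched with regular expressions. This sketch assumes the post uses optional YAML frontmatter delimited by `---` lines with a `title:` key, and otherwise falls back to the first `# ` heading. It is not a YAML parser, and `read_post` is a hypothetical helper.

```python
import re

# Optional YAML frontmatter block delimited by '---' lines (assumed layout).
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n", re.DOTALL)


def read_post(markdown):
    """Return (title, body, word_count) for a markdown blog post.

    The title comes from a frontmatter 'title:' line if present, otherwise
    from the first '# ' heading. This is a regex sketch, not a YAML parser.
    """
    title = None
    body = markdown
    match = FRONTMATTER_RE.match(markdown)
    if match:
        body = markdown[match.end():]
        title_line = re.search(r"^title:[ \t]*(.+)$", match.group(1), re.MULTILINE)
        if title_line:
            title = title_line.group(1).strip().strip("\"'")
    if title is None:
        h1 = re.search(r"^#[ \t]+(.+)$", body, re.MULTILINE)
        if h1:
            title = h1.group(1).strip()
    # Approximate word count: runs of word characters in the body.
    word_count = len(re.findall(r"\w+", body))
    return title, body.strip(), word_count
```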

Step 2: Choose Mode

Ask the user which mode they want (or auto-select if they specified `--mode`):

| Mode | When to use | Output |
|---|---|---|
| Summary | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| Full | Complete read-aloud (5-15 min) | Full article as natural speech |
| Dialogue | Podcast-style (3-8 min) | Two-person conversation about the article |
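
Mode selection and the rough durations in the table can be sketched as follows. Two parts of this sketch are assumptions. First, `resolve_mode` and `estimate_minutes` are hypothetical helpers. Second, the speaking rate of 150 words per minute is a common rule of thumb, not a measured Gemini TTS figure.

```python
MODES = ("summary", "full", "dialogue")

# Assumed speaking rate (rule of thumb, not a measured Gemini TTS figure).
WORDS_PER_MINUTE = 150


def resolve_mode(requested):
    """Normalize a --mode value; None means the user should be asked."""
    if requested is None:
        return None
    mode = requested.strip().lower()
    if mode not in MODES:
        raise ValueError(f"unknown mode {requested!r}; expected one of {MODES}")
    return mode


def estimate_minutes(word_count, wpm=WORDS_PER_MINUTE):
    """Rough spoken length of a text with the given word count."""
    return word_count / wpm
```

At that rate, a 300-word summary comes to about 2 minutes, which matches the 1-2 minute range in the table. The full-mode range of 5-15 minutes corresponds to roughly 750-2,250 words.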

Step 3: Prepare Text

CRITICAL: Claude prepares the text. The script does TTS only.

Summary mode: Write a 200-300 word spoken summary of the article. Rules:

- Write it as natural speech, not written text
- Open with the article's key finding or answer
- Cover 3-5 main takeaways
- Close with actionable advice
- No markdown, no "In this article...", no meta-commentary
- Use conversational transitions ("Here's what matters...", "The key finding is...")
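
Some of the summary rules can be checked mechanically before the text is sent to TTS: the word count, the "In this article..." opener, and leftover markdown characters. The other rules (opening with the key finding, closing with advice) still need judgment. This is a minimal sketch, and `check_summary` is a hypothetical helper.

```python
import re


def check_summary(text):
    """Return a list of rule violations for a spoken summary (empty = OK).

    Covers only the mechanically checkable rules: length, the
    "In this article..." opener, and leftover markdown characters.
    """
    issues = []
    words = len(text.split())
    if not 200 <= words <= 300:
        issues.append(f"word count {words} is outside 200-300")
    if re.search(r"\bin this article\b", text, re.IGNORECASE):
        issues.append('contains meta-commentary ("In this article...")')
    if re.search(r"[#*`\[\]]", text):
        issues.append("contains markdown characters")
    return issues
```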

Full mode: Convert the markdown content into clean spoken text:

- Headings become natural transitions ("Next, let's look at...")
- Links become plain text (remove URLs, keep the anchor text)
- Images and charts: omit or briefly describe ("As the data shows...")
- Code blocks: describe verbally ("The code uses a for-loop to...")
- Lists: convert to natural sentences
- Remove frontmatter, schema markup, and HTML tags
- Add a brief intro: "This is [title], published on [date]."
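
A first mechanical pass over these rules can be done with regexes before Claude polishes the text. This sketch is an approximation under stated assumptions, and `markdown_to_speech` is a hypothetical helper:

- It drops code blocks and images outright rather than describing them.
- It strips list markers without rewriting the items as sentences.
- It uses a single fixed heading transition, which should be varied by hand.

```python
import re


def markdown_to_speech(md, title=None, date=None):
    """Approximate the full-mode cleanup rules with regexes.

    Sketch only: code blocks and images are dropped (Claude should add a
    verbal description where they matter), and list items lose their
    markers but are not rewritten into full sentences.
    """
    text = re.sub(r"\A---\n.*?\n---\n", "", md, flags=re.DOTALL)  # frontmatter
    # Schema markup (JSON-LD) lives in <script> tags; drop tag and contents.
    text = re.sub(r"<script\b.*?</script>", "", text, flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)  # images
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # links -> anchor text
    text = re.sub(r"<[^>]+>", "", text)  # remaining HTML tags
    text = re.sub(r"^#{2,6}[ \t]+(.+)$", r"Next, let's look at \1.", text, flags=re.MULTILINE)
    text = re.sub(r"^#[ \t]+.+$", "", text, flags=re.MULTILINE)  # H1 is covered by the intro
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)  # list markers
    text = re.sub(r"[*`]", "", text)  # emphasis and inline-code markers
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    if title:
        intro = f"This is {title}" + (f", published on {date}." if date else ".")
        text = f"{intro}\n\n{text}"
    return text
```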

Dialogue mode: Write a two-person conversation script about the article:

- Speaker1 = Host (curious, asks good questions)
- Speaker2 = Expert (knowledgeable, gives clear answers)

Format each line as:

`[Speaker1] What's the key takeaway here?`

- Cover the article's main points conversationally
- Aim for 15-25 exchanges (about 3-8 minutes of audio)
- Keep it natural, not stilted: prefer "That's a great point" over "Indeed, as the research indicates"
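
Before sending a dialogue script to TTS, it helps to confirm that every line follows the `[SpeakerN]` format. The sketch below does that with hypothetical helpers `parse_dialogue` and `count_exchanges`. It assumes that an "exchange" means a host line immediately answered by the expert.

```python
import re

# One turn per line: "[Speaker1] ..." (host) or "[Speaker2] ..." (expert).
TURN_RE = re.compile(r"^\[(Speaker[12])\]\s+(\S.*)$")


def parse_dialogue(script):
    """Parse a dialogue script into (speaker, text) tuples.

    Blank lines are skipped; any other line that does not match the
    [SpeakerN] format raises ValueError so it can be fixed before TTS.
    """
    turns = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        match = TURN_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected '[Speaker1] ...' or '[Speaker2] ...'")
        turns.append((match.group(1), match.group(2)))
    return turns


def count_exchanges(turns):
    """Count host lines immediately answered by the expert (assumed definition)."""
    return sum(
        1
        for (first, _), (second, _) in zip(turns, turns[1:])
        if first == "Speaker1" and second == "Speaker2"
    )
```

A script meets the length guideline when `15 <= count_exchanges(turns) <= 25`.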

Step 4: Select Voice

If the user chose a voice, use it. Otherwise, recommend one based on mode:

- Summary/Full: default to Charon (Informative)
- Dialogue: default to Puck (Host) + Kore (Expert)
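
The fallback logic is small enough to sketch directly. User choices win; otherwise the mode defaults listed above apply. In this sketch, `select_voices` is a hypothetical helper, and single-voice modes deliberately ignore any second voice.

```python
# Mode defaults from the list above; None means no second voice.
DEFAULT_VOICES = {
    "summary": ("Charon", None),
    "full": ("Charon", None),
    "dialogue": ("Puck", "Kore"),
}


def select_voices(mode, voice=None, voice2=None):
    """Return (voice, voice2), preferring user choices over mode defaults."""
    default_voice, default_voice2 = DEFAULT_VOICES[mode]
    if mode != "dialogue":
        return voice or default_voice, None  # single-voice modes ignore voice2
    return voice or default_voice, voice2 or default_voice2
```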

Step 5: Generate Audio

Write the prepared text to a temp file, then call:

```bash
# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_prepared.txt \
  --voice Charon \
  --model flash \
  --output /path/to/audio/post-slug.mp3 \
  --json

# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_dialogue.txt \
  --voice Puck \
  --voice2 Kore \
  --model pro \
  --output /path/to/audio/post-slug-dialogue.mp3 \
  --json
```

**Model selection:**
- `flash` (default): Fast, cheap. Good for summaries and standard narration.
- `pro`: Higher quality. Use for dialogue mode or premium content.
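
If the call is driven from Python rather than typed by hand, the same invocation can be assembled as an argument list. This sketch uses only the flags shown in the commands above, and `build_command` and `generate` are hypothetical helpers. It also rests on two assumptions:

- It runs from the skill's root directory, so `scripts/run.py` resolves.
- `--json` prints a single JSON document on stdout. No particular field names are assumed.

```python
import json
import os
import subprocess
import tempfile


def build_command(text_file, output, voice="Charon", voice2=None, model="flash"):
    """Assemble the run.py invocation shown above, using only its documented flags."""
    cmd = ["python3", "scripts/run.py", "generate_audio.py",
           "--text-file", text_file, "--voice", voice]
    if voice2:  # dialogue mode only
        cmd += ["--voice2", voice2]
    cmd += ["--model", model, "--output", output, "--json"]
    return cmd


def generate(text, output, **options):
    """Write prepared text to a temp file, run the wrapper, and parse its stdout.

    Assumes --json prints one JSON document on stdout; no particular
    field names are assumed. The temp file is removed afterwards.
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
        text_file = handle.name
    try:
        result = subprocess.run(
            build_command(text_file, output, **options),
            capture_output=True, text=True, check=True,
        )
    finally:
        os.unlink(text_file)
    return json.loads(result.stdout)
```

Passing an argument list instead of a shell string avoids quoting problems with paths and voice names.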

Step 6: Deliver

Present the result to the user:

- File path -- where the audio was saved
- Duration -- human-readable (e.g., "3:42")
- Embed code -- a ready-to-paste HTML5 audio tag
- Cost -- estimated API cost
- Placement suggestion -- where to insert the embed in the blog post
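
The human-readable duration is a simple seconds-to-clock conversion. The sketch below assumes you already have the length in seconds. For the WAV fallback, the stdlib `wave` module can read it from the file header. `format_duration` and `wav_duration` are hypothetical helpers.

```python
import wave


def format_duration(seconds):
    """Render a duration in seconds as M:SS, or H:MM:SS from one hour up."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def wav_duration(path):
    """Duration of a WAV file (e.g. the FFmpeg-less fallback) via the stdlib wave module."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

For MP3 output, a tool such as ffprobe (shipped with FFmpeg) can report the duration instead.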

Embedding Guide

Standard HTML (Hugo, Jekyll, static sites)

```html
<audio controls preload="metadata">
  <source src="audio/post-slug.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
```

MDX (Next.js, Gatsby)

```jsx
<audio controls preload="metadata">
  <source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>
```

WordPress

```
[audio src="audio/post-slug.mp3"]
```
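
The three embed formats above can be generated from one source path. In this sketch, the platform names (`html`, `mdx`, `wordpress`) and `render_embed` are hypothetical. The MIME type follows the file extension, so the WAV fallback is labelled `audio/wav` rather than `audio/mpeg`. Only the plain-HTML variant HTML-escapes the path.

```python
from html import escape


def render_embed(src, platform="html"):
    """Return an embed snippet for the given platform ('html', 'mdx', 'wordpress').

    The MIME type follows the file extension so the WAV fallback is
    labelled correctly (audio/wav) instead of audio/mpeg.
    """
    mime = "audio/wav" if src.lower().endswith(".wav") else "audio/mpeg"
    if platform == "wordpress":
        return f'[audio src="{src}"]'
    if platform == "mdx":
        return (
            '<audio controls preload="metadata">\n'
            f'  <source src="{src}" type="{mime}" />\n'
            "</audio>"
        )
    if platform == "html":
        return (
            '<audio controls preload="metadata">\n'
            f'  <source src="{escape(src)}" type="{mime}">\n'
            "  Your browser does not support the audio element.\n"
            "</audio>"
        )
    raise ValueError(f"unsupported platform: {platform!r}")
```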

Placement

Insert the audio player after the introduction (below the first H2) or at the very top of the article with a label: "Listen to this article" or "Audio version".
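
Placement can be automated for markdown posts. This sketch reads "below the first H2" literally and inserts the embed right after the first `## ` heading line. If the post has no H2, it uses the labelled top-of-article option and places the embed after any frontmatter. `place_embed` is a hypothetical helper, and it does not skip `## ` lines inside fenced code blocks.

```python
import re

FRONTMATTER_RE = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)


def place_embed(markdown, embed, label="Listen to this article"):
    """Insert an audio embed into a markdown post.

    Puts it directly below the first H2 heading; if the post has no H2,
    puts it at the top (after any frontmatter) under a label.
    """
    h2 = re.search(r"^## .*$", markdown, re.MULTILINE)
    if h2:
        end = h2.end()
        return markdown[:end] + "\n\n" + embed + markdown[end:]
    frontmatter = FRONTMATTER_RE.match(markdown)
    start = frontmatter.end() if frontmatter else 0
    return markdown[:start] + f"{label}\n\n{embed}\n\n" + markdown[start:]
```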

Internal API (for blog-write)

When invoked internally from blog-write:

Input:

- `text`: Prepared text (already cleaned by Claude)
- `voice`: Voice name (default: Charon)
- `voice2`: Second voice for dialogue (optional)
- `model`: `flash` or `pro`
- `output_path`: Where to save the file

Output:

```markdown
Audio Narration

Path: /path/to/audio/post-slug.mp3
Duration: 3:42
Voice: Charon

Embed:

<audio controls preload="metadata"><source src="audio/post-slug.mp3" type="audio/mpeg"></audio>
```


**Graceful fallback:** If `GOOGLE_AI_API_KEY` is not set, return immediately
with no error. The writing workflow continues without audio. Never block
blog-write because audio generation is unavailable.
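
The internal contract can be sketched as a request object plus a function that never raises. Some names here are hypothetical: `AudioRequest`, `render_result`, `narrate_for_blog_write`, and the injected `generate` callable, which stands in for the `generate_audio.py` call. Two behaviours need noting:

- A missing key returns `None`, following the graceful-fallback rule.
- Treating any other generation failure the same way is an added assumption, in the spirit of never blocking blog-write.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class AudioRequest:
    """Inputs listed above for an internal call from blog-write."""
    text: str
    output_path: str
    voice: str = "Charon"
    voice2: Optional[str] = None
    model: str = "flash"


def render_result(path, duration, voice, src):
    """Build the markdown result block in the output format shown above."""
    return (
        "Audio Narration\n\n"
        f"Path: {path}\n"
        f"Duration: {duration}\n"
        f"Voice: {voice}\n\n"
        "Embed:\n\n"
        f'<audio controls preload="metadata"><source src="{src}" type="audio/mpeg"></audio>'
    )


def narrate_for_blog_write(
    request: AudioRequest,
    generate: Callable[[AudioRequest], Tuple[str, str, str]],
    env=None,
) -> Optional[str]:
    """Return the result block, or None so blog-write continues without audio.

    `generate` is a hypothetical callable returning (path, duration, src);
    it stands in for the generate_audio.py call.
    """
    env = os.environ if env is None else env
    if not env.get("GOOGLE_AI_API_KEY"):
        return None  # graceful fallback: no error, no audio
    try:
        path, duration, src = generate(request)
    except Exception:
        return None  # assumption: other failures also degrade silently
    return render_result(path, duration, request.voice, src)
```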

Error Handling

| Error | Resolution |
|---|---|
| `GOOGLE_AI_API_KEY` not set | Get a key at https://aistudio.google.com/apikey |
| FFmpeg not found | Install it (`sudo apt install ffmpeg` on Debian/Ubuntu). Until then, output falls back to WAV. |
| Rate limited | Wait and retry. Check limits at https://aistudio.google.com/rate-limit |
| Text too long (>32k tokens) | Split into sections and generate each one separately |
| Unknown voice name | Run `/blog audio voices` to see valid options |
| API error | Check key validity and model availability (preview models) |
| API key missing (internal call) | Return silently -- the writing workflow continues |
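
For the "text too long" case, one way to split is on paragraph boundaries under an approximate token budget. This sketch has two limitations. First, `chars_per_token=4` is a rough heuristic, not Gemini's tokenizer. The Gemini API's `countTokens` method is the authoritative check. Second, a single paragraph larger than the budget is kept whole as its own chunk. `split_for_tts` is a hypothetical helper.

```python
def split_for_tts(text, max_tokens=32000, chars_per_token=4):
    """Split text on blank-line paragraph boundaries into chunks under a token budget.

    Token counts are estimated as len(text) / chars_per_token. A single
    paragraph larger than the budget is kept whole as its own chunk.
    """
    budget = max_tokens * chars_per_token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= budget:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be generated separately, and rejoining the chunks with `"\n\n"` reproduces the original text.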

Reference Documentation

Load on demand -- do NOT load all at startup:

- `references/voices.md` -- Full 30-voice catalog, recommendations by content type, dialogue pairings
