speech-use

Speech Use

Use this skill to perform Text-to-Speech (TTS), Speech-to-Text (STT), and Voice Cloning operations.

This skill uses portable Python scripts managed by `uv`.

Prerequisites

Environment variables:
- `GOOGLE_API_KEY` (for TTS via Gemini)
- `GOOGLE_CLOUD_PROJECT` (required for STT and voice cloning)
- `GOOGLE_APPLICATION_CREDENTIALS` (recommended for STT and voice cloning)

APIs enabled:
- Text-to-Speech API (`texttospeech.googleapis.com`)
- Speech-to-Text API (`speech.googleapis.com`)
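Before running any of the scripts, it can help to confirm that the variables above are set. The following is a minimal sketch, not part of the skill: the `check_env` helper and the feature-to-variable mapping are hypothetical and simply restate the list above, treating `GOOGLE_APPLICATION_CREDENTIALS` as recommended rather than required.

```python
import os
from collections.abc import Mapping

# Hypothetical feature-to-variable mapping that restates the Prerequisites list.
REQUIRED_VARS = {
    "tts": ["GOOGLE_API_KEY"],
    "stt": ["GOOGLE_CLOUD_PROJECT"],
    "voice_cloning": ["GOOGLE_CLOUD_PROJECT"],
}
# GOOGLE_APPLICATION_CREDENTIALS is only recommended, so it is reported separately.
RECOMMENDED_VARS = {
    "stt": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "voice_cloning": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def check_env(feature: str, env: Mapping[str, str] = os.environ) -> tuple[list[str], list[str]]:
    """Return (missing_required, missing_recommended) for one feature.

    Unset and empty values both count as missing.
    """
    missing_required = [v for v in REQUIRED_VARS[feature] if not env.get(v)]
    missing_recommended = [v for v in RECOMMENDED_VARS.get(feature, []) if not env.get(v)]
    return missing_required, missing_recommended


if __name__ == "__main__":
    for feature in REQUIRED_VARS:
        required, recommended = check_env(feature)
        status = "OK" if not required else "missing " + ", ".join(required)
        print(f"{feature}: {status}")
        if recommended:
            print("  recommended but unset: " + ", ".join(recommended))
```

Running the file prints one status line per feature, so missing credentials show up before any API call is made.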

Usage

1. Generate Speech (TTS)

Generate audio from text using Gemini-TTS.

Standard voice:

```bash
uv run skills/speech-use/scripts/generate_speech.py "Hello world, this is a test." --voice Puck --output hello.wav
```

Custom (cloned) voice:

```bash
uv run skills/speech-use/scripts/generate_speech.py "This is my custom voice speaking." --voice-cloning-key "YOUR_KEY_HERE" --output custom.wav
```
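The `--output` files above are WAV. The Gemini API's speech-generation examples save the returned audio as raw PCM with one channel, 16-bit samples, and a 24 kHz sample rate, wrapping it in a WAV header with Python's `wave` module. The sketch below shows that step. It assumes `generate_speech.py` receives audio in that format, which this guide does not state, and `save_pcm_as_wav` is a hypothetical helper name.

```python
import wave

# Assumed Gemini TTS output format: 16-bit PCM, mono, 24 kHz.
# Check the response's MIME type before relying on these values.
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2
CHANNELS = 1


def save_pcm_as_wav(pcm: bytes, path: str,
                    rate: int = SAMPLE_RATE_HZ,
                    sample_width: int = SAMPLE_WIDTH_BYTES,
                    channels: int = CHANNELS) -> None:
    """Write raw PCM bytes to `path` inside a WAV container (hypothetical helper)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)
```

For example, `save_pcm_as_wav(pcm_bytes, "hello.wav")` would produce a playable file from the raw bytes, provided the format assumptions hold.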

2. Create Custom Voice (Voice Cloning)

Generate a `voiceCloningKey` from a reference audio file and a consent audio file.

Requirements:
- `reference.wav`: 10-30 seconds of clear speech in the voice to clone.
- `consent.wav`: a recording of the speaker clearly saying: "I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model."
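A quick pre-flight check can catch a reference clip that falls outside the 10-30 second window before any API call is made. This sketch uses only the standard `wave` module, so it handles uncompressed PCM WAV files only, and it cannot judge whether the speech is clear. The function names are hypothetical.

```python
import wave

# Bounds taken from the reference.wav requirement (10-30 seconds).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference_audio(path: str) -> list[str]:
    """Return a list of problems with the reference clip (empty if the length is OK)."""
    problems = []
    duration = wav_duration_seconds(path)
    if duration < MIN_SECONDS:
        problems.append(f"too short: {duration:.1f}s < {MIN_SECONDS:.0f}s")
    elif duration > MAX_SECONDS:
        problems.append(f"too long: {duration:.1f}s > {MAX_SECONDS:.0f}s")
    return problems
```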

```bash
uv run skills/speech-use/scripts/create_custom_voice.py --reference-audio reference.wav --consent-audio consent.wav
```

Save the output key and pass it to `generate_speech.py` with `--voice-cloning-key`.
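Because the key is passed on every cloned-voice request, one option is to store it in a file and read it back when assembling the `generate_speech.py` command. In this sketch the file name `voice_key.txt` and the helper functions are hypothetical; only the script path and the `--voice-cloning-key` and `--output` flags come from this guide. Passing the arguments as a list to `subprocess.run` avoids shell quoting problems if the key contains special characters.

```python
import subprocess
from pathlib import Path

# Script path as used in this guide's usage examples.
SCRIPT = "skills/speech-use/scripts/generate_speech.py"
# Hypothetical file for the key printed by create_custom_voice.py.
DEFAULT_KEY_FILE = "voice_key.txt"


def save_key(key: str, path: str = DEFAULT_KEY_FILE) -> None:
    """Write the voice cloning key to a file, trimming surrounding whitespace."""
    Path(path).write_text(key.strip() + "\n", encoding="utf-8")


def build_tts_command(text: str, output: str, key_path: str = DEFAULT_KEY_FILE) -> list[str]:
    """Assemble the uv command that speaks `text` with the cloned voice."""
    key = Path(key_path).read_text(encoding="utf-8").strip()
    return ["uv", "run", SCRIPT, text, "--voice-cloning-key", key, "--output", output]


def run_tts(text: str, output: str, key_path: str = DEFAULT_KEY_FILE) -> None:
    """Run generate_speech.py; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_tts_command(text, output, key_path), check=True)
```

`run_tts("This is my custom voice speaking.", "custom.wav")` would then reproduce the cloned-voice example without pasting the key on the command line.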

3. Transcribe Audio (STT)

Transcribe audio files using Chirp 3.

```bash
uv run skills/speech-use/scripts/transcribe_audio.py audio.wav --language en-US --output transcript.txt
```
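Chirp 3 is served through the Speech-to-Text V2 API, which addresses requests to a recognizer resource named `projects/{project}/locations/{location}/recognizers/{recognizer}` and, for non-global locations, expects a regional endpoint such as `us-speech.googleapis.com`. The sketch below shows how `GOOGLE_CLOUD_PROJECT` and `--location` could map onto those two values. It assumes the script uses the implicit `_` recognizer rather than a pre-created one, which this guide does not confirm, and `speech_v2_settings` is a hypothetical name.

```python
import os
from typing import Optional


def speech_v2_settings(location: str = "us", project: Optional[str] = None) -> dict[str, str]:
    """Return the recognizer name and API endpoint for a Speech-to-Text V2 call.

    Assumption: the implicit recognizer "_" is used instead of a pre-created one.
    Falls back to GOOGLE_CLOUD_PROJECT when no project is given.
    """
    project = project or os.environ["GOOGLE_CLOUD_PROJECT"]
    recognizer = f"projects/{project}/locations/{location}/recognizers/_"
    # "global" uses the default endpoint; other locations such as "us"
    # use a location-prefixed endpoint.
    if location == "global":
        endpoint = "speech.googleapis.com"
    else:
        endpoint = f"{location}-speech.googleapis.com"
    return {"recognizer": recognizer, "api_endpoint": endpoint}
```

With the `google-cloud-speech` client library, these values would typically go into `ClientOptions(api_endpoint=...)` when constructing `SpeechClient` and into the `recognizer` field of a `RecognizeRequest`.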

Options

`generate_speech.py`:
- `--voice`: Prebuilt voice (e.g., `Kore`, `Puck`, `Fenrir`, `Aoede`).
- `--voice-cloning-key`: Key from `create_custom_voice.py`.
- `--model`: Model to use (default `gemini-2.5-flash-preview-tts`).
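The options above can be read as an `argparse` definition. The following is a hypothetical reconstruction, not the script's actual source: the positional text argument and `--output` come from the usage examples, and because the guide lists no defaults for `--voice` or `--output`, none are assumed here.

```python
import argparse


def build_generate_speech_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of generate_speech.py's command line."""
    parser = argparse.ArgumentParser(prog="generate_speech.py")
    parser.add_argument("text", help="Text to synthesize")
    parser.add_argument("--voice", help="Prebuilt voice, e.g. Kore, Puck, Fenrir, Aoede")
    parser.add_argument("--voice-cloning-key", help="Key printed by create_custom_voice.py")
    parser.add_argument("--model", default="gemini-2.5-flash-preview-tts")
    # No default output path is documented, so none is assumed.
    parser.add_argument("--output", help="Path of the WAV file to write")
    return parser
```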

`transcribe_audio.py`:
- `--model`: Model to use (default `chirp_3`).
- `--language`: Language code, e.g. `en-US` (default `auto`).
- `--location`: Cloud region (default `us`).
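The same reading applies to the transcription script. This is again a hypothetical `argparse` reconstruction: the positional audio file and `--output` come from the usage example, and the three defaults come from the list above.

```python
import argparse


def build_transcribe_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of transcribe_audio.py's command line."""
    parser = argparse.ArgumentParser(prog="transcribe_audio.py")
    parser.add_argument("audio", help="Audio file to transcribe")
    parser.add_argument("--model", default="chirp_3")
    parser.add_argument("--language", default="auto", help='Language code such as en-US, or "auto"')
    parser.add_argument("--location", default="us", help="Cloud region for the Speech-to-Text API")
    # No default transcript path is documented, so none is assumed.
    parser.add_argument("--output", help="Path of the transcript file to write")
    return parser
```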

References

Before running scripts, review the reference guides for available voices and options.

- Voices Guide: 30+ voice options with styles (Puck, Kore, Fenrir, Aoede, etc.)
