speech-use
Speech Use
Use this skill to perform Text-to-Speech (TTS), Speech-to-Text (STT), and Voice Cloning operations.
This skill uses portable Python scripts managed by uv.
Prerequisites
- Environment Variables:
  - GOOGLE_API_KEY (for TTS via Gemini)
  - GOOGLE_CLOUD_PROJECT (Required for STT and Voice Cloning)
  - GOOGLE_APPLICATION_CREDENTIALS (Recommended for STT/Voice Cloning)
- APIs Enabled:
  - Text-to-Speech API (texttospeech.googleapis.com)
  - Speech-to-Text API (speech.googleapis.com)
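The environment variables listed in the prerequisites can be checked before any script is run. The following is a minimal sketch, not part of the skill itself: the function name check_environment and the grouping of variables by operation are assumptions made for illustration, based only on the prerequisite list.

```python
import os

# Variables taken from the prerequisites list. Grouping them per operation
# is an illustrative assumption, not something the skill's scripts define.
REQUIRED = {
    "tts": ["GOOGLE_API_KEY"],
    "stt": ["GOOGLE_CLOUD_PROJECT"],
    "voice_cloning": ["GOOGLE_CLOUD_PROJECT"],
}
RECOMMENDED = {
    "stt": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "voice_cloning": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def check_environment(operation, env=None):
    """Return (missing_required, missing_recommended) for one operation."""
    env = os.environ if env is None else env
    missing_required = [name for name in REQUIRED[operation] if not env.get(name)]
    missing_recommended = [
        name for name in RECOMMENDED.get(operation, []) if not env.get(name)
    ]
    return missing_required, missing_recommended
```

Calling check_environment("stt") before transcribing gives a quick list of what still needs to be exported. An empty first list means the required variables are present.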
Usage
1. Generate Speech (TTS)
Generate audio from text using Gemini-TTS.
Standard Voice:
bash
uv run skills/speech-use/scripts/generate_speech.py "Hello world, this is a test." --voice Puck --output hello.wav
Custom Voice (Cloned):
bash
uv run skills/speech-use/scripts/generate_speech.py "This is my custom voice speaking." --voice-cloning-key "YOUR_KEY_HERE" --output custom.wav
2. Create Custom Voice (Voice Cloning)
Generate a voiceCloningKey from a reference audio file and a consent file.
Requirements:
- reference.wav: 10-30s of clear speech (the voice to clone).
- consent.wav: The speaker saying: "I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model."
bash
uv run skills/speech-use/scripts/create_custom_voice.py --reference-audio reference.wav --consent-audio consent.wav
Save the output key to use with generate_speech.py.
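The 10-30 second requirement for reference.wav can be checked locally before the file is submitted. The sketch below uses Python's standard wave module and assumes an uncompressed PCM WAV file. The helper names are hypothetical, and whether create_custom_voice.py performs its own length check is not stated here.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_valid_reference_length(path, min_seconds=10.0, max_seconds=30.0):
    """Check the 10-30s bound stated in the requirements above."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

This checks only the length. It says nothing about whether the speech is clear or whether the consent recording matches the required statement.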
3. Transcribe Audio (STT)
Transcribe audio files using Chirp 3.
bash
uv run skills/speech-use/scripts/transcribe_audio.py audio.wav --language en-US --output transcript.txt
Options
generate_speech.py
- --voice: Prebuilt voice (e.g., Kore, Puck, Fenrir, Aoede).
- --voice-cloning-key: Key from create_custom_voice.py.
- --model: Default gemini-2.5-flash-preview-tts.
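When generate_speech.py is called from another Python program, the options above map onto an argument list. The following sketch shows one way to assemble it. The helper build_tts_command is hypothetical, and its rule of accepting exactly one of voice or voice_cloning_key is an assumption of this example rather than documented script behavior.

```python
SCRIPT = "skills/speech-use/scripts/generate_speech.py"


def build_tts_command(text, output, voice=None, voice_cloning_key=None, model=None):
    """Build the argument list for a `uv run` call to generate_speech.py."""
    # Assumption of this sketch: a prebuilt voice and a cloned-voice key
    # are treated as alternatives, so exactly one must be given.
    if (voice is None) == (voice_cloning_key is None):
        raise ValueError("pass exactly one of voice or voice_cloning_key")
    command = ["uv", "run", SCRIPT, text]
    if voice is not None:
        command += ["--voice", voice]
    else:
        command += ["--voice-cloning-key", voice_cloning_key]
    if model is not None:
        # Omitting --model leaves the script's documented default in place.
        command += ["--model", model]
    command += ["--output", output]
    return command
```

The resulting list can be passed to subprocess.run(command, check=True). Passing a list rather than a shell string avoids quoting problems when the text contains spaces or quotation marks.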
transcribe_audio.py
- --model: Default chirp_3.
- --language: Default auto.
- --location: Cloud region (default us).
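For transcription, any flag left unset falls back to the defaults listed above (chirp_3, auto, us). The sketch below builds the command for one audio file and writes the transcript next to it. The helper build_transcribe_command and the choice of a .txt output beside the input are illustrative assumptions.

```python
from pathlib import Path

SCRIPT = "skills/speech-use/scripts/transcribe_audio.py"


def build_transcribe_command(audio_path, language=None, model=None, location=None):
    """Build the argument list for a `uv run` call to transcribe_audio.py."""
    audio_path = Path(audio_path)
    command = ["uv", "run", SCRIPT, str(audio_path)]
    # Only pass flags that were explicitly set, so the script's own
    # defaults apply to everything else.
    for flag, value in (("--model", model), ("--language", language), ("--location", location)):
        if value is not None:
            command += [flag, value]
    # Illustrative choice: write the transcript beside the input file.
    command += ["--output", str(audio_path.with_suffix(".txt"))]
    return command
```

Looping over a directory of recordings and running each command with subprocess.run(command, check=True) would give a simple batch transcription.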
References
Before running scripts, review the reference guides for available voices and options.
- Voices Guide - 30+ voice options with styles (Puck, Kore, Fenrir, Aoede, etc.)