giggle-generation-speech
Text-to-Audio
Synthesizes text into AI voice/voiceover via giggle.pro. Supports multiple voice tones, emotions, and speaking rates.
⚠️ Review Before Installing
Please review the following before installing. This skill will:
- Write to `~/.openclaw/skills/giggle-generation-speech/logs/` – Task state files for Cron deduplication
- Register Cron (30s interval) – Async polling when the user initiates speech generation; removed when complete
- Forward raw stdout – Script output (audio links, status) is passed to the user as-is

Requirements: `python3`, `GIGGLE_API_KEY` (system environment variable), pip packages: `requests`

API Key: Set the system environment variable `GIGGLE_API_KEY`. The script will prompt if it is not configured.

No inline Python: All commands must be executed via the `exec` tool. Never use heredoc inline code.

No Retry on Error: If script execution encounters an error, do not retry. Report the error to the user directly and stop.
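
The requirements above can be checked before the first run. The following is a minimal sketch, not part of the skill itself: the helper name `missing_prerequisites` is hypothetical, and it only reports problems rather than prompting for the key as the real script is said to do.

```python
import importlib.util
import os


def missing_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: it checks only the two requirements named above.
    """
    env = os.environ if env is None else env
    problems = []
    # The key must come from the system environment, never from exec's env parameter.
    if not env.get("GIGGLE_API_KEY"):
        problems.append("GIGGLE_API_KEY is not set in the system environment")
    # find_spec returns None when the top-level package cannot be found.
    if importlib.util.find_spec("requests") is None:
        problems.append("pip package 'requests' is not installed")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

In line with the no-retry rule, any reported problem would be passed to the user instead of triggering another attempt.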
Execution Flow (Phase 1 Submit + Phase 2 Cron + Phase 3 Sync Fallback)
Speech generation typically takes 10–30 seconds. The skill uses a three-phase architecture: fast submit, Cron poll, and sync fallback.
Important: Never pass `GIGGLE_API_KEY` in exec's `env` parameter. The API Key is read from the system environment variable.
Phase 0: Guide User to Select Voice and Emotion (required)
Before submitting, you must guide the user to select a voice and an emotion. Do not use defaults.
- Run `--list-voices` to get the available voices:
```bash
python3 scripts/text_to_audio_api.py --list-voices
```
- Display the voice list to the user in a readable format (voice_id, name, style, gender, etc.) and guide them to pick one
- Ask for the user's preferred emotion (e.g. joy, sad, neutral, angry, surprise). Use neutral if the user has no preference
- Only after the user confirms voice and emotion, proceed to the Phase 1 submit
Phase 1: Submit Task (exec completes in ~10 seconds)
First send a message to the user: "Speech generation in progress, usually takes 10–30 seconds. Results will be sent automatically."
```bash
# Must specify the user-selected voice and emotion
python3 scripts/text_to_audio_api.py \
  --text "The weather is nice today" \
  --voice-id "Calm_Woman" \
  --emotion "joy" \
  --speed 1.2 \
  --no-wait --json

# View available voices
python3 scripts/text_to_audio_api.py --list-voices
```
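
When the submit command is assembled programmatically, building it as an argument list avoids shell-quoting problems with user-supplied text. This is a minimal sketch that assumes only the flags shown in this section; `build_submit_argv` is a hypothetical helper name, and the resulting command is still meant to be run through the exec tool.

```python
import shlex


def build_submit_argv(text, voice_id, emotion, speed=1.0):
    """Build the Phase 1 submit command as an argv list (hypothetical helper)."""
    if not text or not voice_id or not emotion:
        # Phase 0 requires the user to confirm all three before submitting.
        raise ValueError("text, voice_id and emotion must all be set")
    return [
        "python3", "scripts/text_to_audio_api.py",
        "--text", text,
        "--voice-id", voice_id,
        "--emotion", emotion,
        "--speed", str(speed),
        "--no-wait", "--json",
    ]


if __name__ == "__main__":
    argv = build_submit_argv("The weather is nice today", "Calm_Woman", "joy", 1.2)
    # shlex.join quotes each argument so the line is safe to paste into a shell.
    print(shlex.join(argv))
```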
Response example:
```json
{"status": "started", "task_id": "xxx"}
```
Immediately store task_id in memory (`addMemory`):
```
giggle-generation-speech task_id: xxx (submitted: YYYY-MM-DD HH:mm)
```
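
The response handling for this phase can be sketched as follows. It assumes the `--no-wait --json` output is a single JSON object shaped like the response example in this section, and that `HH:mm` means 24-hour hours and minutes. Both function names are hypothetical.

```python
import json
from datetime import datetime


def parse_submit_response(stdout):
    """Extract task_id from the submit output (assumed shape, see lead-in)."""
    data = json.loads(stdout)
    if not isinstance(data, dict) or data.get("status") != "started" or not data.get("task_id"):
        raise ValueError(f"unexpected submit response: {stdout!r}")
    return data["task_id"]


def memory_entry(task_id, now=None):
    """Format the memory line in the documented form."""
    now = now or datetime.now()
    return (
        f"giggle-generation-speech task_id: {task_id} "
        f"(submitted: {now:%Y-%m-%d %H:%M})"
    )
```

A `ValueError` here falls under the no-retry rule: report it to the user and stop, without resubmitting.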
Phase 2: Register Cron (30 second interval)
Use the `cron` tool to register the polling job. Strictly follow the parameter format:
```json
{
  "action": "add",
  "job": {
    "name": "giggle-generation-speech-<first 8 chars of task_id>",
    "schedule": {
      "kind": "every",
      "everyMs": 30000
    },
    "payload": {
      "kind": "systemEvent",
      "text": "Speech task poll: exec python3 scripts/text_to_audio_api.py --query --task-id <full task_id>, handle stdout per Cron logic. If stdout is non-JSON plain text, forward to user and remove Cron. If stdout is JSON, do not send message, keep waiting. If stdout is empty, remove Cron immediately."
    },
    "sessionTarget": "main"
  }
}
```
Cron trigger handling (based on exec stdout):
| stdout pattern | Action |
|---|---|
| Non-empty plain text (not starting with `{`) | Forward to user as-is, remove Cron |
| stdout empty | Already pushed, remove Cron immediately, do not send message |
| JSON (starts with `{`) | Do not send message, do not remove Cron, keep waiting |
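
The job format and the stdout rules above can be sketched in code. This is an illustration under stated assumptions, not the skill's implementation: the function names and the returned action strings are hypothetical, JSON output is assumed to be recognisable by a leading `{`, and whitespace-only output is treated as empty.

```python
def build_cron_job(task_id):
    """Build the cron tool request for one task, following the documented format."""
    poll_cmd = f"python3 scripts/text_to_audio_api.py --query --task-id {task_id}"
    text = (
        f"Speech task poll: exec {poll_cmd}, handle stdout per Cron logic. "
        "If stdout is non-JSON plain text, forward to user and remove Cron. "
        "If stdout is JSON, do not send message, keep waiting. "
        "If stdout is empty, remove Cron immediately."
    )
    return {
        "action": "add",
        "job": {
            # The job name carries only the first 8 characters of the task_id.
            "name": f"giggle-generation-speech-{task_id[:8]}",
            "schedule": {"kind": "every", "everyMs": 30000},
            "payload": {"kind": "systemEvent", "text": text},
            "sessionTarget": "main",
        },
    }


def classify_poll_stdout(stdout):
    """Map one poll's stdout to an action name (hypothetical labels)."""
    text = stdout.strip()
    if not text:
        return "remove_cron"              # already pushed; send nothing
    if text.startswith("{"):
        return "keep_waiting"             # JSON status; send nothing
    return "forward_and_remove_cron"      # final plain-text result
```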
Phase 3: Sync Wait (optimistic path, fallback when Cron hasn't fired)
Execute this step whether or not Cron registration succeeded.
```bash
python3 scripts/text_to_audio_api.py --query --task-id <task_id> --poll --max-wait 120
```
Handling logic:
- Returns plain text (speech ready/failed message) → Forward to user as-is, remove Cron
- stdout empty → Cron already pushed, remove Cron, do not send message
- exec timeout → Cron continues polling
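
The three outcomes above map to a small decision function. In this sketch `subprocess.run` stands in for the exec tool, which is what the skill actually uses. The names `sync_wait` and `exec_timeout` and the action strings are hypothetical, and exit codes are deliberately not inspected.

```python
import subprocess
import sys


def sync_wait(argv, exec_timeout):
    """Run the Phase 3 poll command and return (action, message)."""
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=exec_timeout
        )
    except subprocess.TimeoutExpired:
        # exec timeout: leave the Cron job in place so it keeps polling.
        return ("leave_cron_running", None)
    if not done.stdout.strip():
        # Empty stdout: Cron already pushed the result, so send nothing.
        return ("remove_cron", None)
    # Ready or failed message: forward the raw stdout unchanged.
    return ("forward_and_remove_cron", done.stdout)


if __name__ == "__main__":
    # Stand-in command for demonstration only; the real call is the poll command.
    print(sync_wait([sys.executable, "-c", "print('Speech ready')"], 30))
```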
View Voice List
When the user wants to see available voices, run:
```bash
python3 scripts/text_to_audio_api.py --list-voices
```
The script calls `GET /api/v1/project/preset_tones` and displays voice_id, name, style, gender, age, language to the user.
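
If the voice data is available as structured records, a readable listing could be produced as sketched below. The input shape (a list of dicts keyed by the six field names above) is an assumption, since the actual response format of the endpoint is not shown here. `format_voices` is a hypothetical helper.

```python
FIELDS = ("voice_id", "name", "style", "gender", "age", "language")


def format_voices(voices):
    """Render voices as an aligned plain-text table (assumed input shape)."""
    # Missing fields are shown as empty cells rather than raising.
    rows = [FIELDS] + [tuple(str(v.get(f, "")) for f in FIELDS) for v in voices]
    widths = [max(len(row[i]) for row in rows) for i in range(len(FIELDS))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```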
Link Return Rule
Audio links returned to the user must be full signed URLs (with Policy, Key-Pair-Id, Signature query params). Correct: `https://assets.giggle.pro/...?Policy=...&Key-Pair-Id=...&Signature=...`. Wrong: do not return unsigned URLs with only the base path (no query params). The script handles encoding `~` to `%7E`; keep as-is when forwarding.
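
A link can be checked against this rule before it is forwarded. The sketch below only inspects the URL and never rewrites it, so the script's `%7E` encoding is left untouched. `is_fully_signed` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

REQUIRED_PARAMS = ("Policy", "Key-Pair-Id", "Signature")


def is_fully_signed(url):
    """True when the URL carries all three signing query parameters."""
    # parse_qs drops parameters with blank values, so "Policy=" counts as missing.
    query = parse_qs(urlsplit(url).query)
    return all(query.get(name) for name in REQUIRED_PARAMS)
```

A link that fails the check should not be shortened or repaired by hand; the original script output is the only trusted source for the signed form.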
New Request vs Query Old Task
When the user initiates a new speech generation request, you must run Phase 1 to submit a new task. Do not reuse an old task_id from memory.
Only when the user explicitly asks about a previous task's progress should you query the old task_id from memory.
Parameter Reference
| Parameter | Required | Default | Description |
|---|---|---|---|
| `--text` | yes | - | Text to synthesize |
| `--voice-id` | yes | - | Voice ID; must get via `--list-voices` |
| `--emotion` | yes | - | Emotion: joy, sad, neutral, angry, surprise, etc. Guide user to choose |
| `--speed` | no | 1 | Speaking rate multiplier |
| `--list-voices` | - | - | Get available voice list |
| `--query` | - | - | Query task status |
| `--task-id` | required for query | - | Task ID |
| `--poll` | no | - | Sync poll with `--query` |
| `--max-wait` | no | 120 | Max wait seconds |
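
To make the table's Required and Default columns concrete, here is a hypothetical `argparse` reconstruction of the listed flags. It is not the real script's parser: the flag types, the validation rules, and the names `build_parser` and `validate` are assumptions drawn only from the table.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the documented flags; not the real script."""
    p = argparse.ArgumentParser(prog="text_to_audio_api.py")
    p.add_argument("--text", help="Text to synthesize")
    p.add_argument("--voice-id", help="Voice ID from --list-voices")
    p.add_argument("--emotion", help="joy, sad, neutral, angry, surprise, etc.")
    p.add_argument("--speed", type=float, default=1.0, help="Speaking rate multiplier")
    p.add_argument("--list-voices", action="store_true", help="Get available voices")
    p.add_argument("--query", action="store_true", help="Query task status")
    p.add_argument("--task-id", help="Task ID (required with --query)")
    p.add_argument("--poll", action="store_true", help="Sync poll with --query")
    p.add_argument("--max-wait", type=int, default=120, help="Max wait seconds")
    return p


def validate(args):
    """Apply the table's Required column; return an error string or None."""
    if args.list_voices:
        return None
    if args.query:
        return None if args.task_id else "--task-id is required with --query"
    missing = [f for f in ("text", "voice_id", "emotion") if not getattr(args, f)]
    return f"missing required: {', '.join(missing)}" if missing else None
```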
Interaction Guide
Before each speech generation, complete this interaction:
- If the user did not provide text, ask: "Which text would you like to convert to speech?"
- Must guide the user to select a voice: Run `--list-voices`, display the list, and have the user choose. Do not use a default voice
- Must guide the user to select an emotion: Ask for the user's preferred emotion (joy, sad, neutral, angry, surprise, etc.)
- After the user confirms text, voice, and emotion, run Phase 1 submit → Phase 2 register Cron → Phase 3 sync wait
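
The order of questions above can be expressed as a small state check. This is a sketch only: `next_prompt` is a hypothetical helper, and the voice and emotion prompt wordings are illustrative. Only the text prompt is quoted from this guide.

```python
def next_prompt(text=None, voice_id=None, emotion=None):
    """Return the next question to ask, or None when ready for Phase 1."""
    if not text:
        return "Which text would you like to convert to speech?"
    if not voice_id:
        # Asked after the --list-voices output has been shown; no default voice.
        return "Which voice from the list would you like to use?"
    if not emotion:
        return "Which emotion would you like (joy, sad, neutral, angry, surprise)?"
    return None
```

When the user states no emotion preference, the caller would pass `neutral`, as described in Phase 0.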