giggle-generation-speech

Text-to-Audio

Synthesizes text into AI voice/voiceover via giggle.pro. Supports multiple voice tones, emotions, and speaking rates.

⚠️ Review Before Installing

Please review the following before installing. This skill will:

Write to `~/.openclaw/skills/giggle-generation-speech/logs/` – Task state files for Cron deduplication
Register Cron (30s interval) – Async polling when user initiates speech generation; removed when complete
Forward raw stdout – Script output (audio links, status) is passed to the user as-is

Requirements: `python3`, `GIGGLE_API_KEY` (system environment variable), pip packages: `requests`

API Key: Set system environment variable `GIGGLE_API_KEY`. The script will prompt if not configured.

No inline Python: All commands must be executed via the `exec` tool. Never use heredoc inline code.

No Retry on Error: If script execution encounters an error, do not retry. Report the error to the user directly and stop.
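The API key requirement can be pictured as a small pre-flight check. This is only a sketch for illustration: `check_api_key` is a hypothetical helper, not part of the skill's script (which does its own prompting), and the skill itself still runs every command through the `exec` tool.

```python
import os


def check_api_key(environ=None):
    """Return True if GIGGLE_API_KEY is set to a non-empty value.

    Hypothetical helper for illustration only; the real script performs
    its own check and prompts when the key is not configured.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("GIGGLE_API_KEY", "").strip())


if __name__ == "__main__":
    if not check_api_key():
        print("GIGGLE_API_KEY is not set; configure it as a system environment variable.")
```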

Execution Flow (Phase 1 Submit + Phase 2 Cron + Phase 3 Sync Fallback)

Speech generation typically takes 10–30 seconds. The skill uses a three-phase architecture: fast submit, Cron poll, and sync fallback.

Important: Never pass `GIGGLE_API_KEY` in exec's `env` parameter. The API key is read from the system environment variable.

Phase 0: Guide User to Select Voice and Emotion (required)

Before submitting, you must guide the user to select voice and emotion. Do not use defaults.

Run `--list-voices` to get available voices:

```bash
python3 scripts/text_to_audio_api.py --list-voices
```

Display the voice list to the user in a readable format (voice_id, name, style, gender, etc.) and guide them to pick one
Ask for the user's preferred emotion (e.g. joy, sad, neutral, angry, surprise). Use neutral if they have no preference
Only after the user confirms voice and emotion, proceed to Phase 1 submit

Phase 1: Submit Task (exec completes in ~10 seconds)

First send a message to the user: "Speech generation in progress, usually takes 10–30 seconds. Results will be sent automatically."

```bash
# Must specify user-selected voice and emotion
python3 scripts/text_to_audio_api.py \
  --text "The weather is nice today" \
  --voice-id "Calm_Woman" \
  --emotion "joy" \
  --speed 1.2 \
  --no-wait --json

# View available voices
python3 scripts/text_to_audio_api.py --list-voices
```

Response example:

```json
{"status": "started", "task_id": "xxx"}
```

Immediately store task_id in memory (using `addMemory`):

```
giggle-generation-speech task_id: xxx (submitted: YYYY-MM-DD HH:mm)
```
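To make the memory format concrete, here is a minimal sketch that turns the submit response into the memory line. `memory_entry` is a hypothetical name, and treating any status other than `started` as an error is an assumption, since the source only shows the one response shape.

```python
import json
from datetime import datetime


def memory_entry(submit_stdout, now=None):
    """Build the memory line for a freshly submitted task.

    Assumes stdout is the JSON object from the response example:
    {"status": "started", "task_id": "..."}.
    """
    response = json.loads(submit_stdout)
    if (
        not isinstance(response, dict)
        or response.get("status") != "started"
        or not response.get("task_id")
    ):
        # Assumption: anything else means the submit did not succeed.
        raise ValueError(f"unexpected submit response: {submit_stdout!r}")
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M")
    return f"giggle-generation-speech task_id: {response['task_id']} (submitted: {stamp})"
```

The returned string is what would be handed to `addMemory`; how that tool is invoked is outside this sketch.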

Phase 2: Register Cron (30 second interval)

Use the `cron` tool to register the polling job. Strictly follow the parameter format:

```json
{
  "action": "add",
  "job": {
    "name": "giggle-generation-speech-<first 8 chars of task_id>",
    "schedule": {
      "kind": "every",
      "everyMs": 30000
    },
    "payload": {
      "kind": "systemEvent",
      "text": "Speech task poll: exec python3 scripts/text_to_audio_api.py --query --task-id <full task_id>, handle stdout per Cron logic. If stdout is non-JSON plain text, forward to user and remove Cron. If stdout is JSON, do not send message, keep waiting. If stdout is empty, remove Cron immediately."
    },
    "sessionTarget": "main"
  }
}
```
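The two placeholders in the job definition (`<first 8 chars of task_id>` and `<full task_id>`) can be filled in mechanically. The sketch below shows one way to do that; `build_cron_job` is a hypothetical helper, and the resulting dict would still be passed to the `cron` tool rather than executed from Python.

```python
def build_cron_job(task_id):
    """Return the cron tool parameters for polling one speech task."""
    poll_text = (
        "Speech task poll: exec python3 scripts/text_to_audio_api.py "
        f"--query --task-id {task_id}, handle stdout per Cron logic. "
        "If stdout is non-JSON plain text, forward to user and remove Cron. "
        "If stdout is JSON, do not send message, keep waiting. "
        "If stdout is empty, remove Cron immediately."
    )
    return {
        "action": "add",
        "job": {
            # Job name uses only the first 8 characters of the task ID.
            "name": f"giggle-generation-speech-{task_id[:8]}",
            "schedule": {"kind": "every", "everyMs": 30000},
            # The poll text carries the full task ID.
            "payload": {"kind": "systemEvent", "text": poll_text},
            "sessionTarget": "main",
        },
    }
```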

Cron trigger handling (based on exec stdout):

| stdout pattern | Action |
| --- | --- |
| Non-empty plain text (not starting with `{`) | Forward to user as-is, remove Cron |
| stdout empty | Already pushed, remove Cron immediately, do not send message |
| JSON (starts with `{`, has `"status"` field) | Do not send message, do not remove Cron, keep waiting |
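The three cases above amount to a small decision function. The following is a sketch of that logic, not code from the skill: `cron_action` is a hypothetical name, ignoring surrounding whitespace is an assumption, and the `"status"` field is not validated here (only the leading `{` is checked).

```python
def cron_action(stdout):
    """Map one poll's stdout to (forward_to_user, remove_cron)."""
    text = stdout.strip()  # assumption: surrounding whitespace is not significant
    if not text:
        # Empty: the result was already pushed, so only clean up.
        return (False, True)
    if text.startswith("{"):
        # JSON status object: task still running, keep polling silently.
        return (False, False)
    # Plain text: final message (ready or failed), forward as-is.
    return (True, True)
```

Returning a pair of booleans keeps the two decisions (message the user, remove the Cron job) independent, which mirrors the two columns of intent in each row.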

Phase 3: Sync Wait (optimistic path, fallback when Cron hasn't fired)

Execute this step whether or not Cron registration succeeded.

```bash
python3 scripts/text_to_audio_api.py --query --task-id <task_id> --poll --max-wait 120
```

Handling logic:

Returns plain text (speech ready/failed message) → Forward to user as-is, remove Cron
stdout empty → Cron already pushed, remove Cron, do not send message
exec timeout → Cron continues polling
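The handling logic above, including the timeout branch, can be sketched with the standard library. This is an illustration only: `sync_wait` and its outcome labels are hypothetical, and the skill still runs the real command through the `exec` tool. The child process inherits the parent environment, so `GIGGLE_API_KEY` is never passed explicitly.

```python
import subprocess
import sys


def sync_wait(argv, timeout_s):
    """Run the polling command once and say what to do next.

    Returns (outcome, stdout) where outcome is one of
    "forward", "already_pushed", or "leave_to_cron".
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # exec timeout: do nothing here, the Cron job keeps polling.
        return ("leave_to_cron", "")
    if not result.stdout.strip():
        # Empty stdout: Cron already pushed the result; remove Cron, stay silent.
        return ("already_pushed", "")
    # Plain text: forward unchanged to the user, then remove Cron.
    return ("forward", result.stdout)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; the real call would be
    # python3 scripts/text_to_audio_api.py --query --task-id <task_id> --poll --max-wait 120
    print(sync_wait([sys.executable, "-c", "print('Speech ready')"], timeout_s=30))
```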

View Voice List

When the user wants to see available voices, run:

```bash
python3 scripts/text_to_audio_api.py --list-voices
```

The script calls `GET /api/v1/project/preset_tones` and displays voice_id, name, style, gender, age, and language to the user.

Link Return Rule

Audio links returned to the user must be full signed URLs (with Policy, Key-Pair-Id, and Signature query params). Correct: `https://assets.giggle.pro/...?Policy=...&Key-Pair-Id=...&Signature=...`. Wrong: do not return unsigned URLs with only the base path (no query params). The script handles `~` encoding to `%7E`; keep the link as-is when forwarding.
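A quick way to tell a full signed link from a bare base path is to look for the three query parameters. The sketch below uses only the standard library; `is_fully_signed` is a hypothetical helper and does not verify that the signature is valid, only that the parameters are present.

```python
from urllib.parse import parse_qs, urlsplit

REQUIRED_PARAMS = ("Policy", "Key-Pair-Id", "Signature")


def is_fully_signed(url):
    """Return True if the URL carries all three signature query params."""
    query = parse_qs(urlsplit(url).query)
    return all(name in query for name in REQUIRED_PARAMS)
```

The check reads the URL without modifying it, so the link can still be forwarded exactly as the script produced it.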

New Request vs Query Old Task

When the user initiates a new speech generation request, you must run Phase 1 to submit a new task. Do not reuse an old task_id from memory.

Only when the user explicitly asks about a previous task's progress should you query the old task_id from memory.

Parameter Reference

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| `--text` | yes | - | Text to synthesize |
| `--voice-id` | yes | - | Voice ID; must get via `--list-voices` and guide user to choose |
| `--emotion` | yes | - | Emotion: joy, sad, neutral, angry, surprise, etc. Guide user to choose |
| `--speed` | no | 1 | Speaking rate multiplier |
| `--list-voices` | - | - | Get available voice list |
| `--query` | - | - | Query task status |
| `--task-id` | required for query | - | Task ID |
| `--poll` | no | - | Sync poll with `--query` |
| `--max-wait` | no | 120 | Max wait seconds |
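As a worked example of how the submit parameters combine, the sketch below assembles the Phase 1 command line and refuses to build it when a required value is missing. `build_submit_argv` is a hypothetical helper; the flags are the ones listed in the table, plus `--no-wait --json` from the Phase 1 example.

```python
def build_submit_argv(text, voice_id, emotion, speed=1):
    """Build the Phase 1 command line from user-confirmed choices."""
    required = (("text", text), ("voice_id", voice_id), ("emotion", emotion))
    for name, value in required:
        if not value or not value.strip():
            # No silent defaults: voice and emotion must come from the user.
            raise ValueError(f"{name} is required and must be confirmed by the user")
    return [
        "python3", "scripts/text_to_audio_api.py",
        "--text", text,
        "--voice-id", voice_id,
        "--emotion", emotion,
        "--speed", str(speed),
        "--no-wait", "--json",
    ]
```

Building an argument list instead of a single shell string avoids quoting problems when the text contains spaces or quotes.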

Interaction Guide

Before each speech generation, complete this interaction:

If the user did not provide text, ask: "Which text would you like to convert to speech?"
Must guide user to select voice: Run `--list-voices`, display the list, and have the user choose. Do not use a default voice
Must guide user to select emotion: Ask for the user's preferred emotion (joy, sad, neutral, angry, surprise, etc.)
After the user confirms text, voice, and emotion, run Phase 1 submit → Phase 2 register Cron → Phase 3 sync wait
