Audio Jingle Skill
Three sub-modes. The active project's `audioKind` decides which one runs:

| audioKind | Models we route to | Plan focus |
|---|---|---|
| `music` | Suno V5 (default), Udio, Lyria 2 | genre + tempo + instrumentation |
| `speech` | MiniMax TTS (default), Fish, ElevenLabs V3 | script + voice + pacing |
| `sfx` | ElevenLabs SFX (default), AudioCraft | texture + impact + duration |
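A minimal sketch of the routing table as a shell `case`, assuming the skill is driven from a bash helper script. The function names `default_model` and `plan_focus` are hypothetical, and they return the table's display names; the `--model` ids the dispatcher actually accepts may be spelled differently.

```bash
# Hypothetical helper: audioKind -> default model (display names from the table,
# not confirmed --model ids).
default_model() {
  case "$1" in
    music)  echo "Suno V5" ;;
    speech) echo "MiniMax TTS" ;;
    sfx)    echo "ElevenLabs SFX" ;;
    *)      echo "unknown audioKind: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: audioKind -> what the Step 1 plan should focus on.
plan_focus() {
  case "$1" in
    music)  echo "genre + tempo + instrumentation" ;;
    speech) echo "script + voice + pacing" ;;
    sfx)    echo "texture + impact + duration" ;;
    *)      return 1 ;;
  esac
}
```

An unknown `audioKind` fails loudly instead of silently falling back to a default mode.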
Resource map
```
audio-jingle/
├── SKILL.md
└── example.html
```
Workflow
Step 0 — Read the project metadata
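Read `audioKind`, `audioModel`, `audioDuration` (seconds), and `voice` (speech only). Branch on `audioKind` and use these values directly; don't ask for confirmation unless a value is marked `(unknown — ask)`.
Important: `voice` is provider-specific. For `minimax-tts`, `--voice` must be a valid MiniMax `voice_id` (for example `male-qn-qingse`), not a natural-language description. If you only have a prose voice brief ("warm female narrator", "neutral Mandarin"), keep it in your plan but omit `--voice` so the daemon's default voice id applies, or ask the user to choose a specific id.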
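The voice rule can be approximated with a small guard. This is a sketch under a loud assumption: it treats any value containing characters other than letters, digits, `_` and `-` as a prose brief, which catches "warm female narrator" but does not check the value against MiniMax's real voice list. `looks_like_voice_id` and `voice_arg` are hypothetical helpers, and models other than `minimax-tts` are passed through unchanged because only the MiniMax rule is stated here.

```bash
# Heuristic only: a MiniMax voice_id such as male-qn-qingse is one token of
# letters, digits, "_" or "-"; a prose brief like "warm female narrator" is not.
looks_like_voice_id() {
  case "$1" in
    '' | *[!A-Za-z0-9_-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Print the value to pass as --voice, or nothing, meaning: omit the flag and let
# the daemon's default voice apply. Only the minimax-tts rule is encoded here.
# $1 = audioKind, $2 = audioModel, $3 = voice from metadata
voice_arg() {
  [ "$1" = speech ] || return 0
  if [ "$2" = minimax-tts ] && ! looks_like_voice_id "$3"; then
    return 0
  fi
  if [ -n "$3" ]; then
    printf '%s\n' "$3"
  fi
}
```

When `voice_arg` prints nothing for a speech project, keep the prose brief in the plan and either dispatch without `--voice` or ask the user for an id.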
Step 1 — Plan
Music
- Genre + reference artists (1-2)
- Tempo (BPM) + key
- Instrumentation (3-5 instruments max)
- Vocals: yes / no / hummed / choir
- Mood arc (intro → chorus → outro)
Speech
- Script (final, not draft — TTS runs verbatim)
- Voice target + pacing. For MiniMax this means a real `voice_id`, not prose in `--voice`.
- Pronunciation hints for proper nouns / acronyms
SFX
- Texture (impact / whoosh / ambience / foley)
- Duration + envelope (sharp attack vs. gentle swell)
- Layering note (single hit vs. stacked)
State the plan in 2-3 sentences before dispatching.
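For the music plan, a hypothetical check for the "3-5 instruments max" guideline, assuming the instrumentation is kept as one comma-separated string. It only counts entries; it says nothing about whether the instruments suit the genre.

```bash
# Count non-blank comma-separated entries, e.g. "piano, upright bass, drums" -> 3.
instrument_count() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -c '[^[:space:]]'
}

# Succeeds (exit 0) when the plan lists more than five instruments.
too_many_instruments() {
  [ "$(instrument_count "$1")" -gt 5 ]
}
```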
Step 2 — Compose the prompt
Use the format the upstream model prefers. Bind `audioDuration` to the API parameter directly; never put "make it 30 seconds" in prose.
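A sketch of Step 2 for music, assuming a comma-separated descriptive layout. The format each upstream model actually prefers is not specified here, so treat `music_prompt` as a hypothetical placeholder to adapt. The second helper, also hypothetical, flags prompts that smuggle a duration into prose so it can be moved to the duration parameter instead.

```bash
# Join the Step 1 music plan into one descriptive line. Note: no duration here.
# $1 genre, $2 BPM, $3 key, $4 instruments, $5 vocals, $6 mood arc
music_prompt() {
  printf '%s, %s BPM, %s, instrumentation: %s, vocals: %s, mood: %s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# Succeeds if the text states a length in prose ("30 seconds", "15-sec").
# The length belongs in --duration, bound to audioDuration.
prompt_mentions_duration() {
  printf '%s\n' "$1" | grep -Eiq '[0-9]+[ -]*(seconds?|secs?)([^a-z]|$)'
}
```

For example, `music_prompt "lo-fi hip hop" 92 "A minor" "piano, upright bass, brushed drums" no "calm intro, warm body, soft outro"` carries tempo and key but no length; `audioDuration` travels separately as `--duration`.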
Step 3 — Dispatch via the media contract
Use the unified dispatcher — do not call provider APIs by hand:
```bash
node "$OD_BIN" media generate \
  --project "$OD_PROJECT_ID" \
  --surface audio \
  --audio-kind "<music|speech|sfx>" \
  --model "<audioModel from metadata>" \
  --duration <audioDuration seconds> \
  [--voice "<provider voice id (speech only)>"] \
  --output "<short-slug>-<duration>s.mp3" \
  --prompt "<assembled prompt from Step 2 — for speech, the literal script>"
```
The command prints one line of JSON: `{"file": {"name": "...", ...}}`. The bytes land in the project; the FileViewer renders the audio transport controls automatically.
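The dispatch step can be wrapped so that `--voice` is only added for speech and the filename is read back from the JSON reply. A sketch only: `dispatch_audio`, `slugify` and `file_name_from_json` are hypothetical helpers, the flags are exactly those in the command above, and the `sed` extraction assumes the reply keeps the documented one-line shape with no escaped quotes in the name (`jq -r .file.name` is sturdier if `jq` is installed).

```bash
# "Morning Show!" -> "morning-show", for --output "<short-slug>-<duration>s.mp3".
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Pull "name" out of the one-line reply {"file": {"name": "...", ...}}.
file_name_from_json() {
  printf '%s\n' "$1" | sed -n 's/.*"name": *"\([^"]*\)".*/\1/p'
}

# $1 kind, $2 model, $3 duration (seconds), $4 voice id or "", $5 title, $6 prompt
dispatch_audio() {
  local kind="$1" model="$2" duration="$3" voice="$4" title="$5" prompt="$6"
  set -- --project "$OD_PROJECT_ID" --surface audio \
    --audio-kind "$kind" --model "$model" --duration "$duration"
  # --voice is speech-only; an empty value means: omit it, use the daemon default.
  if [ "$kind" = speech ] && [ -n "$voice" ]; then
    set -- "$@" --voice "$voice"
  fi
  set -- "$@" --output "$(slugify "$title")-${duration}s.mp3" --prompt "$prompt"
  node "$OD_BIN" media generate "$@"
}
```

Usage: `json=$(dispatch_audio music "$audioModel" 15 "" "Morning Show" "$prompt")`, then `file_name_from_json "$json"` yields the filename to report in Step 4. Building the argument list with `set --` keeps every value quoted, so prompts with spaces or punctuation reach the CLI intact.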
Step 4 — Hand off
Reply with: plan summary, the filename returned by the dispatcher, and
one sentence on what to try if the user wants a variation (e.g. "swap
tempo from 92 to 108 BPM" rather than "make it different").
Hard rules
- TTS runs your script literally. Proof it before dispatching — even one stray comma changes the cadence.
- MiniMax TTS rejects free-form voice prose in `--voice`. Use a real MiniMax `voice_id` (for example `male-qn-qingse`) or omit the flag and let the daemon's default voice apply.
- Music: under 30s = single section; 30–90s = intro + body; 90s+ = full arc. Don't try to fit a 3-act song into 15 seconds.
- SFX: prefer one well-described layer over a paragraph of "make it cool" — generators reward specific texture words.
- Save the file every turn. The audio viewer shows transport controls the moment the file lands.
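Two of the hard rules translate into quick pre-dispatch checks. Both are heuristics sketched under assumptions: `music_structure` takes whole seconds and, since the rule's 30–90s and 90s+ ranges overlap at 90, sends exactly 90 seconds to the full arc; `script_has_stray_punctuation` only catches doubled commas, a space before a comma, and double spaces, so it complements proofreading rather than replacing it. Both function names are hypothetical.

```bash
# Music: map audioDuration (whole seconds) to a section layout.
# Exactly 90 is treated as "full arc" (the rule's ranges overlap there).
music_structure() {
  if [ "$1" -lt 30 ]; then
    echo "single section"
  elif [ "$1" -lt 90 ]; then
    echo "intro + body"
  else
    echo "full arc"
  fi
}

# TTS: succeeds if the script has obvious typos that would change the cadence
# (",,", " ,", or two spaces in a row). Not a substitute for reading it.
script_has_stray_punctuation() {
  printf '%s\n' "$1" | grep -Eq ',,| ,|  '
}
```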