Audio Jingle Skill

Three sub-modes. The active project's `audioKind` decides which one runs:

| `audioKind` | Models we route to | Plan focus |
| --- | --- | --- |
| `music` | Suno V5 (default), Udio, Lyria 2 | genre + tempo + instrumentation |
| `speech` | MiniMax TTS (default), Fish, ElevenLabs V3 | script + voice + pacing |
| `sfx` | ElevenLabs SFX (default), AudioCraft | texture + impact + duration |
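Read as data, the table above becomes a small lookup. The dict and helper below are a sketch for illustration; the tuple layout and names are not part of the skill:

```python
# Default model and planning focus per audioKind, per the routing table above.
DEFAULTS = {
    "music": ("Suno V5", "genre + tempo + instrumentation"),
    "speech": ("MiniMax TTS", "script + voice + pacing"),
    "sfx": ("ElevenLabs SFX", "texture + impact + duration"),
}

def plan_focus(audio_kind: str) -> str:
    """Return the planning emphasis for a given audioKind."""
    return DEFAULTS[audio_kind][1]
```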

Resource map

```
audio-jingle/
├── SKILL.md
└── example.html
```

Workflow

Step 0 — Read the project metadata

Read `audioKind`, `audioModel`, `audioDuration` (seconds), and (for speech) `voice`. Branch by `audioKind` and use the values verbatim — no clarifying form unless something is marked `(unknown — ask)`.

Important: `voice` is provider-specific. For `minimax-tts`, `--voice` must be a valid MiniMax `voice_id` (for example `male-qn-qingse`), not a natural-language description. If you only have a prose voice brief ("warm female narrator", "neutral Mandarin"), keep that in your plan but omit `--voice` so the daemon's default voice id applies, or ask the user to choose a specific id.
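The voice rule above can be sketched as a small guard. This helper and its regex heuristic are assumptions for illustration only, not part of the dispatcher:

```python
import re
from typing import Optional

def voice_flag_for(audio_model: str, voice: Optional[str]) -> Optional[str]:
    """Value safe to pass as --voice, or None to omit the flag entirely."""
    if not voice:
        return None
    if audio_model == "minimax-tts":
        # MiniMax needs a voice_id like "male-qn-qingse", never prose.
        # Heuristic (assumption): ids contain only word characters and
        # hyphens; anything with spaces or punctuation is a prose brief.
        if re.fullmatch(r"[A-Za-z0-9_-]+", voice):
            return voice
        return None  # keep the brief in the plan; daemon default applies
    return voice  # other providers: pass through whatever metadata supplied
```

For example, `voice_flag_for("minimax-tts", "warm female narrator")` returns `None`, so the flag is dropped and the default voice applies.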

Step 1 — Plan

Music
  • Genre + reference artists (1-2)
  • Tempo (BPM) + key
  • Instrumentation (3-5 instruments max)
  • Vocals: yes / no / hummed / choir
  • Mood arc (intro → chorus → outro)
Speech
  • Script (final, not draft — TTS runs verbatim)
  • Voice target + pacing. For MiniMax this means a real `voice_id`, not prose in `--voice`.
  • Pronunciation hints for proper nouns / acronyms
SFX
  • Texture (impact / whoosh / ambience / foley)
  • Duration + envelope (sharp attack vs. gentle swell)
  • Layering note (single hit vs. stacked)
State the plan in 2-3 sentences before dispatching.

Step 2 — Compose the prompt

Use the format the upstream model prefers. Bind `audioDuration` to the API parameter directly; never put "make it 30 seconds" in prose.
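One way to keep that binding honest is to assemble the flag list programmatically. The helper below is a hypothetical sketch built from the flags shown in Step 3; it is not a real part of the CLI:

```python
from typing import List, Optional

def build_dispatch_args(project: str, kind: str, model: str,
                        duration_s: int, output: str, prompt: str,
                        voice: Optional[str] = None) -> List[str]:
    """Arguments for `node $OD_BIN media generate` (sketch only)."""
    args = [
        "media", "generate",
        "--project", project,
        "--surface", "audio",
        "--audio-kind", kind,
        "--model", model,
        "--duration", str(duration_s),  # audioDuration bound here, not in prose
        "--output", output,
        "--prompt", prompt,
    ]
    if voice:  # speech only, and only with a real provider voice id
        args += ["--voice", voice]
    return args
```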

Step 3 — Dispatch via the media contract

Use the unified dispatcher — do not call provider APIs by hand:

```bash
node "$OD_BIN" media generate \
  --project "$OD_PROJECT_ID" \
  --surface audio \
  --audio-kind "<music|speech|sfx>" \
  --model "<audioModel from metadata>" \
  --duration <audioDuration seconds> \
  [--voice "<provider voice id (speech only)>"] \
  --output "<short-slug>-<duration>s.mp3" \
  --prompt "<assembled prompt from Step 2 — for speech, the literal script>"
```

The command prints one line of JSON: `{"file": {"name": "...", ...}}`. The bytes land in the project; the FileViewer renders the audio transport controls automatically.
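For the Step 4 hand-off, the filename can be pulled from that single JSON line. A minimal sketch (the extra field shown in the test is illustrative; only `file.name` is documented above):

```python
import json

def saved_filename(stdout_line: str) -> str:
    """Extract file.name from the dispatcher's one-line JSON output."""
    return json.loads(stdout_line)["file"]["name"]
```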

Step 4 — Hand off

Reply with: plan summary, the filename returned by the dispatcher, and one sentence on what to try if the user wants a variation (e.g. "swap tempo from 92 to 108 BPM" rather than "make it different").

Hard rules

  • TTS runs your script literally. Proof it before dispatching — even one stray comma changes the cadence.
  • MiniMax TTS rejects free-form voice prose in `--voice`. Use a real MiniMax `voice_id` (for example `male-qn-qingse`) or omit the flag and let the daemon's default voice apply.
  • Music: under 30s = single section; 30–90s = intro + body; 90s+ = full arc. Don't try to fit a 3-act song into 15 seconds.
  • SFX: prefer one well-described layer over a paragraph of "make it cool" — generators reward specific texture words.
  • Save the file every turn. The audio viewer shows transport controls the moment the file lands.
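The music-duration rule reads as a simple threshold mapping, sketched here for reference (boundary handling at exactly 30s and 90s is an assumption):

```python
def music_structure(duration_s: float) -> str:
    """Map a jingle length to the structure the rules above allow."""
    if duration_s < 30:
        return "single section"
    if duration_s < 90:
        return "intro + body"
    return "full arc"
```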