audio-generation
Audio Generation (Music & Sound Effects)
Use this skill when the user wants to generate music compositions or sound effects from text descriptions. The system supports 8 provider backends with automatic fallback chains and user-configurable provider preferences.
This skill covers two complementary APIs:
- generateMusic() — Full-length musical compositions from text prompts
- generateSFX() — Short sound effects from text descriptions
Music Generation
Basic Usage
Generate music from a text prompt. The system auto-detects the best available provider from environment variables in priority order: `SUNO_API_KEY` (highest quality) -> `UDIO_API_KEY` -> `STABILITY_API_KEY` -> `REPLICATE_API_TOKEN` -> `FAL_API_KEY` -> local MusicGen (no key required).

```typescript
import { generateMusic } from 'agentos';

const result = await generateMusic({
  prompt: 'Upbeat lo-fi hip hop beat with vinyl crackle and mellow piano',
  durationSec: 60,
});

console.log(result.audio[0].url);
```
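In effect, the detection order above means that the first configured key wins. The sketch below models that rule in isolation. `detectMusicProvider`, `MUSIC_CHAIN`, and the returned labels are hypothetical illustrations and not part of the `agentos` API. Treating a whitespace-only value as "unset" is an additional assumption.

```typescript
// Hypothetical model of the documented music auto-detection order.
// Labels are descriptive only; they are not agentos provider IDs.
type Env = Record<string, string | undefined>;

const MUSIC_CHAIN: ReadonlyArray<{ envVar: string; label: string }> = [
  { envVar: "SUNO_API_KEY", label: "Suno" },
  { envVar: "UDIO_API_KEY", label: "Udio" },
  { envVar: "STABILITY_API_KEY", label: "Stable Audio" },
  { envVar: "REPLICATE_API_TOKEN", label: "Replicate" },
  { envVar: "FAL_API_KEY", label: "Fal" },
];

// Returns the first provider whose key is set (assumption: blank values count as unset),
// or the keyless local MusicGen backend when no cloud key is configured.
function detectMusicProvider(env: Env): string {
  for (const { envVar, label } of MUSIC_CHAIN) {
    const value = env[envVar];
    if (value !== undefined && value.trim() !== "") return label;
  }
  return "MusicGen Local";
}

console.log(detectMusicProvider({ STABILITY_API_KEY: "key", FAL_API_KEY: "key" })); // "Stable Audio"
console.log(detectMusicProvider({})); // "MusicGen Local"
```

In a Node process you would pass `process.env` as the `env` argument. Taking the environment as a parameter keeps the function easy to test.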
Prompt Tips for Music
- Specify genre and mood first: "melancholic jazz ballad", "aggressive drum and bass", "peaceful ambient soundscape"
- Include instrumentation: "with acoustic guitar, soft brushed drums, and upright bass"
- Mention tempo and energy: "slow tempo, 70 BPM", "high energy, driving rhythm"
- Add texture and production: "lo-fi with vinyl crackle", "clean studio recording", "reverb-heavy shoegaze"
- Reference eras or styles: "1970s progressive rock", "modern trap production", "classical baroque harpsichord"
- Use negative prompts where supported: `negativePrompt: 'vocals, singing, lyrics'`
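The tips above suggest a fixed ordering: genre and mood first, then instrumentation, tempo, texture, and era, with exclusions moved into a negative prompt. The following is a minimal sketch of a prompt builder that follows that ordering. `buildMusicPrompt` and `MusicPromptParts` are hypothetical helpers. The returned `negativePrompt` matches the documented option name, but only some providers honor it.

```typescript
// Hypothetical helper that assembles a music prompt in the order the tips recommend.
interface MusicPromptParts {
  genreAndMood: string;
  instruments?: string[];
  tempo?: string;
  texture?: string;
  era?: string;
  avoid?: string[]; // becomes a negative prompt where the provider supports one
}

// Joins items as "a", "a and b", or "a, b, and c".
function joinList(items: string[]): string {
  if (items.length <= 2) return items.join(" and ");
  return `${items.slice(0, -1).join(", ")}, and ${items[items.length - 1]}`;
}

function buildMusicPrompt(parts: MusicPromptParts): { prompt: string; negativePrompt?: string } {
  const segments: string[] = [parts.genreAndMood];
  if (parts.instruments && parts.instruments.length > 0) {
    segments.push(`with ${joinList(parts.instruments)}`);
  }
  for (const extra of [parts.tempo, parts.texture, parts.era]) {
    if (extra) segments.push(extra);
  }
  const result: { prompt: string; negativePrompt?: string } = { prompt: segments.join(", ") };
  if (parts.avoid && parts.avoid.length > 0) result.negativePrompt = parts.avoid.join(", ");
  return result;
}

const built = buildMusicPrompt({
  genreAndMood: "melancholic jazz ballad",
  instruments: ["acoustic guitar", "soft brushed drums", "upright bass"],
  tempo: "slow tempo, 70 BPM",
  avoid: ["vocals", "singing", "lyrics"],
});
console.log(built.prompt);
// melancholic jazz ballad, with acoustic guitar, soft brushed drums, and upright bass, slow tempo, 70 BPM
console.log(built.negativePrompt); // vocals, singing, lyrics
```

You can then spread the result into the `generateMusic()` options, e.g. `generateMusic({ ...built, durationSec: 60 })`.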
Music Options
| Option | Default | Description |
|---|---|---|
| `prompt` | (required) | Text description of the desired music |
| `provider` | auto-detect | Provider ID (…) |
| … | provider default | Model identifier within the provider |
| `durationSec` | provider default | Output duration in seconds (Suno: up to ~240 s, Stable Audio: ~47 s) |
| `negativePrompt` | none | Musical elements to avoid (not all providers support this) |
| … | … | Output format: … |
| … | random | Seed for reproducible output (provider-dependent) |
| … | … | Number of clips to generate |
| … | provider default | Maximum wait time before polling providers time out |
| … | none | Best-effort status-tracking callback (…) |
| `providerPreferences` | none | Reorder, block, or weight providers for auto-selection and fallback |
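The duration limits in the table are approximate (Suno up to ~240 s, Stable Audio ~47 s). If you force a provider, you may want to clamp `durationSec` before calling it. The following is a minimal sketch, assuming those approximate caps. `clampDuration` is hypothetical, and the document states no caps for other providers, so the sketch leaves their requests unchanged.

```typescript
// Approximate per-provider duration caps from the options table above.
// Real limits may vary by model or plan; check each provider's documentation.
const APPROX_MAX_DURATION_SEC: { [provider: string]: number | undefined } = {
  suno: 240,
  "stable-audio": 47,
};

// Hypothetical helper: returns the duration to request and whether it was reduced.
function clampDuration(provider: string, requestedSec: number): { durationSec: number; clamped: boolean } {
  if (!Number.isFinite(requestedSec) || requestedSec <= 0) {
    throw new RangeError(`durationSec must be a positive number, got ${requestedSec}`);
  }
  const cap = APPROX_MAX_DURATION_SEC[provider];
  if (cap === undefined || requestedSec <= cap) {
    return { durationSec: requestedSec, clamped: false };
  }
  return { durationSec: cap, clamped: true };
}

console.log(clampDuration("stable-audio", 60)); // { durationSec: 47, clamped: true }
console.log(clampDuration("suno", 60)); // { durationSec: 60, clamped: false }
```

The `clamped` flag lets the caller warn the user instead of silently shortening the track.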
Sound Effect Generation
Basic Usage
Generate a sound effect from a text description. The SFX detection order is: `ELEVENLABS_API_KEY` (highest quality) -> `STABILITY_API_KEY` -> `REPLICATE_API_TOKEN` -> `FAL_API_KEY` -> local AudioGen (no key required).

```typescript
import { generateSFX } from 'agentos';

const result = await generateSFX({
  prompt: 'Thunder crack followed by heavy rain on a tin roof',
  durationSec: 5,
});

console.log(result.audio[0].url);
```
Prompt Tips for Sound Effects
- Be specific about the sound: "glass bottle shattering on concrete floor" rather than just "glass breaking"
- Describe the environment: "footsteps on gravel in an empty parking garage with echo"
- Layer multiple sounds: "busy city intersection with car horns, distant sirens, and pedestrian chatter"
- Specify duration context: short stingers (1-3s) vs ambient loops (10-15s)
- Include physical properties: "heavy wooden door creaking open slowly", "small metallic click of a light switch"
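The duration tip above separates short stingers (1-3 s) from ambient loops (10-15 s). A small helper can turn that guidance into a `durationSec` value. This is a sketch: `suggestSfxDuration` and the two category names are hypothetical. The ranges come from the tip, and defaulting to the midpoint is an arbitrary choice.

```typescript
// Hypothetical mapping from the SFX categories in the tips to a durationSec range.
type SfxKind = "stinger" | "ambient-loop";

const SFX_DURATION_RANGE: Record<SfxKind, readonly [number, number]> = {
  stinger: [1, 3],
  "ambient-loop": [10, 15],
};

// Without a request, use the midpoint of the range; otherwise clamp the request into it.
function suggestSfxDuration(kind: SfxKind, requestedSec?: number): number {
  const [min, max] = SFX_DURATION_RANGE[kind];
  if (requestedSec === undefined) return (min + max) / 2;
  return Math.min(max, Math.max(min, requestedSec));
}

console.log(suggestSfxDuration("stinger")); // 2
console.log(suggestSfxDuration("ambient-loop", 30)); // 15
```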
SFX Options
| Option | Default | Description |
|---|---|---|
| `prompt` | (required) | Text description of the desired sound effect |
| `provider` | auto-detect | Provider ID (…) |
| … | provider default | Model identifier within the provider |
| `durationSec` | provider default | Output duration in seconds (SFX is typically 1-15 s) |
| … | … | Output format: … |
| … | random | Seed for reproducible output (provider-dependent) |
| … | … | Number of clips to generate |
| … | provider default | Maximum wait time before polling providers time out |
| … | none | Best-effort status-tracking callback (…) |
| `providerPreferences` | none | Reorder, block, or weight providers for auto-selection and fallback |
Provider Selection Guide
Music Providers
| Provider | ID | Best For | Env Var | Key Required |
|---|---|---|---|---|
| Suno | `suno` | Highest-quality vocals + instrumentals, full songs | `SUNO_API_KEY` | Yes |
| Udio | `udio` | High-quality music, alternative to Suno | `UDIO_API_KEY` | Yes |
| Stable Audio | `stable-audio` | Instrumentals, loops, ambient, fast generation | `STABILITY_API_KEY` | Yes |
| Replicate | … | Open-source models (MusicGen), pay-per-use | `REPLICATE_API_TOKEN` | Yes |
| Fal | … | Fast serverless GPU, cost-effective | `FAL_API_KEY` | Yes |
| MusicGen Local | `musicgen-local` | Offline generation, no API key needed, privacy | none | No |
SFX Providers
| Provider | ID | Best For | Env Var | Key Required |
|---|---|---|---|---|
| ElevenLabs | … | Highest-quality SFX, fast turnaround | `ELEVENLABS_API_KEY` | Yes |
| Stable Audio | `stable-audio` | Good SFX + music in one provider | `STABILITY_API_KEY` | Yes |
| Replicate | … | Open-source AudioGen model, pay-per-use | `REPLICATE_API_TOKEN` | Yes |
| Fal | … | Fast serverless GPU | `FAL_API_KEY` | Yes |
| AudioGen Local | … | Offline SFX generation, no API key needed | none | No |
Forcing a Specific Provider
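```typescript
import { generateMusic } from 'agentos';

const result = await generateMusic({
  prompt: 'Chill synthwave with arpeggiated synths',
  provider: 'stable-audio',
  // Placeholder key: load it from configuration rather than hardcoding it in source.
  apiKey: 'your-stability-key',
  durationSec: 30,
});
```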
Provider Preferences
Use `providerPreferences` to control both auto-selection and fallback ordering without hardcoding a single provider. This is useful for load balancing, cost optimization, or respecting user preferences.

```typescript
import { generateMusic } from 'agentos';

// Prefer Suno, fall back to Stable Audio, never use Udio
const result = await generateMusic({
  prompt: 'Orchestral film score with dramatic strings',
  providerPreferences: {
    preferred: ['suno', 'stable-audio'],
    blocked: ['udio'],
  },
});
```
Preference Fields
| Field | Description |
|---|---|
| `preferred` | Ordered list of provider IDs to try first. Providers not in this list are excluded. |
| `blocked` | Provider IDs to unconditionally exclude from the chain. |
| … | Weight map for weighted primary selection after filtering/reordering (useful for A/B testing or load balancing). |
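The table describes three steps: drop blocked providers, restrict to the preferred list in its order, then optionally pick the primary provider by weight. The following is a minimal sketch of that pipeline. `applyPreferences` is hypothetical, and `weights` is an assumed name for the weight-map field, whose real name is not shown here. The real library may resolve ties or missing weights differently.

```typescript
// Hypothetical model of how providerPreferences could reshape a provider chain.
// `preferred` and `blocked` match the documented fields; `weights` is an assumed name.
interface ProviderPreferences {
  preferred?: string[];
  blocked?: string[];
  weights?: Record<string, number>;
}

function applyPreferences(
  chain: readonly string[],
  prefs: ProviderPreferences,
  random: () => number = Math.random,
): string[] {
  // 1. Drop blocked providers.
  const blocked = new Set(prefs.blocked ?? []);
  let result = chain.filter((id) => !blocked.has(id));

  // 2. If a preferred list is given, keep only those providers, in that order.
  const preferred = prefs.preferred;
  if (preferred && preferred.length > 0) {
    const available = new Set(result);
    result = preferred.filter((id) => available.has(id));
  }

  // 3. Optionally choose the primary by weight; the rest stay in order as fallbacks.
  const weights = prefs.weights;
  if (weights && result.length > 1) {
    const weightOf = (id: string) => Math.max(0, weights[id] ?? 0);
    const total = result.reduce((sum, id) => sum + weightOf(id), 0);
    if (total > 0) {
      let r = random() * total;
      let primary = result[0] as string; // safe: result.length > 1
      for (const id of result) {
        r -= weightOf(id);
        if (r < 0) {
          primary = id;
          break;
        }
      }
      result = [primary, ...result.filter((id) => id !== primary)];
    }
  }
  return result;
}

const chain = ["suno", "udio", "stable-audio", "musicgen-local"];
console.log(applyPreferences(chain, { preferred: ["suno", "stable-audio"], blocked: ["udio"] }));
// [ 'suno', 'stable-audio' ]
```

Injecting `random` makes the weighted choice deterministic in tests. In production the default `Math.random` spreads traffic roughly in proportion to the weights.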
Provider preferences work identically across `generateMusic()`, `generateSFX()`, `generateImage()`, and `generateVideo()`.
When to Use Music vs SFX vs TTS
| Need | API | Why |
|---|---|---|
| Background music, songs, jingles | `generateMusic()` | Optimized for musical compositions with melody, harmony, and rhythm |
| Sound effects, foley, ambient sounds | `generateSFX()` | Optimized for short, non-musical audio (impacts, nature, UI sounds) |
| Speech, narration, voice cloning | TTS (speech subsystem) | Use the speech/TTS APIs instead; audio generation is for non-speech audio |
| Podcast intros with music + voice | Combine both | Generate music with … (see Combining Audio) |
Combining Audio
The audio generation APIs return URLs or base64 data that can be combined in downstream workflows:
- Generate background music: `generateMusic({ prompt: 'Gentle ambient pad' })`
- Generate SFX stingers: `generateSFX({ prompt: 'Notification chime' })`
- Generate speech: use the TTS subsystem for narration
- Mix: use ffmpeg or a Web Audio API pipeline to layer the tracks
```typescript
import { generateMusic, generateSFX } from 'agentos';

// Generate assets in parallel
const [music, sfx] = await Promise.all([
  generateMusic({ prompt: 'Calm podcast background music', durationSec: 120 }),
  generateSFX({ prompt: 'Soft transition whoosh', durationSec: 2 }),
]);

// Use the URLs/base64 data in your mixing pipeline
console.log('Music:', music.audio[0].url);
console.log('SFX:', sfx.audio[0].url);
```
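For the ffmpeg mixing step, the following sketch builds an argument list that lowers the music bed, delays the SFX, and sums the two streams. It uses the standard ffmpeg `volume`, `adelay`, and `amix` filters. `buildFfmpegMixArgs` and the file names are hypothetical, and the sketch assumes you have already saved the generated audio to local files.

```typescript
// Hypothetical helper: ffmpeg arguments to lay a short SFX over a background track.
interface MixOptions {
  musicPath: string;
  sfxPath: string;
  outputPath: string;
  musicVolume?: number; // linear gain for the music bed (0.3 is roughly -10 dB)
  sfxDelayMs?: number; // when the SFX starts, relative to the music
}

function buildFfmpegMixArgs(opts: MixOptions): string[] {
  const volume = opts.musicVolume ?? 0.3;
  const delay = Math.max(0, Math.round(opts.sfxDelayMs ?? 0));
  // volume: attenuate the music; adelay: shift the SFX (one delay per channel);
  // amix: sum both streams, keeping the length of the first input (the music).
  const filter =
    `[0:a]volume=${volume}[bg];` +
    `[1:a]adelay=${delay}|${delay}[fx];` +
    `[bg][fx]amix=inputs=2:duration=first[out]`;
  return ["-y", "-i", opts.musicPath, "-i", opts.sfxPath, "-filter_complex", filter, "-map", "[out]", opts.outputPath];
}

const args = buildFfmpegMixArgs({
  musicPath: "music.mp3",
  sfxPath: "whoosh.mp3",
  outputPath: "intro.mp3",
  musicVolume: 0.25,
  sfxDelayMs: 1500,
});
console.log(["ffmpeg", ...args].join(" "));
```

To run it, pass `args` to `execFile("ffmpeg", args)` from `node:child_process`. Passing an argument array avoids shell quoting issues with `|` and `[`. Note that `amix` rescales input levels by default, so check the loudness of the result.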
Local Providers (No API Key)
Both MusicGen and AudioGen can run locally without any API keys using Hugging Face Transformers.js. The models are downloaded on first use and cached locally.
Requirements:
- `@huggingface/transformers` must be installed as a peer dependency
- Sufficient RAM for model inference (MusicGen Small ~1GB, AudioGen Medium ~2GB)
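Before choosing a local backend, you could compare free memory against the rough figures above. This sketch uses Node's real `os.freemem()`. The helper name and byte thresholds are hypothetical, with ~1GB and ~2GB taken as GiB. Free memory is only a rough signal, because the OS may reclaim caches on demand.

```typescript
import { freemem } from "node:os";

// Approximate RAM needs quoted in the requirements above (treated as GiB).
const APPROX_MODEL_RAM_BYTES = {
  "MusicGen Small": 1 * 1024 ** 3,
  "AudioGen Medium": 2 * 1024 ** 3,
} as const;

type LocalModel = keyof typeof APPROX_MODEL_RAM_BYTES;

// Hypothetical pre-flight check; defaults to the current free memory reported by the OS.
function hasRoomForLocalModel(model: LocalModel, freeBytes: number = freemem()): boolean {
  return freeBytes >= APPROX_MODEL_RAM_BYTES[model];
}

console.log(hasRoomForLocalModel("MusicGen Small"));
```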
```typescript
import { generateMusic } from 'agentos';

// Explicitly use local generation
const result = await generateMusic({
  prompt: 'Simple piano melody',
  provider: 'musicgen-local',
});
```

Local providers are automatically used as the last fallback when no cloud API keys are configured.
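A fallback chain tries each provider in order and ends with the keyless local backend. The library handles this for you. The sketch below only illustrates the pattern with stub providers. `withFallback` and `Attempt` are hypothetical names. Moving on after a runtime error, and not only after a missing key, is an assumption of this sketch rather than documented behavior.

```typescript
// Hypothetical illustration of a fallback chain: try each provider in order,
// move on when one throws, and report every failure if all of them fail.
type Attempt<T> = { name: string; run: () => Promise<T> };

async function withFallback<T>(attempts: Attempt<T>[]): Promise<{ name: string; value: T }> {
  const errors: string[] = [];
  for (const { name, run } of attempts) {
    try {
      return { name, value: await run() };
    } catch (err) {
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed:\n${errors.join("\n")}`);
}

// Usage with stub providers (no real API calls):
withFallback([
  { name: "cloud-provider", run: async () => { throw new Error("no API key configured"); } },
  { name: "local-musicgen", run: async () => "generated-locally" },
]).then(({ name, value }) => console.log(name, value)); // logs: local-musicgen generated-locally
```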
Prerequisites
- At least one audio provider API key for cloud generation, or `@huggingface/transformers` for local generation
- For music: `SUNO_API_KEY`, `UDIO_API_KEY`, `STABILITY_API_KEY`, `REPLICATE_API_TOKEN`, or `FAL_API_KEY`
- For SFX: `ELEVENLABS_API_KEY`, `STABILITY_API_KEY`, `REPLICATE_API_TOKEN`, or `FAL_API_KEY`
Examples
- "Generate a 60-second lo-fi hip hop beat for a study playlist"
- "Create a thunder and rain sound effect for my podcast intro"
- "Make upbeat electronic music for a product demo video"
- "Generate a notification chime sound effect"
- "Create ambient forest sounds with birds and a gentle stream"
- "Generate a dramatic orchestral score for a trailer"
- "Make a retro 8-bit video game soundtrack"
- "Create footstep sounds on different surfaces — wood, gravel, snow"