audio-generation

Audio Generation (Music & Sound Effects)

Use this skill when the user wants to generate music compositions or sound effects from text descriptions. The system supports 8 provider backends with automatic fallback chains and user-configurable provider preferences.
This skill covers two complementary APIs:
  1. generateMusic() — Full-length musical compositions from text prompts
  2. generateSFX() — Short sound effects from text descriptions

Music Generation

Basic Usage

Generate music from a text prompt. The system auto-detects the best available provider from environment variables in priority order: SUNO_API_KEY (highest quality) -> UDIO_API_KEY -> STABILITY_API_KEY -> REPLICATE_API_TOKEN -> FAL_API_KEY -> local MusicGen (no key required).
typescript
import { generateMusic } from 'agentos';

const result = await generateMusic({
  prompt: 'Upbeat lo-fi hip hop beat with vinyl crackle and mellow piano',
  durationSec: 60,
});
console.log(result.audio[0].url);
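The priority order described above can be pictured as a first-match scan over environment variables. The following is a minimal sketch of that idea, not the library's actual implementation: `detectMusicProvider`, `MUSIC_PRIORITY`, and the `Env` type are hypothetical names, and the mapping from each variable to a provider ID is taken from the provider IDs listed elsewhere in this document.

```typescript
// Hypothetical sketch of priority-ordered provider detection.
// Env var names and provider IDs come from this document; the function
// itself is illustrative and is not part of the agentos API.
type Env = Record<string, string | undefined>;

const MUSIC_PRIORITY: ReadonlyArray<{ envVar: string; providerId: string }> = [
  { envVar: 'SUNO_API_KEY', providerId: 'suno' },
  { envVar: 'UDIO_API_KEY', providerId: 'udio' },
  { envVar: 'STABILITY_API_KEY', providerId: 'stable-audio' },
  { envVar: 'REPLICATE_API_TOKEN', providerId: 'replicate-audio' },
  { envVar: 'FAL_API_KEY', providerId: 'fal-audio' },
];

// Returns the first provider whose key is set and non-blank,
// or the local fallback when no key is configured.
function detectMusicProvider(env: Env): string {
  for (const { envVar, providerId } of MUSIC_PRIORITY) {
    const value = env[envVar];
    if (value !== undefined && value.trim() !== '') {
      return providerId;
    }
  }
  return 'musicgen-local'; // no key required
}
```

Under these assumptions, an environment with only STABILITY_API_KEY and FAL_API_KEY set resolves to 'stable-audio', and an empty environment resolves to 'musicgen-local'.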

Prompt Tips for Music

  • Specify genre and mood first: "melancholic jazz ballad", "aggressive drum and bass", "peaceful ambient soundscape"
  • Include instrumentation: "with acoustic guitar, soft brushed drums, and upright bass"
  • Mention tempo and energy: "slow tempo, 70 BPM", "high energy, driving rhythm"
  • Add texture and production: "lo-fi with vinyl crackle", "clean studio recording", "reverb-heavy shoegaze"
  • Reference eras or styles: "1970s progressive rock", "modern trap production", "classical baroque harpsichord"
  • Use negative prompts where supported:
    negativePrompt: 'vocals, singing, lyrics'
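One way to apply the tips above consistently is to assemble the prompt from named parts in the suggested order (genre and mood first, then instrumentation, tempo, texture, and era). This is only a sketch: `buildMusicPrompt` and the `MusicPromptParts` field names are hypothetical and are not part of the agentos API.

```typescript
// Hypothetical helper that assembles a music prompt in the order suggested above.
// Only `genreAndMood` is required; all field names are illustrative.
interface MusicPromptParts {
  genreAndMood: string;     // e.g. 'melancholic jazz ballad'
  instrumentation?: string; // e.g. 'with acoustic guitar and upright bass'
  tempo?: string;           // e.g. 'slow tempo, 70 BPM'
  texture?: string;         // e.g. 'lo-fi with vinyl crackle'
  era?: string;             // e.g. '1970s progressive rock'
}

// Joins the non-empty parts with commas, skipping anything missing or blank.
function buildMusicPrompt(parts: MusicPromptParts): string {
  return [parts.genreAndMood, parts.instrumentation, parts.tempo, parts.texture, parts.era]
    .filter((p): p is string => typeof p === 'string' && p.trim() !== '')
    .map((p) => p.trim())
    .join(', ');
}

const musicPrompt = buildMusicPrompt({
  genreAndMood: 'melancholic jazz ballad',
  instrumentation: 'with acoustic guitar, soft brushed drums, and upright bass',
  tempo: 'slow tempo, 70 BPM',
});
```

The resulting string can then be passed as the prompt option to generateMusic().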

Music Options

| Option | Default | Description |
| --- | --- | --- |
| prompt | (required) | Text description of the desired music |
| provider | auto-detect | Provider ID ("suno", "udio", "stable-audio", etc.) |
| model | provider default | Model identifier within the provider |
| durationSec | provider default | Output duration in seconds (Suno: up to ~240s, Stable Audio: ~47s) |
| negativePrompt | - | Musical elements to avoid (not all providers support this) |
| outputFormat | "mp3" | Output format: "mp3", "wav", "flac", "ogg", "aac" |
| seed | random | Seed for reproducible output (provider-dependent) |
| n | 1 | Number of clips to generate |
| timeoutMs | provider default | Max wait time before polling providers time out |
| onProgress | - | Best-effort callback for status updates: queued, processing, complete / failed |
| providerPreferences | - | Reorder, block, or weight providers for auto-selection and fallback |
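Because the usable range of durationSec differs by provider, it can help to clamp a requested duration before calling the API. The sketch below assumes the approximate ceilings quoted in the table (about 240s for Suno, about 47s for Stable Audio); `clampDuration` and `APPROX_MAX_DURATION_SEC` are hypothetical names, and the figures are rough values from this document rather than authoritative limits.

```typescript
// Approximate per-provider duration ceilings quoted in this document.
// These are rough figures, not authoritative limits.
const APPROX_MAX_DURATION_SEC: Record<string, number> = {
  suno: 240,
  'stable-audio': 47,
};

// Hypothetical helper: clamp a requested duration to the provider's rough ceiling.
// Providers without a known ceiling pass the value through unchanged.
function clampDuration(providerId: string, requestedSec: number): number {
  const max = APPROX_MAX_DURATION_SEC[providerId];
  if (max === undefined) return requestedSec;
  return Math.min(requestedSec, max);
}
```

For example, a 60-second request aimed at 'stable-audio' would be reduced to 47 seconds, while the same request for 'suno' would pass through unchanged.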

Sound Effect Generation

Basic Usage

Generate a sound effect from a text description. The SFX detection order is: ELEVENLABS_API_KEY (highest quality) -> STABILITY_API_KEY -> REPLICATE_API_TOKEN -> FAL_API_KEY -> local AudioGen (no key required).
typescript
import { generateSFX } from 'agentos';

const result = await generateSFX({
  prompt: 'Thunder crack followed by heavy rain on a tin roof',
  durationSec: 5,
});
console.log(result.audio[0].url);

Prompt Tips for Sound Effects

  • Be specific about the sound: "glass bottle shattering on concrete floor" rather than just "glass breaking"
  • Describe the environment: "footsteps on gravel in an empty parking garage with echo"
  • Layer multiple sounds: "busy city intersection with car horns, distant sirens, and pedestrian chatter"
  • Specify duration context: short stingers (1-3s) vs ambient loops (10-15s)
  • Include physical properties: "heavy wooden door creaking open slowly", "small metallic click of a light switch"

SFX Options

| Option | Default | Description |
| --- | --- | --- |
| prompt | (required) | Text description of the desired sound effect |
| provider | auto-detect | Provider ID ("elevenlabs-sfx", "stable-audio", etc.) |
| model | provider default | Model identifier within the provider |
| durationSec | provider default | Output duration in seconds (SFX is typically 1-15s) |
| outputFormat | "mp3" | Output format: "mp3", "wav", "flac", "ogg", "aac" |
| seed | random | Seed for reproducible output (provider-dependent) |
| n | 1 | Number of clips to generate |
| timeoutMs | provider default | Max wait time before polling providers time out |
| onProgress | - | Best-effort callback for status updates: queued, processing, complete / failed |
| providerPreferences | - | Reorder, block, or weight providers for auto-selection and fallback |

Provider Selection Guide

Music Providers

| Provider | ID | Best For | Env Var | Key Required |
| --- | --- | --- | --- | --- |
| Suno | suno | Highest quality vocals + instrumentals, full songs | SUNO_API_KEY | Yes |
| Udio | udio | High quality music, alternative to Suno | UDIO_API_KEY | Yes |
| Stable Audio | stable-audio | Instrumentals, loops, ambient, fast generation | STABILITY_API_KEY | Yes |
| Replicate | replicate-audio | Open-source models (MusicGen), pay-per-use | REPLICATE_API_TOKEN | Yes |
| Fal | fal-audio | Fast serverless GPU, cost-effective | FAL_API_KEY | Yes |
| MusicGen Local | musicgen-local | Offline generation, no API key needed, privacy | - | No |

SFX Providers

| Provider | ID | Best For | Env Var | Key Required |
| --- | --- | --- | --- | --- |
| ElevenLabs | elevenlabs-sfx | Highest quality SFX, fast turnaround | ELEVENLABS_API_KEY | Yes |
| Stable Audio | stable-audio | Good SFX + music in one provider | STABILITY_API_KEY | Yes |
| Replicate | replicate-audio | Open-source AudioGen model, pay-per-use | REPLICATE_API_TOKEN | Yes |
| Fal | fal-audio | Fast serverless GPU | FAL_API_KEY | Yes |
| AudioGen Local | audiogen-local | Offline SFX generation, no API key needed | - | No |

Forcing a Specific Provider

typescript
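import { generateMusic } from 'agentos';

// Bypass auto-detection by naming the provider and passing its key explicitly
const result = await generateMusic({
  prompt: 'Chill synthwave with arpeggiated synths',
  provider: 'stable-audio',
  apiKey: 'your-stability-key',
  durationSec: 30,
});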

Provider Preferences

Use providerPreferences to control both auto-selection and fallback ordering without hardcoding a single provider. This is useful for load balancing, cost optimization, or respecting user preferences.
typescript
import { generateMusic } from 'agentos';

// Prefer Suno, fall back to Stable Audio, never use Udio
const result = await generateMusic({
  prompt: 'Orchestral film score with dramatic strings',
  providerPreferences: {
    preferred: ['suno', 'stable-audio'],
    blocked: ['udio'],
  },
});

Preference Fields

| Field | Description |
| --- | --- |
| preferred | Ordered list of provider IDs to try first. Providers not in this list are excluded. |
| blocked | Provider IDs to unconditionally exclude from the chain. |
| weights | Weight map for weighted primary selection after filtering/reordering (useful for A/B testing or load balancing). |

Provider preferences work identically across generateMusic(), generateSFX(), generateImage(), and generateVideo().
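To make the interaction between the three preference fields concrete, here is a minimal sketch of how a fallback chain might be resolved from them. It follows the field descriptions given here, but the actual resolution logic inside agentos may differ: `resolveChain`, the `ProviderPreferences` interface shape, and the injectable `random` parameter are all assumptions made for illustration.

```typescript
// Hypothetical sketch of how preferred/blocked/weights might shape a fallback chain.
// It mirrors the field descriptions in this document; it is not the agentos implementation.
interface ProviderPreferences {
  preferred?: string[];
  blocked?: string[];
  weights?: Record<string, number>;
}

function resolveChain(
  available: string[],
  prefs: ProviderPreferences = {},
  random: () => number = Math.random, // injectable for deterministic tests
): string[] {
  const blocked = new Set(prefs.blocked ?? []);
  // If `preferred` is given, keep only those providers, in the preferred order.
  const ordered = prefs.preferred && prefs.preferred.length > 0
    ? prefs.preferred.filter((id) => available.includes(id))
    : [...available];
  const chain = ordered.filter((id) => !blocked.has(id));

  // Optional weighted pick of the primary provider; the rest keep their order.
  const weights = prefs.weights;
  if (!weights || chain.length < 2) return chain;
  const total = chain.reduce((sum, id) => sum + Math.max(weights[id] ?? 0, 0), 0);
  if (total <= 0) return chain;
  let threshold = random() * total;
  for (const id of chain) {
    threshold -= Math.max(weights[id] ?? 0, 0);
    if (threshold < 0) {
      return [id, ...chain.filter((other) => other !== id)];
    }
  }
  return chain;
}
```

In this sketch, filtering and reordering happen first and weighting only decides which surviving provider is tried first, which matches the "after filtering/reordering" wording in the weights description.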

When to Use Music vs SFX vs TTS

| Need | API | Why |
| --- | --- | --- |
| Background music, songs, jingles | generateMusic() | Optimized for musical compositions with melody, harmony, rhythm |
| Sound effects, foley, ambient sounds | generateSFX() | Optimized for short, non-musical audio (impacts, nature, UI sounds) |
| Speech, narration, voice cloning | TTS (speech subsystem) | Use the speech/TTS APIs instead; audio generation is for non-speech audio |
| Podcast intros with music + voice | Combine both | Generate music with generateMusic(), speech with TTS, mix externally |

Combining Audio

The audio generation APIs return URLs or base64 data that can be combined in downstream workflows:
  1. Generate background music:
    generateMusic({ prompt: 'Gentle ambient pad' })
  2. Generate SFX stingers:
    generateSFX({ prompt: 'Notification chime' })
  3. Generate speech: Use the TTS subsystem for narration
  4. Mix: Use ffmpeg or a Web Audio API pipeline to layer the tracks
typescript
import { generateMusic, generateSFX } from 'agentos';

// Generate assets in parallel
const [music, sfx] = await Promise.all([
  generateMusic({ prompt: 'Calm podcast background music', durationSec: 120 }),
  generateSFX({ prompt: 'Soft transition whoosh', durationSec: 2 }),
]);

// Use the URLs/base64 data in your mixing pipeline
console.log('Music:', music.audio[0].url);
console.log('SFX:', sfx.audio[0].url);
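For the mixing step, one option is to hand the generated files to ffmpeg. The sketch below only builds the argument list, using ffmpeg's volume, adelay, and amix audio filters. It assumes both assets have already been saved to local files; `buildMixArgs`, the `MixOptions` fields, and the default values are hypothetical choices for illustration.

```typescript
// Hypothetical helper: build ffmpeg arguments that layer an SFX clip over music.
// Assumes both assets were already saved to local files; paths are placeholders.
interface MixOptions {
  musicPath: string;
  sfxPath: string;
  outputPath: string;
  musicVolume?: number; // linear gain for the music bed, default 0.3
  sfxDelayMs?: number;  // when the SFX starts, default 0
}

function buildMixArgs(opts: MixOptions): string[] {
  const volume = opts.musicVolume ?? 0.3;
  const delay = Math.max(0, Math.round(opts.sfxDelayMs ?? 0));
  const filter = [
    `[0:a]volume=${volume}[bed]`,            // lower the music level
    `[1:a]adelay=${delay}|${delay}[fx]`,     // delay both stereo channels
    `[bed][fx]amix=inputs=2:duration=first`, // output lasts as long as the music
  ].join(';');
  return ['-y', '-i', opts.musicPath, '-i', opts.sfxPath, '-filter_complex', filter, opts.outputPath];
}
```

The returned array can be passed to a process runner, for example Node's child_process.spawn('ffmpeg', args), once the audio has been downloaded or decoded from base64.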

Local Providers (No API Key)

Both MusicGen and AudioGen can run locally without any API keys using Hugging Face Transformers.js. The models are downloaded on first use and cached locally.
Requirements:
  • @huggingface/transformers must be installed as a peer dependency
  • Sufficient RAM for model inference (MusicGen Small ~1GB, AudioGen Medium ~2GB)
typescript
import { generateMusic } from 'agentos';

// Explicitly use local generation
const result = await generateMusic({
  prompt: 'Simple piano melody',
  provider: 'musicgen-local',
});
Local providers are automatically used as the last fallback when no cloud API keys are configured.

Prerequisites

  • At least one audio provider API key for cloud generation, OR @huggingface/transformers for local generation
  • For music: SUNO_API_KEY, UDIO_API_KEY, STABILITY_API_KEY, REPLICATE_API_TOKEN, or FAL_API_KEY
  • For SFX: ELEVENLABS_API_KEY, STABILITY_API_KEY, REPLICATE_API_TOKEN, or FAL_API_KEY

Examples

  • "Generate a 60-second lo-fi hip hop beat for a study playlist"
  • "Create a thunder and rain sound effect for my podcast intro"
  • "Make upbeat electronic music for a product demo video"
  • "Generate a notification chime sound effect"
  • "Create ambient forest sounds with birds and a gentle stream"
  • "Generate a dramatic orchestral score for a trailer"
  • "Make a retro 8-bit video game soundtrack"
  • "Create footstep sounds on different surfaces — wood, gravel, snow"