audio-generation

Audio Generation (Music & Sound Effects)

Use this skill when the user wants to generate music compositions or sound effects from text descriptions. The system supports 8 provider backends with automatic fallback chains and user-configurable provider preferences.

This skill covers two complementary APIs:

generateMusic() — Full-length musical compositions from text prompts
generateSFX() — Short sound effects from text descriptions

Music Generation

Basic Usage

Generate music from a text prompt. The system auto-detects the best available provider from environment variables, in priority order: `SUNO_API_KEY` (highest quality) -> `UDIO_API_KEY` -> `STABILITY_API_KEY` -> `REPLICATE_API_TOKEN` -> `FAL_API_KEY` -> local MusicGen (no key required).

```typescript
import { generateMusic } from 'agentos';

const result = await generateMusic({
  prompt: 'Upbeat lo-fi hip hop beat with vinyl crackle and mellow piano',
  durationSec: 60,
});
console.log(result.audio[0].url);
```
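
The priority order described above can be pictured as a first-match scan over environment variables. The following is only a minimal sketch of that idea, not the library's actual detection code: `detectMusicProvider`, `MUSIC_PRIORITY`, and the `Env` type are hypothetical names, and the provider IDs are the ones listed in this document's provider tables.

```typescript
// Hypothetical sketch: mirrors the documented music priority order.
// The real detection logic inside agentos may differ.
type Env = Record<string, string | undefined>;

const MUSIC_PRIORITY: ReadonlyArray<{ envVar: string; provider: string }> = [
  { envVar: 'SUNO_API_KEY', provider: 'suno' },
  { envVar: 'UDIO_API_KEY', provider: 'udio' },
  { envVar: 'STABILITY_API_KEY', provider: 'stable-audio' },
  { envVar: 'REPLICATE_API_TOKEN', provider: 'replicate-audio' },
  { envVar: 'FAL_API_KEY', provider: 'fal-audio' },
];

function detectMusicProvider(env: Env): string {
  for (const { envVar, provider } of MUSIC_PRIORITY) {
    // Unset and empty-string values are both treated as "not configured".
    if (env[envVar]) return provider;
  }
  // No cloud key found: fall back to the local provider.
  return 'musicgen-local';
}
```

In a Node.js process this would typically be called as `detectMusicProvider(process.env)`.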

Prompt Tips for Music

Specify genre and mood first: "melancholic jazz ballad", "aggressive drum and bass", "peaceful ambient soundscape"
Include instrumentation: "with acoustic guitar, soft brushed drums, and upright bass"
Mention tempo and energy: "slow tempo, 70 BPM", "high energy, driving rhythm"
Add texture and production: "lo-fi with vinyl crackle", "clean studio recording", "reverb-heavy shoegaze"
Reference eras or styles: "1970s progressive rock", "modern trap production", "classical baroque harpsichord"

Use negative prompts where supported: `negativePrompt: 'vocals, singing, lyrics'`
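
One way to apply these tips consistently is to assemble the prompt from named parts, genre and mood first. This is an illustrative sketch only: `MusicPromptParts` and `buildMusicPrompt` are hypothetical helpers and are not part of agentos.

```typescript
// Hypothetical helper; none of these names come from agentos.
interface MusicPromptParts {
  genreAndMood: string;      // e.g. "melancholic jazz ballad"
  instrumentation?: string;  // e.g. "acoustic guitar, soft brushed drums"
  tempo?: string;            // e.g. "slow tempo, 70 BPM"
  production?: string;       // e.g. "lo-fi with vinyl crackle"
  era?: string;              // e.g. "1970s progressive rock"
}

function buildMusicPrompt(parts: MusicPromptParts): string {
  const segments = [
    parts.genreAndMood, // genre and mood always lead the prompt
    parts.instrumentation && `with ${parts.instrumentation}`,
    parts.tempo,
    parts.production,
    parts.era,
  ];
  // Drop the parts that were not provided and join the rest.
  return segments.filter((s): s is string => Boolean(s)).join(', ');
}
```

The resulting string can be passed as the `prompt` option.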

Music Options

Option	Default	Description
`prompt`	(required)	Text description of the desired music
`provider`	auto-detect	Provider ID ( `"suno"` , `"udio"` , `"stable-audio"` , etc.)
`model`	provider default	Model identifier within the provider
`durationSec`	provider default	Output duration in seconds (Suno: up to ~240s, Stable Audio: ~47s)
`negativePrompt`	—	Musical elements to avoid (not all providers support this)
`outputFormat`	`"mp3"`	Output format: `"mp3"` , `"wav"` , `"flac"` , `"ogg"` , `"aac"`
`seed`	random	Seed for reproducible output (provider-dependent)
`n`	`1`	Number of clips to generate
`timeoutMs`	provider default	Maximum time to wait for polling-based providers before timing out
`onProgress`	—	Best-effort callback for `queued` → `processing` → `complete` / `failed`
`providerPreferences`	—	Reorder, block, or weight providers for auto-selection and fallback
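
Because duration limits differ by provider, it can help to clamp a requested duration before sending it. The sketch below uses the approximate figures from the `durationSec` row above (Suno ~240s, Stable Audio ~47s). The helper name and the lookup table are hypothetical, and the real limits may change.

```typescript
// Approximate limits taken from the durationSec row above; treat them as assumptions.
const APPROX_MAX_DURATION_SEC: Record<string, number> = {
  suno: 240,
  'stable-audio': 47,
};

// Hypothetical helper: caps the request at the provider's approximate maximum.
function clampDuration(provider: string, requestedSec: number): number {
  const max = APPROX_MAX_DURATION_SEC[provider];
  if (max === undefined) return requestedSec; // unknown limit: pass through unchanged
  return Math.min(requestedSec, max);
}
```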

Sound Effect Generation

Basic Usage

Generate a sound effect from a text description. The SFX detection order is: `ELEVENLABS_API_KEY` (highest quality) -> `STABILITY_API_KEY` -> `REPLICATE_API_TOKEN` -> `FAL_API_KEY` -> local AudioGen (no key required).

```typescript
import { generateSFX } from 'agentos';

const result = await generateSFX({
  prompt: 'Thunder crack followed by heavy rain on a tin roof',
  durationSec: 5,
});
console.log(result.audio[0].url);
```

Prompt Tips for Sound Effects

Be specific about the sound: "glass bottle shattering on concrete floor" rather than just "glass breaking"
Describe the environment: "footsteps on gravel in an empty parking garage with echo"
Layer multiple sounds: "busy city intersection with car horns, distant sirens, and pedestrian chatter"
Specify duration context: short stingers (1-3s) vs ambient loops (10-15s)
Include physical properties: "heavy wooden door creaking open slowly", "small metallic click of a light switch"

SFX Options

Option	Default	Description
`prompt`	(required)	Text description of the desired sound effect
`provider`	auto-detect	Provider ID ( `"elevenlabs-sfx"` , `"stable-audio"` , etc.)
`model`	provider default	Model identifier within the provider
`durationSec`	provider default	Output duration in seconds (SFX is typically 1-15s)
`outputFormat`	`"mp3"`	Output format: `"mp3"` , `"wav"` , `"flac"` , `"ogg"` , `"aac"`
`seed`	random	Seed for reproducible output (provider-dependent)
`n`	`1`	Number of clips to generate
`timeoutMs`	provider default	Maximum time to wait for polling-based providers before timing out
`onProgress`	—	Best-effort callback for `queued` → `processing` → `complete` / `failed`
`providerPreferences`	—	Reorder, block, or weight providers for auto-selection and fallback

Provider Selection Guide

Music Providers

Provider	ID	Best For	Env Var	Key Required
Suno	`suno`	Highest quality vocals + instrumentals, full songs	`SUNO_API_KEY`	Yes
Udio	`udio`	High quality music, alternative to Suno	`UDIO_API_KEY`	Yes
Stable Audio	`stable-audio`	Instrumentals, loops, ambient, fast generation	`STABILITY_API_KEY`	Yes
Replicate	`replicate-audio`	Open-source models (MusicGen), pay-per-use	`REPLICATE_API_TOKEN`	Yes
Fal	`fal-audio`	Fast serverless GPU, cost-effective	`FAL_API_KEY`	Yes
MusicGen Local	`musicgen-local`	Offline generation, no API key needed, privacy	—	No

SFX Providers

Provider	ID	Best For	Env Var	Key Required
ElevenLabs	`elevenlabs-sfx`	Highest quality SFX, fast turnaround	`ELEVENLABS_API_KEY`	Yes
Stable Audio	`stable-audio`	Good SFX + music in one provider	`STABILITY_API_KEY`	Yes
Replicate	`replicate-audio`	Open-source AudioGen model, pay-per-use	`REPLICATE_API_TOKEN`	Yes
Fal	`fal-audio`	Fast serverless GPU	`FAL_API_KEY`	Yes
AudioGen Local	`audiogen-local`	Offline SFX generation, no API key needed	—	No

Forcing a Specific Provider

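```typescript
import { generateMusic } from 'agentos';

// Pin the request to Stable Audio instead of relying on auto-detection
const result = await generateMusic({
  prompt: 'Chill synthwave with arpeggiated synths',
  provider: 'stable-audio',
  apiKey: 'your-stability-key',
  durationSec: 30,
});
```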

Provider Preferences

Use `providerPreferences` to control both auto-selection and fallback ordering without hardcoding a single provider. This is useful for load balancing, cost optimization, or respecting user preferences.

```typescript
import { generateMusic } from 'agentos';

// Prefer Suno, fall back to Stable Audio, never use Udio
const result = await generateMusic({
  prompt: 'Orchestral film score with dramatic strings',
  providerPreferences: {
    preferred: ['suno', 'stable-audio'],
    blocked: ['udio'],
  },
});
```

Preference Fields

Field	Description
`preferred`	Ordered list of provider IDs to try first. Providers not in this list are excluded.
`blocked`	Provider IDs to unconditionally exclude from the chain.
`weights`	Weight map for weighted primary selection after filtering/reordering (useful for A/B testing or load balancing).

Provider preferences work identically across `generateMusic()`, `generateSFX()`, `generateImage()`, and `generateVideo()`.
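
To make the three fields concrete, here is a rough sketch of how a chain could be resolved from them, following the field descriptions above. It is an interpretation under stated assumptions, not the library's implementation: `resolveChain` and `pickPrimary` are hypothetical names, and the exact way agentos applies `weights` is assumed to be a weighted random choice over the filtered chain.

```typescript
interface ProviderPreferences {
  preferred?: string[];
  blocked?: string[];
  weights?: Record<string, number>;
}

// Hypothetical: builds the fallback chain from the available provider IDs.
function resolveChain(available: string[], prefs: ProviderPreferences = {}): string[] {
  const blocked = new Set(prefs.blocked ?? []);
  // When `preferred` is given, it defines both the order and the allowed set.
  const ordered = prefs.preferred
    ? prefs.preferred.filter((id) => available.includes(id))
    : available;
  return ordered.filter((id) => !blocked.has(id));
}

// Hypothetical: weighted choice of the primary provider.
// `random` is injectable so the selection can be tested deterministically.
function pickPrimary(
  chain: string[],
  weights?: Record<string, number>,
  random: () => number = Math.random,
): string | undefined {
  if (!weights) return chain[0];
  const total = chain.reduce((sum, id) => sum + (weights[id] ?? 0), 0);
  if (total <= 0) return chain[0];
  let threshold = random() * total;
  for (const id of chain) {
    threshold -= weights[id] ?? 0;
    if (threshold < 0) return id;
  }
  return chain[chain.length - 1];
}
```

With weights `{ suno: 1, 'stable-audio': 3 }`, this sketch would pick Stable Audio as the primary roughly three times out of four, which is the kind of split an A/B test or load-balancing setup might want.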

When to Use Music vs SFX vs TTS

Need	API	Why
Background music, songs, jingles	`generateMusic()`	Optimized for musical compositions with melody, harmony, rhythm
Sound effects, foley, ambient sounds	`generateSFX()`	Optimized for short, non-musical audio (impacts, nature, UI sounds)
Speech, narration, voice cloning	TTS (speech subsystem)	Use the speech/TTS APIs instead — audio generation is for non-speech
Podcast intros with music + voice	Combine both	Generate music with `generateMusic()` , speech with TTS, mix externally

Combining Audio

The audio generation APIs return URLs or base64 data that can be combined in downstream workflows:

Generate background music: `generateMusic({ prompt: 'Gentle ambient pad' })`
Generate SFX stingers: `generateSFX({ prompt: 'Notification chime' })`
Generate speech: Use the TTS subsystem for narration
Mix: Use ffmpeg or a Web Audio API pipeline to layer the tracks

```typescript
import { generateMusic, generateSFX } from 'agentos';

// Generate assets in parallel
const [music, sfx] = await Promise.all([
  generateMusic({ prompt: 'Calm podcast background music', durationSec: 120 }),
  generateSFX({ prompt: 'Soft transition whoosh', durationSec: 2 }),
]);

// Use the URLs/base64 data in your mixing pipeline
console.log('Music:', music.audio[0].url);
console.log('SFX:', sfx.audio[0].url);
```
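
For the mixing step, one option is ffmpeg's `amix` filter. The sketch below only builds the argument list. It assumes both clips have already been saved to local files, and `buildMixArgs`, the file paths, and the 0.4 music volume are illustrative choices rather than anything prescribed by agentos.

```typescript
// Hypothetical helper: builds ffmpeg arguments that lower the music bed
// and mix it with one SFX clip into a single output file.
function buildMixArgs(
  musicPath: string,
  sfxPath: string,
  outPath: string,
  musicVolume = 0.4,
): string[] {
  // Input 0 is the music (attenuated), input 1 is the SFX; amix sums them.
  const filter =
    `[0:a]volume=${musicVolume}[bed];` +
    `[bed][1:a]amix=inputs=2:duration=longest[mix]`;
  return [
    '-y', // overwrite the output file if it exists
    '-i', musicPath,
    '-i', sfxPath,
    '-filter_complex', filter,
    '-map', '[mix]',
    outPath,
  ];
}
```

The returned array can be passed to `spawn('ffmpeg', args)` from `node:child_process`. Timing offsets, such as placing the SFX at a specific moment, would need an additional `adelay` filter, which this sketch leaves out.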

Local Providers (No API Key)

Both MusicGen and AudioGen can run locally without any API keys using HuggingFace Transformers.js. The models are downloaded on first use and cached locally.

Requirements:

`@huggingface/transformers` must be installed as a peer dependency
Sufficient RAM for model inference (MusicGen Small ~1GB, AudioGen Medium ~2GB)

```typescript
import { generateMusic } from 'agentos';

// Explicitly use local generation
const result = await generateMusic({
  prompt: 'Simple piano melody',
  provider: 'musicgen-local',
});
```

Local providers are automatically used as the last fallback when no cloud API keys are configured.

Prerequisites

At least one audio provider API key for cloud generation, OR `@huggingface/transformers` for local generation
For music: `SUNO_API_KEY`, `UDIO_API_KEY`, `STABILITY_API_KEY`, `REPLICATE_API_TOKEN`, or `FAL_API_KEY`
For SFX: `ELEVENLABS_API_KEY`, `STABILITY_API_KEY`, `REPLICATE_API_TOKEN`, or `FAL_API_KEY`
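
A quick startup check can report which of these keys are actually configured. This is a minimal sketch under the assumption that keys are read from environment variables. `configuredKeys` and `CLOUD_KEYS` are hypothetical names, not agentos exports.

```typescript
type Env = Record<string, string | undefined>;

// Environment variables listed above for each kind of generation.
const CLOUD_KEYS: Record<'music' | 'sfx', readonly string[]> = {
  music: ['SUNO_API_KEY', 'UDIO_API_KEY', 'STABILITY_API_KEY', 'REPLICATE_API_TOKEN', 'FAL_API_KEY'],
  sfx: ['ELEVENLABS_API_KEY', 'STABILITY_API_KEY', 'REPLICATE_API_TOKEN', 'FAL_API_KEY'],
};

// Hypothetical helper: returns the names of the keys that are set and non-empty.
function configuredKeys(env: Env, kind: 'music' | 'sfx'): string[] {
  return CLOUD_KEYS[kind].filter((name) => Boolean(env[name]));
}
```

An empty result for a given kind means only local generation is available, which in turn requires `@huggingface/transformers` to be installed.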

Examples

"Generate a 60-second lo-fi hip hop beat for a study playlist"
"Create a thunder and rain sound effect for my podcast intro"
"Make upbeat electronic music for a product demo video"
"Generate a notification chime sound effect"
"Create ambient forest sounds with birds and a gentle stream"
"Generate a dramatic orchestral score for a trailer"
"Make a retro 8-bit video game soundtrack"
"Create footstep sounds on different surfaces — wood, gravel, snow"