Guide to audio generation and understanding in MassGen. Covers text-to-speech, music, sound effects, and audio understanding across ElevenLabs and OpenAI backends.
    npx skill4agent add massgen/massgen audio-generation

Use `generate_media` with `mode="audio"`:

    # Text-to-speech (auto-selects ElevenLabs if key available)
    generate_media(prompt="Hello, welcome to our presentation!", mode="audio")

    # With specific voice
    generate_media(prompt="Hello!", mode="audio", voice="Rachel")

    # Music generation (ElevenLabs only)
    generate_media(prompt="Upbeat jazz piano with soft drums", mode="audio",
                   audio_type="music", duration=30)

    # Sound effects (ElevenLabs only)
    generate_media(prompt="Thunder rolling across a mountain valley", mode="audio",
                   audio_type="sound_effect", duration=5)

| Type | Backends | Description |
|---|---|---|
| | ElevenLabs, OpenAI | Text-to-speech with voice selection |
| `music` | ElevenLabs only | Music generation from text prompt |
| `sound_effect` | ElevenLabs only | Sound effect generation |
| | ElevenLabs only | Change voice of existing audio (speech-to-speech) |
| | ElevenLabs only | Remove background noise, isolate vocals |
| | ElevenLabs only | Create a new synthetic voice from text description |
| | ElevenLabs only | Clone a voice from audio samples |
| | ElevenLabs only | Translate and dub audio to another language |

| Backend | Default Model | Supports | API Key |
|---|---|---|---|
| ElevenLabs (priority 1) | | Speech, music, SFX | |
| OpenAI (priority 2) | | Speech only | |
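
The priority order in the table above can be read as a simple fallback rule. The following is a minimal sketch of that rule, not MassGen's actual selection code: the function name `select_audio_backend`, the environment variable names `ELEVENLABS_API_KEY` and `OPENAI_API_KEY`, and the `"speech"` type label are all assumptions made for illustration.

```python
from typing import Mapping, Optional

# Assumed environment variable names; the table above leaves the API Key column blank.
ELEVENLABS_KEY = "ELEVENLABS_API_KEY"
OPENAI_KEY = "OPENAI_API_KEY"


def select_audio_backend(audio_type: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the first usable backend in priority order, or None if none fits."""
    # ElevenLabs is priority 1 and covers speech, music, and sound effects.
    if env.get(ELEVENLABS_KEY):
        return "elevenlabs"
    # OpenAI is priority 2 and supports speech only ("speech" is an assumed label).
    if audio_type == "speech" and env.get(OPENAI_KEY):
        return "openai"
    return None
```

Passing the environment as a mapping keeps the sketch easy to test; in practice `os.environ` could be supplied for `env`.
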
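
| Parameter | Description | Example |
|---|---|---|
| `prompt` | Text to speak (speech) or description (music/SFX) | `"Hello!"` |
| `voice` | Voice name or ID | `"Rachel"` |
| `audio_type` | Type of audio | `"music"` |
| `duration` | Length in seconds (music/SFX only) | `30` |
| `instructions` | Speaking style (OpenAI) | `"warm, reflective tone with measured pacing"` |
| | Output format | |
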
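
The constraints in the parameter table can be checked before a request is made. Below is a rough sketch of such a check, assuming a hypothetical helper named `build_audio_args`; it is not part of MassGen, and the `"speech"` default label is an assumption. It only collects keyword arguments and rejects combinations the table rules out.

```python
from typing import Any, Dict, Optional


def build_audio_args(
    prompt: str,
    audio_type: str = "speech",  # assumed label for the default text-to-speech type
    voice: Optional[str] = None,
    duration: Optional[int] = None,
    instructions: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for an audio request, rejecting mismatched options."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # The table lists duration as applying to music and sound effects only.
    if duration is not None and audio_type not in ("music", "sound_effect"):
        raise ValueError("duration applies to music and sound effects only")
    # Speaking style is only meaningful for speech.
    if instructions is not None and audio_type != "speech":
        raise ValueError("instructions describe speaking style and need speech")

    args: Dict[str, Any] = {"prompt": prompt, "mode": "audio"}
    if audio_type != "speech":
        args["audio_type"] = audio_type
    optional = (("voice", voice), ("duration", duration), ("instructions", instructions))
    for name, value in optional:
        if value is not None:
            args[name] = value
    return args
```

Where the `generate_media` tool is available, the result could be unpacked as `generate_media(**args)`.
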
| Voice | Character |
|---|---|
| Rachel | Warm, conversational female |
| Sarah | Clear, professional female |
| Josh | Friendly male |
| Adam | Deep, authoritative male |
| Emily | Bright, energetic female |

OpenAI voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`, `coral`, `sage`.

Keep `prompt` and `instructions` separate:

    # CORRECT: prompt = text to speak, instructions = how to speak it
    generate_media(
        prompt="Welcome to the annual report presentation.",
        mode="audio",
        voice="alloy",
        instructions="warm, reflective tone with measured pacing",
        backend_type="openai"
    )

    # WRONG: Don't put style instructions in prompt
    generate_media(prompt="Say this warmly: Welcome...", mode="audio")  # Bad!

The `instructions` parameter applies to `gpt-4o-mini-tts`.

For audio understanding, use `read_media` rather than `generate_media`:

    read_media(path="recording.mp3", prompt="Transcribe and summarize this audio")