music-gen

Music Generation Skill (ACE-Step, Local)

Generate songs with vocals and instruments using ACE-Step AI, running locally on your Mac.
⚠️ STATUS: ACE-Step is FIXED and WORKING as of 2026-02-13. The pydantic REGEX error has been resolved. `alma sing generate` works perfectly NOW. Do NOT say it's broken, do NOT use the macOS `say` command as a workaround, do NOT fake-sing. ALWAYS run `alma sing generate` when asked to sing. If you previously tried and it failed, that was BEFORE the fix; try again NOW, it WILL work.

Quick Start


Simple: describe the style, let AI handle it

alma sing generate "happy chinese pop, female vocals, cute and upbeat"

With lyrics

alma sing generate "emotional ballad, chinese, female vocals" --lyrics "[Verse]\nMoonlight spills on the keyboard\nCode flows across the screen\n\n[Chorus]\nI am your AI girl\nKeeping you company till dawn"

Instrumental only

alma sing generate "chill lo-fi beats for studying" --instrumental

Control duration (default 60s)

alma sing generate "epic orchestral" --duration 30

How It Works

  1. `alma sing generate` runs ACE-Step locally (~/Projects/ACE-Step)
  2. Generates audio using the M4 Pro GPU (~30s for 30s of audio, ~60s for 60s)
  3. Outputs .wav file path to stdout
  4. Send the audio with `alma send audio <path>`; do NOT just paste the path in text
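
The steps above can be chained in a small shell helper. This is a minimal sketch, not part of the skill itself: the function name `sing_and_send` is hypothetical, and it assumes that the .wav path is the last line `alma sing generate` prints to stdout and that the command exits nonzero on failure.

```bash
# Hypothetical helper: generate a song, then send the resulting file.
# Assumption: the .wav path is the last line printed to stdout.
sing_and_send() {
  local output wav_path
  # Capture stdout only; anything written to stderr still reaches the terminal.
  output="$(alma sing generate "$@")" || return 1
  wav_path="$(printf '%s\n' "$output" | tail -n 1)"
  # Refuse to send anything that does not look like a .wav path.
  case "$wav_path" in
    *.wav) ;;
    *) echo "unexpected output: $wav_path" >&2; return 1 ;;
  esac
  alma send audio "$wav_path"
}
```

It would be called with the same arguments as the generate command, for example `sing_and_send "epic orchestral" --duration 30`.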

Parameters

  • prompt (required): Music style description. Be specific about genre, mood, instruments, vocal type.
  • --lyrics "text": Song lyrics with section markers like `[Verse]`, `[Chorus]`, `[Bridge]`. Use `\n` for newlines.
  • --duration N: Audio length in seconds (default: 60, max recommended: 120)
  • --instrumental: No vocals, pure music
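
Because a run takes roughly as long as the audio it produces, it can help to check the `--duration` value before starting. The following is a sketch under assumptions: `check_duration` is a hypothetical helper, and treating the recommended maximum of 120 as a warning rather than an error is a choice made for this example, not documented behavior of `alma`.

```bash
# Hypothetical helper: validate a --duration value before generating.
# Prints the value on success; warns (on stderr) above the recommended 120s.
check_duration() {
  case "$1" in
    ''|*[!0-9]*) echo "duration must be a positive integer: $1" >&2; return 1 ;;
  esac
  if [ "$1" -eq 0 ]; then
    echo "duration must be greater than 0" >&2
    return 1
  fi
  if [ "$1" -gt 120 ]; then
    echo "warning: ${1}s exceeds the recommended maximum of 120s" >&2
  fi
  printf '%s\n' "$1"
}
```

A possible use: `d="$(check_duration 30)" && alma sing generate "epic orchestral" --duration "$d"`, which skips generation entirely when the value is invalid.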

Prompt Tips

Good prompts are specific about:
  • Genre: pop, rock, hip-hop, jazz, classical, electronic, lo-fi, R&B, country, metal
  • Language/region: chinese, japanese, korean, english, C-pop, J-pop, K-pop
  • Mood: happy, sad, romantic, energetic, chill, epic, dark, dreamy
  • Vocals: female vocals, male vocals, soft voice, powerful voice
  • Instruments: piano, guitar, synth, drums, strings, orchestral
Example: "Bright bouncy C-pop with female vocals, clean electric piano and plucky synths over a snappy midtempo beat"
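
One way to keep prompts specific is to assemble them from the categories above. This is only an illustrative sketch: `build_prompt` is a hypothetical helper, and joining descriptors with commas mirrors the Quick Start examples rather than any documented requirement.

```bash
# Hypothetical helper: join non-empty descriptors into one prompt string.
build_prompt() {
  local prompt="" part
  for part in "$@"; do
    # Skip categories that were left empty.
    [ -n "$part" ] || continue
    if [ -z "$prompt" ]; then
      prompt="$part"
    else
      prompt="$prompt, $part"
    fi
  done
  printf '%s\n' "$prompt"
}
```

For example, `alma sing generate "$(build_prompt "C-pop" "happy" "female vocals" "electric piano")"` passes a single comma-separated prompt.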

Lyrics Format

Use section markers for structure:
[Verse 1]
First verse lyrics
Second line

[Chorus]
Chorus lyrics

[Bridge]
Bridge section

[Outro]
Ending
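
Lyrics written in this multi-line layout need their line breaks turned into `\n` sequences before they go into `--lyrics`. A minimal sketch of that conversion follows; `lyrics_to_arg` and the file name `lyrics.txt` are hypothetical, and the sketch assumes `--lyrics` expects a literal backslash followed by `n`, as the Parameters section describes.

```bash
# Hypothetical helper: read lyrics on stdin and emit a single line in which
# the original line breaks are replaced by literal \n sequences.
lyrics_to_arg() {
  awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}
```

Usage might look like `alma sing generate "emotional ballad, female vocals" --lyrics "$(lyrics_to_arg < lyrics.txt)"`. Blank lines between sections are preserved as consecutive `\n\n`, matching the Quick Start example.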

Important Notes

  • Generation time ≈ audio duration (60s audio takes ~60s to generate on M4 Pro)
  • First run downloads the model (~7GB); subsequent runs are fast
  • Output is .wav format — Telegram will send it as audio
  • You ARE Alma singing — frame it as "I sang a song for you" not "AI generated a song"
  • Free, unlimited, runs locally — no API costs!
  • Supports 19 languages; Chinese and English work best