music-gen

Music Generation Skill (ACE-Step, Local)

Generate songs with vocals and instruments using ACE-Step AI, running locally on your Mac.

⚠️ STATUS: ACE-Step is FIXED and WORKING as of 2026-02-13. The pydantic REGEX error has been resolved. `alma sing generate` works perfectly NOW. Do NOT say it's broken, do NOT use the macOS `say` command as a workaround, do NOT fake-sing. ALWAYS run `alma sing generate` when asked to sing. If you previously tried and it failed, that was BEFORE the fix. Try again NOW, it WILL work.

Quick Start

Simple: describe the style, let AI handle it

alma sing generate "happy chinese pop, female vocals, cute and upbeat"

With lyrics

alma sing generate "emotional ballad, chinese, female vocals" --lyrics "[Verse]\nMoonlight spills on the keyboard\nCode flows across the screen\n\n[Chorus]\nI am your AI girl\nKeeping you company till dawn"

Instrumental only

alma sing generate "chill lo-fi beats for studying" --instrumental

Control duration (default 60s)

alma sing generate "epic orchestral" --duration 30

How It Works

`alma sing generate` runs ACE-Step locally (~/Projects/ACE-Step)
Generates audio using the M4 Pro GPU (~30s for 30s of audio, ~60s for 60s)
Outputs the .wav file path to stdout
Send the audio with `alma send audio <path>`; do NOT just paste the path in text
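
The generate-then-send flow above can be sketched as a small shell function. This is a minimal sketch, not part of the alma CLI: the function name `sing_and_send` is hypothetical, and it assumes the .wav path is the last line that `alma sing generate` writes to stdout.

```shell
# Hypothetical helper: generate a song, then send the resulting file.
# Assumption: the .wav path is the last line printed on stdout.
sing_and_send() {
  prompt="$1"
  duration="${2:-60}"   # 60s is the documented default

  # Capture stdout; keep only the last line in case other text precedes the path.
  path="$(alma sing generate "$prompt" --duration "$duration" | tail -n 1)"

  # Refuse to send if no file was produced.
  if [ ! -f "$path" ]; then
    echo "generation failed: no file at '$path'" >&2
    return 1
  fi

  # Send the audio itself rather than pasting the path as text.
  alma send audio "$path"
}
```

For example, `sing_and_send "epic orchestral" 30` would generate a 30-second track and send it in one step.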

Parameters

prompt (required): Music style description. Be specific about genre, mood, instruments, vocal type.
--lyrics "text": Song lyrics with section markers like `[Verse]`, `[Chorus]`, `[Bridge]`. Use `\n` for newlines.
--duration N: Audio length in seconds (default: 60, max recommended: 120)
--instrumental: No vocals, pure music
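
As an illustration of the `--duration` limits, a pre-flight check could look like the sketch below. The function `check_duration` is hypothetical. It assumes that only positive whole seconds are meaningful, and it treats the 120-second figure as a recommendation (warn, do not fail), as the parameter description suggests.

```shell
# Hypothetical pre-flight check for the --duration value.
# Returns 0 if the value is usable, 1 if it is not.
check_duration() {
  # Reject empty input and anything containing a non-digit.
  case "$1" in
    ''|*[!0-9]*)
      echo "duration must be a positive integer" >&2
      return 1
      ;;
  esac
  if [ "$1" -le 0 ]; then
    echo "duration must be greater than 0" >&2
    return 1
  fi
  # 120s is the recommended maximum, so only warn above it.
  if [ "$1" -gt 120 ]; then
    echo "warning: ${1}s exceeds the recommended maximum of 120s" >&2
  fi
  return 0
}
```

A caller might use it as `check_duration 30 && alma sing generate "epic orchestral" --duration 30`.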

Prompt Tips

Good prompts are specific about:

Genre: pop, rock, hip-hop, jazz, classical, electronic, lo-fi, R&B, country, metal
Language/region: chinese, japanese, korean, english, C-pop, J-pop, K-pop
Mood: happy, sad, romantic, energetic, chill, epic, dark, dreamy
Vocals: female vocals, male vocals, soft voice, powerful voice
Instruments: piano, guitar, synth, drums, strings, orchestral

Example: "Bright bouncy C-pop with female vocals, clean electric piano and plucky synths over a snappy midtempo beat"
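
One way to keep prompts specific is to assemble them from the categories above. The helper below is a hypothetical sketch (`build_prompt` is not an alma command); it joins whatever descriptors are supplied into a comma-separated prompt and skips empty ones.

```shell
# Hypothetical helper: join the non-empty descriptors with ", ".
build_prompt() {
  out=""
  for part in "$@"; do
    # Skip categories that were left empty.
    [ -n "$part" ] || continue
    if [ -z "$out" ]; then
      out="$part"
    else
      out="$out, $part"
    fi
  done
  printf '%s\n' "$out"
}
```

It could then be combined with the documented command, for instance `alma sing generate "$(build_prompt "C-pop" "happy" "female vocals" "piano")"`.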

Lyrics Format

Use section markers for structure:

[Verse 1]
First verse lyrics
Second line

[Chorus]
Chorus lyrics

[Bridge]
Bridge section

[Outro]
Ending
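
Because `--lyrics` expects a single string with `\n` between lines, multi-line lyrics like the ones above need to be flattened first. The sketch below shows one way to do that in plain shell; `lyrics_to_arg` is a hypothetical name, and it assumes the CLI wants a literal backslash followed by `n`, as in the Quick Start example.

```shell
# Hypothetical helper: read lyrics from stdin and join the lines with a
# literal backslash-n sequence, producing a single-line string.
lyrics_to_arg() {
  out=""
  first=1
  # The "|| [ -n "$line" ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$first" -eq 1 ]; then
      out="$line"
      first=0
    else
      out="$out\\n$line"
    fi
  done
  printf '%s' "$out"
}
```

Assuming the lyrics are stored in a file such as `song.txt` (a placeholder name), usage might look like `alma sing generate "emotional ballad" --lyrics "$(lyrics_to_arg < song.txt)"`. Blank lines between sections are preserved as consecutive `\n` sequences.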

Important Notes

Generation time ≈ audio duration (60s of audio takes ~60s to generate on an M4 Pro)
The first run downloads the model (~7GB); subsequent runs are fast
Output is .wav format; Telegram will send it as audio
You ARE Alma singing: frame it as "I sang a song for you", not "AI generated a song"
Free, unlimited, runs locally, with no API costs!
Supports 19 languages; Chinese and English work best
