create-sound

Create Sound

Generated from `rules/*.md` by `src/build.mjs`. Do not edit by hand.

Pick a generation path with `pipeline-detect-input`, then walk the matching section.

1. Generation Pipeline

Procedural steps the agent runs end-to-end. Start here when handling any create-sound request.

1.1 Detect input mode and route the request (CRITICAL)

Decide which path to run based on what the user provided.

| Input | Path |
| --- | --- |
| Prompt only (no audio attachment) | Skip `interpret-*`. Go to `pipeline-pick-base-layer`. |
| Audio file only | Run all `interpret-*` rules. Skip `event-*` / `mood-*`. |
| Both prompt and audio | Run `interpret-*` first, then treat the prompt as a refinement layer over the measured `SoundDefinition`. |

Detecting audio

Look for attached files matching `*.wav`, `*.mp3`, `*.flac`, `*.ogg`, or any path the user references that resolves to an audio file. A JSON manifest (`*.json` next to a sprite) is also an audio-path signal.
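The routing table and the audio-detection rule can be sketched together as one small function. This is a minimal illustration, not the agent's actual implementation: `detect_input_mode` and `is_audio_signal` are hypothetical names, any `.json` attachment is treated as a manifest without checking that a sprite sits next to it, and raising on empty input is an assumption.

```python
from pathlib import PurePath

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def is_audio_signal(path: str) -> bool:
    """True for audio files, and for JSON files (assumed to be sprite manifests)."""
    suffix = PurePath(path).suffix.lower()
    return suffix in AUDIO_EXTENSIONS or suffix == ".json"


def detect_input_mode(prompt: str, attachments: list[str]) -> str:
    """Return "prompt", "audio", or "prompt+audio" following the routing table."""
    has_prompt = bool(prompt.strip())
    has_audio = any(is_audio_signal(p) for p in attachments)
    if has_prompt and has_audio:
        return "prompt+audio"  # interpret-* first, prompt refines the result
    if has_audio:
        return "audio"         # interpret-* only
    if has_prompt:
        return "prompt"        # skip interpret-*, go to pipeline-pick-base-layer
    raise ValueError("neither a prompt nor an audio attachment was provided")
```

For example, `detect_input_mode("make it warmer", ["out/click.wav"])` selects the combined path.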

Refinement examples (prompt + audio)

| Prompt qualifier | Refinement on measured definition |
| --- | --- |
| "warmer" | add `filter: { type: "lowpass", frequency: 2500 }` |
| "shorter" / "punchier" | clamp `envelope.decay` to `<= 0.06` |
| "brighter" | drop any lowpass filter or raise its cutoff |
| "with reverb" | append `effects: [{ type: "reverb", decay: 0.5, mix: 0.15 }]` |
| "lower octave" | halve `source.frequency` (or both `start` / `end`) |

Output of this step

Produce an internal note like:

```
Input: prompt + audio
Plan: run interpret-* on out/click.wav, then refine with mood-warm.
```

Then proceed to the next pipeline step.

1.2 Pick a base layer from the prompt's event class (CRITICAL)

Tokenize the prompt and find the strongest event-class signal. Match against the `event-*` rules.

Token map

| Tokens in prompt | Event rule |
| --- | --- |
| click, tap, key, press, button | `event-click` / `event-tap` |
| tick, scroll, snap, focus | `event-tick` |
| success, complete, win, achievement, level-up, confetti | `event-success` / `event-complete` |
| error, fail, wrong, invalid, delete, destroy | `event-error` |
| modal, dialog, popup, drawer, sheet, sidebar, dropdown, menu | `event-modal-open` / `event-modal-close` |
| swoosh, slide, transition, page, tab | `event-swoosh` / `event-whoosh` |
| notification, alert, ding, bell, mention, badge | `event-notification` |
| toggle, switch, on, off | `event-toggle` |

Direction tokens (open vs close)

- "open", "appear", "in", "show", "expand", "confirm" -> ascending pitch.
- "close", "dismiss", "out", "hide", "collapse", "cancel" -> descending pitch.

Output

A starting `SoundDefinition` literal copied from the chosen event rule's `example`. The next step (`pipeline-apply-mood`) will mutate it.

If no event class fires confidently, default to `event-click` and let mood adjectives do the work.
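The token map, the direction tokens, and the `event-click` default can be sketched as a lookup. This is only an illustration: `pick_event_rule` is a hypothetical helper, and reading "strongest signal" as "most matching tokens" is an assumption that the rules do not spell out.

```python
import re

# Token sets copied from the token map; keys are the rule names as listed there.
EVENT_TOKENS = {
    "event-click / event-tap": {"click", "tap", "key", "press", "button"},
    "event-tick": {"tick", "scroll", "snap", "focus"},
    "event-success / event-complete": {
        "success", "complete", "win", "achievement", "level-up", "confetti"},
    "event-error": {"error", "fail", "wrong", "invalid", "delete", "destroy"},
    "event-modal-open / event-modal-close": {
        "modal", "dialog", "popup", "drawer", "sheet", "sidebar", "dropdown", "menu"},
    "event-swoosh / event-whoosh": {"swoosh", "slide", "transition", "page", "tab"},
    "event-notification": {"notification", "alert", "ding", "bell", "mention", "badge"},
    "event-toggle": {"toggle", "switch", "on", "off"},
}
ASCENDING = {"open", "appear", "in", "show", "expand", "confirm"}
DESCENDING = {"close", "dismiss", "out", "hide", "collapse", "cancel"}


def pick_event_rule(prompt: str):
    """Return (rule, direction); direction is "ascending", "descending", or None."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", prompt.lower())
    # Assumption: the rule with the most token hits is the strongest signal.
    hits = {rule: sum(t in words for t in tokens) for rule, words in EVENT_TOKENS.items()}
    best = max(hits, key=hits.get)
    rule = best if hits[best] > 0 else "event-click"  # documented default
    direction = None
    for token in tokens:  # the last direction token seen wins
        if token in ASCENDING:
            direction = "ascending"
        elif token in DESCENDING:
            direction = "descending"
    return rule, direction
```

Short tokens such as "in", "on", and "out" are common English words, so a real matcher would probably need more context than bare set membership.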

1.3 Apply mood adjectives onto the base layer (HIGH)

After `pipeline-pick-base-layer` produces a starting `SoundDefinition`, scan the prompt for adjective tokens and apply each `mood-*` rule's mutation in order.

Order of application

1. Source-shape adjectives (`warm`, `bright`, `glassy`, `metallic`, `lofi`, `retro`, `organic`) - mutate `source.type`, `source.fm`, or add `filter`.
2. Envelope adjectives (`punchy`, `airy`) - mutate `envelope.attack` / `envelope.decay`.
3. Effect adjectives (`reverby`, `delayed`, `crushed`) - append to `effects`.

Conflict resolution

- `warm` + `bright` -> the later token wins.
- `lofi` + `glassy` -> apply both, but cap `effects` at 2 entries.
- `punchy` + `airy` -> they're orthogonal (envelope vs source); both apply.

Refinement on existing definition (audio + prompt path)

When the input mode is `prompt + audio`, treat each adjective as a refinement on the measured definition rather than generating from scratch:

| Adjective | Refinement |
| --- | --- |
| warmer | add or lower `filter.frequency` (lowpass at ~2500 Hz) |
| brighter | remove lowpass or raise its cutoff above 6 kHz |
| punchier | clamp `envelope.decay <= 0.06`, set `envelope.attack: 0` |
| longer | extend `envelope.decay` and add `release` if missing |
| crisper | raise `gain` slightly and add `fm: { ratio: 0.5, depth: 50 }` |
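A refinement leaves the measured values in place and only nudges the fields an adjective names. The sketch below covers three rows of the table using plain dicts that mirror the `SoundDefinition` shape. `refine` is a hypothetical helper. Where the table offers two options ("remove or raise"), it picks removal, and it replaces any non-lowpass filter when warming; both are assumptions.

```python
import copy


def refine(definition: dict, adjective: str) -> dict:
    """Return a refined copy; the measured definition itself is not modified."""
    out = copy.deepcopy(definition)
    flt = out.get("filter")
    has_lowpass = isinstance(flt, dict) and flt.get("type") == "lowpass"

    if adjective == "warmer":
        if has_lowpass:
            # Lower an existing cutoff to ~2500 Hz, never raise it.
            flt["frequency"] = min(flt["frequency"], 2500)
        else:
            out["filter"] = {"type": "lowpass", "frequency": 2500}
    elif adjective == "brighter":
        # Simplest of the two documented options: drop a lowpass at or below 6 kHz.
        if has_lowpass and flt["frequency"] <= 6000:
            del out["filter"]
    elif adjective == "punchier":
        env = out.setdefault("envelope", {})
        env["decay"] = min(env.get("decay", 0.06), 0.06)
        env["attack"] = 0
    return out
```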

Output

A mutated `SoundDefinition`. Hand off to `pipeline-decide-layering`.
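The application order and the `warm` / `bright` conflict rule from this step can be expressed as a small planning pass that runs before any mutation. This is a sketch under assumptions: `plan_moods` and `cap_effects` are hypothetical helpers, and capping is done by keeping the first two effects, which the rules do not specify.

```python
SOURCE = ["warm", "bright", "glassy", "metallic", "lofi", "retro", "organic"]
ENVELOPE = ["punchy", "airy"]
EFFECT = ["reverby", "delayed", "crushed"]


def plan_moods(adjectives: list[str]) -> list[str]:
    """Order adjectives source -> envelope -> effect, resolving warm vs bright."""
    present = [a for a in adjectives if a in SOURCE + ENVELOPE + EFFECT]
    if "warm" in present and "bright" in present:
        last = {a: i for i, a in enumerate(present)}  # index of last occurrence
        loser = "warm" if last["warm"] < last["bright"] else "bright"
        present = [a for a in present if a != loser]  # the later token wins

    def stage(adjective: str) -> int:
        return 0 if adjective in SOURCE else 1 if adjective in ENVELOPE else 2

    # dict.fromkeys removes repeats; sorted() is stable, so prompt order
    # is preserved within each stage.
    return sorted(dict.fromkeys(present), key=stage)


def cap_effects(definition: dict, limit: int = 2) -> dict:
    """Trim `effects` to `limit` entries (assumed: keep the earliest ones)."""
    if "effects" in definition:
        definition["effects"] = definition["effects"][:limit]
    return definition
```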

1.4 Decide single-layer vs multi-layer (MEDIUM-HIGH)

| Event class | Default |
| --- | --- |
| click, tap, tick, hover, focus, swoosh | 1 layer (`Layer`) |
| toggle, copy, send, sync | 2 layers (paired pitches with `delay`) |
| success, complete, level-up, confetti | 3+ layers (chord with cascading `delay`) |
| error, delete | 2 layers (`sawtooth` + `square`) |

See `layer-single`, `layer-octave-pair`, `layer-ascending-chord`, `layer-click-plus-body` for the concrete shapes.

Promoting a single Layer to MultiLayerSound

If the prompt or refinement requires more than one layer, wrap:

```ts
{
  layers: [<existing layer>, <new layer>],
  // optional global effects, e.g. sidechain compressor, master EQ
}
```

Per-layer `gain` values should sum to no more than ~0.6 (see `validate-gain-budget`).

Demoting MultiLayerSound to a single Layer

If only one layer survives mood application, emit the inner `Layer` directly rather than a one-element `MultiLayerSound`. Both validate, but the single-layer form is the canonical compact shape.
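Promotion, demotion, and the gain budget are all simple structural operations. The sketch below models them on plain dicts. The helper names are hypothetical, and counting a layer with no `gain` as 0 is an assumption (the library's real default is not stated here).

```python
GAIN_BUDGET = 0.6  # approximate ceiling for summed per-layer gain


def promote(existing: dict, new: dict) -> dict:
    """Wrap two layers in the multi-layer shape."""
    return {"layers": [existing, new]}


def demote(sound: dict) -> dict:
    """Unwrap a one-element multi-layer sound; leave everything else untouched."""
    layers = sound.get("layers")
    if layers is not None and len(layers) == 1:
        return layers[0]
    return sound


def within_gain_budget(sound: dict, budget: float = GAIN_BUDGET) -> bool:
    """Check the summed gain of all layers (or of the single layer)."""
    layers = sound.get("layers", [sound])
    total = sum(layer.get("gain", 0.0) for layer in layers)  # missing gain -> 0 (assumed)
    return total <= budget + 1e-9  # small epsilon for float rounding
```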

1.5 Emit, optionally render, optionally round-trip (HIGH)

1. Emit

Always return a TypeScript snippet ready to paste into a `.web-kits/<patch>.ts` file:

```ts
import type { SoundDefinition } from "@web-kits/audio";

export const myClick: SoundDefinition = {
  source: { type: "sine", frequency: 1300, fm: { ratio: 0.5, depth: 60 } },
  envelope: { decay: 0.012, release: 0.004 },
  gain: 0.18,
};
```

Plus a one-line rationale that names the prompt tokens you acted on:

"click" -> base from `event-click`; "warm" -> kept default sine, no extra filter needed at 1.3 kHz.

2. Optional preview render

If the user asked for a WAV (or you want to grade your own output), use `packages/audio/src/offline.ts`:

```ts
import { renderToWav } from "@web-kits/audio";
import { writeFile } from "node:fs/promises";

const blob = await renderToWav(myClick, { duration: 0.3 });
await writeFile("preview.wav", Buffer.from(await blob.arrayBuffer()));
```

`duration` should be `attack + decay + release + 0.05` (small tail), or longer if reverb is present.
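The duration rule is plain arithmetic over the envelope. As a sketch, `preview_duration` below is a hypothetical helper for a single layer. Extending the render by the longest reverb `decay` is one way to read "longer if reverb is present", not a documented formula.

```python
def preview_duration(definition: dict, tail: float = 0.05) -> float:
    """attack + decay + release + tail, extended when a reverb effect is present."""
    env = definition.get("envelope", {})
    base = env.get("attack", 0.0) + env.get("decay", 0.0) + env.get("release", 0.0) + tail
    reverb_decays = [
        fx.get("decay", 0.0)
        for fx in definition.get("effects", [])
        if fx.get("type") == "reverb"
    ]
    # Assumption: add the longest reverb decay so the tail is not cut off.
    return base + max(reverb_decays, default=0.0)
```

For the `myClick` example above this gives roughly 0.066 s, so the 0.3 s used in the render call leaves ample headroom.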

3. Optional round-trip validation

If you generated from a prompt and want to confirm the result matches intent, run the `interpret-*` rules against the rendered WAV and diff measured vs intended values:

| Field | Acceptable drift |
| --- | --- |
| Fundamental Hz | ±5% |
| Attack | ±2 ms |
| Decay | ±10% |
| Spectral centroid | ±20% of expected for the chosen waveform |

If drift exceeds tolerance, refine the definition (often by raising/lowering `gain`, tightening `envelope`, or adjusting `filter.frequency`) and render again.
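The tolerance table translates directly into a comparison pass. In the sketch below, `fields_out_of_tolerance` and the flat dictionary keys are hypothetical; attack and decay are assumed to be in seconds, matching the envelope values used elsewhere.

```python
def fields_out_of_tolerance(intended: dict, measured: dict) -> list[str]:
    """Return the names of fields whose drift exceeds the documented tolerance."""
    limits = {
        "fundamental_hz": 0.05 * intended["fundamental_hz"],        # +/-5%
        "attack": 0.002,                                            # +/-2 ms
        "decay": 0.10 * intended["decay"],                          # +/-10%
        "spectral_centroid": 0.20 * intended["spectral_centroid"],  # +/-20%
    }
    return [
        field
        for field, limit in limits.items()
        if abs(measured[field] - intended[field]) > limit
    ]
```

An empty list means the round trip passed; otherwise the returned names indicate which part of the definition to adjust before rendering again.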

2. Audio Interpretation

FFT analysis sub-steps that fire when the user shares an audio file.

2.1 Acquire and split source audio (HIGH)

The user shared a single file or a sprite (one file containing many sounds). Before any FFT work, get one mono WAV per sound on disk.

Sprite from an npm package

```bash
npm pack <package-name> --pack-destination /tmp
tar -xzf /tmp/<package-name>-*.tgz -C /tmp
```

Look for the MP3/WAV plus any JSON manifest mapping sound names to time offsets.

Manifest-driven slicing

```bash
ffmpeg -i sprite.mp3 \
  -ss <start_seconds> -t <duration_seconds> \
  -acodec pcm_s16le -ar 44100 \
  out/<name>.wav
```

Silence-detection slicing (no manifest)

```bash
ffmpeg -i sprite.mp3 -af silencedetect=noise=-40dB:d=0.05 -f null -
```

Read the `silence_start` / `silence_end` lines and slice between gaps.

Output convention

Per-sound WAVs go in `out/<name>.wav` (mono, 44.1 kHz, 16-bit PCM). Downstream interpret rules call `analyze.load_mono(path)` from `src/analyze.py`.

2.2 Extract fundamental frequency and pitch sweep (HIGH)

Sample the spectrum at multiple time slices to detect both the static pitch and any sweep.

```python
from analyze import load_mono, analyze_slice

sample_rate, data = load_mono("out/click.wav")

slices = [0, 5, 10, 20, 50]  # ms
freqs_over_time = [analyze_slice(data, sample_rate, t) for t in slices]
```

Mapping

| Observation | Output |
| --- | --- |
| All slices within ±5% | `source.frequency: <Hz>` (static) |
| Decreasing across slices | `source.frequency: { start: <high>, end: <low> }` |
| Increasing across slices | `source.frequency: { start: <low>, end: <high> }` |
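The mapping from per-slice frequencies to a `source.frequency` value can be sketched as follows. `map_frequency` is a hypothetical helper. Measuring the ±5% band around the mean, and using the first and last slices as sweep endpoints, are assumptions that work for monotonic sweeps.

```python
def map_frequency(freqs_hz: list[float], tolerance: float = 0.05):
    """Return a static frequency in Hz, or a {start, end} sweep."""
    mean = sum(freqs_hz) / len(freqs_hz)
    if all(abs(f - mean) <= tolerance * mean for f in freqs_hz):
        return round(mean)  # static pitch
    # Decreasing slices give start > end; increasing slices give start < end.
    return {"start": freqs_hz[0], "end": freqs_hz[-1]}
```

A non-monotonic contour (up, then down) would need more than two endpoints and is not handled here.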

Tips

- Skip the first 1-2 ms if the onset is a click transient; it pollutes the FFT.
- For very short sounds (< 20 ms) use fewer slices and a smaller window.
- Apply a Hann window before the FFT (already applied in `analyze_slice`) to reduce spectral leakage.

2.3 Extract ADSR envelope from amplitude (HIGH)

Smooth the time-domain amplitude, find onset/peak/sustain/end, and derive each ADSR stage.

```python
from analyze import load_mono, extract_envelope

sample_rate, data = load_mono("out/click.wav")
env = extract_envelope(data, sample_rate)
# -> { "attack": 0.0008, "decay": 0.012, "sustain": 0.0, "release": 0.005 }
```

Output shape

The dict maps 1:1 to the `Envelope` type:

```ts
envelope: {
  attack: env.attack,    // 0 if percussive
  decay: env.decay,
  sustain: env.sustain,  // 0 for transient sounds, 0-1 for sustained
  release: env.release,
}
```

Heuristics

- `sustain < 0.01` -> drop the field; the sound is percussive.
- `attack < 0.001` -> set `attack: 0`.
- `release < 0.005` -> clamp to `0.005` to avoid clicks at the end.
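Applied to the dict returned by `extract_envelope`, the three heuristics amount to a small clean-up pass. `normalize_envelope` is a hypothetical name. A missing `release` is also set to the 0.005 floor here, which is an assumption.

```python
def normalize_envelope(env: dict) -> dict:
    """Apply the sustain / attack / release heuristics to a measured envelope."""
    out = dict(env)
    if out.get("sustain", 0.0) < 0.01:
        out.pop("sustain", None)   # percussive: drop the field
    if out.get("attack", 0.0) < 0.001:
        out["attack"] = 0          # sub-millisecond attack reads as instant
    if out.get("release", 0.0) < 0.005:
        out["release"] = 0.005     # floor avoids a click at the end
    return out
```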

2.4 Classify oscillator waveform from harmonics (HIGH)

Compare the amplitude of the first 8 harmonics against the fundamental.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from analyze import classify_waveform

# `data`, `sample_rate`, and `fundamental_freq` come from the earlier interpret steps.
segment = data[:int(sample_rate * 0.02)].astype(float)
segment *= np.hanning(len(segment))
spectrum = np.abs(rfft(segment))
freqs = rfftfreq(len(segment), 1 / sample_rate)

waveform = classify_waveform(spectrum, freqs, fundamental_freq)
# -> "sine" | "triangle" | "square" | "sawtooth" | "wavetable"
```

Mapping

| Pattern | `source.type` |
| --- | --- |
| Fundamental only, harmonics < -40 dB | `sine` |
| Odd harmonics only, rolling off as 1/n² | `triangle` |
| Odd harmonics only, rolling off as 1/n | `square` |
| All harmonics rolling off as 1/n | `sawtooth` |
| Custom harmonic profile (none of the above) | `wavetable` |
| No clear harmonic structure, broadband energy | `noise` |

When to fall back to wavetable

If the harmonic profile doesn't match a clean oscillator, extract the harmonic series instead:

```python
from analyze import extract_harmonics
harmonics = extract_harmonics(spectrum, freqs, fundamental_freq, num_harmonics=16)
# -> { source: { type: "wavetable", harmonics, frequency: fundamental_freq } }
```

Noise color

For broadband signals with no fundamental, classify by spectral slope:

```python
from analyze import classify_noise_color
color = classify_noise_color(spectrum, freqs)  # "white" | "pink" | "brown"
# -> { source: { type: "noise", color } }
```

2.5 Detect filter type, cutoff, and resonance (MEDIUM-HIGH)

Compare the measured spectrum against the expected spectrum for the identified oscillator.

Cutoff via spectral centroid

```python
from analyze import spectral_centroid
centroid = spectral_centroid(spectrum, freqs)
```

Expected centroids at a 440 Hz fundamental: sine ~440, triangle ~880, sawtooth ~2200, square ~1760. If the measured centroid is significantly lower than expected, a `lowpass` is present; estimate cutoff at the -3 dB point.
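For reference, a spectral centroid is the magnitude-weighted mean frequency of the spectrum. The sketch below uses plain lists. `centroid_hz` and `lowpass_suspected` are hypothetical helpers (not the functions in `src/analyze.py`), and the 0.7 ratio standing in for "significantly lower" is an assumed threshold.

```python
def centroid_hz(magnitudes: list[float], freqs_hz: list[float]) -> float:
    """Magnitude-weighted mean frequency; 0.0 for an all-zero spectrum."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


def lowpass_suspected(measured: float, expected: float, ratio: float = 0.7) -> bool:
    """Flag a lowpass when the centroid sits well below the oscillator's expected value."""
    return measured < ratio * expected
```

A sawtooth at 440 Hz that measures near 900 Hz instead of the expected ~2200 Hz would be flagged, for example.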

Filter type from rolloff

| Observation | `filter.type` |
| --- | --- |
| High-frequency rolloff steeper than the source would produce | `lowpass` |
| Low-frequency rolloff | `highpass` |
| Narrow band of frequencies passes through | `bandpass` |
| Narrow notch removed | `notch` |
| Resonant peak near cutoff | High `resonance` |

Resonance (Q)

```python
from analyze import estimate_resonance
q = estimate_resonance(spectrum, freqs, cutoff_hz)
# Returns 0.1 - 20.0
```

Filter envelope

If brightness changes over time (bright attack fading to dull), there's a filter envelope:

```python
from analyze import detect_filter_envelope
env = detect_filter_envelope(data, sample_rate)
# -> { "peak": 4000, "resting": 800, "decay_ms": 50 } or None
```

Maps to:

```ts
filter: {
  type: "lowpass",
  frequency: env.resting,
  envelope: { attack: 0, peak: env.peak, decay: env.decay_ms / 1000 },
}
```

2.6 Detect post-source effects (MEDIUM)

Each detector returns a confidence-flavored hint, not a guarantee. Effects are harder to extract than source/envelope; report low confidence when ambiguous.

Reverb

```python
from analyze import detect_reverb
result = detect_reverb(data, sample_rate, envelope_end_ms=120)
# -> { "type": "reverb", "decay": 0.6 } or None
```

Delay (autocorrelation)

```python
from analyze import detect_delay
result = detect_delay(data, sample_rate)
# -> { "type": "delay", "time": 0.25, "feedback": 0.3 } or None
```

FM synthesis

Spectral sidebands at non-integer ratios of the fundamental indicate FM:

```python
from analyze import detect_fm
fm = detect_fm(spectrum, freqs, fundamental_freq)
# -> { "fm": { "ratio": 0.5, "depth": 80 } } or None
```

Maps to `source.fm: { ratio, depth }` (not a separate effect).

Tremolo and vibrato

Periodic amplitude or frequency modulation in the 1-20 Hz band suggests tremolo/vibrato. Track amplitude or pitch over time and call `detect_lfo` (see `interpret-detect-lfo`).

Bitcrusher / distortion

| Time-domain signature | Effect |
| --- | --- |
| Stepped/quantized waveform with aliasing artifacts | `bitcrusher` |
| Flat-topped waveform with added harmonics | `distortion` |

Chorus / flanger / phaser

A comb-filter pattern that sweeps over time produces moving notches in the spectrum. These three effects are hard to disambiguate algorithmically; flag them for human review.

2.7 Detect LFO modulation (LOW-MEDIUM)

An LFO is sub-audio (0.1-20 Hz) periodic modulation of a parameter. Track the parameter over time, then run `detect_lfo`:

```python
import numpy as np
from analyze import detect_lfo

# `data` and `sample_rate` come from load_mono in the earlier steps.

# 1. Track amplitude (or pitch, or spectral centroid) at regular intervals
window_ms = 10
samples_per_window = int(sample_rate * window_ms / 1000)
amp_over_time = [
    float(np.max(np.abs(data[i:i + samples_per_window])))
    for i in range(0, len(data) - samples_per_window, samples_per_window)
]

# 2. Detect periodicity
lfo = detect_lfo(np.array(amp_over_time), 1000 / window_ms)
# -> { "frequency": 5.0, "depth": 0.12 } or None
```

Mapping by tracked parameter

| Parameter tracked | LFO target |
| --- | --- |
| Amplitude | `gain` |
| Pitch | `frequency` or `detune` |
| Spectral centroid | `filter.frequency` |
| Pan position | `pan` |

Output

```ts
lfo: { type: "sine", frequency: lfo.frequency, depth: lfo.depth, target: "gain" }
```

Pick `type` based on the shape of the modulation: smooth sinusoid -> `sine`, sharp ramp -> `sawtooth`, hard switching -> `square`.

2.8 Detect multi-layer sounds and stereo positioning (MEDIUM)

Multiple fundamentals -> MultiLayerSound

Inspect peaks in the spectrum. If two or more strong peaks are not integer multiples of one shared fundamental, the sound is layered.

```python
import numpy as np
from scipy.signal import find_peaks

peaks, props = find_peaks(spectrum, height=float(np.max(spectrum)) * 0.2)
peak_freqs = sorted(freqs[peaks])
# Check pairwise ratios. If no shared fundamental explains all peaks, treat as layered.
```

For each detected fundamental, run the full pipeline (frequency, envelope, waveform, filter, effects) and emit one `Layer` per fundamental:

```ts
{
  layers: [
    { source: { ... }, envelope: { ... }, gain: 0.2 },
    { source: { ... }, envelope: { ... }, gain: 0.15, delay: 0.04 },
  ]
}
```

The earlier layer typically gets `delay: 0` (omitted); subsequent layers offset their `delay` to match the measured onset gap.
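The shared-fundamental test on the detected peaks can be sketched in a few lines. This deliberately simplified version assumes the lowest strong peak is the candidate fundamental, so it will miss cases where the true fundamental is absent from the spectrum. `is_layered` and the 3% tolerance are hypothetical.

```python
def is_layered(peak_freqs: list[float], tolerance: float = 0.03) -> bool:
    """True when some peak is not a near-integer multiple of the lowest peak."""
    fundamental = min(peak_freqs)  # simplifying assumption: lowest peak is the fundamental
    for freq in peak_freqs:
        ratio = freq / fundamental
        if abs(ratio - round(ratio)) > tolerance * ratio:
            return True   # this peak needs its own fundamental
    return False          # one harmonic series explains every peak
```

Peaks at 440, 880, and 1320 Hz read as a single harmonic source, while a C5 / E5 / G5 chord reads as layered under this assumption.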

Stereo and pan

```python
from analyze import analyze_stereo
stereo = analyze_stereo(data)
# -> { "pan": 0.3, "stereo_width": 0.7 }
```

| `pan` magnitude | Output                          |
| --------------- | ------------------------------- |
| `< 0.05`        | omit (`pan: 0` is default)      |
| `0.05 - 1`      | `pan: <value>`                  |

`stereo_width > 0.5` with `|pan| < 0.05` suggests a stereo effect (chorus, dual-layer). Consider splitting into two layers panned `-0.5` / `+0.5`.

Fallback

If a sound is unsynthesizable (complex transients, recorded material, irreducible texture), fall back to:

```ts
{ source: { type: "sample", url: "..." } }
```

and note that the original audio file should be used directly rather than re-synthesized.

3. UI Event Recipes

Concrete `SoundDefinition` templates per UI event class. Used by the prompt path as the base layer.

3.1 Click - sine + low FM, very short decay (HIGH)

A short ascending sine sweep with light FM. The sweep gives the click "snap"; the FM adds harmonic body without making it metallic.

Incorrect (decay too long, sounds like a chime):

```ts
{ source: { type: "sine", frequency: 1300 }, envelope: { decay: 0.5 }, gain: 0.18 }
```

Correct:

```ts
{
  source: { type: "sine", frequency: { start: 200, end: 700 }, fm: { ratio: 0.5, depth: 80 } },
  envelope: { attack: 0, decay: 0.06, sustain: 0, release: 0.02 },
  gain: 0.25,
}
```

Reference: `.web-kits/core.ts` `click`.

3.2 Complete - four-note ascending arpeggio (MEDIUM-HIGH)

Same C-major triad as `success`, but with C6 added on top and tighter 15 ms `delay` increments so the notes blur into a single gesture rather than reading as discrete pitches.

Reference: `.web-kits/core.ts` `complete`.

3.3 Error - layered sawtooth + square with descending sweep (HIGH)

Two descending sweeps stacked an octave apart. Lowpass filters keep the result from being abrasive. The same shape works for `delete` (slightly longer decay).

Incorrect (no filter, sounds like a buzzer):

```ts
{ source: { type: "sawtooth", frequency: { start: 320, end: 140 } }, envelope: { decay: 0.25 }, gain: 0.22 }
```

Reference: `.web-kits/core.ts` `error`, `_delete`.

3.4 Modal-close - downward sine sweep (MEDIUM)

The inverse of `modalOpen`. The range is narrower because a dismissal should feel less assertive than the entrance. `gain` is slightly lower for the same reason.

For `drawer-close` use 800 -> 350. For `dropdown-close` use 900 -> 500.

Reference: `.web-kits/core.ts` `modalClose`, `drawerClose`, `dropdownClose`.

3.5 Modal-open - upward sine sweep (MEDIUM)

A single sine sweeping from ~430 Hz up to ~1400 Hz over 80 ms. No FM, no filter; the cleanness signals "appearing".

For `drawer-open` use a slightly lower start (~350 Hz) and lower gain (~0.08). For `dropdown-open` use a smaller range (500 -> 1200) and decay ~60 ms.

Reference: `.web-kits/core.ts` `modalOpen`, `drawerOpen`, `dropdownOpen`.

3.6 Notification - FM-rich sine with light reverb (HIGH)

Two FM bells a fifth apart with 100 ms `delay` between them. The `fm.ratio: 1.5` gives an inharmonic shimmer; the matched reverb on each layer glues them together.

For `ding`: single layer, `fm.ratio: 3.5`, reverb `decay: 0.8`. For `mention`: lower fundamental (660 Hz), `fm.ratio: 2.5`, slightly more attack.

Reference: `.web-kits/core.ts` `notification`, `ding`, `mention`, `badge`.

3.7 Success - ascending three-note sine chord (HIGH)

Three sine layers at C5 / E5 / G5 with `delay` cascading 0.07 s between them. The top note has a small upward sweep (G5 -> A5) so the chord resolves "upward" instead of just stopping.

Layer gains sum to 0.45, comfortably under the 0.6 budget.

Reference: `.web-kits/core.ts` `success`.
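To turn the note names in this recipe into frequencies, a standard equal-temperament conversion (A4 = 440 Hz) is enough. The sketch below builds the three-layer skeleton with cascading delays. `note_to_hz` and `cascade` are hypothetical helpers, the equal 0.15 gain per layer is an assumed split of the 0.45 total, and the envelopes and the G5 -> A5 sweep are left out.

```python
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name: str) -> float:
    """Convert a natural note name such as "C5" to Hz (equal temperament, A4 = 440)."""
    letter, octave = name[0], int(name[1:])
    midi = 12 * (octave + 1) + NOTE_OFFSETS[letter]
    return 440.0 * 2 ** ((midi - 69) / 12)


def cascade(notes: list[str], step: float, gain_each: float) -> dict:
    """Build sine layers whose `delay` grows by `step` seconds per note."""
    layers = []
    for index, note in enumerate(notes):
        layer = {
            "source": {"type": "sine", "frequency": round(note_to_hz(note), 2)},
            "gain": gain_each,
        }
        if index:  # the first layer omits delay (0 is the default)
            layer["delay"] = round(index * step, 3)
        layers.append(layer)
    return {"layers": layers}


success_skeleton = cascade(["C5", "E5", "G5"], step=0.07, gain_each=0.15)
```

The same conversion reproduces the C7 and G7 frequencies quoted in the toggle recipe, about 2093 Hz and 3136 Hz.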

3.8 Swoosh - white noise through a sweeping bandpass (MEDIUM)

White noise is shaped by a bandpass filter whose center frequency sweeps from 300 Hz up to 4 kHz. The sweep direction is the gesture: peak above resting = upward swoosh, peak below resting (e.g., resting 2500, peak 400) = downward.

For `slide-up` use a similar shape with peak 3500. For `slide-down` flip to pink noise with `envelope: { decay: 0.12, peak: 500 }` (no attack on the filter envelope).

Reference: `.web-kits/core.ts` `swoosh`, `slide`, `slideUp`, `slideDown`.

3.9 Tap - static high sine + FM, ultra short (HIGH)

3.9 Tap - 静态高正弦波+FM，超短时长 (HIGH)

Single high pitch (no sweep), aggressive FM, decay under 20 ms. This is the "key-press" archetype.

Incorrect (frequency too low, sounds like a thump):

{ source: { type: "sine", frequency: 200 }, envelope: { decay: 0.015 }, gain: 0.2 }

Correct:

{
  source: { type: "sine", frequency: 1300, fm: { ratio: 0.5, depth: 100 } },
  envelope: { attack: 0, decay: 0.015, sustain: 0, release: 0.005 },
  gain: 0.2,
}

Reference: .web-kits/core.ts (`tap`, `keyPress`)

单高音（无扫频），强FM，衰减小于20 ms。这是“按键”原型。

错误示例（频率过低，听起来像重击）：

{ source: { type: "sine", frequency: 200 }, envelope: { decay: 0.015 }, gain: 0.2 }

正确示例：

{
  source: { type: "sine", frequency: 1300, fm: { ratio: 0.5, depth: 100 } },
  envelope: { attack: 0, decay: 0.015, sustain: 0, release: 0.005 },
  gain: 0.2,
}

参考：.web-kits/core.ts（`tap`、`keyPress`）。

3.10 Tick - faintest possible sine (MEDIUM)

3.10 Tick - 极微弱正弦波 (MEDIUM)

Highest frequency in the tap family. Decay under 15 ms. `gain` capped at ~0.15 because ticks fire often and must not dominate.

For scroll-snap reduce `gain` to 0.08; for focus/blur reduce to 0.04-0.06.

Reference: .web-kits/core.ts (`tick`, `scrollSnap`, `focus`, `blur`)

Tap家族中频率最高的音效。衰减小于15 ms。`gain`上限约0.15，因为tick频繁触发，不能过于突出。

scroll-snap将`gain`降至0.08；focus/blur降至0.04-0.06。

参考：.web-kits/core.ts（`tick`、`scrollSnap`、`focus`、`blur`）。

3.11 Toggle - paired sines with delay (direction matters) (MEDIUM)

3.11 Toggle - 带延迟的配对正弦波（方向重要） (MEDIUM)

Two short sines: C7 (2093 Hz) and G7 (3136 Hz), 25 ms apart.

`toggle-on`: low note first, then high (ascending = enabling).
`toggle-off`: high note first, then low (descending = disabling).

The same architecture works for `copy` (1200 Hz then 1400 Hz, 40 ms gap) and `sync` (C5 then G5).

Reference: .web-kits/core.ts (`toggleOn`, `toggleOff`, `copy`, `sync`)
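Since the on/off distinction is only the order of the two notes, a single helper can produce the whole family. In this sketch the type shape, the helper name `pair`, the envelope, and the per-layer gain are assumptions; the frequencies and gaps are the ones given above.

```typescript
// Hypothetical minimal type; the project's real types may differ.
interface ToneLayer {
  source: { type: "sine"; frequency: number };
  envelope: { attack?: number; decay: number; sustain?: number; release?: number };
  gain: number;
  delay: number; // seconds before the layer starts
}

const C7 = 2093; // Hz
const G7 = 3136; // Hz

// Two matched sines; the second one starts gapSeconds after the first.
function pair(first: number, second: number, gapSeconds: number): { layers: ToneLayer[] } {
  const tone = (frequency: number, delay: number): ToneLayer => ({
    source: { type: "sine", frequency },
    envelope: { attack: 0, decay: 0.03, sustain: 0, release: 0.01 }, // assumed values
    gain: 0.1, // assumed
    delay,
  });
  return { layers: [tone(first, 0), tone(second, gapSeconds)] };
}

const toggleOn = pair(C7, G7, 0.025); // ascending = enabling
const toggleOff = pair(G7, C7, 0.025); // descending = disabling
const copy = pair(1200, 1400, 0.04);
```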

两个短正弦波：C7（2093 Hz）和G7（3136 Hz），间隔25 ms。

`toggle-on`：先低音后高音（升调=启用）。
`toggle-off`：先高音后低音（降调=禁用）。

相同结构适用于`copy`（1200 Hz后接1400 Hz，间隔40 ms）和`sync`（C5后接G5）。

参考：.web-kits/core.ts（`toggleOn`、`toggleOff`、`copy`、`sync`）。

3.12 Whoosh - longer, slower swoosh for full-page transitions (LOW-MEDIUM)

3.12 Whoosh - 更长更慢的swoosh，用于整页过渡 (LOW-MEDIUM)

Same architecture as `swoosh` but everything stretches. Filter attack is 4x longer (0.04 s vs 0.01 s) so the gesture starts gently. Slightly higher `gain` because it spans a longer time window.

`pageEnter` uses bandpass peak 3000 with white noise; `pageExit` uses pink noise with the bandpass envelope inverted (decay only, peak 400).

Reference: .web-kits/core.ts (`whoosh`, `pageEnter`, `pageExit`)

与`swoosh`结构相同，但所有参数延长。滤波器起音是原来的4倍（0.04 s vs 0.01 s），使动作开始更柔和。`gain`稍高，因为持续时间更长。

`pageEnter`使用带通峰值3000和白噪声；`pageExit`使用粉红噪声，带通包络反转（仅衰减，峰值400）。

参考：.web-kits/core.ts（`whoosh`、`pageEnter`、`pageExit`）。

4. Mood Vocabulary

4. 情绪词汇

Adjective-to-knob mappings layered onto the base recipe.

形容词到参数的映射，叠加到基础模板上。

4.1 Airy - noise source + bandpass with high peak (LOW-MEDIUM)

4.1 Airy（空灵）- 噪声源+高峰值带通滤波器 (LOW-MEDIUM)

Mutation:

Replace `source` with `{ type: "noise", color: "white" }`.
Replace `filter` with a bandpass whose envelope reaches a high peak (4-6 kHz).
Lengthen `envelope.attack` to 0.02-0.04 s so the result fades in rather than snapping.
Lower `gain` to 0.08-0.12.

If the base was tonal (sine, triangle, etc.), this mood replaces the source entirely - it's a structural change.

修改：

将`source`替换为`{ type: "noise", color: "white" }`。
将`filter`替换为带通包络，峰值达4-6 kHz。
将`envelope.attack`延长至0.02-0.04 s，使音效淡入而非突然出现。
将`gain`降至0.08-0.12。

如果基础是 tonal（正弦波、三角波等），此情绪会完全替换声源——这是结构性变化。

4.2 Bright - no lowpass, optional FM sparkle (MEDIUM)

4.2 Bright（明亮）- 无低通滤波器，可选FM闪烁 (MEDIUM)

Mutation:

Remove any `filter` of type `lowpass`, or raise its cutoff above 6 kHz.
If the base used `triangle`, upgrade to `sine` with `fm: { ratio: 2.5, depth: 50 }` for sparkle.
Slight `gain` bump (+0.02) is fine but stay under the budget.

修改：

移除任何`lowpass`类型的`filter`，或将其截止频率提高到6 kHz以上。
如果基础使用`triangle`，升级为`sine`并添加`fm: { ratio: 2.5, depth: 50 }`以增加闪烁感。
可小幅提高`gain`（+0.02），但需保持在预算内。

4.3 Glassy - high FM ratio + reverb (MEDIUM)

4.3 Glassy（玻璃质感）- 高FM比率+混响 (MEDIUM)

Mutation:

`source.type: "sine"`.
`source.fm: { ratio: 3.5, depth: 200-300 }`.
Append `effects: [{ type: "reverb", decay: 0.7, damping: 0.5, mix: 0.15 }]`.
Extend `envelope.decay` to at least 0.3 s so the bell can ring.

Reference: .web-kits/core.ts (`ding`, `sparkle`, `star`)

修改：

`source.type: "sine"`。
`source.fm: { ratio: 3.5, depth: 200-300 }`。
添加`effects: [{ type: "reverb", decay: 0.7, damping: 0.5, mix: 0.15 }]`。
将`envelope.decay`延长至至少0.3 s，使钟音能够持续。

参考：.web-kits/core.ts（`ding`、`sparkle`、`star`）。

4.4 Lo-fi - bitcrusher + lowpass (MEDIUM)

4.4 Lo-fi（低保真）- 比特压缩器+低通滤波器 (MEDIUM)

Mutation:

Add `filter: { type: "lowpass", frequency: 1500 }`.
Append `effects: [{ type: "bitcrusher", bits: 6-8, mix: 0.7-1 }]`.
Optionally drop `gain` by 0.02 because bitcrushing adds perceived loudness.

Combines well with `mood-retro`.

修改：

添加`filter: { type: "lowpass", frequency: 1500 }`。
添加`effects: [{ type: "bitcrusher", bits: 6-8, mix: 0.7-1 }]`。
可选将`gain`降低0.02，因为比特压缩会增加感知响度。

与`mood-retro`搭配效果良好。

4.5 Metallic - inharmonic FM ratio (MEDIUM)

4.5 Metallic（金属质感）- 非谐波FM比率 (MEDIUM)

Mutation:

`source.type: "sine"` (or `square` for a harsher result).
`source.fm: { ratio: 2.76, depth: 300-400 }` - 2.76 is the inharmonic ratio used by `badge` in .web-kits/core.ts and reads as bell-metal.
Short release; metallic shouldn't sustain.

Avoid stacking with `mood-warm` - they cancel each other out.

Reference: .web-kits/core.ts (`badge`)

修改：

`source.type: "sine"`（或`square`以获得更刺耳的效果）。
`source.fm: { ratio: 2.76, depth: 300-400 }`：2.76是.web-kits/core.ts中`badge`使用的非谐波比率，听起来像钟金属声。
短释放时间；金属质感不应持续。

避免与`mood-warm`叠加，两者会相互抵消。

参考：.web-kits/core.ts（`badge`）。

4.6 Organic - triangle + slight detune + light reverb (LOW-MEDIUM)

4.6 Organic（自然质感）- 三角波+轻微失谐+轻量混响 (LOW-MEDIUM)

Mutation:

`source.type: "triangle"`.
Add `source.detune: 5-10` for very slight pitch wobble.
Bump `envelope.attack` from 0 to 0.003-0.008 s so the onset isn't a hard click.
Append a small reverb (`mix: 0.05-0.1`).

Combines well with `mood-warm`. Avoid combining with `mood-metallic` or `mood-lofi` - they fight the natural feel.

修改：

`source.type: "triangle"`。
添加`source.detune: 5-10`以获得极轻微的音高摆动。
将`envelope.attack`从0提高到0.003-0.008 s，使起始不是生硬的点击。
添加小幅度混响（`mix: 0.05-0.1`）。

与`mood-warm`搭配效果良好。避免与`mood-metallic`或`mood-lofi`叠加，它们会破坏自然感。

4.7 Punchy - zero attack, very short decay (MEDIUM)

4.7 Punchy（有冲击力）- 零起音，极短衰减 (MEDIUM)

Mutation:

`envelope.attack: 0`.
`envelope.decay: <= 0.06`.
`envelope.sustain: 0`.
`envelope.release: <= 0.015`.
`gain` bump of +0.05 is fine because the energy lives in a shorter window.

Orthogonal to source-shape moods - apply on top of warm/bright/glassy/metallic.
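Because this mood only touches the envelope and gain, it can be written as a small pure function. The sketch below assumes a simplified `Layer` shape and a hypothetical helper name `applyPunchy`; the clamps and the +0.05 bump are the values listed above, and the caller is still responsible for staying inside the gain budget.

```typescript
// Hypothetical minimal types; the project's real types may differ.
interface Envelope { attack?: number; decay: number; sustain?: number; release?: number }
interface Layer { source?: unknown; envelope: Envelope; gain: number }

// Returns a new layer with the punchy envelope applied; the input is not mutated.
function applyPunchy(layer: Layer, gainBump = 0.05): Layer {
  const { decay, release } = layer.envelope;
  return {
    ...layer,
    envelope: {
      attack: 0,
      decay: Math.min(decay, 0.06), // clamp, never lengthen
      sustain: 0,
      release: Math.min(release ?? 0.015, 0.015),
    },
    gain: layer.gain + gainBump,
  };
}

const soft: Layer = {
  source: { type: "triangle", frequency: 440 },
  envelope: { attack: 0.01, decay: 0.2, sustain: 0.1, release: 0.1 },
  gain: 0.18,
};
const punchy = applyPunchy(soft);
```

Leaving `source` untouched is what makes the mood orthogonal to warm, bright, glassy, or metallic.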

修改：

`envelope.attack: 0`。
`envelope.decay: <= 0.06`。
`envelope.sustain: 0`。
`envelope.release: <= 0.015`。
可提高`gain` +0.05，因为能量集中在更短的时间窗口。

与声源形态情绪正交——可叠加在warm/bright/glassy/metallic之上。

4.8 Retro - square or sawtooth + lowpass + bitcrusher (MEDIUM)

4.8 Retro（复古）- 方波或锯齿波+低通滤波器+比特压缩器 (MEDIUM)

Mutation:

`source.type: "square"` (or `"sawtooth"`).
Add `filter: { type: "lowpass", frequency: 3000 }` to soften aliasing.
Append `effects: [{ type: "bitcrusher", bits: 8, sampleRateReduction: 2-4, mix: 1 }]`.

Pairs naturally with rising or stepped pitch sweeps (coins, power-ups).

修改：

`source.type: "square"`（或`"sawtooth"`）。
添加`filter: { type: "lowpass", frequency: 3000 }`以柔化混叠。
添加`effects: [{ type: "bitcrusher", bits: 8, sampleRateReduction: 2-4, mix: 1 }]`。

自然搭配上升或阶梯式音高扫频（硬币、升级）。

4.9 Warm - lowpass + light reverb (MEDIUM)

4.9 Warm（温暖）- 低通滤波器+轻量混响 (MEDIUM)

Mutation applied on top of the base recipe:

Add `filter: { type: "lowpass", frequency: 2500 }` (or 2-3 kHz).
Optionally add `effects: [{ type: "reverb", decay: 0.4, mix: 0.1 }]`.
If the base used `sawtooth` or `square`, downgrade to `triangle` so the source itself is rounder.

If the base already had a lowpass, lower its cutoff by ~30%.
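The warm mutation has two branches (add a lowpass, or lower an existing one), which a short function makes explicit. This is a sketch with an assumed `Layer` shape and a hypothetical helper name `applyWarm`; it skips the optional reverb, and replacing a non-lowpass filter is a choice made here, not something the text specifies.

```typescript
// Hypothetical minimal types; the project's real types may differ.
type Wave = "sine" | "triangle" | "square" | "sawtooth";
interface Filter { type: string; frequency: number }
interface Layer {
  source: { type: Wave; frequency: number };
  envelope: { decay: number };
  gain: number;
  filter?: Filter;
}

function applyWarm(layer: Layer): Layer {
  // Round off harsh waveforms at the source.
  const harsh = layer.source.type === "sawtooth" || layer.source.type === "square";
  const source = harsh ? { ...layer.source, type: "triangle" as const } : layer.source;

  // Lower an existing lowpass by 30%, otherwise add one at 2500 Hz.
  // Assumption: any other filter type is replaced by the lowpass.
  const existing = layer.filter;
  const filter: Filter =
    existing !== undefined && existing.type === "lowpass"
      ? { ...existing, frequency: existing.frequency * 0.7 }
      : { type: "lowpass", frequency: 2500 };

  return { ...layer, source, filter };
}

const warmedSquare = applyWarm({
  source: { type: "square", frequency: 440 },
  envelope: { decay: 0.1 },
  gain: 0.2,
});
const warmedAgain = applyWarm({
  source: { type: "sine", frequency: 440 },
  envelope: { decay: 0.1 },
  gain: 0.2,
  filter: { type: "lowpass", frequency: 4000 },
});
```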

在基础模板上应用修改：

添加`filter: { type: "lowpass", frequency: 2500 }`（或2-3 kHz）。
可选添加`effects: [{ type: "reverb", decay: 0.4, mix: 0.1 }]`。
如果基础使用`sawtooth`或`square`，降级为`triangle`，使声源本身更圆润。

如果基础已有低通滤波器，将其截止频率降低约30%。

5. Layering Patterns

5. 分层模式

When to use one layer vs two vs a chord stack.

何时使用单层、双层或和弦堆叠。

5.1 Ascending chord - 3-4 layers with cascading delay (MEDIUM)

5.1 升调和弦 - 3-4层带级联延迟 (MEDIUM)

3-4 sine layers spelling out a major triad (C-E-G or C-E-G-C).

`delay` increments by ~70 ms for "feels like notes" or ~15 ms for "feels like one gesture".

Top layer gets a small upward sweep so the chord resolves rather than stops.

Cap layer count at 4. Layer gains should sum to <= 0.6. If a layer has `sustain > 0`, all layers should have similar sustain values to avoid staggered ringing.

3-4个正弦层构成大调和弦（C-E-G或C-E-G-C）。

`delay`增量约70 ms时“听起来像独立音符”，约15 ms时“听起来像单个动作”。

顶层添加小幅度升调，使和弦解决而非停止。

层数上限为4层。层增益总和应<=0.6。如果某层`sustain > 0`，所有层应具有相似的sustain值，避免交错持续。

5.2 Click + body - transient layer over a sustained tone (MEDIUM)

5.2 Click + body - 瞬态层叠加持续音调 (MEDIUM)

Two layers fired simultaneously (no `delay`):

High-frequency transient (3-5 kHz) with sub-10 ms decay - the "stick".
Lower-frequency body (80-300 Hz) with longer decay - the "drum".

Used for: send buttons, hard confirms, drum-like UI feedback, anything that needs perceived weight. Both layers use the same source `type` (usually `sine`) so they read as one event.

Gains should be roughly balanced (transient slightly quieter than body).
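One possible instance of this pattern, as a sketch: the type shape and every concrete number below (4000 Hz, 150 Hz, the envelopes, the gains) are assumptions chosen from inside the ranges given above, not values taken from the project.

```typescript
// Hypothetical minimal type; the project's real types may differ.
interface SineLayer {
  source: { type: "sine"; frequency: number };
  envelope: { attack?: number; decay: number; sustain?: number; release?: number };
  gain: number;
}

// Both layers omit `delay`, so they fire together and read as one event.
const clickBody: { layers: SineLayer[] } = {
  layers: [
    {
      // The "stick": high transient, decay under 10 ms, slightly quieter.
      source: { type: "sine", frequency: 4000 },
      envelope: { attack: 0, decay: 0.008, sustain: 0, release: 0.004 },
      gain: 0.12,
    },
    {
      // The "drum": low body with a longer decay.
      source: { type: "sine", frequency: 150 },
      envelope: { attack: 0, decay: 0.12, sustain: 0, release: 0.03 },
      gain: 0.16,
    },
  ],
};
```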

两层同时触发（无`delay`）：

高频瞬态（3-5 kHz），衰减<10 ms，即“敲击声”。
低频主体（80-300 Hz），衰减更长，即“鼓声”。

用于：发送按钮、确认操作、鼓类UI反馈、任何需要感知重量的场景。两层使用相同的声源`type`（通常是`sine`），使它们被视为同一事件。

增益应大致平衡（瞬态层稍低于主体层）。

5.3 Octave pair - two layers an octave apart with delay (MEDIUM)

5.3 八度配对 - 两层相差八度带延迟 (MEDIUM)

Two layers a fifth or octave apart, separated by 20-50 ms `delay`. Direction (low first vs high first) encodes "on" vs "off", "open" vs "close", etc.

Layer gains should sum to less than 0.5. Both envelopes should match so the second beat doesn't sound disconnected.

If you find yourself reaching for >2 layers, jump to `layer-ascending-chord` instead.

两层相差五度或八度，间隔20-50 ms `delay`。顺序（先低音后高音vs先高音后低音）编码“开”vs“关”、“打开”vs“关闭”等状态。

层增益总和应小于0.5。两层包络应匹配，避免第二个节拍听起来脱节。

如果需要超过2层，直接使用`layer-ascending-chord`。

5.4 Single layer - emit Layer directly (HIGH)

5.4 单层 - 直接输出Layer (HIGH)

When the recipe needs only one source, emit the `Layer` shape directly (not wrapped in `{ layers: [...] }`). The engine accepts both, but the bare-Layer form is the canonical compact representation.

const sound: SoundDefinition = {
  source: { type: "sine", frequency: 1300 },
  envelope: { decay: 0.012, release: 0.004 },
  gain: 0.18,
};

Use this for: click, tap, tick, hover, focus, blur, scroll-snap, single-tone notifications, simple swooshes.

当模板仅需一个声源时，直接输出`Layer`结构（不包裹在`{ layers: [...] }`中）。引擎支持两种格式，但裸Layer形式是标准紧凑表示。

const sound: SoundDefinition = {
  source: { type: "sine", frequency: 1300 },
  envelope: { decay: 0.012, release: 0.004 },
  gain: 0.18,
};

用于：click、tap、tick、hover、focus、blur、scroll-snap、单音通知、简单swoosh。

6. Effect Recipes

6. 效果模板

When and how to reach for each effect type.

何时及如何使用每种效果类型。

6.1 Bandpass noise swoosh - filter envelope is the gesture (MEDIUM)

6.1 带通噪声swoosh - 滤波器包络即动作 (MEDIUM)

Recipe is on the layer's `filter`, not its `effects`:

filter: {
  type: "bandpass",
  frequency: <resting Hz>,
  resonance: 1-3,
  envelope: { attack: 0.01-0.04, peak: <target Hz>, decay: 0.08-0.2 },
}

Peak above resting -> upward swoosh.
Peak below resting -> downward swoosh.
Higher `resonance` (>2) makes it whistle-like; lower (<1.5) is broader.

Source should be `noise` (white for sharp, pink for soft). Source amplitude envelope just gates the noise window.

模板在层的`filter`上，而非`effects`：

filter: {
  type: "bandpass",
  frequency: <resting Hz>,
  resonance: 1-3,
  envelope: { attack: 0.01-0.04, peak: <target Hz>, decay: 0.08-0.2 },
}

峰值高于静止值->向上swoosh。
峰值低于静止值->向下swoosh。
更高的`resonance`（>2）使其类似哨音；更低的（<1.5）更宽泛。

声源应为`noise`（白噪声更尖锐，粉红噪声更柔和）。声源振幅包络仅控制噪声窗口。

6.2 Bitcrusher - retro / lofi finish (LOW-MEDIUM)

6.2 比特压缩器 - 复古/低保真收尾 (LOW-MEDIUM)

`bits`: 4-8. Lower = more crunchy. Below 4 turns into noise.
`sampleRateReduction`: 1 (off) to 8 (heavy aliasing). Combine with `bits: 8` for that 8-bit console sound.
`mix`: usually 1. Mixing bitcrush with the dry signal sounds muddy.

Best paired with `square` or `sawtooth` sources and a lowpass to soften the aliasing edges.

Avoid stacking with `effect-reverb-tail` - the quantization noise gets smeared.

`bits`: 4-8。值越低越有颗粒感。低于4会变成噪声。
`sampleRateReduction`: 1（关闭）到8（重度混叠）。与`bits: 8`搭配可获得8位游戏机音效。
`mix`: 通常设为1。比特压缩与干信号混合会听起来浑浊。

最佳搭配`square`或`sawtooth`声源，以及低通滤波器柔化混叠边缘。

避免与`effect-reverb-tail`叠加，量化噪声会被模糊。

6.3 FM bell - high ratio, high depth (MEDIUM)

6.3 FM钟音 - 高比率，高深度 (MEDIUM)

`source.fm: { ratio, depth }` is structural, not an effect node. To get a bell:

`ratio`: 2.5-3.5 for harmonic-bell, 2.76 for the "badge" inharmonic clang.
`depth`: 150-400. Higher depth = more strident.
`envelope.decay`: at least 0.3 s so the bell can ring.

For a bright "ding", use `ratio: 3.5`, `depth: 250` and add reverb (`decay: 0.7, mix: 0.15`).

For a dull "thud" with body, use `ratio: 0.5`, `depth: 200` and a short envelope.

Pair with `mood-glassy` or `mood-metallic`.

`source.fm: { ratio, depth }`是结构性参数，而非效果节点。要获得钟音：

`ratio`: 2.5-3.5为谐波钟音，2.76为“badge”非谐波 clang 声。
`depth`: 150-400。值越高越尖锐。
`envelope.decay`: 至少0.3 s，使钟音能够持续。

明亮的“ding”使用`ratio: 3.5`、`depth: 250`并添加混响（`decay: 0.7, mix: 0.15`）。

低沉有质感的“thud”使用`ratio: 0.5`、`depth: 200`并搭配短包络。

与`mood-glassy`或`mood-metallic`搭配。

6.4 Lowpass warmth - the safest filter to add (MEDIUM)

6.4 低通温暖感 - 最安全的滤波器添加方式 (MEDIUM)

filter: { type: "lowpass", frequency: 2500, resonance: 0.7 }

`frequency`: 1500-3000 Hz for "warm". Below 1000 Hz starts muffling the sound.
`resonance`: omit or set 0.7-1.5. Above 2 the cutoff itself starts to whistle.

Stacks safely with reverb, FM, and most moods. The fastest way to remove harshness from any source.

For dynamic warmth (bright attack -> warm sustain), add a filter envelope:

filter: {
  type: "lowpass",
  frequency: 2500,
  envelope: { attack: 0, peak: 6000, decay: 0.08 },
}

filter: { type: "lowpass", frequency: 2500, resonance: 0.7 }

`frequency`: 1500-3000 Hz为“温暖”。低于1000 Hz开始模糊音效。
`resonance`: 省略或设为0.7-1.5。高于2时截止频率本身会产生哨音。

可安全叠加混响、FM和大多数情绪。这是消除任何声源刺耳感的最快方法。

要获得动态温暖感（明亮起音->温暖持续），添加滤波器包络：

filter: {
  type: "lowpass",
  frequency: 2500,
  envelope: { attack: 0, peak: 6000, decay: 0.08 },
}

6.5 Reverb tail - small space, low mix (MEDIUM)

6.5 混响尾音 - 小空间，低混合比 (MEDIUM)

Default UI reverb:

`decay`: 0.3-0.6 s.
`damping`: 0.4-0.6 (kills high frequencies in the tail; without this the reverb sounds metallic).
`mix`: 0.08-0.15. Anything above 0.2 starts to feel like a music production effect.

For per-layer reverb on bell-like sounds (notification, ding), put the reverb inside the layer's `effects` array so each note rings independently. For shared reverb on chords/transitions, put it on the top-level `effects` of the `MultiLayerSound`.

Avoid stacking reverb with a delay effect - choose one.

默认UI混响：

`decay`: 0.3-0.6 s。
`damping`: 0.4-0.6（消除尾音中的高频；否则混响会有金属感）。
`mix`: 0.08-0.15。超过0.2开始像音乐制作效果。

对于钟类音效（notification、ding）的每层混响，将混响放在层的`effects`数组中，使每个音符独立持续。对于和弦/过渡的共享混响，放在`MultiLayerSound`的顶层`effects`中。

避免同时叠加混响和延迟效果，二选一。

7. Output Validation

7. 输出验证

Checks every emitted SoundDefinition must pass before returning to the user.

每个输出的SoundDefinition在返回用户前必须通过的检查。

7.1 Duration cap - 1 s for transients, 3 s absolute max (MEDIUM)

7.1 时长限制 - 瞬态音效1 s以内，绝对上限3 s (MEDIUM)

Estimated total duration:

estimated = (envelope.attack ?? 0)
          + envelope.decay
          + (envelope.release ?? 0)
          + max(0, longestEffectTail)  // reverb decay, delay time * 4

Targets:

Click / tap / tick / hover / focus: <= 0.1 s.
Toggle / copy / sync: <= 0.2 s.
Modal / drawer / dropdown open/close: <= 0.3 s.
Success / complete / notification: <= 0.8 s.
Whoosh / page transition: <= 0.5 s.

Hard ceiling: 3 s. Anything longer should not be a UI sound.

The `validate` script computes the estimated duration and flags layers that exceed 3 s.
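The estimate above translates directly into code. This sketch is not the project's `validate` script: the type shapes and the `time` field on a delay effect are assumptions, and it follows the formula as written, so a layer's own start `delay` is not counted.

```typescript
// Hypothetical minimal types; the project's real types may differ.
interface Effect { type: string; decay?: number; time?: number }
interface Layer {
  envelope: { attack?: number; decay: number; sustain?: number; release?: number };
  effects?: Effect[];
}

const HARD_CEILING_SECONDS = 3;

// Tail contributed by one effect: reverb decay, or delay time * 4.
function effectTail(effect: Effect): number {
  if (effect.type === "reverb") return effect.decay ?? 0;
  if (effect.type === "delay") return (effect.time ?? 0) * 4; // `time` is an assumed field name
  return 0;
}

function estimateDuration(layer: Layer): number {
  const { attack = 0, decay, release = 0 } = layer.envelope;
  const longestEffectTail = Math.max(0, ...(layer.effects ?? []).map(effectTail));
  return attack + decay + release + longestEffectTail;
}

function exceedsCeiling(layer: Layer): boolean {
  return estimateDuration(layer) > HARD_CEILING_SECONDS;
}
```

For example, a tap with `decay: 0.012` and `release: 0.004` estimates to about 0.016 s, well inside the 0.1 s target for clicks and taps.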

估算总时长：

estimated = (envelope.attack ?? 0)
          + envelope.decay
          + (envelope.release ?? 0)
          + max(0, longestEffectTail)  // 混响衰减、延迟时间*4

目标时长：

Click/tap/tick/hover/focus: <=0.1 s.
Toggle/copy/sync: <=0.2 s.
Modal/drawer/dropdown开/关: <=0.3 s.
Success/complete/notification: <=0.8 s.
Whoosh/页面过渡: <=0.5 s.

硬上限：3 s。任何更长的音效都不应作为UI音效。

`validate`脚本计算估算时长，并标记超过3 s的层。

7.2 Envelope sanity - no zero decay, no infinite sustain without release (HIGH)

7.2 包络合理性 - 无零衰减，无无限持续而无释放 (HIGH)

Required:

`envelope.decay > 0` (always). Use 0.005 s as the minimum.
If `envelope.sustain > 0`, `envelope.release` must be present and `> 0`.

Recommended:

`envelope.attack`: 0 for percussive, 0.003-0.05 for sustained tones, up to 0.1 for ambient sounds.
`envelope.decay + envelope.release`: <= 2 s for any UI sound. Above that, you're writing music, not interface feedback.
`envelope.sustain`: 0 for transients, 0.03-0.15 for "rings out" tones, 0.3-0.7 only for held loops.

The `validate` script flags `decay <= 0`, `sustain > 0` without `release`, and total durations above 3 s.
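The two required rules can be checked in a few lines. This is a sketch, not the project's `validate` script; the `Envelope` shape and the helper name `envelopeProblems` are assumptions.

```typescript
// Hypothetical minimal type; the project's real types may differ.
interface Envelope { attack?: number; decay: number; sustain?: number; release?: number }

// Returns one message per violated required rule; an empty array means the envelope passes.
function envelopeProblems(envelope: Envelope): string[] {
  const problems: string[] = [];
  // Written as a negation so that NaN is also rejected.
  if (!(envelope.decay > 0)) {
    problems.push("envelope.decay must be > 0");
  }
  if ((envelope.sustain ?? 0) > 0 && !((envelope.release ?? 0) > 0)) {
    problems.push("envelope.sustain > 0 requires envelope.release > 0");
  }
  return problems;
}
```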

必填项：

`envelope.decay > 0`（始终）。最小值设为0.005。
如果`envelope.sustain > 0`，必须存在`envelope.release`且`> 0`。

推荐值：

`envelope.attack`: 打击乐设为0，持续音调设为0.003-0.05，环境音效最高可设为0.1。
`envelope.decay + envelope.release`: 任何UI音效<=2 s。超过此值则属于音乐创作，而非界面反馈。
`envelope.sustain`: 瞬态音效设为0，“持续”音调设为0.03-0.15，仅循环音效设为0.3-0.7。

`validate`脚本标记`decay <= 0`、`sustain > 0`无`release`、总时长超过3 s的情况。

7.3 Frequency bounds - 20 Hz to 20 kHz, both ends meaningful (HIGH)

7.3 频率范围 - 20 Hz到20 kHz，两端均有意义 (HIGH)

Hard bounds:

`source.frequency` (or both `start` / `end` of a sweep): 20 Hz <= f <= 20000 Hz.
`filter.frequency`: 20 Hz <= f <= 20000 Hz.
`filter.envelope.peak`: same range as `filter.frequency`.

Recommended UI bounds:

Tonal sources: 80 Hz <= f <= 8000 Hz.
High transient layers (clicks, sticks): up to 5 kHz.
Sub layers (body, drum): 60-200 Hz.

Anything above 8 kHz risks being inaudible on phone speakers; anything below 60 Hz risks being inaudible on laptop speakers.

The `validate` script flags any frequency outside the hard bounds.
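A hard-bounds check might look like the sketch below. It is not the project's `validate` script: the type shapes, the `{ start, end }` sweep form, and the helper name `frequencyProblems` are assumptions, and only the 20 Hz to 20000 Hz limits come from the text.

```typescript
// Hypothetical minimal types; the project's real types may differ.
type Pitch = number | { start: number; end: number };
interface Layer {
  source: { frequency?: Pitch }; // noise sources carry no frequency
  filter?: { frequency: number; envelope?: { peak: number } };
}

const MIN_HZ = 20;
const MAX_HZ = 20000;

// Collects every frequency-like field, then reports the ones outside the hard bounds.
function frequencyProblems(layer: Layer): string[] {
  const fields: Array<[string, number]> = [];
  const pitch = layer.source.frequency;
  if (typeof pitch === "number") {
    fields.push(["source.frequency", pitch]);
  } else if (pitch !== undefined) {
    fields.push(["source.frequency.start", pitch.start], ["source.frequency.end", pitch.end]);
  }
  if (layer.filter !== undefined) {
    fields.push(["filter.frequency", layer.filter.frequency]);
    if (layer.filter.envelope !== undefined) {
      fields.push(["filter.envelope.peak", layer.filter.envelope.peak]);
    }
  }
  return fields
    .filter(([, hz]) => !(hz >= MIN_HZ && hz <= MAX_HZ))
    .map(([name, hz]) => `${name} = ${hz} Hz is outside ${MIN_HZ}-${MAX_HZ} Hz`);
}
```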

硬范围：

`source.frequency`（或扫频的`start` / `end`）：20 Hz <=f <=20000 Hz.
`filter.frequency`: 20 Hz <=f <=20000 Hz.
`filter.envelope.peak`: 与`filter.frequency`范围相同。

推荐UI范围：

tonal声源：80 Hz <=f <=8000 Hz.
高频瞬态层（clicks、sticks）：最高到5 kHz.
低频层（body、drum）：60-200 Hz.

8 kHz以上的音效在手机扬声器上可能无法听见；60 Hz以下的音效在笔记本扬声器上可能无法听见。

`validate`脚本标记超出硬范围的频率。

7.4 Gain budget - keep total layer gain under 0.6 (HIGH)

7.4 增益预算 - 总层增益低于0.6 (HIGH)

Single layer:

`gain` between 0.04 and 0.3 for typical UI events.
Background ticks/scroll-snaps: 0.04-0.10.
Mid-importance (click, tap, hover): 0.12-0.20.
High-importance (success, notification): 0.16-0.25.

Multi-layer:

Sum of all `layer.gain` values must be <= 0.6.
If you exceed it, scale every layer proportionally rather than picking one to lower.

If a sound includes a heavy reverb (`mix > 0.15`) or distortion, lower the gain budget by 20%.

The `validate` script flags both individual layers above 0.4 and totals above 0.6.
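Proportional scaling keeps the balance between layers while bringing the total back under budget. The helper names and option shape below are assumptions for illustration; the 0.6 budget, the 20% reduction, and the "scale every layer" rule come from the text.

```typescript
const BASE_BUDGET = 0.6;

// Budget after the 20% reduction for heavy reverb (mix > 0.15) or distortion.
function gainBudget(options: { reverbMix?: number; distortion?: boolean } = {}): number {
  const heavy = (options.reverbMix ?? 0) > 0.15 || options.distortion === true;
  return heavy ? BASE_BUDGET * 0.8 : BASE_BUDGET;
}

// Scales every gain by the same factor so the sum lands on the budget.
// Gains already within budget are returned unchanged (as a copy).
function fitToBudget(gains: number[], budget: number = BASE_BUDGET): number[] {
  const total = gains.reduce((sum, gain) => sum + gain, 0);
  if (total <= budget) return [...gains];
  const scale = budget / total;
  return gains.map((gain) => gain * scale);
}

// Three layers summing to 1.0 are scaled to roughly 0.24, 0.18, 0.18.
const fitted = fitToBudget([0.4, 0.3, 0.3]);
```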

单层：

`gain`在0.04-0.3之间，适用于典型UI事件。
背景tick/scroll-snap: 0.04-0.10.
中等重要性（click、tap、hover）: 0.12-0.20.
高重要性（success、notification）: 0.16-0.25.

多层：

所有`layer.gain`值总和必须<=0.6.
如果超出，按比例缩放所有层，而非仅降低某一层。

如果音效包含重度混响（`mix > 0.15`）或失真，将增益预算降低20%。

`validate`脚本标记单个层增益超过0.4或总增益超过0.6的情况。

7.5 Schema conformance - validate against patch.schema.json (CRITICAL)

7.5 schema一致性 - 验证patch.schema.json (CRITICAL)

Every emitted `SoundDefinition` must validate against packages/audio/schemas/patch.schema.json (`#/$defs/SoundDefinition`).

Common mistakes:

Missing `decay` in `envelope` (required).
Missing `target` in `lfo` (required).
Setting `pan` outside `[-1, 1]`.
Using a `filter.type` that isn't one of `lowpass | highpass | bandpass | notch | allpass | peaking | lowshelf | highshelf | iir`.
Adding a top-level field that isn't in `Layer` or `MultiLayerSound` (e.g. `name`, `description`). The schema is `additionalProperties: false`.
Confusing `MultiLayerSound.effects` (chain on the mixed bus) with `Layer.effects` (chain on a single layer).

The `validate` script invokes the JSON Schema validator on every rule's `example` field. Any violation aborts the build.

每个输出的`SoundDefinition`必须通过packages/audio/schemas/patch.schema.json（`#/$defs/SoundDefinition`）的验证。

常见错误：

`envelope`中缺少`decay`（必填）。
`lfo`中缺少`target`（必填）。
`pan`设置超出`[-1, 1]`范围。
使用的`filter.type`不属于`lowpass | highpass | bandpass | notch | allpass | peaking | lowshelf | highshelf | iir`。
添加了`Layer`或`MultiLayerSound`中没有的顶层字段（如`name`、`description`）。schema设置为`additionalProperties: false`。
混淆`MultiLayerSound.effects`（混合总线链）与`Layer.effects`（单层链）。

`validate`脚本对每个规则的`example`字段调用JSON Schema验证。任何违规都会中止构建。