create-sound

Create Sound

Generated from `rules/*.md` by `src/build.mjs`. Do not edit by hand.
Pick a generation path with `pipeline-detect-input`, then walk the matching section.

1. Generation Pipeline

Procedural steps the agent runs end-to-end. Start here when handling any create-sound request.

1.1 Detect input mode and route the request (CRITICAL)

Decide which path to run based on what the user provided.

| Input | Path |
| --- | --- |
| Prompt only (no audio attachment) | Skip `interpret-*`. Go to `pipeline-pick-base-layer`. |
| Audio file only | Run all `interpret-*` rules. Skip `event-*` / `mood-*`. |
| Both prompt and audio | Run `interpret-*` first, then treat the prompt as a refinement layer over the measured `SoundDefinition`. |

Detecting audio

Look for attached files matching `*.wav`, `*.mp3`, `*.flac`, `*.ogg`, or any path the user references that resolves to an audio file. A JSON manifest (`*.json` next to a sprite) is also an audio-path signal.
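The routing decision can be sketched as a small function. This is a minimal illustration, not the agent's actual implementation: `detect_input_mode`, the mode strings, and the choice to treat any `.json` attachment as an audio signal are assumptions made for the example.

```python
from pathlib import PurePath

# Extensions named in the rule above; ".json" stands in for a sprite manifest.
AUDIO_SIGNALS = {".wav", ".mp3", ".flac", ".ogg", ".json"}


def detect_input_mode(prompt: str, attachments: list[str]) -> str:
    """Return "prompt", "audio", or "prompt+audio" (hypothetical labels)."""
    has_audio = any(
        PurePath(path).suffix.lower() in AUDIO_SIGNALS for path in attachments
    )
    has_prompt = bool(prompt.strip())
    if has_prompt and has_audio:
        return "prompt+audio"
    if has_audio:
        return "audio"
    # Assumption: an empty request still falls through to the prompt path.
    return "prompt"
```

For example, `detect_input_mode("warmer", ["sprite.mp3", "sprite.json"])` selects the combined path, so the prompt is applied as a refinement on the measured definition.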

Refinement examples (prompt + audio)

| Prompt qualifier | Refinement on measured definition |
| --- | --- |
| "warmer" | add `filter: { type: "lowpass", frequency: 2500 }` |
| "shorter" / "punchier" | clamp `envelope.decay` to `<= 0.06` |
| "brighter" | remove any lowpass filter, or raise its cutoff |
| "with reverb" | append `effects: [{ type: "reverb", decay: 0.5, mix: 0.15 }]` |
| "lower octave" | halve `source.frequency` (or both `start` / `end`) |

Output of this step

Produce an internal note like:
Input: prompt + audio
Plan: run interpret-* on out/click.wav, then refine with mood-warm.
Then proceed to the next pipeline step.

1.2 Pick a base layer from the prompt's event class (CRITICAL)

Tokenize the prompt and find the strongest event-class signal. Match against the `event-*` rules.

Token map

| Tokens in prompt | Event rule |
| --- | --- |
| click, tap, key, press, button | `event-click` / `event-tap` |
| tick, scroll, snap, focus | `event-tick` |
| success, complete, win, achievement, level-up, confetti | `event-success` / `event-complete` |
| error, fail, wrong, invalid, delete, destroy | `event-error` |
| modal, dialog, popup, drawer, sheet, sidebar, dropdown, menu | `event-modal-open` / `event-modal-close` |
| swoosh, slide, transition, page, tab | `event-swoosh` / `event-whoosh` |
| notification, alert, ding, bell, mention, badge | `event-notification` |
| toggle, switch, on, off | `event-toggle` |

Direction tokens (open vs close)

  • "open", "appear", "in", "show", "expand", "confirm" -> ascending pitch.
  • "close", "dismiss", "out", "hide", "collapse", "cancel" -> descending pitch.

Output

A starting `SoundDefinition` literal copied from the chosen event rule's `example`. The next step (`pipeline-apply-mood`) will mutate it.
If no event class fires confidently, default to `event-click` and let mood adjectives do the work.

1.3 Apply mood adjectives onto the base layer (HIGH)

After `pipeline-pick-base-layer` produces a starting `SoundDefinition`, scan the prompt for adjective tokens and apply each `mood-*` rule's mutation in order.

Order of application

  1. Source-shape adjectives (`warm`, `bright`, `glassy`, `metallic`, `lofi`, `retro`, `organic`) - mutate `source.type`, `source.fm`, or add `filter`.
  2. Envelope adjectives (`punchy`, `airy`) - mutate `envelope.attack` / `envelope.decay`.
  3. Effect adjectives (`reverby`, `delayed`, `crushed`) - append to `effects`.

Conflict resolution

  • `warm` + `bright` -> the later token wins.
  • `lofi` + `glassy` -> apply both, but cap `effects` at 2 entries.
  • `punchy` + `airy` -> they're orthogonal (envelope vs source); both apply.
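One way to picture the ordering and the `warm` / `bright` rule is a function that takes adjectives in prompt order and returns them in application order. The name `order_moods` and the decision to silently drop unknown adjectives are assumptions for this sketch; the actual mutations live in the `mood-*` rules and are not reproduced here.

```python
SOURCE_SHAPE = ("warm", "bright", "glassy", "metallic", "lofi", "retro", "organic")
ENVELOPE = ("punchy", "airy")
EFFECT = ("reverby", "delayed", "crushed")

# Stage index per adjective: 0 = source shape, 1 = envelope, 2 = effects.
STAGE = {name: 0 for name in SOURCE_SHAPE}
STAGE.update({name: 1 for name in ENVELOPE})
STAGE.update({name: 2 for name in EFFECT})


def order_moods(adjectives: list[str]) -> list[str]:
    """Return known adjectives in application order."""
    known = [a for a in adjectives if a in STAGE]
    # warm + bright: the later token wins, the other is discarded.
    conflict = [a for a in known if a in ("warm", "bright")]
    if conflict:
        winner = conflict[-1]
        known = [a for a in known if a not in ("warm", "bright") or a == winner]
    # Drop repeats while keeping first-seen order, then stable-sort by stage.
    known = list(dict.fromkeys(known))
    return sorted(known, key=STAGE.__getitem__)
```

Because the sort is stable, adjectives within the same stage keep their prompt order.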

Refinement on existing definition (audio + prompt path)

When the input mode is `prompt + audio`, treat each adjective as a refinement on the measured definition rather than building from scratch:

| Adjective | Refinement |
| --- | --- |
| warmer | add or lower `filter.frequency` (lowpass at ~2500 Hz) |
| brighter | remove lowpass or raise its cutoff above 6 kHz |
| punchier | clamp `envelope.decay <= 0.06`, set `envelope.attack: 0` |
| longer | extend `envelope.decay` and add `release` if missing |
| crisper | raise `gain` slightly and add `fm: { ratio: 0.5, depth: 50 }` |

Output

A mutated `SoundDefinition`. Hand off to `pipeline-decide-layering`.

1.4 Decide single-layer vs multi-layer (MEDIUM-HIGH)

| Event class | Default |
| --- | --- |
| click, tap, tick, hover, focus, swoosh | 1 layer (`Layer`) |
| toggle, copy, send, sync | 2 layers (paired pitches with `delay`) |
| success, complete, level-up, confetti | 3+ layers (chord with cascading `delay`) |
| error, delete | 2 layers (`sawtooth` + `square`) |

See `layer-single`, `layer-octave-pair`, `layer-ascending-chord`, `layer-click-plus-body` for the concrete shapes.

Promoting a single Layer to MultiLayerSound

If the prompt or refinement requires more than one layer, wrap:
ts
{
  layers: [<existing layer>, <new layer>],
  // optional global effects, e.g. sidechain compressor, master EQ
}
Per-layer `gain` values should sum to no more than ~0.6 (see `validate-gain-budget`).
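The gain budget is simple arithmetic, sketched here on a plain dict that mirrors the shape of a definition. The helper names are hypothetical and this is not the `validate-gain-budget` rule itself; the sketch also assumes every layer states its `gain` explicitly.

```python
GAIN_BUDGET = 0.6  # approximate ceiling quoted in the text


def total_gain(sound: dict) -> float:
    """Sum per-layer gain; a single Layer counts as one layer."""
    layers = sound["layers"] if "layers" in sound else [sound]
    return sum(layer["gain"] for layer in layers)


def within_gain_budget(sound: dict, budget: float = GAIN_BUDGET) -> bool:
    return total_gain(sound) <= budget
```

Two layers at 0.2 and 0.15 total 0.35 and pass; three layers at 0.3, 0.3, and 0.2 total 0.8 and would need their gains reduced before emitting.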

Demoting MultiLayerSound to a single Layer

If only one layer survives mood application, emit the inner `Layer` directly rather than a one-element `MultiLayerSound`. Both validate, but the single-layer form is the canonical compact shape.

1.5 Emit, optionally render, optionally round-trip (HIGH)

1. Emit

Always return a TypeScript snippet ready to paste into a `.web-kits/<patch>.ts` file:
ts
import type { SoundDefinition } from "@web-kits/audio";

export const myClick: SoundDefinition = {
  source: { type: "sine", frequency: 1300, fm: { ratio: 0.5, depth: 60 } },
  envelope: { decay: 0.012, release: 0.004 },
  gain: 0.18,
};
Plus a one-line rationale that names the prompt tokens you acted on:
"click" -> base from `event-click`; "warm" -> kept default sine, no extra filter needed at 1.3 kHz.

2. Optional preview render

If the user asked for a WAV (or you want to grade your own output), use `packages/audio/src/offline.ts`:
ts
import { renderToWav } from "@web-kits/audio";
import { writeFile } from "node:fs/promises";

const blob = await renderToWav(myClick, { duration: 0.3 });
await writeFile("preview.wav", Buffer.from(await blob.arrayBuffer()));
`duration` should be `attack + decay + release + 0.05` (small tail) or longer if reverb is present.
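The duration rule can be worked through numerically. In this sketch the envelope is a plain dict with times in seconds, and extending the render by the longest reverb `decay` is an assumption: the text only says the duration should be "longer" when reverb is present.

```python
def preview_duration(envelope: dict, effects: tuple = (), tail: float = 0.05) -> float:
    """attack + decay + release + tail, plus reverb decay (assumed extension)."""
    base = (
        envelope.get("attack", 0.0)
        + envelope.get("decay", 0.0)
        + envelope.get("release", 0.0)
        + tail
    )
    reverb = max(
        (fx.get("decay", 0.0) for fx in effects if fx.get("type") == "reverb"),
        default=0.0,
    )
    return base + reverb
```

For the `myClick` envelope (`decay: 0.012`, `release: 0.004`) this gives 0.066 s, so the 0.3 s used in the snippet above leaves ample headroom.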

3. Optional round-trip validation

If you generated from a prompt and want to confirm the result matches intent, run the `interpret-*` rules against the rendered WAV and diff measured vs intended values:

| Field | Acceptable drift |
| --- | --- |
| Fundamental (Hz) | ±5% |
| Attack | ±2 ms |
| Decay | ±10% |
| Spectral centroid | ±20% of expected for the chosen waveform |

If drift exceeds tolerance, refine the definition (often by raising/lowering `gain`, tightening `envelope`, or adjusting `filter.frequency`) and render again.
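The tolerance table translates directly into a per-field check. The dict keys below (`fundamental_hz`, `attack`, `decay`, `centroid_hz`) are names chosen for this sketch, with envelope times in seconds; they are not field names confirmed by the source.

```python
def within_drift(intended: dict, measured: dict) -> dict:
    """Map each field to True when the measured value is inside tolerance."""

    def relative(key: str, limit: float) -> bool:
        return abs(measured[key] - intended[key]) <= limit * abs(intended[key])

    return {
        "fundamental_hz": relative("fundamental_hz", 0.05),
        # Attack uses an absolute tolerance of 2 ms.
        "attack": abs(measured["attack"] - intended["attack"]) <= 0.002,
        "decay": relative("decay", 0.10),
        "centroid_hz": relative("centroid_hz", 0.20),
    }
```

Any `False` entry names the field to refine before rendering again.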

2. Audio Interpretation

FFT analysis sub-steps that fire when the user shares an audio file.

2.1 Acquire and split source audio (HIGH)

The user shared a single file or a sprite (one file containing many sounds). Before any FFT work, get one mono WAV per sound on disk.

Sprite from an npm package

bash
npm pack <package-name> --pack-destination /tmp
tar -xzf /tmp/<package-name>-*.tgz -C /tmp
Look for the MP3/WAV plus any JSON manifest mapping sound names to time offsets.

Manifest-driven slicing

bash
ffmpeg -i sprite.mp3 \
  -ss <start_seconds> -t <duration_seconds> \
  -acodec pcm_s16le -ar 44100 \
  output/<name>.wav

Silence-detection slicing (no manifest)

bash
ffmpeg -i sprite.mp3 -af silencedetect=noise=-40dB:d=0.05 -f null -
Read the `silence_start` / `silence_end` lines and slice between gaps.
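Turning the `silencedetect` log into slice boundaries might look like the sketch below. It assumes the log text (ffmpeg writes it to stderr) has already been captured into a string and that the total duration is known; `sound_segments` is a hypothetical helper, and the sample log in the usage note is illustrative rather than real output.

```python
import re


def sound_segments(log: str, total_duration: float) -> list[tuple[float, float]]:
    """Return (start, end) pairs in seconds for the non-silent regions."""
    starts = [float(v) for v in re.findall(r"silence_start:\s*(-?[\d.]+)", log)]
    ends = [float(v) for v in re.findall(r"silence_end:\s*(-?[\d.]+)", log)]
    segments = []
    cursor = 0.0  # end of the most recent silence
    for index, silence_start in enumerate(starts):
        # A trailing silence without an end line is assumed to run to EOF.
        silence_end = ends[index] if index < len(ends) else total_duration
        if silence_start > cursor:
            segments.append((cursor, silence_start))
        cursor = max(cursor, silence_end)
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments
```

Each pair maps onto the slicing command as `-ss <start>` and `-t <end - start>`. A log with silences at 0.25-0.5 s and 0.75-1.0 s in a 1 s file yields two sounds, one at 0-0.25 s and one at 0.5-0.75 s.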

Output convention

Per-sound WAVs go in `out/<name>.wav` (mono, 44.1 kHz, 16-bit PCM). Downstream interpret rules call `analyze.load_mono(path)` from src/analyze.py.

2.2 Extract fundamental frequency and pitch sweep (HIGH)

Sample the spectrum at multiple time slices to detect both the static pitch and any sweep.
python
from analyze import load_mono, analyze_slice

sample_rate, data = load_mono("out/click.wav")

slices = [0, 5, 10, 20, 50]  # ms
freqs_over_time = [analyze_slice(data, sample_rate, t) for t in slices]

Mapping

| Observation | Output |
| --- | --- |
| All slices within ±5% | `source.frequency: <Hz>` (static) |
| Decreasing across slices | `source.frequency: { start: <high>, end: <low> }` |
| Increasing across slices | `source.frequency: { start: <low>, end: <high> }` |
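The mapping from per-slice frequencies to a `source.frequency` value can be sketched as below. Two assumptions are baked in: "within ±5%" is measured against the mean of the slices, and a sweep is summarized by its first and last slice, which ignores non-monotonic contours. `map_frequency` is a hypothetical name.

```python
def map_frequency(freqs_over_time: list[float], tolerance: float = 0.05):
    """Return a static Hz value, or a {"start", "end"} sweep dict."""
    mean = sum(freqs_over_time) / len(freqs_over_time)
    if all(abs(f - mean) <= tolerance * mean for f in freqs_over_time):
        return mean  # static pitch
    # First slice above last -> descending sweep; below -> ascending.
    return {"start": freqs_over_time[0], "end": freqs_over_time[-1]}
```

A measurement such as `[700, 520, 400, 300, 200]` becomes `{"start": 700, "end": 200}`, a descending sweep.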

Tips

  • Skip the first 1-2 ms if the onset is a click transient; it pollutes the FFT.
  • For very short sounds (< 20 ms) use fewer slices and a smaller window.
  • Use a Hanning window before FFT (already applied in `analyze_slice`) to reduce spectral leakage.

2.3 Extract ADSR envelope from amplitude (HIGH)

Smooth the time-domain amplitude, find onset/peak/sustain/end, and derive each ADSR stage.
python
from analyze import load_mono, extract_envelope

sample_rate, data = load_mono("out/click.wav")
env = extract_envelope(data, sample_rate)
# -> { "attack": 0.0008, "decay": 0.012, "sustain": 0.0, "release": 0.005 }

Output shape

The dict maps 1:1 to the `Envelope` type:
ts
envelope: {
  attack: env.attack,    // 0 if percussive
  decay: env.decay,
  sustain: env.sustain,  // 0 for transient sounds, 0-1 for sustained
  release: env.release,
}

Heuristics

  • `sustain < 0.01` -> drop the field; the sound is percussive.
  • `attack < 0.001` -> set `attack: 0`.
  • `release < 0.005` -> clamp to `0.005` to avoid clicks at the end.
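Applied to the measured dict, the three heuristics amount to a small clean-up pass. `normalize_envelope` is a hypothetical helper written for illustration; it only touches keys that are present in the input.

```python
def normalize_envelope(env: dict) -> dict:
    """Apply the sustain / attack / release heuristics to a measured envelope."""
    out = dict(env)  # leave the caller's dict untouched
    if "sustain" in out and out["sustain"] < 0.01:
        del out["sustain"]  # percussive: omit the field entirely
    if "attack" in out and out["attack"] < 0.001:
        out["attack"] = 0
    if "release" in out and out["release"] < 0.005:
        out["release"] = 0.005  # floor that avoids an audible click at the end
    return out
```

Fed the example measurement from this section, it returns `{"attack": 0, "decay": 0.012, "release": 0.005}`.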

2.4 Classify oscillator waveform from harmonics (HIGH)

Compare the amplitude of the first 8 harmonics against the fundamental.
python
import numpy as np
from scipy.fft import rfft, rfftfreq
from analyze import classify_waveform

# data and sample_rate come from load_mono; fundamental_freq from the pitch step.
segment = data[:int(sample_rate * 0.02)].astype(float)
segment *= np.hanning(len(segment))
spectrum = np.abs(rfft(segment))
freqs = rfftfreq(len(segment), 1 / sample_rate)

waveform = classify_waveform(spectrum, freqs, fundamental_freq)
# -> "sine" | "triangle" | "square" | "sawtooth" | "wavetable"

Mapping

| Pattern | `source.type` |
| --- | --- |
| Fundamental only, harmonics < -40 dB | `sine` |
| Odd harmonics rolling off as 1/n² | `triangle` |
| Odd harmonics rolling off as 1/n | `square` |
| All harmonics rolling off as 1/n | `sawtooth` |
| Custom harmonic profile (none of the above) | `wavetable` |
| No clear harmonic structure, broadband energy | `noise` |
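To make the harmonic patterns concrete, the sketch below compares a measured harmonic profile against textbook Fourier-series amplitudes (triangle: odd harmonics at 1/n², square: odd harmonics at 1/n, sawtooth: all harmonics at 1/n). It is not the `classify_waveform` implementation from `analyze.py`; the nearest-profile approach, the `max_distance` threshold, and both function names are assumptions. Input amplitudes are linear and normalized so the fundamental is 1.0, and noise is not handled because it has no fundamental.

```python
import math


def ideal_profile(kind: str, count: int = 8) -> list[float]:
    """Relative amplitude of harmonics 1..count for an ideal oscillator."""
    profile = []
    for n in range(1, count + 1):
        odd = n % 2 == 1
        if kind == "sine":
            profile.append(1.0 if n == 1 else 0.0)
        elif kind == "triangle":
            profile.append(1.0 / n**2 if odd else 0.0)
        elif kind == "square":
            profile.append(1.0 / n if odd else 0.0)
        else:  # sawtooth
            profile.append(1.0 / n)
    return profile


def classify_profile(harmonics: list[float], max_distance: float = 0.25) -> str:
    """Pick the closest ideal profile, or "wavetable" when none is close."""
    best_kind, best_distance = "wavetable", math.inf
    for kind in ("sine", "triangle", "square", "sawtooth"):
        distance = math.dist(harmonics, ideal_profile(kind, len(harmonics)))
        if distance < best_distance:
            best_kind, best_distance = kind, distance
    return best_kind if best_distance <= max_distance else "wavetable"
```

A profile that fits none of the four shapes falls through to `"wavetable"`, which is the case the next subsection handles.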

When to fall back to wavetable

If the harmonic profile doesn't match a clean oscillator, extract the harmonic series instead:
python
from analyze import extract_harmonics
harmonics = extract_harmonics(spectrum, freqs, fundamental_freq, num_harmonics=16)
# -> { source: { type: "wavetable", harmonics, frequency: fundamental_freq } }

Noise color

For broadband signals with no fundamental, classify by spectral slope:
python
from analyze import classify_noise_color
color = classify_noise_color(spectrum, freqs)  # "white" | "pink" | "brown"
# -> { source: { type: "noise", color } }

2.5 Detect filter type, cutoff, and resonance (MEDIUM-HIGH)

Compare the measured spectrum against the expected spectrum for the identified oscillator.

Cutoff via spectral centroid

python
from analyze import spectral_centroid
centroid = spectral_centroid(spectrum, freqs)
Expected centroids at a 440 Hz fundamental: sine ~440 Hz, triangle ~880 Hz, sawtooth ~2200 Hz, square ~1760 Hz. If the measured centroid is significantly lower than expected, a `lowpass` is present; estimate cutoff at the -3 dB point.
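For reference, a spectral centroid is the magnitude-weighted mean frequency of the spectrum. The sketch below shows that standard definition with NumPy; it is named `centroid_hz` to make clear it is an illustration and not necessarily how `spectral_centroid` in `analyze.py` is written.

```python
import numpy as np


def centroid_hz(spectrum: np.ndarray, freqs: np.ndarray) -> float:
    """Magnitude-weighted mean frequency; 0.0 for an all-zero spectrum."""
    total = float(np.sum(spectrum))
    if total <= 0.0:
        return 0.0
    return float(np.sum(freqs * spectrum) / total)


# Usage: one second of a 440 Hz sine, Hann-windowed as in the earlier steps.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
segment = np.sin(2 * np.pi * 440 * t) * np.hanning(sample_rate)
spectrum = np.abs(np.fft.rfft(segment))
freqs = np.fft.rfftfreq(sample_rate, 1 / sample_rate)
sine_centroid = centroid_hz(spectrum, freqs)  # close to 440 Hz
```

A pure sine lands at its fundamental, which matches the "sine ~440" expectation quoted above; brighter waveforms pull the centroid upward.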

Filter type from rolloff

| Observation | `filter.type` |
| --- | --- |
| High-frequency rolloff steeper than the source would produce | `lowpass` |
| Low-frequency rolloff | `highpass` |
| Narrow band of frequencies passes through | `bandpass` |
| Narrow notch removed | `notch` |
| Resonant peak near cutoff | High `resonance` |

Resonance (Q)

python
from analyze import estimate_resonance
q = estimate_resonance(spectrum, freqs, cutoff_hz)
# Returns 0.1 - 20.0

Filter envelope

If brightness changes over time (bright attack fading to dull), there's a filter envelope:
python
from analyze import detect_filter_envelope
env = detect_filter_envelope(data, sample_rate)
# -> { "peak": 4000, "resting": 800, "decay_ms": 50 } or None

Maps to:

```ts
filter: {
  type: "lowpass",
  frequency: env.resting,
  envelope: { attack: 0, peak: env.peak, decay: env.decay_ms / 1000 },
}
```

2.6 Detect post-source effects (MEDIUM)

Each detector returns a confidence-flavored hint, not a guarantee. Effects are harder to extract than source/envelope - report low confidence when ambiguous.

Reverb

python
from analyze import detect_reverb
result = detect_reverb(data, sample_rate, envelope_end_ms=120)
# -> { "type": "reverb", "decay": 0.6 } or None

Delay (autocorrelation)

python
from analyze import detect_delay
result = detect_delay(data, sample_rate)
# -> { "type": "delay", "time": 0.25, "feedback": 0.3 } or None

FM synthesis

Spectral sidebands at non-integer ratios of the fundamental indicate FM:
python
from analyze import detect_fm
fm = detect_fm(spectrum, freqs, fundamental_freq)
# -> { "fm": { "ratio": 0.5, "depth": 80 } } or None

Maps to `source.fm: { ratio, depth }` (not a separate effect).

Tremolo and vibrato

Periodic amplitude modulation (tremolo) or frequency modulation (vibrato) in the 1-20 Hz band suggests an LFO. Track amplitude or pitch over time and call `detect_lfo` (see `interpret-detect-lfo`).

Bitcrusher / distortion

| Time-domain signature | Effect |
| --- | --- |
| Stepped/quantized waveform with aliasing artifacts | `bitcrusher` |
| Flat-topped waveform with added harmonics | `distortion` |

Chorus / flanger / phaser

Comb-filter pattern that sweeps over time produces moving notches in the spectrum. Hard to disambiguate algorithmically; flag for human review.

2.7 Detect LFO modulation (LOW-MEDIUM)

An LFO is sub-audio (0.1-20 Hz) periodic modulation of a parameter. Track the parameter over time, then run `detect_lfo`.
python
import numpy as np
from analyze import detect_lfo

# 1. Track amplitude (or pitch, or spectral centroid) at regular intervals
window_ms = 10
samples_per_window = int(sample_rate * window_ms / 1000)
amp_over_time = [
    float(np.max(np.abs(data[i:i + samples_per_window])))
    for i in range(0, len(data) - samples_per_window, samples_per_window)
]

# 2. Detect periodicity (second argument is the tracking rate in Hz)
lfo = detect_lfo(np.array(amp_over_time), 1000 / window_ms)
# -> { "frequency": 5.0, "depth": 0.12 } or None

Mapping by tracked parameter

| Parameter tracked | LFO target |
| --- | --- |
| Amplitude | `gain` |
| Pitch | `frequency` or `detune` |
| Spectral centroid | `filter.frequency` |
| Pan position | `pan` |

Output

ts
lfo: { type: "sine", frequency: lfo.frequency, depth: lfo.depth, target: "gain" }
Pick `type` based on the shape of the modulation: smooth sinusoid -> `sine`, sharp ramp -> `sawtooth`, hard switching -> `square`.

2.8 Detect multi-layer sounds and stereo positioning (MEDIUM)

Multiple fundamentals -> MultiLayerSound

Inspect peaks in the spectrum. If two or more strong peaks are not integer multiples of one shared fundamental, the sound is layered.
python
import numpy as np
from scipy.signal import find_peaks

# spectrum and freqs come from the FFT step in the waveform section.
peaks, props = find_peaks(spectrum, height=float(np.max(spectrum)) * 0.2)
peak_freqs = sorted(freqs[peaks])
# Check pairwise ratios. If no shared fundamental explains all peaks, treat as layered.

For each detected fundamental, run the full pipeline (frequency, envelope, waveform, filter, effects) and emit one `Layer` per fundamental:

```ts
{
  layers: [
    { source: { ... }, envelope: { ... }, gain: 0.2 },
    { source: { ... }, envelope: { ... }, gain: 0.15, delay: 0.04 },
  ]
}
```
The earlier layer typically gets `delay: 0` (omitted); subsequent layers offset their `delay` to match the measured onset gap.
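The "shared fundamental" test can be approximated with a ratio check. This sketch makes a simplifying assumption that the lowest strong peak is the candidate fundamental, so it will misjudge sounds whose fundamental is missing from the spectrum; `is_layered` and the 3% tolerance are illustrative choices, not values from the source.

```python
def is_layered(peak_freqs: list[float], tolerance: float = 0.03) -> bool:
    """True when some peak is not a near-integer multiple of the lowest peak."""
    if len(peak_freqs) < 2:
        return False
    base = min(peak_freqs)
    for freq in peak_freqs:
        ratio = freq / base
        nearest = round(ratio)  # nearest harmonic number, always >= 1
        if abs(ratio - nearest) > tolerance * nearest:
            return True
    return False
```

Peaks at 440, 880, and 1320 Hz read as one harmonic series, while a C5 / E5 / G5 chord (523.25, 659.25, 783.99 Hz) reads as layered.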

Stereo and pan

python
from analyze import analyze_stereo
stereo = analyze_stereo(data)
# -> { "pan": 0.3, "stereo_width": 0.7 }

| `pan` magnitude | Output                          |
| --------------- | ------------------------------- |
| `< 0.05`        | omit (`pan: 0` is default)      |
| `0.05 - 1`      | `pan: <value>`                  |

`stereo_width > 0.5` with `|pan| < 0.05` suggests a stereo effect (chorus, dual-layer). Consider splitting into two layers panned `-0.5` / `+0.5`.
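The pan table and the width hint reduce to two comparisons. The helper below is a hypothetical illustration that takes the dict shape shown above and returns the fields to emit plus a flag suggesting the two-layer split.

```python
def pan_output(stereo: dict) -> tuple[dict, bool]:
    """Return (fields to emit, whether to consider a -0.5 / +0.5 layer split)."""
    pan, width = stereo["pan"], stereo["stereo_width"]
    centered = abs(pan) < 0.05
    fields = {} if centered else {"pan": pan}  # pan: 0 is the default, so omit it
    suggest_split = centered and width > 0.5
    return fields, suggest_split
```

The example measurement `{"pan": 0.3, "stereo_width": 0.7}` emits `pan: 0.3` with no split; a centered but wide signal emits no `pan` and raises the split flag.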

Fallback

If a sound is unsynthesizable (complex transients, recorded material, irreducible texture), fall back to:
ts
{ source: { type: "sample", url: "..." } }
and note that the original audio file should be used directly rather than re-synthesized.

3. UI Event Recipes

Concrete SoundDefinition templates per UI event class. Used by the prompt path as the base layer.

3.1 Click - sine + low FM, very short decay (HIGH)

A short ascending sine sweep with light FM. The sweep gives the click "snap"; the FM adds harmonic body without making it metallic.
Incorrect (decay too long, sounds like a chime):
ts
{ source: { type: "sine", frequency: 1300 }, envelope: { decay: 0.5 }, gain: 0.18 }
Correct:
ts
{
  source: { type: "sine", frequency: { start: 200, end: 700 }, fm: { ratio: 0.5, depth: 80 } },
  envelope: { attack: 0, decay: 0.06, sustain: 0, release: 0.02 },
  gain: 0.25,
}
Reference: .web-kits/core.ts `click`.

3.2 Complete - four-note ascending arpeggio (MEDIUM-HIGH)

Same C-major triad as `success`, but with C6 added on top and tighter 15 ms `delay` increments so the notes blur into a single gesture rather than reading as discrete pitches.
Reference: .web-kits/core.ts `complete`.

3.3 Error - layered sawtooth + square with descending sweep (HIGH)

Two descending sweeps stacked an octave apart. Lowpass filters keep the result from being abrasive. Same shape works for `delete` (slightly longer decay).
Incorrect (no filter, sounds like a buzzer):
ts
{ source: { type: "sawtooth", frequency: { start: 320, end: 140 } }, envelope: { decay: 0.25 }, gain: 0.22 }
Reference: .web-kits/core.ts `error`, `_delete`.

3.4 Modal-close - downward sine sweep (MEDIUM)

The inverse of `modalOpen`. The range is narrower because a dismiss should feel less assertive than the entrance; `gain` is slightly lower for the same reason.
For `drawer-close` use 800 -> 350 Hz. For `dropdown-close` use 900 -> 500 Hz.
Reference: .web-kits/core.ts `modalClose`, `drawerClose`, `dropdownClose`.

3.5 Modal-open - upward sine sweep (MEDIUM)

A single sine sweeping from ~430 Hz up to ~1400 Hz over 80 ms. No FM, no filter; the cleanness signals "appearing".
For `drawer-open` use a slightly lower start (~350 Hz) and lower gain (~0.08). For `dropdown-open` use a smaller range (500 -> 1200 Hz) and decay ~60 ms.
Reference: .web-kits/core.ts `modalOpen`, `drawerOpen`, `dropdownOpen`.

3.6 Notification - FM-rich sine with light reverb (HIGH)

Two FM bells a fifth apart with 100 ms `delay` between them. The `fm.ratio: 1.5` gives an inharmonic shimmer; the matched reverb on each layer glues them together.
For `ding`: single layer, `fm.ratio: 3.5`, reverb `decay: 0.8`. For `mention`: lower fundamental (660 Hz), `fm.ratio: 2.5`, slightly more attack.
Reference: .web-kits/core.ts `notification`, `ding`, `mention`, `badge`.

3.7 Success - ascending three-note sine chord (HIGH)

Three sine layers at C5 / E5 / G5 with `delay` cascading 0.07 s between them. The top note has a small upward sweep (G5 -> A5) so the chord resolves "upward" instead of just stopping.
Layer gains sum to 0.45, comfortably under the 0.6 budget.
Reference: .web-kits/core.ts `success`.

3.8 Swoosh - white noise through a sweeping bandpass (MEDIUM)

White noise is shaped by a bandpass filter whose center frequency sweeps from 300 Hz up to 4 kHz. The sweep direction is the gesture: peak above resting = upward swoosh, peak below resting (e.g., resting 2500, peak 400) = downward.
For `slide-up` use a similar shape with peak 3500. For `slide-down` flip to pink noise with `envelope: { decay: 0.12, peak: 500 }` (no attack on the filter envelope).
Reference: .web-kits/core.ts `swoosh`, `slide`, `slideUp`, `slideDown`.

3.9 Tap - static high sine + FM, ultra short (HIGH)

Single high pitch (no sweep), aggressive FM, decay under 20 ms. This is the "key-press" archetype.
Incorrect (frequency too low, sounds like a thump):
ts
{ source: { type: "sine", frequency: 200 }, envelope: { decay: 0.015 }, gain: 0.2 }
Correct:
ts
{
  source: { type: "sine", frequency: 1300, fm: { ratio: 0.5, depth: 100 } },
  envelope: { attack: 0, decay: 0.015, sustain: 0, release: 0.005 },
  gain: 0.2,
}
Reference: .web-kits/core.ts `tap`, `keyPress`.

3.10 Tick - faintest possible sine (MEDIUM)

Highest frequency in the tap family. Decay under 15 ms. `gain` capped at ~0.15 because ticks fire often and must not dominate.
For scroll-snap reduce `gain` to 0.08; for focus/blur reduce to 0.04-0.06.
Reference: .web-kits/core.ts `tick`, `scrollSnap`, `focus`, `blur`.

3.11 Toggle - paired sines with delay (direction matters) (MEDIUM)

Two short sines: C7 (2093 Hz) and G7 (3136 Hz), 25 ms apart.
  • `toggle-on`: low note first, then high (ascending = enabling).
  • `toggle-off`: high note first, then low (descending = disabling).
The same architecture works for `copy` (1200 Hz then 1400 Hz, 40 ms gap) and `sync` (C5 then G5).
Reference: .web-kits/core.ts `toggleOn`, `toggleOff`, `copy`, `sync`.
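The direction rule can be sketched as a small helper that swaps the note order. `makeToggle` and `ToneLayerSketch` are hypothetical names, and the envelope and gain values are assumptions; only the two pitches and the 25 ms gap come from the text above.

```typescript
// Simplified stand-in for a tonal layer (assumption).
interface ToneLayerSketch {
  source: { type: "sine"; frequency: number };
  envelope: { attack?: number; decay: number; sustain?: number; release?: number };
  gain: number;
  delay?: number; // seconds
}

const C7 = 2093; // Hz
const G7 = 3136; // Hz
const GAP_SECONDS = 0.025;

function makeToggle(direction: "on" | "off"): { layers: [ToneLayerSketch, ToneLayerSketch] } {
  // "on" ascends (low then high); "off" descends (high then low).
  const first = direction === "on" ? C7 : G7;
  const second = direction === "on" ? G7 : C7;
  const note = (frequency: number, delay: number): ToneLayerSketch => ({
    source: { type: "sine", frequency },
    envelope: { attack: 0, decay: 0.03, sustain: 0, release: 0.01 }, // assumed timings
    gain: 0.12, // assumed; two layers sum to 0.24
    delay,
  });
  return { layers: [note(first, 0), note(second, GAP_SECONDS)] };
}
```

Both layers share one envelope so the second note reads as part of the same event.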

3.12 Whoosh - longer, slower swoosh for full-page transitions (LOW-MEDIUM)

Same architecture as `swoosh` but everything stretches. Filter attack is 4x longer (0.04 s vs 0.01 s) so the gesture starts gently. Slightly higher `gain` because it spans a longer time window.
`pageEnter` uses bandpass peak 3000 with white noise; `pageExit` uses pink noise with the bandpass envelope inverted (decay only, peak 400).
Reference: .web-kits/core.ts `whoosh`, `pageEnter`, `pageExit`.

4. Mood Vocabulary

Adjective-to-knob mappings layered onto the base recipe.

4.1 Airy - noise source + bandpass with high peak (LOW-MEDIUM)

Mutation:
  • Replace `source` with `{ type: "noise", color: "white" }`.
  • Replace `filter` with a bandpass envelope reaching a high peak (4-6 kHz).
  • Lengthen `envelope.attack` to 0.02-0.04 s so the result fades in rather than snapping.
  • Lower `gain` to 0.08-0.12.
If the base was tonal (sine, triangle, etc.), this mood replaces the source entirely - it's a structural change.

4.2 Bright - no lowpass, optional FM sparkle (MEDIUM)

Mutation:
  • Remove any `filter` of type `lowpass`, or raise its cutoff above 6 kHz.
  • If the base used `triangle`, upgrade to `sine` with `fm: { ratio: 2.5, depth: 50 }` for sparkle.
  • Slight `gain` bump (+0.02) is fine but stay under the budget.

4.3 Glassy - high FM ratio + reverb (MEDIUM)

Mutation:
  • `source.type: "sine"`.
  • `source.fm: { ratio: 3.5, depth: 200-300 }`.
  • Append `effects: [{ type: "reverb", decay: 0.7, damping: 0.5, mix: 0.15 }]`.
  • Extend `envelope.decay` to at least 0.3 s so the bell can ring.
Reference: .web-kits/core.ts `ding`, `sparkle`, `star`.

4.4 Lo-fi - bitcrusher + lowpass (MEDIUM)

Mutation:
  • Add `filter: { type: "lowpass", frequency: 1500 }`.
  • Append `effects: [{ type: "bitcrusher", bits: 6-8, mix: 0.7-1 }]`.
  • Optionally drop `gain` by 0.02 because bitcrushing adds perceived loudness.
Combines well with `mood-retro`.

4.5 Metallic - inharmonic FM ratio (MEDIUM)

Mutation:
  • `source.type: "sine"` (or `square` for a harsher result).
  • `source.fm: { ratio: 2.76, depth: 300-400 }` - 2.76 is the inharmonic ratio used by `badge` in `.web-kits/core.ts` and reads as bell-metal.
  • Short release; metallic shouldn't sustain.
Avoid stacking with `mood-warm` - they cancel each other out.
Reference: .web-kits/core.ts `badge`.

4.6 Organic - triangle + slight detune + light reverb (LOW-MEDIUM)

Mutation:
  • `source.type: "triangle"`.
  • Add `source.detune: 5-10` for very slight pitch wobble.
  • Bump `envelope.attack` from 0 to 0.003-0.008 s so the onset isn't a hard click.
  • Append a small reverb (`mix: 0.05-0.1`).
Combines well with `mood-warm`. Avoid combining with `mood-metallic` or `mood-lofi` - they fight the natural feel.

4.7 Punchy - zero attack, very short decay (MEDIUM)

Mutation:
  • `envelope.attack: 0`.
  • `envelope.decay: <= 0.06`.
  • `envelope.sustain: 0`.
  • `envelope.release: <= 0.015`.
  • `gain` bump of +0.05 is fine because the energy lives in a shorter window.
Orthogonal to source-shape moods - apply on top of warm/bright/glassy/metallic.
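One way to read this mutation is as a pure function that clamps the envelope and leaves the source alone. `applyPunchy` and `PunchyLayerSketch` are hypothetical names for illustration, and treating a missing `release` as the 0.015 s cap is an assumption.

```typescript
// Simplified stand-in for a layer (assumption); only the fields this mood touches are typed.
interface PunchyLayerSketch {
  source: { type: string; frequency: number };
  envelope: { attack?: number; decay: number; sustain?: number; release?: number };
  gain: number;
}

function applyPunchy(layer: PunchyLayerSketch, gainBump = 0.05): PunchyLayerSketch {
  const { decay, release } = layer.envelope;
  return {
    ...layer, // source is untouched, so this stacks on warm/bright/glassy/metallic
    envelope: {
      attack: 0,
      decay: Math.min(decay, 0.06),
      sustain: 0,
      // A missing release falls back to the 0.015 s cap (assumption).
      release: Math.min(release ?? 0.015, 0.015),
    },
    gain: layer.gain + gainBump,
  };
}
```

The caller is still responsible for re-checking the gain budget after the bump.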

4.8 Retro - square or sawtooth + lowpass + bitcrusher (MEDIUM)

Mutation:
  • `source.type: "square"` (or `"sawtooth"`).
  • Add `filter: { type: "lowpass", frequency: 3000 }` to soften aliasing.
  • Append `effects: [{ type: "bitcrusher", bits: 8, sampleRateReduction: 2-4, mix: 1 }]`.
Pairs naturally with rising or stepped pitch sweeps (coins, power-ups).

4.9 Warm - lowpass + light reverb (MEDIUM)

Mutation applied on top of the base recipe:
  • Add `filter: { type: "lowpass", frequency: 2500 }` (or 2-3 kHz).
  • Optionally add `effects: [{ type: "reverb", decay: 0.4, mix: 0.1 }]`.
  • If the base used `sawtooth` or `square`, downgrade to `triangle` so the source itself is rounder.
If the base already had a lowpass, lower its cutoff by ~30%.
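A possible shape for this mutation, under stated assumptions: `applyWarm`, `WarmLayerSketch`, and `EffectSketch` are hypothetical names, the types are simplified, and a base filter that is not a lowpass is simply replaced because the sketch models a single `filter` slot per layer.

```typescript
type WaveSketch = "sine" | "triangle" | "square" | "sawtooth";
interface EffectSketch { type: string; [param: string]: unknown }
// Simplified stand-in for a layer (assumption).
interface WarmLayerSketch {
  source: { type: WaveSketch; frequency: number };
  filter?: { type: string; frequency: number; resonance?: number };
  effects?: EffectSketch[];
  envelope: { decay: number };
  gain: number;
}

function applyWarm(layer: WarmLayerSketch, withReverb = false): WarmLayerSketch {
  // Harsh waveforms are rounded off at the source.
  const harsh = layer.source.type === "sawtooth" || layer.source.type === "square";
  const source: WarmLayerSketch["source"] = harsh
    ? { ...layer.source, type: "triangle" }
    : layer.source;

  // Existing lowpass: lower the cutoff by ~30%. Otherwise add one at 2500 Hz.
  const existing = layer.filter;
  const filter =
    existing !== undefined && existing.type === "lowpass"
      ? { ...existing, frequency: existing.frequency * 0.7 }
      : { type: "lowpass", frequency: 2500 };

  const warmed: WarmLayerSketch = { ...layer, source, filter };
  if (withReverb) {
    // Optional light reverb, appended after any existing effects.
    warmed.effects = [...(layer.effects ?? []), { type: "reverb", decay: 0.4, mix: 0.1 }];
  }
  return warmed;
}
```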

5. Layering Patterns

When to use one layer vs two vs a chord stack.

5.1 Ascending chord - 3-4 layers with cascading delay (MEDIUM)

3-4 sine layers spelling out a major triad (C-E-G or C-E-G-C). `delay` increments by ~70 ms for "feels like notes" or ~15 ms for "feels like one gesture".
Top layer gets a small upward sweep so the chord resolves rather than stops.
Cap layer count at 4. Layer gains should sum to <= 0.6. If a layer has `sustain > 0`, all layers should have similar sustain values to avoid staggered ringing.
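The pattern can be sketched as a small generator. `ascendingChord` and `ChordLayerSketch` are hypothetical names; the whole-tone sweep on the top layer mirrors the G5 -> A5 move used by the success recipe, and the per-layer gain ceiling of 0.15 and the envelope timings are assumptions.

```typescript
// Simplified stand-in for a chord layer (assumption).
interface ChordLayerSketch {
  source: { type: "sine"; frequency: number | { start: number; end: number } };
  envelope: { attack: number; decay: number; sustain: number; release: number };
  gain: number;
  delay: number; // seconds
}

const WHOLE_TONE = Math.pow(2, 2 / 12); // frequency ratio of two semitones

function ascendingChord(notesHz: number[], stepSeconds = 0.07, gainBudget = 0.6): ChordLayerSketch[] {
  if (notesHz.length < 3 || notesHz.length > 4) {
    throw new RangeError("ascending chords use 3-4 layers");
  }
  // Equal gain per layer, never letting the sum exceed the budget.
  const gain = Math.min(0.15, gainBudget / notesHz.length);
  return notesHz.map((hz, i): ChordLayerSketch => {
    const isTop = i === notesHz.length - 1;
    return {
      // Only the top layer sweeps upward so the chord resolves.
      source: { type: "sine", frequency: isTop ? { start: hz, end: hz * WHOLE_TONE } : hz },
      // Identical envelopes on every layer avoid staggered ringing.
      envelope: { attack: 0, decay: 0.2, sustain: 0, release: 0.05 },
      gain,
      delay: i * stepSeconds,
    };
  });
}
```

Passing `0.015` as `stepSeconds` gives the "one gesture" variant of the same chord.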

5.2 Click + body - transient layer over a sustained tone (MEDIUM)

Two layers fired simultaneously (no `delay`):
  1. High-frequency transient (3-5 kHz) with sub-10 ms decay - the "stick".
  2. Lower-frequency body (80-300 Hz) with longer decay - the "drum".
Used for: send buttons, hard confirms, drum-like UI feedback, anything that needs perceived weight. Both layers use the same source `type` (usually `sine`) so they read as one event.
Gains should be roughly balanced (transient slightly quieter than body).
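A hedged example of the two-layer shape. The exact frequencies, decays, and gains below are assumptions picked from inside the stated ranges, and `BodyLayerSketch` is a simplified stand-in type.

```typescript
// Simplified stand-in for a layer (assumption).
interface BodyLayerSketch {
  source: { type: "sine"; frequency: number };
  envelope: { attack: number; decay: number; sustain: number; release: number };
  gain: number;
  delay?: number;
}

const clickPlusBody: { layers: [BodyLayerSketch, BodyLayerSketch] } = {
  layers: [
    // The "stick": high transient, decay under 10 ms, slightly quieter.
    {
      source: { type: "sine", frequency: 4000 },
      envelope: { attack: 0, decay: 0.008, sustain: 0, release: 0.004 },
      gain: 0.12,
    },
    // The "drum": low body with a longer decay. No delay, so both fire together.
    {
      source: { type: "sine", frequency: 150 },
      envelope: { attack: 0, decay: 0.12, sustain: 0, release: 0.03 },
      gain: 0.18,
    },
  ],
};

const [transient, body] = clickPlusBody.layers;
```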

5.3 Octave pair - two layers an octave apart with delay (MEDIUM)

Two layers a fifth or octave apart, separated by 20-50 ms `delay`. Direction (low first vs high first) encodes "on" vs "off", "open" vs "close", etc.
Layer gains should sum to less than 0.5. Both envelopes should match so the second beat doesn't sound disconnected.
If you find yourself reaching for >2 layers, jump to `layer-ascending-chord` instead.

5.4 Single layer - emit Layer directly (HIGH)

When the recipe needs only one source, emit the `Layer` shape directly (not wrapped in `{ layers: [...] }`). The engine accepts both, but the bare-Layer form is the canonical compact representation.
ts
const sound: SoundDefinition = {
  source: { type: "sine", frequency: 1300 },
  envelope: { decay: 0.012, release: 0.004 },
  gain: 0.18,
};
Use this for: click, tap, tick, hover, focus, blur, scroll-snap, single-tone notifications, simple swooshes.

6. Effect Recipes

When and how to reach for each effect type.

6.1 Bandpass noise swoosh - filter envelope is the gesture (MEDIUM)

Recipe is on the layer's `filter`, not its `effects`:
ts
filter: {
  type: "bandpass",
  frequency: <resting Hz>,
  resonance: 1-3,
  envelope: { attack: 0.01-0.04, peak: <target Hz>, decay: 0.08-0.2 },
}
  • Peak above resting -> upward swoosh.
  • Peak below resting -> downward swoosh.
  • Higher `resonance` (>2) makes it whistle-like; lower (<1.5) is broader.
Source should be `noise` (white for sharp, pink for soft). Source amplitude envelope just gates the noise window.

6.2 Bitcrusher - retro / lofi finish (LOW-MEDIUM)

  • `bits`: 4-8. Lower = more crunchy. Below 4 turns into noise.
  • `sampleRateReduction`: 1 (off) to 8 (heavy aliasing). Combine with `bits: 8` for that 8-bit console sound.
  • `mix`: usually 1. Mixing bitcrush with the dry signal sounds muddy.
Best paired with `square` or `sawtooth` sources and a lowpass to soften the aliasing edges.
Avoid stacking with `effect-reverb-tail` - the quantization noise gets smeared.

6.3 FM bell - high ratio, high depth (MEDIUM)

`source.fm: { ratio, depth }` is structural, not an effect node. To get a bell:
  • `ratio`: 2.5-3.5 for harmonic-bell, 2.76 for the "badge" inharmonic clang.
  • `depth`: 150-400. Higher depth = more strident.
  • `envelope.decay`: at least 0.3 s so the bell can ring.
For a bright "ding", use `ratio: 3.5`, `depth: 250` and add reverb (`decay: 0.7, mix: 0.15`).
For a dull "thud" with body, use `ratio: 0.5`, `depth: 200` and a short envelope.
Pair with `mood-glassy` or `mood-metallic`.

6.4 Lowpass warmth - the safest filter to add (MEDIUM)

ts
filter: { type: "lowpass", frequency: 2500, resonance: 0.7 }
  • `frequency`: 1500-3000 Hz for "warm". Below 1000 starts muffling the sound.
  • `resonance`: omit or set 0.7-1.5. Above 2 the cutoff itself starts to whistle.
Stacks safely with reverb, FM, and most moods. The fastest way to remove harshness from any source.
For dynamic warmth (bright attack -> warm sustain), add a filter envelope:
ts
filter: {
  type: "lowpass",
  frequency: 2500,
  envelope: { attack: 0, peak: 6000, decay: 0.08 },
}

6.5 Reverb tail - small space, low mix (MEDIUM)

Default UI reverb:
  • `decay`: 0.3-0.6 s.
  • `damping`: 0.4-0.6 (kills high frequencies in the tail; without this the reverb sounds metallic).
  • `mix`: 0.08-0.15. Anything above 0.2 starts to feel like a music production effect.
For per-layer reverb on bell-like sounds (notification, ding), put the reverb inside the layer's `effects` array so each note rings independently. For shared reverb on chords/transitions, put it on the top-level `effects` of the `MultiLayerSound`.
Avoid stacking reverb with delay - choose one.
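The two placements can be contrasted with a pair of helpers. This is a sketch under assumptions: `withPerLayerReverb`, `withSharedReverb`, and the `Reverb*` types are hypothetical, and the bell frequencies in the usage lines are illustrative only (a fifth apart, 100 ms between them).

```typescript
interface ReverbSketch { type: "reverb"; decay: number; damping?: number; mix: number }
// Simplified stand-ins for Layer and MultiLayerSound (assumption).
interface ReverbLayerSketch {
  source: { type: "sine"; frequency: number };
  envelope: { decay: number; release?: number };
  gain: number;
  delay?: number;
  effects?: ReverbSketch[];
}
interface ReverbMultiSketch { layers: ReverbLayerSketch[]; effects?: ReverbSketch[] }

// Default UI reverb, inside the ranges listed above.
const uiReverb: ReverbSketch = { type: "reverb", decay: 0.4, damping: 0.5, mix: 0.1 };

// Bell-like sounds: each layer gets its own reverb so every note rings independently.
function withPerLayerReverb(sound: ReverbMultiSketch, reverb: ReverbSketch = uiReverb): ReverbMultiSketch {
  return {
    ...sound,
    layers: sound.layers.map((layer) => ({ ...layer, effects: [...(layer.effects ?? []), reverb] })),
  };
}

// Chords and transitions: one reverb on the mixed bus.
function withSharedReverb(sound: ReverbMultiSketch, reverb: ReverbSketch = uiReverb): ReverbMultiSketch {
  return { ...sound, effects: [...(sound.effects ?? []), reverb] };
}

// Illustrative input: two bells a fifth apart (frequencies are assumed values).
const bells: ReverbMultiSketch = {
  layers: [
    { source: { type: "sine", frequency: 880 }, envelope: { decay: 0.3 }, gain: 0.12 },
    { source: { type: "sine", frequency: 1320 }, envelope: { decay: 0.3 }, gain: 0.12, delay: 0.1 },
  ],
};
const perLayer = withPerLayerReverb(bells);
const shared = withSharedReverb(bells);
```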

7. Output Validation

Checks every emitted SoundDefinition must pass before returning to the user.

7.1 Duration cap - 1 s for transients, 3 s absolute max (MEDIUM)

Estimated total duration:
estimated = (envelope.attack ?? 0)
          + envelope.decay
          + (envelope.release ?? 0)
          + max(0, longestEffectTail)  // reverb decay, delay time * 4
Targets:
  • Click / tap / tick / hover / focus: <= 0.1 s.
  • Toggle / copy / sync: <= 0.2 s.
  • Modal / drawer / dropdown open/close: <= 0.3 s.
  • Success / complete / notification: <= 0.8 s.
  • Whoosh / page transition: <= 0.5 s.
Hard ceiling: 3 s. Anything longer should not be a UI sound.
The `validate` script computes the estimated duration and flags layers that exceed 3 s.
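The estimate above could be implemented roughly as follows. This is not the actual `validate` script: `estimateDuration` and the `Duration*` types are hypothetical, and the `time` field on the delay effect is an assumed name for the "delay time" in the formula.

```typescript
interface DurationEnvelopeSketch { attack?: number; decay: number; release?: number }
// Only the two effect kinds the formula mentions; `time` is an assumed field name.
type DurationEffectSketch =
  | { type: "reverb"; decay: number }
  | { type: "delay"; time: number };
interface DurationLayerSketch { envelope: DurationEnvelopeSketch; effects?: DurationEffectSketch[] }

const HARD_CEILING_SECONDS = 3;

// Tail contributed by one effect: reverb decay, or delay time * 4.
function effectTail(effect: DurationEffectSketch): number {
  return effect.type === "reverb" ? effect.decay : effect.time * 4;
}

function estimateDuration(layer: DurationLayerSketch): number {
  const { attack = 0, decay, release = 0 } = layer.envelope;
  // Math.max(0, ...[]) is 0, which covers layers without effects.
  const longestEffectTail = Math.max(0, ...(layer.effects ?? []).map(effectTail));
  return attack + decay + release + longestEffectTail;
}

function exceedsHardCeiling(layer: DurationLayerSketch): boolean {
  return estimateDuration(layer) > HARD_CEILING_SECONDS;
}
```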

7.2 Envelope sanity - no zero decay, no infinite sustain without release (HIGH)

Required:
  • `envelope.decay > 0` (always). Set to 0.005 minimum.
  • If `envelope.sustain > 0`, `envelope.release` must be present and `> 0`.
Recommended:
  • `envelope.attack`: 0 for percussive, 0.003-0.05 for sustained tones, up to 0.1 for ambient sounds.
  • `envelope.decay + envelope.release`: <= 2 s for any UI sound. Above that, you're writing music, not interface feedback.
  • `envelope.sustain`: 0 for transients, 0.03-0.15 for "rings out" tones, 0.3-0.7 only for held loops.
The `validate` script flags `decay <= 0`, `sustain > 0` without `release`, and total durations above 3 s.
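A minimal sketch of these checks, assuming a simplified envelope type. `checkEnvelope` is a hypothetical helper rather than the real `validate` script; it separates the required rules (errors) from the recommended 2 s limit (warnings).

```typescript
interface SanityEnvelopeSketch { attack?: number; decay: number; sustain?: number; release?: number }
interface EnvelopeReport { errors: string[]; warnings: string[] }

function checkEnvelope(envelope: SanityEnvelopeSketch): EnvelopeReport {
  const errors: string[] = [];
  const warnings: string[] = [];
  const release = envelope.release ?? 0;

  // Required: decay must always be positive (NaN also fails this test).
  if (!(envelope.decay > 0)) {
    errors.push("envelope.decay must be > 0 (use at least 0.005)");
  }
  // Required: a sustained level needs a release to end the sound.
  if ((envelope.sustain ?? 0) > 0 && !(release > 0)) {
    errors.push("envelope.sustain > 0 requires envelope.release > 0");
  }
  // Recommended: keep decay + release within 2 s for UI sounds.
  if (envelope.decay + release > 2) {
    warnings.push("envelope.decay + envelope.release exceeds 2 s");
  }
  return { errors, warnings };
}
```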

7.3 Frequency bounds - 20 Hz to 20 kHz, both ends meaningful (HIGH)

Hard bounds:
  • `source.frequency` (or both `start` / `end` of a sweep): 20 Hz <= f <= 20000 Hz.
  • `filter.frequency`: 20 Hz <= f <= 20000 Hz.
  • `filter.envelope.peak`: same range as `filter.frequency`.
Recommended UI bounds:
  • Tonal sources: 80 Hz <= f <= 8000 Hz.
  • High transient layers (clicks, sticks): up to 5 kHz.
  • Sub layers (body, drum): 60-200 Hz.
Anything above 8 kHz risks being inaudible on phone speakers; anything below 60 Hz risks being inaudible on laptop speakers.
The `validate` script flags any frequency outside the hard bounds.
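The hard-bound check might look like the sketch below. `frequencyProblems` and its types are hypothetical; `source.frequency` is typed as optional on the assumption that noise sources carry no frequency.

```typescript
type FrequencySketch = number | { start: number; end: number };
// Only the frequency-bearing fields are modelled (assumption).
interface BoundsLayerSketch {
  source: { frequency?: FrequencySketch };
  filter?: { frequency: number; envelope?: { peak: number } };
}

const MIN_HZ = 20;
const MAX_HZ = 20000;
const inHardBounds = (hz: number): boolean => hz >= MIN_HZ && hz <= MAX_HZ;

function frequencyProblems(layer: BoundsLayerSketch): string[] {
  // Collect every frequency-like value together with the path it came from.
  const values: Array<[string, number]> = [];
  const f = layer.source.frequency;
  if (typeof f === "number") {
    values.push(["source.frequency", f]);
  } else if (f !== undefined) {
    values.push(["source.frequency.start", f.start], ["source.frequency.end", f.end]);
  }
  if (layer.filter) {
    values.push(["filter.frequency", layer.filter.frequency]);
    if (layer.filter.envelope) {
      values.push(["filter.envelope.peak", layer.filter.envelope.peak]);
    }
  }
  return values
    .filter(([, hz]) => !inHardBounds(hz))
    .map(([path, hz]) => `${path} = ${hz} Hz is outside ${MIN_HZ}-${MAX_HZ} Hz`);
}
```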

7.4 Gain budget - keep total layer gain under 0.6 (HIGH)

Single layer:
  • `gain` between 0.04 and 0.3 for typical UI events.
  • Background ticks/scroll-snaps: 0.04-0.10.
  • Mid-importance (click, tap, hover): 0.12-0.20.
  • High-importance (success, notification): 0.16-0.25.
Multi-layer:
  • Sum of all `layer.gain` values must be <= 0.6.
  • If you exceed it, scale every layer proportionally rather than picking one to lower.
If a sound includes a heavy reverb (`mix > 0.15`) or distortion, lower the gain budget by 20%.
The `validate` script flags both individual layers above 0.4 and totals above 0.6.
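Proportional scaling is simple arithmetic: multiply every gain by `budget / total`. The helpers below are an illustrative sketch with hypothetical names (`effectiveBudget`, `fitGainBudget`), not part of the `validate` script.

```typescript
const BASE_GAIN_BUDGET = 0.6;

// Heavy reverb (mix > 0.15) or distortion lowers the budget by 20%.
function effectiveBudget(hasHeavyReverbOrDistortion: boolean, base = BASE_GAIN_BUDGET): number {
  return hasHeavyReverbOrDistortion ? base * 0.8 : base;
}

// Scale every layer by the same factor so the balance between layers is preserved.
function fitGainBudget(gains: number[], budget = BASE_GAIN_BUDGET): number[] {
  const total = gains.reduce((sum, gain) => sum + gain, 0);
  if (total <= budget) return gains.slice(); // already within budget
  const scale = budget / total;
  return gains.map((gain) => gain * scale);
}
```

For example, three layers at 0.3 each total 0.9; scaling by 0.6 / 0.9 brings each one to 0.2.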

7.5 Schema conformance - validate against patch.schema.json (CRITICAL)

Every emitted `SoundDefinition` must validate against packages/audio/schemas/patch.schema.json (`#/$defs/SoundDefinition`).
Common mistakes:
  • Missing `decay` in `envelope` (required).
  • Missing `target` in `lfo` (required).
  • Setting `pan` outside `[-1, 1]`.
  • Using a `filter.type` that isn't one of `lowpass | highpass | bandpass | notch | allpass | peaking | lowshelf | highshelf | iir`.
  • Adding a top-level field that isn't in `Layer` or `MultiLayerSound` (e.g. `name`, `description`). The schema sets `additionalProperties: false`, so unknown fields are rejected.
  • Confusing `MultiLayerSound.effects` (chain on the mixed bus) with `Layer.effects` (chain on a single layer).
The `validate` script invokes the JSON Schema validator on every rule's `example` field. Any violation aborts the build.