sound-engineer
Sound Engineer: Spatial Audio, Procedural Sound & App UX Audio
Expert audio engineer for interactive media: games, VR/AR, and mobile apps. Specializes in spatial audio, procedural sound generation, middleware integration, and UX sound design.
When to Use This Skill
✅ Use for:
- Spatial audio (HRTF, binaural, Ambisonics)
- Procedural sound (footsteps, wind, environmental)
- Game audio middleware (Wwise, FMOD)
- Adaptive/interactive music systems
- UI/UX sound design (clicks, notifications, feedback)
- Sonic branding (audio logos, brand sounds)
- iOS/Android audio session handling
- Haptic-audio coordination
- Real-time DSP (reverb, EQ, compression)
❌ Do NOT use for:
- Music composition/production → DAW tools (Logic, Ableton)
- Voice synthesis/cloning → voice-audio-engineer
- Film audio post-production → linear editing workflows
- Podcast editing → standard audio editors
- Hardware microphone setup → specialized domain
MCP Integrations
| MCP | Purpose |
|---|---|
| ElevenLabs | AI sound-effect generation |
| Firecrawl | Research Wwise/FMOD docs, DSP algorithms, platform guidelines |
| WebFetch | Fetch Apple/Android audio session documentation |
Expert vs Novice Shibboleths
| Topic | Novice | Expert |
|---|---|---|
| Spatial audio | "Just pan left/right" | Uses HRTF convolution for true 3D; knows Ambisonics for VR head tracking |
| Footsteps | "Use 10-20 samples" | Procedural synthesis: infinite variation, tiny memory, parameter-driven |
| Middleware | "Just play sounds" | Uses RTPC for continuous params, Switches for materials, States for music |
| Adaptive music | "Crossfade tracks" | Horizontal re-orchestration (layers) + vertical remixing (stems) |
| UI sounds | "Any click sound works" | Designs for brand consistency, accessibility, haptic coordination |
| iOS audio | "AVAudioPlayer works" | Knows AVAudioSession categories, interruption handling, route changes |
| Distance rolloff | Linear attenuation | Inverse square with reference distance; logarithmic for realism |
| CPU budget | "Audio is cheap" | Knows 5-10% budget; HRTF convolution is expensive (2ms/source) |
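The distance-rolloff row above can be sketched as an inverse-distance curve with a reference distance. This is a minimal sketch loosely following the common clamped inverse-distance model; the parameter names and default rolloff factor are illustrative, not tied to a specific engine:

```cpp
#include <algorithm>
#include <cmath>

// Inverse-distance attenuation with a reference distance: full volume at or
// inside ref_dist, then a 1/d-style falloff controlled by rolloff.
float attenuate(float distance, float ref_dist = 1.0f, float rolloff = 1.0f) {
    float d = std::max(distance, ref_dist);  // clamp: no boost when closer
    return ref_dist / (ref_dist + rolloff * (d - ref_dist));
}
```

At `ref_dist` the gain is 1.0; at twice the reference distance (with `rolloff = 1`) it halves, which tracks perceived loudness far better than a linear fade to zero.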
Common Anti-Patterns
Anti-Pattern: Sample-Based Footsteps at Scale
What it looks like: 20 footstep samples × 6 surfaces × 3 intensities = 360 files (180MB)
Why it's wrong: Memory bloat, repetition audible after 20 minutes of play
What to do instead: Procedural synthesis - impact + texture layers, infinite variation from parameters
When samples OK: Small games, very specific character sounds
Anti-Pattern: HRTF for Every Sound
What it looks like: Full HRTF convolution on 50 simultaneous sources
Why it's wrong: 50 × 2ms = 100ms CPU time; destroys frame budget
What to do instead: HRTF for 3-5 important sources; Ambisonics for ambient bed; simple panning for distant/unimportant
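One way to apply that split is to rank sources every frame and give only the top few the expensive path. A sketch, assuming an illustrative `Source` struct and a simple loudness-over-distance priority heuristic:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Source { int id; float volume; float distance; };

// Rank sources by a cheap importance heuristic and return the ids of the
// top n, which get full HRTF; the rest fall back to panning/Ambisonics.
std::vector<int> pick_hrtf_sources(std::vector<Source> sources, std::size_t n) {
    auto priority = [](const Source& s) { return s.volume / (1.0f + s.distance); };
    std::sort(sources.begin(), sources.end(),
              [&](const Source& a, const Source& b) { return priority(a) > priority(b); });
    std::vector<int> ids;
    for (std::size_t i = 0; i < sources.size() && i < n; ++i)
        ids.push_back(sources[i].id);
    return ids;
}
```

Real engines usually add hysteresis so a source doesn't flicker between HRTF and panned rendering frame to frame.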
Anti-Pattern: Ignoring Audio Sessions (Mobile)
What it looks like: App audio stops when user gets a phone call, never resumes
Why it's wrong: iOS/Android require explicit session management
What to do instead: Implement `AVAudioSession` (iOS) or `AudioFocus` (Android); handle interruptions, route changes
Anti-Pattern: Hard-Coded Sounds
What it looks like: `PlaySound("footstep_concrete_01.wav")` called directly from gameplay code
Why it's wrong: No variation, no parameter control, can't adapt to context
What to do instead: Use middleware events with Switches/RTPCs; procedural generation for environmental sounds
Anti-Pattern: Loud UI Sounds
What it looks like: Every button click at -3dB, same volume as gameplay audio
Why it's wrong: UI sounds should be subtle, never fatiguing; violates platform guidelines
What to do instead: UI sounds at -18 to -24dB; use short, high-frequency transients; respect system volume
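The -18 to -24 dB guidance maps to linear gain like this (a small helper sketch):

```cpp
#include <cmath>

// Convert dBFS to a linear gain multiplier: gain = 10^(dB/20).
// -18 dB is roughly 0.126 and -24 dB roughly 0.063 -- an order of
// magnitude below full scale, which is why UI sounds never fatigue.
float db_to_gain(float db) {
    return std::pow(10.0f, db / 20.0f);
}
```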
Evolution Timeline
Pre-2010: Fixed Audio
- Sample playback only
- Basic stereo panning
- Limited real-time processing
2010-2015: Middleware Era
- Wwise/FMOD become standard
- RTPC and State systems mature
- Basic HRTF support
2016-2020: VR Audio Revolution
- Ambisonics for VR head tracking
- Spatial audio APIs (Resonance, Steam Audio)
- Procedural audio gains traction
2021-2024: AI & Mobile
- ElevenLabs/AI sound effect generation
- Apple Spatial Audio for AirPods
- Procedural audio standard for AAA
- Haptic-audio design becomes discipline
2025+: Current Best Practices
- AI-assisted sound design
- Neural audio codecs
- Real-time voice transformation
- Personalized HRTF from photos
Core Concepts
Spatial Audio Approaches
| Approach | CPU Cost | Quality | Use Case |
|---|---|---|---|
| Stereo panning | ~0.01ms | Basic | Distant sounds, many sources |
| HRTF convolution | ~2ms/source | Excellent | Close/important 3D sounds |
| Ambisonics | ~1ms total | Good | VR, many sources, head tracking |
| Binaural (simple) | ~0.1ms/source | Decent | Budget/mobile spatial |
HRTF: Convolves audio with measured ear impulse responses (512-1024 taps). Creates convincing 3D positioning including elevation.
Ambisonics: Encodes sound field as spherical harmonics (W,X,Y,Z for 1st order). Rotation-invariant, efficient for many sources.
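That rotation invariance is what makes head tracking cheap: the encoded field never needs re-encoding, only a small rotation matrix applied to its channels. A first-order yaw rotation sketch (the `BFormat` struct is illustrative):

```cpp
#include <cmath>

struct BFormat { float w, x, y, z; };  // 1st-order Ambisonic channels

// Counter-rotate the sound field by the listener's head yaw.
// W (omni) and Z (vertical) are unchanged; only X and Y mix.
BFormat rotate_yaw(const BFormat& in, float yaw_rad) {
    float c = std::cos(yaw_rad);
    float s = std::sin(yaw_rad);
    return { in.w, c * in.x - s * in.y, s * in.x + c * in.y, in.z };
}
```

Pitch and roll work the same way with rotations in the X-Z and Y-Z planes, so full head tracking costs a handful of multiplies per sample regardless of source count.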
```cpp
// Key insight: encode once, rotate cheaply
AmbisonicSignal encode(float mono, const Vec3& direction) {
    return {
        mono * 0.707f,       // W (omnidirectional)
        mono * direction.x,  // X (front-back)
        mono * direction.y,  // Y (left-right)
        mono * direction.z   // Z (up-down)
    };
}
```
Procedural Footsteps
Why procedural beats samples:
- ✅ Infinite variation (no repetition)
- ✅ Tiny memory (~50KB vs 5-10MB)
- ✅ Parameter-driven (speed → impact force)
- ✅ Surface-aware from physics materials
Core synthesis:
- Impact burst (20ms noise + resonant tone)
- Surface texture (gravel = granular, grass = filtered noise)
- Debris (scattered micro-impacts)
- Surface EQ (metal = bright, grass = muffled)
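The impact layer above can be sketched as a decaying noise burst mixed with a resonant tone at the surface's resonance frequency. A minimal sketch; the envelope decay rate and 0.6/0.4 mix weights are illustrative tuning values:

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Impact layer: 20 ms of exponentially decaying noise plus a decaying
// resonant tone. force scales the overall level (0-1).
std::vector<float> impact_burst(float resonance_hz, float force,
                                float sample_rate = 48000.0f) {
    const float kPi = 3.14159265f;
    const int n = static_cast<int>(0.020f * sample_rate);  // 20 ms burst
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        float t = i / sample_rate;
        float env = std::exp(-t * 200.0f);  // sharp transient decay
        float noise = 2.0f * (std::rand() / static_cast<float>(RAND_MAX)) - 1.0f;
        float tone = std::sin(2.0f * kPi * resonance_hz * t);
        out[i] = force * env * (0.6f * noise + 0.4f * tone);
    }
    return out;
}
```

Because the noise is freshly generated per step, no two footsteps are identical; layering the texture and debris stages on top adds the surface character.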
```cpp
// Surface resonance frequencies (expert knowledge)
float get_resonance(Surface s) {
    switch (s) {
        case Concrete: return 150.0f; // Low, dull
        case Wood:     return 250.0f; // Mid, warm
        case Metal:    return 500.0f; // High, ringing
        case Gravel:   return 300.0f; // Crunchy mid
        default:       return 200.0f;
    }
}
```
Wwise/FMOD Integration
Key abstractions:
- Events: Trigger sounds (footstep, explosion, ambient loop)
- RTPC: Continuous parameters (speed 0-100, health 0-1)
- Switches: Discrete choices (surface type, weapon type)
- States: Global context (music intensity, underwater)
```cpp
// Material-aware footsteps via Wwise
void OnFootDown(FHitResult& hit) {
    FString surface = DetectSurface(hit.PhysMaterial);
    float speed = GetVelocity().Size();
    SetSwitch("Surface", surface, this);          // Concrete/Wood/Metal
    SetRTPCValue("Impact_Force", speed / 600.0f); // 0-1 normalized
    PostEvent(FootstepEvent, this);
}
```
UI/UX Sound Design
Principles for app sounds:
- Subtle - UI sounds at -18 to -24dB
- Short - 50-200ms for most interactions
- Consistent - Same family/timbre across app
- Accessible - Don't rely solely on audio for feedback
- Haptic-paired - iOS haptics should match audio characteristics
Sound types:
| Category | Examples | Duration | Character |
|---|---|---|---|
| Tap feedback | Button, toggle | 30-80ms | Soft, high-frequency click |
| Success | Save, send, complete | 150-300ms | Rising, positive tone |
| Error | Invalid, failed | 200-400ms | Descending, minor tone |
| Notification | Alert, reminder | 300-800ms | Distinctive, attention-getting |
| Transition | Screen change, modal | 100-250ms | Whoosh, subtle movement |
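As a sketch of the success/error rows above, a short pitch glide rendered with a phase accumulator (durations, frequencies, and the roughly -20 dB peak are illustrative choices, not platform requirements):

```cpp
#include <cmath>
#include <vector>

// Success = rising glide (440 -> 660 Hz), error = falling glide (660 -> 440 Hz),
// with a sine envelope so the tone fades in and out. Peak ~= -20 dBFS.
std::vector<float> feedback_tone(bool success, float sample_rate = 48000.0f) {
    const float kPi = 3.14159265f;
    float dur = success ? 0.2f : 0.3f;
    float f0 = success ? 440.0f : 660.0f;
    float f1 = success ? 660.0f : 440.0f;
    int n = static_cast<int>(dur * sample_rate);
    std::vector<float> out(n);
    float phase = 0.0f;
    for (int i = 0; i < n; ++i) {
        float u = i / static_cast<float>(n);
        float freq = f0 + (f1 - f0) * u;           // linear pitch glide
        phase += 2.0f * kPi * freq / sample_rate;  // integrate phase: no clicks
        out[i] = 0.1f * std::sin(kPi * u) * std::sin(phase);
    }
    return out;
}
```

Integrating phase rather than computing `sin(2*pi*f(t)*t)` directly is the detail that keeps a gliding tone click-free.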
iOS/Android Audio Sessions
iOS AVAudioSession categories:
- `.ambient` - Mixes with other audio, silenced by ringer
- `.playback` - Interrupts other audio, ignores ringer
- `.playAndRecord` - For voice apps
- `.soloAmbient` - Default, silences other audio
Critical handlers:
- Interruption (phone call)
- Route change (headphones unplugged)
- Secondary audio (Siri)
```swift
// Proper iOS audio session setup
func configureAudioSession() {
    let session = AVAudioSession.sharedInstance()
    try? session.setCategory(.playback, mode: .default, options: [.mixWithOthers])
    try? session.setActive(true)
    NotificationCenter.default.addObserver(
        self,
        selector: #selector(handleInterruption),
        name: AVAudioSession.interruptionNotification,
        object: nil
    )
}
```
Performance Targets
| Operation | CPU Time | Notes |
|---|---|---|
| HRTF convolution (512-tap) | ~2ms/source | Use FFT overlap-add |
| Ambisonic encode | ~0.1ms/source | Very efficient |
| Ambisonic decode (binaural) | ~1ms total | Supports many sources |
| Procedural footstep | ~1-2ms | vs 500KB per sample |
| Wind synthesis | ~0.5ms/frame | Real-time streaming |
| Wwise event post | <0.1ms | Negligible |
| iOS audio callback | 5-10ms budget | At 48kHz/512 samples |
Budget guideline: Audio should use 5-10% of frame time.
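For reference, the HRTF row in the table is this operation, shown here in its naive time-domain form; the FFT overlap-add mentioned in the notes computes the same result much more cheaply:

```cpp
#include <cstddef>
#include <vector>

// Direct-form FIR convolution of one audio block with one ear's HRIR.
// O(block * taps): correct but slow, which is why production engines use
// FFT overlap-add/overlap-save for 512-1024-tap HRIRs.
std::vector<float> convolve(const std::vector<float>& block,
                            const std::vector<float>& hrir) {
    std::vector<float> out(block.size() + hrir.size() - 1, 0.0f);
    for (std::size_t i = 0; i < block.size(); ++i)
        for (std::size_t j = 0; j < hrir.size(); ++j)
            out[i + j] += block[i] * hrir[j];
    return out;
}
```

A full binaural render runs this twice per source (left and right HRIR), which is where the ~2 ms/source figure comes from.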
Quick Reference
Spatial Audio Decision Tree
- VR with head tracking? → Ambisonics
- Few important sources? → Full HRTF
- Many background sources? → Simple panning + distance rolloff
- Mobile with limited CPU? → Binaural (simple) or panning
When to Use Procedural Audio
- Environmental (wind, rain, fire) → Always procedural
- Footsteps → Procedural for large games, samples for small
- UI sounds → Generated once, then cached
- Impacts/explosions → Hybrid (procedural + sample layers)
Platform Audio Sessions
- Game with music: `.ambient` + `mixWithOthers`
- Meditation/focus app: `.playback` (interrupts music)
- Voice chat: `.playAndRecord`
- Video player: `.playback`
Integrates With
- voice-audio-engineer - Voice synthesis and TTS
- vr-avatar-engineer - VR audio + avatar integration
- metal-shader-expert - GPU audio processing
- native-app-designer - App UI sound integration
For detailed implementations: See `/references/implementations.md`
Remember: Great audio is invisible—players feel it, don't notice it. Focus on supporting the experience, not showing off. Procedural audio saves memory and eliminates repetition. Always respect CPU budgets and platform audio session requirements.