sound-engineer
Sound Engineer: Spatial Audio, Procedural Sound & App UX Audio
Expert audio engineer for interactive media: games, VR/AR, and mobile apps. Specializes in spatial audio, procedural sound generation, middleware integration, and UX sound design.
When to Use This Skill
✅ Use for:
- Spatial audio (HRTF, binaural, Ambisonics)
- Procedural sound (footsteps, wind, environmental)
- Game audio middleware (Wwise, FMOD)
- Adaptive/interactive music systems
- UI/UX sound design (clicks, notifications, feedback)
- Sonic branding (audio logos, brand sounds)
- iOS/Android audio session handling
- Haptic-audio coordination
- Real-time DSP (reverb, EQ, compression)
❌ Do NOT use for:
- Music composition/production → DAW tools (Logic, Ableton)
- Voice synthesis/cloning → voice-audio-engineer
- Film audio post-production → linear editing workflows
- Podcast editing → standard audio editors
- Hardware microphone setup → specialized domain
MCP Integrations
| MCP | Purpose |
|---|---|
| ElevenLabs | AI sound-effect generation |
| Firecrawl | Research Wwise/FMOD docs, DSP algorithms, platform guidelines |
| WebFetch | Fetch Apple/Android audio session documentation |
Expert vs Novice Shibboleths
| Topic | Novice | Expert |
|---|---|---|
| Spatial audio | "Just pan left/right" | Uses HRTF convolution for true 3D; knows Ambisonics for VR head tracking |
| Footsteps | "Use 10-20 samples" | Procedural synthesis: infinite variation, tiny memory, parameter-driven |
| Middleware | "Just play sounds" | Uses RTPC for continuous params, Switches for materials, States for music |
| Adaptive music | "Crossfade tracks" | Horizontal re-orchestration (layers) + vertical remixing (stems) |
| UI sounds | "Any click sound works" | Designs for brand consistency, accessibility, haptic coordination |
| iOS audio | "AVAudioPlayer works" | Knows AVAudioSession categories, interruption handling, route changes |
| Distance rolloff | Linear attenuation | Inverse square with reference distance; logarithmic for realism |
| CPU budget | "Audio is cheap" | Knows 5-10% budget; HRTF convolution is expensive (2ms/source) |
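The distance-rolloff row above can be sketched as an inverse-distance curve with a reference distance. This is a minimal sketch loosely following the common clamped inverse-distance model; the parameter names and default rolloff factor are illustrative, not tied to a specific engine:

```cpp
#include <algorithm>
#include <cmath>

// Inverse-distance attenuation with a reference distance: full volume at or
// inside ref_dist, then a 1/d-style falloff controlled by rolloff.
float attenuate(float distance, float ref_dist = 1.0f, float rolloff = 1.0f) {
    float d = std::max(distance, ref_dist);  // clamp: no boost when closer
    return ref_dist / (ref_dist + rolloff * (d - ref_dist));
}
```

At `ref_dist` the gain is 1.0; at twice the reference distance (with `rolloff = 1`) it halves, which tracks perceived loudness far better than a linear fade to zero.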
Common Anti-Patterns
Anti-Pattern: Sample-Based Footsteps at Scale
What it looks like: 20 footstep samples × 6 surfaces × 3 intensities = 360 files (180MB)
Why it's wrong: Memory bloat, repetition audible after 20 minutes of play
What to do instead: Procedural synthesis - impact + texture layers, infinite variation from parameters
When samples OK: Small games, very specific character sounds
Anti-Pattern: HRTF for Every Sound
What it looks like: Full HRTF convolution on 50 simultaneous sources
Why it's wrong: 50 × 2ms = 100ms CPU time; destroys frame budget
What to do instead: HRTF for 3-5 important sources; Ambisonics for ambient bed; simple panning for distant/unimportant
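One way to apply that split is to rank sources every frame and give only the top few the expensive path. A sketch, assuming an illustrative `Source` struct and a simple loudness-over-distance priority heuristic:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Source { int id; float volume; float distance; };

// Rank sources by a cheap importance heuristic and return the ids of the
// top n, which get full HRTF; the rest fall back to panning/Ambisonics.
std::vector<int> pick_hrtf_sources(std::vector<Source> sources, std::size_t n) {
    auto priority = [](const Source& s) { return s.volume / (1.0f + s.distance); };
    std::sort(sources.begin(), sources.end(),
              [&](const Source& a, const Source& b) { return priority(a) > priority(b); });
    std::vector<int> ids;
    for (std::size_t i = 0; i < sources.size() && i < n; ++i)
        ids.push_back(sources[i].id);
    return ids;
}
```

Real engines usually add hysteresis so a source doesn't flicker between HRTF and panned rendering frame to frame.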
Anti-Pattern: Ignoring Audio Sessions (Mobile)
What it looks like: App audio stops when user gets a phone call, never resumes
Why it's wrong: iOS/Android require explicit session management
What to do instead: Implement `AVAudioSession` (iOS) or `AudioFocus` (Android); handle interruptions, route changes
Anti-Pattern: Hard-Coded Sounds
What it looks like: `PlaySound("footstep_concrete_01.wav")` called directly from gameplay code
Why it's wrong: No variation, no parameter control, can't adapt to context
What to do instead: Use middleware events with Switches/RTPCs; procedural generation for environmental sounds
Anti-Pattern: Loud UI Sounds
What it looks like: Every button click at -3dB, same volume as gameplay audio
Why it's wrong: UI sounds should be subtle, never fatiguing; violates platform guidelines
What to do instead: UI sounds at -18 to -24dB; use short, high-frequency transients; respect system volume
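The -18 to -24 dB guidance maps to linear gain like this (a small helper sketch):

```cpp
#include <cmath>

// Convert dBFS to a linear gain multiplier: gain = 10^(dB/20).
// -18 dB is roughly 0.126 and -24 dB roughly 0.063 -- an order of
// magnitude below full scale, which is why UI sounds never fatigue.
float db_to_gain(float db) {
    return std::pow(10.0f, db / 20.0f);
}
```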
Evolution Timeline
Pre-2010: Fixed Audio
- Sample playback only
- Basic stereo panning
- Limited real-time processing
2010-2015: Middleware Era
- Wwise/FMOD become standard
- RTPC and State systems mature
- Basic HRTF support
2016-2020: VR Audio Revolution
- Ambisonics for VR head tracking
- Spatial audio APIs (Resonance, Steam Audio)
- Procedural audio gains traction
2021-2024: AI & Mobile
- ElevenLabs/AI sound effect generation
- Apple Spatial Audio for AirPods
- Procedural audio standard for AAA
- Haptic-audio design becomes discipline
2025+: Current Best Practices
- AI-assisted sound design
- Neural audio codecs
- Real-time voice transformation
- Personalized HRTF from photos
Core Concepts
Spatial Audio Approaches
| Approach | CPU Cost | Quality | Use Case |
|---|---|---|---|
| Stereo panning | ~0.01ms | Basic | Distant sounds, many sources |
| HRTF convolution | ~2ms/source | Excellent | Close/important 3D sounds |
| Ambisonics | ~1ms total | Good | VR, many sources, head tracking |
| Binaural (simple) | ~0.1ms/source | Decent | Budget/mobile spatial |
HRTF: Convolves audio with measured ear impulse responses (512-1024 taps). Creates convincing 3D positioning including elevation.
Ambisonics: Encodes sound field as spherical harmonics (W,X,Y,Z for 1st order). Rotation-invariant, efficient for many sources.
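That rotation invariance is what makes head tracking cheap: the encoded field never needs re-encoding, only a small rotation matrix applied to its channels. A first-order yaw rotation sketch (the `BFormat` struct is illustrative):

```cpp
#include <cmath>

struct BFormat { float w, x, y, z; };  // 1st-order Ambisonic channels

// Counter-rotate the sound field by the listener's head yaw.
// W (omni) and Z (vertical) are unchanged; only X and Y mix.
BFormat rotate_yaw(const BFormat& in, float yaw_rad) {
    float c = std::cos(yaw_rad);
    float s = std::sin(yaw_rad);
    return { in.w, c * in.x - s * in.y, s * in.x + c * in.y, in.z };
}
```

Pitch and roll work the same way with rotations in the X-Z and Y-Z planes, so full head tracking costs a handful of multiplies per sample regardless of source count.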
```cpp
// Key insight: encode once, rotate cheaply
AmbisonicSignal encode(float mono, const Vec3& direction) {
    return {
        mono * 0.707f,       // W (omnidirectional)
        mono * direction.x,  // X (front-back)
        mono * direction.y,  // Y (left-right)
        mono * direction.z   // Z (up-down)
    };
}
```
Procedural Footsteps
Why procedural beats samples:
- ✅ Infinite variation (no repetition)
- ✅ Tiny memory (~50KB vs 5-10MB)
- ✅ Parameter-driven (speed → impact force)
- ✅ Surface-aware from physics materials
Core synthesis:
- Impact burst (20ms noise + resonant tone)
- Surface texture (gravel = granular, grass = filtered noise)
- Debris (scattered micro-impacts)
- Surface EQ (metal = bright, grass = muffled)
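The impact layer above can be sketched as a decaying noise burst mixed with a resonant tone at the surface's resonance frequency. A minimal sketch; the envelope decay rate and 0.6/0.4 mix weights are illustrative tuning values:

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Impact layer: 20 ms of exponentially decaying noise plus a decaying
// resonant tone. force scales the overall level (0-1).
std::vector<float> impact_burst(float resonance_hz, float force,
                                float sample_rate = 48000.0f) {
    const float kPi = 3.14159265f;
    const int n = static_cast<int>(0.020f * sample_rate);  // 20 ms burst
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        float t = i / sample_rate;
        float env = std::exp(-t * 200.0f);  // sharp transient decay
        float noise = 2.0f * (std::rand() / static_cast<float>(RAND_MAX)) - 1.0f;
        float tone = std::sin(2.0f * kPi * resonance_hz * t);
        out[i] = force * env * (0.6f * noise + 0.4f * tone);
    }
    return out;
}
```

Because the noise is freshly generated per step, no two footsteps are identical; layering the texture and debris stages on top adds the surface character.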
```cpp
// Surface resonance frequencies (expert knowledge)
float get_resonance(Surface s) {
    switch (s) {
        case Concrete: return 150.0f; // Low, dull
        case Wood:     return 250.0f; // Mid, warm
        case Metal:    return 500.0f; // High, ringing
        case Gravel:   return 300.0f; // Crunchy mid
        default:       return 200.0f;
    }
}
```
Wwise/FMOD Integration
Key abstractions:
- Events: Trigger sounds (footstep, explosion, ambient loop)
- RTPC: Continuous parameters (speed 0-100, health 0-1)
- Switches: Discrete choices (surface type, weapon type)
- States: Global context (music intensity, underwater)
```cpp
// Material-aware footsteps via Wwise
void OnFootDown(FHitResult& hit) {
    FString surface = DetectSurface(hit.PhysMaterial);
    float speed = GetVelocity().Size();
    SetSwitch("Surface", surface, this);          // Concrete/Wood/Metal
    SetRTPCValue("Impact_Force", speed / 600.0f); // 0-1 normalized
    PostEvent(FootstepEvent, this);
}
```
UI/UX Sound Design
Principles for app sounds:
- Subtle - UI sounds at -18 to -24dB
- Short - 50-200ms for most interactions
- Consistent - Same family/timbre across app
- Accessible - Don't rely solely on audio for feedback
- Haptic-paired - iOS haptics should match audio characteristics
Sound types:
| Category | Examples | Duration | Character |
|---|---|---|---|
| Tap feedback | Button, toggle | 30-80ms | Soft, high-frequency click |
| Success | Save, send, complete | 150-300ms | Rising, positive tone |
| Error | Invalid, failed | 200-400ms | Descending, minor tone |
| Notification | Alert, reminder | 300-800ms | Distinctive, attention-getting |
| Transition | Screen change, modal | 100-250ms | Whoosh, subtle movement |
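As a sketch of the success/error rows above, a short pitch glide rendered with a phase accumulator (durations, frequencies, and the roughly -20 dB peak are illustrative choices, not platform requirements):

```cpp
#include <cmath>
#include <vector>

// Success = rising glide (440 -> 660 Hz), error = falling glide (660 -> 440 Hz),
// with a sine envelope so the tone fades in and out. Peak ~= -20 dBFS.
std::vector<float> feedback_tone(bool success, float sample_rate = 48000.0f) {
    const float kPi = 3.14159265f;
    float dur = success ? 0.2f : 0.3f;
    float f0 = success ? 440.0f : 660.0f;
    float f1 = success ? 660.0f : 440.0f;
    int n = static_cast<int>(dur * sample_rate);
    std::vector<float> out(n);
    float phase = 0.0f;
    for (int i = 0; i < n; ++i) {
        float u = i / static_cast<float>(n);
        float freq = f0 + (f1 - f0) * u;           // linear pitch glide
        phase += 2.0f * kPi * freq / sample_rate;  // integrate phase: no clicks
        out[i] = 0.1f * std::sin(kPi * u) * std::sin(phase);
    }
    return out;
}
```

Integrating phase rather than computing `sin(2*pi*f(t)*t)` directly is the detail that keeps a gliding tone click-free.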
iOS/Android Audio Sessions
iOS AVAudioSession categories:
- `.ambient` - Mixes with other audio, silenced by ringer
- `.playback` - Interrupts other audio, ignores ringer
- `.playAndRecord` - For voice apps
- `.soloAmbient` - Default, silences other audio
Critical handlers:
- Interruption (phone call)
- Route change (headphones unplugged)
- Secondary audio (Siri)
```swift
// Proper iOS audio session setup
func configureAudioSession() {
    let session = AVAudioSession.sharedInstance()
    try? session.setCategory(.playback, mode: .default, options: [.mixWithOthers])
    try? session.setActive(true)
    NotificationCenter.default.addObserver(
        self,
        selector: #selector(handleInterruption),
        name: AVAudioSession.interruptionNotification,
        object: nil
    )
}
```
Performance Targets
| Operation | CPU Time | Notes |
|---|---|---|
| HRTF convolution (512-tap) | ~2ms/source | Use FFT overlap-add |
| Ambisonic encode | ~0.1ms/source | Very efficient |
| Ambisonic decode (binaural) | ~1ms total | Supports many sources |
| Procedural footstep | ~1-2ms | vs 500KB per sample |
| Wind synthesis | ~0.5ms/frame | Real-time streaming |
| Wwise event post | <0.1ms | Negligible |
| iOS audio callback | 5-10ms budget | At 48kHz/512 samples |
Budget guideline: Audio should use 5-10% of frame time.
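For reference, the HRTF row in the table is this operation, shown here in its naive time-domain form; the FFT overlap-add mentioned in the notes computes the same result much more cheaply:

```cpp
#include <cstddef>
#include <vector>

// Direct-form FIR convolution of one audio block with one ear's HRIR.
// O(block * taps): correct but slow, which is why production engines use
// FFT overlap-add/overlap-save for 512-1024-tap HRIRs.
std::vector<float> convolve(const std::vector<float>& block,
                            const std::vector<float>& hrir) {
    std::vector<float> out(block.size() + hrir.size() - 1, 0.0f);
    for (std::size_t i = 0; i < block.size(); ++i)
        for (std::size_t j = 0; j < hrir.size(); ++j)
            out[i + j] += block[i] * hrir[j];
    return out;
}
```

A full binaural render runs this twice per source (left and right HRIR), which is where the ~2 ms/source figure comes from.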
Quick Reference
Spatial Audio Decision Tree
- VR with head tracking? → Ambisonics
- Few important sources? → Full HRTF
- Many background sources? → Simple panning + distance rolloff
- Mobile with limited CPU? → Binaural (simple) or panning
When to Use Procedural Audio
- Environmental (wind, rain, fire) → Always procedural
- Footsteps → Procedural for large games, samples for small
- UI sounds → Generated once, then cached
- Impacts/explosions → Hybrid (procedural + sample layers)
Platform Audio Sessions
- Game with music: `.ambient` + `mixWithOthers`
- Meditation/focus app: `.playback` (interrupts music)
- Voice chat: `.playAndRecord`
- Video player: `.playback`
Integrates With
- voice-audio-engineer - Voice synthesis and TTS
- vr-avatar-engineer - VR audio + avatar integration
- metal-shader-expert - GPU audio processing
- native-app-designer - App UI sound integration
For detailed implementations: See `/references/implementations.md`
Remember: Great audio is invisible—players feel it, don't notice it. Focus on supporting the experience, not showing off. Procedural audio saves memory and eliminates repetition. Always respect CPU budgets and platform audio session requirements.