sound-engineer

Sound Engineer: Spatial Audio, Procedural Sound & App UX Audio


Expert audio engineer for interactive media: games, VR/AR, and mobile apps. Specializes in spatial audio, procedural sound generation, middleware integration, and UX sound design.

When to Use This Skill


Use for:
  • Spatial audio (HRTF, binaural, Ambisonics)
  • Procedural sound (footsteps, wind, environmental)
  • Game audio middleware (Wwise, FMOD)
  • Adaptive/interactive music systems
  • UI/UX sound design (clicks, notifications, feedback)
  • Sonic branding (audio logos, brand sounds)
  • iOS/Android audio session handling
  • Haptic-audio coordination
  • Real-time DSP (reverb, EQ, compression)
Do NOT use for:
  • Music composition/production → DAW tools (Logic, Ableton)
  • Voice synthesis/cloning → voice-audio-engineer
  • Film audio post-production → linear editing workflows
  • Podcast editing → standard audio editors
  • Hardware microphone setup → specialized domain

MCP Integrations


| MCP | Purpose |
| --- | --- |
| ElevenLabs | `text_to_sound_effects` - generate UI sounds, notifications, impacts |
| Firecrawl | Research Wwise/FMOD docs, DSP algorithms, platform guidelines |
| WebFetch | Fetch Apple/Android audio session documentation |

Expert vs Novice Shibboleths


| Topic | Novice | Expert |
| --- | --- | --- |
| Spatial audio | "Just pan left/right" | Uses HRTF convolution for true 3D; knows Ambisonics for VR head tracking |
| Footsteps | "Use 10-20 samples" | Procedural synthesis: infinite variation, tiny memory, parameter-driven |
| Middleware | "Just play sounds" | Uses RTPC for continuous params, Switches for materials, States for music |
| Adaptive music | "Crossfade tracks" | Horizontal re-orchestration (layers) + vertical remixing (stems) |
| UI sounds | "Any click sound works" | Designs for brand consistency, accessibility, haptic coordination |
| iOS audio | "AVAudioPlayer works" | Knows AVAudioSession categories, interruption handling, route changes |
| Distance rolloff | Linear attenuation | Inverse square with reference distance; logarithmic for realism |
| CPU budget | "Audio is cheap" | Knows 5-10% budget; HRTF convolution is expensive (~2ms/source) |

Common Anti-Patterns


Anti-Pattern: Sample-Based Footsteps at Scale


What it looks like: 20 footstep samples × 6 surfaces × 3 intensities = 360 files (180MB)
Why it's wrong: Memory bloat; repetition becomes audible after ~20 minutes of play
What to do instead: Procedural synthesis: impact + texture layers yield infinite variation from parameters
When samples are OK: Small games, very specific character sounds

Anti-Pattern: HRTF for Every Sound


What it looks like: Full HRTF convolution on 50 simultaneous sources
Why it's wrong: 50 × 2ms = 100ms of CPU time; destroys the frame budget
What to do instead: HRTF for the 3-5 most important sources; Ambisonics for the ambient bed; simple panning for distant/unimportant sources
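One way to apply this budget is to rank sources each frame and hand out spatialization tiers. The sketch below is a hypothetical scheme: the scoring formula (priority over distance) and the slot counts are illustrative assumptions, not values from any engine.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Spatializer { HRTF, Ambisonic, Pan };

struct Source {
    float distance;                       // metres from listener
    float priority;                       // designer-assigned importance, 0-1
    Spatializer tier = Spatializer::Pan;
};

// HRTF for the few sources that matter, Ambisonics for the bed,
// plain panning for everything else.
void assign_tiers(std::vector<Source>& sources,
                  std::size_t hrtf_slots = 4, std::size_t ambi_slots = 16) {
    // Closer, higher-priority sources score higher.
    std::sort(sources.begin(), sources.end(),
              [](const Source& a, const Source& b) {
                  return a.priority / (1.0f + a.distance) >
                         b.priority / (1.0f + b.distance);
              });
    for (std::size_t i = 0; i < sources.size(); ++i) {
        if (i < hrtf_slots)
            sources[i].tier = Spatializer::HRTF;
        else if (i < hrtf_slots + ambi_slots)
            sources[i].tier = Spatializer::Ambisonic;
        else
            sources[i].tier = Spatializer::Pan;
    }
}
```

With 4 HRTF slots the worst-case HRTF cost stays near 8ms total instead of scaling with voice count.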

Anti-Pattern: Ignoring Audio Sessions (Mobile)


What it looks like: App audio stops when the user gets a phone call and never resumes
Why it's wrong: iOS/Android require explicit session management
What to do instead: Implement `AVAudioSession` (iOS) or `AudioFocus` (Android); handle interruptions and route changes

Anti-Pattern: Hard-Coded Sounds


What it looks like: `PlaySound("footstep_concrete_01.wav")`
Why it's wrong: No variation, no parameter control, can't adapt to context
What to do instead: Use middleware events with Switches/RTPCs; procedural generation for environmental sounds

Anti-Pattern: Loud UI Sounds


What it looks like: Every button click at -3dB, the same volume as gameplay audio
Why it's wrong: UI sounds should be subtle and never fatiguing; loud UI audio violates platform guidelines
What to do instead: UI sounds at -18 to -24dB; use short, high-frequency transients; respect system volume

Evolution Timeline


Pre-2010: Fixed Audio


  • Sample playback only
  • Basic stereo panning
  • Limited real-time processing

2010-2015: Middleware Era


  • Wwise/FMOD become standard
  • RTPC and State systems mature
  • Basic HRTF support

2016-2020: VR Audio Revolution


  • Ambisonics for VR head tracking
  • Spatial audio APIs (Resonance, Steam Audio)
  • Procedural audio gains traction

2021-2024: AI & Mobile


  • ElevenLabs/AI sound effect generation
  • Apple Spatial Audio for AirPods
  • Procedural audio standard for AAA
  • Haptic-audio design becomes discipline

2025+: Current Best Practices


  • AI-assisted sound design
  • Neural audio codecs
  • Real-time voice transformation
  • Personalized HRTF from photos

Core Concepts


Spatial Audio Approaches


| Approach | CPU Cost | Quality | Use Case |
| --- | --- | --- | --- |
| Stereo panning | ~0.01ms | Basic | Distant sounds, many sources |
| HRTF convolution | ~2ms/source | Excellent | Close/important 3D sounds |
| Ambisonics | ~1ms total | Good | VR, many sources, head tracking |
| Binaural (simple) | ~0.1ms/source | Decent | Budget/mobile spatial |
HRTF: Convolves audio with measured ear impulse responses (512-1024 taps). Creates convincing 3D positioning including elevation.
Ambisonics: Encodes sound field as spherical harmonics (W,X,Y,Z for 1st order). Rotation-invariant, efficient for many sources.
```cpp
// Key insight: encode once, rotate cheaply
AmbisonicSignal encode(float mono, Vec3 direction) {
    return {
        mono * 0.707f,      // W (omnidirectional)
        mono * direction.x, // X (front-back)
        mono * direction.y, // Y (left-right)
        mono * direction.z  // Z (up-down)
    };
}
```
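The "rotate cheaply" half of that insight is what makes Ambisonics attractive for head tracking: yaw-rotating a first-order B-format field is one small matrix per sample, no matter how many sources were mixed into it. A minimal sketch follows; the `AmbisonicSignal` struct and the X/Y sign convention are assumptions for illustration, not a fixed standard.

```cpp
#include <cmath>

// First-order B-format channels as in the encoder sketch above.
struct AmbisonicSignal { float w, x, y, z; };

// Rotate the whole encoded field by the listener's yaw.
AmbisonicSignal rotate_yaw(const AmbisonicSignal& s, float radians) {
    const float c = std::cos(radians);
    const float sn = std::sin(radians);
    return {
        s.w,                 // W is omnidirectional: unchanged
        c * s.x - sn * s.y,  // X/Y rotate like a 2D vector
        sn * s.x + c * s.y,
        s.z                  // yaw leaves the vertical axis alone
    };
}
```

Two multiplies and an add per rotated channel, versus re-spatializing every source when the head moves.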

Procedural Footsteps


Why procedural beats samples:
  • ✅ Infinite variation (no repetition)
  • ✅ Tiny memory (~50KB vs 5-10MB)
  • ✅ Parameter-driven (speed → impact force)
  • ✅ Surface-aware from physics materials
Core synthesis:
  1. Impact burst (20ms noise + resonant tone)
  2. Surface texture (gravel = granular, grass = filtered noise)
  3. Debris (scattered micro-impacts)
  4. Surface EQ (metal = bright, grass = muffled)
```cpp
// Surface resonance frequencies (expert knowledge)
float get_resonance(Surface s) {
    switch(s) {
        case Concrete: return 150.0f;  // Low, dull
        case Wood:     return 250.0f;  // Mid, warm
        case Metal:    return 500.0f;  // High, ringing
        case Gravel:   return 300.0f;  // Crunchy mid
        default:       return 200.0f;
    }
}
```
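Step 1 of the core synthesis above (the impact burst) can be sketched by mixing a fast-decaying noise transient with a decaying tone at the surface resonance. The sample rate, envelope constant, and 50/50 noise/tone balance below are illustrative, untuned values, not a production recipe.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// 20ms impact burst: noise transient + resonant tone, shaped by a
// shared exponential envelope and scaled by impact force (0-1).
std::vector<float> impact_burst(float resonance_hz, float force,
                                float sample_rate = 48000.0f) {
    const int n = static_cast<int>(0.020f * sample_rate + 0.5f);  // 20ms
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        const float t = static_cast<float>(i) / sample_rate;
        const float env = std::exp(-t * 400.0f);                  // fast decay
        const float noise =
            std::rand() / static_cast<float>(RAND_MAX) * 2.0f - 1.0f;
        const float tone = std::sin(2.0f * 3.14159265f * resonance_hz * t);
        out[i] = force * env * (0.5f * noise + 0.5f * tone);
    }
    return out;
}
```

Feeding `get_resonance(surface)` into `resonance_hz` and a speed-derived value into `force` is what makes the result parameter-driven rather than sampled.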

Wwise/FMOD Integration


Key abstractions:
  • Events: Trigger sounds (footstep, explosion, ambient loop)
  • RTPC: Continuous parameters (speed 0-100, health 0-1)
  • Switches: Discrete choices (surface type, weapon type)
  • States: Global context (music intensity, underwater)
```cpp
// Material-aware footsteps via Wwise
void OnFootDown(FHitResult& hit) {
    FString surface = DetectSurface(hit.PhysMaterial);
    float speed = GetVelocity().Size();

    SetSwitch("Surface", surface, this);        // Concrete/Wood/Metal
    SetRTPCValue("Impact_Force", speed/600.0f); // 0-1 normalized
    PostEvent(FootstepEvent, this);
}
```

UI/UX Sound Design


Principles for app sounds:
  1. Subtle - UI sounds at -18 to -24dB
  2. Short - 50-200ms for most interactions
  3. Consistent - Same family/timbre across app
  4. Accessible - Don't rely solely on audio for feedback
  5. Haptic-paired - iOS haptics should match audio characteristics
Sound types:
| Category | Examples | Duration | Character |
| --- | --- | --- | --- |
| Tap feedback | Button, toggle | 30-80ms | Soft, high-frequency click |
| Success | Save, send, complete | 150-300ms | Rising, positive tone |
| Error | Invalid, failed | 200-400ms | Descending, minor tone |
| Notification | Alert, reminder | 300-800ms | Distinctive, attention-getting |
| Transition | Screen change, modal | 100-250ms | Whoosh, subtle movement |
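The -18 to -24dB guideline converts to linear gain with the standard amplitude formula. The `ui_gain` helper and its -18dB default below are illustrative starting points to tune per platform, not spec values.

```cpp
#include <cmath>

// Standard amplitude conversion: gain = 10^(dB/20).
float db_to_gain(float db) {
    return std::pow(10.0f, db / 20.0f);
}

// UI assets are often authored near full scale; attenuate at play time.
float ui_gain() {
    return db_to_gain(-18.0f);   // ~0.126 linear
}
```

So a -18dB UI click plays at roughly one eighth of the amplitude of 0dBFS gameplay audio.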

iOS/Android Audio Sessions


iOS AVAudioSession categories:
  • `.ambient` - Mixes with other audio, silenced by ringer
  • `.playback` - Interrupts other audio, ignores ringer
  • `.playAndRecord` - For voice apps
  • `.soloAmbient` - Default, silences other audio
Critical handlers:
  • Interruption (phone call)
  • Route change (headphones unplugged)
  • Secondary audio (Siri)
```swift
// Proper iOS audio session setup
func configureAudioSession() {
    let session = AVAudioSession.sharedInstance()
    try? session.setCategory(.playback, mode: .default, options: [.mixWithOthers])
    try? session.setActive(true)

    NotificationCenter.default.addObserver(
        self,
        selector: #selector(handleInterruption),
        name: AVAudioSession.interruptionNotification,
        object: nil
    )
}
```

Performance Targets


| Operation | CPU Time | Notes |
| --- | --- | --- |
| HRTF convolution (512-tap) | ~2ms/source | Use FFT overlap-add |
| Ambisonic encode | ~0.1ms/source | Very efficient |
| Ambisonic decode (binaural) | ~1ms total | Supports many sources |
| Procedural footstep | ~1-2ms | vs 500KB per sample |
| Wind synthesis | ~0.5ms/frame | Real-time streaming |
| Wwise event post | <0.1ms | Negligible |
| iOS audio callback | 5-10ms budget | At 48kHz/512 samples |
Budget guideline: Audio should use 5-10% of frame time.
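As a sanity check on those figures, the callback period and frame budget follow directly from the sample rate, buffer size, and frame rate: 512 samples at 48kHz fire every ~10.7ms, and 5-10% of a 60fps frame is roughly 0.8-1.7ms. The helper names below are illustrative.

```cpp
// Time between audio callbacks for a given buffer size.
constexpr float callback_period_ms(float sample_rate, int buffer_frames) {
    return 1000.0f * static_cast<float>(buffer_frames) / sample_rate;
}

// Per-frame audio budget for a given frame rate and budget fraction.
constexpr float audio_budget_ms(float fps, float fraction) {
    return (1000.0f / fps) * fraction;
}
```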

Quick Reference


Spatial Audio Decision Tree


  • VR with head tracking? → Ambisonics
  • Few important sources? → Full HRTF
  • Many background sources? → Simple panning + distance rolloff
  • Mobile with limited CPU? → Binaural (simple) or panning
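The decision tree above transcribes directly into a selection function. The enum names and the five-source threshold are illustrative assumptions, not fixed values.

```cpp
enum class SpatialMethod { Ambisonics, FullHRTF, SimpleBinaural, Panning };

SpatialMethod choose_spatializer(bool vr_head_tracking,
                                 bool mobile_low_cpu,
                                 int important_sources) {
    if (vr_head_tracking) return SpatialMethod::Ambisonics;      // VR + head tracking
    if (mobile_low_cpu)   return SpatialMethod::SimpleBinaural;  // limited CPU
    if (important_sources <= 5) return SpatialMethod::FullHRTF;  // few key sources
    return SpatialMethod::Panning;  // many background sources + distance rolloff
}
```

In practice the branches combine: a VR scene still uses full HRTF for a handful of hero sources on top of the Ambisonic bed, as in the anti-pattern section above.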

When to Use Procedural Audio


  • Environmental (wind, rain, fire) → Always procedural
  • Footsteps → Procedural for large games, samples for small
  • UI sounds → Generated once, then cached
  • Impacts/explosions → Hybrid (procedural + sample layers)

Platform Audio Sessions


  • Game with music: `.ambient` + `mixWithOthers`
  • Meditation/focus app: `.playback` (interrupts music)
  • Voice chat: `.playAndRecord`
  • Video player: `.playback`

Integrates With


  • voice-audio-engineer - Voice synthesis and TTS
  • vr-avatar-engineer - VR audio + avatar integration
  • metal-shader-expert - GPU audio processing
  • native-app-designer - App UI sound integration

For detailed implementations, see `/references/implementations.md`.

Remember: Great audio is invisible—players feel it, don't notice it. Focus on supporting the experience, not showing off. Procedural audio saves memory and eliminates repetition. Always respect CPU budgets and platform audio session requirements.