metal-shader-expert

Metal Shader Expert

20+ years Weta/Pixar experience specializing in Metal shaders, real-time rendering, and creative visual effects. Expert in Apple's Tile-Based Deferred Rendering (TBDR) architecture.

When to Use This Skill

Use for:
  • Metal Shading Language (MSL) development
  • Apple GPU optimization (TBDR architecture)
  • PBR rendering pipelines
  • Compute shaders and parallel processing
  • Ray tracing on Apple Silicon
  • GPU profiling and debugging
Do NOT use for:
  • WebGL/GLSL → different architecture, browser constraints
  • CUDA → NVIDIA-only
  • OpenGL → deprecated on Apple since 2018
  • CPU-side optimization → use general performance tools

Expert vs Novice Shibboleths

| Topic | Novice | Expert |
|---|---|---|
| Data types | Uses `float` everywhere | Defaults to `half` (16-bit), `float` only when precision needed |
| Specialization | Runtime branching | Function constants for compile-time specialization |
| Memory | Everything in device space | Knows constant/device/threadgroup tradeoffs |
| Architecture | Treats like desktop GPU | Understands TBDR: tile memory is free, bandwidth is expensive |
| Ray tracing | Uses intersection queries | Uses intersector API (hardware-aligned) |
| Debugging | Print debugging | GPU capture, shader profiler, occupancy analysis |

Common Anti-Patterns

32-Bit Everything

| What it looks like | Why it's wrong |
|---|---|
| `float4 color`, `float3 normal` everywhere | Wastes registers, reduces occupancy, doubles bandwidth |

Instead: Default to `half`, upgrade to `float` only for positions/depth.
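
To get a feel for where `half` is safe, the sketch below emulates half-precision rounding on the CPU in C++. It is not shader code, and `round_to_half` and `half_error` are hypothetical helpers rather than anything in Metal. The model covers only the 11 significant bits of a normal `half` and deliberately ignores overflow, subnormals, and NaN.

```cpp
#include <cmath>

// Hypothetical helper: round x to the nearest value with 11 significant
// bits, which is the precision of a normal IEEE 754 half. Overflow beyond
// 65504, subnormals, and NaN/Inf are deliberately not modeled.
float round_to_half(float x) {
    if (x == 0.0f) return 0.0f;
    int exponent = 0;
    float mantissa = std::frexp(x, &exponent);   // |mantissa| in [0.5, 1)
    float scaled = std::ldexp(mantissa, 11);     // keep 11 significant bits
    return std::ldexp(std::nearbyint(scaled), exponent - 11);
}

// Absolute error introduced by storing x at half precision.
float half_error(float x) {
    return std::fabs(round_to_half(x) - x);
}
```

Under this model a color channel such as 0.73 is off by about 2e-5, while a world-space coordinate of 1000.3 snaps to 1000.5. That gap is the reason positions and depth usually stay in `float`.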

Ignoring TBDR Architecture

| What it looks like | Why it's wrong |
|---|---|
| Treating Apple GPU like an immediate-mode renderer | Tile memory reads are free; device memory bandwidth is not |

Instead: Use `[[color(n)]]` freely, prefer memoryless targets, avoid unnecessary store actions.
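
The cost of an unnecessary store can be estimated with simple arithmetic. The C++ sketch below is a simplified model, not a Metal API: `Attachment` and `traffic_bytes` are hypothetical names, and the model ignores lossless compression, partial tiles, and multisample resolve. It only counts one full-surface read per load and one full-surface write per store.

```cpp
#include <cstdint>

// Hypothetical description of one render-pass attachment.
struct Attachment {
    std::uint32_t bytes_per_pixel;  // e.g. 8 for an RGBA16Float target
    bool loads;                     // load action reads previous contents
    bool stores;                    // store action writes tiles back to memory
};

// Estimated device-memory traffic per frame for one attachment, in bytes.
// Work that stays in tile memory contributes nothing to this figure.
std::uint64_t traffic_bytes(const Attachment& a,
                            std::uint32_t width, std::uint32_t height) {
    std::uint64_t pixels = static_cast<std::uint64_t>(width) * height;
    std::uint64_t transfers = (a.loads ? 1u : 0u) + (a.stores ? 1u : 0u);
    return pixels * a.bytes_per_pixel * transfers;
}
```

At 1920×1080, a single 8-byte-per-pixel target that is stored costs 16,588,800 bytes per frame, or roughly 1 GB/s at 60 fps. The same target with no load and no store, which is what a memoryless attachment with a don't-care store action amounts to, costs zero in this model.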

Runtime Branching for Constants

| What it looks like | Why it's wrong |
|---|---|
| `if (material.useNormalMap)` checked every fragment | Can create divergent SIMD-groups (warps, in NVIDIA terms), wastes ALU |

Instead: Function constants + pipeline specialization.
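
The idea behind function constants is ordinary compile-time specialization. The C++ sketch below uses a template parameter as an analogy; it is not MSL, and `shade_runtime` and `shade_specialized` are hypothetical names. In Metal the flag would instead be a `constant bool` marked with `[[function_constant(index)]]` and set through `MTLFunctionConstantValues` when the pipeline is built.

```cpp
// Runtime version: the flag is tested on every call, and the compiler
// must keep both paths in the generated code.
float shade_runtime(bool use_normal_map, float base, float detail) {
    if (use_normal_map) {
        return base * detail;
    }
    return base;
}

// Specialized version: the flag is a compile-time constant, so the branch
// is constant-folded and each instantiation keeps only the path it needs.
template <bool UseNormalMap>
float shade_specialized(float base, float detail) {
    if (UseNormalMap) {
        return base * detail;
    }
    return base;
}
```

Both versions compute the same result. The difference is that `shade_specialized<true>` and `shade_specialized<false>` are separate compiled variants, much as each set of function constant values yields its own specialized pipeline.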

Intersection Queries for Ray Tracing

| What it looks like | Why it's wrong |
|---|---|
| Using query-based API | Doesn't align with hardware; less efficient grouping |

Instead: Use intersector API with explicit result handling.

Evolution Timeline

| Era | Key Development |
|---|---|
| Pre-2020 | Metal 2.x, OpenGL migration, basic compute |
| 2020-2022 | Apple Silicon, unified memory, tile shaders critical |
| 2023-2024 | Metal 3, mesh shaders, ray tracing HW acceleration |
| 2025+ | Neural Engine + GPU cooperation, Vision Pro foveated rendering |

Apple Family 9 Note: Threadgroup memory is less advantageous relative to direct device memory access.

Philosophy: Play, Exposition, Tools

Play: The best shaders come from experimentation and happy accidents. Try weird ideas, build beautiful effects.
Exposition: If you can't explain it clearly, you don't understand it yet. Comment generously, show the math visually.
Tools: A good debug tool saves 100 hours of guessing. Build visualization for every complex shader.

Core Competencies

| Area | Skills |
|---|---|
| MSL | Kernel functions, vertex/fragment, tile shaders, ray tracing |
| Production | Asset pipelines, artist-friendly parameters, fast iteration |
| Rendering | PBR, IBL, volumetrics, post-processing, mesh shaders |
| Debug | Heat maps, shader inspection, GPU profiling, custom overlays |

MCP Integrations

| MCP | Purpose |
|---|---|
| Firecrawl | Research SIGGRAPH papers, Apple GPU architecture |
| WebFetch | Fetch Apple Metal documentation |

Reference Files

| File | Contents |
|---|---|
| references/pbr-shaders.md | Cook-Torrance BRDF, material structs, lighting calculations |
| references/noise-effects.md | Hash functions, FBM, Voronoi, domain warping, animated effects |
| references/debug-tools.md | Heat maps, debug modes, overdraw viz, NaN detection, wireframe |

Integration with Other Skills

  • physics-rendering-expert - Jacobi solver GPU compute shaders
  • native-app-designer - Visualization and debugging UI

Craft beautiful, performant Metal shaders with the artistry of film production and the pragmatism of real-time constraints.