metal-gpu
Metal GPU Code Skill
Write production-quality Metal code with correct patterns, optimal performance, and clear explanations.
When to Read References
For detailed API topology, Metal 4 specifics, and Apple Silicon optimization patterns, read:
/mnt/skills/user/metal-gpu/references/metal-api-guide.md
Core Principles
- Always start with the device: `MTLCreateSystemDefaultDevice()`; every Metal workflow begins here
- Command pattern: Device → Command Queue → Command Buffer → Command Encoder → Commit
- Shaders are MSL (Metal Shading Language): C++14-based, with Metal-specific types and attributes
- Resource management matters: Use appropriate storage modes, avoid unnecessary copies
- Triple buffering: Use it in render loops to keep the CPU and GPU working in parallel
Quick Reference: Metal Command Pipeline
```
MTLDevice
└─ makeCommandQueue() → MTLCommandQueue
   └─ makeCommandBuffer() → MTLCommandBuffer
      ├─ makeRenderCommandEncoder(descriptor:) → MTLRenderCommandEncoder
      ├─ makeComputeCommandEncoder() → MTLComputeCommandEncoder
      └─ makeBlitCommandEncoder() → MTLBlitCommandEncoder
```
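The chain above can be walked end to end without any shader code by using a blit encoder. The following is a minimal sketch, not a canonical setup: the function name `fillBufferOnGPU` and its parameters are hypothetical, and it assumes a Metal-capable device is present.

```swift
import Metal

/// Walks Device → Command Queue → Command Buffer → Command Encoder → Commit
/// with a blit encoder, which needs no shader library.
/// Returns nil if any optional Metal call fails (for example, no GPU is available).
/// Note: on macOS the fill range must be a multiple of 4 bytes.
func fillBufferOnGPU(byteCount: Int, value: UInt8) -> [UInt8]? {
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let buffer = device.makeBuffer(length: byteCount, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeBlitCommandEncoder() else { return nil }

    encoder.fill(buffer: buffer, range: 0..<byteCount, value: value)
    encoder.endEncoding()               // must come before commit()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()  // fine in a one-off tool; avoid on an app's main thread

    // Shared storage lets the CPU read the result directly.
    let pointer = buffer.contents().bindMemory(to: UInt8.self, capacity: byteCount)
    return Array(UnsafeBufferPointer(start: pointer, count: byteCount))
}
```

Calling `fillBufferOnGPU(byteCount: 16, value: 0xAB)` should yield sixteen `0xAB` bytes on a machine with a GPU. In a real app the device and queue would be created once and reused rather than rebuilt per call.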
Writing Shaders (MSL)
Use Metal Shading Language. Always include:
- `#include <metal_stdlib>` and `using namespace metal;`
- Correct attribute qualifiers: `[[vertex_id]]`, `[[position]]`, `[[stage_in]]`, `[[buffer(n)]]`, `[[texture(n)]]`
- Proper address space qualifiers: `device`, `constant`, `threadgroup`, `thread`
Vertex Shader Pattern
```metal
#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    float3 position [[attribute(0)]];
    float3 normal   [[attribute(1)]];
    float2 texCoord [[attribute(2)]];
};

struct VertexOut {
    float4 position [[position]];
    float3 normal;
    float2 texCoord;
};

vertex VertexOut vertex_main(VertexIn in [[stage_in]],
                             constant float4x4 &mvp [[buffer(1)]]) {
    VertexOut out;
    out.position = mvp * float4(in.position, 1.0);
    out.normal = in.normal;
    out.texCoord = in.texCoord;
    return out;
}
```
Fragment Shader Pattern
```metal
fragment float4 fragment_main(VertexOut in [[stage_in]],
                              texture2d<float> albedo [[texture(0)]],
                              sampler texSampler [[sampler(0)]]) {
    float4 color = albedo.sample(texSampler, in.texCoord);
    return color;
}
```
Compute Kernel Pattern
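```metal
// Doubles each element. No bounds check: this assumes the grid is dispatched
// with exactly one thread per element (dispatchThreads).
kernel void compute_main(const device float *input [[buffer(0)]],
                         device float *output [[buffer(1)]],
                         uint id [[thread_position_in_grid]]) {
    output[id] = input[id] * 2.0f;
}
```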
Swift-Side Setup Patterns
Render Pipeline Setup
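```swift
import Metal

// Force-unwraps and try! keep this snippet short; production code should handle nil and thrown errors.
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
// Load shaders
let library = device.makeDefaultLibrary()!
let vertexFunction = library.makeFunction(name: "vertex_main")
let fragmentFunction = library.makeFunction(name: "fragment_main")
// Pipeline descriptor
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
// Vertex descriptor: one entry for every [[attribute(n)]] in VertexIn, or pipeline creation fails.
// Assumes interleaved SIMD3<Float> position, SIMD3<Float> normal, SIMD2<Float> texCoord in buffer 0.
let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = .float3 // position
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.attributes[1].format = .float3 // normal
vertexDescriptor.attributes[1].offset = MemoryLayout<SIMD3<Float>>.stride
vertexDescriptor.attributes[1].bufferIndex = 0
vertexDescriptor.attributes[2].format = .float2 // texCoord
vertexDescriptor.attributes[2].offset = MemoryLayout<SIMD3<Float>>.stride * 2
vertexDescriptor.attributes[2].bufferIndex = 0
// 48 bytes: 16 + 16 + 8, padded up to the 16-byte alignment of SIMD3<Float>
vertexDescriptor.layouts[0].stride = MemoryLayout<SIMD3<Float>>.stride * 3
pipelineDescriptor.vertexDescriptor = vertexDescriptor
let pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineDescriptor)
```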
Compute Pipeline Setup
```swift
// Continues from the render setup: `device`, `library`, and `commandQueue` already exist.
// `inputBuffer`, `outputBuffer` (MTLBuffer) and `elementCount` (Int, > 0) are assumed to be defined by the caller.
let computeFunction = library.makeFunction(name: "compute_main")!
let computePipeline = try! device.makeComputePipelineState(function: computeFunction)
let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(computePipeline)
encoder.setBuffer(inputBuffer, offset: 0, index: 0)   // [[buffer(0)]] in compute_main
encoder.setBuffer(outputBuffer, offset: 0, index: 1)  // [[buffer(1)]] in compute_main
// One thread per element
let gridSize = MTLSize(width: elementCount, height: 1, depth: 1)
let threadGroupSize = MTLSize(
    width: min(computePipeline.maxTotalThreadsPerThreadgroup, elementCount),
    height: 1, depth: 1
)
encoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadGroupSize)
encoder.endEncoding()
commandBuffer.commit()
```
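The setup above ends with `dispatchThreads`, which relies on non-uniform threadgroup sizes. On GPUs without that feature, the usual fallback is `dispatchThreadgroups` with a rounded-up group count. Here is a sketch of that choice; the helper names `threadgroupCount` and `encodeDispatch` are hypothetical, and the family check (Apple4 and later, or Mac2) is taken from Apple's feature set tables, so verify it against your deployment targets.

```swift
import Metal

/// Number of threadgroups needed to cover `elementCount` threads (ceiling division).
func threadgroupCount(elementCount: Int, threadsPerThreadgroup: Int) -> Int {
    (elementCount + threadsPerThreadgroup - 1) / threadsPerThreadgroup
}

/// Encodes a 1D dispatch, picking the API by device capability.
func encodeDispatch(encoder: MTLComputeCommandEncoder,
                    pipeline: MTLComputePipelineState,
                    device: MTLDevice,
                    elementCount: Int) {
    guard elementCount > 0 else { return }
    let width = min(pipeline.maxTotalThreadsPerThreadgroup, elementCount)
    let threadsPerGroup = MTLSize(width: width, height: 1, depth: 1)

    if device.supportsFamily(.apple4) || device.supportsFamily(.mac2) {
        // Exact grid: Metal trims the last threadgroup for us.
        encoder.dispatchThreads(MTLSize(width: elementCount, height: 1, depth: 1),
                                threadsPerThreadgroup: threadsPerGroup)
    } else {
        // Rounded-up grid: some threads may fall past the end of the data.
        let groups = threadgroupCount(elementCount: elementCount, threadsPerThreadgroup: width)
        encoder.dispatchThreadgroups(MTLSize(width: groups, height: 1, depth: 1),
                                     threadsPerThreadgroup: threadsPerGroup)
    }
}
```

For example, 1000 elements with 256 threads per group need 4 groups, so 24 threads land past the end of the data. A kernel used with the fallback path therefore needs an element count argument and a bounds check, which `compute_main` as written does not have.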
MetalKit View Rendering
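```swift
import MetalKit

class Renderer: NSObject, MTKViewDelegate {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let pipelineState: MTLRenderPipelineState

    // Long-lived objects are created once and injected.
    init(device: MTLDevice, commandQueue: MTLCommandQueue, pipelineState: MTLRenderPipelineState) {
        self.device = device
        self.commandQueue = commandQueue
        self.pipelineState = pipelineState
        super.init()
    }

    // Required by MTKViewDelegate; update size-dependent state (e.g. projection) here.
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let drawable = view.currentDrawable,
              let descriptor = view.currentRenderPassDescriptor,
              let commandBuffer = commandQueue.makeCommandBuffer(),
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else { return }
        encoder.setRenderPipelineState(pipelineState)
        // Set buffers, draw primitives...
        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
        encoder.endEncoding()
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}
```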
Performance Best Practices
- Storage modes: Use `.shared` on Apple Silicon (unified memory), `.private` for GPU-only data, `.managed` on Intel Macs with discrete GPUs
- Triple buffering: Rotate 3 buffers with a semaphore to avoid CPU/GPU stalls
- Avoid per-frame allocations: Reuse buffers, textures, and pipeline states. Command buffers and encoders are transient, so create those each frame
- Use `dispatchThreads` over `dispatchThreadgroups` when the GPU supports non-uniform threadgroup sizes (all Apple Silicon Macs do)
- Design for the tile-based deferred rendering architecture of Apple GPUs: use imageblocks and tile shaders
- Compile pipelines ahead of time: Pipeline creation is expensive, so do it at load time
- Use Metal GPU frame capture in Xcode to profile and debug
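The triple-buffering item above can be sketched as a small ring of shared-storage buffers gated by a semaphore. This is an illustrative outline rather than a prescribed design: the `FrameRing` type and its method names are hypothetical.

```swift
import Metal
import Dispatch

/// A ring of three shared-storage buffers guarded by a counting semaphore.
/// The CPU writes slot N while the GPU may still be reading slots N-1 and N-2.
final class FrameRing {
    static let maxFramesInFlight = 3
    private let semaphore = DispatchSemaphore(value: FrameRing.maxFramesInFlight)
    private let buffers: [MTLBuffer]
    private(set) var frameIndex = 0

    init?(device: MTLDevice, bytesPerFrame: Int) {
        var created: [MTLBuffer] = []
        for _ in 0..<FrameRing.maxFramesInFlight {
            guard let buffer = device.makeBuffer(length: bytesPerFrame,
                                                 options: .storageModeShared) else { return nil }
            created.append(buffer)
        }
        buffers = created
    }

    /// Blocks until a slot is free, then returns the buffer the CPU may write this frame.
    func nextBuffer() -> MTLBuffer {
        semaphore.wait()
        let buffer = buffers[frameIndex]
        frameIndex = (frameIndex + 1) % FrameRing.maxFramesInFlight
        return buffer
    }

    /// Call once per nextBuffer(), before commit(), so the slot frees when the GPU finishes.
    func releaseWhenCompleted(_ commandBuffer: MTLCommandBuffer) {
        let semaphore = self.semaphore
        commandBuffer.addCompletedHandler { _ in semaphore.signal() }
    }
}
```

In a render loop, call `nextBuffer()` at the top of `draw(in:)`, write the frame's uniforms into it, and call `releaseWhenCompleted(_:)` on the command buffer just before `commit()`. Every `nextBuffer()` must be paired with one release, otherwise the fourth frame blocks forever.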
Common Mistakes to Avoid
- Forgetting `encoder.endEncoding()` before committing
- Mismatched buffer indices between Swift and MSL
- Using the wrong pixel format for render targets
- Not handling `nil` from optional Metal API calls
- Blocking the main thread waiting for GPU completion; use `addCompletedHandler` instead
- Forgetting to set the vertex descriptor when using `[[stage_in]]`
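Two of the mistakes above, unhandled `nil` results and blocking on GPU completion, can be avoided together. The sketch below shows one possible shape; `GPUSetupError`, `makeQueue`, and `submit(on:onComplete:)` are hypothetical names introduced for illustration.

```swift
import Metal

enum GPUSetupError: Error {
    case noDevice, noCommandQueue, noCommandBuffer
}

/// Creates the long-lived queue once, surfacing failures as errors instead of crashing.
func makeQueue() throws -> MTLCommandQueue {
    guard let device = MTLCreateSystemDefaultDevice() else { throw GPUSetupError.noDevice }
    guard let queue = device.makeCommandQueue() else { throw GPUSetupError.noCommandQueue }
    return queue
}

/// Submits a command buffer and reports the outcome asynchronously.
func submit(on queue: MTLCommandQueue,
            onComplete: @escaping (Result<Void, Error>) -> Void) throws {
    guard let commandBuffer = queue.makeCommandBuffer() else {
        throw GPUSetupError.noCommandBuffer
    }

    // Encode work here; each encoder needs endEncoding() before commit().

    // Register the handler before commit(). It runs on a Metal-owned thread,
    // so hop to the main queue before touching UI state.
    commandBuffer.addCompletedHandler { finished in
        if let error = finished.error {
            onComplete(.failure(error))
        } else {
            onComplete(.success(()))
        }
    }
    commandBuffer.commit()  // returns immediately; the caller is not blocked
}
```

The caller gets a thrown error at setup time and a `Result` at completion time, so neither path needs a force-unwrap or `waitUntilCompleted()`.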
Metal 4 Notes
Metal 4 introduces a modernized core API. Key changes:
- New compilation API for finer shader compilation control
- Updated command encoding patterns
- See `references/metal-api-guide.md` for the full Metal 4 API topology
Frameworks Ecosystem
| Framework | Purpose |
|---|---|
| Metal | Direct GPU access, shaders, pipelines |
| MetalKit | View management, texture loading, model I/O |
| MetalFX | Upscaling (temporal/spatial) for performance |
| Metal Performance Shaders | Optimized compute & image processing kernels |
| Compositor Services | Stereoscopic rendering for visionOS |
| RealityKit | High-level 3D rendering (uses Metal underneath) |