metal-gpu

Metal GPU Code Skill

Write production-quality Metal code with correct patterns, optimal performance, and clear explanations.

When to Read References

For detailed API topology, Metal 4 specifics, and Apple Silicon optimization patterns, read:

/mnt/skills/user/metal-gpu/references/metal-api-guide.md

Core Principles

Always start with the device: `MTLCreateSystemDefaultDevice()`. Every Metal workflow begins here
Command pattern: Device → Command Queue → Command Buffer → Command Encoder → Commit
Shaders are MSL (Metal Shading Language): C++14-based, with Metal-specific types and attributes
Resource management matters: Use appropriate storage modes, avoid unnecessary copies
Triple buffering for render loops to keep CPU and GPU in parallel
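
The command pattern above can be sketched end to end with the simplest encoder type, a blit encoder. This is a minimal sketch, not a definitive implementation: the helper name `fillBufferOnGPU` is hypothetical, and the blocking `waitUntilCompleted()` call is only there so a command-line test can read the result back.

```swift
import Metal

// Hypothetical helper: walks Device → Command Queue → Command Buffer →
// Command Encoder → Commit, then reads the bytes back on the CPU.
func fillBufferOnGPU(byteCount: Int, value: UInt8) -> [UInt8]? {
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let buffer = device.makeBuffer(length: byteCount, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else {
        return nil // no GPU available, or an object could not be created
    }

    blit.fill(buffer: buffer, range: 0..<byteCount, value: value)
    blit.endEncoding()          // must happen before commit()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted() // fine in a sketch; avoid on the main thread of an app

    let pointer = buffer.contents().bindMemory(to: UInt8.self, capacity: byteCount)
    return Array(UnsafeBufferPointer(start: pointer, count: byteCount))
}
```

Calling `fillBufferOnGPU(byteCount: 16, value: 7)` should return sixteen bytes of `7` on a machine with a Metal device. The byte count is kept a multiple of 4 as a conservative assumption about blit fill alignment.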

Quick Reference: Metal Command Pipeline

MTLDevice
  └─ makeCommandQueue() → MTLCommandQueue
       └─ makeCommandBuffer() → MTLCommandBuffer
            ├─ makeRenderCommandEncoder(descriptor:) → MTLRenderCommandEncoder
            ├─ makeComputeCommandEncoder() → MTLComputeCommandEncoder
            └─ makeBlitCommandEncoder() → MTLBlitCommandEncoder

Writing Shaders (MSL)

Use Metal Shading Language. Always include `#include <metal_stdlib>` and `using namespace metal;`

Correct attribute qualifiers: `[[vertex_id]]`, `[[position]]`, `[[stage_in]]`, `[[buffer(n)]]`, `[[texture(n)]]`

Proper address space qualifiers: `device`, `constant`, `threadgroup`, `thread`
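
To see the four address spaces together, the sketch below embeds a small MSL kernel in a Swift string and compiles it at runtime with `makeLibrary(source:options:)`. The kernel name `scale_with_scratch`, the constant `qualifierDemoSource`, and the helper `compileQualifierDemo` are all hypothetical, chosen only for illustration.

```swift
import Metal

// Illustrative MSL source; names here are assumptions, not part of any library.
let qualifierDemoSource = """
#include <metal_stdlib>
using namespace metal;

kernel void scale_with_scratch(device float *data         [[buffer(0)]],
                               constant float &factor      [[buffer(1)]],
                               threadgroup float *scratch  [[threadgroup(0)]],
                               uint gid [[thread_position_in_grid]],
                               uint lid [[thread_position_in_threadgroup]]) {
    float value = data[gid];        // plain locals live in the thread address space
    scratch[lid] = value * factor;  // threadgroup memory is shared within one threadgroup
    threadgroup_barrier(mem_flags::mem_threadgroup);
    data[gid] = scratch[lid];       // device memory is read-write GPU memory
}
"""

// Compiles the source and returns the function names found in the library,
// or nil when no Metal device is available or compilation fails.
func compileQualifierDemo() -> [String]? {
    guard let device = MTLCreateSystemDefaultDevice() else { return nil }
    let library = try? device.makeLibrary(source: qualifierDemoSource, options: nil)
    return library?.functionNames
}
```

Runtime compilation is convenient for experiments like this one; shipping code would more commonly load a precompiled default library.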

Vertex Shader Pattern

```metal
#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    float3 position [[attribute(0)]];
    float3 normal   [[attribute(1)]];
    float2 texCoord [[attribute(2)]];
};

struct VertexOut {
    float4 position [[position]];
    float3 normal;
    float2 texCoord;
};

vertex VertexOut vertex_main(VertexIn in [[stage_in]],
                             constant float4x4 &mvp [[buffer(1)]]) {
    VertexOut out;
    out.position = mvp * float4(in.position, 1.0);
    out.normal = in.normal;
    out.texCoord = in.texCoord;
    return out;
}
```

Fragment Shader Pattern

```metal
fragment float4 fragment_main(VertexOut in [[stage_in]],
                              texture2d<float> albedo [[texture(0)]],
                              sampler texSampler [[sampler(0)]]) {
    float4 color = albedo.sample(texSampler, in.texCoord);
    return color;
}
```

Compute Kernel Pattern

```metal
kernel void compute_main(device float *input  [[buffer(0)]],
                         device float *output [[buffer(1)]],
                         uint id [[thread_position_in_grid]]) {
    output[id] = input[id] * 2.0;
}
```

Swift-Side Setup Patterns

Render Pipeline Setup

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!

// Load shaders
let library = device.makeDefaultLibrary()!
let vertexFunction = library.makeFunction(name: "vertex_main")
let fragmentFunction = library.makeFunction(name: "fragment_main")

// Pipeline descriptor
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm

// Vertex descriptor: every [[attribute(n)]] in VertexIn needs a matching entry.
// Offsets assume a tightly packed position/normal/texCoord layout (32 bytes per vertex).
let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = .float3  // position
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.attributes[1].format = .float3  // normal
vertexDescriptor.attributes[1].offset = 12
vertexDescriptor.attributes[1].bufferIndex = 0
vertexDescriptor.attributes[2].format = .float2  // texCoord
vertexDescriptor.attributes[2].offset = 24
vertexDescriptor.attributes[2].bufferIndex = 0
vertexDescriptor.layouts[0].stride = 32
pipelineDescriptor.vertexDescriptor = vertexDescriptor

let pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineDescriptor)
```

Compute Pipeline Setup

```swift
// Assumes `device`, `library`, `commandQueue`, `inputBuffer`, `outputBuffer`,
// and `elementCount` already exist.
let computeFunction = library.makeFunction(name: "compute_main")!
let computePipeline = try! device.makeComputePipelineState(function: computeFunction)

let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(computePipeline)
encoder.setBuffer(inputBuffer, offset: 0, index: 0)
encoder.setBuffer(outputBuffer, offset: 0, index: 1)

// One thread per element; dispatchThreads handles a partial final threadgroup.
let gridSize = MTLSize(width: elementCount, height: 1, depth: 1)
let threadGroupSize = MTLSize(
    width: min(computePipeline.maxTotalThreadsPerThreadgroup, elementCount),
    height: 1, depth: 1
)
encoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadGroupSize)
encoder.endEncoding()
commandBuffer.commit()
```
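
The setup above leaves buffer creation and result readback implicit. The following is a self-contained sketch of the full round trip, assuming a GPU that supports `dispatchThreads` (non-uniform threadgroup sizes). The helper name `runDoubling` and the constant `doublingSource` are hypothetical, and the kernel is compiled from a string only so the example does not depend on a default library.

```swift
import Metal

// The compute_main kernel from this document, embedded so the sketch is self-contained.
let doublingSource = """
#include <metal_stdlib>
using namespace metal;
kernel void compute_main(device float *input  [[buffer(0)]],
                         device float *output [[buffer(1)]],
                         uint id [[thread_position_in_grid]]) {
    output[id] = input[id] * 2.0;
}
"""

// Hypothetical helper: doubles every element on the GPU and returns the result.
func runDoubling(_ values: [Float]) -> [Float]? {
    let elementCount = values.count
    let byteCount = elementCount * MemoryLayout<Float>.stride
    guard elementCount > 0,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: doublingSource, options: nil),
          let function = library.makeFunction(name: "compute_main"),
          let pipeline = try? device.makeComputePipelineState(function: function),
          let inputBuffer = device.makeBuffer(bytes: values, length: byteCount,
                                              options: .storageModeShared),
          let outputBuffer = device.makeBuffer(length: byteCount, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return nil }

    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(inputBuffer, offset: 0, index: 0)   // matches [[buffer(0)]]
    encoder.setBuffer(outputBuffer, offset: 0, index: 1)  // matches [[buffer(1)]]
    let width = min(pipeline.maxTotalThreadsPerThreadgroup, elementCount)
    encoder.dispatchThreads(MTLSize(width: elementCount, height: 1, depth: 1),
                            threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
    encoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted() // blocking is acceptable only in a sketch like this

    let pointer = outputBuffer.contents().bindMemory(to: Float.self, capacity: elementCount)
    return Array(UnsafeBufferPointer(start: pointer, count: elementCount))
}
```

Both buffers use `.storageModeShared` so the CPU can write the input and read the output without an explicit copy.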

MetalKit View Rendering

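```swift
import MetalKit

class Renderer: NSObject, MTKViewDelegate {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let pipelineState: MTLRenderPipelineState

    // Stored properties must be set before calling super.init().
    init(device: MTLDevice, commandQueue: MTLCommandQueue,
         pipelineState: MTLRenderPipelineState) {
        self.device = device
        self.commandQueue = commandQueue
        self.pipelineState = pipelineState
        super.init()
    }

    // Required by MTKViewDelegate; update size-dependent state here if needed.
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let drawable = view.currentDrawable,
              let descriptor = view.currentRenderPassDescriptor,
              let commandBuffer = commandQueue.makeCommandBuffer(),
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)
        else { return }

        encoder.setRenderPipelineState(pipelineState)
        // Set buffers, draw primitives...
        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)

        encoder.endEncoding()
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}
```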

Performance Best Practices

Storage modes: Use `.shared` on Apple Silicon (unified memory), `.private` for GPU-only data, and `.managed` on Intel Macs with discrete GPUs
Triple buffering: Rotate 3 buffers with a semaphore to avoid CPU/GPU stalls
Avoid per-frame allocations: Reuse buffers and pipeline states across frames (command buffers and encoders are transient and are created fresh each frame)
Use `dispatchThreads` over `dispatchThreadgroups` when possible (Apple Silicon), since it handles grids that are not a multiple of the threadgroup size
Prefer tile-based deferred rendering patterns on Apple GPUs: use imageblocks and tile shaders
Compile pipelines ahead of time: Pipeline creation is expensive, so do it at load time
Use Metal GPU frame capture in Xcode to profile and debug
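
The triple-buffering advice above can be sketched as a small ring of buffers guarded by a semaphore. This is one possible shape under stated assumptions: the class name `UniformRing` and the constant `maxFramesInFlight` are hypothetical, and the buffers use `.storageModeShared` because the CPU writes them every frame.

```swift
import Dispatch
import Metal

let maxFramesInFlight = 3 // hypothetical name; three buffers in rotation

final class UniformRing {
    private let semaphore = DispatchSemaphore(value: maxFramesInFlight)
    private let buffers: [MTLBuffer]
    private var index = 0

    init?(device: MTLDevice, length: Int) {
        var made: [MTLBuffer] = []
        for _ in 0..<maxFramesInFlight {
            guard let buffer = device.makeBuffer(length: length,
                                                 options: .storageModeShared) else { return nil }
            made.append(buffer)
        }
        buffers = made
    }

    /// Waits until one of the three slots is free, then returns the buffer the CPU
    /// may write for this frame. Call before `commandBuffer.commit()`.
    func nextBuffer(for commandBuffer: MTLCommandBuffer) -> MTLBuffer {
        semaphore.wait() // blocks only when the CPU is three frames ahead of the GPU
        let buffer = buffers[index]
        index = (index + 1) % maxFramesInFlight
        let semaphore = self.semaphore
        // Free the slot once the GPU has finished the frame that used it.
        commandBuffer.addCompletedHandler { _ in semaphore.signal() }
        return buffer
    }
}
```

In a render loop, `nextBuffer(for:)` would be called once per frame at the top of `draw(in:)`, so the CPU never overwrites a buffer the GPU is still reading.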

Common Mistakes to Avoid

Forgetting `encoder.endEncoding()` before committing
Mismatched buffer indices between Swift and MSL
Using the wrong pixel format for render targets
Not handling `nil` from optional Metal API calls
Blocking the main thread waiting for GPU completion: use `addCompletedHandler` instead
Forgetting to set the vertex descriptor when using `[[stage_in]]`
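
Two of the mistakes above, unhandled `nil` results and blocking on GPU completion, can be avoided with a pattern along these lines. The error type `MetalSetupError` and the functions `makeQueue` and `submit` are hypothetical names introduced for this sketch.

```swift
import Metal

// Hypothetical error type for failed optional Metal calls.
enum MetalSetupError: Error {
    case noDevice
    case noCommandQueue
    case noCommandBuffer
}

// Turns the optional-returning setup calls into thrown errors instead of force-unwraps.
func makeQueue() throws -> (MTLDevice, MTLCommandQueue) {
    guard let device = MTLCreateSystemDefaultDevice() else { throw MetalSetupError.noDevice }
    guard let queue = device.makeCommandQueue() else { throw MetalSetupError.noCommandQueue }
    return (device, queue)
}

// Submits a command buffer and reports completion through a callback
// rather than blocking the calling thread.
func submit(on queue: MTLCommandQueue,
            completion: @escaping @Sendable (Error?) -> Void) throws {
    guard let commandBuffer = queue.makeCommandBuffer() else {
        throw MetalSetupError.noCommandBuffer
    }
    // Encode work here, calling endEncoding() on each encoder before commit().
    commandBuffer.addCompletedHandler { finished in
        completion(finished.error) // invoked on a Metal-managed thread, not the main thread
    }
    commandBuffer.commit()
}
```

Any UI update made from the completion callback would need to hop back to the main thread first.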

Metal 4 Notes

Metal 4 introduces a modernized core API. Key changes:

New compilation API for finer shader compilation control
Updated command encoding patterns
See `references/metal-api-guide.md` for the full Metal 4 API topology

Frameworks Ecosystem

| Framework | Purpose |
| --- | --- |
| Metal | Direct GPU access, shaders, pipelines |
| MetalKit | View management, texture loading, model I/O |
| MetalFX | Upscaling (temporal/spatial) for performance |
| Metal Performance Shaders | Optimized compute & image processing kernels |
| Compositor Services | Stereoscopic rendering for visionOS |
| RealityKit | High-level 3D rendering (uses Metal underneath) |
