metal-gpu

Metal GPU Code Skill

Write production-quality Metal code with correct patterns, optimal performance, and clear explanations.

When to Read References

For detailed API topology, Metal 4 specifics, and Apple Silicon optimization patterns, read:
/mnt/skills/user/metal-gpu/references/metal-api-guide.md

Core Principles

  1. Always start with the device: MTLCreateSystemDefaultDevice(). Every Metal workflow begins here.
  2. Command pattern: Device → Command Queue → Command Buffer → Command Encoder → Commit
  3. Shaders are written in MSL (Metal Shading Language): C++14-based, with Metal-specific types and attributes
  4. Resource management matters: use appropriate storage modes and avoid unnecessary copies
  5. Triple buffering for render loops keeps the CPU and GPU working in parallel

Quick Reference: Metal Command Pipeline

MTLDevice
  └─ makeCommandQueue() → MTLCommandQueue
       └─ makeCommandBuffer() → MTLCommandBuffer
            ├─ makeRenderCommandEncoder(descriptor:) → MTLRenderCommandEncoder
            ├─ makeComputeCommandEncoder() → MTLComputeCommandEncoder
            └─ makeBlitCommandEncoder() → MTLBlitCommandEncoder

Writing Shaders (MSL)

Use Metal Shading Language. Always include:
  • #include <metal_stdlib> and using namespace metal;
  • Correct attribute qualifiers: [[vertex_id]], [[position]], [[stage_in]], [[buffer(n)]], [[texture(n)]]
  • Proper address space qualifiers: device, constant, threadgroup, thread

Vertex Shader Pattern

metal
#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    float3 position [[attribute(0)]];
    float3 normal   [[attribute(1)]];
    float2 texCoord [[attribute(2)]];
};

struct VertexOut {
    float4 position [[position]];
    float3 normal;
    float2 texCoord;
};

vertex VertexOut vertex_main(VertexIn in [[stage_in]],
                             constant float4x4 &mvp [[buffer(1)]]) {
    VertexOut out;
    out.position = mvp * float4(in.position, 1.0);
    out.normal = in.normal;
    out.texCoord = in.texCoord;
    return out;
}

Fragment Shader Pattern

metal
fragment float4 fragment_main(VertexOut in [[stage_in]],
                              texture2d<float> albedo [[texture(0)]],
                              sampler texSampler [[sampler(0)]]) {
    float4 color = albedo.sample(texSampler, in.texCoord);
    return color;
}

Compute Kernel Pattern

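metal
#include <metal_stdlib>
using namespace metal;

// Doubles every element, one thread per element. No bounds check is needed
// when the grid is dispatched with dispatchThreads at exactly the element count.
kernel void compute_main(device const float *input  [[buffer(0)]],
                         device float *output       [[buffer(1)]],
                         uint id [[thread_position_in_grid]]) {
    output[id] = input[id] * 2.0f;
}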

Swift-Side Setup Patterns

Render Pipeline Setup

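swift
import Metal

// Force-unwraps keep the sample short; production code should handle nil and thrown errors.
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!

// Load shaders from the app's default library
let library = device.makeDefaultLibrary()!
let vertexFunction = library.makeFunction(name: "vertex_main")
let fragmentFunction = library.makeFunction(name: "fragment_main")

// Pipeline descriptor
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm  // must match the render target

// Vertex descriptor: one entry per [[attribute(n)]] in VertexIn.
// Assumes tightly packed, interleaved vertices in buffer 0 (32 bytes each).
let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = .float3  // position
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.attributes[1].format = .float3  // normal
vertexDescriptor.attributes[1].offset = 12
vertexDescriptor.attributes[1].bufferIndex = 0
vertexDescriptor.attributes[2].format = .float2  // texCoord
vertexDescriptor.attributes[2].offset = 24
vertexDescriptor.attributes[2].bufferIndex = 0
vertexDescriptor.layouts[0].stride = 32
pipelineDescriptor.vertexDescriptor = vertexDescriptor

let pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineDescriptor)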

Compute Pipeline Setup

swift
// Assumes `device`, `library`, and `commandQueue` from the render pipeline setup, plus
// existing `inputBuffer` and `outputBuffer` (MTLBuffer) that hold `elementCount` floats.
let computeFunction = library.makeFunction(name: "compute_main")!
let computePipeline = try! device.makeComputePipelineState(function: computeFunction)

let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(computePipeline)
encoder.setBuffer(inputBuffer, offset: 0, index: 0)   // matches [[buffer(0)]]
encoder.setBuffer(outputBuffer, offset: 0, index: 1)  // matches [[buffer(1)]]

// One thread per element; Metal trims the final threadgroup to fit the grid.
let gridSize = MTLSize(width: elementCount, height: 1, depth: 1)
let threadGroupSize = MTLSize(
    width: min(computePipeline.maxTotalThreadsPerThreadgroup, elementCount),
    height: 1, depth: 1
)
encoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadGroupSize)
encoder.endEncoding()
commandBuffer.commit()
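
On GPUs without support for non-uniform threadgroup sizes, dispatchThreads is not available, and the grid has to be rounded up to whole threadgroups for dispatchThreadgroups. A minimal sketch of that rounding, assuming a one-dimensional grid. The helper name `threadgroupCount` is hypothetical and not part of Metal:

```swift
/// Number of threadgroups needed to cover `elementCount` items when each
/// threadgroup runs `threadsPerGroup` threads (ceiling division).
/// Hypothetical helper for illustration; not a Metal API.
func threadgroupCount(elementCount: Int, threadsPerGroup: Int) -> Int {
    precondition(elementCount >= 0 && threadsPerGroup > 0)
    return (elementCount + threadsPerGroup - 1) / threadsPerGroup
}

// 1000 elements at 256 threads per group need 4 groups (1024 threads),
// so the last 24 threads fall outside the data.
let groups = threadgroupCount(elementCount: 1000, threadsPerGroup: 256)
let surplusThreads = groups * 256 - 1000
print(groups, surplusThreads)
```

The result would feed MTLSize(width: groups, height: 1, depth: 1) in a call to dispatchThreadgroups(_:threadsPerThreadgroup:). Because the grid is rounded up, the kernel then has to compare its thread index against the element count (passed as an extra constant argument) before touching the buffers.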

MetalKit View Rendering

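swift
import MetalKit

class Renderer: NSObject, MTKViewDelegate {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let pipelineState: MTLRenderPipelineState

    // The pipeline state is built once at load time and passed in.
    init?(device: MTLDevice, pipelineState: MTLRenderPipelineState) {
        guard let queue = device.makeCommandQueue() else { return nil }
        self.device = device
        self.commandQueue = queue
        self.pipelineState = pipelineState
        super.init()
    }

    // Required by MTKViewDelegate; update size-dependent state (e.g. projection) here.
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        // Skip the frame if any per-frame object is unavailable.
        guard let drawable = view.currentDrawable,
              let descriptor = view.currentRenderPassDescriptor,
              let commandBuffer = commandQueue.makeCommandBuffer(),
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else { return }

        encoder.setRenderPipelineState(pipelineState)
        // Set buffers, draw primitives...
        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)

        encoder.endEncoding()
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}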

Performance Best Practices

  1. Storage modes: Use .shared on Apple Silicon (unified memory), .private for GPU-only data, .managed on Intel Macs
  2. Triple buffering: Rotate 3 buffers with a semaphore to avoid CPU/GPU stalls
  3. Avoid per-frame allocations: Reuse buffers and pipeline state objects. Command buffers and encoders are transient, single-use objects and are created fresh each frame
  4. Use dispatchThreads over dispatchThreadgroups when possible (Apple Silicon); it handles grids that are not a multiple of the threadgroup size
  5. Design for the tile-based deferred rendering architecture of Apple GPUs: use imageblocks and tile shaders
  6. Compile pipelines ahead of time: Pipeline creation is expensive, so do it at load time
  7. Use Metal GPU frame capture in Xcode to profile and debug
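
The triple-buffering item can be sketched without any Metal types. The following is a minimal illustration, assuming a generic ring of per-frame resources guarded by a counting semaphore. `FrameRing` is a hypothetical helper (not a Metal or MetalKit API), and in a renderer `Resource` would typically be an MTLBuffer:

```swift
import Dispatch

/// Rotates a fixed set of per-frame resources and blocks the CPU when all
/// of them are still in flight. Hypothetical helper for illustration only.
final class FrameRing<Resource> {
    private let resources: [Resource]
    private let semaphore: DispatchSemaphore
    private var index = 0

    init(resources: [Resource]) {
        precondition(!resources.isEmpty)
        self.resources = resources
        self.semaphore = DispatchSemaphore(value: resources.count)
    }

    /// Waits for a free slot, then returns the next resource in rotation.
    /// Call once per frame, from the thread that encodes the frame.
    func acquire() -> Resource {
        semaphore.wait()
        let resource = resources[index]
        index = (index + 1) % resources.count
        return resource
    }

    /// Marks one slot as free again. Call when the GPU has finished the frame.
    func release() {
        semaphore.signal()
    }
}

// Stand-in strings instead of real buffers so the rotation is visible.
let ring = FrameRing(resources: ["A", "B", "C"])
var order: [String] = []
for _ in 0..<5 {
    order.append(ring.acquire())
    ring.release()   // in a renderer this would run in the completion handler
}
print(order)
```

In a render loop, acquire() would be called at the top of draw(in:) and release() inside commandBuffer.addCompletedHandler, so the CPU never overwrites a buffer the GPU is still reading and never runs more than three frames ahead.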

Common Mistakes to Avoid

  • Forgetting encoder.endEncoding() before committing
  • Mismatched buffer indices between Swift and MSL
  • Using the wrong pixel format for render targets
  • Not handling nil from optional Metal API calls
  • Blocking the main thread while waiting for GPU completion; use addCompletedHandler instead
  • Forgetting to set the vertex descriptor when using [[stage_in]]
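
Two of these mistakes, unhandled nil and blocking on GPU completion, can be avoided together. The sketch below is one possible shape, not a prescribed pattern. `GPUSetupError` and `submitEmptyWork` are hypothetical names, and the `canImport` guard is only there so the snippet also compiles on platforms without Metal:

```swift
/// Hypothetical error type for illustration; not part of Metal.
enum GPUSetupError: Error {
    case noDevice
    case noCommandQueue
    case noCommandBuffer
}

#if canImport(Metal)
import Metal

/// Commits an (empty) command buffer and reports the result asynchronously
/// instead of blocking the caller until the GPU finishes.
func submitEmptyWork(onDone: @escaping @Sendable (Bool) -> Void) throws {
    // Each of these calls returns an optional; fail with a clear error.
    guard let device = MTLCreateSystemDefaultDevice() else {
        throw GPUSetupError.noDevice
    }
    guard let queue = device.makeCommandQueue() else {
        throw GPUSetupError.noCommandQueue
    }
    guard let commandBuffer = queue.makeCommandBuffer() else {
        throw GPUSetupError.noCommandBuffer
    }

    // Encoders would be created, used, and ended here.

    // The handler runs on a Metal-owned thread once the GPU is done.
    commandBuffer.addCompletedHandler { buffer in
        onDone(buffer.error == nil)
    }
    commandBuffer.commit()
}
#endif
```

A caller would wrap submitEmptyWork in do/catch and hop back to the main queue inside onDone before touching UI state.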

Metal 4 Notes

Metal 4 introduces a modernized core API. Key changes:
  • New compilation API for finer shader compilation control
  • Updated command encoding patterns
  • See references/metal-api-guide.md for the full Metal 4 API topology

Frameworks Ecosystem

| Framework | Purpose |
| --- | --- |
| Metal | Direct GPU access, shaders, pipelines |
| MetalKit | View management, texture loading, model I/O |
| MetalFX | Upscaling (temporal/spatial) for performance |
| Metal Performance Shaders | Optimized compute & image processing kernels |
| Compositor Services | Stereoscopic rendering for visionOS |
| RealityKit | High-level 3D rendering (uses Metal underneath) |