Found 6 Skills
GLSL shader fundamentals: vertex and fragment shaders, uniforms, varyings, attributes, coordinate systems, built-in variables, and data types. Use when writing custom shaders, understanding the graphics pipeline, or debugging shader code. The foundational skill for all shader work.
Numba is a Just-In-Time (JIT) compiler for Python that translates a subset of Python and NumPy code into fast machine code. Developed by Anaconda, Inc. Highly effective for accelerating loops, custom mathematical functions, and complex numerical algorithms. Use for @njit, @vectorize, prange, cuda.jit, numba.typed, JIT compilation, parallel loops, GPU acceleration with CUDA, Monte Carlo simulations, numerical algorithms, and high-performance Python computing.
TypeGPU is type-safe WebGPU in TypeScript. Use whenever the user writes, debugs, or designs TypeGPU code: 'use gpu' shader functions, tgpu.fn, buffers, textures, bind groups, compute and render pipelines, vertex layouts, slots, accessors, and any TypeGPU API. Shader logic and CPU-side resources are tightly coupled, so handle both sides here even if the user only mentions one (e.g. "how do I write a shader", "how do I create a buffer"). Trigger on any mention of typegpu, tgpu, "use gpu", TypedGPU, or WebGPU code written using TypeGPU's schema API (d.*, tgpu.*, std.*). Do NOT trigger for raw WebGPU (using GPUDevice/GPURenderPipeline directly without tgpu), WGSL-only questions, Three.js, Babylon.js, or WebGL.
Write, debug, and optimize Triton and Gluon GPU kernels using local source code, tutorials, and kernel references. Use when the user mentions Triton, Gluon, tl.load, tl.store, tl.dot, triton.jit, gluon.jit, wgmma, tcgen05, TMA, tensor descriptor, persistent kernel, warp specialization, fused attention, matmul kernel, kernel fusion, tl.program_id, triton autotune, MXFP, FP8, FP4, block-scaled matmul, SwiGLU, top-k, or asks about writing GPU kernels in Python.
Shared optimization guidance plus cuTile Python DSL-specific overlays. Use when: (1) selecting optimizations for a cuTile Python DSL kernel, (2) checking cuTile-specific implementation traps, (3) deciding whether a profiling finding belongs in shared knowledge or a cuTile overlay, (4) updating cuTile Python DSL optimization docs, (5) reviewing how a shared pattern maps to cuTile.
cuTile Python DSL kernel implementation patterns, CtKernel runtime wrapper, suitability gate, and cuTile-specific pitfalls. Use when: (1) creating or modifying a cuTile Python DSL kernel version, (2) implementing an optimization that still fits within cuTile's exposed control surface, (3) deciding whether cuTile is still the right DSL, (4) reviewing cuTile-specific runtime patterns. Always also load /design-kernel for shared naming, versioning, and workflow.