Search: gpu-kernel-development

AI & Machine Learning | nvidia/skills

kernel-cute-writing

Write and implement GPU kernels using NVIDIA CuTe DSL (CUTLASS 4.x Python API) — NOT for Triton, CUDA C++, or conceptual explanations. Trigger only when the user wants to write or implement a kernel, not when asking questions about CuTe DSL concepts or layouts. CuTe DSL uses cute.jit/cute.kernel decorators and cutlass.cute imports. Covers element-wise kernels, GEMM patterns, reductions, memory hierarchy (global/shared/register/TMA), MMA tensor core operations, software pipelining, and framework integration.

23

3 scripts/Attention

AI & Machine Learning | pepperu96/hyper-mla

learn-cute-dsl

Workflow for learning CuTe Python DSL by reading, importing, profiling, and extracting reusable patterns from CUTLASS Blackwell example kernels. Use when: (1) studying CUTLASS CuTe DSL reference implementations, (2) importing CUTLASS examples into the project runtime infrastructure, (3) building CuTe DSL knowledge base entries from profiling experiments, (4) understanding CuTe DSL API patterns, TMA pipelining, warpgroup scheduling, or persistent kernel structure.

13

AI & Machine Learning | nvidia/skills

tilegym-cutile-python

Expert cuTile programming assistant. Write high-performance GPU kernels using cuTile's tile-based programming model with proper validation and optimization. Supports deep agent orchestration for complex multi-kernel tasks.

9

11 scripts/Attention

AI & Machine Learning | nvidia/skills

cutile-python

Expert cuTile programming assistant. Write high-performance GPU kernels using cuTile's tile-based programming model with proper validation and optimization. Supports deep agent orchestration for complex multi-kernel tasks.

6

11 scripts/Attention
