Search Results: cuda-programming

Found 3 Skills

cuopt-developer

Modify, build, test, debug, and contribute to NVIDIA cuOpt (C++/CUDA, Python, server, CI). Use for solver internals, PRs, DCO, and code conventions.

🇺🇸|EnglishTranslated

AI & Machine Learningslowlyc/agent-gpu-skills

cutlass-skill

Write, debug, and optimize CUTLASS and CuTeDSL GPU kernels using local source code, examples, and header references. Use when the user mentions CUTLASS, CuTe, CuTeDSL, cute::Layout, cute::Tensor, TiledMMA, TiledCopy, CollectiveMainloop, CollectiveEpilogue, GEMM kernel, grouped GEMM, sparse GEMM, flash attention CUTLASS, blackwell GEMM, hopper GEMM, FP8 GEMM, blockwise scaling, MoE GEMM, StreamK, warp specialization CUTLASS, TMA CUTLASS, or asks about writing high-performance CUDA kernels with CUTLASS/CuTe templates.

🇺🇸|EnglishTranslated

1 scripts/Attention

AI & Machine Learningar4mirez/samuel

cuda-guide

CUDA/GPU computing guardrails, patterns, and best practices for AI-assisted development. Use when working with CUDA files (.cu, .cuh), or when the user mentions CUDA/GPU programming. Provides kernel design patterns, memory hierarchy guidelines, and occupancy optimization specific to this project's coding standards.

🇺🇸|EnglishTranslated