CUDA kernel development, debugging, and performance optimization for Claude Code. Use when writing, debugging, or optimizing CUDA code, GPU kernels, or parallel algorithms. Covers non-interactive profiling with nsys/ncu, debugging with cuda-gdb/compute-sanitizer, binary inspection with cuobjdump, and performance analysis workflows. Triggers on CUDA, GPU programming, kernel optimization, nsys, ncu, cuda-gdb, compute-sanitizer, PTX, GPU profiling, parallel performance.
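```bash
npx skill4agent add technillogue/ptx-isa-markdown cuda
```

Quick debugging with compute-sanitizer, cuda-gdb, and device-side `printf`:

```bash
compute-sanitizer --tool memcheck ./your_program
compute-sanitizer --tool racecheck ./your_program  # shared-memory race conditions
compute-sanitizer --tool initcheck ./your_program  # reads of uninitialized device memory
```

```bash
cuda-gdb -batch -ex "run" -ex "bt" ./your_program
```

```cuda
#include <cstdio>

__global__ void myKernel(float* data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx == 0 && n > 0) {  // limit output to one thread
        printf("Kernel launched, n=%d, data[0]=%f\n", n, data[0]);
    }
    if (idx >= n) return;     // bounds check before touching data[idx]
    // ... kernel logic ...
    if (idx < 10) {           // sample a few threads
        printf("Thread %d: result=%f\n", idx, data[idx]);
    }
}
```

Keep device-side output small with guards such as `if (idx == 0)` or `if (idx < N)`. Device `printf` output is buffered and reaches the host only at a synchronization point such as `cudaDeviceSynchronize()`. If nothing prints, check the launch configuration `<<<grid, block, smem_size>>>` and the launch error status, because an invalid configuration means the kernel never runs.

```bash
# Memory errors (most common)
compute-sanitizer --tool memcheck ./program
# Other tools: racecheck, initcheck, synccheck
# For detailed options, see references/debugging-tools.md
```

```bash
# Get a backtrace on crash
cuda-gdb -batch -ex "run" -ex "bt" ./program
# For breakpoints and thread inspection, see references/debugging-tools.md
```

Build for cuda-gdb with `nvcc -g -G program.cu -o program`. `-G` generates device debug info and disables device optimizations. It also overrides `-lineinfo`, so adding `-lineinfo` here has no effect.

```bash
# Dump PTX and SASS
cuobjdump -ptx ./program
cuobjdump -sass ./program
# For resource usage and symbol listing, see references/debugging-tools.md
```

Full debugging reference: `references/debugging-tools.md`.

```bash
# Basic profile
nsys profile -o report ./program
nsys stats --report cuda_gpu_kern_sum report.nsys-rep
# With NVTX markers
nsys profile --trace=cuda,nvtx -o report ./program
# Key reports: cuda_gpu_kern_sum, cuda_api_sum, cuda_gpu_mem_time_sum, nvtx_sum
# For detailed usage, see references/nsys-guide.md
```

Full nsys reference: `references/nsys-guide.md`.

```bash
# Profile a specific kernel
ncu --kernel-name "myKernel" -o report ./program
# Quick summary to stdout
ncu --set basic ./program
# Sets (vary by version; list with `ncu --list-sets`): basic (default), detailed, full, roofline
# Sections (list with `ncu --list-sections`): ComputeWorkloadAnalysis, MemoryWorkloadAnalysis, Occupancy
# For detailed metrics and interpretation, see references/ncu-guide.md
```

Full ncu reference: `references/ncu-guide.md`.

```cpp
#include <nvtx3/nvToolsExt.h>

nvtxRangePush("Operation Name");  // open a named range
// ... code to profile ...
nvtxRangePop();                   // close the innermost open range
```

The NVTX v3 header `nvtx3/nvToolsExt.h` is header-only. Link with `-lnvToolsExt` only when using the legacy NVTX v2 header `nvToolsExt.h`. Capture the ranges with `nsys profile --trace=cuda,nvtx`. More patterns: `references/nvtx-patterns.md`.

Common performance symptoms:

| Symptom | Likely Cause | Investigation |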
|---|---|---|
| Low GPU utilization | Kernel launch overhead, CPU bottleneck | nsys timeline, look for gaps |
| Memory bound | Poor access patterns, low cache hit | ncu memory section, check coalescing |
| Compute bound but slow | Low occupancy, register pressure | ncu occupancy, reduce registers |
| Lots of small kernels | Launch overhead dominates | nsys timeline, consider fusion |
| High memcpy time | Excessive H2D/D2H transfers | nsys cuda_gpu_mem_time_sum report, batch transfers |
| Most cycles stalled | Bank conflicts, memory stalls | ncu WarpStateStats (stall reasons) and SchedulerStats sections, check shared memory |
| High sectors/request | Poor coalescing (>4 sectors/request for 4-byte-per-thread loads) | ncu memory metrics, make adjacent threads access adjacent addresses, use vectorized loads |
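
The "4 sectors per request" rule of thumb comes from simple address arithmetic. The sketch below is a simplified host-side C++ model, not an ncu API, that counts the distinct 32-byte sectors touched by one warp-wide load. It assumes 32 active threads, 32-byte sectors, and no caching effects, with thread `t` reading `elemBytes` bytes at `baseAddr + t * strideElems * elemBytes`. The function name `sectorsPerWarpLoad` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Simplified model of one warp-wide global load (assumptions, not ncu internals):
// 32 active threads, 32-byte sectors, thread t reads elemBytes bytes starting at
// baseAddr + t * strideElems * elemBytes.
constexpr int kWarpSize = 32;
constexpr std::uint64_t kSectorBytes = 32;

// Hypothetical helper: number of distinct sectors the warp's load touches.
int sectorsPerWarpLoad(std::uint64_t baseAddr, int strideElems, int elemBytes) {
    assert(strideElems >= 0 && elemBytes > 0);
    std::set<std::uint64_t> sectors;
    for (int t = 0; t < kWarpSize; ++t) {
        std::uint64_t first =
            baseAddr + static_cast<std::uint64_t>(t) * strideElems * elemBytes;
        std::uint64_t last = first + elemBytes - 1;  // an element may straddle two sectors
        for (std::uint64_t s = first / kSectorBytes; s <= last / kSectorBytes; ++s) {
            sectors.insert(s);
        }
    }
    return static_cast<int>(sectors.size());
}
```

Under these assumptions, a unit-stride 4-byte load touches 4 sectors. A stride of 2 touches 8, a stride of 8 touches 32 (one per thread), and a unit-stride load starting 16 bytes past a sector boundary touches 5. A unit-stride 16-byte (`float4`) load touches 16 sectors yet uses every byte fetched. Judge ncu's sectors/request figure against the access width rather than a fixed threshold.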
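
More traps and fixes: `references/performance-traps.md`.

```bash
# Debug build (-G enables device debug info, disables device optimizations,
# and overrides -lineinfo, so -lineinfo is omitted here)
nvcc -g -G -O0 program.cu -o program_debug
# Release build
nvcc -O3 -lineinfo program.cu -o program
# Specific architecture
nvcc -arch=sm_80 program.cu -o program  # Ampere
nvcc -arch=sm_89 program.cu -o program  # Ada Lovelace
nvcc -arch=sm_90 program.cu -o program  # Hopper
# Generate PTX for inspection (writes program.ptx)
nvcc -ptx program.cu
# Verbose ptxas output (registers and shared memory per kernel)
nvcc --ptxas-options=-v program.cu
# With legacy NVTX v2 (nvToolsExt.h); NVTX v3 (nvtx3/nvToolsExt.h) is header-only
nvcc program.cu -lnvToolsExt -o program
```

Keep `-lineinfo` in release and profiling builds so ncu and compute-sanitizer can attribute results to source lines without the cost of `-G`.

Bundled API documentation:

- PTX ISA: `references/ptx-docs/` and `references/ptx-isa.md`
- CUDA Runtime API (e.g. `cudaDeviceProp`): `references/cuda-runtime-docs/` and `references/cuda-runtime.md`
- CUDA Driver API (e.g. `cuCtxCreate`, `cuModuleLoad`, `CUDA_ERROR_*` codes): `references/cuda-driver-docs/` and `references/cuda-driver.md`

Look up specific APIs in the `*-docs/` directories or the corresponding `.md` files.

Other references:

- `references/performance-traps.md`
- `references/debugging-tools.md`
- `references/nsys-guide.md`
- `references/ncu-guide.md`
- `references/nvtx-patterns.md`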
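
`nvcc --ptxas-options=-v` reports registers per thread, and a rough register-limited occupancy estimate helps with the "Low occupancy, register pressure" row of the symptom table. This sketch approximates the CUDA Occupancy Calculator and makes several assumptions:

- sm_80-style per-SM limits: 65,536 registers, 64 resident warps, and 32 resident blocks.
- Registers are allocated per warp in units of 256.
- Shared-memory and per-partition limits are ignored.

The names `residentWarpsPerSM` and `occupancy` are hypothetical. For authoritative numbers, use ncu's Occupancy section or `cudaOccupancyMaxActiveBlocksPerMultiprocessor`.

```cpp
#include <algorithm>
#include <cassert>

// Assumed per-SM limits (sm_80-style; check the values for your GPU).
constexpr int kRegsPerSM = 65536;
constexpr int kMaxWarpsPerSM = 64;
constexpr int kMaxBlocksPerSM = 32;
constexpr int kRegAllocUnit = 256;  // registers allocated per warp in units of 256
constexpr int kWarpSize = 32;

// Hypothetical helper: resident warps per SM limited by registers, warp slots,
// and block slots only (shared memory is ignored).
int residentWarpsPerSM(int regsPerThread, int threadsPerBlock) {
    assert(regsPerThread > 0 && threadsPerBlock > 0);
    int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize;
    // Round each warp's register footprint up to the allocation unit.
    int regsPerWarp =
        ((regsPerThread * kWarpSize + kRegAllocUnit - 1) / kRegAllocUnit) * kRegAllocUnit;
    int blocksByRegs = kRegsPerSM / (regsPerWarp * warpsPerBlock);
    int blocksByWarps = kMaxWarpsPerSM / warpsPerBlock;
    int blocks = std::min({blocksByRegs, blocksByWarps, kMaxBlocksPerSM});
    return blocks * warpsPerBlock;
}

// Fraction of the SM's warp slots that can be occupied.
double occupancy(int regsPerThread, int threadsPerBlock) {
    return static_cast<double>(residentWarpsPerSM(regsPerThread, threadsPerBlock)) /
           kMaxWarpsPerSM;
}
```

With 256-thread blocks, the model gives the following:

- 32 registers per thread: 64 warps (100%).
- 64 registers: 32 warps (50%).
- 128 registers: 16 warps (25%).
- 33 registers: rounded up to 40, giving 48 warps. A single extra register can cost a whole block per SM.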