Converts cuTile Python GPU kernels (`@ct.kernel`) to equivalent cuTile.jl Julia kernels. Handles kernel syntax translation, 0-indexed to 1-indexed conversion, broadcasting differences, memory layout (row-major to column-major), type-system mapping, and launch API differences. Use when converting, porting, or translating cuTile Python kernels to cuTile.jl, or when debugging or optimizing existing cuTile.jl translations.
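
The sketch below shows the indexing, layout, and broadcasting differences in plain Julia. It uses Base only, with no GPU and no cuTile.jl calls. The `py_value` and `tile_range` helpers are hypothetical illustrations, not cuTile.jl API. NumPy expressions appear only in comments.

```julia
# Plain Julia (no GPU, no cuTile.jl): how Python's row-major, 0-based view
# of a buffer maps onto Julia's column-major, 1-based view.

# NumPy: a = np.arange(6).reshape(2, 3)  ->  shape (2, 3), row-major
flat = collect(0:5)            # the six values in memory order
a_jl = reshape(flat, 3, 2)     # same memory read column-major: shape (3, 2)

# Python a[i, j] (0-based) is a_jl[j + 1, i + 1] (1-based, dims reversed)
py_value(i, j) = i * 3 + j     # hypothetical helper: value NumPy's a[i, j] holds
for i in 0:1, j in 0:2
    @assert a_jl[j + 1, i + 1] == py_value(i, j)
end

# permutedims restores Python's (2, 3) shape and index order, but copies the data
a_perm = permutedims(a_jl)
@assert size(a_perm) == (2, 3) && a_perm[2, 3] == 5

# Hypothetical helper: a 0-based tile index b with tile size T covers Python
# elements b*T : b*T+T-1, i.e. the 1-based Julia range (b*T + 1):(b*T + T)
tile_range(b, T) = (b * T + 1):(b * T + T)
@assert tile_range(0, 4) == 1:4 && tile_range(2, 4) == 9:12

# Broadcasting: NumPy overloads + for scalars and shape expansion; Julia uses dots.
# NumPy: col[:, None] + row[None, :]    Julia: col .+ row'
col = [1, 2, 3]; row = [10, 20]
outer = col .+ row'            # 3×2 matrix
@assert outer[3, 2] == 23
```

Reading the buffer with reversed dimensions costs nothing, whereas `permutedims` materializes a copy. Swapping index order is therefore usually the cheaper translation.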
Install the skill:

```bash
npx skill4agent add nvidia/skills converting-cutile-to-julia
```

A Python `@ct.kernel` definition is translated into a Julia `function ... end` block. Follow `translations/workflow.md` for the full conversion. For `MethodError` and `IRError` failures, see `references/debugging.md`. Supporting references are `references/api-mapping.md`, `references/critical-rules.md`, and `references/testing.md`.

Dependencies are declared in `julia/Project.toml`:

```
julia/ # Self-contained Julia sub-project
├── Project.toml # Dependencies: CUDA.jl, cuTile.jl, NNlib.jl, Test
├── kernels/ # cuTile.jl kernel implementations
│   ├── add.jl # ← Ground-truth: 1D element-wise with alpha scaling (tensor+tensor, tensor+scalar)
│   ├── matmul.jl # ← Ground-truth: 2D tiled MMA, standard Julia layout (M,K)×(K,N)→(M,N)
│   └── softmax.jl # ← Ground-truth: 3 strategies (TMA, online, chunked) using ct.load/ct.store
└── test/ # Julia-native tests (using Test stdlib)
    ├── runtests.jl # Test runner entry point
    ├── test_add.jl
    ├── test_matmul.jl
    └── test_softmax.jl
```

Conversion workflow:

1. Study the ground-truth kernels in `julia/kernels/*.jl` and their tests in `julia/test/*.jl`.
2. Write the kernel in `julia/kernels/<op>.jl`, following `translations/workflow.md`, `references/api-mapping.md`, and `references/critical-rules.md`.
3. Add `julia/test/test_<op>.jl` using `Test` and `NNlib.jl`, and register it with `include(...)` in `julia/test/runtests.jl`.
4. Run the static validator: `python <skill-dir>/scripts/validate_cutile_jl.py <file.jl>`.
5. Run the test suite: `julia --project=julia/ julia/test/runtests.jl`.

Common pitfalls (details in `translations/workflow.md` and `references/critical-rules.md`):

| # | Pitfall | One-line fix |
|---|---|---|
| 1 | | Use |
| 2 | | Use |
| 3 | | Compiler bug — file upstream with minimal reproducer |
| 4 | | Args are positional — match kernel signature exactly |
| 5 | | |
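
Pitfall 4 says kernel arguments are positional. The same failure mode can be shown in plain Julia. A call whose argument count matches no method raises a `MethodError`, while a same-count reordering can run silently with swapped meanings. `add_kernel_stub` is a hypothetical CPU stand-in, not the cuTile.jl launch API, and the error cuTile.jl itself reports may differ.

```julia
# Hypothetical CPU stand-in with a fixed positional signature:
# (input x, input y, output, alpha, tile size). NOT the cuTile.jl launch API.
add_kernel_stub(x, y, out, alpha, tile) = (out .= x .+ alpha .* y; nothing)

x = Float32[1, 2, 3]; y = Float32[4, 5, 6]; out = zeros(Float32, 3)
add_kernel_stub(x, y, out, 2.0f0, 16)      # matches the signature exactly
@assert out == Float32[9, 12, 15]

# A wrong argument count matches no method, so Julia throws a MethodError.
# (A same-count reorder such as swapping x and y would NOT be caught.)
err = try
    add_kernel_stub(x, y, out, 16)         # alpha omitted
    nothing
catch e
    e
end
@assert err isa MethodError
```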

Examples (Python source `cutile_python.py` → Julia target `cutile_julia.jl`; Julia kernels live in `julia/kernels/`):

| # | Example | Key Patterns | When to Reference |
|---|---|---|---|
| 01 | | 1D | Starting point; basic TMA + element-wise patterns |
| 02 | | | MMA / tensor core operations |
| 03 | | Persistent scheduling, | Large-tensor reduction patterns |
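
A CPU reference can pin down the semantics of example 01 before any GPU work. This sketch assumes `torch.add`-style alpha scaling (`out = x + alpha * y`), inferred from the description of `add.jl`. `add_ref` and `add_tiled!` are hypothetical helpers. The tile loop only mimics per-tile 1-based ranges on the CPU.

```julia
# CPU reference, ASSUMING torch.add-style semantics: out = x + alpha * y,
# where y is a vector (tensor+tensor) or a scalar (tensor+scalar).
add_ref(x::AbstractVector, y::AbstractVector, alpha) = x .+ alpha .* y
add_ref(x::AbstractVector, y::Number, alpha) = x .+ alpha * y

# Hypothetical tile-by-tile loop with 1-based ranges. The last tile is clipped to
# length(x); a GPU kernel has to handle that partial tile explicitly.
function add_tiled!(out::AbstractVector, x::AbstractVector, y::AbstractVector, alpha, tile::Integer)
    n = length(x)
    for b in 1:cld(n, tile)                          # number of tiles, rounded up
        r = ((b - 1) * tile + 1):min(b * tile, n)    # 1-based element range of tile b
        @views out[r] .= x[r] .+ alpha .* y[r]
    end
    return out
end

x = Float32.(1:10); y = fill(2.0f0, 10)
out = add_tiled!(similar(x), x, y, 0.5f0, 4)         # 10 elements, tile 4 -> 3 tiles
@assert out == add_ref(x, y, 0.5f0)
@assert add_ref(x, 3.0f0, 2.0f0) == x .+ 6.0f0
```

Writing the reference with explicit dot-broadcasting (`.+`, `.*`) keeps the tensor+tensor and tensor+scalar cases in one readable form.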

Ground-truth implementations: `julia/kernels/add.jl`, `matmul.jl`, `softmax.jl` (see `julia/kernels/*.jl`).

| Category | Document | Content |
|---|---|---|
| Workflows | `translations/workflow.md` | Full conversion workflow with todo list, validation loop, checklist |
| Rules | `references/critical-rules.md` | 17 Critical Rules for cuTile Python → Julia conversion |
| API | `references/api-mapping.md` | Python↔Julia bidirectional API mapping + kernel patterns |
| Testing | `references/testing.md` | Julia-native test patterns, tolerances, failure diagnosis |
| Debugging | `references/debugging.md` | Julia-specific error diagnosis + IR debug commands |
| Scripts | `scripts/validate_cutile_jl.py` | Static validation for Julia anti-patterns (run it) |
| Ground Truth | `julia/kernels/*.jl` | Actual working implementations in the codebase |
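
The Testing reference covers tolerances. Below is a self-contained sketch of a tolerance-based check. It uses a hand-written, numerically stable softmax in place of `NNlib.jl`, and a `Float16` evaluation as a stand-in for a reduced-precision kernel result. The `rtol_for` values are assumptions, not the values in `references/testing.md`.

```julia
using Test
using Random

# Numerically stable softmax over dims = 1 (columns are contiguous in Julia)
function softmax_ref(x::AbstractMatrix)
    m = maximum(x; dims = 1)       # subtract the column max before exp
    e = exp.(x .- m)
    return e ./ sum(e; dims = 1)
end

# ASSUMED relative tolerances per element type (illustrative only)
rtol_for(::Type{Float32}) = 1e-5
rtol_for(::Type{Float16}) = 1e-2

Random.seed!(42)
x = randn(Float32, 16, 8)
ref = softmax_ref(x)                          # Float32 reference
low = Float32.(softmax_ref(Float16.(x)))      # stand-in for a low-precision result

@test all(isapprox.(sum(ref; dims = 1), 1; rtol = rtol_for(Float32)))  # columns sum to 1
@test isapprox(low, ref; rtol = rtol_for(Float16))                     # norm-wise comparison
```

`isapprox` on arrays compares norms rather than individual elements. When a single outlier element matters, switch to an element-wise `all(isapprox.(...))` check.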

Check that `julia --version` satisfies the `[compat] julia` entry in `julia/Project.toml`, then install and test:

```bash
# Install Julia dependencies declared in julia/Project.toml
julia --project=julia/ -e 'using Pkg; Pkg.instantiate()'
# Run tests
julia --project=julia/ julia/test/runtests.jl
```