converting-cutile-to-julia

cuTile Python → cuTile.jl (Julia) Conversion

Convert `@ct.kernel` Python kernels to Julia `function ... end` cuTile.jl kernels.

Workflow Selection

  • Standard conversion → Full workflow: `translations/workflow.md`
  • Errors (`MethodError`, `IRError`, numerical mismatch) → `references/debugging.md`
  • Quick reference → `references/api-mapping.md` + `references/critical-rules.md`
  • Test patterns → `references/testing.md`

Architecture

Julia kernels are standalone: no Python bridge, no pytest integration. The Julia sub-project lives in `julia/` at the repo root with its own `Project.toml` for dependency management.

julia/                          # Self-contained Julia sub-project
├── Project.toml                # Dependencies: CUDA.jl, cuTile.jl, NNlib.jl, Test
├── kernels/                    # cuTile.jl kernel implementations
│   ├── add.jl                  # ← Ground-truth: 1D element-wise with alpha scaling (tensor+tensor, tensor+scalar)
│   ├── matmul.jl               # ← Ground-truth: 2D tiled MMA, standard Julia layout (M,K)×(K,N)→(M,N)
│   └── softmax.jl              # ← Ground-truth: 3 strategies (TMA, online, chunked) using ct.load/ct.store
└── test/                       # Julia-native tests (using Test stdlib)
    ├── runtests.jl             # Test runner entry point
    ├── test_add.jl
    ├── test_matmul.jl
    └── test_softmax.jl

Ground-truth reference: Always consult `julia/kernels/*.jl` and `julia/test/*.jl` for patterns that compile and pass tests. These are the canonical examples of working cuTile.jl code.

Instructions

  1. Analyze the Python kernel: identify patterns, shapes, dtypes, operations
  2. Write Julia kernel: `julia/kernels/<op>.jl` with cuTile.jl kernel + bridge function(s)
  3. Convert kernel signature (see `translations/workflow.md` Phase 2)
  4. Convert kernel body (apply `references/api-mapping.md` + `references/critical-rules.md`)
  5. Write Julia test: `julia/test/test_<op>.jl` using `Test` stdlib + `NNlib.jl` for reference
  6. Register test: add `include(...)` in `julia/test/runtests.jl`
  7. Validate: run the bundled validator:
    python <skill-dir>/scripts/validate_cutile_jl.py <file.jl>
  8. Test: run
    julia --project=julia/ julia/test/runtests.jl

Full conversion checklist with post-conversion verification → `translations/workflow.md`
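
Steps 5 and 6 can be sketched roughly as follows. This is a minimal illustration, not the contents of `julia/test/test_add.jl`: the function `scaled_add`, its `a + alpha * b` semantics, the tolerance, and the hand-written loop reference are all assumptions, and the stand-in runs on the CPU rather than launching a cuTile.jl kernel. The loop reference is used here only to avoid extra dependencies; the step above names `NNlib.jl` as the reference source.

```julia
using Test

# Hypothetical stand-in for the bridge function under test. A real test would
# call the bridge function from julia/kernels/<op>.jl; this is plain CPU Julia.
scaled_add(a, b, alpha) = a .+ alpha .* b

# Independent reference: an explicit loop instead of broadcasting.
function scaled_add_reference(a, b, alpha)
    out = similar(a)
    for i in eachindex(a, b)
        out[i] = a[i] + alpha * b[i]
    end
    return out
end

@testset "add (CPU stand-in)" begin
    a = rand(Float32, 1024)
    b = rand(Float32, 1024)
    alpha = 0.5f0
    # Compare against the reference with an explicit (assumed) tolerance.
    @test isapprox(scaled_add(a, b, alpha),
                   scaled_add_reference(a, b, alpha); rtol = 1f-5)
end
```

A file shaped like this would then be registered with a line such as `include("test_add.jl")` in `julia/test/runtests.jl`, so that step 8 picks it up.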

⚠️ Top Pitfalls

The most dangerous translation errors. Full rules (17 total) in `references/critical-rules.md`.

| # | Pitfall | One-line fix |
|---|---------|--------------|
| 1 | `ct.full()` doesn't exist in Julia | Use `fill(val, shape)`, `zeros(T, dims...)`, or `ones(T, dims...)` |
| 2 | `max(a, b)` on tiles → `IRError` | Use `max.(a, b)` (broadcast dot) |
| 3 | `IRError` / `MethodError` mentioning `IRStructurizer` | Compiler bug: file upstream with minimal reproducer |
| 4 | `ct.launch` arg order silently wrong | Args are positional: match kernel signature exactly |
| 5 | `ct.load` with `order`: index positions wrong | `order` remaps BOTH shape AND index (Critical Rule 16) |
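
Pitfalls 1, 2, and 4 can be illustrated in plain Julia. The sketch below deliberately uses ordinary CPU arrays as stand-ins for tiles and makes no cuTile.jl calls, so it shows the Base functions and the broadcast dot only, not the `IRError` itself. The helper `saxpy` is hypothetical and exists only to show how swapped positional arguments go unnoticed.

```julia
# Plain CPU arrays stand in for tiles here (assumption); no cuTile.jl involved.
a = Float32[1, 5, 2]
b = Float32[3, 2, 2]

# Pitfall 2: the dot broadcasts max over elements.
elementwise_max = max.(a, b)            # Float32[3, 5, 2]

# Pitfall 1: Base constructors named in the table as replacements for ct.full().
halves = fill(0.5f0, (2, 3))            # 2×3 Matrix{Float32}, every entry 0.5
acc    = zeros(Float32, 2, 3)           # 2×3 Matrix{Float32} of zeros
scale  = ones(Float32, 2, 3)            # 2×3 Matrix{Float32} of ones

# Pitfall 4: same-typed positional arguments can be swapped without any error.
saxpy(x, y, alpha) = alpha .* x .+ y    # hypothetical helper, not a kernel
right = saxpy(a, b, 2f0)                # 2a + b = Float32[5, 12, 6]
wrong = saxpy(b, a, 2f0)                # 2b + a = Float32[7, 9, 6]

println(elementwise_max)
println(right, " vs ", wrong)
```

Both `saxpy` calls type-check and run, which is why a mismatch between a launch call and the kernel signature tends to surface only as a numerical mismatch in tests.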

Worked Examples

Side-by-side Python → Julia conversions matching the released Julia kernels in `julia/kernels/`. Each directory contains `cutile_python.py` (before) and `cutile_julia.jl` (after).

| # | Example | Key Patterns | When to Reference |
|---|---------|--------------|-------------------|
| 01 | add | 1D `ct.load`/`ct.store`, alpha scaling, scalar broadcast, `fill`/`zeros`, keyword load/store | Starting point; basic TMA + element-wise patterns |
| 02 | matmul | `muladd`, TF32 conversion, K-loop with `for`, 2D swizzle, standard Julia layout, `ct.@compiler_options` | MMA / tensor core operations |
| 03 | softmax | Persistent scheduling, `for` loops, `gather`/`scatter`, `padding_mode`, multi-pass | Large-tensor reduction patterns |

These match the released kernels in `julia/kernels/` (`add.jl`, `matmul.jl`, `softmax.jl`). The examples are simplified teaching versions; always consult `julia/kernels/*.jl` for the canonical, tested implementations.

Reference Documents

| Category | Document | Content |
|----------|----------|---------|
| Workflows | `translations/workflow.md` | Full conversion workflow with todo list, validation loop, checklist |
| Rules | `references/critical-rules.md` | 17 Critical Rules for cuTile Python → Julia conversion |
| API | `references/api-mapping.md` | Python↔Julia bidirectional API mapping + kernel patterns |
| Testing | `references/testing.md` | Julia-native test patterns, tolerances, failure diagnosis |
| Debugging | `references/debugging.md` | Julia-specific error diagnosis + IR debug commands |
| Scripts | `scripts/validate_cutile_jl.py` | Static validation for Julia anti-patterns (run it) |
| Ground Truth | `julia/kernels/*.jl` + `julia/test/*.jl` | Actual working implementations in the codebase |

Environment Setup

Prerequisite (Julia): this skill requires the Julia version declared in `julia/Project.toml` under `[compat] julia`. If `julia` is not installed, or `julia --version` reports a version older than that, install from the official Julia site at https://julialang.org/install/ following the verified installer instructions for your OS. Resume below once `julia --version` is compatible.

Then, from the repo root:

    # Install Julia dependencies declared in julia/Project.toml
    julia --project=julia/ -e 'using Pkg; Pkg.instantiate()'

    # Run tests
    julia --project=julia/ julia/test/runtests.jl

Requirements:
- Julia (minimum version declared in `julia/Project.toml` under `[compat] julia`)
- CUDA 13.1+ driver
- Blackwell GPU (compute capability 10+)
- Dependencies managed via `julia/Project.toml`: CUDA.jl, cuTile.jl, NNlib.jl, Test
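
To compare the installed Julia against the declared minimum, the `[compat] julia` entry can be read with the `TOML` standard library. The snippet below is a small sketch under assumptions: the helper name `julia_compat` is hypothetical, and the inline `Project.toml` text and its `"1.11"` value are placeholders for illustration, not the repo's actual compat entry.

```julia
using TOML

# Return the `[compat] julia` entry from Project.toml text, or `nothing` if absent.
function julia_compat(project_toml::AbstractString)
    project = TOML.parse(project_toml)
    compat = get(project, "compat", Dict{String,Any}())
    return get(compat, "julia", nothing)
end

# Placeholder Project.toml contents; the version string is illustrative only.
example = """
[compat]
julia = "1.11"
"""

println("Declared compat: ", julia_compat(example))
println("Running Julia:   ", VERSION)
```

From the repo root, the same helper could be pointed at the real file with `julia_compat(read("julia/Project.toml", String))`.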