Loading...
Loading...
Found 3 Skills
Shared kernel design workflow across all supported languages and DSLs. Provides language selection table, naming conventions, versioning rules, KernelPlan structure, composition patterns, clone workflow, implementation workflow, devlog template, and designer output contract. Use when: (1) choosing which language-specific kernel design skill to load, (2) the intended implementation language is not fixed yet, (3) you need naming or versioning guidance before selecting a DSL, (4) you are implementing any kernel regardless of DSL, (5) you are updating docs that refer to kernel design skills.
CuTe Python DSL kernel workflow, CuteKernel runtime wrapper, suitability gate, tiling guidance, and CuTe-specific pitfalls. Use when: (1) planning or implementing a kernel in the CuTe Python DSL, (2) the optimization needs more explicit control than cuTile exposes but should remain in a Python-driven workflow, (3) defining package naming for cute-dsl kernels, (4) documenting CuTe Python DSL design choices, (5) recording language-specific knowledge for CuTe Python DSL.
Generate Triton operator requirement documents suitable for Ascend NPU. Used when users need to design new Triton operators, write operator requirement documents, or perform operator performance optimization design.