Loading...
Loading...
Found 1,060 Skills
Verify and build the required environment for Triton operator development on the Ascend platform, including configurations of dependencies such as CANN, Python/torch/torch_npu/triton-ascend and PATH environment variables. This is used when users need to configure the Triton operator development environment, check the installation of CANN/torch/triton-ascend, or verify whether the environment is available.
AscendC Operator Precision Evaluation. Generate a comprehensive precision test case set (≥30 cases) for the compiled and installed operator, run the tests and generate a precision verification report. Keywords: precision test, precision evaluation, precision report, accuracy, error analysis. After execution, YOU MUST display the overview, failure summary and key findings in the current conversation, and must not only attach the report path.
Evaluate the performance of Triton operators on Ascend NPU. It is used when users need to analyze operator performance bottlenecks, collect and compare operator performance using msprof/msprof op, diagnose Memory-Bound/Compute-Bound bottlenecks, measure hardware utilization metrics, and generate performance evaluation reports.
Deep Performance Optimization Skill for Triton Operators on Ascend NPU, dedicated to achieving the Triton operator performance improvement required by users. Core technologies include but are not limited to Unified Buffer (UB) capacity planning, multi-Tokens parallel processing, MTE/Vector pipeline parallelism, mask optimization, etc. This Skill must be triggered when the user mentions the following: performance optimization of Vector-type Triton operators on Ascend NPU.
Generate PyTorch-style interface documentation (README.md) for AscendC operators. Trigger scenarios: Use this when interface documentation needs to be generated after compilation and debugging are completed, or when the user mentions "generate operator documentation", "create README", "document operator", "help me write documentation" (in operator context), "operator documentation".
Generate interface documents for Triton operators of Ascend NPU. Used when users need to create or update interface documents for Triton operators of Ascend NPU. Core capabilities: (1) Generate standardized documents based on templates (2) Support the list of Ascend NPU product models (3) Provide specifications for operator parameter descriptions (4) Generate call example frameworks.
AscendC Operator End-to-End Development Orchestrator. Used when users need to develop new operators, implement custom operators, or complete the full process from requirements to testing. Keywords: operator development, end-to-end, full process, workflow orchestration, new operator creation.
AscendC Operator Design Completion - Assist users in completing operator architecture design, interface definition, and performance planning. Use this skill when users mention operator design, operator development, tiling strategy, memory planning, AscendC kernel design, two-level tiling, inter-core splitting, or intra-core splitting.
Generate Triton operator requirement documents suitable for Ascend NPU. Used when users need to design new Triton operators, write operator requirement documents, or perform operator performance optimization design.
Debugging and Root Cause Localization for AscendC Operator Precision Issues. Used when operator precision tests fail (such as allclose failure, result deviation, all-zero/NaN output, etc.). Process: Error Distribution Analysis → Code Error-Prone Point Review → Experimental Isolation → printf/DumpTensor Instrumentation → Fix Verification. Keywords: precision debugging, precision issue, result inconsistency, error localization, allclose failure, output deviation, NaN, all-zero, precision debug.
Create Docker containers for Huawei Ascend NPU development with proper device mappings and volume mounts. Use when setting up Ascend development environments in Docker, running CANN applications in containers, or creating isolated NPU development workspaces. Supports privileged mode (default), basic mode, and full mode with profiling/logging. Auto-detects available NPU devices.
Optimize the performance of Triton operators optimized for Ascend NPU. This guide is for users who need to optimize the performance of Triton operators on Ascend NPU, resolve UB overflow, improve Cube unit utilization, and design Tiling strategies.