Loading...
Loading...
Found 25 Skills
Capable of completing the installation and deployment of Ascend NPU drivers and firmware, featuring regular expression-based installation package extraction, on-demand addition of executable permissions, dual package verification via Python+Shell, pre-check and installation of system dependencies, and compatibility with CentOS/RHEL/Ubuntu/Debian systems. It is suitable for the installation and deployment of Ascend NPU drivers and firmware.
HCCL (Huawei Collective Communication Library) performance testing for Ascend NPU clusters. Use for testing distributed communication bandwidth, verifying HCCL functionality, and benchmarking collective operations like AllReduce, AllGather. Covers MPI installation, multi-node pre-flight checks (SSH/CANN version/NPU health), and production testing workflows.
Huawei Ascend NPU npu-smi command reference. Use for device queries (health, temperature, power, memory, processes, ECC), configuration (thresholds, modes, fan), firmware upgrades (MCU, bootloader, VRD), virtualization (vNPU), and certificate management.
Static inspection of Triton operator code quality (Host side + Device side) for Ascend NPU. Used when users need to identify potential bugs, API misuses, and performance risks by reading code. Core capabilities: (1) Ascend API constraint compliance check (2) Mask integrity verification (3) Precision processing review (4) Code pattern recognition. Note: This Skill only focuses on static code analysis; compile-time and runtime issues are handled by other Skills.
Evaluate the performance of Triton operators on Ascend NPU. It is used when users need to analyze operator performance bottlenecks, collect and compare operator performance using msprof/msprof op, diagnose Memory-Bound/Compute-Bound bottlenecks, measure hardware utilization metrics, and generate performance evaluation reports.
Deep Performance Optimization Skill for Triton Operators on Ascend NPU, dedicated to achieving the Triton operator performance improvement required by users. Core technologies include but are not limited to Unified Buffer (UB) capacity planning, multi-Tokens parallel processing, MTE/Vector pipeline parallelism, mask optimization, etc. This Skill must be triggered when the user mentions the following: performance optimization of Vector-type Triton operators on Ascend NPU.
Generate interface documents for Triton operators of Ascend NPU. Used when users need to create or update interface documents for Triton operators of Ascend NPU. Core capabilities: (1) Generate standardized documents based on templates (2) Support the list of Ascend NPU product models (3) Provide specifications for operator parameter descriptions (4) Generate call example frameworks.
将简单Vector类型Triton算子从GPU迁移到昇腾NPU。当用户需要迁移Triton代码到NPU、提到GPU到NPU迁移、Triton迁移、昇腾适配时使用。注意:无法自动迁移存在编译问题的算子。
Generate Triton kernel code for Ascend NPU based on operator design documents. Used when users need to implement Triton operator kernels and convert requirement documents into executable code. Core capabilities: (1) Parse requirement documents to confirm computing logic (2) Design tiling partitioning strategy (3) Generate high-performance kernel code (4) Generate test code to verify correctness.
Task Orchestration for Full-Process Development of Ascend Triton Operators. Used when users need to develop Triton Operators, covering the complete workflow of environment configuration → requirement design → code generation → static inspection → precision verification → performance evaluation → document generation → performance optimization.
Generate Triton operator requirement documents suitable for Ascend NPU. Used when users need to design new Triton operators, write operator requirement documents, or perform operator performance optimization design.
Create Docker containers for Huawei Ascend NPU development with proper device mappings and volume mounts. Use when setting up Ascend development environments in Docker, running CANN applications in containers, or creating isolated NPU development workspaces. Supports privileged mode (default), basic mode, and full mode with profiling/logging. Auto-detects available NPU devices.