Found 25 Skills
Installs and deploys Ascend NPU drivers and firmware. Features include regular expression-based extraction of installation packages, on-demand addition of executable permissions, dual package verification via Python and Shell, pre-check and installation of system dependencies, and compatibility with CentOS, RHEL, Ubuntu, and Debian systems.
Provides installation guidance for CANN on Ascend NPU. Call this skill when users need to install CANN, configure the Ascend environment, or resolve installation issues.
This skill should be used when the user asks about "Ascend NPU", "昇腾", "Huawei NPU", "triton-ascend", "Ascend kernel development", "NPU算子开发", "Atlas", "CANN", or mentions Ascend hardware, AI Core, Cube/Vector/Scalar units. Provides expert guidance on Ascend NPU hardware architecture, triton-ascend kernel development, and GPU to NPU migration. Always use this skill for Ascend-related questions to avoid confusion with GPU documentation and concepts.
Evaluate the performance of Triton operators on Ascend NPU. It is used when users need to analyze operator performance bottlenecks, collect and compare operator performance using msprof/msprof op, diagnose Memory-Bound/Compute-Bound bottlenecks, measure hardware utilization metrics, and generate performance evaluation reports.
Orchestrates the full development process for Triton operators on Ascend NPU. Used when users need to develop Triton operators; covers the complete workflow of environment configuration → requirement design → code generation → static inspection → precision verification → performance evaluation → document generation → performance optimization.
This skill provides comprehensive guidance for adapting Wan-series video generation models (Wan2.1/Wan2.2) from NVIDIA CUDA to Huawei Ascend NPU. It should be used when performing NPU migration of DiT-based video diffusion models, including device layer adaptation, operator replacement, distributed parallelism refactoring, attention optimization, VAE parallelization, and model quantization. This skill covers 9 major adaptation domains derived from real-world Wan2.2 CUDA-to-Ascend porting experience.
vLLM Ascend plugin for LLM inference serving on Huawei Ascend NPU. Use for offline batch inference, API server deployment, quantization inference (with msmodelslim quantized models), tensor/pipeline parallelism for distributed serving, and OpenAI-compatible API endpoints. Supports Qwen, DeepSeek, GLM, LLaMA models with Ascend-optimized kernels.
Optimize the performance of Triton operators on Ascend NPU. Used when users need to resolve Unified Buffer (UB) overflow, improve Cube unit utilization, or design tiling strategies.
Static inspection of Triton operator code quality (Host side + Device side) for Ascend NPU. Used when users need to identify potential bugs, API misuses, and performance risks by reading code. Core capabilities: (1) Ascend API constraint compliance check (2) Mask integrity verification (3) Precision processing review (4) Code pattern recognition. Note: This Skill only focuses on static code analysis; compile-time and runtime issues are handled by other Skills.
Deep performance optimization for Triton operators on Ascend NPU, aimed at reaching the performance improvement the user requires. Core techniques include Unified Buffer (UB) capacity planning, multi-token parallel processing, MTE/Vector pipeline parallelism, and mask optimization. This Skill must be triggered when the user mentions performance optimization of Vector-type Triton operators on Ascend NPU.
Generate interface documents for Triton operators on Ascend NPU. Used when users need to create or update interface documents for these operators. Core capabilities: (1) Generate standardized documents based on templates (2) Support the list of Ascend NPU product models (3) Provide specifications for operator parameter descriptions (4) Generate skeleton call examples.
Generate Triton operator requirement documents suitable for Ascend NPU. Used when users need to design new Triton operators, write operator requirement documents, or perform operator performance optimization design.