Loading...
Loading...
Found 913 Skills
Provides installation guidance for CANN on Ascend NPU. Call this skill when users need to install CANN, configure the Ascend environment, or resolve installation issues.
Analyze Huawei Ascend NPU profiling data to discover hidden performance anomalies and produce a detailed model architecture report reverse-engineered from profiling. Trigger on Ascend profiling traces, NPU bottlenecks, device idle gaps, host-device issues, kernel_details.csv / trace_view.json / op_summary / communication.json. Also trigger on "profiling", "step time", "device bubble", "underfeed", "host bound", "device bound", "AICPU", "wait anchor", "kernel gap", "Ascend performance", "model architecture", "layer structure", "forward pass", "model structure". Runs anomaly discovery (bubble detection, wait-anchor, AICPU exposure) alongside model architecture analysis (layer classification, per-layer sub-structure, communication pipeline). Outputs a separate Markdown architecture report alongside anomaly analysis.
Troubleshoot and optimize the performance of Ascend C operators. This skill is applicable when users develop, review or optimize Ascend C kernel operators, or triggered when users mention keywords such as Ascend C performance optimization, operator optimization, tiling, pipeline, data copy, memory optimization, NPU/Ascend.
Optimize the performance of Triton operators optimized for Ascend NPU. This guide is for users who need to optimize the performance of Triton operators on Ascend NPU, resolve UB overflow, improve Cube unit utilization, and design Tiling strategies.
Maintain JSONL-only profiler performance test cases under csrc/ops/<op>/test in ascend-kernel. Collect data using torch_npu.profiler (with fixed warmup=5 and active=5), aggregate the Total Time(us) from ASCEND_PROFILER_OUTPUT/op_statistic.csv, and output a unified Markdown comparison report (custom operator vs baseline) that includes a DType column. Do not generate perf_cases.json or *_profiler_results.json. Refer to examples/layer_norm_profiler_reference/ for the reference implementation.
Complete toolkit for Huawei Ascend NPU model conversion and end-to-end inference adaptation. Workflow 1 auto-discovers input shapes and parameters from user source code. Workflow 2 exports PyTorch models to ONNX. Workflow 3 converts ONNX to .om via ATC with multi-CANN version support. Workflow 4 adapts the user's full inference pipeline (preprocessing + model + postprocessing) to run end-to-end on NPU. Workflow 5 verifies precision between ONNX and OM outputs. Workflow 6 generates a reproducible README. Supports any standard PyTorch/ONNX model. Use when converting, testing, or deploying models on Ascend AI processors.
Initialize AscendC operator project and create compilable operator skeleton. Trigger scenarios: (1) User requests to create a new operator; (2) Keywords: ascendc operator, new operator, operator directory, operator initialization; (3) Need to quickly implement based on ascend-kernel template. This skill not only creates directories, but also outputs standard files and checklists for "continuous development".
Generate Triton kernel code for Ascend NPU based on operator design documents. Used when users need to implement Triton operator kernels and convert requirement documents into executable code. Core capabilities: (1) Parse requirement documents to confirm computing logic (2) Design tiling partitioning strategy (3) Generate high-performance kernel code (4) Generate test code to verify correctness.
Complete AscendC Operator Verification Testcase Generation - Help users with testcase design. Use this skill when users mention testcase design, generalized testcase generation, operator benchmark, UT testcase, precision testcase, or performance testcase.
Ascend C Code Inspection Skill. Conduct security specification inspection on code based on the hypothesis testing methodology. When calling, you must clearly provide: code snippets and inspection rule descriptions. TRIGGER when: Users request code inspection, code review, ask code security questions, check coding specifications, or need to check specific code issues (such as memory leaks, integer overflows, null pointers, etc.). Keywords: Ascend C, code inspection, code review, security specification, memory, pointer, overflow, leak, coding specification.
昇腾(Ascend)推理生态开源代码仓库智能问答专家旨在为 vLLM、vLLM-Ascend、MindIE-LLM、MindIE-SD、MindIE-Motor、MindIE-Turbo 以及 msModelSlim (MindStudio-ModelSlim) 等仓库提供专家级且易于理解的解释。在处理昇腾(Ascend)推理生态相关项目的用户询问时,务必触发此技能(Skill),可解答使用方法、部署流程、支持模型、支持特性、系统架构、配置管理、调试、测试、故障排查、性能优化、定制开发、源码解析以及其他技术问题。支持中英文双语回复,并可借助 deepwiki MCP 工具检索仓库知识库,生成具备上下文感知且基于证据的回答。Ascend inference ecosystem open-source code repository intelligent question-and-answer (Q&A) expert. Provide expert-level yet comprehensible explanations for repositories such as vLLM, vLLM-Ascend, MindIE-LLM, MindIE-SD, MindIE-Motor, MindIE-Turbo, and msModelSlim (MindStudio-ModelSlim). Use this skill when addressing user inquiries related to these Ascend inference ecosystem projects, including topics such as usage, deployment process, supported models, supported features, system architecture, configuration management, debugging, testing, troubleshooting, performance optimization, custom development, source code analysis, and any other technical issues about these projects. Support responses in both Chinese and English. Use deepwiki MCP tools to query repository knowledge bases and generate context-aware, evidence-based responses.
Task Orchestration for Full-Process Development of Ascend Triton Operators. Used when users need to develop Triton Operators, covering the complete workflow of environment configuration → requirement design → code generation → static inspection → precision verification → performance evaluation → document generation → performance optimization.