Found 3,526 Skills
Catlass Operator End-to-End Development Orchestrator. Built on ascend-kernel (csrc/ops), it chains the catlass design, catlass-operator-code-gen, and ascendc sub-skills into a closed loop that runs from project initialization through documentation, precision, and performance. Keywords: Catlass, end-to-end, ascend-kernel, operator development, workflow orchestration.
Verify and set up the environment required for Triton operator development on the Ascend platform, including dependencies such as CANN, Python, torch, torch_npu, and triton-ascend, as well as PATH environment variables. Use this when users need to configure the Triton operator development environment, check the installation of CANN/torch/triton-ascend, or verify that the environment is usable.
Generate PyTorch-style interface documentation (README.md) for AscendC operators. Trigger scenarios: Use this when interface documentation needs to be generated after compilation and debugging are completed, or when the user mentions "generate operator documentation", "create README", "document operator", "help me write documentation" (in operator context), "operator documentation".
Python code refactoring skills, covering code smell identification, design pattern application, readability improvement, and practical experience. This skill is applicable when users request "refactor code", "refactor", "code optimization", "improve code quality", "code smell review", "apply design patterns", "enhance readability", or submit code review requests. It supports generating structured refactoring documents after refactoring completion ("output refactoring document", "generate refactoring report"). It includes practical patterns extracted from 20+ real refactoring PRs in the vllm-ascend repository.
Guide Catlass operator performance tuning. Process: read the Catlass optimization guide, obtain or update the profiler baseline, modify tiling according to the guide, recompile, **always generate and display a performance comparison report**, then iterate and compare. Tuning strategies are based on the Catlass documentation. Ask for clarification if conditions are unclear.
HCCL (Huawei Collective Communication Library) performance testing for Ascend NPU clusters. Use for testing distributed communication bandwidth, verifying HCCL functionality, and benchmarking collective operations like AllReduce, AllGather. Covers MPI installation, multi-node pre-flight checks (SSH/CANN version/NPU health), and production testing workflows.
Provides installation guidance for CANN on Ascend NPU. Call this skill when users need to install CANN, configure the Ascend environment, or resolve installation issues.
Maintain JSONL-only profiler performance test cases under csrc/ops/<op>/test in ascend-kernel. Collect data using torch_npu.profiler (with fixed warmup=5 and active=5), aggregate the Total Time(us) from ASCEND_PROFILER_OUTPUT/op_statistic.csv, and output a unified Markdown comparison report (custom operator vs baseline) that includes a DType column. Do not generate perf_cases.json or *_profiler_results.json. Refer to examples/layer_norm_profiler_reference/ for the reference implementation.
Complete toolkit for Huawei Ascend NPU model conversion and end-to-end inference adaptation. Workflow 1 auto-discovers input shapes and parameters from user source code. Workflow 2 exports PyTorch models to ONNX. Workflow 3 converts ONNX to .om via ATC with multi-CANN version support. Workflow 4 adapts the user's full inference pipeline (preprocessing + model + postprocessing) to run end-to-end on NPU. Workflow 5 verifies precision between ONNX and OM outputs. Workflow 6 generates a reproducible README. Supports any standard PyTorch/ONNX model. Use when converting, testing, or deploying models on Ascend AI processors.
Compose Mapbox MCP tools to produce grounded, cited location-aware responses from live data instead of training data
Expert Mermaid diagram creation, validation, and rendering with dual-engine output (SVG/PNG/ASCII). Supports all 20+ diagram types including C4 architecture, AWS architecture-beta with service icons, flowcharts, sequence, ERD, state, class, mindmap, timeline, git graph, sankey, and more. Features code-to-diagram analysis, batch rendering, 15+ themes, and syntax validation. Use when users ask to create diagrams, visualize architecture, render mermaid files, generate ASCII diagrams, document system flows, model databases, draw AWS infrastructure, analyze code structure, or anything involving "mermaid", "diagram", "flowchart", "architecture diagram", "sequence diagram", "ERD", "C4", "ASCII diagram". Do NOT use for non-Mermaid image generation, data plotting with chart libraries, or general documentation writing.
Use when selecting commits, ranges, or historical refs in git — covers ^, ~, .., ..., @{N}, @{time}, --not, and pickaxe content selectors