Loading...
Loading...
Found 115 Skills
Apply CUDA Graphs to PyTorch workloads — API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, manual torch.cuda.graph), code compatibility, capture workflows, dynamic pattern handling, and troubleshooting. Triggers: CUDA graph, torch.cuda.graph, make_graphed_callables, reduce-overhead, graph capture, graph replay, kernel launch overhead, CudaGraphManager, FullCudaGraphWrapper, full-iteration graph, stream capture.
Debug AutoDeploy accuracy regressions vs a reference score (PyTorch backend or published baseline). Use when an AutoDeploy model's eval score is significantly below the reference and the root cause is unknown.
Migrate GPU/CUDA Triton operators to Triton-Ascend, or rewrite Python/PyTorch operators into Triton-Ascend implementations that can run on Ascend NPU. When clear optimization opportunities are identified, directly output the optimized code, minimal validation script, and troubleshooting instructions. This skill should be prioritized when users mention 昇腾 (Ascend), Ascend, NPU, triton-ascend, Triton operator migration, PyTorch operator rewriting, coreDim, UB overflow, 1D grid, physical core binding, block_ptr, stride, memory access alignment, mask performance, dtype degradation, operator optimization, or directly ask questions like "How to use this skill", "How to run it in the command line", "How to perform migration/validation in a container", even if users do not explicitly say "write a skill" or "perform migration".
SQL analysis skill for Ascend PyTorch Profiler / msprof DB (e.g., ascend_pytorch_profiler*.db, msprof_*.db). Convert natural language questions (operator latency, communication, dispatch, scheduling, schema/table queries) into safe and executable SQL, and extract table structure details from official documents as needed.
Review, design, and refactor TensorRT-LLM PyTorch MoE code for architecture fit, clean code, maintainability, and testability. Always use for any modification, review, refactor, or design planning that touches MoE modules, including tensorrt_llm/_torch/modules/fused_moe, ConfigurableMoE, MoE backends, MoEScheduler/moe_scheduler.py, forward execution/chunking, communication strategies, EPLB, quantization/weight handling, routing, factories, MoE docs, or MoE tests. Also use when the user asks whether a MoE design follows the current architecture or whether a MoE refactor is reasonable.
Migrate a file to use stricter Pyrefly type checking with annotations required for all functions, classes, and attributes.
Guide users through creating Agent Skills for Claude Code. Use when the user wants to create, write, author, or design a new Skill, or needs help with SKILL.md files, frontmatter, or skill structure.
Refactor code after tests pass. The "Refactor" phase of Red-Green-Refactor.
Determine if proposed changes require an RFC. Use when planning significant changes, before starting major work, or when asked whether an RFC is needed.
Hardware-agnostic quantum ML framework with automatic differentiation. Use when training quantum circuits via gradients, building hybrid quantum-classical models, or needing device portability across IBM/Google/Rigetti/IonQ. Best for variational algorithms (VQE, QAOA), quantum neural networks, and integration with PyTorch/JAX/TensorFlow. For hardware-specific optimizations use qiskit (IBM) or cirq (Google); for open quantum systems use qutip.
Molecular ML with diverse featurizers and pre-built datasets. Use for property prediction (ADMET, toxicity) with traditional ML or GNNs when you want extensive featurization options and MoleculeNet benchmarks. Best for quick experiments with pre-trained models, diverse molecular representations. For graph-first PyTorch workflows use torchdrug; for benchmark datasets use pytdc.
PyTorch-native graph neural networks for molecules and proteins. Use when building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. Best for custom model development, protein property prediction, retrosynthesis. For pre-trained models and diverse featurizers use deepchem; for benchmark datasets use pytdc.