Found 98 Skills
AI and ML expertise covering PyTorch, LangChain, LLM integration, and scientific computing.
LeetCode-style PyTorch interview practice environment with auto-grading for implementing softmax, attention, GPT-2, and more from scratch.
Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.
Generate a source-backed starting `trtllm-serve --config` YAML for basic aggregate single-node PyTorch serving, aligned with checked-in TensorRT-LLM configs and deployment docs. Preserves explicit latency / balanced / throughput objectives. Excludes disaggregated, multi-node, and non-MTP speculative configs.
Universal Runtime best practices for PyTorch inference, Transformers models, and FastAPI serving. Covers device management, model loading, memory optimization, and performance tuning.
Review PyTorch pull requests for code quality, test coverage, security, and backward compatibility. Use when reviewing PRs, when asked to review code changes, or when the user mentions "review PR", "code review", or "check this PR".
Document undocumented public APIs in PyTorch by removing functions from coverage_ignore_functions and coverage_ignore_classes in docs/source/conf.py, running Sphinx coverage, and adding the appropriate autodoc directives to the correct .md or .rst doc files. Use when a user asks to remove functions from conf.py ignore lists.
High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.
Guide for building Graph Neural Networks with PyTorch Geometric (PyG). Use this skill whenever the user asks about graph neural networks, GNNs, node classification, link prediction, graph classification, message passing networks, heterogeneous graphs, neighbor sampling, or any task involving torch_geometric / PyG. Also trigger when you see imports from torch_geometric, or the user mentions graph convolutions (GCN, GAT, GraphSAGE, GIN), graph data structures, or working with relational/network data. Even if the user just says 'graph learning' or 'geometric deep learning', use this skill.
AI and machine learning development with PyTorch, TensorFlow, and LLM integration. Use when building ML models, training pipelines, fine-tuning LLMs, or implementing AI features.
Apply CUDA Graphs to PyTorch workloads — API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, manual torch.cuda.graph), code compatibility, capture workflows, dynamic pattern handling, and troubleshooting. Triggers: CUDA graph, torch.cuda.graph, make_graphed_callables, reduce-overhead, graph capture, graph replay, kernel launch overhead, CudaGraphManager, FullCudaGraphWrapper, full-iteration graph, stream capture.
Migrate GPU/CUDA Triton operators to Triton-Ascend, or rewrite Python/PyTorch operators into Triton-Ascend implementations that can run on Ascend NPU. When clear optimization opportunities are identified, directly output the optimized code, minimal validation script, and troubleshooting instructions. This skill should be prioritized when users mention 昇腾 (Ascend), Ascend, NPU, triton-ascend, Triton operator migration, PyTorch operator rewriting, coreDim, UB overflow, 1D grid, physical core binding, block_ptr, stride, memory access alignment, mask performance, dtype degradation, operator optimization, or directly ask questions like "How to use this skill", "How to run it in the command line", "How to perform migration/validation in a container", even if users do not explicitly say "write a skill" or "perform migration".