Search Results: pytorch

Found 115 Skills

perf-torch-cuda-graphs

Apply CUDA Graphs to PyTorch workloads — API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, manual torch.cuda.graph), code compatibility, capture workflows, dynamic pattern handling, and troubleshooting. Triggers: CUDA graph, torch.cuda.graph, make_graphed_callables, reduce-overhead, graph capture, graph replay, kernel launch overhead, CudaGraphManager, FullCudaGraphWrapper, full-iteration graph, stream capture.

🇺🇸 | English | Translated

1 scripts/Checked

AI & Machine Learning | nvidia/skills

ad-accuracy-debug

Debug AutoDeploy accuracy regressions vs a reference score (PyTorch backend or published baseline). Use when an AutoDeploy model's eval score is significantly below the reference and the root cause is unknown.

🇺🇸 | English | Translated

AI & Machine Learning | ascend-ai-coding/awesome-...

triton-ascend-migration

Migrate GPU/CUDA Triton operators to Triton-Ascend, or rewrite Python/PyTorch operators into Triton-Ascend implementations that can run on Ascend NPU. When clear optimization opportunities are identified, directly output the optimized code, minimal validation script, and troubleshooting instructions. This skill should be prioritized when users mention 昇腾 (Ascend), Ascend, NPU, triton-ascend, Triton operator migration, PyTorch operator rewriting, coreDim, UB overflow, 1D grid, physical core binding, block_ptr, stride, memory access alignment, mask performance, dtype degradation, operator optimization, or directly ask questions like "How to use this skill", "How to run it in the command line", "How to perform migration/validation in a container", even if users do not explicitly say "write a skill" or "perform migration".

🇨🇳 | Chinese | Translated

AI & Machine Learning | ascend-ai-coding/awesome-...

external-mindstudio-ascend-profiler-db-explorer

SQL analysis skill for Ascend PyTorch Profiler / msprof DB (e.g., ascend_pytorch_profiler*.db, msprof_*.db). Convert natural language questions (operator latency, communication, dispatch, scheduling, schema/table queries) into safe and executable SQL, and extract table structure details from official documents as needed.

🇨🇳 | Chinese | Translated

1 scripts/Attention

Code Quality | nvidia/skills

trtllm-moe-develop

Review, design, and refactor TensorRT-LLM PyTorch MoE code for architecture fit, clean code, maintainability, and testability. Always use for any modification, review, refactor, or design planning that touches MoE modules, including tensorrt_llm/_torch/modules/fused_moe, ConfigurableMoE, MoE backends, MoEScheduler/moe_scheduler.py, forward execution/chunking, communication strategies, EPLB, quantization/weight handling, routing, factories, MoE docs, or MoE tests. Also use when the user asks whether a MoE design follows the current architecture or whether a MoE refactor is reasonable.

🇺🇸 | English | Translated

Code Quality | pytorch/pytorch

pyrefly-type-coverage

Migrate a file to use stricter Pyrefly type checking with annotations required for all functions, classes, and attributes.

🇺🇸 | English | Translated

AI & Machine Learning | pytorch/pytorch

skill-writer

Guide users through creating Agent Skills for Claude Code. Use when the user wants to create, write, author, or design a new Skill, or needs help with SKILL.md files, frontmatter, or skill structure.

🇺🇸 | English | Translated

Code Quality | meta-pytorch/openenv

simplify

Refactor code after tests pass. The "Refactor" phase of Red-Green-Refactor.

🇺🇸 | English | Translated

Project Management | meta-pytorch/openenv

rfc-check

Determine if proposed changes require an RFC. Use when planning significant changes, before starting major work, or when asked whether an RFC is needed.

🇺🇸 | English | Translated

AI & Machine Learning | k-dense-ai/claude-scienti...

pennylane

Hardware-agnostic quantum ML framework with automatic differentiation. Use when training quantum circuits via gradients, building hybrid quantum-classical models, or needing device portability across IBM/Google/Rigetti/IonQ. Best for variational algorithms (VQE, QAOA), quantum neural networks, and integration with PyTorch/JAX/TensorFlow. For hardware-specific optimizations use qiskit (IBM) or cirq (Google); for open quantum systems use qutip.

🇺🇸 | English | Translated

AI & Machine Learning | k-dense-ai/claude-scienti...

deepchem

Molecular ML with diverse featurizers and pre-built datasets. Use for property prediction (ADMET, toxicity) with traditional ML or GNNs when you want extensive featurization options and MoleculeNet benchmarks. Best for quick experiments with pre-trained models and diverse molecular representations. For graph-first PyTorch workflows use torchdrug; for benchmark datasets use pytdc.

🇺🇸 | English | Translated

3 scripts/Checked

AI & Machine Learning | k-dense-ai/claude-scienti...

torchdrug

PyTorch-native graph neural networks for molecules and proteins. Use when building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. Best for custom model development, protein property prediction, and retrosynthesis. For pre-trained models and diverse featurizers use deepchem; for benchmark datasets use pytdc.

🇺🇸 | English | Translated