Search Results: pytorch

Found 98 Skills

perf-memory-tuning

Techniques for reducing peak GPU memory in Megatron Bridge — expandable segments, parallelism resizing, activation recompute, CPU offloading constraints, and common OOM fixes.

AI & Machine Learning | davila7/claude-code-templ...

audiocraft-audio-generation

PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen). Use when you need to generate music from text descriptions, create sound effects, or perform melody-conditioned music generation.

AI & Machine Learning | pluginagentmarketplace/cu...

computer-vision

Image processing, object detection, segmentation, and vision models. Use for image classification, object detection, or visual analysis tasks.

1 script / Checked

AI & Machine Learning | ascend-ai-coding/awesome-...

ai-for-science-generator

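Ascend NPU migration skill for the GENERator DNA sequence generation model. Use when migrating a HuggingFace Transformers-based causal LM from CUDA to Huawei Ascend NPU. Covers environment setup, dependency installation, code adaptation, multiprocessing handling, and sequence recovery validation.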

1 script / Attention

AI & Machine Learning | nvidia/skills

perf-workload-profiling

Code instrumentation for timing workloads. Two scenarios: (1) Training loop — inject manual timing to report per-iteration latency, throughput (samples/sec), and data load time. (2) Standalone kernel/op — write CUDA event timing code with warmup, per-iteration statistics, and anti-pattern avoidance. Also covers NVTX annotation for labeling profiler timelines. NOT for: running or analyzing profiler tools (nsys, ncu, Nsight Systems, Nsight Compute), writing kernels (Triton, CuTe, CUDA), applying optimizations (CUDA Graphs, gradient checkpointing, fusion), or interpreting roofline/SOL% metrics. Triggers: "measure throughput", "benchmark this function", "time my training loop", "samples per second", "NVTX annotate", "instrument my dataloader", "data load time", "kernel timing", "how do I time".

AI & Machine Learning | davila7/claude-code-templ...

torch-geometric

Graph Neural Networks (PyG). Node/graph classification, link prediction, GCN, GAT, GraphSAGE, heterogeneous graphs, molecular property prediction, for geometric deep learning.

3 scripts / Attention

AI & Machine Learning | ascend-ai-coding/awesome-...

mindspeed-llm-env-setup

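MindSpeed-LLM environment setup guide for Huawei Ascend NPU. Covers CANN environment activation, PyTorch + torch_npu installation, MindSpeed acceleration library installation, Megatron-LM core module integration, MindSpeed-LLM installation, and environment verification. Use when the user needs to set up a MindSpeed-LLM training environment on Ascend NPU.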

AI & Machine Learning | nvidia/skills

trtllm-codebase-exploration

Systematic approach to exploring the TensorRT-LLM codebase before implementing new features or optimizations. Teaches how to discover existing infrastructure, trace code paths, and avoid reimplementing what already exists. Derived from real mistakes where ~250 lines of code were written and deleted because existing forward methods weren't discovered upfront. Use when starting any new feature, optimization, or code modification in TRT-LLM.

AI & Machine Learning | aradotso/trending-skills

wildworld-dataset

WildWorld large-scale action-conditioned world modeling dataset with 108M+ frames from a photorealistic ARPG game, featuring per-frame annotations, 450+ actions, and explicit state information for generative world modeling research.

AI & Machine Learning | nvidia/skills

adding-cutile-kernel

Add a new cuTile GPU kernel operator to TileGym. Covers dispatch registration in ops.py, cuTile backend implementation, __init__.py exports, test creation, and benchmark in tests/benchmark. Use when adding, creating, or implementing a new cuTile operator/kernel in TileGym, or when asking how to register a new cuTile op.

Data Processing | davila7/claude-code-templ...

cellxgene-census

Query CZ CELLxGENE Census (61M+ cells). Filter by cell type/tissue/disease, retrieve expression data, integrate with scanpy/PyTorch, for population-scale single-cell analysis.

AI & Machine Learning | lingzhi227/claude-skills

experiment-code

Write ML experiment code with iterative improvement. Generate training/evaluation pipelines, debug errors, and optimize results through code reflection. Use when implementing experiments for a research paper.
