Search Results: GPU-acceleration

Found 19 Skills

perf-parallelism-strategies

Operational guide for choosing and combining parallelism strategies in Megatron Bridge, including sizing rules, hardware topology mapping, and combined parallelism configuration.

AI & Machine Learning | g1joshi/agent-skills

xgboost

XGBoost gradient boosting library. Use for tabular ML.

Data Processing | g1joshi/agent-skills

polars

Polars fast DataFrame library. Use for fast data processing.

AI & Machine Learning | nvidia/skills

perf-optimization

Performance optimization coordination playbook. Contains specialist routing table, TileIR two-step pipeline, kernel generation specialist selection, prioritization criteria, and safe modification workflow. Use when the user asks to apply optimizations, write kernels, or improve performance. Covers both user-specified optimization and autopilot-driven iterative optimization.

AI & Machine Learning | nvidia/skills

kernel-triton-writing

ONLY for OpenAI Triton (@triton.jit) kernel development. NEVER use for CUDA C++ kernels, TileIR, or profiling tools (ncu, nsys). The user's request must involve Triton explicitly. Covers Triton-specific patterns: fused elementwise, reductions (softmax, LayerNorm, RMSNorm), tiled GEMM with triton.autotune, and flash attention. Workflow: design, write, verify (with fast-path for explicit requests).

3 scripts | Attention

Data Processing | nvidia/skills

cupynumeric-install

Install and verify cuPyNumeric for Python — requirements, commands, verification. Source builds are out of scope.

AI & Machine Learning | actionbook/rust-skills

domain-ml

Use when building ML/AI apps in Rust. Keywords: machine learning, ML, AI, tensor, model, inference, neural network, deep learning, training, prediction, ndarray, tch-rs, burn, candle, 机器学习, 人工智能, 模型推理
