Loading...
Loading...
Found 14 Skills
Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or need faster inference. Supports PyTorch native SDPA, flash-attn library, H100 FP8, and sliding window attention.
Use when building ML/AI apps in Rust. Keywords: machine learning, ML, AI, tensor, model, inference, neural network, deep learning, training, prediction, ndarray, tch-rs, burn, candle, 机器学习, 人工智能, 模型推理
The basics of how to program GPUs using Mojo. Use this skill in addition to mojo-syntax when writing Mojo code that targets GPUs or other accelerators. Use targeting code to NVIDIA, AMD, Apple silicon GPUs, or others. Use this skill to overcome misconceptions about how Mojo GPU code is written.
Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.
Polars fast DataFrame library. Use for fast data processing.
Use when animation runs slow, janky, or causes frame drops
This skill should be used at the start of any computationally intensive scientific task to detect and report available system resources (CPU cores, GPUs, memory, disk space). It creates a JSON file with resource information and strategic recommendations that inform computational approach decisions such as whether to use parallel processing (joblib, multiprocessing), out-of-core computing (Dask, Zarr), GPU acceleration (PyTorch, JAX), or memory-efficient strategies. Use this skill before running analyses, training models, processing large datasets, or any task where resource constraints matter.
Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices
Composable transformations of Python+NumPy programs. Differentiate, vectorize, JIT-compile to GPU/TPU. Built for high-performance machine learning research and complex scientific simulations. Use for automatic differentiation, GPU/TPU acceleration, higher-order derivatives, physics-informed machine learning, differentiable simulations, and automatic vectorization.
Structure prediction using Boltz-1/Boltz-2, an open biomolecular structure predictor. Use this skill when: (1) Predicting protein complex structures, (2) Validating designed binders, (3) Need open-source alternative to AF2, (4) Predicting protein-ligand complexes, (5) Using local GPU resources. For QC thresholds, use protein-qc. For AlphaFold2 prediction, use alphafold. For Chai prediction, use chai.
XGBoost gradient boosting library. Use for tabular ML.
WebGPU fundamentals for high-performance canvas rendering. Covers device initialization, buffer management, WGSL shaders, render pipelines, compute shaders, and web component integration. Use when building GPU-accelerated graphics, particle systems, or compute-intensive visualizations.