Found 62 Skills
Fast tokenizers optimized for research and production. The Rust-based implementation tokenizes 1 GB of text in under 20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, and handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.
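As a rough illustration of the BPE algorithm mentioned above, here is a toy, pure-Python sketch of the merge-learning loop: count adjacent symbol pairs across the corpus, merge the most frequent pair, and repeat. It is not the Rust library's implementation or API. The function names (train_bpe, get_pair_counts, merge_pair) are hypothetical, and real tokenizers add pre-tokenization, special tokens, and byte-level handling on top of this.

```python
from collections import Counter


def get_pair_counts(corpus):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy sketch)."""
    # Start with each word split into single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(corpus)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first-seen pair
        corpus = merge_pair(corpus, best)
        merges.append(best)
    return merges
```

For example, train_bpe(["low", "low", "lower", "newest", "newest"], 2) learns ("l", "o") and then ("lo", "w"). Breaking ties by first-seen order is an arbitrary choice of this sketch, not necessarily what a production trainer does.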
Feed-forward 3D foundation model for streaming scene reconstruction using a Geometric Context Transformer.
Guides systematic PyTorch recommender-system model development across compact data facts, existing source code, configs, focused tests, and training loops without overloading context from broad research archives. Use when building, debugging, or refactoring torch/nn.Module RecSys models with Transformer/HSTU/attention blocks, sparse/dense/list feature fusion, pCVR/CTR heads, ablation axes, or competition codebases where many model ideas exist but bugs and interface drift must be controlled. Provides guidance for systematically advancing recommender-system PyTorch model development, Transformer/HSTU modeling, key data facts, feature interactions, shape checks and debugging, the end-to-end training loop, and existing model architectures.
Validate and use CUDA graph capture in Megatron Bridge, including local full-iteration graphs and Transformer Engine scoped graphs for attention, MLP, and MoE modules.
Response Transformer - Auto-activating skill for API Integration. Triggers on: response transformer. Part of the API Integration skill category.
Universal Runtime best practices for PyTorch inference, Transformers models, and FastAPI serving. Covers device management, model loading, memory optimization, and performance tuning.
Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, when you need to fit larger models, or when you want faster inference. Supports INT8, NF4, FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.
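To make the 8-bit idea concrete, here is a minimal sketch of symmetric per-tensor absmax INT8 quantization in plain Python: scale the weights so the largest magnitude maps to 127, round to integers, and divide by the same scale to dequantize. This is a simplified stand-in assumed for illustration; production 8-bit schemes (for example, with vector-wise scaling and outlier handling) and the NF4/FP4 formats are more involved, and the helper names here are hypothetical.

```python
def absmax_quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization (hypothetical helper).

    Returns integer codes in [-127, 127] and the scale that was used.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w * scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float weights from INT8 codes."""
    return [c / scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = absmax_quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Per-weight rounding error is at most half a quantization step (0.5 / scale).
print(codes)  # [64, -127, 32, 0]
```

Each INT8 code takes 1 byte versus 2 bytes for FP16, which is where the roughly 50% saving comes from; 4-bit codes halve that again (about 75% versus FP16), ignoring the small overhead of the stored scales.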
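Skill for migrating the GENERator DNA sequence generation model to Ascend NPU. Applies to porting causal LMs built on HuggingFace Transformers from CUDA to Huawei Ascend NPUs, covering environment setup, dependency installation, code adaptation, multiprocessing, and sequence recovery validation.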
Train neural models (LSTM, Transformer, N-BEATS) on market data using npx neural-trader, with confidence intervals.
Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.
Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need a 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup over FP16). Integrates with transformers and PEFT for QLoRA fine-tuning.
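The memory figures above follow from simple arithmetic on bits per weight. The sketch below estimates weight storage only (no activations, KV cache, or runtime overhead) and can optionally add one FP16 scale per group of weights, a layout assumed here as typical of group-wise 4-bit schemes. The helper name and the group size of 128 are illustrative assumptions, not taken from any specific tool.

```python
def weight_storage_gb(num_params, bits, group_size=None, scale_bits=16):
    """Estimate weight storage in GB (1 GB = 1e9 bytes).

    Hypothetical helper: if group_size is set, add one scale of
    `scale_bits` bits per group of weights (zero-points are ignored).
    """
    effective_bits = bits
    if group_size:
        # Spread the per-group scale cost across every weight in the group.
        effective_bits += scale_bits / group_size
    return num_params * effective_bits / 8 / 1e9


params_70b = 70e9
fp16_gb = weight_storage_gb(params_70b, 16)                       # 140.0
int4_gb = weight_storage_gb(params_70b, 4)                        # 35.0
int4_g128_gb = weight_storage_gb(params_70b, 4, group_size=128)   # 36.09375
print(fp16_gb / int4_gb)  # 4.0
```

With per-group scales the saving is slightly under 4× (about 3.9× in this example), which is why such reductions are usually quoted as approximate.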