Found 4 Skills
Autonomous LLM training optimization with GPU support. Runs 5-minute training experiments, measures val_bpb (validation bits per byte), keeps each change that lowers it, reverts any change that does not, and repeats indefinitely. Use this skill when the user asks to "train a model autonomously", "optimize LLM training", "run ML experiments", "autoresearch with GPU", "optimize val_bpb", "autonomous ML training", "LLM pretraining loop", "setup ML autoresearch", "GPU training experiments", "pretrain from scratch", "speed up training", "lower my loss", "GPU optimization", "CUDA training", or mentions "train.py", "prepare.py", "bits per byte", "val_bpb", "NVIDIA GPU training", "RTX training", "H100 training", "autonomous model training", "consumer GPU training", "low VRAM training". Always use this skill when the user wants to autonomously optimize any ML training metric.
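A minimal sketch of the metric and the keep-or-revert rule described above, assuming val_bpb is the summed validation cross-entropy (in nats) converted to bits and divided by the number of UTF-8 bytes the validation tokens decode to. The helper names `bits_per_byte` and `keep_or_revert` and the sample scores are hypothetical; this is not the skill's own code.

```python
import math


def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy in nats into bits per byte.

    Dividing by ln(2) turns nats into bits; dividing by the byte count
    normalizes away the tokenizer, so runs with different vocabularies
    stay comparable.
    """
    return total_nats / (math.log(2) * total_bytes)


def keep_or_revert(best_bpb: float, new_bpb: float) -> tuple[bool, float]:
    """Hypothetical decision rule: keep a change only if val_bpb strictly drops."""
    if new_bpb < best_bpb:
        return True, new_bpb   # keep the change; it becomes the new baseline
    return False, best_bpb     # revert; the previous best stays the baseline


# Illustrative loop over made-up val_bpb results from successive experiments.
results = [1.10, 1.05, 1.07, 1.02]
best = float("inf")
kept = []
for bpb in results:
    keep, best = keep_or_revert(best, bpb)
    kept.append(keep)
```

Here `kept` ends up as `[True, True, False, True]` and `best` as `1.02`: the third experiment is reverted because it did not beat the previous best. Strict improvement is a simple choice; a real loop might also require a margin to avoid chasing noise.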
Operational guide for enabling hierarchical context parallelism in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
Operational guide for enabling hierarchical context parallelism in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
How to launch distributed Megatron-LM training jobs on a SLURM cluster. Covers a minimal sbatch skeleton, environment-variable setup for torch.distributed.run, CUDA_DEVICE_MAX_CONNECTIONS rules across hardware and parallelism modes, container conventions, monitoring, and per-rank failure diagnosis.
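As a rough illustration of the environment-variable setup for torch.distributed.run, the sketch below derives the launcher arguments from the standard SLURM variables `SLURM_NNODES` and `SLURM_NODEID`. It assumes the master address has already been resolved (commonly the first host printed by `scontrol show hostnames "$SLURM_JOB_NODELIST"`), that the GPU count per node is known, and that port 29500 (torch's default) is free; the helper `torchrun_argv` is hypothetical and does not cover the CUDA_DEVICE_MAX_CONNECTIONS rules or container setup.

```python
import sys
from typing import Mapping, Sequence


def torchrun_argv(
    env: Mapping[str, str],
    master_addr: str,
    gpus_per_node: int,
    script: str,
    script_args: Sequence[str] = (),
    master_port: int = 29500,  # assumption: torch's default rendezvous port is free
) -> list[str]:
    """Hypothetical helper: build a torch.distributed.run command for one node."""
    nnodes = int(env["SLURM_NNODES"])    # total nodes in the allocation
    node_rank = int(env["SLURM_NODEID"])  # this node's index, 0..nnodes-1
    return [
        sys.executable, "-m", "torch.distributed.run",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={gpus_per_node}",  # one worker process per GPU
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
        *script_args,
    ]


# Example with a made-up two-node allocation, as seen from the second node.
example = torchrun_argv(
    {"SLURM_NNODES": "2", "SLURM_NODEID": "1"},
    master_addr="node001",
    gpus_per_node=8,
    script="pretrain_gpt.py",
    script_args=["--tensor-model-parallel-size", "2"],
)
```

Passing the environment in as a mapping rather than reading `os.environ` directly keeps the helper testable outside a SLURM allocation; inside an sbatch step you would pass `os.environ`.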