Loading...
Loading...
Found 30 Skills
Q-learning, DQN, PPO, A3C, policy gradient methods, multi-agent systems, and Gym environments. Use for training agents, game AI, robotics, or decision-making systems.
Use when implementing RL algorithms, training agents with rewards, or aligning LLMs with human feedback - covers policy gradients, PPO, Q-learning, RLHF, and GRPOUse when ", " mentioned.
C++ Reinforcement Learning best practices using libtorch (PyTorch C++ frontend) and modern C++17/20. Use when: - Implementing RL algorithms in C++ for performance-critical applications - Building production RL systems with libtorch - Creating replay buffers and experience storage - Optimizing RL training with GPU acceleration - Deploying RL models with ONNX Runtime
Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.
High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.
Brev instance operating guidance for NeMo-RL agents working in /home/ubuntu/RL with limited workspace disk, a larger /ephemeral volume, and optional /home/ubuntu/RL/.env secrets. Use when running nemo-rl-auto-research campaigns, experiments, training jobs, model or dataset downloads, shared cache-heavy commands, log-producing runs, checkpoint generation, W&B or Hugging Face authenticated workflows, or any workflow that may create large files on Brev.
Build autonomous game-playing agents using AI and reinforcement learning. Covers game environments, agent decision-making, strategy development, and performance optimization. Use when creating game-playing bots, testing game AI, strategic decision-making systems, or game theory applications.
Train and fine-tune transformer language models using TRL (Transformers Reinforcement Learning). Supports SFT, DPO, GRPO, KTO, RLOO and Reward Model training via CLI commands.
Federated learning with Deep Q-Networks for privacy-preserving optimization
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Use when debugging a Nemo Gym run or reward profiling job. Covers rollout collection failures, empty or partial JSONL outputs, stale materialized inputs, verifier/schema errors, Ray or Slurm issues, vLLM readiness, judge failures, tool/sandbox failures, cache problems, and throughput bottlenecks.
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.