Found 8 Skills
High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3; 2× faster than DeepSpeedChat thanks to its distributed architecture and GPU resource sharing.
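To make the GPU resource sharing idea concrete, here is a minimal sketch in plain Ray, not OpenRLHF's actual internals: fractional `num_gpus` lets two actors (e.g. a rollout worker and an inference engine) colocate on one device.

```python
# Sketch of Ray-based GPU sharing: two actors share one physical GPU.
# Requires a Ray cluster with at least one GPU; the vLLM call is a placeholder.
import ray

ray.init()

@ray.remote(num_gpus=0.5)  # half a GPU per worker -> two workers per device
class RolloutWorker:
    def generate(self, prompt: str) -> str:
        # placeholder for a real vLLM generate call
        return prompt + " ..."

workers = [RolloutWorker.remote() for _ in range(2)]
outputs = ray.get([w.generate.remote("Hello") for w in workers])
print(outputs)
```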
Understanding Reinforcement Learning from Human Feedback (RLHF) for aligning language models. Use when learning about preference data, reward modeling, policy optimization, or direct alignment algorithms like DPO.
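As a concrete reference for the direct alignment objective mentioned here, a minimal DPO loss in plain PyTorch; the tensor values below are toy numbers, not from any real run.

```python
# Minimal DPO objective sketch. logp_* are summed token log-probs of the
# chosen/rejected responses under the policy and a frozen reference model;
# beta is the usual temperature (around 0.1).
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    # maximize the margin between chosen and rejected log-ratios
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)  # below log(2): the policy already prefers the chosen response
```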
Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or are training from human feedback. Works with HuggingFace Transformers.
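A minimal SFT run with TRL, closely following the library's quickstart; the model and dataset ids are interchangeable examples.

```python
# Supervised fine-tuning with TRL's SFTTrainer.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # chat-format dataset

training_args = SFTConfig(output_dir="Qwen2.5-0.5B-SFT")
trainer = SFTTrainer(
    args=training_args,
    model="Qwen/Qwen2.5-0.5B",  # any causal LM id works here
    train_dataset=dataset,
)
trainer.train()
```

The DPO and GRPO trainers follow the same pattern with their own config classes; check the current TRL docs for exact argument names.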
Use when implementing RL algorithms, training agents with rewards, or aligning LLMs with human feedback. Covers policy gradients, PPO, Q-learning, RLHF, and GRPO.
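For the policy-gradient side, a compact REINFORCE sketch in plain PyTorch; the environment and reward are toy stand-ins, not part of any listed framework.

```python
# Vanilla policy gradient (REINFORCE): weight the log-prob of each
# trajectory by its return and ascend that gradient.
import torch

torch.manual_seed(0)
policy = torch.nn.Linear(4, 2)  # 4-dim state -> logits over 2 actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def episode():
    """One fake rollout: per-step action log-probs and rewards."""
    log_probs, rewards = [], []
    state = torch.randn(4)
    for _ in range(10):
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        rewards.append(float(action == 1))  # toy reward: prefer action 1
        state = torch.randn(4)
    return log_probs, rewards

for _ in range(100):
    log_probs, rewards = episode()
    ret = sum(rewards)  # undiscounted return
    loss = -torch.stack(log_probs).sum() * ret  # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```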
Guidelines for creating high-quality datasets for LLM post-training (SFT/DPO/RLHF). Use when preparing data for fine-tuning, evaluating data quality, or designing data collection strategies.
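As one concrete example of preference data, a single DPO-style record in the widespread prompt/chosen/rejected convention; individual trainers may expect different column names.

```python
# One preference record, serialized as a line of a .jsonl file.
import json

record = {
    "prompt": "Explain what a reward model is in one sentence.",
    "chosen": "A reward model scores candidate responses so an RL algorithm "
              "can optimize a language model toward human preferences.",
    "rejected": "It is a model.",
}
print(json.dumps(record))  # one JSON object per line
```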
Use when "training LLM", "finetuning", "RLHF", "distributed training", "DeepSpeed", "Accelerate", "PyTorch Lightning", "Ray Train", "TRL", "Unsloth", "LoRA training", "flash attention", "gradient checkpointing"
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.
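The group-relative advantage at the heart of GRPO fits in a few lines; this is a conceptual sketch, not verl's actual implementation.

```python
# GRPO core idea: sample several completions per prompt, then normalize
# their rewards within each group -- no learned value function needed.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards per completion."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # normalize within each group

rewards = torch.tensor([[1.0, 0.0, 0.5, 0.0]])  # 4 completions, 1 prompt
print(grpo_advantages(rewards))  # best completion gets the largest advantage
```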
This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for...