Search Results: rlhf

Found 13 Skills

AI & Machine Learning | davila7/claude-code-templ...

openrlhf-training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

AI & Machine Learning | itsmostafa/llm-engineerin...

rlhf

Understanding Reinforcement Learning from Human Feedback (RLHF) for aligning language models. Use when learning about preference data, reward modeling, policy optimization, or direct alignment algorithms like DPO.

Product & Design | daemon-blockint-tech/agen...

product-management-human-data-platform

Guides product management for human data platforms—annotation and labeling products, workforce workflows, task design, quality systems (gold sets, adjudication, inter-annotator agreement), customer ML-team project delivery, contributor experience, and privacy-safe handling of human-generated training data. Use when prioritizing roadmap for labeling/RLHF/eval data platforms, writing PRDs for annotation or QA features, defining success metrics for throughput and quality, scoping enterprise customer workflows, or balancing cost-quality-speed tradeoffs—not for hands-on model training (data-scientist), warehouse/analytics pipelines (data-warehouse-engineer), generic BRD workshops without product lens (business-analyst), AI solution architecture for copilots (applied-ai-architect-commercial-enterprise), or control implementation for audits (compliance-engineer). UX flows: product-designer. Eval harnesses: prompt-engineer-agent-prompts-evals. Pricing/packaging for platform: product-management-monetization.

AI & Machine Learning | omer-metin/skills-for-ant...

reinforcement-learning

Use when implementing RL algorithms, training agents with rewards, or aligning LLMs with human feedback. Covers policy gradients, PPO, Q-learning, RLHF, and GRPO.

AI & Machine Learning | sundial-org/skills

training-data-curation

Guidelines for creating high-quality datasets for LLM post-training (SFT/DPO/RLHF). Use when preparing data for fine-tuning, evaluating data quality, or designing data collection strategies.

AI & Machine Learning | eyadsibai/ltk

llm-training

Use when "training LLM", "finetuning", "RLHF", "distributed training", "DeepSpeed", "Accelerate", "PyTorch Lightning", "Ray Train", "TRL", "Unsloth", "LoRA training", "flash attention", "gradient checkpointing"

AI & Machine Learning | vuralserhat86/antigravity...

model_finetuning

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

AI & Machine Learning | kiterlin/intelligent-dete...

fine-tuning-with-trl

AI & Machine Learning | promptingcompany/nv-skill...

nemotron-customize

Plan, configure, and chain repo-native Nemotron customization steps into single-step or multi-step pipelines: curation, translation, SFT/PEFT (AutoModel or Megatron-Bridge), pretraining/CPT, RL alignment (DPO/RLVR/GRPO/RLHF), BYOB/MCQ benchmarks, checkpoint conversion, ModelOpt optimization, env profiles, and evaluation of trained checkpoints or existing/hosted endpoints. Use when a request names a Nemotron step or workflow, or asks to clean, translate, train, fine-tune, align, convert, optimize, evaluate, or compose these into a pipeline. Do NOT use for frontend/dashboard/visualization work, generic ML advice, billing/access, or non-Nemotron coding tasks.

AI & Machine Learning | kiterlin/intelligent-dete...

verl-rl-training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

AI & Machine Learning | nvidia/skills

nemotron-customize

Plan Nemotron customization pipelines from repo steps: SFT, PEFT/LoRA, AutoModel vs Megatron-Bridge, DPO/RLVR/GRPO/RLHF, curate-then-translate, BYOB/MCQ benchmark prep or translation, checkpoint conversion, ModelOpt optimization, and endpoint or checkpoint evaluation.

AI & Machine Learning | aradotso/trending-skills

pyre-code-ml-practice

Self-hosted ML coding practice platform with 68 problems covering Transformers, diffusion, RLHF, and more — instant browser feedback, no GPU required.
