Search Results: rlhf

Found 9 Skills

AI & Machine Learning | omer-metin/skills-for-ant...

reinforcement-learning

Use when implementing RL algorithms, training agents with rewards, or aligning LLMs with human feedback - covers policy gradients, PPO, Q-learning, RLHF, and GRPO.

AI & Machine Learning | davila7/claude-code-templ...

openrlhf-training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

AI & Machine Learning | vuralserhat86/antigravity...

model_finetuning

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

AI & Machine Learning | kiterlin/intelligent-dete...

fine-tuning-with-trl

AI & Machine Learning | kiterlin/intelligent-dete...

verl-rl-training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

AI & Machine Learning | aradotso/trending-skills

pyre-code-ml-practice

Self-hosted ML coding practice platform with 68 problems covering Transformers, diffusion, RLHF, and more — instant browser feedback, no GPU required.

Product & Design | daemon-blockint-tech/agen...

product-management-human-data-platform

Guides product management for human data platforms—annotation and labeling products, workforce workflows, task design, quality systems (gold sets, adjudication, inter-annotator agreement), customer ML-team project delivery, contributor experience, and privacy-safe handling of human-generated training data. Use when prioritizing roadmap for labeling/RLHF/eval data platforms, writing PRDs for annotation or QA features, defining success metrics for throughput and quality, scoping enterprise customer workflows, or balancing cost-quality-speed tradeoffs—not for hands-on model training (data-scientist), warehouse/analytics pipelines (data-warehouse-engineer), generic BRD workshops without product lens (business-analyst), AI solution architecture for copilots (applied-ai-architect-commercial-enterprise), or control implementation for audits (compliance-engineer). UX flows: product-designer. Eval harnesses: prompt-engineer-agent-prompts-evals. Pricing/packaging for platform: product-management-monetization.

AI & Machine Learning | sickn33/antigravity-aweso...

hugging-face-model-trainer

Use when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO, and reward modeling training methods, plus GGUF conversion for...

AI & Machine Learning | eyadsibai/ltk

llm-training

Use when "training LLM", "finetuning", "RLHF", "distributed training", "DeepSpeed", "Accelerate", "PyTorch Lightning", "Ray Train", "TRL", "Unsloth", "LoRA training", "flash attention", "gradient checkpointing"
