huggingface-llm-trainer


TRL Training on Hugging Face Jobs


Overview


Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.
TRL provides multiple training methods:
  • SFT (Supervised Fine-Tuning) - Standard instruction tuning
  • DPO (Direct Preference Optimization) - Alignment from preference data
  • GRPO (Group Relative Policy Optimization) - Online RL training
  • Reward Modeling - Train reward models for RLHF
For detailed TRL method documentation:
```python
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.
```


**See also:** `references/training_methods.md` for method overviews and selection guidance


When to Use This Skill


Use this skill when users want to:
  • Fine-tune language models on cloud GPUs without local infrastructure
  • Train with TRL methods (SFT, DPO, GRPO, etc.)
  • Run training jobs on Hugging Face Jobs infrastructure
  • Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
  • Ensure trained models are permanently saved to the Hub
  • Use modern workflows with optimized defaults

When to Use Unsloth


Use Unsloth (`references/unsloth.md`) instead of standard TRL when:
  • Limited GPU memory - Unsloth uses ~60% less VRAM
  • Speed matters - Unsloth is ~2x faster
  • Training large models (>13B) - memory efficiency is critical
  • Training Vision-Language Models (VLMs) - Unsloth has `FastVisionModel` support

See `references/unsloth.md` for complete Unsloth documentation and `scripts/unsloth_sft_example.py` for a production-ready training script.

Key Directives


When assisting with training jobs:
  1. ALWAYS use the `hf_jobs()` MCP tool - Submit jobs using `hf_jobs("uv", {...})`, NOT bash `trl-jobs` commands. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it; pass the script content as a string to `hf_jobs()`. If the user asks to "train a model", "fine-tune", or similar, you MUST create the training script AND submit the job immediately using `hf_jobs()`.
  2. Always include Trackio - Every training script should include Trackio for real-time monitoring. Use the example scripts in `scripts/` as templates.
  3. Provide job details after submission - After submitting, provide the job ID, monitoring URL, and estimated time, and note that the user can request status checks later.
  4. Use example scripts as templates - Reference `scripts/train_sft_example.py`, `scripts/train_dpo_example.py`, etc. as starting points.

Local Script Execution


Repository scripts use PEP 723 inline dependencies. Run them with `uv run`:

```bash
uv run scripts/estimate_cost.py --help
uv run scripts/dataset_inspector.py --help
```
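A PEP 723 header is just a comment block at the top of the script; `uv run` parses it and builds an ephemeral environment before executing the file. A minimal sketch (the package list is illustrative):

```python
# /// script
# dependencies = ["trl>=0.12.0", "trackio"]
# ///

# `uv run` reads the comment block above, installs the listed packages into a
# throwaway environment, then runs the rest of the file as a normal script.
print("dependencies resolved by uv, not by a global install")
```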

Prerequisites Checklist


Before starting any training job, verify:

Account & Authentication


  • Hugging Face Account with a Pro, Team, or Enterprise plan (Jobs require a paid plan)
  • Authenticated login: check with `hf_whoami()`
  • HF_TOKEN for Hub push ⚠️ CRITICAL - the training environment is ephemeral; you must push to the Hub or ALL training results are lost
  • Token must have write permissions
  • MUST pass `secrets={"HF_TOKEN": "$HF_TOKEN"}` in the job config to make the token available (the `$HF_TOKEN` syntax references your actual token value)

Dataset Requirements


  • Dataset must exist on the Hub or be loadable via `datasets.load_dataset()`
  • Format must match the training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
  • ALWAYS validate unknown datasets before GPU training to prevent format failures (see the Dataset Validation section below)
  • Size appropriate for hardware (demo: 50-100 examples on t4-small; production: 1K-10K+ on a10g-large/a100-large)
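The format requirement above is mechanical to check. A minimal sketch of a column check (the helper name is illustrative; the accepted layouts follow the list above):

```python
# Accepted column layouts per training method; a dataset needs only one layout.
REQUIRED_COLUMNS = {
    "sft": [{"messages"}, {"text"}, {"prompt", "completion"}],
    "dpo": [{"prompt", "chosen", "rejected"}],  # exact names required
    "grpo": [{"prompt"}],
}

def matches_method(columns, method):
    """Return True if the dataset columns satisfy at least one accepted layout."""
    cols = set(columns)
    return any(layout <= cols for layout in REQUIRED_COLUMNS[method])
```

For example, `matches_method(["prompt", "chosen", "rejected", "source"], "dpo")` passes, while renamed columns like `chosen_response` fail and need mapping first.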

⚠️ Critical Settings


  • Timeout must exceed expected training time - the default 30 min is TOO SHORT for most training. Minimum recommended: 1-2 hours. The job fails and loses all progress if the timeout is exceeded.
  • Hub push must be enabled - Config: `push_to_hub=True`, `hub_model_id="username/model-name"`; Job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`

Asynchronous Job Guidelines


⚠️ IMPORTANT: Training jobs run asynchronously and can take hours

Action Required


When the user requests training:
  1. Create the training script with Trackio included (use `scripts/train_sft_example.py` as a template)
  2. Submit immediately using the `hf_jobs()` MCP tool with the script content inline - don't save to a file unless the user requests it
  3. Report the submission with the job ID, monitoring URL, and estimated time
  4. Wait for the user to request status checks - don't poll automatically

Ground Rules


  • Jobs run in background - Submission returns immediately; training continues independently
  • Initial logs delayed - Can take 30-60 seconds for logs to appear
  • User checks status - Wait for user to request status updates
  • Avoid polling - Check logs only on user request; provide monitoring links instead

After Submission


Provide to user:
  • ✅ Job ID and monitoring URL
  • ✅ Expected completion time
  • ✅ Trackio dashboard URL
  • ✅ Note that user can request status checks later
Example response:

```
✅ Job submitted successfully!

Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz

Expected time: ~2 hours
Estimated cost: ~$10

The job is running in the background. Ask me to check status/logs when ready!
```

Quick Start: Three Approaches


💡 Tip for Demos: For quick demos on smaller GPUs (t4-small), omit `eval_dataset` and `eval_strategy` to save ~40% memory. You'll still see training loss and learning progress.

Sequence Length Configuration


TRL config classes use `max_length` (not `max_seq_length`) to control tokenized sequence length:

```python
# ✅ CORRECT - If you need to set sequence length
SFTConfig(max_length=512)    # Truncate sequences to 512 tokens
DPOConfig(max_length=2048)   # Longer context (2048 tokens)

# ❌ WRONG - This parameter doesn't exist
SFTConfig(max_seq_length=512)  # TypeError!
```

**Default behavior:** `max_length=1024` (truncates from right). This works well for most training.

**When to override:**
- **Longer context**: Set higher (e.g., `max_length=2048`)
- **Memory constraints**: Set lower (e.g., `max_length=512`)
- **Vision models**: Set `max_length=None` (prevents cutting image tokens)

**Usually you don't need to set this parameter at all** - the examples below use the sensible default.

Approach 1: UV Scripts (Recommended—Default Choice)


UV scripts use PEP 723 inline dependencies for clean, self-contained training. This is the primary approach for Claude Code.

```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")

# Create train/eval split for monitoring
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # project name for the training run (trackio)
        run_name="meaningful_run_name",     # descriptive name for the specific training run (trackio)
    )
)

trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control
**When to use:** Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring `hf_jobs()`

Working with Scripts


⚠️ Important: The `script` parameter accepts either inline code (as shown above) OR a URL. Local file paths do NOT work.

Why local paths don't work: Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be:
  • Inline code (recommended for custom training)
  • Publicly accessible URLs
  • Private repo URLs (with HF_TOKEN)

Common mistakes:

```python
# ❌ These will all fail
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})
```

**Correct approaches:**

```python
# ✅ Inline code (recommended)
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})

# ✅ From a URL on the Hugging Face Hub, GitHub, or a Gist
hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})
```
**To use local scripts:** Upload them to the HF Hub first:

```bash
hf repos create my-training-scripts --type model
hf upload my-training-scripts ./train.py train.py
```

Approach 2: TRL Maintained Scripts (Official Examples)

TRL provides battle-tested scripts for all methods. They can be run from URLs:

```python
hf_jobs("uv", {
    "script": "https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
        "--dataset_name", "trl-lib/Capybara",
        "--output_dir", "my-model",
        "--push_to_hub",
        "--hub_model_id", "username/my-model"
    ],
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

Benefits: No code to write, maintained by the TRL team, production-tested
When to use: Standard TRL training, quick experiments, no custom code needed
Available: https://github.com/huggingface/trl/tree/main/examples/scripts

Finding More UV Scripts on Hub

The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on the Hugging Face Hub:

```python
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
```

**Popular collections:** ocr, classification, synthetic-data, vllm, dataset-creation

Approach 3: HF Jobs CLI (Direct Terminal Commands)

When the `hf_jobs()` MCP tool is unavailable, use the `hf jobs` CLI directly.

⚠️ CRITICAL: CLI Syntax Rules

```bash
# ✅ CORRECT syntax - flags BEFORE script URL
hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"

# ❌ WRONG - "run uv" instead of "uv run"
hf jobs run uv "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - flags AFTER script URL (will be ignored!)
hf jobs uv run "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - "--secret" instead of "--secrets" (plural)
hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"
```

**Key syntax rules:**
1. Command order is `hf jobs uv run` (NOT `hf jobs run uv`)
2. All flags (`--flavor`, `--timeout`, `--secrets`) must come BEFORE the script URL
3. Use `--secrets` (plural), not `--secret`
4. The script URL must be the last positional argument

**Complete CLI example:**
```bash
hf jobs uv run \
  --flavor a10g-large \
  --timeout 2h \
  --secrets HF_TOKEN \
  "https://huggingface.co/user/repo/resolve/main/train.py"
```

Check job status via CLI:
```bash
hf jobs ps                        # List all jobs
hf jobs logs <job-id>             # View logs
hf jobs inspect <job-id>          # Job details
hf jobs cancel <job-id>           # Cancel a job
```

Approach 4: TRL Jobs Package (Simplified Training)

The `trl-jobs` package provides optimized defaults and one-liner training.

```bash
uvx trl-jobs sft \
  --model_name Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara
```

Benefits: Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands
When to use: User working directly in a terminal (not Claude Code context), quick local experimentation
Repository: https://github.com/huggingface/trl-jobs

⚠️ In Claude Code context, prefer the `hf_jobs()` MCP tool (Approach 1) when available.

Hardware Selection

| Model Size | Recommended Hardware | Cost (approx/hr) | Use Case |
|---|---|---|---|
| <1B params | `t4-small` | ~$0.75 | Demos, quick tests only without eval steps |
| 1-3B params | `t4-medium`, `l4x1` | ~$1.50-2.50 | Development |
| 3-7B params | `a10g-small`, `a10g-large` | ~$3.50-5.00 | Production training |
| 7-13B params | `a10g-large`, `a100-large` | ~$5-10 | Large models (use LoRA) |
| 13B+ params | `a100-large`, `a10g-largex2` | ~$10-20 | Very large (use LoRA) |

GPU Flavors: cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8

Guidelines:
  • Use LoRA/PEFT for models >7B to reduce memory
  • Multi-GPU is automatically handled by TRL/Accelerate
  • Start with smaller hardware for testing

See `references/hardware_guide.md` for detailed specifications.

Critical: Saving Results to Hub

⚠️ EPHEMERAL ENVIRONMENT - MUST PUSH TO HUB

The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to the Hub, ALL TRAINING IS LOST.

Required Configuration

In the training script/config:

```python
SFTConfig(
    push_to_hub=True,
    hub_model_id="username/model-name",  # MUST specify
    hub_strategy="every_save",  # Optional: push checkpoints
)
```

In the job submission:

```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}
```

Verification Checklist

Before submitting:
  • `push_to_hub=True` set in the config
  • `hub_model_id` includes username/repo-name
  • `secrets` parameter includes HF_TOKEN
  • User has write access to the target repo

See `references/hub_saving.md` for detailed troubleshooting.

Timeout Management

⚠️ DEFAULT: 30 MINUTES - TOO SHORT FOR TRAINING

Setting Timeouts

```python
{
    "timeout": "2h"   # 2 hours (formats: "90m", "2h", "1.5h", or seconds as an integer)
}
```

Timeout Guidelines

| Scenario | Recommended | Notes |
|---|---|---|
| Quick demo (50-100 examples) | 10-30 min | Verify setup |
| Development training | 1-2 hours | Small datasets |
| Production (3-7B model) | 4-6 hours | Full datasets |
| Large model with LoRA | 3-6 hours | Depends on dataset |

Always add a 20-30% buffer for model/dataset loading, checkpoint saving, Hub push operations, and network delays.

On timeout: The job is killed immediately, all unsaved progress is lost, and training must restart from the beginning.
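The buffer rule above is simple arithmetic. A sketch (the helper name is illustrative):

```python
import math

def timeout_with_buffer(estimated_minutes, buffer=0.25):
    """Pad an estimated training time by 20-30% and return a
    timeout string in the "Xm" format that Jobs accepts."""
    padded = math.ceil(estimated_minutes * (1 + buffer))
    return f"{padded}m"
```

For example, a 90-minute estimate with the default 25% buffer yields `timeout_with_buffer(90)` → `"113m"`.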

Cost Estimation

Offer to estimate cost when planning jobs with known parameters. Use `scripts/estimate_cost.py`:

```bash
uv run scripts/estimate_cost.py \
  --model meta-llama/Llama-2-7b-hf \
  --dataset trl-lib/Capybara \
  --hardware a10g-large \
  --dataset-size 16000 \
  --epochs 3
```

The output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.

When to offer: The user is planning a job, asks about cost/time, is choosing hardware, or the job will run >1 hour or cost >$5.

Example Training Scripts

Production-ready templates with all best practices:
  • `scripts/train_sft_example.py` - Complete SFT training with Trackio, LoRA, checkpoints
  • `scripts/train_dpo_example.py` - DPO training for preference learning
  • `scripts/train_grpo_example.py` - GRPO training for online RL

These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to `hf_jobs()` or use them as templates for custom scripts.

Monitoring and Tracking

Trackio provides real-time metrics visualization. See `references/trackio_guide.md` for the complete setup guide.

Key points:
  • Add `trackio` to dependencies
  • Configure the trainer with `report_to="trackio"` and `run_name="meaningful_name"`
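In a TRL config, those key points amount to a fragment like the following sketch (the `project` and `run_name` values are illustrative):

```python
SFTConfig(
    output_dir="my-model",
    report_to="trackio",               # send metrics to Trackio
    project="capybara-sft",            # groups related runs (illustrative name)
    run_name="qwen2.5-0.5b-lora-r16",  # recognizable run label (illustrative name)
)
```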

Trackio Configuration Defaults

监控与追踪

Use sensible defaults unless the user specifies otherwise. When generating training scripts with Trackio:

Default configuration:
  • Space ID: `{username}/trackio` (use "trackio" as the default space name)
  • Run naming: Unless otherwise specified, name the run in a way the user will recognize (e.g., descriptive of the task, model, or purpose)
  • Config: Keep minimal - only include hyperparameters and model/dataset info
  • Project name: Use a project name to associate runs with a particular project

User overrides: If the user requests a specific Trackio configuration (custom space, run naming, grouping, or additional config), apply their preferences instead of the defaults.

This is useful for managing multiple jobs with the same configuration or keeping training scripts portable.

See `references/trackio_guide.md` for complete documentation, including grouping runs for experiments.

Check Job Status

```python
# List all jobs
hf_jobs("ps")

# Inspect a specific job
hf_jobs("inspect", {"job_id": "your-job-id"})

# View logs
hf_jobs("logs", {"job_id": "your-job-id"})
```

**Remember:** Wait for the user to request status checks. Avoid polling repeatedly.

Dataset Validation

Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.

Why Validate

  • 50%+ of training failures are due to dataset format issues
  • DPO is especially strict: it requires exact column names (`prompt`, `chosen`, `rejected`)
  • Failed GPU jobs waste $1-10 and 30-60 minutes
  • Validation on CPU costs ~$0.01 and takes <1 minute

When to Validate

ALWAYS validate for:
  • Unknown or custom datasets
  • DPO training (CRITICAL - 90% of datasets need mapping)
  • Any dataset not explicitly TRL-compatible

Skip validation for known TRL datasets:
  • `trl-lib/ultrachat_200k`, `trl-lib/Capybara`, `HuggingFaceH4/ultrachat_200k`, etc.

Usage

```python
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})
```

The script is fast and will usually complete synchronously.

Reading Results

The output shows compatibility for each training method:
  • ✓ READY - Dataset is compatible; use directly
  • ✗ NEEDS MAPPING - Compatible but needs preprocessing (mapping code provided)
  • ✗ INCOMPATIBLE - Cannot be used for this method

When mapping is needed, the output includes a "MAPPING CODE" section with copy-paste ready Python code.

Example Workflow

```python
# 1. Inspect dataset (costs ~$0.01, <1 min on CPU)
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
})

# 2. Check output markers:
#    ✓ READY → proceed with training
#    ✗ NEEDS MAPPING → apply mapping code below
#    ✗ INCOMPATIBLE → choose a different method/dataset

# 3. If mapping is needed, apply it before training:
def format_for_dpo(example):
    return {
        'prompt': example['instruction'],
        'chosen': example['chosen_response'],
        'rejected': example['rejected_response'],
    }

dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)

# 4. Launch the training job with confidence
```

Common Scenario: DPO Format Mismatch

Most DPO datasets use non-standard column names. Example:
  • Dataset has: `instruction`, `chosen_response`, `rejected_response`
  • DPO expects: `prompt`, `chosen`, `rejected`

The validator detects this and provides exact mapping code to fix it.

Converting Models to GGUF


After training, convert models to GGUF format for use with llama.cpp, Ollama, LM Studio, and other local inference tools.
What is GGUF:
  • Optimized for CPU/GPU inference with llama.cpp
  • Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size
  • Compatible with Ollama, LM Studio, Jan, GPT4All, llama.cpp
  • Typically 2-8GB for 7B models (vs 14GB unquantized)
When to convert:
  • Running models locally with Ollama or LM Studio
  • Reducing model size with quantization
  • Deploying to edge devices
  • Sharing models for local-first use
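The size figures above follow from simple arithmetic: a quantized model is roughly parameter count × bits per weight ÷ 8 bits per byte. A sketch (the helper name is illustrative; it ignores metadata overhead):

```python
def gguf_size_gb(params_billions, bits):
    """Rough quantized model size in GB: params x bits/weight / 8 bits/byte.
    Ignores per-tensor metadata and scales, so real files run slightly larger."""
    return params_billions * 1e9 * bits / 8 / 1e9

# A 7B model: ~3.5 GB at 4-bit vs ~14 GB at 16-bit, matching the range above.
```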
See `references/gguf_conversion.md` for the complete conversion guide, including a production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.
Quick conversion:
python
hf_jobs("uv", {
    "script": "<see references/gguf_conversion.md for complete script>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})
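The conversion script receives its inputs through the `env` block above; a minimal sketch of that pattern (variable names taken from the example, default values hypothetical):

```python
import os

# hf_jobs' "env" block surfaces these as ordinary environment variables inside the job
adapter = os.environ.get("ADAPTER_MODEL", "username/my-finetuned-model")
base = os.environ.get("BASE_MODEL", "Qwen/Qwen2.5-0.5B")
output = os.environ.get("OUTPUT_REPO", "username/my-model-gguf")
print(adapter, base, output)
```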

Common Training Patterns

See references/training_patterns.md for detailed examples including:
  • Quick demo (5-10 minutes)
  • Production with checkpoints
  • Multi-GPU training
  • DPO training (preference learning)
  • GRPO training (online RL)

Common Failure Modes

Out of Memory (OOM)

Fix (try in order):
  1. Reduce batch size: per_device_train_batch_size=1, increase gradient_accumulation_steps=8. Effective batch size is per_device_train_batch_size × gradient_accumulation_steps; for best performance, keep the effective batch size close to 128.
  2. Enable gradient_checkpointing=True
  3. Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large, etc.
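The arithmetic behind step 1, sketched out (values taken from the fix above; the GPU count also multiplies in on multi-GPU flavors):

```python
per_device_train_batch_size = 1   # smallest per-GPU batch to avoid OOM
gradient_accumulation_steps = 8   # gradients accumulated over this many micro-batches
num_gpus = 1                      # multiply in the device count on multi-GPU flavors

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # → 8; raise gradient_accumulation_steps toward 128 for the ~128 target
```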

Dataset Misformatted

Fix:
  1. Validate first with dataset inspector:
    bash
    uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
      --dataset name --split train
  2. Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
  3. Apply mapping code from inspector output if needed

Job Timeout

Fix:
  1. Check logs for the actual runtime: hf_jobs("logs", {"job_id": "..."})
  2. Increase the timeout with a buffer: "timeout": "3h" (add 30% to the estimated time)
  3. Or reduce training: lower num_train_epochs, use a smaller dataset, or set max_steps
  4. Save checkpoints: save_strategy="steps", save_steps=500, hub_strategy="every_save"
Note: The default 30 min is insufficient for real training; allow a minimum of 1-2 hours.
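The 30% buffer from step 2, sketched with integer math (the runtime number is hypothetical):

```python
estimated_minutes = 140  # e.g., actual runtime read from the logs of a previous run
buffered_minutes = estimated_minutes * 13 // 10  # add a 30% buffer
print(f"{buffered_minutes}m")  # → 182m; "3h" would be too tight here, so request "4h"
```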

Hub Push Failures

Fix:
  1. Add to the job: secrets={"HF_TOKEN": "$HF_TOKEN"}
  2. Add to the config: push_to_hub=True, hub_model_id="username/model-name"
  3. Verify auth: mcp__huggingface__hf_whoami()
  4. Check that the token has write permissions and that the repo exists (or set hub_private_repo=True)
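Steps 2 and 4, plus the checkpoint settings from the timeout section, as one config-dict sketch (these are real TrainingArguments/SFTConfig field names; the repo name is a placeholder):

```python
# Fields that keep results on the Hub even if the job dies mid-run
hub_settings = {
    "push_to_hub": True,
    "hub_model_id": "username/model-name",  # placeholder repo
    "hub_private_repo": True,               # or make sure the public repo exists first
    "save_strategy": "steps",
    "save_steps": 500,
    "hub_strategy": "every_save",           # push every checkpoint, not just the final model
}
print(sorted(hub_settings))
```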

Missing Dependencies

Fix: Add the missing package to the PEP 723 header:
python
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///
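Runners like uv find these dependencies by scanning comment lines between the `# /// script` and `# ///` markers; a simplified sketch of that extraction (not the exact spec grammar):

```python
import re

SCRIPT = '''# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///
print("training...")
'''

# Grab the comment lines between the opening and closing markers
m = re.search(r"^# /// script\n((?:# .*\n)+?)# ///$", SCRIPT, re.MULTILINE)
deps_line = m.group(1).strip().lstrip("# ")
print(deps_line)  # the dependencies = [...] line, with the comment prefix stripped
```

A package listed in pip-install commands but missing from this header will still produce an import error, because the job only installs what the header declares.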
Troubleshooting


Common issues:
  • Job times out → Increase timeout, reduce epochs/dataset, use smaller model/LoRA
  • Model not saved to Hub → Check push_to_hub=True, hub_model_id, secrets=HF_TOKEN
  • Out of Memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use larger GPU
  • Dataset format error → Validate with dataset inspector (see Dataset Validation section)
  • Import/module errors → Add PEP 723 header with dependencies, verify format
  • Authentication errors → Check mcp__huggingface__hf_whoami(), token permissions, and the secrets parameter
See references/troubleshooting.md for the complete troubleshooting guide.

Resources

References (In This Skill)

  • references/training_methods.md - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
  • references/training_patterns.md - Common training patterns and examples
  • references/unsloth.md - Unsloth for fast VLM training (~2x speed, 60% less VRAM)
  • references/gguf_conversion.md - Complete GGUF conversion guide
  • references/trackio_guide.md - Trackio monitoring setup
  • references/hardware_guide.md - Hardware specs and selection
  • references/hub_saving.md - Hub authentication troubleshooting
  • references/troubleshooting.md - Common issues and solutions
  • references/local_training_macos.md - Local training on macOS

Scripts (In This Skill)

  • scripts/train_sft_example.py - Production SFT template
  • scripts/train_dpo_example.py - Production DPO template
  • scripts/train_grpo_example.py - Production GRPO template
  • scripts/unsloth_sft_example.py - Unsloth text LLM training template (faster, less VRAM)
  • scripts/estimate_cost.py - Estimate time and cost (offer when appropriate)
  • scripts/convert_to_gguf.py - Complete GGUF conversion script

External Scripts

  • Dataset Inspector - Validate dataset format before training (use via uv run or hf_jobs)

External Links

Key Takeaways


  1. Submit scripts inline - The script parameter accepts Python code directly; no file saving required unless the user requests it
  2. Jobs are asynchronous - Don't wait or poll; let the user check when ready
  3. Always set timeout - The default 30 min is insufficient; a minimum of 1-2 hours is recommended
  4. Always enable Hub push - The environment is ephemeral; without a push, all results are lost
  5. Include Trackio - Use the example scripts as templates for real-time monitoring
  6. Offer cost estimation - When parameters are known, use scripts/estimate_cost.py
  7. Use UV scripts (Approach 1) - Default to hf_jobs("uv", {...}) with inline scripts; use TRL-maintained scripts for standard training; avoid bash trl-jobs commands in Claude Code
  8. Use hf_doc_fetch/hf_doc_search for the latest TRL documentation
  9. Validate dataset format before training with the dataset inspector (see the Dataset Validation section)
  10. Choose appropriate hardware for the model size; use LoRA for models >7B
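Takeaways 1, 3, and 4 combined into one submission sketch (a plain dict standing in for the hf_jobs payload; the script body and flavor are illustrative):

```python
payload = {
    # Takeaway 1: the script parameter takes inline Python (with its PEP 723 header)
    "script": '# /// script\n# dependencies = ["trl"]\n# ///\n# ...training code here...',
    "flavor": "a10g-large",
    # Takeaway 3: always set an explicit timeout; the 30-minute default is too short
    "timeout": "2h",
    # Takeaway 4: without HF_TOKEN and push_to_hub in the script, all results are lost
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
}
print(sorted(payload))  # → ['flavor', 'script', 'secrets', 'timeout']
```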
