hugging-face-model-trainer


TRL Training on Hugging Face Jobs


Overview


Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.
TRL provides multiple training methods:
  • SFT (Supervised Fine-Tuning) - Standard instruction tuning
  • DPO (Direct Preference Optimization) - Alignment from preference data
  • GRPO (Group Relative Policy Optimization) - Online RL training
  • Reward Modeling - Train reward models for RLHF
For detailed TRL method documentation:
```python
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.
```


**See also:** `references/training_methods.md` for method overviews and selection guidance


When to Use This Skill


Use this skill when users want to:
  • Fine-tune language models on cloud GPUs without local infrastructure
  • Train with TRL methods (SFT, DPO, GRPO, etc.)
  • Run training jobs on Hugging Face Jobs infrastructure
  • Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
  • Ensure trained models are permanently saved to the Hub
  • Use modern workflows with optimized defaults

When to Use Unsloth


Use Unsloth (`references/unsloth.md`) instead of standard TRL when:
- **Limited GPU memory** - Unsloth uses ~60% less VRAM
- **Speed matters** - Unsloth is ~2x faster
- **Training large models (>13B)** - memory efficiency is critical
- **Training Vision-Language Models (VLMs)** - Unsloth has FastVisionModel support

See `references/unsloth.md` for complete Unsloth documentation and `scripts/unsloth_sft_example.py` for a production-ready training script.

Key Directives


When assisting with training jobs:
1. **ALWAYS use the `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs("uv", {...})`, NOT bash `trl-jobs` commands. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it; pass the script content as a string to `hf_jobs()`. If the user asks to "train a model", "fine-tune", or similar, you MUST create the training script AND submit the job immediately using `hf_jobs()`.
2. **Always include Trackio** - Every training script should include Trackio for real-time monitoring. Use the example scripts in `scripts/` as templates.
3. **Provide job details after submission** - After submitting, provide the job ID, monitoring URL, and estimated time, and note that the user can request status checks later.
4. **Use example scripts as templates** - Reference `scripts/train_sft_example.py`, `scripts/train_dpo_example.py`, etc. as starting points.

Local Script Dependencies


To run scripts locally (such as `estimate_cost.py`), install dependencies:

```bash
pip install -r requirements.txt
```

Prerequisites Checklist


Before starting any training job, verify:

Account & Authentication


- Hugging Face account with a Pro, Team, or Enterprise plan (Jobs require a paid plan)
- Authenticated login: check with `hf_whoami()`
- HF_TOKEN for Hub push ⚠️ CRITICAL - the training environment is ephemeral; you must push to the Hub or ALL training results are lost
- The token must have write permissions
- MUST pass `secrets={"HF_TOKEN": "$HF_TOKEN"}` in the job config to make the token available (the `$HF_TOKEN` syntax references your actual token value)

Dataset Requirements


- The dataset must exist on the Hub or be loadable via `datasets.load_dataset()`
- The format must match the training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
- ALWAYS validate unknown datasets before GPU training to prevent format failures (see the Dataset Validation section below)
- Size should suit the hardware (demo: 50-100 examples on t4-small; production: 1K-10K+ on a10g-large/a100-large)
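A quick local check of column names can catch most format mismatches before a job is submitted; a minimal sketch (the helper and its rules are illustrative, derived from the formats listed above, not a TRL API):

```python
# Illustrative helper (not a TRL API): map an example's columns to the
# TRL methods whose expected format they match, per the rules above.
def compatible_methods(example: dict) -> list[str]:
    cols = set(example)
    methods = []
    # SFT accepts "messages", plain text, or prompt-completion pairs
    if "messages" in cols or "text" in cols or {"prompt", "completion"} <= cols:
        methods.append("SFT")
    # DPO requires these exact preference columns
    if {"prompt", "chosen", "rejected"} <= cols:
        methods.append("DPO")
    # GRPO needs prompts only (no preference columns)
    if "prompt" in cols and not {"chosen", "rejected"} & cols:
        methods.append("GRPO")
    return methods

print(compatible_methods({"prompt": "2+2?", "chosen": "4", "rejected": "5"}))  # ['DPO']
```

Run this against one example from `datasets.load_dataset(..., split="train[:1]")` before committing to GPU time.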

⚠️ Critical Settings


- **Timeout must exceed expected training time** - The default 30 min is TOO SHORT for most training. Minimum recommended: 1-2 hours. The job fails and loses all progress if the timeout is exceeded.
- **Hub push must be enabled** - Config: `push_to_hub=True`, `hub_model_id="username/model-name"`; Job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`

Asynchronous Job Guidelines


⚠️ IMPORTANT: Training jobs run asynchronously and can take hours

Action Required


When the user requests training:
1. **Create the training script** with Trackio included (use `scripts/train_sft_example.py` as a template)
2. **Submit immediately** using the `hf_jobs()` MCP tool with the script content inline - don't save to a file unless the user requests it
3. **Report the submission** with job ID, monitoring URL, and estimated time
4. **Wait for the user** to request status checks - don't poll automatically

Ground Rules


  • Jobs run in background - Submission returns immediately; training continues independently
  • Initial logs delayed - Can take 30-60 seconds for logs to appear
  • User checks status - Wait for user to request status updates
  • Avoid polling - Check logs only on user request; provide monitoring links instead

After Submission


Provide to user:
  • ✅ Job ID and monitoring URL
  • ✅ Expected completion time
  • ✅ Trackio dashboard URL
  • ✅ Note that user can request status checks later
Example Response:
✅ Job submitted successfully!

Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz

Expected time: ~2 hours
Estimated cost: ~$10

The job is running in the background. Ask me to check status/logs when ready!

Quick Start: Three Approaches


💡 **Tip for demos:** For quick demos on smaller GPUs (t4-small), omit `eval_dataset` and `eval_strategy` to save ~40% memory. You'll still see training loss and learning progress.

Sequence Length Configuration


TRL config classes use `max_length` (not `max_seq_length`) to control tokenized sequence length:

✅ CORRECT - If you need to set sequence length


```python
SFTConfig(max_length=512)   # Truncate sequences to 512 tokens
DPOConfig(max_length=2048)  # Longer context (2048 tokens)
```

❌ WRONG - This parameter doesn't exist


```python
SFTConfig(max_seq_length=512)  # TypeError!
```

**Default behavior:** `max_length=1024` (truncates from right). This works well for most training.

**When to override:**
- **Longer context**: Set higher (e.g., `max_length=2048`)
- **Memory constraints**: Set lower (e.g., `max_length=512`)
- **Vision models**: Set `max_length=None` (prevents cutting image tokens)

**Usually you don't need to set this parameter at all** - the examples below use the sensible default.

Approach 1: UV Scripts (Recommended—Default Choice)


UV scripts use PEP 723 inline dependencies for clean, self-contained training. This is the primary approach for Claude Code.

```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")

# Create train/eval split for monitoring
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # project name for the training run (trackio)
        run_name="meaningful_run_name",  # descriptive name for the specific training run (trackio)
    )
)
trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control
**When to use:** Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring `hf_jobs()`

Working with Scripts


⚠️ **Important:** The `script` parameter accepts either inline code (as shown above) OR a URL. Local file paths do NOT work.

**Why local paths don't work:** Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be one of:
- Inline code (recommended for custom training)
- Publicly accessible URLs
- Private repo URLs (with HF_TOKEN)

**Common mistakes:**

❌ These will all fail


```python
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})
```

**Correct approaches:**

✅ Inline code (recommended)


```python
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})
```

✅ From Hugging Face Hub


✅ From GitHub


✅ From Gist



**To use local scripts:** Upload to the HF Hub first:
```bash
huggingface-cli repo create my-training-scripts --type model
huggingface-cli upload my-training-scripts ./train.py train.py
```


Approach 2: TRL Maintained Scripts (Official Examples)


TRL provides battle-tested scripts for all methods. They can be run from URLs:

```python
hf_jobs("uv", {
    "script": "https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
        "--dataset_name", "trl-lib/Capybara",
        "--output_dir", "my-model",
        "--push_to_hub",
        "--hub_model_id", "username/my-model"
    ],
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**Benefits:** No code to write, maintained by the TRL team, production-tested
**When to use:** Standard TRL training, quick experiments, no custom code needed
**Available:** Scripts are available from https://github.com/huggingface/trl/tree/main/examples/scripts

Finding More UV Scripts on Hub


The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on the Hugging Face Hub:

Discover available UV script collections


```python
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})
```

Explore a specific collection


```python
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
```

**Popular collections:** ocr, classification, synthetic-data, vllm, dataset-creation

Approach 3: HF Jobs CLI (Direct Terminal Commands)


When the `hf_jobs()` MCP tool is unavailable, use the `hf jobs` CLI directly.

⚠️ **CRITICAL: CLI syntax rules**

✅ CORRECT syntax - flags BEFORE script URL


```bash
hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"
```

❌ WRONG - "run uv" instead of "uv run"


```bash
hf jobs run uv "https://example.com/train.py" --flavor a10g-large
```

❌ WRONG - flags AFTER script URL (will be ignored!)


```bash
hf jobs uv run "https://example.com/train.py" --flavor a10g-large
```

❌ WRONG - "--secret" instead of "--secrets" (plural)


```bash
hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"
```

**Key syntax rules:**
1. Command order is `hf jobs uv run` (NOT `hf jobs run uv`)
2. All flags (`--flavor`, `--timeout`, `--secrets`) must come BEFORE the script URL
3. Use `--secrets` (plural), not `--secret`
4. The script URL must be the last positional argument

**Complete CLI example:**
```bash
hf jobs uv run \
  --flavor a10g-large \
  --timeout 2h \
  --secrets HF_TOKEN \
  "https://huggingface.co/user/repo/resolve/main/train.py"
```

**Check job status via CLI:**
```bash
hf jobs ps                        # List all jobs
hf jobs logs <job-id>             # View logs
hf jobs inspect <job-id>          # Job details
hf jobs cancel <job-id>           # Cancel a job
```

Approach 4: TRL Jobs Package (Simplified Training)


The `trl-jobs` package provides optimized defaults and one-liner training.

Install


```bash
pip install trl-jobs
```

Train with SFT (simplest possible)


```bash
trl-jobs sft \
  --model_name Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara
```

**Benefits:** Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands
**When to use:** User working in a terminal directly (not Claude Code context), quick local experimentation
**Repository:** https://github.com/huggingface/trl-jobs

⚠️ **In Claude Code context, prefer the `hf_jobs()` MCP tool (Approach 1) when available.**

Hardware Selection


| Model Size | Recommended Hardware | Cost (approx/hr) | Use Case |
|---|---|---|---|
| <1B params | `t4-small` | ~$0.75 | Demos, quick tests only without eval steps |
| 1-3B params | `t4-medium`, `l4x1` | ~$1.50-2.50 | Development |
| 3-7B params | `a10g-small`, `a10g-large` | ~$3.50-5.00 | Production training |
| 7-13B params | `a10g-large`, `a100-large` | ~$5-10 | Large models (use LoRA) |
| 13B+ params | `a100-large`, `a10g-largex2` | ~$10-20 | Very large (use LoRA) |

**GPU flavors:** cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8

**Guidelines:**
- Use LoRA/PEFT for models >7B to reduce memory
- Multi-GPU is handled automatically by TRL/Accelerate
- Start with smaller hardware for testing

**See:** `references/hardware_guide.md` for detailed specifications

Critical: Saving Results to Hub


⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB
The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, ALL TRAINING IS LOST.

Required Configuration


**In the training script/config:**
```python
SFTConfig(
    push_to_hub=True,
    hub_model_id="username/model-name",  # MUST specify
    hub_strategy="every_save",  # Optional: push checkpoints
)
```

**In the job submission:**
```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}
```

Verification Checklist


Before submitting:
- `push_to_hub=True` set in config
- `hub_model_id` includes username/repo-name
- `secrets` parameter includes HF_TOKEN
- User has write access to the target repo

**See:** `references/hub_saving.md` for detailed troubleshooting

Timeout Management


⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING

Setting Timeouts


```python
{
    "timeout": "2h"   # 2 hours (formats: "90m", "2h", "1.5h", or seconds as an integer)
}
```

Timeout Guidelines


| Scenario | Recommended | Notes |
|---|---|---|
| Quick demo (50-100 examples) | 10-30 min | Verify setup |
| Development training | 1-2 hours | Small datasets |
| Production (3-7B model) | 4-6 hours | Full datasets |
| Large model with LoRA | 3-6 hours | Depends on dataset |

Always add a 20-30% buffer for model/dataset loading, checkpoint saving, Hub push operations, and network delays.
**On timeout:** The job is killed immediately, all unsaved progress is lost, and training must restart from the beginning.
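The buffer guidance is easy to apply mechanically; a sketch (hypothetical helper, using the midpoint 25% buffer and the "90m"-style format the `timeout` parameter accepts):

```python
# Hypothetical helper: add a 25% buffer (midpoint of the 20-30% guidance
# above) to the expected training time and emit a "<minutes>m" timeout.
def recommended_timeout(expected_minutes: float, buffer: float = 0.25) -> str:
    return f"{round(expected_minutes * (1 + buffer))}m"

print(recommended_timeout(120))  # "150m" for a 2-hour expected run
```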

Cost Estimation


Offer to estimate cost when planning jobs with known parameters. Use `scripts/estimate_cost.py`:

```bash
uv run scripts/estimate_cost.py \
  --model meta-llama/Llama-2-7b-hf \
  --dataset trl-lib/Capybara \
  --hardware a10g-large \
  --dataset-size 16000 \
  --epochs 3
```

Output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.
**When to offer:** The user is planning a job, asks about cost/time, is choosing hardware, or the job will run >1 hour or cost >$5.

Example Training Scripts


Production-ready templates with all best practices. Load these scripts as needed:
- `scripts/train_sft_example.py` - Complete SFT training with Trackio, LoRA, checkpoints
- `scripts/train_dpo_example.py` - DPO training for preference learning
- `scripts/train_grpo_example.py` - GRPO training for online RL

These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to `hf_jobs()` or use them as templates for custom scripts.

Monitoring and Tracking


Trackio provides real-time metrics visualization. See `references/trackio_guide.md` for the complete setup guide.

**Key points:**
- Add `trackio` to dependencies
- Configure the trainer with `report_to="trackio"` and `run_name="meaningful_name"`

Trackio Configuration Defaults


Use sensible defaults unless the user specifies otherwise. When generating training scripts with Trackio:

**Default configuration:**
- **Space ID:** `{username}/trackio` (use "trackio" as the default space name)
- **Run naming:** Unless otherwise specified, name the run in a way the user will recognize (e.g., descriptive of the task, model, or purpose)
- **Config:** Keep minimal - only include hyperparameters and model/dataset info
- **Project name:** Use a project name to associate runs with a particular project

**User overrides:** If the user requests a specific Trackio configuration (custom space, run naming, grouping, or additional config), apply their preferences instead of the defaults. This is useful for managing multiple jobs with the same configuration or keeping training scripts portable.

See `references/trackio_guide.md` for complete documentation, including grouping runs for experiments.

Check Job Status



List all jobs


```python
hf_jobs("ps")
```

Inspect specific job


```python
hf_jobs("inspect", {"job_id": "your-job-id"})
```

View logs


```python
hf_jobs("logs", {"job_id": "your-job-id"})
```

**Remember:** Wait for the user to request status checks. Avoid polling repeatedly.

Dataset Validation


Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.

Why Validate


- 50%+ of training failures are due to dataset format issues
- DPO is especially strict: it requires exact column names (`prompt`, `chosen`, `rejected`)
- Failed GPU jobs waste $1-10 and 30-60 minutes
- Validation on CPU costs ~$0.01 and takes <1 minute

When to Validate


**ALWAYS validate for:**
- Unknown or custom datasets
- DPO training (CRITICAL - 90% of datasets need mapping)
- Any dataset not explicitly TRL-compatible

**Skip validation for known TRL datasets:**
- `trl-lib/ultrachat_200k`, `trl-lib/Capybara`, `HuggingFaceH4/ultrachat_200k`, etc.

Usage


```python
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})
```

The script is fast and will usually complete synchronously.

Reading Results


The output shows compatibility for each training method:
- **✓ READY** - Dataset is compatible; use directly
- **✗ NEEDS MAPPING** - Compatible but needs preprocessing (mapping code provided)
- **✗ INCOMPATIBLE** - Cannot be used for this method

When mapping is needed, the output includes a "MAPPING CODE" section with copy-paste-ready Python code.

Example Workflow



1. Inspect dataset (costs ~$0.01, <1 min on CPU)


```python
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
})
```

2. Check output markers:


- ✓ READY → proceed with training
- ✗ NEEDS MAPPING → apply the mapping code below
- ✗ INCOMPATIBLE → choose a different method/dataset

3. If mapping needed, apply before training:


```python
def format_for_dpo(example):
    return {
        'prompt': example['instruction'],
        'chosen': example['chosen_response'],
        'rejected': example['rejected_response'],
    }

dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)
```

4. Launch training job with confidence



Common Scenario: DPO Format Mismatch


Most DPO datasets use non-standard column names. Example:
- Dataset has: `instruction`, `chosen_response`, `rejected_response`
- DPO expects: `prompt`, `chosen`, `rejected`

The validator detects this and provides exact mapping code to fix it.

Converting Models to GGUF


After training, convert models to GGUF format for use with llama.cpp, Ollama, LM Studio, and other local inference tools.
What is GGUF:
  • Optimized for CPU/GPU inference with llama.cpp
  • Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size
  • Compatible with Ollama, LM Studio, Jan, GPT4All, llama.cpp
  • Typically 2-8GB for 7B models (vs 14GB unquantized)
When to convert:
  • Running models locally with Ollama or LM Studio
  • Reducing model size with quantization
  • Deploying to edge devices
  • Sharing models for local-first use
**See:** `references/gguf_conversion.md` for the complete conversion guide, including a production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.
**Quick conversion:**
```python
hf_jobs("uv", {
    "script": "<see references/gguf_conversion.md for complete script>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})
```
训练完成后,将模型转换为GGUF格式,用于llama.cpp、Ollama、LM Studio和其他本地推理工具。
什么是GGUF:
  • 针对llama.cpp的CPU/GPU推理优化
  • 支持量化(4位、5位、8位)以减小模型大小
  • 兼容Ollama、LM Studio、Jan、GPT4All、llama.cpp
  • 7B模型通常大小为2-8GB(对比未量化的14GB)
何时转换:
  • 使用Ollama或LM Studio本地运行模型
  • 通过量化减小模型大小
  • 部署到边缘设备
  • 共享模型供本地优先使用
参见:
references/gguf_conversion.md
获取完整转换指南,包括生产级转换脚本、量化选项、硬件要求、使用示例和故障排除。
快速转换:
python
hf_jobs("uv", {
    "script": "<查看references/gguf_conversion.md获取完整脚本>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})
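The size figures quoted above (2-8GB quantized vs ~14GB unquantized for a 7B model) follow from simple arithmetic: weights dominate on-disk size, so size ≈ parameters × bits per weight / 8. A quick sketch (the function name is illustrative, not part of any library):

```python
# Back-of-envelope model size at different quantization levels.
# Ignores small overheads (tokenizer, metadata), which is fine for estimates.
def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a model's weights, in gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

print(model_size_gb(7e9, 16))  # → 14.0 (unquantized fp16, matches the ~14GB figure)
print(model_size_gb(7e9, 4))   # → 3.5  (4-bit quantization, within the 2-8GB range)
```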

Common Training Patterns

常见训练模式

See
references/training_patterns.md
for detailed examples including:
  • Quick demo (5-10 minutes)
  • Production with checkpoints
  • Multi-GPU training
  • DPO training (preference learning)
  • GRPO training (online RL)
查看
references/training_patterns.md
获取详细示例,包括:
  • 快速演示(5-10分钟)
  • 带检查点的生产训练
  • 多GPU训练
  • DPO训练(偏好学习)
  • GRPO训练(在线RL)

Common Failure Modes

常见失败模式

Out of Memory (OOM)

内存不足(OOM)

Fix (try in order):
  1. Reduce batch size:
    per_device_train_batch_size=1
    , increase
    gradient_accumulation_steps=8
    . Effective batch size is
    per_device_train_batch_size
    x
    gradient_accumulation_steps
    . For best performance keep effective batch size close to 128.
  2. Enable:
    gradient_checkpointing=True
  3. Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large etc.
修复方法(按顺序尝试):
  1. 减小批次大小:
    per_device_train_batch_size=1
    ,增加
    gradient_accumulation_steps=8
    。有效批次大小为
    per_device_train_batch_size
    x
    gradient_accumulation_steps
    。为获得最佳性能,保持有效批次大小接近128。
  2. 启用:
    gradient_checkpointing=True
  3. 升级硬件:t4-small → l4x1,a10g-small → a10g-large等。
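The batch-size arithmetic in step 1 can be sketched as a small helper that picks `gradient_accumulation_steps` for a target effective batch size (the helper name is illustrative, not a TRL API):

```python
# Effective batch size = per_device_train_batch_size × accumulation steps × num GPUs.
# Choose gradient_accumulation_steps to keep the effective batch size near a target (e.g. 128).
def accumulation_steps(target_effective: int = 128,
                       per_device: int = 1,
                       num_gpus: int = 1) -> int:
    return max(1, round(target_effective / (per_device * num_gpus)))

print(accumulation_steps(per_device=1))              # → 128 (OOM fallback: batch size 1, single GPU)
print(accumulation_steps(per_device=4, num_gpus=4))  # → 8   (128 / (4 × 4))
```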

Dataset Misformatted

数据集格式错误

Fix:
  1. Validate first with dataset inspector:
    bash
    uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
      --dataset name --split train
  2. Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
  3. Apply mapping code from inspector output if needed
修复方法:
  1. 先使用数据集检查器验证:
    bash
    uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
      --dataset name --split train
  2. 检查输出中的兼容性标记(✓ 就绪、✗ 需要映射、✗ 不兼容)
  3. 如有需要,应用检查器输出中的映射代码

Job Timeout

任务超时

Fix:
  1. Check logs for actual runtime:
    hf_jobs("logs", {"job_id": "..."})
  2. Increase timeout with buffer:
    "timeout": "3h"
    (add 30% to estimated time)
  3. Or reduce training: lower
    num_train_epochs
    , use smaller dataset, enable
    max_steps
  4. Save checkpoints:
    save_strategy="steps"
    ,
    save_steps=500
    ,
    hub_strategy="every_save"
Note: Default 30min is insufficient for real training. Minimum 1-2 hours.
修复方法:
  1. 检查日志查看实际运行时间:
    hf_jobs("logs", {"job_id": "..."})
  2. 增加超时时间并添加缓冲:
    "timeout": "3h"
    (在预计时间基础上增加30%)
  3. 或减少训练量:降低
    num_train_epochs
    ,使用更小的数据集,启用
    max_steps
  4. 保存检查点:
    save_strategy="steps"
    save_steps=500
    hub_strategy="every_save"
注意: 默认30分钟不足以完成实际训练。最低建议1-2小时。
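The "add 30% buffer" rule from step 2 can be sketched as a helper that turns a runtime estimate into a timeout string (the helper name and rounding choice are illustrative):

```python
import math

# Add a ~30% buffer on top of the estimated runtime when setting the job timeout,
# rounding up to the next 5-minute mark for a tidy value.
def timeout_with_buffer(estimated_minutes: int, buffer: float = 0.30) -> str:
    minutes = math.ceil(estimated_minutes * (1 + buffer) / 5) * 5
    return f"{minutes}m"

print(timeout_with_buffer(120))  # → "160m" (a 2h estimate becomes ~2h40m)
print(timeout_with_buffer(60))   # → "80m"
```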

Hub Push Failures

Hub推送失败

Fix:
  1. Add to job:
    secrets={"HF_TOKEN": "$HF_TOKEN"}
  2. Add to config:
    push_to_hub=True
    ,
    hub_model_id="username/model-name"
  3. Verify auth:
    mcp__huggingface__hf_whoami()
  4. Check token has write permissions and repo exists (or set
    hub_private_repo=True
    )
修复方法:
  1. 任务配置中添加:
    secrets={"HF_TOKEN": "$HF_TOKEN"}
  2. 训练配置中添加:
    push_to_hub=True
    hub_model_id="用户名/模型名"
  3. 验证认证:
    mcp__huggingface__hf_whoami()
  4. 检查Token有写入权限,且仓库存在(或设置
    hub_private_repo=True
    )
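Note that steps 1-2 configure two different places: `secrets` goes on the job submission, while the Hub options live in the training config. A hedged sketch of the config side, using the field names referenced above (`push_to_hub`, `hub_model_id`, `hub_strategy`, `hub_private_repo`); the repo name is a placeholder:

```python
from trl import SFTConfig  # config fragment; requires trl installed

config = SFTConfig(
    output_dir="my-model",
    push_to_hub=True,                    # step 2: enable the push
    hub_model_id="username/model-name",  # target repo on the Hub (placeholder)
    hub_strategy="every_save",           # push each checkpoint, not only the final model
    hub_private_repo=True,               # optional: create the repo as private
)
```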

Missing Dependencies

依赖缺失

Fix: Add to PEP 723 header (the lines must be `#` comments at the top of the script):
python
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///
修复方法: 添加到PEP 723头部(这些行必须是脚本顶部的 `#` 注释):
python
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///

Troubleshooting

故障排除

Common issues:
  • Job times out → Increase timeout, reduce epochs/dataset, use smaller model/LoRA
  • Model not saved to Hub → Check push_to_hub=True, hub_model_id, secrets=HF_TOKEN
  • Out of Memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use larger GPU
  • Dataset format error → Validate with dataset inspector (see Dataset Validation section)
  • Import/module errors → Add PEP 723 header with dependencies, verify format
  • Authentication errors → Check
    mcp__huggingface__hf_whoami()
    , token permissions, secrets parameter
See:
references/troubleshooting.md
for complete troubleshooting guide
常见问题:
  • 任务超时 → 增加超时时间,减少轮次/数据集大小,使用更小的模型/LoRA
  • 模型未保存到Hub → 检查
    push_to_hub=True
    hub_model_id
    secrets=HF_TOKEN
    配置
  • 内存不足(OOM) → 减小批次大小,增加梯度累积,启用LoRA,使用更大的GPU
  • 数据集格式错误 → 使用数据集检查器验证(参见数据集验证章节)
  • 导入/模块错误 → 添加包含依赖的PEP 723头部,验证格式
  • 认证错误 → 检查
    mcp__huggingface__hf_whoami()
    、Token权限、secrets参数
参见:
references/troubleshooting.md
获取完整故障排除指南

Resources

资源

References (In This Skill)

参考文档(本技能内)

  • references/training_methods.md
    - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
  • references/training_patterns.md
    - Common training patterns and examples
  • references/unsloth.md
    - Unsloth for fast VLM training (~2x speed, 60% less VRAM)
  • references/gguf_conversion.md
    - Complete GGUF conversion guide
  • references/trackio_guide.md
    - Trackio monitoring setup
  • references/hardware_guide.md
    - Hardware specs and selection
  • references/hub_saving.md
    - Hub authentication troubleshooting
  • references/troubleshooting.md
    - Common issues and solutions
  • references/training_methods.md
    - SFT、DPO、GRPO、KTO、PPO、奖励建模概述
  • references/training_patterns.md
    - 常见训练模式和示例
  • references/unsloth.md
    - Unsloth实现快速VLM训练(速度提升约2倍,VRAM减少60%)
  • references/gguf_conversion.md
    - 完整GGUF转换指南
  • references/trackio_guide.md
    - Trackio监控设置
  • references/hardware_guide.md
    - 硬件规格和选择指南
  • references/hub_saving.md
    - Hub认证故障排除
  • references/troubleshooting.md
    - 常见问题和解决方案

Scripts (In This Skill)

脚本(本技能内)

  • scripts/train_sft_example.py
    - Production SFT template
  • scripts/train_dpo_example.py
    - Production DPO template
  • scripts/train_grpo_example.py
    - Production GRPO template
  • scripts/unsloth_sft_example.py
    - Unsloth text LLM training template (faster, less VRAM)
  • scripts/estimate_cost.py
    - Estimate time and cost (offer when appropriate)
  • scripts/convert_to_gguf.py
    - Complete GGUF conversion script
  • scripts/train_sft_example.py
    - 生产级SFT模板
  • scripts/train_dpo_example.py
    - 生产级DPO模板
  • scripts/train_grpo_example.py
    - 生产级GRPO模板
  • scripts/unsloth_sft_example.py
    - Unsloth文本LLM训练模板(更快、占用更少VRAM)
  • scripts/estimate_cost.py
    - 估算时间和成本(适当时提供)
  • scripts/convert_to_gguf.py
    - 完整GGUF转换脚本

External Scripts

外部脚本

  • Dataset Inspector - Validate dataset format before training (use via
    uv run
    or
    hf_jobs
    )

External Links

外部链接

Key Takeaways

核心要点

  1. Submit scripts inline - The
    script
    parameter accepts Python code directly; no file saving required unless user requests
  2. Jobs are asynchronous - Don't wait/poll; let user check when ready
  3. Always set timeout - Default 30 min is insufficient; minimum 1-2 hours recommended
  4. Always enable Hub push - Environment is ephemeral; without push, all results lost
  5. Include Trackio - Use example scripts as templates for real-time monitoring
  6. Offer cost estimation - When parameters are known, use
    scripts/estimate_cost.py
  7. Use UV scripts (Approach 1) - Default to
    hf_jobs("uv", {...})
    with inline scripts; TRL maintained scripts for standard training; avoid bash
    trl-jobs
    commands in Claude Code
  8. Use hf_doc_fetch/hf_doc_search for latest TRL documentation
  9. Validate dataset format before training with dataset inspector (see Dataset Validation section)
  10. Choose appropriate hardware for model size; use LoRA for models >7B
  1. 内联提交脚本 -
    script
    参数直接接受Python代码;除非用户要求,否则无需保存文件
  2. 任务是异步的 - 不要等待/轮询;让用户在需要时检查状态
  3. 始终设置超时时间 - 默认30分钟不足;建议最低1-2小时
  4. 始终启用Hub推送 - 环境是临时的;不推送的话所有结果都会丢失
  5. 包含Trackio - 使用示例脚本作为模板实现实时监控
  6. 提供成本估算 - 当参数已知时,使用
    scripts/estimate_cost.py
  7. 使用UV脚本(方法1) - 默认使用
    hf_jobs("uv", {...})
    加内联脚本;标准训练使用TRL维护的脚本;Claude Code中避免使用bash的
    trl-jobs
    命令
  8. 使用
    hf_doc_fetch
    /
    hf_doc_search
    获取最新的TRL文档
  9. 训练前验证数据集格式 使用数据集检查器(参见数据集验证章节)
  10. 根据模型大小选择合适的硬件;>7B的模型使用LoRA