hugging-face-vision-trainer


Vision Model Training on Hugging Face Jobs


Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup required—results are automatically saved to the Hugging Face Hub.

When to Use This Skill


Use this skill when users want to:
  • Fine-tune object detection models (D-FINE, RT-DETR v2, DETR, YOLOS) on cloud GPUs or local
  • Fine-tune image classification models (timm: MobileNetV3, MobileViT, ResNet, ViT/DINOv3, or any Transformers classifier) on cloud GPUs or local
  • Fine-tune SAM or SAM2 models for segmentation / image matting using bbox or point prompts
  • Train bounding-box detectors on custom datasets
  • Train image classifiers on custom datasets
  • Train segmentation models on custom mask datasets with prompts
  • Run vision training jobs on Hugging Face Jobs infrastructure
  • Ensure trained vision models are permanently saved to the Hub

Related Skills


  • hugging-face-jobs
    — General HF Jobs infrastructure: token authentication, hardware flavors, timeout management, cost estimation, secrets, environment variables, scheduled jobs, and result persistence. Refer to the Jobs skill for any non-training-specific Jobs questions (e.g., "how do secrets work?", "what hardware is available?", "how do I pass tokens?").
  • hugging-face-model-trainer
    — TRL-based language model training (SFT, DPO, GRPO). Use that skill for text/language model fine-tuning.

Local Script Execution


Helper scripts use PEP 723 inline dependencies. Run them with `uv run`:

```bash
uv run scripts/dataset_inspector.py --dataset username/dataset-name --split train
uv run scripts/estimate_cost.py --help
```

Prerequisites Checklist


Before starting any training job, verify:

Account & Authentication


  • Hugging Face account with Pro, Team, or Enterprise plan (Jobs require a paid plan)
  • Authenticated login: check with `hf_whoami()` (tool) or `hf auth whoami` (terminal)
  • Token has write permissions
  • MUST pass token in job secrets — see directive #2 below for syntax (MCP tool vs Python API)

Dataset Requirements — Object Detection


  • Dataset must exist on the Hub
  • Annotations must use the `objects` column with `bbox`, `category` (and optionally `area`) sub-fields
  • Bboxes can be in xywh (COCO) or xyxy (Pascal VOC) format — auto-detected and converted
  • Categories can be integers or strings — strings are auto-remapped to integer IDs
  • `image_id` column is optional — generated automatically if missing
  • ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)
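As a quick illustration of the record shape above (a minimal sketch — the real validation is done by `dataset_inspector.py`, and the helper name here is hypothetical):

```python
# Illustrative check of one object-detection record's shape. Field names
# (objects.bbox, objects.category, objects.area) follow the requirements above;
# this helper is hypothetical and not part of the training scripts.
def check_od_example(example):
    """Return a list of problems found in a single record."""
    problems = []
    objects = example.get("objects")
    if objects is None:
        problems.append("missing 'objects' column")
        return problems
    for field in ("bbox", "category"):
        if field not in objects:
            problems.append(f"missing 'objects.{field}' sub-field")
    if any(len(box) != 4 for box in objects.get("bbox", [])):
        problems.append("each bbox must have 4 coordinates")
    return problems

# A well-formed record (xywh/COCO-style bbox, integer category):
record = {"objects": {"bbox": [[10, 20, 50, 40]], "category": [2], "area": [2000]}}
print(check_od_example(record))  # []
```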

Dataset Requirements — Image Classification


  • Dataset must exist on the Hub
  • Must have an `image` column (PIL images) and a `label` column (integer class IDs or strings)
  • The label column can be `ClassLabel` type (with names) or plain integers/strings — strings are auto-remapped
  • Common column names auto-detected: `label`, `labels`, `class`, `fine_label`
  • ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)
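The string→ID remapping can be sketched as follows. This is only an illustration of the behavior described above — sorted-order ID assignment is an assumption, not necessarily what the training script does:

```python
# Sketch of string-label → integer-ID remapping (illustrative; the actual
# training script's ID ordering may differ).
def remap_labels(labels):
    """Map string class labels to integer IDs; pass integers through unchanged."""
    if all(isinstance(label, int) for label in labels):
        return list(labels), {}
    names = sorted(set(labels))                      # deterministic ordering
    label2id = {name: i for i, name in enumerate(names)}
    return [label2id[label] for label in labels], label2id

ids, label2id = remap_labels(["cat", "dog", "cat", "bird"])
print(ids)       # [1, 2, 1, 0]
print(label2id)  # {'bird': 0, 'cat': 1, 'dog': 2}
```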

Dataset Requirements — SAM/SAM2 Segmentation


  • Dataset must exist on the Hub
  • Must have an `image` column (PIL images) and a `mask` column (binary ground-truth segmentation mask)
  • Must have a prompt — either:
    • a `prompt` column with JSON containing `{"bbox": [x0,y0,x1,y1]}` or `{"point": [x,y]}`
    • OR a dedicated `bbox` column with `[x0,y0,x1,y1]` values
    • OR a dedicated `point` column with `[x,y]` or `[[x,y],...]` values
  • Bboxes should be in xyxy format (absolute pixel coordinates)
  • Example dataset: `merve/MicroMat-mini` (image matting with bbox prompts)
  • ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)
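The three accepted prompt forms can be normalized into one shape roughly like this (an illustrative sketch; `extract_prompt` is a hypothetical helper, not part of the training script):

```python
import json

# Normalize the three prompt forms from the requirements above into a single
# (kind, coords) pair. Hypothetical helper for illustration only.
def extract_prompt(example):
    """Return ("bbox", [x0,y0,x1,y1]) or ("point", [[x,y], ...]) for one record."""
    if "prompt" in example:                      # JSON prompt column
        prompt = json.loads(example["prompt"])
        if "bbox" in prompt:
            return "bbox", prompt["bbox"]
        return "point", [prompt["point"]]
    if "bbox" in example:                        # dedicated bbox column
        return "bbox", example["bbox"]
    points = example["point"]                    # dedicated point column
    if points and not isinstance(points[0], (list, tuple)):
        points = [points]                        # single [x,y] → [[x,y]]
    return "point", points

print(extract_prompt({"prompt": '{"bbox": [0, 0, 64, 48]}'}))  # ('bbox', [0, 0, 64, 48])
print(extract_prompt({"point": [12, 30]}))                     # ('point', [[12, 30]])
```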

Critical Settings


  • Timeout must exceed expected training time — the default 30 min is TOO SHORT. See directive #5 for recommended values.
  • Hub push must be enabled — `push_to_hub=True`, `hub_model_id="username/model-name"`, token in `secrets`

Dataset Validation


Validate the dataset format BEFORE launching GPU training — format mismatches are the #1 cause of training failures.
ALWAYS validate unknown/custom datasets or any dataset you haven't trained with before. Skip only for `cppe-5` (the default in the training script).

Running the Inspector


Option 1: Via HF Jobs (recommended — avoids local SSL/dependency issues):

```python
hf_jobs("uv", {
    "script": "path/to/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})
```

Option 2: Locally:

```bash
uv run scripts/dataset_inspector.py --dataset username/dataset-name --split train
```

Option 3: Via `HfApi().run_uv_job()` (if the `hf_jobs` MCP tool is unavailable):

```python
from huggingface_hub import HfApi
api = HfApi()
api.run_uv_job(
    script="scripts/dataset_inspector.py",
    script_args=["--dataset", "username/dataset-name", "--split", "train"],
    flavor="cpu-basic",
    timeout=300,
)
```

Reading Results


  • ✓ READY — Dataset is compatible; use directly
  • ✗ NEEDS FORMATTING — Needs preprocessing (mapping code provided in the output)

Automatic Bbox Preprocessing


The object detection training script (`scripts/object_detection_training.py`) automatically handles bbox format detection (xyxy→xywh conversion), bbox sanitization, `image_id` generation, string category→integer remapping, and dataset truncation. No manual preprocessing is needed — just ensure the dataset has `objects.bbox` and `objects.category` columns.
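A simplified sketch of that detection-and-conversion step (the script's actual heuristic inspects the whole dataset; this is only an illustration, and the heuristic below can misfire on some xywh boxes):

```python
# Sketch of xyxy → xywh (COCO) conversion. The format heuristic is simplified:
# valid xyxy boxes always satisfy x1 > x0 and y1 > y0, but some xywh boxes can
# accidentally satisfy it too — the real script is more careful.
def xyxy_to_xywh(box):
    x0, y0, x1, y1 = box
    return [x0, y0, x1 - x0, y1 - y0]

def looks_like_xyxy(boxes):
    return all(b[2] > b[0] and b[3] > b[1] for b in boxes)

boxes = [[10, 20, 60, 80], [0, 0, 32, 32]]
if looks_like_xyxy(boxes):
    boxes = [xyxy_to_xywh(b) for b in boxes]
print(boxes)  # [[10, 20, 50, 60], [0, 0, 32, 32]]
```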

Training workflow


Copy this checklist and track progress:

```
Training Progress:
- [ ] Step 1: Verify prerequisites (account, token, dataset)
- [ ] Step 2: Validate dataset format (run dataset_inspector.py)
- [ ] Step 3: Ask user about dataset size and validation split
- [ ] Step 4: Prepare training script (OD: scripts/object_detection_training.py, IC: scripts/image_classification_training.py, SAM: scripts/sam_segmentation_training.py)
- [ ] Step 5: Save script locally, submit job, and report details
```

Step 1: Verify prerequisites
Follow the Prerequisites Checklist above.

Step 2: Validate dataset
Run the dataset inspector BEFORE spending GPU time. See the "Dataset Validation" section above.

Step 3: Ask user preferences
ALWAYS use the AskUserQuestion tool with option-style format:

```python
AskUserQuestion({
    "questions": [
        {
            "question": "Do you want to run a quick test with a subset of the data first?",
            "header": "Dataset Size",
            "options": [
                {"label": "Quick test run (10% of data)", "description": "Faster, cheaper (~30-60 min, ~$2-5) to validate setup"},
                {"label": "Full dataset (Recommended)", "description": "Complete training for best model quality"}
            ],
            "multiSelect": false
        },
        {
            "question": "Do you want to create a validation split from the training data?",
            "header": "Split data",
            "options": [
                {"label": "Yes (Recommended)", "description": "Automatically split 15% of training data for validation"},
                {"label": "No", "description": "Use existing validation split from dataset"}
            ],
            "multiSelect": false
        },
        {
            "question": "Which GPU hardware do you want to use?",
            "header": "Hardware Flavor",
            "options": [
                {"label": "t4-small ($0.40/hr)", "description": "1x T4, 16 GB VRAM — sufficient for all OD models under 100M params"},
                {"label": "l4x1 ($0.80/hr)", "description": "1x L4, 24 GB VRAM — more headroom for large images or batch sizes"},
                {"label": "a10g-large ($1.50/hr)", "description": "1x A10G, 24 GB VRAM — faster training, more CPU/RAM"},
                {"label": "a100-large ($2.50/hr)", "description": "1x A100, 80 GB VRAM — fastest, for very large datasets or image sizes"}
            ],
            "multiSelect": false
        }
    ]
})
```

Step 4: Prepare training script
For object detection, use `scripts/object_detection_training.py` as the production-ready template. For image classification, use `scripts/image_classification_training.py`. For SAM/SAM2 segmentation, use `scripts/sam_segmentation_training.py`. All scripts use `HfArgumentParser` — all configuration is passed via CLI arguments in `script_args`, NOT by editing Python variables. For timm model details, see references/timm_trainer.md. For SAM2 training details, see references/finetune_sam2_trainer.md.

Step 5: Save script, submit job, and report
  1. Save the script locally to `submitted_jobs/` in the workspace root (create if needed) with a descriptive name like `training_<dataset>_<YYYYMMDD_HHMMSS>.py`. Tell the user the path.
  2. Submit using the `hf_jobs` MCP tool (preferred) or `HfApi().run_uv_job()` — see directive #1 for both methods. Pass all config via `script_args`.
  3. Report the job ID (from the `.id` attribute), monitoring URL, Trackio dashboard (`https://huggingface.co/spaces/{username}/trackio`), expected time, and estimated cost.
  4. Wait for the user to request status checks — don't poll automatically. Training jobs run asynchronously and can take hours.

Critical directives


These rules prevent common failures. Follow them exactly.

1. Job submission: `hf_jobs` MCP tool vs Python API


`hf_jobs()` is an MCP tool, NOT a Python function. Do NOT try to import it from `huggingface_hub`. Call it as a tool:

```python
hf_jobs("uv", {"script": training_script_content, "flavor": "a10g-large", "timeout": "4h", "secrets": {"HF_TOKEN": "$HF_TOKEN"}})
```

If the `hf_jobs` MCP tool is unavailable, use the Python API directly:

```python
from huggingface_hub import HfApi, get_token
api = HfApi()
job_info = api.run_uv_job(
    script="path/to/training_script.py",  # file PATH, NOT content
    script_args=["--dataset_name", "cppe-5", ...],
    flavor="a10g-large",
    timeout=14400,  # seconds (4 hours)
    env={"PYTHONUNBUFFERED": "1"},
    secrets={"HF_TOKEN": get_token()},  # MUST use get_token(), NOT "$HF_TOKEN"
)
print(f"Job ID: {job_info.id}")
```

Critical differences between the two methods:

|                  | `hf_jobs` MCP tool                          | `HfApi().run_uv_job()`                  |
| ---------------- | ------------------------------------------- | --------------------------------------- |
| `script` param   | Python code string or URL (NOT local paths) | File path to a `.py` file (NOT content) |
| Token in secrets | `"$HF_TOKEN"` (auto-replaced)               | `get_token()` (actual token value)      |
| Timeout format   | String (`"4h"`)                             | Seconds (`14400`)                       |

Rules for both methods:
  • The training script MUST include PEP 723 inline metadata with dependencies
  • Do NOT use `image` or `command` parameters (those belong to `run_job()`, not `run_uv_job()`)

2. Authentication via job secrets + explicit `hub_token` injection


Job config MUST include the token in secrets — syntax depends on the submission method (see the table above).

Training script requirement: the Transformers `Trainer` calls `create_repo(token=self.args.hub_token)` during `__init__()` when `push_to_hub=True`. The training script MUST inject `HF_TOKEN` into `training_args.hub_token` AFTER parsing args but BEFORE creating the `Trainer`. The template `scripts/object_detection_training.py` already includes this:

```python
hf_token = os.environ.get("HF_TOKEN")
if training_args.push_to_hub and not training_args.hub_token:
    if hf_token:
        training_args.hub_token = hf_token
```

If you write a custom script, you MUST include this token injection before the `Trainer(...)` call.
  • Do NOT call `login()` in custom scripts unless replicating the full pattern from `scripts/object_detection_training.py`
  • Do NOT rely on implicit token resolution (`hub_token=None`) — unreliable in Jobs
  • See the `hugging-face-jobs` skill → Token Usage Guide for full details

3. JobInfo attribute


Access the job identifier using `.id` (NOT `.job_id` or `.name` — these don't exist):

```python
job_info = api.run_uv_job(...)  # or hf_jobs("uv", {...})
job_id = job_info.id  # Correct — returns a string like "687fb701029421ae5549d998"
```

4. Required training flags and `HfArgumentParser` boolean syntax


`scripts/object_detection_training.py` uses `HfArgumentParser` — all config is passed via `script_args`. Boolean arguments have two syntaxes:
  • `bool` fields (e.g., `push_to_hub`, `do_train`): use as bare flags (`--push_to_hub`) or negate with the `--no_` prefix (`--no_remove_unused_columns`)
  • `Optional[bool]` fields (e.g., `greater_is_better`): MUST pass an explicit value (`--greater_is_better True`). A bare `--greater_is_better` causes `error: expected one argument`

Required flags for object detection:

```
--no_remove_unused_columns          # MUST: preserves image column for pixel_values
--no_eval_do_concat_batches         # MUST: images have different numbers of target boxes
--push_to_hub                       # MUST: environment is ephemeral
--hub_model_id username/model-name
--metric_for_best_model eval_map
--greater_is_better True            # MUST pass "True" explicitly (Optional[bool])
--do_train
--do_eval
```

Required flags for image classification:

```
--no_remove_unused_columns          # MUST: preserves image column for pixel_values
--push_to_hub                       # MUST: environment is ephemeral
--hub_model_id username/model-name
--metric_for_best_model eval_accuracy
--greater_is_better True            # MUST pass "True" explicitly (Optional[bool])
--do_train
--do_eval
```

Required flags for SAM/SAM2 segmentation:

```
--remove_unused_columns False       # MUST: preserves input_boxes/input_points
--push_to_hub                       # MUST: environment is ephemeral
--hub_model_id username/model-name
--do_train
--prompt_type bbox                  # or "point"
--dataloader_pin_memory False       # MUST: avoids pin_memory issues with custom collator
```
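The CLI behavior of the two boolean syntaxes can be mimicked with plain `argparse` (an analogue for illustration, not `HfArgumentParser`'s actual implementation):

```python
import argparse

# Plain-argparse analogue of how HfArgumentParser exposes the two field kinds.
parser = argparse.ArgumentParser()
# bool field: bare flag, plus a --no_ negation flag
parser.add_argument("--push_to_hub", action="store_true")
parser.add_argument("--no_remove_unused_columns", dest="remove_unused_columns",
                    action="store_false")
# Optional[bool] field: requires an explicit value; a bare --greater_is_better
# would fail with "error: expected one argument"
parser.add_argument("--greater_is_better", type=lambda s: s == "True", default=None)

args = parser.parse_args(
    ["--push_to_hub", "--no_remove_unused_columns", "--greater_is_better", "True"]
)
print(args.push_to_hub, args.remove_unused_columns, args.greater_is_better)
# True False True
```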

5. Timeout management


The default 30 min is TOO SHORT for object detection. Set a minimum of 2-4 hours. Add a 30% buffer for model loading, preprocessing, and Hub push.

| Scenario                                  | Timeout |
| ----------------------------------------- | ------- |
| Quick test (100-200 images, 5-10 epochs)  | 1h      |
| Development (500-1K images, 15-20 epochs) | 2-3h    |
| Production (1K-5K images, 30 epochs)      | 4-6h    |
| Large dataset (5K+ images)                | 6-12h   |
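The 30% buffer rule can be applied mechanically (hypothetical helper; returns the integer seconds that `run_uv_job()` expects):

```python
# Hypothetical helper applying the 30% buffer rule above to an estimated
# training time, returning whole seconds for HfApi().run_uv_job(timeout=...).
def timeout_seconds(estimated_hours, buffer=0.30):
    return int(estimated_hours * 3600 * (1 + buffer))

print(timeout_seconds(3))  # 14040
```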

6. Trackio monitoring


Trackio is always enabled in the object detection training script — it calls `trackio.init()` and `trackio.finish()` automatically. No need to pass `--report_to trackio`. The project name is taken from `--output_dir` and the run name from `--run_name`. For image classification, pass `--report_to trackio` in `TrainingArguments`.

Dashboard at: `https://huggingface.co/spaces/{username}/trackio`

Model & hardware selection


Recommended object detection models


| Model                                | Params | Use case                                        |
| ------------------------------------ | ------ | ----------------------------------------------- |
| `ustc-community/dfine-small-coco`    | 10.4M  | Best starting point — fast, cheap, SOTA quality |
| `PekingU/rtdetr_v2_r18vd`            | 20.2M  | Lightweight real-time detector                  |
| `ustc-community/dfine-large-coco`    | 31.4M  | Higher accuracy, still efficient                |
| `PekingU/rtdetr_v2_r50vd`            | 43M    | Strong real-time baseline                       |
| `ustc-community/dfine-xlarge-obj365` | 63.5M  | Best accuracy (pretrained on Objects365)        |
| `PekingU/rtdetr_v2_r101vd`           | 76M    | Largest RT-DETR v2 variant                      |

Start with `ustc-community/dfine-small-coco` for fast iteration. Move to D-FINE Large or RT-DETR v2 R50 for better accuracy.

Recommended image classification models


All `timm/` models work out of the box via `AutoModelForImageClassification` (loaded as `TimmWrapperForImageClassification`). See references/timm_trainer.md for details.

| Model                                   | Params | Use case                                           |
| --------------------------------------- | ------ | -------------------------------------------------- |
| `timm/mobilenetv3_small_100.lamb_in1k`  | 2.5M   | Ultra-lightweight — mobile/edge, fastest training  |
| `timm/mobilevit_s.cvnets_in1k`          | 5.6M   | Mobile transformer — good accuracy/speed trade-off |
| `timm/resnet50.a1_in1k`                 | 25.6M  | Strong CNN baseline — reliable, well-studied       |
| `timm/vit_base_patch16_dinov3.lvd1689m` | 86.6M  | Best accuracy — DINOv3 self-supervised ViT         |

Start with `timm/mobilenetv3_small_100.lamb_in1k` for fast iteration. Move to `timm/resnet50.a1_in1k` or `timm/vit_base_patch16_dinov3.lvd1689m` for better accuracy.

Recommended SAM/SAM2 segmentation models


| Model                             | Params | Use case                                         |
| --------------------------------- | ------ | ------------------------------------------------ |
| `facebook/sam2.1-hiera-tiny`      | 38.9M  | Fastest SAM2 — good for quick experiments        |
| `facebook/sam2.1-hiera-small`     | 46.0M  | Best starting point — good quality/speed balance |
| `facebook/sam2.1-hiera-base-plus` | 80.8M  | Higher capacity for complex segmentation         |
| `facebook/sam2.1-hiera-large`     | 224.4M | Best SAM2 accuracy — requires more VRAM          |
| `facebook/sam-vit-base`           | 93.7M  | Original SAM — ViT-B backbone                    |
| `facebook/sam-vit-large`          | 312.3M | Original SAM — ViT-L backbone                    |
| `facebook/sam-vit-huge`           | 641.1M | Original SAM — ViT-H, best SAM v1 accuracy       |

Start with `facebook/sam2.1-hiera-small` for fast iteration. SAM2 models are generally more efficient than SAM v1 at similar quality. Only the mask decoder is trained by default (vision and prompt encoders are frozen).

Hardware recommendation


All recommended OD and IC models are under 100M params — `t4-small` (16 GB VRAM, $0.40/hr) is sufficient for all of them. Image classification models are generally smaller and faster than object detection models — `t4-small` handles even ViT-Base comfortably. For SAM2 models up to `hiera-base-plus`, `t4-small` is sufficient since only the mask decoder is trained. For `sam2.1-hiera-large` or SAM v1 models, use `l4x1` or `a10g-large`. Only upgrade if you hit OOM from large batch sizes — reduce the batch size first before switching hardware. Common upgrade path: `t4-small` → `l4x1` ($0.80/hr, 24 GB) → `a10g-large` ($1.50/hr, 24 GB).

For the full hardware flavor list, refer to the `hugging-face-jobs` skill. For cost estimation, run `scripts/estimate_cost.py`.

Quick start — Object Detection


The `script_args` below are the same for both submission methods. See directive #1 for the critical differences between them.

```python
OD_SCRIPT_ARGS = [
    "--model_name_or_path", "ustc-community/dfine-small-coco",
    "--dataset_name", "cppe-5",
    "--image_square_size", "640",
    "--output_dir", "dfine_finetuned",
    "--num_train_epochs", "30",
    "--per_device_train_batch_size", "8",
    "--learning_rate", "5e-5",
    "--eval_strategy", "epoch",
    "--save_strategy", "epoch",
    "--save_total_limit", "2",
    "--load_best_model_at_end",
    "--metric_for_best_model", "eval_map",
    "--greater_is_better", "True",
    "--no_remove_unused_columns",
    "--no_eval_do_concat_batches",
    "--push_to_hub",
    "--hub_model_id", "username/model-name",
    "--do_train",
    "--do_eval",
]
```

```python
from huggingface_hub import HfApi, get_token
api = HfApi()
job_info = api.run_uv_job(
    script="scripts/object_detection_training.py",
    script_args=OD_SCRIPT_ARGS,
    flavor="t4-small",
    timeout=14400,
    env={"PYTHONUNBUFFERED": "1"},
    secrets={"HF_TOKEN": get_token()},
)
print(f"Job ID: {job_info.id}")
```

Key OD `script_args`


  • `--model_name_or_path` — recommended: `"ustc-community/dfine-small-coco"` (see model table above)
  • `--dataset_name` — the Hub dataset ID
  • `--image_square_size` — 480 (fast iteration) or 800 (better accuracy)
  • `--hub_model_id` — `"username/model-name"` for Hub persistence
  • `--num_train_epochs` — 30 typical for convergence
  • `--train_val_split` — fraction to split for validation (default 0.15); set if the dataset lacks a validation split
  • `--max_train_samples` — truncate the training set (useful for quick test runs, e.g. `"785"` for ~10% of a 7.8K dataset)
  • `--max_eval_samples` — truncate the evaluation set

Quick start — Image Classification


```python
IC_SCRIPT_ARGS = [
    "--model_name_or_path", "timm/mobilenetv3_small_100.lamb_in1k",
    "--dataset_name", "ethz/food101",
    "--output_dir", "food101_classifier",
    "--num_train_epochs", "5",
    "--per_device_train_batch_size", "32",
    "--per_device_eval_batch_size", "32",
    "--learning_rate", "5e-5",
    "--eval_strategy", "epoch",
    "--save_strategy", "epoch",
    "--save_total_limit", "2",
    "--load_best_model_at_end",
    "--metric_for_best_model", "eval_accuracy",
    "--greater_is_better", "True",
    "--no_remove_unused_columns",
    "--push_to_hub",
    "--hub_model_id", "username/food101-classifier",
    "--do_train",
    "--do_eval",
]
```

```python
from huggingface_hub import HfApi, get_token
api = HfApi()
job_info = api.run_uv_job(
    script="scripts/image_classification_training.py",
    script_args=IC_SCRIPT_ARGS,
    flavor="t4-small",
    timeout=7200,
    env={"PYTHONUNBUFFERED": "1"},
    secrets={"HF_TOKEN": get_token()},
)
print(f"Job ID: {job_info.id}")
```

Key IC `script_args`


  • `--model_name_or_path` — any `timm/` model or Transformers classification model (see model table above)
  • `--dataset_name` — the Hub dataset ID
  • `--image_column_name` — column containing PIL images (default: `"image"`)
  • `--label_column_name` — column containing class labels (default: `"label"`)
  • `--hub_model_id` — `"username/model-name"` for Hub persistence
  • `--num_train_epochs` — 3-5 typical for classification (fewer than OD)
  • `--per_device_train_batch_size` — 16-64 (classification models use less memory than OD)
  • `--train_val_split` — fraction to split for validation (default 0.15); set if dataset lacks a validation split
  • `--max_train_samples` / `--max_eval_samples` — truncate for quick tests
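The script derives the label vocabulary from `--label_column_name` and wires it into the model config. As a sketch of what that mapping looks like (the three class names below are just the alphabetically first food101 labels, shown for illustration):

```python
# Build the id2label / label2id mappings a classifier config carries.
# In the real script these come from the dataset's ClassLabel feature;
# the names below are illustrative food101 classes.
labels = ["apple_pie", "baby_back_ribs", "baklava"]
id2label = {i: name for i, name in enumerate(labels)}
label2id = {name: i for i, name in enumerate(labels)}
```

With these in the config, the pushed model reports human-readable class names at inference time instead of bare indices.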

Quick start — SAM/SAM2 Segmentation

```python
SAM_SCRIPT_ARGS = [
    "--model_name_or_path", "facebook/sam2.1-hiera-small",
    "--dataset_name", "merve/MicroMat-mini",
    "--prompt_type", "bbox",
    "--prompt_column_name", "prompt",
    "--output_dir", "sam2-finetuned",
    "--num_train_epochs", "30",
    "--per_device_train_batch_size", "4",
    "--learning_rate", "1e-5",
    "--logging_steps", "1",
    "--save_strategy", "epoch",
    "--save_total_limit", "2",
    "--remove_unused_columns", "False",
    "--dataloader_pin_memory", "False",
    "--push_to_hub",
    "--hub_model_id", "username/sam2-finetuned",
    "--do_train",
    "--report_to", "trackio",
]
```

```python
from huggingface_hub import HfApi, get_token
api = HfApi()
job_info = api.run_uv_job(
    script="scripts/sam_segmentation_training.py",
    script_args=SAM_SCRIPT_ARGS,
    flavor="t4-small",
    timeout=7200,
    env={"PYTHONUNBUFFERED": "1"},
    secrets={"HF_TOKEN": get_token()},
)
print(f"Job ID: {job_info.id}")
```

Key SAM `script_args`


  • `--model_name_or_path` — SAM or SAM2 model (see model table above); auto-detects SAM vs SAM2
  • `--dataset_name` — the Hub dataset ID (e.g., `"merve/MicroMat-mini"`)
  • `--prompt_type` — `"bbox"` or `"point"`; type of prompt in the dataset
  • `--prompt_column_name` — column with JSON-encoded prompts (default: `"prompt"`)
  • `--bbox_column_name` — dedicated bbox column (alternative to JSON prompt column)
  • `--point_column_name` — dedicated point column (alternative to JSON prompt column)
  • `--mask_column_name` — column with ground-truth masks (default: `"mask"`)
  • `--hub_model_id` — `"username/model-name"` for Hub persistence
  • `--num_train_epochs` — 20-30 typical for SAM fine-tuning
  • `--per_device_train_batch_size` — 2-4 (SAM models use significant memory)
  • `--freeze_vision_encoder` / `--freeze_prompt_encoder` — freeze encoder weights (default: both frozen, only mask decoder trains)
  • `--train_val_split` — fraction to split for validation (default 0.1)
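For the JSON prompt column, each dataset row stores an encoded prompt that the script decodes at load time. A sketch of plausible rows for each `--prompt_type` follows; the exact schema the script expects may differ, so verify your dataset with `scripts/dataset_inspector.py` before launching a job.

```python
import json

# Hypothetical rows for a JSON-encoded "prompt" column.
# Coordinate conventions (xyxy pixels for bbox, (x, y) for points) are
# assumptions to illustrate the encoding, not a guaranteed schema.
bbox_row = {"prompt": json.dumps([34, 50, 210, 300])}   # one box per mask
point_row = {"prompt": json.dumps([[120, 175]])}        # one or more points

bbox = json.loads(bbox_row["prompt"])
points = json.loads(point_row["prompt"])
```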

Checking job status

MCP tool (if available):

```
hf_jobs("ps")                                   # List all jobs
hf_jobs("logs", {"job_id": "your-job-id"})      # View logs
hf_jobs("inspect", {"job_id": "your-job-id"})   # Job details
```

Python API fallback:

```python
from huggingface_hub import HfApi
api = HfApi()
api.list_jobs()                                  # List all jobs
api.get_job_logs(job_id="your-job-id")           # View logs
api.get_job(job_id="your-job-id")                # Job details
```

Common failure modes

OOM (CUDA out of memory)

Reduce `per_device_train_batch_size` (try 4, then 2), reduce `IMAGE_SIZE`, or upgrade hardware.
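That fallback order can be expressed as a simple retry ladder: halve the batch size down to 2, then drop the image size. The values below mirror the suggestions above; the helper itself is illustrative.

```python
# Illustrative OOM retry ladder: try progressively smaller batch sizes,
# then shrink the image size as a last resort before upgrading hardware.
def oom_fallbacks(batch_size=8, image_size=800):
    configs = []
    bs = batch_size
    while bs >= 2:
        configs.append({"per_device_train_batch_size": bs,
                        "image_square_size": image_size})
        bs //= 2
    # Final step: minimum batch size plus the smaller image size.
    configs.append({"per_device_train_batch_size": 2, "image_square_size": 480})
    return configs

ladder = oom_fallbacks()
```

On each OOM failure, relaunch the job with the next config in the ladder rather than guessing.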

Dataset format errors

Run `scripts/dataset_inspector.py` first. The training script auto-detects xyxy vs xywh, converts string categories to integer IDs, and adds `image_id` if missing. Ensure `objects.bbox` contains 4-value coordinate lists in absolute pixels and `objects.category` contains either integer IDs or string labels.
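A quick pre-flight check along those lines can be written in a few lines. This standalone checker is a sketch of the format rules above, not the inspector script itself.

```python
# Sketch of the OD format check: every bbox must be a 4-value list of
# absolute-pixel numbers; every category must be an int ID or string label.
def validate_od_example(example):
    objects = example["objects"]
    for bbox in objects["bbox"]:
        if len(bbox) != 4 or not all(isinstance(v, (int, float)) for v in bbox):
            return False
    return all(isinstance(c, (int, str)) for c in objects["category"])

# Hand-built rows for illustration; map this over a real split's rows.
good = {"objects": {"bbox": [[10, 20, 110, 220]], "category": ["cat"]}}
bad = {"objects": {"bbox": [[10, 20, 110]], "category": [0]}}
```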

Hub push failures (401)

Verify: (1) job secrets include the token (see directive #2), (2) the script sets `training_args.hub_token` BEFORE creating the `Trainer`, (3) `push_to_hub=True` is set, (4) `hub_model_id` is correct, (5) the token has write permissions.
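A fail-fast check at the top of the training script catches case (1) immediately instead of at push time, after a full training run has already burned GPU hours. A minimal sketch (the error message and helper name are illustrative):

```python
import os

# Fail fast if the job was launched without the token secret.
def require_hub_token(env=None):
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN not set; pass it via secrets={'HF_TOKEN': get_token()}"
        )
    return token
```

Called with no arguments inside a job, it reads the secret injected by `secrets={"HF_TOKEN": ...}` at launch.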

Job timeout

Increase the timeout (see directive #5 table), reduce epochs/dataset size, or use a checkpoint strategy with `hub_strategy="every_save"`.
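One way to size the timeout is from a measured epoch time plus a safety margin. The 1.5x margin and 10-minute setup buffer below are assumptions for illustration, not values from the Jobs docs.

```python
# Rough timeout sizing: epochs x measured seconds/epoch x safety margin,
# plus a fixed buffer for image build, model and dataset downloads.
def estimate_timeout(num_epochs, secs_per_epoch, safety=1.5, setup_secs=600):
    return int(num_epochs * secs_per_epoch * safety) + setup_secs

# e.g. 30 epochs at ~6 min/epoch measured on a smoke-test run
timeout = estimate_timeout(30, 360)
```

Pass the result as `timeout=` to `run_uv_job`; a run killed by timeout without `hub_strategy="every_save"` loses all progress.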

KeyError: 'test' (missing test split)

The object detection training script handles this gracefully — it falls back to the `validation` split. Ensure you're using the latest `scripts/object_detection_training.py`.
Single-class dataset: "iteration over a 0-d tensor"

`torchmetrics.MeanAveragePrecision` returns scalar (0-d) tensors for per-class metrics when there's only one class. The template `scripts/object_detection_training.py` handles this by calling `.unsqueeze(0)` on these tensors. Ensure you're using the latest template.

Poor detection performance (mAP < 0.15)

Increase epochs (30-50), ensure 500+ images, check per-class mAP for imbalanced classes, try different learning rates (1e-5 to 1e-4), or increase image size.

For comprehensive troubleshooting, see `references/reliability_principles.md`.

Reference files

  • scripts/object_detection_training.py — Production-ready object detection training script
  • scripts/image_classification_training.py — Production-ready image classification training script (supports timm models)
  • scripts/sam_segmentation_training.py — Production-ready SAM/SAM2 segmentation training script (bbox & point prompts)
  • scripts/dataset_inspector.py — Validate dataset format for OD, classification, and SAM segmentation
  • scripts/estimate_cost.py — Estimate training costs for any vision model (includes SAM/SAM2)
  • references/object_detection_training_notebook.md — Object detection training workflow, augmentation strategies, and training patterns
  • references/image_classification_training_notebook.md — Image classification training workflow with ViT, preprocessing, and evaluation
  • references/finetune_sam2_trainer.md — SAM2 fine-tuning walkthrough with MicroMat dataset, DiceCE loss, and Trainer integration
  • references/timm_trainer.md — Using timm models with HF Trainer (TimmWrapper, transforms, full example)
  • references/hub_saving.md — Detailed Hub persistence guide and verification checklist
  • references/reliability_principles.md — Failure prevention principles from production experience

External links
