tao-port-huggingface-model

<!-- Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

TAO-HF Integration Skill

Integrate a HuggingFace (HF) Computer Vision model into the NVIDIA TAO Toolkit ecosystem. Work through the phases iteratively, not purely linearly, following a build → test → debug → fix → retest loop at every step.
This SKILL.md is the workflow coordinator. Each phase has a dedicated reference file under `references/` with the full step-by-step content, code blocks, Docker invocations, and gates. Read the matching reference at the start of each phase; the summaries below are not sufficient on their own.

Local-Only Rule

All work is strictly local. You may only read from or clone remotes; all file edits, Docker builds, and test runs stay on the local machine. Do NOT `git commit` / `git push`, create remote branches (GitLab, GitHub, HuggingFace), create merge requests / pull requests / issues, or upload/publish/push Docker images to any registry or artifact store. This follows from the bind-mounted local-clone layout in `references/execution-and-debugging.md`.

Submodule Override & Execution Platform

`local-docker` is the default platform. The user clones the four TAO repos (`tao-core`, `tao-pytorch`, `tao-deploy`, `tao-dataservices`) independently into one working directory. Each repo also carries nested `tao-core/` (and `tao-pytorch/`) submodules pinned at the original, unmodified commit; these are stale, and modifications live only in the top-level `tao-core/`. Always install from the top-level `tao-core/`, never from `<repo>/tao-core/` (the nested submodule silently drops all modifications). Overriding the CI's `pip install tao-core/` comes down to three rules: mount the whole working directory (`-v $(pwd):/workspace`); run `pip install /workspace/tao-core` FIRST so the modified schemas win; and put the top-level tao-core first on `PYTHONPATH` (`-e PYTHONPATH=/workspace/tao-core:/workspace/tao-pytorch`).
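
A quick way to confirm the top-level rule is in effect is to ask where `nvidia_tao_core` was imported from. The sketch below assumes the four repos sit directly under `/workspace` and that the package directory lives at the root of each `tao-core` checkout; `classify_tao_core_path` is a hypothetical helper, not part of TAO.

```python
from pathlib import PurePosixPath

# Assumed layout (from the text above): /workspace/<repo>/ for each TAO repo.
WORKSPACE = PurePosixPath("/workspace")
TAO_REPOS = ("tao-core", "tao-pytorch", "tao-deploy", "tao-dataservices")


def classify_tao_core_path(module_file: str, workspace: PurePosixPath = WORKSPACE) -> str:
    """Say where an imported nvidia_tao_core module was loaded from.

    "top-level": /workspace/tao-core/...           (what the override rules want)
    "nested":    /workspace/<repo>/tao-core/...    (stale submodule, drops edits)
    "other":     anything else, e.g. a site-packages copy
    """
    path = PurePosixPath(module_file)
    try:
        parts = path.relative_to(workspace).parts
    except ValueError:
        return "other"
    if parts and parts[0] == "tao-core":
        return "top-level"
    if len(parts) > 1 and parts[0] in TAO_REPOS and parts[1] == "tao-core":
        return "nested"
    return "other"
```

Inside the container, call it with `nvidia_tao_core.__file__`; with the `PYTHONPATH` rule in effect the expected answer is `"top-level"`, and `"nested"` means a stale submodule copy is shadowing your edits.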
Every test, smoke run, and end-to-end validation runs inside a locally prepared TAO Toolkit container (`tao-pytorch-base:latest`, `tao-deploy-base:latest`, and optionally `tao-dataservices-base:latest`, all from Phase 0), with local clones bind-mounted at `/workspace` and installed via `pip install /workspace/tao-core` + `setup.py develop`. All Python work runs in containers: no host venvs, no host `pip install`s. The platform skills own the *how* of running containers: host GPU runtime via `tao-setup-nvidia-gpu-host`; `docker run` flags, NGC auth, mounts, env passthrough, `--ipc=host`, `--shm-size`, inspection, and error modes via `tao-run-on-docker` and `tao-run-on-local-docker`. This workflow specifies only what to run inside the containers and never forks those conventions. The annotated working-directory tree, the canonical `docker run` flag set with the workflow-specific `-w` / `PYTHONPATH` / install-shell additions, the three isolation contexts, the four isolation rules, the Development Loop, and the Debugging Playbook table are in `references/execution-and-debugging.md`.
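
The three override rules map directly onto `docker run` arguments. This is a minimal sketch, assuming `--gpus all` for GPU access and the `16G` shared-memory size used for tao-pytorch tests in Phase 6; the authoritative flag set is the one in `references/execution-and-debugging.md` and the platform skills, and `build_docker_run` is a hypothetical helper.

```python
import shlex
from pathlib import Path
from typing import List, Optional

DEFAULT_PYTHONPATH = "/workspace/tao-core:/workspace/tao-pytorch"


def build_docker_run(image: str, workdir: str, inner_cmd: str,
                     host_dir: Optional[Path] = None,
                     pythonpath: str = DEFAULT_PYTHONPATH,
                     shm_size: str = "16G") -> List[str]:
    """Assemble a `docker run` argv that applies the three override rules."""
    host_dir = (host_dir or Path.cwd()).resolve()
    # Rule 2: install the top-level tao-core before the repo's own develop install.
    script = f"pip install /workspace/tao-core && python setup.py develop && {inner_cmd}"
    return [
        "docker", "run", "--rm", "--gpus", "all",   # GPU flag is an assumption
        "--ipc=host", f"--shm-size={shm_size}",
        "-v", f"{host_dir}:/workspace",             # rule 1: mount the whole working directory
        "-w", workdir,                              # e.g. /workspace/tao-pytorch
        "-e", f"PYTHONPATH={pythonpath}",           # rule 3: top-level tao-core first
        image,
        "bash", "-c", script,
    ]


if __name__ == "__main__":
    argv = build_docker_run("tao-pytorch-base:latest", "/workspace/tao-pytorch",
                            "pytest -m cv_unit tests/cv_unit_test")
    print(shlex.join(argv))  # paste into a shell, or pass argv to subprocess.run
```

Returning an argv list rather than one shell string keeps the inner `bash -c` script as a single argument, so no extra quoting is needed when it goes to `subprocess.run`.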

Phase Map

The phases, Phase 0 through Phase 7 (full goals + gates below; reference files per phase):
  • Phase 0 — Prerequisites + TAO Toolkit images + local image tags: phase-0-prereqs.md
  • Phase 1 — HF-inspection environment, validate HF model + dataset: phase-1-inspection.md, hf-inspection.md
  • Phase 2 — Closest existing TAO reference model: phase-2-codebase.md, task-type-guide.md
  • Phase 3 — tao-core config + tao-pytorch trainer / native eval / inference: phase-3-implementation.md, tao-patterns.md, repo-structure.md
  • Phase 4 — ONNX export + tao-deploy TRT engine, inference, evaluation: phase-4-deploy.md
  • Phase 5 — Packaging (`setup.py` console_scripts) + L0 tests: phase-5-packaging.md
  • Phase 6 — Container-based testing + end-to-end pipeline validation: phase-6-container-tests.md, docker-patterns.md
  • Phase 7 — (conditional) Accuracy / latency / size tuning: phase-7-optimization.md
IMPORTANT — Continuous Execution Through Phase 6: Do NOT stop after implementation (Phases 3–5) to wait for the user to run tests; immediately proceed to the mandatory Phase 6. The implementation is not complete until tests pass inside the TAO Toolkit containers and the end-to-end pipeline is validated. Apply the build-test-debug loop at every step — write, test immediately, fix on failure, never accumulate untested code.

Phase 0 — Prerequisites Check

Goal: verify Python 3.10+ and `git`; delegate the NVIDIA driver / CUDA / Docker / NVIDIA Container Toolkit host check to `tao-setup-nvidia-gpu-host`; verify NGC `docker login` for `nvcr.io`. Then ask the user for the TAO Toolkit image references (tao-pytorch, tao-deploy, optionally tao-dataservices), pull them, and prepare the local image tags `tao-pytorch-base:latest`, `tao-deploy-base:latest`, and `tao-dataservices-base:latest` for Phases 3–6. Preparation strips the released TAO packages already installed in those images so that the user's local clones (mounted at `/workspace/...`) are installed and picked up at run time. Hard stop if any check fails. Full commands, user-prompt wording, and per-image preparation `Dockerfile` snippets: phase-0-prereqs.md.
Gate: all prerequisite checks pass; the user has supplied the required image references; `tao-pytorch-base:latest` and `tao-deploy-base:latest` exist locally; `tao-dataservices-base:latest` exists if dataservices work is expected.
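
The local part of the Phase 0 checks can be scripted. This sketch covers only the Python version, `git`, and the local image tags; the GPU host check (delegated to `tao-setup-nvidia-gpu-host`) and the NGC `docker login` are left out, and the function names are hypothetical.

```python
import shutil
import subprocess
import sys
from typing import Callable, List

REQUIRED_IMAGES = ["tao-pytorch-base:latest", "tao-deploy-base:latest"]
DATASERVICES_IMAGE = "tao-dataservices-base:latest"


def python_ok(version_info=sys.version_info) -> bool:
    return tuple(version_info[:2]) >= (3, 10)


def image_exists(tag: str) -> bool:
    if shutil.which("docker") is None:
        return False
    # `docker image inspect` exits non-zero when the tag is not present locally.
    result = subprocess.run(["docker", "image", "inspect", tag],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def phase0_failures(need_dataservices: bool = False,
                    exists: Callable[[str], bool] = image_exists) -> List[str]:
    """Return every failed check; any entry means a hard stop."""
    failures = []
    if not python_ok():
        failures.append("Python 3.10+ is required")
    if shutil.which("git") is None:
        failures.append("git is not on PATH")
    images = REQUIRED_IMAGES + ([DATASERVICES_IMAGE] if need_dataservices else [])
    failures += [f"missing local image {tag}" for tag in images if not exists(tag)]
    return failures
```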

Phase 1 — Information Gathering & Validation

Goal: decide whether to proceed. Gather credentials, locate (or clone) the four TAO repos and create a consistent local working branch across them, launch the long-lived `tao-hf-inspect` container (isolation Context A), validate that the HF model is a CV model with a supported `pipeline_tag`, extract the config and state-dict schema, sanity-check ONNX export, and clean up. Full step-by-step (1.1–1.7): phase-1-inspection.md; generic patterns: hf-inspection.md.
Reject if `pipeline_tag` is NLP / audio / LLM (out of CV scope), `AutoConfig` raises, or ONNX export fundamentally cannot work and has no rewrite path.
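
The first two rejection criteria can be checked mechanically. The sketch assumes the supported tags are the HF pipeline tags matching the Phase 2 reference models (`image-classification`, `object-detection`, `image-segmentation`, `zero-shot-object-detection`, `depth-estimation`); the ONNX criterion still needs the manual sanity check. `triage` and `fetch_and_triage` are hypothetical helpers; `huggingface_hub.model_info` and `AutoConfig.from_pretrained` are imported lazily so the pure logic runs without them.

```python
from typing import Callable, Optional

# Assumption: HF pipeline tags that line up with the Phase 2 reference models.
SUPPORTED_CV_TAGS = {
    "image-classification", "object-detection", "image-segmentation",
    "zero-shot-object-detection", "depth-estimation",
}


def triage(model_id: str, pipeline_tag: Optional[str],
           load_config: Callable[[str], object]) -> Optional[str]:
    """Return a rejection reason, or None if Phase 1 may continue."""
    if pipeline_tag not in SUPPORTED_CV_TAGS:
        return f"{model_id}: pipeline_tag {pipeline_tag!r} is outside the supported CV scope"
    try:
        load_config(model_id)
    except Exception as exc:  # AutoConfig raising is a reject criterion on its own
        return f"{model_id}: AutoConfig failed ({exc})"
    return None


def fetch_and_triage(model_id: str) -> Optional[str]:
    """Run inside the tao-hf-inspect container, where the HF libraries are installed."""
    from huggingface_hub import model_info
    from transformers import AutoConfig

    return triage(model_id, model_info(model_id).pipeline_tag, AutoConfig.from_pretrained)
```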
Gate: all 4 TAO repos located/cloned with a consistent working branch; `pipeline_tag` confirmed as CV; `model_type`, `image_size`, `hidden_size`, `num_labels` extracted; state-dict keys documented and the HF→TAO remapping plan drafted; ONNX sanity check passed (or failure mode understood); user confirmed `model_short_name` and task type. Present findings and confirm before proceeding.
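
Two of the gate items lend themselves to small helpers: pulling the gate fields out of a serialized config, and drafting the HF→TAO key remapping as ordered prefix rules. HF configs usually serialize `id2label` rather than `num_labels` (the latter is a property derived from it), so the sketch counts labels from `id2label`. Both function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def gate_fields(config: Dict[str, Any]) -> Dict[str, Any]:
    """Phase 1 gate fields from a serialized config (e.g. AutoConfig(...).to_dict())."""
    id2label: Optional[Dict[Any, str]] = config.get("id2label")
    return {
        "model_type": config.get("model_type"),
        "image_size": config.get("image_size"),   # None if the family names it differently
        "hidden_size": config.get("hidden_size"),
        "num_labels": len(id2label) if id2label else None,
    }


def remap_keys(keys: List[str],
               rules: List[Tuple[str, str]]) -> Tuple[Dict[str, str], List[str]]:
    """Apply the first matching (hf_prefix, tao_prefix) rule to each state-dict key."""
    mapping: Dict[str, str] = {}
    unmapped: List[str] = []
    for key in keys:
        for hf_prefix, tao_prefix in rules:
            if key.startswith(hf_prefix):
                mapping[key] = tao_prefix + key[len(hf_prefix):]
                break
        else:  # no rule matched: needs a new rule, or the weight is intentionally dropped
            unmapped.append(key)
    return mapping, unmapped
```

For example, a rule such as `("vit.encoder.layer.", "backbone.blocks.")` (hypothetical TAO-side names; the real targets come from the Phase 2 reference model) maps every encoder key at once, and the `unmapped` list shows what still needs a rule.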

Phase 2 — Codebase Exploration

Goal: find the closest existing TAO reference model for the detected `pipeline_tag` (classification → `classification_pyt`, detection → `dino` / `rtdetr`, segmentation → `segformer`, instance → `mask2former`, panoptic → `oneformer`, zero-shot → `grounding_dino`, depth → `mono_depth`), read its full implementation across `tao-core`, `tao-pytorch`, and `tao-deploy`, and decide whether the backbone already exists in `backbone_v2/`. The chosen reference drives everything downstream: config structure, architecture, loss, ONNX export shape, TRT builder, deploy inferencer/loader, metrics, and dataset format. The full reference list (12 files per model), the `backbone_v2/` coverage check (it already provides `vit`, `swin`, `resnet`, `dino_v2`, and others), and the `tao-dataservices` coverage check: phase-2-codebase.md; per-task details: task-type-guide.md.
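
The task-to-reference mapping above can be written as a lookup keyed by HF pipeline tag. Keying it on HF tags is an assumption, as is the extra `seg_kind` argument: HF uses the single tag `image-segmentation` for semantic, instance, and panoptic models, so the subtype has to come from the model card or config. `reference_candidates` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Task -> TAO reference model(s), as listed above; HF-tag keys are an assumption.
REFERENCE_MODELS = {
    "image-classification": ("classification_pyt",),
    "object-detection": ("dino", "rtdetr"),
    ("image-segmentation", "semantic"): ("segformer",),
    ("image-segmentation", "instance"): ("mask2former",),
    ("image-segmentation", "panoptic"): ("oneformer",),
    "zero-shot-object-detection": ("grounding_dino",),
    "depth-estimation": ("mono_depth",),
}


def reference_candidates(pipeline_tag: str, seg_kind: Optional[str] = None) -> Tuple[str, ...]:
    """Return the TAO reference model(s) to read before Phase 3."""
    if pipeline_tag == "image-segmentation":
        if seg_kind is None:
            raise ValueError("image-segmentation needs seg_kind: semantic, instance or panoptic")
        key = (pipeline_tag, seg_kind)
    else:
        key = pipeline_tag
    if key not in REFERENCE_MODELS:
        raise KeyError(f"no TAO reference model for {key!r}")
    return REFERENCE_MODELS[key]
```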
If a new backbone is needed, decide the strategy (timm wrap > re-implement from scratch > HF black-box wrap) before Phase 3; it changes weight loading, ONNX export, and the deploy pipeline. Never dual-inherit from `transformers.PreTrainedModel` and `BackboneBase` (metaclass conflict).
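
The text does not say which metaclasses collide, so the sketch below uses stand-in classes instead of importing `transformers` or TAO. It shows the generic Python failure (two bases with unrelated metaclasses cannot be combined) and the composition alternative: inherit only the TAO-side base and hold the HF model as an attribute. All class names are hypothetical.

```python
class MetaA(type):
    """Stand-in metaclass for one base class."""


class MetaB(type):
    """Stand-in for an unrelated metaclass on the other base class."""


class HFLikeModel(metaclass=MetaA):
    def forward(self, x):
        return x


class BackboneLike(metaclass=MetaB):
    pass


def dual_inheritance_error():
    """Return the TypeError message Python raises for the dual-inheritance class."""
    try:
        class Dual(HFLikeModel, BackboneLike):  # noqa: F841
            pass
    except TypeError as exc:
        return str(exc)
    return None


class WrappedBackbone(BackboneLike):
    """Composition: inherit only the TAO-side base and hold the HF model."""

    def __init__(self):
        self.hf_model = HFLikeModel()

    def forward(self, x):
        return self.hf_model.forward(x)
```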
Gate: reference TAO model identified and all 12 locations read; task-type implications understood (architecture, loss, ONNX outputs, deploy classes, metrics, dataset); backbone coverage decided (reuse / wrap timm / new); dataservices coverage checked.

Phase 3 — TAO Core Configuration & Native Implementation

Goal: write the tao-core config schema and the tao-pytorch trainer + native inference + native evaluation, smoke-testing in between. Use `<model_name>` (`snake_case`, from Phase 1) and `<ModelName>` (`PascalCase`). Seven steps: (1) `tao-core` config under `config/<model_name>/`, where `ExperimentConfig(CommonExperimentConfig)` MUST contain `model`, `dataset`, `train`, `evaluate`, `inference`, `export`, `gen_trt_engine`, and `quantize`; (2) `tao-pytorch` trainer under `cv/<model_name>/` (`build_model()`, `<ModelName>PlModel(TAOLightningModule)`, `train.py`, entrypoint, `experiment_spec.yaml`; for a new backbone, add and register `cv/backbone_v2/<backbone_name>.py`); (3) multi-GPU/multi-node via the entrypoint's `launch()`; (4) native inference → `result.csv`; (5) native evaluation → `results.json`; (6–7) MLOps wiring (`@monitor_status` → `status.json`). Consistency rules (including `export.onnx_file` vs `gen_trt_engine.onnx_file`, and `???` = required `MISSING`) are enforced by the Cross-Phase checklist below.
Full per-step code and the canonical `experiment_spec.yaml`: phase-3-implementation.md (snippets in tao-patterns.md, layout in repo-structure.md, per-task details in task-type-guide.md).
Gates: after Step 1, `ExperimentConfig` imports cleanly in the container; after Step 2, `build_model(cfg)` runs and the PLModel instantiates; overall, all 7 steps are complete, smoke tests pass, and no `__init__.py` is missing.
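
A stdlib-only sketch of the Step 1 schema shape. The real tao-core schema uses its own dataclasses and OmegaConf; here `CommonExperimentConfig`, `ModelConfig`, `SectionConfig`, and their fields (other than the eight section names) are hypothetical placeholders. `MISSING` uses the `"???"` literal that OmegaConf uses for required values, which is why `???` in a YAML spec means "must be set".

```python
from dataclasses import dataclass, field, fields
from typing import Set

MISSING = "???"  # the same literal omegaconf.MISSING uses to mark a required value

REQUIRED_SECTIONS = {"model", "dataset", "train", "evaluate", "inference",
                     "export", "gen_trt_engine", "quantize"}


@dataclass
class CommonExperimentConfig:
    """Hypothetical stand-in for the tao-core base class of the same name."""
    results_dir: str = MISSING


@dataclass
class ModelConfig:
    backbone: str = MISSING   # `???` in YAML: the spec must provide it
    num_classes: int = 1000   # hypothetical field


@dataclass
class SectionConfig:
    """Placeholder for the real per-section dataclasses (train, export, ...)."""


@dataclass
class ExperimentConfig(CommonExperimentConfig):
    model: ModelConfig = field(default_factory=ModelConfig)
    dataset: SectionConfig = field(default_factory=SectionConfig)
    train: SectionConfig = field(default_factory=SectionConfig)
    evaluate: SectionConfig = field(default_factory=SectionConfig)
    inference: SectionConfig = field(default_factory=SectionConfig)
    export: SectionConfig = field(default_factory=SectionConfig)
    gen_trt_engine: SectionConfig = field(default_factory=SectionConfig)
    quantize: SectionConfig = field(default_factory=SectionConfig)


def missing_sections(cfg_cls) -> Set[str]:
    """Required top-level sections that a config dataclass does not declare."""
    return REQUIRED_SECTIONS - {f.name for f in fields(cfg_cls)}
```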

Phase 4 — Export, Deployment & TensorRT Integration

Goal: ship ONNX export from tao-pytorch, then a TRT engine builder + TRT inference + TRT evaluation in tao-deploy that reuse the tao-core `ExperimentConfig`. Four steps (8–11): ONNX export (`scripts/export.py`, per-task input/output names, `batch_size=-1` ⇒ dynamic batch); TRT engine builder (`gen_trt_engine.py`, subclasses `EngineBuilder` or reuses `ClassificationEngineBuilder`, writes `specs/{gen_trt_engine,inference,evaluate}.yaml`); TRT inference (NumPy-only `ClassificationLoader` → `result.csv`); TRT evaluation (sklearn/pycocotools → `results.json`). Full code and the Phase 3+4 gate: phase-4-deploy.md.
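
Step 8's `batch_size=-1` convention translates into `dynamic_axes` for `torch.onnx.export`. A minimal sketch, assuming a 3-channel image input, opset 17, and the axis name `batch`; the per-task input/output names come from the reference model and are passed in. `onnx_export_kwargs` and `export_onnx` are hypothetical helpers, not the code in `scripts/export.py`.

```python
from typing import Any, Dict, List


def onnx_export_kwargs(batch_size: int, input_names: List[str], output_names: List[str],
                       opset_version: int = 17) -> Dict[str, Any]:
    """Keyword arguments for torch.onnx.export; batch_size=-1 marks axis 0 as dynamic."""
    kwargs: Dict[str, Any] = {
        "input_names": input_names,
        "output_names": output_names,
        "opset_version": opset_version,   # assumed opset
    }
    if batch_size == -1:
        kwargs["dynamic_axes"] = {name: {0: "batch"} for name in input_names + output_names}
    elif batch_size < 1:
        raise ValueError("batch_size must be -1 (dynamic) or a positive integer")
    return kwargs


def export_onnx(model, onnx_file: str, batch_size: int, height: int, width: int,
                input_names: List[str], output_names: List[str]) -> None:
    import torch  # imported lazily so the kwargs helper runs without torch installed

    # Trace with batch 1 when the batch is dynamic; assumes a 3-channel image input.
    dummy = torch.randn(max(batch_size, 1), 3, height, width)
    model.eval()
    torch.onnx.export(model, dummy, onnx_file,
                      **onnx_export_kwargs(batch_size, input_names, output_names))
```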
Module pitfall: tao-pytorch and tao-deploy have separate `hydra_runner` and `monitor_status` implementations; use the deploy versions in deploy scripts. `ExperimentConfig` is imported from `nvidia_tao_core` in both repos (same schema, same field paths).
Phase 3+4 gate: all three in-container checks pass (`tao-pytorch` imports + model + ONNX export, and `tao-deploy` imports).

Phase 5 — Packaging & L0 Testing

Goal: register the model as a `'<model_name>=...entrypoint.<model_name>:main'` console_script in both `tao-pytorch/setup.py` and `tao-deploy/setup.py` (the deploy entrypoint uses `nvidia_tao_deploy.cv.common.entrypoint.entrypoint_hydra`), and add L0 tests: deploy tests (`tao-deploy/tests/<model_name>/`, subprocess + `trtexec` with `--buildOnly`) and trainer tests (`tao-pytorch/tests/cv_unit_test/<model_name>/`, `Trainer(..., fast_dev_run=True)`, markers `@pytest.mark.cv_unit @pytest.mark.<model_name>`). Full code and test layout: phase-5-packaging.md.
Gate: entrypoints registered; pytest files exist and follow the marker convention. Do NOT stop here — proceed directly to Phase 6.
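
The console_script string has a fixed shape, which makes it easy to generate and check. The package prefix is elided (`...`) in the text above, so the sketch takes it as a parameter; `console_script`, `check_entry`, and any prefix you pass in are hypothetical.

```python
import re

# "<model_name>=<package>.entrypoint.<model_name>:main"
_ENTRY = re.compile(r"^(?P<name>[a-z][a-z0-9_]*)=(?P<module>[\w.]+):(?P<func>\w+)$")


def console_script(model_name: str, package: str) -> str:
    """Build the console_scripts entry; `package` is the repo-specific prefix."""
    return f"{model_name}={package}.entrypoint.{model_name}:main"


def check_entry(entry: str, model_name: str) -> bool:
    """True if entry follows '<model_name>=<pkg>.entrypoint.<model_name>:main'."""
    m = _ENTRY.match(entry)
    return bool(
        m
        and m["name"] == model_name
        and m["func"] == "main"
        and m["module"].endswith(f".entrypoint.{model_name}")
    )


# In each setup.py the result ends up as a plain string, e.g.
# entry_points={"console_scripts": ["my_model=<package prefix>.entrypoint.my_model:main"]}
```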

Cross-Phase Data Flow & Consistency Verification

Before Docker testing, verify the artifact chain: `train` produces `<results_dir>/train/<model_name>_model_latest.pth` → `export.checkpoint` → `<results_dir>/export/<model_name>.onnx` → `gen_trt_engine` → `<results_dir>/trt/<model_name>.engine` → `inference.trt_engine` / `evaluate.trt_engine`. Then confirm the consistency checklist: the `*_latest.pth` name; `augmentation.mean` / `std` matching across the training spec, `inference.yaml`, `evaluate.yaml`, and the builder's `preprocess_mode`; ONNX `input_names` / `output_names`; `export.input_width` / `input_height` vs `dataset.img_size`; `model.head.in_channels` vs `model_params_mapping.py`; a shared `classes.txt`; and an `__init__.py` in every package directory (including `scripts/__init__.py`, which `get_subtasks()` needs for `pkgutil` discovery). Full interpolation paths, the itemized checklist, and config field paths: workflow-consistency.md.
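
Parts of the checklist can be automated once the specs are loaded into dicts (for example with a YAML loader) and interpolations are resolved. The sketch assumes each spec is a nested dict using the section names in this document, skips fields a given spec file does not contain, and takes the `mean`/`std` blocks as already-extracted dicts; all function names are hypothetical, and the authoritative field paths are in workflow-consistency.md.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List


def artifact_chain(results_dir: str, model_name: str) -> Dict[str, str]:
    """Expected artifact paths along train -> export -> gen_trt_engine."""
    root = PurePosixPath(results_dir)
    return {
        "checkpoint": str(root / "train" / f"{model_name}_model_latest.pth"),
        "onnx": str(root / "export" / f"{model_name}.onnx"),
        "engine": str(root / "trt" / f"{model_name}.engine"),
    }


# Spec field -> artifact it must point at (field names as used in this document).
POINTERS = {
    ("export", "checkpoint"): "checkpoint",
    ("export", "onnx_file"): "onnx",
    ("gen_trt_engine", "onnx_file"): "onnx",
    ("inference", "trt_engine"): "engine",
    ("evaluate", "trt_engine"): "engine",
}


def chain_errors(spec: Dict[str, Any], chain: Dict[str, str]) -> List[str]:
    """Fields absent from this spec file are skipped; present ones must match."""
    errors = []
    for (section, key), artifact in POINTERS.items():
        value = (spec.get(section) or {}).get(key)
        if value is not None and value != chain[artifact]:
            errors.append(f"{section}.{key} = {value}, expected {chain[artifact]}")
    return errors


def normalization_errors(train_aug: Dict[str, Any], **others: Dict[str, Any]) -> List[str]:
    """Compare mean/std from the training spec against each named spec."""
    return [
        f"{name}: {key} {aug.get(key)} != train {train_aug[key]}"
        for name, aug in others.items()
        for key in ("mean", "std")
        if aug.get(key) != train_aug[key]
    ]
```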

Phase 6 — Container Testing & End-to-End Validation

Mandatory: start immediately after Phase 5. All TAO models ship as Docker images; code that only works outside a container is incomplete. Testing runs directly inside the TAO Toolkit container (no Docker image build in the test loop): mount the local source into the Phase 0 image tags, install via `setup.py develop`, and invoke `pytest` / `pylint` / `pydocstyle` / `flake8` directly. Use vanilla `pytest` and the lint binaries, NOT the `ci/run_functional_tests.py` / `ci/run_static_tests.py` wrappers (those exist only in NVIDIA's internal mirrors; the public `github.com/NVIDIA-TAO/` mirrors have no `ci/` directory).
Steps 16–25, in order: verify the local image tags (16); run container `pytest` for tao-core (17), tao-pytorch (18, `-m cv_unit`, `--shm-size=16G`), and tao-deploy (19); static/lint tests (20, `pylint --errors-only` + optional `pydocstyle` / `flake8`); wheel builds (21); the end-to-end pipeline (22: train dry-run + export in one tao-pytorch session, then gen_trt_engine + inference + evaluate in one tao-deploy session, since `--rm` discards installed packages); native-vs-TRT cross-check (23: FP32 ≈ exact, FP16 ≈ small delta, divergence ⇒ ONNX/TRT issue); interactive debug shells (24); optional release Docker image build (25, distribution only). Full per-step commands and the fix-and-retest loop: phase-6-container-tests.md; build scripts, runner patterns, requirements, and CI conventions: docker-patterns.md.
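
Because `--rm` throws away what `setup.py develop` installed, each end-to-end session has to install and run everything in one shell invocation. A small sketch that chains the steps with `&&`; the `my_model` subcommands and the spec path are hypothetical placeholders for the real commands (including the dry-run overrides) in phase-6-container-tests.md.

```python
from typing import Sequence

MODEL = "my_model"                                # hypothetical console_script name
SPEC = "/workspace/specs/experiment_spec.yaml"    # hypothetical spec path


def one_session_script(install: Sequence[str], steps: Sequence[str]) -> str:
    """Chain install + pipeline commands for a single `bash -c` invocation.

    `&&` stops at the first failing command, so the pipeline never starts
    after a failed install.
    """
    return " && ".join([*install, *steps])


PYTORCH_SESSION = one_session_script(
    ["pip install /workspace/tao-core", "cd /workspace/tao-pytorch",
     "python setup.py develop"],
    [f"{MODEL} train -e {SPEC}", f"{MODEL} export -e {SPEC}"],
)
```

Pass the resulting string as the `bash -c` argument of a single `docker run --rm` in the tao-pytorch image, then build the equivalent install + gen_trt_engine + inference + evaluate chain for the tao-deploy image.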
Phase 6 gate (Done criteria): tao-core / tao-pytorch / tao-deploy unit tests pass in their TAO Toolkit containers; static tests pass (or only legacy lint warnings remain); wheels build; the end-to-end run goes `<model_name>_model_latest.pth` → `model.onnx` → `model.engine` → non-empty `result.csv` and `results.json`; native and TRT predictions agree within tolerance.
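
The native-vs-TRT check (step 23) reduces to comparing two prediction files. This sketch assumes a `result.csv` with `image,label,score` columns and picks illustrative tolerances (1e-4 for FP32, 1e-2 for FP16); neither the column layout nor the thresholds come from the TAO reference files, so adjust both to the real format.

```python
import csv
import io
from typing import Dict, List, Tuple

# Illustrative tolerances (assumption): FP32 should be near-exact, FP16 close.
TOLERANCE = {"fp32": 1e-4, "fp16": 1e-2}


def load_predictions(text: str) -> Dict[str, Tuple[str, float]]:
    """Parse result.csv content, assuming columns image,label,score."""
    return {row["image"]: (row["label"], float(row["score"]))
            for row in csv.DictReader(io.StringIO(text))}


def cross_check(native_csv: str, trt_csv: str, precision: str = "fp32") -> List[str]:
    """List disagreements between native and TRT predictions."""
    native, trt = load_predictions(native_csv), load_predictions(trt_csv)
    tol = TOLERANCE[precision]
    problems = []
    if native.keys() != trt.keys():
        problems.append("the two files cover different images")
    for image in sorted(native.keys() & trt.keys()):
        (n_label, n_score), (t_label, t_score) = native[image], trt[image]
        if n_label != t_label:
            problems.append(f"{image}: label {n_label} (native) vs {t_label} (TRT)")
        elif abs(n_score - t_score) > tol:
            problems.append(f"{image}: score delta {abs(n_score - t_score):.2e} exceeds {tol}")
    return problems
```

Read the two files with `Path(...).read_text()` and pass the contents in. With FP16, a label flip on a near-tie between two classes can be legitimate, so inspect those cases before blaming the export.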

Phase 7 — Optimization & Tuning (conditional)

Enter only if Phase 6 passes but accuracy / latency / model size needs improvement. Ask the user for target metrics first. Diagnose (Step 26) across four categories — accuracy too low, TRT-vs-native gap, training too slow, inference too slow — then apply the relevant technique: hyperparameter tuning (27), INT8 quantization (28), channel pruning + retrain (29), knowledge distillation (30), or resolution tuning (31). Full diagnostics, config blocks, YAML overrides, and decision tree: phase-7-optimization.md.

Argument

$ARGUMENTS
If provided, interpret `$ARGUMENTS` as the HuggingFace model ID or URL to use as the starting point for Phase 1. If credentials or the model short name are not included, ask the user for them before proceeding.
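
Normalizing `$ARGUMENTS` into a repo ID up front avoids passing a URL where `from_pretrained` expects a repo ID or local path. A minimal sketch: it accepts a bare `<owner>/<name>` ID or a `huggingface.co` model URL (dropping sub-pages such as `/tree/main`) and rejects dataset/space URLs; `parse_model_argument` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

_REPO_ID = re.compile(r"^[\w.-]+(/[\w.-]+)?$")   # "<owner>/<name>" (or a legacy bare name)
_SUBPAGES = {"tree", "blob", "resolve", "commits", "discussions"}


def parse_model_argument(arg: str) -> str:
    """Turn a HuggingFace model ID or model-page URL into a repo ID."""
    arg = arg.strip()
    repo_id = arg
    if arg.startswith(("http://", "https://")):
        parsed = urlparse(arg)
        if parsed.netloc not in ("huggingface.co", "www.huggingface.co"):
            raise ValueError(f"not a huggingface.co URL: {arg}")
        parts = [p for p in parsed.path.split("/") if p]
        if parts and parts[0] in ("datasets", "spaces"):
            raise ValueError(f"not a model URL: {arg}")
        # Keep "<owner>/<name>", or just "<name>" if a sub-page follows directly.
        keep = 1 if len(parts) > 1 and parts[1] in _SUBPAGES else 2
        repo_id = "/".join(parts[:keep])
    if not _REPO_ID.match(repo_id):
        raise ValueError(f"not a HuggingFace model ID: {arg!r}")
    return repo_id
```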