tao-port-huggingface-model

TAO-HF Integration Skill

Integrate a HuggingFace (HF) Computer Vision model into the NVIDIA TAO Toolkit ecosystem. Work the phases iteratively — not purely linearly — following a build → test → debug → fix → retest loop at every step.

This SKILL.md is the workflow coordinator. Each phase has a dedicated reference file under `references/` with the full step-by-step content, code blocks, docker invocations, and gates. Read the matching reference at the start of each phase; the summaries below are not sufficient on their own.

Local-Only Rule

All work is strictly local. You may only read/clone from remotes; all file edits, Docker builds, and test runs stay on the local machine. Do NOT `git commit`, `git push`, create remote branches (GitLab, GitHub, HuggingFace), create merge requests / pull requests / issues, or upload/publish/push Docker images to any registry or artifact store. This follows from the bind-mounted local-clone layout in `references/execution-and-debugging.md`.
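
As a rough illustration of the rule, a command could be screened before it runs. This is a minimal sketch, not part of the skill: `FORBIDDEN_PREFIXES` and `is_forbidden` are hypothetical names, and the deny-list only covers the plain command forms named in the rule (it would miss variants such as `git -C <dir> push`).

```python
import shlex

# Hypothetical deny-list derived from the Local-Only Rule; extend as needed.
FORBIDDEN_PREFIXES = [
    ("git", "commit"),
    ("git", "push"),
    ("docker", "push"),
]


def is_forbidden(command: str) -> bool:
    """Return True if the command starts with one of the forbidden operations."""
    tokens = tuple(shlex.split(command))
    return any(tokens[: len(prefix)] == prefix for prefix in FORBIDDEN_PREFIXES)
```

Read-only operations such as `git clone` pass through, which matches the "read/clone from remotes" allowance.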

Submodule Override & Execution Platform

`local-docker` is the default platform. The user clones the four TAO repos (`tao-core`, `tao-pytorch`, `tao-deploy`, `tao-dataservices`) independently into one working directory. Each repo also carries nested `tao-core/` (and `tao-pytorch/`) submodules pinned at the original, unmodified commit, so they are stale; modifications live only in the top-level `tao-core/`. Always install from the top-level `tao-core/`, never from `<repo>/tao-core/` (the nested submodule silently drops all modifications). Overriding the CI `pip install tao-core/` comes down to three rules: mount the whole working directory (`-v $(pwd):/workspace`); run `pip install /workspace/tao-core` FIRST so modified schemas win; and put the top-level tao-core first on `PYTHONPATH` (`-e PYTHONPATH=/workspace/tao-core:/workspace/tao-pytorch`).

Every test, smoke run, and end-to-end validation runs inside a locally prepared TAO Toolkit container (`tao-pytorch-base:latest`, `tao-deploy-base:latest`, optionally `tao-dataservices-base:latest`, all from Phase 0), with local clones bind-mounted at `/workspace` and installed via `pip install /workspace/tao-core` / `setup.py develop`. All Python work runs in containers: no host venvs, no host `pip install`s. The platform skills own the how of running containers: host GPU runtime via `tao-setup-nvidia-gpu-host`; `docker run` flags / NGC auth / mounts / env passthrough / `--ipc=host` / `--shm-size` / inspection / error modes via `tao-run-on-docker` and `tao-run-on-local-docker`. This workflow specifies only what to run inside them and never forks those conventions. The annotated working-directory tree, the canonical `docker run` flag set with the workflow-specific `-w` / `PYTHONPATH` / install-shell additions, the three isolation contexts, the four isolation rules, the Development Loop, and the Debugging Playbook table are in `references/execution-and-debugging.md`.
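
The three override rules can be sketched as a small argv builder. This is an illustration under assumptions, not the canonical flag set (that lives in the platform skills and `references/execution-and-debugging.md`): `build_docker_argv` is a hypothetical helper, `-w /workspace` and the use of `bash -c` are guesses, and GPU, `--ipc=host`, and `--shm-size` flags are deliberately left out.

```python
PYTHONPATH = "/workspace/tao-core:/workspace/tao-pytorch"


def build_docker_argv(workdir: str, image: str, inner_cmd: str) -> list[str]:
    """Assemble a `docker run` argv that applies the three override rules."""
    # Rule 2: install the top-level tao-core before anything else runs.
    shell = f"pip install /workspace/tao-core && {inner_cmd}"
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/workspace",       # Rule 1: mount the whole working dir
        "-e", f"PYTHONPATH={PYTHONPATH}",    # Rule 3: top-level tao-core first
        "-w", "/workspace",                  # assumed working directory
        image,
        "bash", "-c", shell,
    ]
```

Because the install happens inside the same shell invocation as the command, the modified schemas are in place for that run even though `--rm` discards the container afterwards.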

Phase Map

The eight phases, 0 through 7 (full goals + gates below; references per phase):

Phase 0 — Prerequisites + TAO Toolkit images + local image tags: phase-0-prereqs.md
Phase 1 — HF-inspection environment, validate HF model + dataset: phase-1-inspection.md, hf-inspection.md
Phase 2 — Closest existing TAO reference model: phase-2-codebase.md, task-type-guide.md
Phase 3 — tao-core config + tao-pytorch trainer / native eval / inference: phase-3-implementation.md, tao-patterns.md, repo-structure.md
Phase 4 — ONNX export + tao-deploy TRT engine, inference, evaluation: phase-4-deploy.md
Phase 5 — Packaging (`setup.py` console_scripts) + L0 tests: phase-5-packaging.md
Phase 6 — Container-based testing + end-to-end pipeline validation: phase-6-container-tests.md, docker-patterns.md
Phase 7 — (conditional) Accuracy / latency / size tuning: phase-7-optimization.md

IMPORTANT — Continuous Execution Through Phase 6: Do NOT stop after implementation (Phases 3–5) to wait for the user to run tests; immediately proceed to the mandatory Phase 6. The implementation is not complete until tests pass inside the TAO Toolkit containers and the end-to-end pipeline is validated. Apply the build-test-debug loop at every step — write, test immediately, fix on failure, never accumulate untested code.
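
Since every phase starts by reading its reference files, it may help to confirm they are all present first. The sketch below assumes the files sit directly under `references/`; `PHASE_REFERENCES` mirrors the phase map above and `missing_references` is a hypothetical helper.

```python
from pathlib import Path

# Phase number -> reference files, copied from the phase map.
PHASE_REFERENCES = {
    0: ["phase-0-prereqs.md"],
    1: ["phase-1-inspection.md", "hf-inspection.md"],
    2: ["phase-2-codebase.md", "task-type-guide.md"],
    3: ["phase-3-implementation.md", "tao-patterns.md", "repo-structure.md"],
    4: ["phase-4-deploy.md"],
    5: ["phase-5-packaging.md"],
    6: ["phase-6-container-tests.md", "docker-patterns.md"],
    7: ["phase-7-optimization.md"],
}


def missing_references(references_dir: str) -> dict[int, list[str]]:
    """Return {phase: [missing file names]} for files absent from references_dir."""
    root = Path(references_dir)
    missing = {}
    for phase, names in PHASE_REFERENCES.items():
        absent = [name for name in names if not (root / name).is_file()]
        if absent:
            missing[phase] = absent
    return missing
```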

Phase 0 — Prerequisites Check

Goal: verify Python 3.10+ and `git`; delegate the NVIDIA driver / CUDA / Docker / NVIDIA Container Toolkit host check to `tao-setup-nvidia-gpu-host`; verify NGC `docker login` for `nvcr.io`. Then ask the user for the TAO Toolkit image references (tao-pytorch, tao-deploy, optionally tao-dataservices), pull them, and prepare the local image tags `tao-pytorch-base:latest`, `tao-deploy-base:latest`, and `tao-dataservices-base:latest` for Phases 3–6. Preparation strips the released TAO packages already in those images so the user's local clones (mounted at `/workspace/...`) install and get picked up at run time. Hard stop if any check fails. Full commands, user-prompt wording, and per-image preparation `Dockerfile` snippets: phase-0-prereqs.md.

Gate: all prerequisite checks pass; the user has supplied the required image references; `tao-pytorch-base:latest` and `tao-deploy-base:latest` exist locally; `tao-dataservices-base:latest` exists if dataservices work is expected.
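
The first two host checks (Python version and `git` on the PATH) could look roughly like this. It is a sketch only: `python_ok` and `missing_tools` are hypothetical helpers, and the driver / CUDA / container-toolkit checks stay with `tao-setup-nvidia-gpu-host` as described above.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the Phase 0 goal


def python_ok(version=None) -> bool:
    """True if the given (or running) interpreter is at least Python 3.10."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def missing_tools(tools=("git", "docker")) -> list[str]:
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

A non-empty result from `missing_tools()` or a `False` from `python_ok()` would correspond to the hard stop.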

Phase 1 — Information Gathering & Validation

Goal: decide whether to proceed. Gather credentials, locate (or clone) the four TAO repos and create a consistent local working branch across them, launch the long-lived `tao-hf-inspect` container (isolation Context A), validate that the HF model is a CV model with a supported `pipeline_tag`, extract config + state-dict schema, sanity-check ONNX export, and clean up. Full step-by-step (1.1–1.7): phase-1-inspection.md; generic patterns: hf-inspection.md.

Reject if `pipeline_tag` is NLP / audio / LLM (out of CV scope), `AutoConfig` raises, or ONNX export fundamentally cannot work and has no rewrite path.

Gate: all 4 TAO repos located/cloned with a consistent working branch; `pipeline_tag` confirmed CV; `model_type`, `image_size`, `hidden_size`, `num_labels` extracted; state-dict keys documented and the HF→TAO remapping plan drafted; ONNX sanity check passed (or failure mode understood); user confirmed `model_short_name` and task type. Present findings and confirm before proceeding.
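
The `pipeline_tag` part of the reject rule can be sketched as a pure function. The tag set below is an assumption about which Hugging Face pipeline tags count as in-scope CV tasks (the authoritative list is in the phase reference), and `check_pipeline_tag` is a hypothetical name. In practice the tag would presumably come from the model's Hub metadata, for example `huggingface_hub.model_info(repo_id).pipeline_tag`.

```python
# Assumed subset of Hugging Face pipeline tags treated as in-scope CV tasks.
CV_PIPELINE_TAGS = {
    "image-classification",
    "object-detection",
    "image-segmentation",
    "zero-shot-object-detection",
    "depth-estimation",
}


def check_pipeline_tag(tag):
    """Return (accepted, reason) for a model's pipeline_tag."""
    if tag is None:
        return False, "model has no pipeline_tag; ask the user for the task type"
    if tag in CV_PIPELINE_TAGS:
        return True, f"{tag} is a supported CV task"
    return False, f"{tag} is out of CV scope"
```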

Phase 2 — Codebase Exploration

Goal: find the closest existing TAO reference model for the detected `pipeline_tag` (classification → `classification_pyt`, detection → `dino` / `rtdetr`, segmentation → `segformer`, instance → `mask2former`, panoptic → `oneformer`, zero-shot → `grounding_dino`, depth → `mono_depth`), read its full implementation across `tao-core`, `tao-pytorch`, and `tao-deploy`, and decide whether the backbone already exists in `backbone_v2/`. The chosen reference drives everything downstream: config structure, architecture, loss, ONNX export shape, TRT builder, deploy inferencer/loader, metrics, dataset format. The full reference list (12 files per model), the `backbone_v2/` coverage check (it already provides `vit`, `swin`, `resnet`, `dino_v2`, and others), and the `tao-dataservices` coverage check: phase-2-codebase.md; per-task details: task-type-guide.md.
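
The task-to-reference mapping above fits naturally into a lookup table. In this sketch the dictionary keys are the task names used in the paragraph, `KNOWN_BACKBONES` lists only the four backbones named there (the text says there are others), and `plan_reference` is a hypothetical helper.

```python
# Task type -> candidate TAO reference models, copied from the Phase 2 goal.
REFERENCE_MODELS = {
    "classification": ["classification_pyt"],
    "detection": ["dino", "rtdetr"],
    "segmentation": ["segformer"],
    "instance": ["mask2former"],
    "panoptic": ["oneformer"],
    "zero-shot": ["grounding_dino"],
    "depth": ["mono_depth"],
}

# Only the backbones named in the text; backbone_v2/ provides others too.
KNOWN_BACKBONES = {"vit", "swin", "resnet", "dino_v2"}


def plan_reference(task: str, backbone: str) -> dict:
    """Pick reference candidates and a first guess at the backbone decision."""
    if task not in REFERENCE_MODELS:
        raise ValueError(f"no TAO reference model listed for task {task!r}")
    reuse = backbone in KNOWN_BACKBONES
    return {
        "references": REFERENCE_MODELS[task],
        "backbone": "reuse" if reuse else "check backbone_v2/, else wrap or add",
    }
```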

If a new backbone is needed, decide the strategy (timm wrap > re-implement from scratch > HF black-box wrap) before Phase 3, because it changes weight loading, ONNX export, and the deploy pipeline. Never dual-inherit from `transformers.PreTrainedModel` and `BackboneBase` (metaclass conflict).
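
The general Python failure mode behind the dual-inheritance warning can be reproduced with stand-in classes. Nothing below uses the real `transformers` or TAO classes: `MetaA`, `MetaB`, `HFLikeBase`, and `BackboneLikeBase` are invented purely to show what happens when two bases have unrelated metaclasses, and the composition-based `Wrapper` is one possible way around it, not necessarily the one the references prescribe.

```python
class MetaA(type):
    """Stand-in metaclass for the first base."""


class MetaB(type):
    """Stand-in metaclass for the second base."""


class HFLikeBase(metaclass=MetaA):
    pass


class BackboneLikeBase(metaclass=MetaB):
    pass


def try_dual_inherit():
    """Attempt the forbidden pattern; return the error message, or None."""
    try:
        type("Dual", (HFLikeBase, BackboneLikeBase), {})
    except TypeError as exc:
        return str(exc)
    return None


class Wrapper(BackboneLikeBase):
    """Composition instead of dual inheritance: hold the other object."""

    def __init__(self, inner):
        self.inner = inner
```

Calling `try_dual_inherit()` returns a `TypeError` message mentioning a metaclass conflict, while `Wrapper(HFLikeBase())` constructs without error.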

Gate: reference TAO model identified and all 12 locations read; task-type implications understood (architecture, loss, ONNX outputs, deploy classes, metrics, dataset); backbone coverage decided (reuse / wrap timm / new); dataservices coverage checked.

Phase 3 — TAO Core Configuration & Native Implementation

Goal: write the tao-core config schema and the tao-pytorch trainer + native inference + native evaluation, smoke-testing in between. Use `<model_name>` (`snake_case` from Phase 1) and `<ModelName>` (`PascalCase`). Seven steps: (1) `tao-core` config under `config/<model_name>/`, where `ExperimentConfig(CommonExperimentConfig)` MUST contain `model`, `dataset`, `train`, `evaluate`, `inference`, `export`, `gen_trt_engine`, `quantize`; (2) `tao-pytorch` trainer under `cv/<model_name>/` (`build_model()`, `<ModelName>PlModel(TAOLightningModule)`, `train.py`, entrypoint, `experiment_spec.yaml`; new backbone → add+register `cv/backbone_v2/<backbone_name>.py`); (3) multi-GPU/multi-node via the entrypoint's `launch()`; (4) native inference → `result.csv`; (5) native evaluation → `results.json`; (6–7) MLOps wiring (`@monitor_status` → `status.json`). Consistency rules (including the distinction between `export.onnx_file` and `gen_trt_engine.onnx_file`, and `???` = required `MISSING`) are enforced by the Cross-Phase checklist below.

Full per-step code and the canonical `experiment_spec.yaml`: phase-3-implementation.md (with snippets tao-patterns.md, layout repo-structure.md, per-task task-type-guide.md).

Gates: Step 1, `ExperimentConfig` imports cleanly in the container; Step 2, `build_model(cfg)` runs and the PLModel instantiates; overall, all 7 steps complete, smoke tests pass, and no `__init__.py` is missing.
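
The "MUST contain" rule for `ExperimentConfig` can be checked mechanically. The sketch below uses plain dataclasses with a stand-in `CommonExperimentConfig` (the real base class lives in `nvidia_tao_core` and is not imported here); the `results_dir` field and the `dict`-typed sections are placeholders, and `missing_sections` is a hypothetical helper. `MISSING = "???"` is the same sentinel string OmegaConf uses for mandatory values.

```python
from dataclasses import dataclass, field, fields

MISSING = "???"  # OmegaConf's sentinel for a required, not-yet-set value

REQUIRED_SECTIONS = (
    "model", "dataset", "train", "evaluate",
    "inference", "export", "gen_trt_engine", "quantize",
)


@dataclass
class CommonExperimentConfig:
    """Stand-in for the tao-core base class (illustrative field only)."""
    results_dir: str = MISSING


@dataclass
class ExperimentConfig(CommonExperimentConfig):
    """Placeholder sections; real ones would be typed config dataclasses."""
    model: dict = field(default_factory=dict)
    dataset: dict = field(default_factory=dict)
    train: dict = field(default_factory=dict)
    evaluate: dict = field(default_factory=dict)
    inference: dict = field(default_factory=dict)
    export: dict = field(default_factory=dict)
    gen_trt_engine: dict = field(default_factory=dict)
    quantize: dict = field(default_factory=dict)


def missing_sections(config_cls) -> list[str]:
    """Return required section names that config_cls does not declare."""
    declared = {f.name for f in fields(config_cls)}
    return [name for name in REQUIRED_SECTIONS if name not in declared]
```

A check like this could run as part of the Step 1 smoke test, right after confirming the import works in the container.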

Phase 4 — Export, Deployment & TensorRT Integration

Goal: ship ONNX export from tao-pytorch, then a TRT engine builder + TRT inference + TRT evaluation in tao-deploy that reuse the tao-core `ExperimentConfig`. Four steps (8–11): ONNX export (`scripts/export.py`, per-task input/output names, `batch_size=-1` ⇒ dynamic batch); TRT engine builder (`gen_trt_engine.py`, subclasses `EngineBuilder` or reuses `ClassificationEngineBuilder`, writes `specs/{gen_trt_engine,inference,evaluate}.yaml`); TRT inference (NumPy-only `ClassificationLoader` → `result.csv`); TRT evaluation (sklearn/pycocotools → `results.json`). Full code and the Phase 3+4 gate: phase-4-deploy.md.

Module pitfall: tao-pytorch and tao-deploy have separate `hydra_runner` and `monitor_status` implementations, so use the deploy versions in deploy scripts; `ExperimentConfig` is imported from `nvidia_tao_core` in both repos (same schema, same field paths).

Phase 3+4 gate: all three in-container checks pass: `tao-pytorch` imports + model + ONNX export, and `tao-deploy` imports.
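
The `batch_size=-1` ⇒ dynamic batch convention from the export step might translate into export arguments along these lines. This is a sketch under assumptions: `dynamic_axes_for` is a hypothetical helper, the axis label `"batch"` and the tensor names in the usage note are placeholders, and the returned mapping is in the shape accepted by the `dynamic_axes` parameter of `torch.onnx.export`.

```python
def dynamic_axes_for(batch_size: int, input_names, output_names):
    """Return a dynamic_axes mapping when batch_size is -1, else None.

    With a fixed batch size the exporter needs no dynamic axes; with -1,
    axis 0 of every named input and output is marked as dynamic.
    """
    if batch_size != -1:
        return None
    return {name: {0: "batch"} for name in [*input_names, *output_names]}
```

For example, `dynamic_axes_for(-1, ["input"], ["output"])` yields `{"input": {0: "batch"}, "output": {0: "batch"}}`, which would be passed alongside the same `input_names` and `output_names` so the TRT builder sees matching tensor names.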

Phase 5 — Packaging & L0 Testing

Goal: register the model as a `'<model_name>=...entrypoint.<model_name>:main'` console_script in both `tao-pytorch/setup.py` and `tao-deploy/setup.py` (deploy entrypoint uses `nvidia_tao_deploy.cv.common.entrypoint.entrypoint_hydra`), and add L0 tests: deploy tests (`tao-deploy/tests/<model_name>/`, subprocess + `--buildOnly` `trtexec`) and trainer tests (`tao-pytorch/tests/cv_unit_test/<model_name>/`, `Trainer(..., fast_dev_run=True)`, markers `@pytest.mark.cv_unit @pytest.mark.<model_name>`). Full code and test layout: phase-5-packaging.md.

Gate: entrypoints registered; pytest files exist and follow the marker convention. Do NOT stop here — proceed directly to Phase 6.
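
The "entrypoints registered" part of the gate could be spot-checked by validating the shape of the console_script string. This is a minimal sketch: `check_console_script` is a hypothetical helper, and the module path in the usage note is invented for illustration because the source elides the real one.

```python
import re

# name=dotted.module.path:function
ENTRY_RE = re.compile(r"^(?P<name>[A-Za-z_]\w*)=(?P<module>[\w.]+):(?P<func>\w+)$")


def check_console_script(entry: str, model_name: str) -> bool:
    """True if entry looks like '<model_name>=...entrypoint.<model_name>:main'."""
    match = ENTRY_RE.match(entry)
    if match is None:
        return False
    return (
        match["name"] == model_name
        and match["module"].endswith(f"entrypoint.{model_name}")
        and match["func"] == "main"
    )
```

With a made-up package path, `check_console_script("my_model=my_pkg.cv.my_model.entrypoint.my_model:main", "my_model")` returns `True`, while an entry pointing at a different module or function returns `False`.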

Cross-Phase Data Flow & Consistency Verification

Before Docker testing, verify the artifact chain: `train` produces `<results_dir>/train/<model_name>_model_latest.pth` → `export.checkpoint` → `<results_dir>/export/<model_name>.onnx` → `gen_trt_engine` → `<results_dir>/trt/<model_name>.engine` → `inference.trt_engine` / `evaluate.trt_engine`. Then confirm the consistency checklist: the `*_latest.pth` name; `augmentation.mean` / `std` matching across the training spec, `inference.yaml` / `evaluate.yaml`, and builder `preprocess_mode`; ONNX `input_names` / `output_names`; `export.input_width` / `input_height` matching `dataset.img_size`; `model.head.in_channels` matching `model_params_mapping.py`; shared `classes.txt`; and an `__init__.py` in every package dir (including `scripts/__init__.py` for `get_subtasks()` `pkgutil` discovery). Full interpolation paths, itemized checklist, and config field paths: workflow-consistency.md.
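
Two of the checks above lend themselves to a quick script: deriving the expected artifact paths and comparing normalization values across specs. The helpers `artifact_chain` and `normalization_mismatches` are hypothetical, and the flat `{"mean": ..., "std": ...}` layout is a simplification of the real spec files.

```python
def artifact_chain(results_dir: str, model_name: str) -> dict:
    """Expected artifact paths, following the chain described above."""
    return {
        "checkpoint": f"{results_dir}/train/{model_name}_model_latest.pth",
        "onnx": f"{results_dir}/export/{model_name}.onnx",
        "engine": f"{results_dir}/trt/{model_name}.engine",
    }


def normalization_mismatches(specs: dict) -> list[str]:
    """Names of specs whose mean/std differ from the first spec given.

    specs maps a spec name to {"mean": [...], "std": [...]}.
    """
    names = list(specs)
    reference = specs[names[0]]
    return [
        name
        for name in names[1:]
        if specs[name]["mean"] != reference["mean"]
        or specs[name]["std"] != reference["std"]
    ]
```

An empty list from `normalization_mismatches` means the training, inference, and evaluation specs agree; any name it returns is a spec to fix before Phase 6.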

Phase 6 — Container Testing & End-to-End Validation

Mandatory: start immediately after Phase 5. All TAO models ship as Docker images; code that only works outside a container is incomplete. Testing runs directly inside the TAO Toolkit container (no Docker image build in the test loop): mount the local source into the Phase-0 image tags, install via `setup.py develop`, and invoke `pytest` / `pylint` / `pydocstyle` / `flake8` directly. Use vanilla `pytest` + lint binaries, NOT any `ci/run_functional_tests.py` / `ci/run_static_tests.py` wrappers (those exist only in NVIDIA's internal mirrors; the public `github.com/NVIDIA-TAO/` mirrors have no `ci/` directory).

Steps 16–25, in order: verify the local image tags (16); container `pytest` for tao-core (17), tao-pytorch (18, `-m cv_unit`, `--shm-size=16G`), tao-deploy (19); static/lint tests (20, `pylint --errors-only` + optional `pydocstyle` / `flake8`); wheel builds (21); the end-to-end pipeline (22: train dry-run + export in one tao-pytorch session, then gen_trt_engine + inference + evaluate in one tao-deploy session, since `--rm` discards installed packages); native-vs-TRT cross-check (23: FP32 ≈ exact, FP16 ≈ small delta, divergence ⇒ ONNX/TRT issue); interactive debug shells (24); optional release Docker image build (25, distribution-only). Full per-step commands and the fix-and-retest loop: phase-6-container-tests.md; build scripts, runner patterns, requirements, CI conventions: docker-patterns.md.

Phase 6 gate (Done criteria): tao-core / tao-pytorch / tao-deploy unit tests pass in their TAO Toolkit containers; static tests pass (or only legacy lint warnings); wheels build; end-to-end `<model_name>_model_latest.pth` → `model.onnx` → `model.engine` → non-empty `result.csv` and `results.json`; native vs TRT predictions agree within tolerance.
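
The "agree within tolerance" criterion can be sketched as a per-precision comparison of flat score lists. The tolerance values below are placeholders chosen for illustration (the source only says FP32 should be near exact and FP16 a small delta), and `max_abs_diff` and `predictions_agree` are hypothetical helpers.

```python
# Hypothetical tolerances; tune per model and task.
TOLERANCES = {"fp32": 1e-4, "fp16": 1e-2}


def max_abs_diff(native, trt) -> float:
    """Largest absolute element-wise difference between two score lists."""
    if len(native) != len(trt):
        raise ValueError("prediction lists differ in length")
    return max((abs(a - b) for a, b in zip(native, trt)), default=0.0)


def predictions_agree(native, trt, precision: str) -> bool:
    """True if native and TRT scores agree within the precision's tolerance."""
    return max_abs_diff(native, trt) <= TOLERANCES[precision]
```

A failure here points at the ONNX export or TRT build rather than at training, which is why the cross-check comes after the end-to-end run.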

Phase 7 — Optimization & Tuning (conditional)

Enter only if Phase 6 passes but accuracy / latency / model size needs improvement. Ask the user for target metrics first. Diagnose (Step 26) across four categories — accuracy too low, TRT-vs-native gap, training too slow, inference too slow — then apply the relevant technique: hyperparameter tuning (27), INT8 quantization (28), channel pruning + retrain (29), knowledge distillation (30), or resolution tuning (31). Full diagnostics, config blocks, YAML overrides, and decision tree: phase-7-optimization.md.

Argument

$ARGUMENTS

If provided, interpret `$ARGUMENTS` as the HuggingFace model ID or URL to use as the starting point for Phase 1. If credentials or model short-name are not included, ask the user for them before proceeding.
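
Normalizing the argument into a model ID might look like the sketch below. It assumes Hub URLs of the form `https://huggingface.co/<owner>/<name>` (optionally followed by `tree/...`, `blob/...`, or `resolve/...`); `parse_model_ref` is a hypothetical helper and does not verify that the model exists.

```python
from urllib.parse import urlparse

# Path segments that follow the repo ID in Hub URLs.
_SUBPATHS = {"tree", "blob", "resolve"}


def parse_model_ref(arguments: str):
    """Return a model ID from an ID or huggingface.co URL, or None if empty."""
    text = arguments.strip()
    if not text:
        return None
    if not text.startswith(("http://", "https://")):
        return text  # already an ID such as "owner/name"
    parsed = urlparse(text)
    if parsed.netloc != "huggingface.co":
        raise ValueError(f"not a huggingface.co URL: {text}")
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        raise ValueError("URL has no model path")
    # Single-segment IDs are followed directly by a sub-path like "tree".
    if len(parts) > 1 and parts[1] in _SUBPATHS:
        return parts[0]
    return "/".join(parts[:2])
```

A `None` result corresponds to the case where no argument was provided and the user has to be asked for the model.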
