tao-train-foundation-stereo
Stereo depth estimation using FoundationStereo. Predicts disparity maps from stereo image pairs for 3D reconstruction. Use when training, evaluating, exporting, or running inference for a TAO FoundationStereo model. Trigger phrases include "train stereo depth", "FoundationStereo", "stereo disparity estimation", "3D reconstruction from stereo".
NPX install: `npx skill4agent add promptingcompany/nv-skills tao-train-foundation-stereo`

Depth Net Stereo
Stereo depth estimation using the FoundationStereo architecture. The model predicts disparity maps from stereo image pairs for 3D reconstruction.

It uses pretrained Depth Anything v2 and EdgeNeXt encoders. Set `model.stereo_backbone.depth_anything_v2_pretrained_path` and `model.stereo_backbone.edgenext_pretrained_path`.

The mono and stereo skills both invoke the unified `depth_net` TAO CLI inside the container; the mono/stereo family is selected via `model.model_type` (e.g., `FoundationStereo`).

For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, and TensorRT `inference`), read `references/tao-deploy-foundation-stereo.md` first. The deploy spec template lives in this skill's `references/spec_template_deploy.yaml`.

Train Action Policy
This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml` and resolve the `automl_policy` run override from either an explicit value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as `automl_policy: off` for this run only; otherwise default to `auto`. When `automl_policy: auto`, `automl_enabled: true`, and both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged, route the train action through `tao-skill-bank:tao-run-automl` by default with this model's `skill_dir`. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`. Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.

Non-train actions such as `evaluate`, `inference`, `export`, and deploy flows stay in this model skill. The per-run `automl_policy` override does not change model metadata.
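The routing decision above can be sketched as two small shell functions. This is a minimal illustration, not part of TAO or `tao-skill-bank`: the names `resolve_automl_policy` and `route_train_action` are hypothetical, phrase matching is a plain case-insensitive substring test, and the caller is assumed to have already read `automl_enabled` from `references/skill_info.yaml`. Falling back to direct training when `automl_enabled` is not `true` is also an assumption of this sketch.

```bash
# Hypothetical helpers illustrating the train-action routing policy.
# Not part of TAO or tao-skill-bank; names and inputs are assumptions.

# resolve_automl_policy EXPLICIT_VALUE REQUEST_TEXT
# Prints "off" or "auto" for this run only.
resolve_automl_policy() {
    explicit=$1
    request=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    # An explicit per-run value wins.
    case $explicit in
        off|auto) printf '%s\n' "$explicit"; return 0 ;;
    esac
    # Opt-out phrases in the user's workflow request mean "off".
    case $request in
        *"turn off automl"*|*"disable automl"*|*"no hpo"*|*"plain training"*)
            echo off ;;
        *)
            echo auto ;;
    esac
}

# route_train_action POLICY AUTOML_ENABLED SKILL_DIR
# Prints "automl" or "direct". Warns on stderr when AutoML is enabled but
# the packaged train schema or template is missing.
route_train_action() {
    policy=$1
    enabled=$2
    skill_dir=$3
    if [ "$policy" = off ]; then
        echo direct
        return 0
    fi
    if [ "$enabled" = true ]; then
        if [ -f "$skill_dir/schemas/train.schema.json" ] &&
           [ -f "$skill_dir/references/spec_template_train.yaml" ]; then
            echo automl
            return 0
        fi
        echo "AutoML is enabled but not runnable until schemas are generated" >&2
    fi
    # Assumption: anything else falls back to direct model training.
    echo direct
}
```

For example, `route_train_action "$(resolve_automl_policy "" "plain training please")" true <skill_dir>` prints `direct`, because the opt-out phrase resolves the policy to `off` before the packaged files are even checked.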
Workflow
Prerequisites — data accessibility
Your dataset (left + right images + GT disparity) must be reachable from inside the container:

- SDK runner: place files at the S3 paths the runner resolves (the `S3_TRAIN` / `S3_EVAL` placeholders shown in Typical Spec Overrides). The runner handles S3 → container-path mounting transparently.
- Direct `docker run` (e.g., local testing): mount the host dataset root read-only at the same in-container path:

```bash
docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...
```

The same accessibility requirement applies to the `<output_dir>` written by all actions.

Step 1 — Annotation file
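Per-line annotation file referenced by `data_sources[*].data_file`:

| Columns | Format | Use |
|---|---|---|
| 2 | left image, right image | Stereo inference (no GT) |
| 3 | left image, right image, GT disparity | Stereo with GT |
| 4 | left image, right image, GT disparity, occlusion mask | Stereo with GT and occlusion mask |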
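Whether you already have an annotation file or generate one with `depth_net convert`, a quick pre-flight check can confirm that every line has the same column count and that every listed file is reachable (compare the data-accessibility prerequisites above). A minimal sketch, assuming whitespace-separated columns and paths without spaces; `check_annotations` is a hypothetical helper, not a TAO command. Run it inside the container, or against the same absolute paths you mount, so paths resolve the way the dataloader sees them.

```bash
# check_annotations ANNOTATION_FILE
# Hypothetical pre-flight check (not a TAO command). Assumes one sample per
# line, whitespace-separated paths, and no spaces inside paths.
# Prints the detected column count (2, 3, or 4); returns non-zero on error.
check_annotations() {
    file=$1
    # Distinct field counts over non-empty lines.
    counts=$(awk 'NF > 0 { print NF }' "$file" | sort -u)
    n_counts=$(printf '%s\n' "$counts" | grep -c .)
    if [ "$n_counts" -ne 1 ]; then
        echo "mixed or missing column counts: $counts" >&2
        return 1
    fi
    case $counts in
        2|3|4) ;;
        *) echo "unsupported column count: $counts" >&2; return 1 ;;
    esac
    # Every referenced file must exist where this check runs.
    missing=0
    for path in $(awk 'NF > 0 { for (i = 1; i <= NF; i++) print $i }' "$file"); do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 1
    echo "$counts"
}
```

A result of `2` means only GT-free inference is possible with that file; `3` or `4` covers train and evaluate.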
If you already have one, point to it. Otherwise, generate one with `depth_net convert`:

```bash
depth_net convert -e <convert_spec.yaml>
```

`convert_spec.yaml`:

```yaml
data_root: <directory whose immediate children are scene folders that contain your image+depth files; convert walks data_root recursively but expects per-scene subdirectories at one level below>
image_dir_pattern: [<substring matching left image paths>]
right_dir_pattern: [<substring matching right image paths>]
depth_dir_pattern: [<substring matching GT disparity paths>]
nocc_dir_pattern: []       # optional, occlusion mask paths
image_extension: '.png'    # always include the leading dot
depth_extension: '.png'    # form must match image_extension (the swap is a substring replace)
nocc_extension: ''
split_ratio: 0.0           # 0.0/1.0 = test-only; 0.8 = 80/20 train+val
```

Step 2 — Pair `model_type` and `dataset_name` based on your data
Prefer the dataset-specific class when your layout matches a supported one: it applies class-specific path conventions, evaluation crops, and (where applicable) occlusion-mask handling. Fall back to `GenericDataset` only for layouts that do not match any registered class.

| Data category | `model_type` | `dataset_name` |
|---|---|---|
| Middlebury data | `FoundationStereo` | `Middlebury` |
| KITTI data | `FoundationStereo` | `Kitti` |
| ETH3D data | `FoundationStereo` | `Eth3d` |
| FSD synthetic data | `FoundationStereo` | `FSD` |
| IsaacReal synthetic data | `FoundationStereo` | `IsaacRealDataset` |
| Crestereo synthetic data | `FoundationStereo` | `Crestereo` |
| Other / non-canonical layout | `FoundationStereo` | `GenericDataset` |
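The table, together with the 2-column inference rule in the next paragraph, reduces to a small lookup. A sketch only: `choose_dataset_name` and its lowercase category keys are hypothetical; the class names come from the table, and `model_type` is always `FoundationStereo`, so the sketch does not return it.

```bash
# choose_dataset_name CATEGORY NUM_COLUMNS
# Hypothetical lookup (not a TAO command). CATEGORY is a free-form key such
# as "middlebury" or "kitti" (case-insensitive); NUM_COLUMNS is the
# annotation column count (2, 3, or 4).
choose_dataset_name() {
    category=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    columns=$2
    # Dataset-specific classes reject 2-column (no GT) annotations,
    # so GT-free inference always uses GenericDataset.
    if [ "$columns" -eq 2 ]; then
        echo GenericDataset
        return 0
    fi
    case $category in
        middlebury) echo Middlebury ;;
        kitti)      echo Kitti ;;
        eth3d)      echo Eth3d ;;
        fsd)        echo FSD ;;
        isaacreal)  echo IsaacRealDataset ;;
        crestereo)  echo Crestereo ;;
        *)          echo GenericDataset ;;   # non-canonical layout
    esac
}
```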
See Training Requirements → Formats for the full registered-class list. The same `dataset_name` value applies across train and evaluate actions (both of which use 3-column or 4-column annotations with GT disparity). The deploy-side `evaluate` action follows the same rule; see `references/tao-deploy-foundation-stereo.md`. For inference with 2-column annotations (left + right, no GT), use `dataset_name: GenericDataset` regardless of data layout: the dataset-specific classes (`Middlebury` / `Kitti` / `Eth3d` / `FSD` / `IsaacRealDataset` / `Crestereo`) require 3-column input and reject 2-column annotations at the dataloader level. For inference with 3-column annotations (left + right + GT), the dataset-specific class is fine.

Step 3 — Write spec yaml from Typical Spec Overrides
Copy the action block from `references/foundation-stereo-spec-overrides.md` (per-action `spec_overrides`, mandatory data sources). Replace:

- `model.model_type` from Step 2 (typically `FoundationStereo`)
- `dataset.<...>.data_sources[*].dataset_name` from Step 2
- `dataset.<...>.data_sources[*].data_file` with the path from Step 1
- For deploy-side `evaluate`: enforce `dataset.test_dataset.batch_size: 1` (see `references/tao-deploy-foundation-stereo.md`).

Shape consistency: the `crop_size` in `dataset.test_dataset.augmentation.crop_size` should match `export.input_height` / `input_width` so the trained-model evaluator and the deploy-side TensorRT evaluator operate at the same shape; see `references/foundation-stereo-troubleshooting.md`.

Step 4 — Run
```bash
docker run --gpus 'device=0' --shm-size 16G --ipc=host \
  --user $(id -u):$(id -g) \
  -v <data_root>:<data_root>:ro \
  -v <output_dir>:<output_dir> \
  <container> \
  depth_net <action> -e <spec.yaml>
```

Without `--user $(id -u):$(id -g)`, the container writes outputs as `nobody:nogroup`, blocking host-side cleanup / retry.

Step 5 — Verify
- Container exit code 0
- `status.json` `kpi` block populated
- For `train`: inspect per-step `train_loss` directly (the entrypoint reports `Execution status: PASS` even when the loss is NaN)
- For `evaluate`: rely on `epe` / `bp1` / `bp2` / `bp3` / `d1` (the evaluator also emits `rmse` / `abs_rel` / `sq_rel` / `rmse_log`, which are non-meaningful for stereo; see `references/foundation-stereo-parameters.md`)
- For `inference`: check the artifacts under `results_dir`
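The checks above can be scripted roughly as follows. This is a hypothetical heuristic, not part of TAO: `verify_run` takes the container exit code and the path of a JSON status file, and it pattern-matches text rather than parsing JSON. It assumes the `kpi` block and `train_loss` each appear on a single line and that a NaN loss is serialized as a bare `NaN` token (as Python's `json` module does by default); adapt the patterns to what your run actually writes.

```bash
# verify_run EXIT_CODE STATUS_JSON
# Hypothetical post-run check (not part of TAO). Greps the status file
# instead of parsing JSON, so treat the result as a heuristic.
verify_run() {
    exit_code=$1
    status_file=$2
    if [ "$exit_code" -ne 0 ]; then
        echo "container exited with $exit_code" >&2
        return 1
    fi
    if [ ! -f "$status_file" ]; then
        echo "no status file at $status_file" >&2
        return 1
    fi
    # The kpi block must exist and must not be an empty object.
    if ! grep -q '"kpi"' "$status_file" ||
       grep -Eq '"kpi"[[:space:]]*:[[:space:]]*\{[[:space:]]*\}' "$status_file"; then
        echo "kpi block missing or empty" >&2
        return 1
    fi
    # "Execution status: PASS" can hide a NaN loss, so look for one directly.
    if grep -Eiq '"train_loss"[[:space:]]*:[[:space:]]*-?nan' "$status_file"; then
        echo "train_loss is NaN" >&2
        return 1
    fi
    echo ok
}
```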
For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, and TensorRT `inference`), read `references/tao-deploy-foundation-stereo.md` first. Deploy spec templates live in this skill's `references/` folder and follow the `spec_template_deploy_*.yaml` naming pattern.

Training Requirements
- Valid `dataset_name` values for stereo `data_sources` (case-insensitive): `FSD`, `IsaacRealDataset`, `Crestereo`, `Middlebury`, `Eth3d`, `Kitti`, `GenericDataset`
- Monitoring metric: `val/loss`
Per-Action Dataset Requirements
| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | dataset.test_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| inference | dataset.infer_dataset.data_sources | inference_dataset | data_file: annotations.txt + dataset_name | Yes |
| quantize | dataset.train_dataset.data_sources | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| quantize | dataset.val_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| quantize | dataset.quant_calibration_dataset.images_dir | train_datasets | images.tar.gz | No |
| train | dataset.train_dataset.data_sources | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| train | dataset.val_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
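Every `data_sources` row in the table uses the same entry shape: a `data_file` annotation path plus a `dataset_name`. A minimal sketch that prints one entry as a YAML fragment, validating `dataset_name` case-insensitively against the stereo classes listed under Training Requirements and emitting the canonical spelling. `emit_data_source` is hypothetical; place its output under the action's spec key from the table (for example `dataset.train_dataset.data_sources`). The `quant_calibration_dataset.images_dir` row is not a `data_sources` list and is not covered.

```bash
# emit_data_source DATA_FILE DATASET_NAME
# Hypothetical helper (not a TAO command): prints one data_sources list
# entry. dataset_name is matched case-insensitively and printed in
# canonical form; unknown names are rejected.
emit_data_source() {
    data_file=$1
    key=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case $key in
        fsd)              name=FSD ;;
        isaacrealdataset) name=IsaacRealDataset ;;
        crestereo)        name=Crestereo ;;
        middlebury)       name=Middlebury ;;
        eth3d)            name=Eth3d ;;
        kitti)            name=Kitti ;;
        genericdataset)   name=GenericDataset ;;
        *) echo "invalid dataset_name: $2" >&2; return 1 ;;
    esac
    # Both fields are mandatory for every entry.
    printf '%s\n' "- data_file: $data_file" "  dataset_name: $name"
}
```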
Typical Spec Overrides
Data source overrides are mandatory for every action: the agent MUST construct data source paths from the Per-Action Dataset Requirements table above and include them in `spec_overrides`. Each `data_sources` entry is a dict with two mandatory fields: `data_file` and `dataset_name`.

See `references/foundation-stereo-spec-overrides.md` for the full per-action `spec_overrides` blocks (train, evaluate, export, gen_trt_engine, inference, quantize) with `S3_TRAIN` / `S3_EVAL` placeholders.

Eval Dataset
Optional. The validation dataset is configured via `dataset.val_dataset.data_sources` (each entry needs `data_file` and `dataset_name`).

Important Parameters
Key defaults:

- `model.model_type` = `FoundationStereo` (the only selectable type).
- `model.encoder` (top-level, not under `stereo_backbone`): the schema default is `vitl`, but the FoundationStereo small NGC checkpoint requires `vits`, so override it explicitly.
- `model.max_disparity`: default 416.
- `train.optim.lr`: default 1e-4.
- `train.precision`: fp32 (recommended) or fp16; bf16 is not supported.
- `export.batch_size`: default `-1`.
- The field name is `workers`, not `num_workers`.

See `references/foundation-stereo-parameters.md` for the full parameter glossary (all `model.*`, `dataset.*`, `train.*`, `export.*` fields with defaults and ranges) and the Evaluation Metrics reference (which `epe` / `bp*` / `d1` metrics to trust and why `rmse` / `abs_rel` / `sq_rel` / `rmse_log` are non-meaningful for stereo).

Multi-GPU / Multi-Node
Launch method: `python`, Lightning-managed (single process; Lightning spawns workers).

| Spec Key | Description | Default |
|---|---|---|
|  | Number of GPUs | 1 |
|  | GPU device indices | [0] |
|  | Number of nodes | 1 |

Same DDP/FSDP behavior as depth-net-mono. Multi-node runs require the `WORLD_SIZE`, `NODE_RANK`, `MASTER_ADDR`, and `MASTER_PORT` environment variables.
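A launcher can fail fast when a multi-node run is missing one of these variables. A minimal sketch; `check_multinode_env` is hypothetical (not part of TAO or Lightning) and only checks that each variable is exported and non-empty, not that the values are consistent across nodes.

```bash
# check_multinode_env
# Hypothetical pre-launch check. Lists every required variable that is
# unset or empty and returns non-zero if any are missing.
check_multinode_env() {
    missing=""
    for var in WORLD_SIZE NODE_RANK MASTER_ADDR MASTER_PORT; do
        # printenv prints nothing for unset (or unexported) variables.
        value=$(printenv "$var")
        if [ -z "$value" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing env vars:$missing" >&2
        return 1
    fi
    return 0
}
```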
Export / TRT Defaults
TRT data types: FP32 / FP16. Static-shape ONNX (`export.batch_size: 1`) and batch-only dynamic ONNX (`export.batch_size: -1`) both support `fp16`; height and width are always pinned to the trace shape (H/W-dynamic engines are not supported, so build separate engines per (H, W)). For the NGC release (576×960), set `export.batch_size: 1`, `export.opset_version: 17`, `export.on_cpu: True`.

See `references/foundation-stereo-export-trt-hardware.md` for the full export / TRT defaults (the opset-vs-`on_cpu` pairing rules, determinism notes, GPU-memory thresholds) and the hardware requirements. See `references/tao-deploy-foundation-stereo.md` for the three supported deploy paths and the validation table. Full TAO Deploy reference: tao-deploy-foundation-stereo.
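Several of these constraints, plus the Step 3 shape-consistency rule, can be checked before exporting. A sketch under stated assumptions: `check_export_settings` is hypothetical, the caller passes values already read from the spec (evaluation crop height and width, `export.input_height` / `input_width`, `export.batch_size`, and the TRT data type), and the sketch deliberately accepts only the two batch modes described above.

```bash
# check_export_settings CROP_H CROP_W EXPORT_H EXPORT_W BATCH_SIZE TRT_DTYPE
# Hypothetical pre-export check (not a TAO command). Reports every problem
# on stderr and returns non-zero if any check fails.
check_export_settings() {
    crop_h=$1; crop_w=$2; export_h=$3; export_w=$4; batch=$5
    dtype=$(printf '%s' "$6" | tr '[:upper:]' '[:lower:]')
    status=0
    # This sketch only accepts the two modes described above:
    # static (1) or batch-only dynamic (-1).
    case $batch in
        1|-1) ;;
        *) echo "export.batch_size must be 1 or -1, got $batch" >&2; status=1 ;;
    esac
    # TRT data types are FP32 or FP16.
    case $dtype in
        fp32|fp16) ;;
        *) echo "unsupported TRT data type: $6" >&2; status=1 ;;
    esac
    # H and W are pinned to the trace shape, so the evaluation crop must
    # match export.input_height / input_width.
    if [ "$crop_h" -ne "$export_h" ] || [ "$crop_w" -ne "$export_w" ]; then
        echo "crop ${crop_h}x${crop_w} != export ${export_h}x${export_w}" >&2
        status=1
    fi
    return $status
}
```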
Error Patterns
Common issues:

- Disparity overflow: reduce `model.max_disparity`.
- Missing pretrained paths: set both `model.stereo_backbone.depth_anything_v2_pretrained_path` and `model.stereo_backbone.edgenext_pretrained_path`.
- `Key 'encoder' not in 'StereoBackBone'`: `encoder` is top-level `model.encoder`.
- `Key 'dataset_name' is not in struct`: each `data_sources` entry needs both `data_file` and `dataset_name`.
- `bash: exec: depth_net_stereo: not found`: the entrypoint is `depth_net`, with no suffix.

See `references/foundation-stereo-troubleshooting.md` for the full error patterns plus the pyt-vs-deploy `evaluate` discussion (the pyt path runs at native image resolution and ignores `crop_size`, with the Middlebury resolution guidance) and the Shape consistency rule.

Spec Param / Parent Model Inference
Model-specific inference mappings belong in Markdown (MD), not in `config.json`. Generated runners read these mappings and apply them with SDK helpers before `create_job()` (mirroring the old microservices `infer_params.py` flow). For `parent_model` / `parent_model_folder`, pass the upstream train/export/AutoML child job ID as `parent_job_id`; the SDK lists the parent result folder, filters checkpoint artifacts, and returns the selected model file or folder. Do not add these mappings back to `config.json`, and do not patch generated runner scripts to guess checkpoint paths.

See `references/foundation-stereo-spec-param-inference.md` for the full per-action inference-mapping table (train / evaluate / inference / export / gen_trt_engine / quantize, including the train pretrained-path link/destination and resume-checkpoint mappings).