tao-train-fast-foundation-stereo
Real-time stereo depth estimation using FastFoundationStereo (FFS), the distilled bp2 commercial variant of FoundationStereo. Predicts disparity maps from stereo image pairs with ~10× lower latency than full FoundationStereo. Use when training, evaluating, exporting, or running inference for a TAO FastFoundationStereo (FFS) model. Trigger phrases include "train fast stereo", "real-time stereo disparity", "FastFoundationStereo", "distilled stereo depth".
NPX Install
npx skill4agent add promptingcompany/nv-skills tao-train-fast-foundation-stereo
SKILL.md Content
Depth Net Fast Stereo
Real-time stereo depth estimation using FastFoundationStereo (FFS) — the bp2 commercial distilled variant of FoundationStereo. Predicts disparity maps from rectified stereo image pairs with per-layer pruned widths for real-time inference.
The mono / stereo / fast-stereo skills share the unified `depth_net` TAO CLI; FFS is selected via `model.model_type: FastFoundationStereo`. FFS differs from `FoundationStereo` only in pruned per-layer widths and a serialized forward path; everything else (entrypoint, action verbs, dataset classes, deploy chain) is identical to `depth-net-stereo`.
For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, TensorRT `inference`), read `references/tao-deploy-fast-foundation-stereo.md` first. The deploy spec template lives at `references/spec_template_deploy.yaml`.
Train Action Policy
This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml` and resolve the run override from either an explicit `automl_policy` value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as `automl_policy: off` for this run only; otherwise default to `auto`. When `automl_policy: auto`, `automl_enabled: true`, and both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged, route the train action through `tao-skill-bank:tao-run-automl` by default with this model's `skill_dir`. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`. Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.
Non-train actions such as `evaluate`, `inference`, `export`, and deploy flows stay in this model skill. The per-run `automl_policy` override does not change model metadata.
Two Use Cases
FFS ships with a pre-trained bp2 commercial checkpoint (`model_best_bp2_serialize.pth`).
- Raw deploy: use the bp2 ckpt as-is. Skip `train`; run `inference`/`evaluate`/`export`/`gen_trt_engine` directly with the bp2 file as the action's checkpoint.
- Finetune on user data: set `train.pretrained_model_path` to the bp2 file, train on user data, then verify + deploy on the resulting ckpt. The full 7-action sequence (train → evaluate pyt → inference pyt → export → gen_trt_engine → inference deploy → evaluate deploy) is supported.
Workflow
Prerequisites — data accessibility
Your dataset (left + right images + GT disparity for train / evaluate, left + right only for inference) must be reachable from inside the container:
- SDK runner: place files at the S3 paths the runner resolves (`S3_TRAIN` / `S3_EVAL` placeholders shown in the spec overrides).
- Direct `docker run` (e.g. local testing): mount the host dataset root read-only at the same in-container path:
      docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...
The same accessibility requirement applies to the `<output_dir>` written by all actions, and to the bp2 checkpoint path.
Step 1: Annotation file
Per-line annotation file referenced by `data_sources[*].data_file`. Schema is identical to `depth-net-stereo`:
| Columns | Format | Use |
|---|---|---|
| 2 | | Stereo inference (no GT) |
| 3 | | Stereo with GT |
| 4 | | Stereo with GT and occlusion mask |
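A quick pre-flight check can catch annotation files whose column count does not match the intended action. The sketch below is a minimal illustration, assuming columns are whitespace-separated and paths contain no spaces (the delimiter is not stated here). `classify_annotation_line` and `check_annotation_text` are hypothetical helper names, not part of TAO.

```python
from collections import Counter

# Column count -> use, taken from the table above.
COLUMN_USES = {
    2: "stereo inference (no GT)",
    3: "stereo with GT",
    4: "stereo with GT and occlusion mask",
}


def classify_annotation_line(line: str) -> str:
    """Return the use implied by one annotation line's column count.

    Assumption: columns are whitespace-separated and paths contain no spaces.
    """
    columns = line.split()
    if len(columns) not in COLUMN_USES:
        raise ValueError(f"expected 2, 3 or 4 columns, got {len(columns)}: {line!r}")
    return COLUMN_USES[len(columns)]


def check_annotation_text(text: str) -> str:
    """Check that all non-empty lines share one column count; return its use."""
    uses = Counter(
        classify_annotation_line(line) for line in text.splitlines() if line.strip()
    )
    if not uses:
        raise ValueError("annotation file is empty")
    if len(uses) > 1:
        raise ValueError(f"mixed column counts: {dict(uses)}")
    return next(iter(uses))
```

Running `check_annotation_text` on the contents of the annotation file before `train` or `evaluate` makes a 2-column file (no GT) fail early instead of inside the container.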
Generate via `depth_net convert` if needed; see the `depth-net-stereo` skill for the `convert_spec.yaml` template.
Step 2: Pair `model_type` and `dataset_name` based on your data
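Use `model_type: FastFoundationStereo` for FFS. The `dataset_name` choice mirrors the stereo skill: pick the dataset-specific class when your layout matches a registered one, otherwise `GenericDataset`.
| Data category | `model_type` | `dataset_name` |
|---|---|---|
| Middlebury | `FastFoundationStereo` | `Middlebury` |
| KITTI | `FastFoundationStereo` | `Kitti` |
| ETH3D | `FastFoundationStereo` | `Eth3d` |
| FSD synthetic | `FastFoundationStereo` | `FSD` |
| IsaacReal synthetic | `FastFoundationStereo` | `IsaacRealDataset` |
| Crestereo synthetic | `FastFoundationStereo` | `Crestereo` |
| Other / non-canonical | `FastFoundationStereo` | `GenericDataset` |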
For inference with 2-column annotations (left + right, no GT), use `dataset_name: GenericDataset` regardless of layout.
Step 3: Set the bp2 distilled width overrides
FFS requires 15 model-section width override fields whose values match the bp2 commercial checkpoint exactly. Omitting any field falls back to TAO defaults that do not match the bp2 ckpt and produce shape-mismatch errors at forward time.
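Because a missing field only surfaces as a shape mismatch at forward time, it can be cheaper to check the `model` section before launching a container. This is a minimal sketch, assuming the spec has already been loaded into a plain Python dict (for example with a YAML parser). `missing_width_overrides` is a hypothetical helper, and the field names are the fifteen from this step's override block.

```python
# The 15 bp2 width override field names from this step's `model` block.
BP2_WIDTH_FIELDS = (
    "motion_encoder_widths",
    "motion_encoder_final",
    "gru_hidden",
    "gru_gating_conv_widths",
    "disp_head_input_dim",
    "disp_head_intermediate",
    "disp_head_pwconv1_widths",
    "mask_widths",
    "stem_2_widths",
    "spx_2_gru_widths",
    "spx_gru_out",
    "classifier_mid",
    "cnet_conv04_widths",
    "cam_mid_channels",
    "cost_agg_conv_patch_padding",
)


def missing_width_overrides(model_section: dict) -> list[str]:
    """Return the bp2 width override fields absent from a spec's `model` section."""
    return [name for name in BP2_WIDTH_FIELDS if name not in model_section]


# Example: a model section that only sets one of the fifteen fields.
partial_model = {"model_type": "FastFoundationStereo", "gru_hidden": 60}
missing = missing_width_overrides(partial_model)
```

The check only covers presence, not values; the values themselves still have to match the bp2 checkpoint exactly.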
    model:
      model_type: FastFoundationStereo
      encoder: vitl
      hidden_dims: [128]            # 1-layer GRU; NOT [128,128,128]
      n_gru_layers: 1               # bp2 single-GRU
      corr_radius: 4
      corr_levels: 2
      n_downsample: 2
      valid_iters: 8
      max_disparity: 192            # bp2 commercial; NOT 416 (full FS default)
      volume_dim: 28                # bp2 ckpt invariant; NOT 32 (full FS default)
      mixed_precision: false        # see references/parameters.md
      gwc_feature_normalize: true   # see references/parameters.md
      # 15 bp2 distilled width overrides: copy as-is
      motion_encoder_widths: [56, 96, 16, 12]
      motion_encoder_final: 48
      gru_hidden: 60
      gru_gating_conv_widths: [100, 168]
      disp_head_input_dim: 60
      disp_head_intermediate: 36
      disp_head_pwconv1_widths: [212, 244]
      mask_widths: [32, 16]
      stem_2_widths: [12, 16]
      spx_2_gru_widths: [16, 12, 16, 24]
      spx_gru_out: 9
      classifier_mid: 14
      cnet_conv04_widths: [60, 48]
      cam_mid_channels: 8
      cost_agg_conv_patch_padding: [0, 0, 0]
The spec templates at `references/spec_template_*.yaml` carry this block as the canonical source.
Step 4: Write spec yaml from the spec overrides
Copy the action block from `references/spec-overrides.md` (per-action Python override dicts plus the shared `FFS_MODEL_BLOCK`). Replace:
- `model.model_type: FastFoundationStereo` (already set)
- `dataset.<...>.data_sources[*].dataset_name` from Step 2
- `dataset.<...>.data_sources[*].data_file` with the path from Step 1
- For raw deploy use cases (no train): set `<action>.checkpoint` to the bp2 file path
- For finetune use cases: set `train.pretrained_model_path` to the bp2 file path
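The replacements listed above can also be applied programmatically. This is a minimal sketch, assuming a raw-deploy `evaluate` spec held as a nested dict. `apply_overrides` is a hypothetical helper (not an SDK function) and the file paths are placeholders; the override dicts in `references/spec-overrides.md` remain the source of truth.

```python
import copy


def apply_overrides(spec: dict, overrides: dict) -> dict:
    """Return a copy of `spec` with dotted-path overrides applied."""
    result = copy.deepcopy(spec)
    for dotted_key, value in overrides.items():
        node = result
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


# Already set in the copied block.
base_spec = {"model": {"model_type": "FastFoundationStereo"}}

# Placeholder paths: substitute your own annotation file and bp2 checkpoint.
evaluate_spec = apply_overrides(
    base_spec,
    {
        "dataset.test_dataset.data_sources": [
            {"dataset_name": "GenericDataset", "data_file": "/data/annotations.txt"}
        ],
        "evaluate.checkpoint": "/ckpt/model_best_bp2_serialize.pth",
    },
)
```

A real spec would also need the width override block from Step 3 in its `model` section; it is left out here to keep the example short.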
Chained train → next action checkpoint path: For local Docker chaining (no SDK runner), the trained checkpoint lives at `<train.results_dir>/<task>/dn_model_latest.pth`, because Lightning `ModelCheckpoint` nests under the task name. Example: `train.results_dir: /workspace/results/finetune/train` produces `/workspace/results/finetune/train/train/dn_model_latest.pth`. Use that nested path for the next action's `<action>.checkpoint`. SDK-runner deploys resolve this automatically via `parent_job_id`; see `references/parent-model-inference.md`.
Shape consistency: `crop_size` in `dataset.test_dataset.augmentation.crop_size` should match `export.input_height` / `input_width` for end-to-end pyt-vs-deploy comparability; see the shape table in `references/tao-deploy-fast-foundation-stereo.md`.
Step 5: Run
    docker run --gpus 'device=0' --shm-size 16G --ipc=host \
      --user $(id -u):$(id -g) \
      -v <data_root>:<data_root>:ro \
      -v <output_dir>:<output_dir> \
      -v <bp2_ckpt_dir>:<bp2_ckpt_dir>:ro \
      <container> \
      depth_net <action> -e <spec.yaml>
Without `--user $(id -u):$(id -g)` the container writes outputs as `nobody:nogroup`, blocking host-side cleanup / retry.
For the local bind-mount `__pycache__` caveat (QA / development only: clearing stale `.pyc` files that shadow patched source), see `references/troubleshooting.md` → "Local bind-mount tip".
Step 6: Verify
- Container exit code 0
- `status.json` `kpi` block populated
- For `train`: inspect per-step `train_loss` directly (the entrypoint reports `Execution status: PASS` even when loss is NaN)
- For `evaluate`: rely on `epe`/`bp1`/`bp2`/`bp3`/`d1`/`rmse` (the evaluator also emits `abs_rel`/`sq_rel`/`rmse_log`, which are non-meaningful for stereo)
- For `inference`: artifacts under `results_dir`
- KPI namespace difference between pyt and deploy: pyt `evaluate` writes the metric set under `kpi.val/epe`, `kpi.val/bp1`, etc. (namespaced by Lightning's `val/` prefix). Deploy `evaluate` (TRT engine path) writes the same metric set under `kpi.epe`, `kpi.bp1`, etc. (no `val/` prefix). Downstream verification scripts that read `status.json` need to handle both shapes.
- Validate drift on your own dataset: if you compare TAO FFS deploy (`gen_trt_engine` + TRT `evaluate`) against the upstream FFS deploy path on the same input, expect a small residual mean_abs disparity drift (TAO export graph + TRT 10.13 interaction; not improvable at the source-code level). The exact magnitude is dataset and hardware dependent; measure on your own data and decide whether the drift is acceptable for your downstream task.
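The two KPI shapes described above can be handled with a single lookup. This is a minimal sketch, assuming `status.json` parses to an object whose `kpi` entry is a flat mapping from metric name to number (the exact file layout is not specified here). `read_kpi` is a hypothetical helper, and the numbers are illustrative, not measured results.

```python
import json
import math


def read_kpi(status: dict, metric: str):
    """Return a metric from the `kpi` block, trying `val/<metric>` then `<metric>`.

    Assumption: `kpi` is a flat mapping keyed by metric name.
    Returns None when the metric is missing or NaN.
    """
    kpi = status.get("kpi") or {}
    for key in (f"val/{metric}", metric):
        if key in kpi:
            value = float(kpi[key])
            return None if math.isnan(value) else value
    return None


# Illustrative payloads only; real files come from the action's output directory.
pyt_status = json.loads('{"kpi": {"val/epe": 0.5, "val/bp1": 3.0}}')
deploy_status = json.loads('{"kpi": {"epe": 0.5, "bp1": 3.0}}')

pyt_epe = read_kpi(pyt_status, "epe")        # found under "val/epe"
deploy_epe = read_kpi(deploy_status, "epe")  # found under "epe"
```

Treating NaN as missing keeps a verification script from passing a run on the strength of a populated but meaningless `kpi` block.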
7-action deploy flow
    train (optional)    → finetuned ckpt
    evaluate (pyt)      → PyT eager EPE / bp on val GT
    inference (pyt)     → PyT eager disparity samples (visual sanity)
    export              → static fp32 ONNX (recommended at 480×736 or 320×736)
    gen_trt_engine      → fp16 TRT engine on static ONNX path
    inference (deploy)  → TRT disparity samples
    evaluate (deploy)   → TRT EPE / bp drift vs PyT eager fp32
Skip `train` for raw-bp2 deploy. The remaining 6 actions (or the 4 deploy-only verbs starting from `export`) cover both use cases.
Full TAO Deploy reference: tao-deploy-fast-foundation-stereo.
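The action order above can be written down as data so that a driver script does not hard-code it in several places. A minimal sketch follows; `actions_for` and the use-case labels `finetune` / `raw_deploy` are illustrative names, not part of the TAO CLI.

```python
# Full 7-action sequence, labelled as in the flow above.
FULL_SEQUENCE = [
    "train",
    "evaluate (pyt)",
    "inference (pyt)",
    "export",
    "gen_trt_engine",
    "inference (deploy)",
    "evaluate (deploy)",
]


def actions_for(use_case: str, deploy_only: bool = False) -> list[str]:
    """Return the ordered actions for the 'finetune' or 'raw_deploy' use case.

    With deploy_only=True the sequence starts at `export`.
    """
    if use_case not in ("finetune", "raw_deploy"):
        raise ValueError(f"unknown use case: {use_case!r}")
    # Raw deploy skips `train` and uses the bp2 checkpoint directly.
    sequence = FULL_SEQUENCE if use_case == "finetune" else FULL_SEQUENCE[1:]
    if deploy_only:
        sequence = sequence[sequence.index("export"):]
    return list(sequence)
```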
Training Requirements
- Valid `dataset_name` values for stereo `data_sources` (case-insensitive): `FSD`, `IsaacRealDataset`, `Crestereo`, `Middlebury`, `Eth3d`, `Kitti`, `GenericDataset`
- Monitoring metric: val/loss
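Since matching is case-insensitive, a spec generator can normalize user input to the listed spelling and fail early on unknown names. A minimal sketch follows; `canonical_dataset_name` is a hypothetical helper, and it does not assume TAO itself requires the canonical spelling.

```python
# Valid stereo dataset_name values, as listed above.
VALID_DATASET_NAMES = (
    "FSD",
    "IsaacRealDataset",
    "Crestereo",
    "Middlebury",
    "Eth3d",
    "Kitti",
    "GenericDataset",
)
_BY_LOWER = {name.lower(): name for name in VALID_DATASET_NAMES}


def canonical_dataset_name(name: str) -> str:
    """Map a case-insensitive dataset_name to its listed spelling."""
    try:
        return _BY_LOWER[name.strip().lower()]
    except KeyError:
        valid = ", ".join(VALID_DATASET_NAMES)
        raise ValueError(f"unknown dataset_name {name!r}; valid values: {valid}") from None
```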
Per-Action Dataset Requirements
| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | dataset.test_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| inference | dataset.infer_dataset.data_sources | inference_dataset | data_file: annotations.txt + dataset_name | Yes |
| train | dataset.train_dataset.data_sources | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| train | dataset.val_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
Data source overrides are mandatory for every action. Each `data_sources` entry needs both `data_file` and `dataset_name`. The `model.*` width fields from Step 3 are also mandatory. See `references/spec-overrides.md` for the complete per-action override dicts (train finetune, raw-bp2 evaluate / inference / export) and the shared `FFS_MODEL_BLOCK`.
Eval Dataset
Optional. Val dataset configured via `dataset.val_dataset.data_sources` (each entry needs `data_file` and `dataset_name`).
Parameters, Metrics, Hardware
See `references/parameters.md` for the full parameter glossary (`model.*` / `dataset.*` / `train.*` knobs including `max_disparity: 192`, `gwc_feature_normalize: true`, `mixed_precision: false`, `volume_dim: 28`, `valid_iters`, `save_raw_pfm`), the evaluation-metric table (`epe` / `bp1` / `bp2` / `bp3` / `d1` / `rmse` are meaningful; `abs_rel` / `sq_rel` / `rmse_log` are not), multi-GPU / multi-node spec keys, and hardware requirements.
Export / TRT Defaults
`export`, `model.mixed_precision`, `gen_trt_engine`, `gen_trt_engine.tensorrt.data_type`, `fp16`, `fp32`, `fp16`
See `references/export-trt-defaults.md` for the full TRT/ONNX defaults and the four-way export use-case matrix (`export.batch_size` × `export.dynamic_hw`; dynamic H/W is FFS-only). See `references/tao-deploy-fast-foundation-stereo.md` for the deployment matrix and static-vs-dynamic shape guidance.
Troubleshooting
See `references/troubleshooting.md` for error patterns and fixes, including `shape mismatch` at forward (missing width override), missing `gwc_feature_normalize` (TAO Core too old), `dynamic_hw: true` warning on FS / mono export, `Key 'encoder' not in 'StereoBackBone'`, missing `dataset_name` in `data_sources`, negative disparity, larger-than-expected disparity drift (missing `max_disparity: 192`), `depth_net_stereo: not found`, decorative pyt-eval `crop_size`, the cosmetic `Failed to import SAM3` warning, and silent dynamic-deploy stride-incompatibility.
Spec Param / Parent Model Inference
Model-specific inference mappings belong in this skill, not in `config.json`. Generated runners should apply the mappings with SDK helpers before `create_job()`. See `references/parent-model-inference.md` for the full per-action spec-field → inference-function mapping table.
For `parent_model` or `parent_model_folder`, pass the upstream train / export / AutoML child job id as `parent_job_id`. The SDK lists the parent result folder, filters checkpoint artifacts, and returns the selected model file or folder. For raw-bp2 use cases without a parent train job, set the `<action>.checkpoint` field explicitly to the bp2 file path. Do not patch generated runner scripts to guess checkpoint paths.