Loading...
Loading...
Grounding DINO for open-set object detection. Combines DINO-style detection with a BERT text encoder for language-guided detection — detects objects described by text prompts without a fixed class vocabulary. Use when training, evaluating, exporting, quantizing, or running inference for a TAO Grounding DINO model. Trigger phrases include "train Grounding DINO", "open-vocabulary detection", "text-prompted detector", "language-guided object detection".
npx skill4agent add nvidia/skills tao-train-grounding-dinogen_trt_engineevaluateinferencereferences/tao-deploy-grounding-dino.mdreferences/spec_template_deploy_*.yamlschemas/<action>.schema.jsonschemas/manifest.jsonreferences/spec_template_<action>.yamldefaultreferences/skill_info.yamlautoml_enabledschemas/train.schema.jsonreferences/spec_template_train.yamlautoml_default_parametersautoml_disabled_parameters~/tao-corereferences/skill_info.yamlautoml_policyautoml_policy: offautoautoml_policy: autoautoml_enabled: trueschemas/train.schema.jsonreferences/spec_template_train.yamltao-skill-bank:tao-run-automlskill_dirautoml_policyautoml_policy: offevaluateinferenceexportautoml_policy| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | dataset.test_data_sources | eval_dataset | image_dir: images.tar.gz, json_file: annotations.json | No |
| inference | dataset.infer_data_sources | inference_dataset | image_dir: images.tar.gz, classmap: label_map.txt | No |
| quantize | dataset.train_data_sources | train_datasets | image_dir: images.tar.gz, json_file: annotations_odvg.jsonl, label_map: annotations_odvg_labelmap.json | Yes |
| quantize | dataset.val_data_sources | eval_dataset | image_dir: images.tar.gz, json_file: annotations.json | No |
| quantize | dataset.quant_calibration_data_sources | train_datasets | image_dir: images.tar.gz, json_file: annotations_odvg.jsonl, label_map: annotations_odvg_labelmap.json | No |
| train | dataset.train_data_sources | train_datasets | image_dir: images.tar.gz, json_file: annotations_odvg.jsonl, label_map: annotations_odvg_labelmap.json | Yes |
| train | dataset.val_data_sources | eval_dataset | image_dir: images.tar.gz, json_file: annotations.json | No |
spec_overridesS3_TRAIN = "s3://bucket/data/train"
S3_EVAL = "s3://bucket/data/eval"{
"train.num_epochs": 10,
"train.checkpoint_interval": 10,
"train.validation_interval": 10,
"train.num_gpus": 1,
"dataset.train_data_sources": [{"image_dir": f"{S3_TRAIN}/images.tar.gz", "json_file": f"{S3_TRAIN}/annotations_odvg.jsonl", "label_map": f"{S3_TRAIN}/annotations_odvg_labelmap.json"}],
"dataset.val_data_sources": {"image_dir": f"{S3_EVAL}/images.tar.gz", "json_file": f"{S3_EVAL}/annotations.json"},
}{
"gen_trt_engine.tensorrt.data_type": "FP16",
}{
"dataset.infer_data_sources.captions": [
"person"
],
"dataset.infer_data_sources": {"image_dir": f"{S3_EVAL}/images.tar.gz", "classmap": f"{S3_EVAL}/label_map.txt"},
}{
"dataset.test_data_sources": {"image_dir": f"{S3_EVAL}/images.tar.gz", "json_file": f"{S3_EVAL}/annotations.json"},
}{
"dataset.train_data_sources": [{"image_dir": f"{S3_TRAIN}/images.tar.gz", "json_file": f"{S3_TRAIN}/annotations_odvg.jsonl", "label_map": f"{S3_TRAIN}/annotations_odvg_labelmap.json"}],
"dataset.val_data_sources": {"image_dir": f"{S3_EVAL}/images.tar.gz", "json_file": f"{S3_EVAL}/annotations.json"},
"dataset.quant_calibration_data_sources": {"image_dir": f"{S3_TRAIN}/images.tar.gz", "json_file": f"{S3_TRAIN}/annotations_odvg.jsonl", "label_map": f"{S3_TRAIN}/annotations_odvg_labelmap.json"},
}python| Spec Key | Description | Default |
|---|---|---|
| Number of GPUs | 1 |
| GPU device indices | [0] |
| Number of nodes | 1 |
| | |
WORLD_SIZENODE_RANKMASTER_ADDRMASTER_PORTconfig.jsoncreate_job()infer_params.pygrounding_dino.config.json| Action | Spec Field | Inference Function | Meaning |
|---|---|---|---|
| evaluate | | | encryption key |
| evaluate | | | model file inferred from the parent job results folder |
| evaluate | | | model file inferred from the parent job results folder |
| evaluate | | | current job results directory |
| export | | | encryption key |
| export | | | model file inferred from the parent job results folder |
| export | | | output ONNX path |
| export | | | current job results directory |
| gen_trt_engine | | | encryption key |
| gen_trt_engine | | | model file inferred from the parent job results folder |
| gen_trt_engine | | | output TensorRT engine path |
| gen_trt_engine | | | current job results directory |
| inference | | | encryption key |
| inference | | | model file inferred from the parent job results folder |
| inference | | | model file inferred from the parent job results folder |
| inference | | | current job results directory |
| quantize | | | encryption key |
| quantize | | | model file inferred from the parent job results folder |
| quantize | | | current job results directory |
| train | | | encryption key |
| train | | | PTM when no resume checkpoint exists |
| train | | | current job results directory |
| train | | | PTM when no resume checkpoint exists |
| train | | | model file inferred from the current job results folder |
parent_modelparent_model_folderparent_job_idconfig.json