tao-finetune-cosmos-reason

Cosmos-RL

Supervised fine-tuning (SFT) of `nvidia/Cosmos-Reason2-8B` on video reasoning tasks. Pretrained weights are sourced from HuggingFace, not NGC. This is a gated model and requires `HF_TOKEN` access.

Uses FSDP-based parallelism with `dp_shard_size` for the per-node GPU count and `dp_replicate_size` for the node count (not the standard `num_gpus` / `num_nodes`).

When to Use

Use this skill to train, evaluate, quantize, or run inference on Cosmos-Reason2-8B for video question answering and video reasoning. The core workflow is: confirm `HF_TOKEN` gating, sample the annotations to confirm `video_fps` is present, load the spec template, apply the critical train overrides below, then launch through the platform skill (or through AutoML when it is enabled).

Dataclass Schemas

Generated TAO Core schemas are packaged in `schemas/<action>.schema.json`, with `schemas/manifest.json` listing the available actions. Each generated schema also emits `references/spec_template_<action>.yaml` from the schema's top-level `default` field. AutoML enablement is declared at the model layer in `references/skill_info.yaml` via `automl_enabled`. Runnable AutoML still requires `schemas/train.schema.json` and `references/spec_template_train.yaml` to exist and parse. Use the packaged train schema for `automl_default_parameters`, `automl_disabled_parameters`, defaults, min/max bounds, enums, option weights, math conditions, dependencies, and popular parameters. Do not expect `~/tao-core` to be available at runtime; maintainers regenerate the schemas and templates before packaging the skill bank.
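
The "exist and parse" requirement for runnable AutoML can be pre-checked along the lines below. This is a minimal sketch that assumes only the directory layout named above. `automl_artifacts_ready` is a hypothetical helper, not part of the skill bank. It confirms that the train schema parses as JSON, but it checks the YAML template only for existence, because parsing YAML needs a third-party library such as PyYAML (`yaml.safe_load`).

```python
import json
from pathlib import Path


def automl_artifacts_ready(skill_dir: Path) -> bool:
    """Hypothetical pre-check: are the train schema and train template packaged?

    Checks that both files exist and that the train schema parses as JSON.
    The YAML template is only checked for existence; parsing it would need
    a YAML library (e.g. yaml.safe_load from PyYAML).
    """
    schema_path = skill_dir / "schemas" / "train.schema.json"
    template_path = skill_dir / "references" / "spec_template_train.yaml"
    if not (schema_path.is_file() and template_path.is_file()):
        return False
    try:
        json.loads(schema_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    return True
```

A `False` result corresponds to the "AutoML is enabled but not runnable until schemas are generated" case.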

Train Action Policy

This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml` and resolve the run override from either an explicit `automl_policy` value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as `automl_policy: off` for this run only; otherwise default to `auto`. When `automl_policy: auto`, `automl_enabled: true`, and both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged, route the train action through `tao-skill-bank:tao-run-automl` by default with this model's `skill_dir`. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`. Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until the schemas are generated.

Non-train actions such as `evaluate`, `inference`, `export`, and deploy flows stay in this model skill. The per-run `automl_policy` override does not change the model metadata.
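
The routing rule above can be sketched as two small functions. Both are hypothetical helpers, not a skill-bank API. Intent detection is simplified to case-insensitive substring matching on the listed phrases. `artifacts_ready` stands for "the packaged train schema and template exist and parse".

```python
from typing import Optional, Tuple

# Phrases that turn AutoML off for a single run (from the policy text).
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_automl_policy(explicit_policy: Optional[str], user_request: str) -> str:
    """An explicit automl_policy wins; otherwise infer 'off' from phrasing, else 'auto'."""
    if explicit_policy is not None:
        return explicit_policy
    request = user_request.lower()
    if any(phrase in request for phrase in OFF_PHRASES):
        return "off"
    return "auto"


def route_train_action(policy: str, automl_enabled: bool,
                       artifacts_ready: bool) -> Tuple[str, Optional[str]]:
    """Return (route, note_for_user)."""
    if policy == "auto" and automl_enabled:
        if artifacts_ready:
            return "tao-skill-bank:tao-run-automl", None
        # Missing schema/template: fall back to direct training and tell the user.
        return "direct", ("AutoML is enabled but not runnable for this model "
                          "until schemas are generated.")
    return "direct", None
```

The override is resolved per run and never written back, which matches the rule that `automl_policy` does not change model metadata.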

Credentials

`HF_TOKEN` (required): HuggingFace access token. The user must accept the model agreement at https://huggingface.co/nvidia/Cosmos-Reason2-8B and provide a token with read access. Passed to the container as a `docker_env_var`.
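
Failing fast on a missing token avoids the gated-repo authentication loop inside the container. The sketch below is hypothetical. It assumes `docker_env_var` can be represented as a flat name-to-value mapping, which the text above does not specify.

```python
import os
from typing import Dict, Mapping


def build_docker_env_vars(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect HF_TOKEN for the container, failing before launch if it is unset."""
    token = environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Accept the model agreement at "
            "https://huggingface.co/nvidia/Cosmos-Reason2-8B and provide a read token."
        )
    # Assumption: docker_env_var is a flat name -> value mapping.
    return {"HF_TOKEN": token}
```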

Datasets

Dataset type is vlm in llava format; accepted intents are training, evaluation, and testing. Inputs may be dataset roots (root mode maps `<root>/annotations.json` as the annotation file plus `<root>` as the media path) or direct spec-key paths (when annotations and media live in different locations). Before launching train/AutoML/evaluate, sample the annotation JSON and require `video_fps` in each record. A missing `video_fps` makes the Cosmos-RL SFT loader fail with `Error processing sample: 'video_fps'` after the job starts. If it is absent, stop before runner generation and ask the user to fix the annotation files; do not start AutoML only to discover this inside torchrun.
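
A minimal sketch of the `video_fps` pre-check follows. It assumes the LLaVA-format annotation file is a top-level JSON list of record dicts. `records_missing_video_fps` and its sampling parameters are hypothetical.

```python
import json
import random
from pathlib import Path
from typing import List


def records_missing_video_fps(annotation_path: Path, sample_size: int = 50,
                              seed: int = 0) -> List[int]:
    """Return the indices of sampled annotation records that lack 'video_fps'."""
    records = json.loads(Path(annotation_path).read_text(encoding="utf-8"))
    # Assumption: the annotation file is a top-level JSON list of dicts.
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list of annotation records")
    indices = list(range(len(records)))
    if len(indices) > sample_size:
        # Deterministic sample so repeated checks report the same records.
        indices = random.Random(seed).sample(indices, sample_size)
    return sorted(
        i for i in indices
        if not isinstance(records[i], dict) or "video_fps" not in records[i]
    )
```

If the returned list is non-empty, stop before runner generation and report the offending indices. Sampling keeps the check fast on large files but can miss sparse bad records. For moderate file sizes, passing `sample_size=len(records)` checks everything.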

See `references/datasets.md` for the full training requirements, the launch intake reminder (spec-key options, root-mode mapping, container-image confirmation, and the `check_tao_launch_preflight.py` invocation), the Per-Action Dataset Requirements table, the `data_sources` mapping with direct-override examples, and the eval-dataset / auto-split policy.

Spec Construction

cosmos-rl uses `mode: config`. Always start from `references/spec_template_train.yaml` (or `spec_template_evaluate.yaml` for evaluate): load it via `yaml.safe_load(...)` and apply user overrides on top. The spec the model consumes is nested dicts, not flat dotted keys. The dotted override notation denotes paths into the nested spec, so walk the path and assign at the leaf. Data source overrides are mandatory for every action and must be built from the Per-Action Dataset Requirements table in `references/datasets.md`.

See `references/spec-construction.md` for the load-template-then-override pattern and the full typical override blocks for train (including `policy.model_max_length=81920`, `dp_shard_size`, `dp_replicate_size`, and the LoRA `lora_alpha` and `lora_dropout`), evaluate, quantize, and inference. It also notes that `custom.val_dataset` leaf keys are valid even when absent from the default spec object.
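
The walk-the-path-and-assign-at-the-leaf rule can be sketched as follows. `apply_dotted_overrides` is a hypothetical helper. The template fragment uses illustrative values; only `model_name_or_path` and `model_max_length` match the template defaults listed in this document. Placing `dp_replicate_size` next to `dp_shard_size` under `policy.parallelism` is an assumption, and `annotation_path` is a made-up leaf name used only to show a key that is absent from the template.

```python
import copy
from typing import Any, Dict


def apply_dotted_overrides(spec: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `spec` with each dotted override written at its nested leaf."""
    result = copy.deepcopy(spec)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            if key not in node:
                # Create missing intermediate dicts so leaf keys absent from the
                # template (such as custom.val_dataset.*) can still be set.
                node[key] = {}
            elif not isinstance(node[key], dict):
                raise TypeError(f"cannot descend into non-dict '{key}' in '{dotted_key}'")
            node = node[key]
        node[leaf] = value
    return result


# Illustrative template fragment; in practice this comes from
# yaml.safe_load() on references/spec_template_train.yaml.
template = {
    "policy": {
        "model_name_or_path": "nvidia/Cosmos-Reason2-8B",
        "model_max_length": 40960,
        "parallelism": {"dp_shard_size": 1, "dp_replicate_size": 1},
    },
}

spec = apply_dotted_overrides(template, {
    "policy.model_name_or_path": "hf_model://nvidia/Cosmos-Reason2-8B",
    "policy.model_max_length": 81920,
    "policy.parallelism.dp_shard_size": 8,
    # Hypothetical leaf name, shown only to illustrate a key absent from the template.
    "custom.val_dataset.annotation_path": "/data/val/annotations.json",
})
```

Deep-copying keeps the loaded template reusable across actions. Raising on a non-dict intermediate avoids silently replacing a scalar setting with a nested block.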

Critical Overrides (Train)

These are the keys whose template defaults are wrong or where omission flips the run into a different mode:

| Parameter | Template Default | Required Value | Why |
|---|---|---|---|
| `policy.model_name_or_path` | `nvidia/Cosmos-Reason2-8B` | `hf_model://nvidia/Cosmos-Reason2-8B` (or local checkpoint) | The bare HF id makes cosmos-rl fetch from HF Hub at runtime; the `hf_model://` URI form pre-downloads the weights before the training command starts |
| `policy.model_max_length` | 40960 | Keep at 40960 or higher | Values below ~40k cause a `vision_embeds` shape mismatch on video inputs |
| `train.train_batch_per_replica` | 32 | Any multiple of `train.train_policy.mini_batch` | A mismatch raises an immediate AssertionError |
| `train.train_policy.type` | `"sft"` | Keep as `"sft"` for SFT workflows | If dropped during agent regeneration, cosmos-rl flips to RL mode → rollout replica allocated → multi-node attempted → hostname errors when `num_nodes=1` |
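
Before launch, the four rows of the table above can be checked against a nested spec dict with a sketch like the one below. `check_critical_train_overrides` is hypothetical. It flags only the exact bare HF id, not every possible non-URI value. The `mini_batch` value in the usage note is illustrative.

```python
from typing import Any, Dict, List

BARE_HF_ID = "nvidia/Cosmos-Reason2-8B"
MIN_MODEL_MAX_LENGTH = 40960


def check_critical_train_overrides(spec: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with the critical train overrides (empty if OK)."""
    problems: List[str] = []
    policy = spec.get("policy", {})
    train = spec.get("train", {})
    train_policy = train.get("train_policy", {})

    if policy.get("model_name_or_path") == BARE_HF_ID:
        problems.append(f"policy.model_name_or_path: use hf_model://{BARE_HF_ID} or a local checkpoint")

    if (policy.get("model_max_length") or 0) < MIN_MODEL_MAX_LENGTH:
        problems.append(f"policy.model_max_length: must be >= {MIN_MODEL_MAX_LENGTH} for video SFT")

    batch = train.get("train_batch_per_replica")
    mini_batch = train_policy.get("mini_batch")
    if not batch or not mini_batch or batch % mini_batch != 0:
        problems.append("train.train_batch_per_replica: must be a multiple of train.train_policy.mini_batch")

    if train_policy.get("type") != "sft":
        problems.append('train.train_policy.type: keep as "sft" or cosmos-rl switches to RL mode')

    return problems
```

For example, a spec with `hf_model://nvidia/Cosmos-Reason2-8B`, `model_max_length: 40960`, `train_batch_per_replica: 32`, `mini_batch: 8`, and `type: "sft"` yields an empty list.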

Parameters

`train.train_batch_per_replica` must be divisible by `train.train_policy.mini_batch`; `policy.model_max_length` must be 40960 or higher for video SFT; `policy.parallelism.dp_shard_size` should equal the number of GPUs per node and `dp_replicate_size` the node count; `custom.vision.fps` and `custom.vision.nframes` are mutually exclusive (set exactly one). Cosmos-Reason2-8B has 8B parameters and benefits from multi-GPU FSDP sharding. The recommended setup is 8x A100 or H100 (80GB each).

See `references/parameters.md` for the complete parameter reference: training loop, model & policy, parallelism (including multi-node guidance and platform-skill pointers), optimization & data loading, vision encoders (fps vs nframes details and the decord/torchvision failure mode), checkpointing, validation, logging, and hardware.
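
The parallelism mapping and the fps/nframes rule translate directly into code. This is a sketch with hypothetical helper names. It assumes `dp_replicate_size` sits next to `dp_shard_size` under `policy.parallelism`, and that an unset vision option is either missing or `None`.

```python
from typing import Any, Dict


def parallelism_for(gpus_per_node: int, num_nodes: int) -> Dict[str, int]:
    """Map a cluster shape onto the cosmos-rl parallelism keys."""
    if gpus_per_node < 1 or num_nodes < 1:
        raise ValueError("gpus_per_node and num_nodes must be positive")
    # Shard within a node across its GPUs; keep one replica per node.
    return {"dp_shard_size": gpus_per_node, "dp_replicate_size": num_nodes}


def check_vision_sampling(vision: Dict[str, Any]) -> None:
    """Raise unless exactly one of fps / nframes is set in custom.vision."""
    has_fps = vision.get("fps") is not None
    has_nframes = vision.get("nframes") is not None
    if has_fps == has_nframes:
        raise ValueError("set exactly one of custom.vision.fps or custom.vision.nframes")
```

On the recommended single node with 8x H100, `parallelism_for(8, 1)` gives `dp_shard_size: 8` and `dp_replicate_size: 1`.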

Evaluate

The evaluator reads a flat TOML config with the top-level keys `dataset`, `model`, `task`, `evaluation`, `vision`, `generation`, `metrics`, `results`, `num_gpus`, and `results_dir`. The task type is either `""` (General Evaluator, which auto-detects binary yes/no classification and computes TP/FP/TN/FN/accuracy/precision/recall/F1) or `"its_directionality"` (left/right/straight; do NOT use it for collision detection). The `actions.evaluate` block in `references/skill_info.yaml` declares inputs and outputs; for SDK invocation see `skills/platform/tao-run-platform/SKILL.md`.

See `references/evaluate.md` for the config-format details, task-type notes, LoRA evaluation (checkpoint path via `spec_overrides` with `model.enable_lora`, `model.base_model_path`, and adapter merge behavior), selective download (`{annotation, format, keys}` partial media pull), and the results format and metrics.
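
To make the General Evaluator's binary metrics concrete, the sketch below computes them from yes/no answers. It illustrates the standard metric definitions and is not the evaluator's code. Two choices are assumptions: answers are normalized with `.strip().lower()`, and any prediction other than the positive label counts as negative.

```python
from typing import Dict, Sequence


def binary_yes_no_metrics(predictions: Sequence[str], labels: Sequence[str],
                          positive: str = "yes") -> Dict[str, float]:
    """TP/FP/TN/FN plus accuracy, precision, recall, and F1 for yes/no answers."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    tp = fp = tn = fn = 0
    for pred, label in zip(predictions, labels):
        # Assumption: anything other than the positive label counts as negative.
        pred_pos = pred.strip().lower() == positive
        label_pos = label.strip().lower() == positive
        if pred_pos and label_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif label_pos:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}
```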

Error Patterns

Common failures include CUDA OOM in train (reduce `mini_batch` or raise `dp_shard_size`), OOM during LoRA evaluation, NaN loss, the `vision_embeds` shape mismatch (raise `model_max_length` to 40960), `train_batch_per_replica` not divisible by `mini_batch`, `train_batch_per_replica` larger than the samples per rank (the `'NoneType' object has no attribute 'state_dict'` 0-step crash), a stale dataset cache after changing fps/total_pixels, and the gated-repo authentication loop.

See `references/troubleshooting.md` for the full diagnosis and fix for each error pattern.
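
The 0-step crash can be anticipated with a quick calculation before launch. This sketch assumes the loader splits samples evenly across all `dp_shard_size × dp_replicate_size` data-parallel ranks and drops any remainder. The real loader may pad or split differently, so treat the result as an estimate. Both function names are hypothetical.

```python
def samples_per_rank(num_samples: int, dp_shard_size: int, dp_replicate_size: int) -> int:
    """Samples each data-parallel rank sees, assuming an even split (remainder dropped)."""
    world_size = dp_shard_size * dp_replicate_size
    return num_samples // world_size


def batch_fits_dataset(num_samples: int, train_batch_per_replica: int,
                       dp_shard_size: int, dp_replicate_size: int) -> bool:
    """True if each rank's share of the data can fill one train batch."""
    return train_batch_per_replica <= samples_per_rank(
        num_samples, dp_shard_size, dp_replicate_size)
```

For example, 200 samples on one node with 8 GPUs gives 25 samples per rank. That is too few for the template default `train_batch_per_replica` of 32, so either lower the batch (keeping it a multiple of `mini_batch`) or add data.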

DEFT Support and Parent-Model Inference

Cosmos-RL implements the DEFT workflow contract for video QA tasks (see `config.json` and `workflow/deft/deft.md`). Gap analysis via `scripts/analyze_gaps.py` reads the cosmos-rl `results.json`, compares predictions by exact string match after `.lower().strip()`, and emits a parquet file of failure cases, so eval prompts must force short, constrained answers. Model-specific parent-model inference mappings (evaluate/inference/quantize/train spec fields → inference functions, checkpoint metadata, and `parent_job_id` handling) live in the reference, not in `config.json`.

See `references/deft-and-inference-mappings.md` for the gap-analysis details and limitations, and the full parent-model inference mapping table.
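
The reason short answers matter shows up in a small sketch of the matching rule. The field names `prediction` and `ground_truth` are hypothetical, since the `results.json` schema is not given here. The real script writes failures to parquet, which needs pandas or pyarrow; this sketch only returns a list.

```python
from typing import Any, Dict, Iterable, List


def normalize(answer: str) -> str:
    # Normalization described for the gap analysis: lowercase, then strip whitespace.
    return answer.lower().strip()


def failure_cases(results: Iterable[Dict[str, Any]],
                  pred_key: str = "prediction",
                  label_key: str = "ground_truth") -> List[Dict[str, Any]]:
    """Keep records whose normalized prediction differs from the normalized label."""
    return [
        r for r in results
        if normalize(str(r[pred_key])) != normalize(str(r[label_key]))
    ]
```

Under exact matching, `" Yes"` matches `"yes"`, but a correct yet verbose answer such as `"Yes, the car turns left."` is counted as a failure.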
