tao-finetune-cosmos-reason

Cosmos-RL

Supervised fine-tuning (SFT) of `nvidia/Cosmos-Reason2-8B` on video reasoning tasks. Pretrained weights come from HuggingFace, not NGC. This is a gated model, so it requires `HF_TOKEN`. Parallelism is FSDP-based. Set the number of GPUs per node with `dp_shard_size` and the node count with `dp_replicate_size`, instead of the standard `num_gpus`/`num_nodes`.
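The GPU/node mapping can be sketched as a small helper that turns a cluster shape into dotted override keys. The helper name `parallelism_overrides` is hypothetical. Placing `dp_replicate_size` under `policy.parallelism`, next to `policy.parallelism.dp_shard_size`, is an assumption.

```python
def parallelism_overrides(gpus_per_node: int, num_nodes: int = 1) -> dict:
    """Translate a cluster shape into Cosmos-RL FSDP keys (hypothetical helper).

    dp_shard_size     -> GPUs per node (FSDP sharding within a node)
    dp_replicate_size -> node count
    ASSUMPTION: dp_replicate_size lives under policy.parallelism next to dp_shard_size.
    """
    if gpus_per_node < 1 or num_nodes < 1:
        raise ValueError("gpus_per_node and num_nodes must both be >= 1")
    return {
        "policy.parallelism.dp_shard_size": gpus_per_node,
        "policy.parallelism.dp_replicate_size": num_nodes,
    }


# A single 8-GPU node: shard the model across all 8 GPUs.
print(parallelism_overrides(8))
```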

When to Use

Use this skill to train, evaluate, quantize, or run inference on Cosmos-Reason2-8B for video question answering and video reasoning. The core workflow is:

1. Confirm `HF_TOKEN` gating.
2. Sample the annotations for `video_fps`.
3. Load the spec template.
4. Apply the critical train overrides below.
5. Launch through the platform skill, or through AutoML when it is enabled.
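The first step of that workflow can be sketched as a fail-fast preflight. The helper `require_hf_token` is hypothetical. It only checks that a token is present. It does not check that the token's account has been granted access to the gated repository.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_TOKEN or raise before any job is launched (hypothetical helper).

    Presence check only: this does not verify that the token has been granted
    access to the gated nvidia/Cosmos-Reason2-8B repository.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; nvidia/Cosmos-Reason2-8B is a gated HuggingFace model"
        )
    return token
```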

Dataclass Schemas

Generated TAO Core schemas are packaged in `schemas/<action>.schema.json`, and `schemas/manifest.json` lists the available actions. Each generated schema also emits `references/spec_template_<action>.yaml` from the schema's top-level `default` field.

AutoML enablement is declared at the model layer in `references/skill_info.yaml` via `automl_enabled`. Runnable AutoML also requires `schemas/train.schema.json` and `references/spec_template_train.yaml` to exist and parse.

Use the packaged train schema for:

- `automl_default_parameters` and `automl_disabled_parameters`
- defaults, min/max bounds, and enums
- option weights, math conditions, and dependencies
- popular parameters

Do not expect `~/tao-core` at runtime. Maintainers regenerate the schemas and templates before packaging the skill bank.
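The "exist and parse" requirement for runnable AutoML can be sketched as a readiness check against a skill directory. The helper `automl_artifacts_ready` is hypothetical. The schema is parsed with the stdlib `json` module. The YAML template is parsed with PyYAML's `yaml.safe_load` only if PyYAML is installed; otherwise the check falls back to confirming the file exists.

```python
import json
from pathlib import Path


def automl_artifacts_ready(skill_dir) -> bool:
    """True if the packaged train schema and template exist and parse (hypothetical helper)."""
    skill_dir = Path(skill_dir)
    schema_path = skill_dir / "schemas" / "train.schema.json"
    template_path = skill_dir / "references" / "spec_template_train.yaml"
    if not (schema_path.is_file() and template_path.is_file()):
        return False
    try:
        schema = json.loads(schema_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    if not isinstance(schema, dict):
        return False
    try:
        import yaml  # PyYAML is optional in this sketch
    except ImportError:
        return True  # cannot parse YAML without PyYAML: existence check only
    try:
        yaml.safe_load(template_path.read_text(encoding="utf-8"))
    except yaml.YAMLError:
        return False
    return True
```

If this returns `False` while `automl_enabled: true`, AutoML is enabled for the model but not runnable until the schemas are regenerated.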

Train Action Policy

This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml`. Then resolve the run override from either an explicit `automl_policy` value or the user's workflow request:

- Phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" mean `automl_policy: off` for this run only.
- Otherwise, default to `auto`.

Route the train action through `tao-skill-bank:tao-run-automl` by default, passing this model's `skill_dir`, when all of the following hold:

- `automl_policy: auto`
- `automl_enabled: true`
- both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged

When routing, preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`.

Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing. If the schema is missing, report that AutoML is enabled but cannot run for this model until schemas are generated.

Non-train actions (such as `evaluate`, `inference`, `export`, and deploy flows) stay in this model skill. The per-run `automl_policy` override does not change model metadata.
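The routing policy above can be sketched as two small functions. All names and return labels here are hypothetical. The phrase matching is a naive lowercase substring check, and only the documented `auto`/`off` policy values are considered.

```python
# Phrases that turn AutoML off for this run only (case-insensitive substring match).
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_automl_policy(explicit=None, request=""):
    """An explicit automl_policy wins; otherwise scan the request text."""
    if explicit is not None:
        return explicit
    text = request.lower()
    return "off" if any(phrase in text for phrase in OFF_PHRASES) else "auto"


def route_train(policy, automl_enabled, schema_packaged, template_packaged):
    """Return a hypothetical routing label for a train request."""
    if policy == "off" or not automl_enabled:
        return "direct"
    if schema_packaged and template_packaged:
        return "automl"  # i.e. tao-skill-bank:tao-run-automl with this skill_dir
    # AutoML is enabled but not runnable: train directly and tell the user
    # that schemas must be generated first.
    return "direct_report_not_runnable"


policy = resolve_automl_policy(request="Plain training on 8 GPUs, please")
print(policy, route_train(policy, True, True, True))
```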

Credentials

Datasets

The dataset type is `vlm` in `llava` format. Accepted intents are training, evaluation, and testing. Inputs can be either of the following:

- Dataset roots. Root mode maps `<root>/annotations.json` as the annotation file and `<root>` as the media path.
- Direct spec-key paths. Use these when annotations and media live in different locations.

Before launching train, AutoML, or evaluate, sample the annotation JSON and require `video_fps` in each record. If `video_fps` is missing, the Cosmos-RL SFT loader fails with `Error processing sample: 'video_fps'`, but only after the job has started. If the field is absent, stop before runner generation and ask the user to fix the annotation files. Do not start AutoML only to discover the problem inside torchrun.

See `references/datasets.md` for:

- the full training requirements
- the launch intake reminder (spec-key options, root-mode mapping, container-image confirmation, and the `check_tao_launch_preflight.py` invocation)
- the Per-Action Dataset Requirements table
- the `data_sources` mapping with direct-override examples
- the eval-dataset / auto-split policy
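The `video_fps` preflight can be sketched as follows. This sketch assumes `annotations.json` is a JSON list of LLaVA-style records and that `video_fps` is a top-level key in each record. The helper names `resolve_root` and `records_missing_video_fps` are hypothetical.

```python
import json
import random
from pathlib import Path


def resolve_root(root):
    """Root mode: <root>/annotations.json as annotations, <root> as the media path."""
    root = Path(root)
    return root / "annotations.json", root


def records_missing_video_fps(annotation_path, sample_size=100, seed=0):
    """Return sorted indices of sampled records that lack a top-level 'video_fps'."""
    with open(annotation_path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of LLaVA-style records")
    indices = list(range(len(records)))
    if len(indices) > sample_size:
        indices = random.Random(seed).sample(indices, sample_size)
    return sorted(
        i for i in indices
        if not isinstance(records[i], dict) or "video_fps" not in records[i]
    )


# Usage: stop before runner generation instead of failing inside torchrun.
# ann_path, media_path = resolve_root("/data/my_video_qa")
# missing = records_missing_video_fps(ann_path)
# if missing:
#     raise SystemExit(f"video_fps missing in records {missing[:10]}; fix annotations first")
```

Sampling keeps the check cheap on large annotation files. Because it is only a sample, a clean result does not guarantee that every record has the field.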

Spec Construction

cosmos-rl uses `mode: config`. Always start from `references/spec_template_train.yaml`, or from `spec_template_evaluate.yaml` for evaluate. Load the template with `yaml.safe_load(...)` and apply user overrides on top.

The spec the model consumes is a set of nested dicts, not flat dotted keys. Dotted override notation denotes a path into the nested spec, so walk the path and assign at the leaf. Data source overrides are mandatory for every action. Build them from the Per-Action Dataset Requirements table in `references/datasets.md`.

See `references/spec-construction.md` for:

- the load-template-then-override pattern
- the full typical override blocks for train (including `policy.model_max_length=81920`, `dp_shard_size`/`dp_replicate_size`, and LoRA `lora_alpha`/`r`/`lora_dropout`), evaluate, quantize, and inference
- the note that `custom.val_dataset` leaf keys are valid even when absent from the default spec object
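The load-then-override pattern can be sketched with a dotted-path walker that creates missing intermediate levels. This matches the note that `custom.val_dataset` leaves are valid even when absent from the template. The helper names are hypothetical. PyYAML is imported lazily so the override helpers run without it.

```python
import copy


def load_template(path):
    """Load a spec template with PyYAML's safe loader (imported lazily)."""
    import yaml
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


def set_dotted(spec, dotted_key, value):
    """Walk a dotted path into nested dicts, creating missing levels, and assign at the leaf."""
    *parents, leaf = dotted_key.split(".")
    node = spec
    for key in parents:
        child = node.get(key)
        if child is None:
            child = node[key] = {}
        elif not isinstance(child, dict):
            raise TypeError(f"{key!r} in {dotted_key!r} is not a mapping")
        node = child
    node[leaf] = value


def apply_overrides(template, overrides):
    """Return a new nested spec with dotted-key overrides applied; the template is untouched."""
    spec = copy.deepcopy(template)
    for dotted_key, value in overrides.items():
        set_dotted(spec, dotted_key, value)
    return spec


# spec = apply_overrides(load_template("references/spec_template_train.yaml"),
#                        {"policy.model_max_length": 81920})
```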

Critical Overrides (Train)

These train keys need explicit handling. Some template defaults are wrong, and lowering or dropping others puts the run into a different mode:

| Parameter | Template Default | Required Value | Why |
|---|---|---|---|
| `policy.model_name_or_path` | `nvidia/Cosmos-Reason2-8B` | `hf_model://nvidia/Cosmos-Reason2-8B` (or a local checkpoint) | The bare HF id makes cosmos-rl fetch from HF Hub at runtime. The `hf_model://` URI form pre-downloads the weights before the training command starts. |
| `policy.model_max_length` | `40960` | Keep at 40960 or higher | Values below ~40k cause a `vision_embeds` shape mismatch on video inputs. |
| `train.train_batch_per_replica` | `32` | Any multiple of `train.train_policy.mini_batch` | A mismatch raises an immediate `AssertionError`. |
| `train.train_policy.type` | `"sft"` | Keep as `"sft"` for SFT workflows | If the key is dropped during agent regeneration, cosmos-rl switches to RL mode. It then allocates a rollout replica, attempts a multi-node launch, and fails with hostname errors when `num_nodes=1`. |
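The table can be turned into a pre-launch check on the nested spec. `critical_train_problems` and `get_dotted` are hypothetical helpers. They return a list of human-readable problems rather than raising, so every issue can be reported at once.

```python
BARE_HF_ID = "nvidia/Cosmos-Reason2-8B"
MIN_MODEL_MAX_LENGTH = 40960


def get_dotted(spec, dotted_key, default=None):
    """Read a dotted path from nested dicts; return `default` if any level is missing."""
    node = spec
    for key in dotted_key.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def critical_train_problems(spec):
    """List violations of the critical train overrides (hypothetical pre-launch check)."""
    problems = []
    model = get_dotted(spec, "policy.model_name_or_path")
    if model is None or model == BARE_HF_ID:
        problems.append(f"policy.model_name_or_path: use hf_model://{BARE_HF_ID} or a local checkpoint")
    max_len = get_dotted(spec, "policy.model_max_length")
    if not isinstance(max_len, int) or max_len < MIN_MODEL_MAX_LENGTH:
        problems.append(f"policy.model_max_length: must be >= {MIN_MODEL_MAX_LENGTH}, got {max_len!r}")
    batch = get_dotted(spec, "train.train_batch_per_replica")
    mini = get_dotted(spec, "train.train_policy.mini_batch")
    if not (isinstance(batch, int) and isinstance(mini, int) and mini > 0 and batch % mini == 0):
        problems.append(
            f"train.train_batch_per_replica ({batch!r}) must be a multiple of "
            f"train.train_policy.mini_batch ({mini!r})"
        )
    if get_dotted(spec, "train.train_policy.type") != "sft":
        problems.append('train.train_policy.type: must stay "sft" for SFT workflows')
    return problems
```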

Parameters

Key parameter constraints:

- `train.train_batch_per_replica` must be divisible by `train.train_policy.mini_batch`.
- `policy.model_max_length` must be 40960 or higher for video SFT.
- `policy.parallelism.dp_shard_size` should equal the number of GPUs per node, and `dp_replicate_size` the node count.
- `custom.vision.fps` and `custom.vision.nframes` are mutually exclusive; set exactly one.

Cosmos-Reason2-8B has 8B parameters and benefits from multi-GPU FSDP sharding. The recommended hardware is 8x A100 or H100 (80GB each).

See `references/parameters.md` for the complete parameter reference: training loop, model and policy, parallelism (including multi-node guidance and platform-skill pointers), optimization and data loading, vision encoders (fps vs nframes details and the decord/torchvision failure mode), checkpointing, validation, logging, and hardware.
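The fps/nframes exclusivity rule can be enforced when editing the spec. `set_vision_sampling` is hypothetical. It assumes an absent key counts as "unset", so it removes the other key rather than setting it to `None`. Per the error patterns, changing fps later can also leave a stale dataset cache.

```python
def set_vision_sampling(spec, fps=None, nframes=None):
    """Set exactly one of custom.vision.fps / custom.vision.nframes (hypothetical helper).

    ASSUMPTION: removing the other key is how "unset" is expressed in the spec.
    """
    if (fps is None) == (nframes is None):
        raise ValueError("set exactly one of custom.vision.fps or custom.vision.nframes")
    vision = spec.setdefault("custom", {}).setdefault("vision", {})
    if fps is not None:
        vision["fps"] = fps
        vision.pop("nframes", None)
    else:
        vision["nframes"] = nframes
        vision.pop("fps", None)
    return spec
```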

Evaluate

The evaluator reads a flat TOML config with these top-level keys: `dataset`, `model`, `task`, `evaluation`, `vision`, `generation`, `metrics`, `results`, `num_gpus`, and `results_dir`.

The task type is one of:

- `""` (General Evaluator). It auto-detects binary yes/no classification and computes TP/FP/TN/FN, accuracy, precision, recall, and F1.
- `"its_directionality"` (left/right/straight). Do NOT use it for collision detection.

The `actions.evaluate` block in `references/skill_info.yaml` declares inputs and outputs. For SDK invocation, see `skills/platform/tao-run-platform/SKILL.md`.

See `references/evaluate.md` for:

- the config-format detail and task-type notes
- LoRA evaluation: the checkpoint path via `spec_overrides` with `model.enable_lora`/`model.base_model_path`, and adapter merge behavior
- selective download (`{annotation, format, keys}` partial media pull)
- the results format and metrics
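For reference, the binary metrics the General Evaluator reports follow the standard confusion-matrix definitions. The sketch below is not the evaluator's code. It assumes answers are normalized with lowercase and strip, treats `"yes"` as the positive class, and counts every other answer as negative.

```python
def binary_metrics(predictions, labels, positive="yes"):
    """Confusion counts plus accuracy/precision/recall/F1 for yes/no answers."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    tp = fp = tn = fn = 0
    for pred, label in zip(predictions, labels):
        p = pred.strip().lower() == positive   # predicted positive?
        t = label.strip().lower() == positive  # actually positive?
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }


print(binary_metrics(["yes", "no", "Yes ", "no"], ["yes", "yes", "no", "no"]))
```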

Error Patterns

Common failures include:

- CUDA OOM in train (reduce `mini_batch` or raise `dp_shard_size`)
- OOM during LoRA evaluation
- NaN loss
- the `vision_embeds` shape mismatch (raise `model_max_length` to at least 40960)
- `train_batch_per_replica` not divisible by `mini_batch`
- `train_batch_per_replica` larger than the number of samples per rank (a crash at step 0 with `'NoneType' object has no attribute 'state_dict'`)
- a stale dataset cache after changing fps/total_pixels
- the gated-repo authentication loop

See `references/troubleshooting.md` for the full diagnosis and fix for each error pattern.
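The batch-vs-dataset crash can be caught with a rough pre-launch estimate. This is an assumption, not cosmos-rl's exact rule. It assumes samples are split evenly across all data-parallel ranks (`dp_shard_size * dp_replicate_size`). The helper name is hypothetical, and `references/troubleshooting.md` remains the authority.

```python
def batch_exceeds_samples_per_rank(num_samples, train_batch_per_replica,
                                   dp_shard_size, dp_replicate_size=1):
    """Rough check for the step-0 'state_dict' crash (hypothetical helper).

    ASSUMPTION: samples are divided evenly across dp_shard_size * dp_replicate_size ranks.
    """
    world_size = dp_shard_size * dp_replicate_size
    if world_size < 1:
        raise ValueError("dp_shard_size * dp_replicate_size must be >= 1")
    samples_per_rank = num_samples // world_size
    return train_batch_per_replica > samples_per_rank


# 100 samples on one 8-GPU node gives ~12 samples per rank, below the template batch of 32.
print(batch_exceeds_samples_per_rank(100, 32, 8))
```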

DEFT Support and Parent-Model Inference

Cosmos-RL implements the DEFT workflow contract for video QA tasks (see `config.json` and `workflow/deft/deft.md`).

Gap analysis via `scripts/analyze_gaps.py` works as follows:

1. It reads the cosmos-rl `results.json`.
2. It compares predictions by exact string match after `.lower().strip()`.
3. It emits a parquet file of failure cases.

Because matching is exact, eval prompts must force short, constrained answers.

Model-specific parent-model inference mappings live in the reference, not in `config.json`. These cover evaluate/inference/quantize/train spec fields → inference functions, checkpoint metadata, and `parent_job_id` handling.

See `references/deft-and-inference-mappings.md` for the gap-analysis detail and its limitation, and for the full parent-model inference mapping table.
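The exact-match rule shows why answers must be short. The sketch below reproduces only the comparison step, not `scripts/analyze_gaps.py` itself. The record keys `prediction` and `ground_truth` are hypothetical stand-ins for whatever `results.json` actually uses, and the parquet output is omitted.

```python
def normalize(answer):
    """Normalization described for the gap analysis: lowercase, then strip whitespace."""
    return answer.lower().strip()


def failure_cases(records, pred_key="prediction", gt_key="ground_truth"):
    """Records whose normalized prediction differs from the normalized label.

    The key names are hypothetical; check the actual results.json schema.
    """
    return [r for r in records if normalize(str(r[pred_key])) != normalize(str(r[gt_key]))]


results = [
    {"prediction": " Yes ", "ground_truth": "yes"},                    # matches after normalization
    {"prediction": "Yes, the car turns left.", "ground_truth": "yes"},  # verbose answer: a failure
]
print(len(failure_cases(results)))  # → 1
```

A verbose but correct answer is counted as a gap, which would inflate the failure set that DEFT works from.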