tao-finetune-cosmos-reason
# Cosmos-RL
Supervised fine-tuning (SFT) of nvidia/Cosmos-Reason2-8B on video reasoning tasks. Pretrained weights are sourced from HuggingFace, not NGC. This is a gated model and requires `HF_TOKEN`. Parallelism is FSDP-based: `dp_shard_size` sets the GPU count and `dp_replicate_size` sets the node count (instead of the standard `num_gpus`/`num_nodes`).
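As an illustration of the key mapping above, the sketch below turns conventional GPU and node counts into the FSDP keys. It assumes `num_gpus` means GPUs per node and that both keys sit under `policy.parallelism` in the nested spec; `parallelism_overrides` is a hypothetical helper, not part of cosmos-rl or TAO.

```python
def parallelism_overrides(num_gpus: int, num_nodes: int) -> dict:
    """Translate GPU/node counts into cosmos-rl FSDP keys (hypothetical helper).

    Assumes num_gpus is GPUs per node and that both keys live under
    policy.parallelism in the nested spec.
    """
    if num_gpus < 1 or num_nodes < 1:
        raise ValueError("num_gpus and num_nodes must be positive")
    return {
        "policy": {
            "parallelism": {
                "dp_shard_size": num_gpus,       # GPU count goes here
                "dp_replicate_size": num_nodes,  # node count goes here
            }
        }
    }
```

For example, `parallelism_overrides(8, 2)` yields `dp_shard_size: 8` and `dp_replicate_size: 2`; the result can be merged into the loaded spec like any other override.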
## When to Use
Use this skill to train, evaluate, quantize, or run inference on Cosmos-Reason2-8B for video question answering and video reasoning. The core workflow is: confirm `HF_TOKEN` gating, sample the annotations to confirm `video_fps` is present, load the spec template, apply the critical train overrides below, then launch through the platform skill (or through AutoML when it is enabled).
## Dataclass Schemas
Generated TAO Core schemas are packaged in `schemas/<action>.schema.json`, with `schemas/manifest.json` listing the available actions. Each generated schema also emits `references/spec_template_<action>.yaml` from the schema's top-level `default` field. AutoML enablement is declared at the model layer in `references/skill_info.yaml` via `automl_enabled`. Runnable AutoML additionally requires `schemas/train.schema.json` and `references/spec_template_train.yaml` to exist and parse. Use the packaged train schema for `automl_default_parameters`, `automl_disabled_parameters`, defaults, min/max bounds, enums, option weights, math conditions, dependencies, and popular parameters. Do not expect `~/tao-core` at runtime; maintainers regenerate schemas and templates before packaging the skill bank.
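A minimal sketch of applying the schema's bounds and enums to a single user override before launch. It assumes each property in the packaged train schema is a dict that uses the standard JSON Schema keywords `minimum`, `maximum`, and `enum`; the generated TAO Core schemas may encode these differently, and `check_schema_bounds` is a hypothetical helper.

```python
from typing import Any


def check_schema_bounds(prop: dict, value: Any) -> list:
    """Return a list of problems found when checking value against one property.

    Assumes standard JSON Schema keywords; the TAO Core layout may differ.
    """
    problems = []
    if "enum" in prop and value not in prop["enum"]:
        problems.append(f"{value!r} not in {prop['enum']}")
    if "minimum" in prop and value < prop["minimum"]:
        problems.append(f"{value!r} < minimum {prop['minimum']}")
    if "maximum" in prop and value > prop["maximum"]:
        problems.append(f"{value!r} > maximum {prop['maximum']}")
    return problems
```

An empty list means the value passes these checks; dependencies and math conditions from the schema are not covered by this sketch.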
## Train Action Policy
This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml` and resolve the `automl_policy` run override from either an explicit value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as `automl_policy: off` for this run only; otherwise default to `auto`. When `automl_policy: auto`, `automl_enabled: true`, and both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged, route the train action through `tao-skill-bank:tao-run-automl` by default, passing this model's `skill_dir`. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`. Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled for this model but not runnable until the schemas are generated.

Non-train actions such as `evaluate`, `inference`, `export`, and deploy flows stay in this model skill. The per-run `automl_policy` override does not change model metadata.
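The policy above can be sketched as two small functions: one resolves the per-run `automl_policy` and one picks the train route. This is an illustration under assumptions, not the skill bank's implementation: an explicit `auto`/`off` value is assumed to win over phrasing in the request, phrase matching is a simple case-insensitive substring check, and the function names and return values are hypothetical.

```python
from typing import Optional, Tuple

# Phrases that switch AutoML off for this run only (from the policy above).
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_automl_policy(explicit: Optional[str], request: str) -> str:
    """Return "auto" or "off" for this run (hypothetical helper)."""
    if explicit in ("auto", "off"):
        return explicit  # assumption: an explicit value wins over phrasing
    text = " ".join(request.lower().split())  # normalize case and whitespace
    return "off" if any(phrase in text for phrase in OFF_PHRASES) else "auto"


def route_train(policy: str, automl_enabled: bool,
                schema_packaged: bool, template_packaged: bool) -> Tuple[str, Optional[str]]:
    """Pick the train route plus an optional message to report to the user."""
    if policy == "off" or not automl_enabled:
        return "direct", None
    if schema_packaged and template_packaged:
        return "tao-skill-bank:tao-run-automl", None
    # AutoML is enabled, but the train schema or template is missing.
    return "direct", "AutoML is enabled but not runnable until schemas are generated"
```

Because the resolved value only feeds the routing decision, the model's `automl_enabled` metadata is never modified.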
## Credentials
- HF_TOKEN (required): HuggingFace access token. The user must accept the model agreement at https://huggingface.co/nvidia/Cosmos-Reason2-8B and provide a token with read access. It is passed to the container as a `docker_env_var`.
## Datasets
Dataset type is vlm in llava format; accepted intents are training, evaluation, and testing. Inputs may be dataset roots (root mode maps `<root>/annotations.json` as the annotation file and `<root>` as the media path) or direct spec-key paths (when annotations and media live in different locations). Before launching train, AutoML, or evaluate, sample the annotation JSON and require `video_fps` in each record. If `video_fps` is missing, the Cosmos-RL SFT loader fails with `Error processing sample: 'video_fps'`, but only after the job has started. When it is absent, stop before runner generation and ask the user to fix the annotation files; do not start AutoML and discover the problem inside torchrun.

See `references/datasets.md` for the full training requirements, the launch intake reminder (spec-key options, root-mode mapping, container-image confirmation, and the `check_tao_launch_preflight.py` invocation), the Per-Action Dataset Requirements table, the `data_sources` mapping with direct-override examples, and the eval-dataset / auto-split policy.
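A minimal preflight sketch for the `video_fps` requirement, assuming the llava-format annotation file is a single JSON list of record dicts (other layouts would need a different loader). `root_mode_paths` and `records_missing_video_fps` are hypothetical helpers and are not a substitute for the documented `check_tao_launch_preflight.py` invocation.

```python
import json
import random
from pathlib import Path
from typing import List, Optional, Tuple


def root_mode_paths(root: str) -> Tuple[Path, Path]:
    """Root mode: annotations at <root>/annotations.json, media under <root>."""
    base = Path(root)
    return base / "annotations.json", base


def records_missing_video_fps(annotation_path, sample_size: Optional[int] = 50,
                              seed: int = 0) -> List[int]:
    """Return sorted indices of sampled records that lack a video_fps key.

    Assumes the file is a JSON list of dicts. sample_size=None checks every record.
    """
    with open(annotation_path, encoding="utf-8") as f:
        records = json.load(f)
    indices = list(range(len(records)))
    if sample_size is not None and sample_size < len(indices):
        indices = random.Random(seed).sample(indices, sample_size)  # reproducible sample
    return sorted(i for i in indices if "video_fps" not in records[i])
```

Call `records_missing_video_fps` on the annotation path from `root_mode_paths(root)` (or on the direct spec-key path); a non-empty result means stopping before runner generation and asking the user to fix the annotations.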
## Spec Construction
cosmos-rl is `mode: config`. Always start from `references/spec_template_train.yaml` (or `spec_template_evaluate.yaml` for evaluate): load it via `yaml.safe_load(...)` and apply user overrides on top. The spec the model consumes is a nested dict, not a set of flat dotted keys; the dotted override notation denotes paths into the nested spec, so walk the path and assign at the leaf. Data source overrides are mandatory for every action and must be built from the Per-Action Dataset Requirements table in `references/datasets.md`.

See `references/spec-construction.md` for the load-template-then-override pattern and the full typical override blocks for train (including `policy.model_max_length=81920`, `dp_shard_size`/`dp_replicate_size`, and LoRA `lora_alpha`/`r`/`lora_dropout`), evaluate, quantize, and inference, plus the note that `custom.val_dataset` leaf keys are valid even when absent from the default spec object.
## Critical Overrides (Train)
These are the keys whose template defaults are wrong, or whose omission flips the run into a different mode:

| Parameter | Template Default | Required Value | Why |
|---|---|---|---|
| | | | The bare HF id makes cosmos-rl fetch from HF Hub at runtime; the … |
| `policy.model_max_length` | 40960 | Keep at 40960 or higher | Smaller than ~40k causes a `vision_embeds` shape mismatch on video inputs |
| `train.train_batch_per_replica` | 32 | Any multiple of `train.train_policy.mini_batch` | Mismatch raises an immediate AssertionError |
| | | Keep unchanged for SFT workflows | If dropped during agent regeneration, cosmos-rl flips to RL mode → rollout replica allocated → multi-node attempted → hostname errors when … |
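A hedged sketch of checking the two rows whose keys are known before launch. It assumes the nested paths `policy.model_max_length`, `train.train_batch_per_replica`, and `train.train_policy.mini_batch`; the rows with missing parameter names are not checked, and both function names are hypothetical.

```python
from typing import Any, List


def get_path(spec: dict, dotted: str, default: Any = None) -> Any:
    """Read a value from a nested dict by dotted path, or return default."""
    node: Any = spec
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def check_critical_train_overrides(spec: dict) -> List[str]:
    """Return human-readable problems with the known critical train keys."""
    problems = []
    max_len = get_path(spec, "policy.model_max_length")
    if max_len is None or max_len < 40960:
        problems.append(f"policy.model_max_length={max_len}; keep it at 40960 or higher")
    batch = get_path(spec, "train.train_batch_per_replica")
    mini = get_path(spec, "train.train_policy.mini_batch")
    if batch is None or mini is None or mini <= 0 or batch % mini != 0:
        problems.append(f"train.train_batch_per_replica={batch} must be a multiple "
                        f"of train.train_policy.mini_batch={mini}")
    return problems
```

Running this on the spec after overrides are applied surfaces the AssertionError and `vision_embeds` cases without starting a job.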
## Parameters
Key parameters: `train.train_batch_per_replica`, `train.train_policy.mini_batch`, `policy.model_max_length`, `policy.parallelism.dp_shard_size`/`dp_replicate_size`, `custom.vision.fps`/`custom.vision.nframes`.

See `references/parameters.md` for the complete parameter reference: training loop, model & policy, parallelism (including multi-node guidance and platform-skill pointers), optimization & data loading, vision encoders (fps vs. nframes details and the decord/torchvision failure mode), checkpointing, validation, logging, and hardware.
## Evaluate
The evaluator reads a flat TOML config with top-level keys `dataset`, `model`, `task`, `evaluation`, `vision`, `generation`, `metrics`, `results`, `num_gpus`, and `results_dir`. Task type is either `""` (General Evaluator: auto-detects binary yes/no classification and computes TP/FP/TN/FN, accuracy, precision, recall, and F1) or `"its_directionality"` (left/right/straight; do NOT use it for collision detection). The `actions.evaluate` block in `references/skill_info.yaml` declares inputs and outputs; for SDK invocation, see `skills/platform/tao-run-platform/SKILL.md`.

See `references/evaluate.md` for config-format details, task-type notes, LoRA evaluation (checkpoint path via `spec_overrides` with `model.enable_lora`/`model.base_model_path`, and adapter-merge behavior), selective download (`{annotation, format, keys}` partial media pull), and the results format and metrics.
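For reference, the binary metrics named above follow the standard confusion-matrix definitions. The sketch below treats `yes` as the positive class and normalizes answers with `.lower().strip()`; both choices are assumptions, and this is not the General Evaluator's code.

```python
from typing import Dict, Sequence


def binary_metrics(predictions: Sequence[str], labels: Sequence[str],
                   positive: str = "yes") -> Dict[str, float]:
    """Confusion counts plus accuracy/precision/recall/F1 for yes/no answers.

    Any normalized answer other than `positive` counts as negative.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    tp = fp = tn = fn = 0
    for pred, gold in zip(predictions, labels):
        p = pred.lower().strip() == positive
        g = gold.lower().strip() == positive
        if p and g:
            tp += 1
        elif p:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": (tp + tn) / total if total else 0.0,
            "precision": precision, "recall": recall, "f1": f1}
```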
## Error Patterns
Common failures include CUDA OOM in train (reduce `mini_batch` or raise `dp_shard_size`), OOM during LoRA evaluation, NaN loss, the `vision_embeds` shape mismatch (raise `model_max_length` to 40960), `train_batch_per_replica` not divisible by `mini_batch`, `train_batch_per_replica` larger than the samples per rank (the 0-step crash that surfaces as `'NoneType' object has no attribute 'state_dict'`), a stale dataset cache after changing fps/total_pixels, and the gated-repo authentication loop.

See `references/troubleshooting.md` for the full diagnosis and fix for each error pattern.
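The 0-step crash can be caught before launch with a simple arithmetic check. This sketch assumes the dataset is split evenly across `dp_shard_size × dp_replicate_size` ranks, which may not match how cosmos-rl actually distributes samples; `check_batch_fits_rank` is a hypothetical helper.

```python
from typing import Optional


def check_batch_fits_rank(num_samples: int, train_batch_per_replica: int,
                          dp_shard_size: int, dp_replicate_size: int = 1) -> Optional[str]:
    """Return an error message if a rank would hold fewer samples than one batch."""
    if dp_shard_size < 1 or dp_replicate_size < 1:
        raise ValueError("dp_shard_size and dp_replicate_size must be positive")
    world_size = dp_shard_size * dp_replicate_size  # assumed number of data ranks
    samples_per_rank = num_samples // world_size
    if train_batch_per_replica > samples_per_rank:
        return (f"train_batch_per_replica={train_batch_per_replica} exceeds "
                f"{samples_per_rank} samples per rank; the run would take 0 steps")
    return None
```

For a 100-sample dataset on 8 GPUs, each rank holds 12 samples, so a batch of 32 is flagged.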
## DEFT Support and Parent-Model Inference
Cosmos-RL implements the DEFT workflow contract for video QA tasks (see `config.json` and `workflow/deft/deft.md`). Gap analysis via `scripts/analyze_gaps.py` reads the cosmos-rl `results.json`, compares predictions by exact string match after `.lower().strip()`, and emits a parquet file of failure cases, so eval prompts must force short, constrained answers. Model-specific parent-model inference mappings (evaluate/inference/quantize/train spec fields → inference functions, checkpoint metadata, and `parent_job_id` handling) live in the reference, not in `config.json`.

See `references/deft-and-inference-mappings.md` for the gap-analysis details and limitation, and the full parent-model inference mapping table.
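A minimal sketch of the exact-match comparison the gap analysis performs, showing why verbose answers count as failures. The record field names `prediction` and `ground_truth` are hypothetical (the real `results.json` layout may differ), and writing the parquet file is omitted.

```python
from typing import Dict, List


def find_failures(records: List[Dict], pred_key: str = "prediction",
                  gold_key: str = "ground_truth") -> List[Dict]:
    """Return records whose prediction differs from the label after .lower().strip().

    Field names are hypothetical placeholders for the real results.json schema.
    """
    failures = []
    for rec in records:
        pred = str(rec.get(pred_key, "")).lower().strip()
        gold = str(rec.get(gold_key, "")).lower().strip()
        if pred != gold:  # exact string match, no fuzzy or semantic comparison
            failures.append(rec)
    return failures
```

Under these rules, `"Yes, it turns left."` never matches a `yes` label, which is why eval prompts must constrain the answer format.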