nemotron-customize
Purpose
Use this skill to turn a model-customization request into a repo-native Nemotron step pipeline. It plans the step DAG, validates artifact wiring, and creates only the YAML configs needed to run existing steps.
Use it only for inspecting, configuring, validating, running, or submitting
existing Nemotron steps or multi-step training/customization pipelines. If the
request is a frontend, dashboard, visualization, generic ML-advice,
billing/access, or unrelated coding task, stop with a short scope note and do
not inspect the step catalog or edit files in that turn.
Security Notes
This skill may use `Write` to create or modify YAML/README files and `Bash` to
run repository commands. Confirm with the user before file writes or shell
execution. Keep Bash usage scoped to repo-safe commands such as
`uv run nemotron steps ...`, `python -m pytest ...`, `git status/diff`, and
targeted validation commands. Never run environment dumps (`env`, `printenv`,
broad `export`) or commands that expose secret values.
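The allowlist above can be sketched as a small guard function. This is a minimal illustration, not part of the repo: the function name `is_repo_safe_command` is hypothetical, and the sketch is deliberately stricter than the text (it denies every `export` and anything containing `;`, `|`, `&`, or `$`).

```bash
# Hypothetical helper: classify a command line against the allowlist above.
# It prints "allow" or "deny" and never executes the command it inspects.
is_repo_safe_command() {
  case "$1" in
    *";"*|*"|"*|*"&"*|*'$'*)
      printf 'deny\n' ;;   # chaining or expansion could smuggle in a dump
    env|"env "*|printenv|"printenv "*|export|"export "*)
      printf 'deny\n' ;;   # environment dumps named in the text
    "uv run nemotron steps "*|"python -m pytest"*|"git status"*|"git diff"*)
      printf 'allow\n' ;;  # repo-safe commands named in the text
    *)
      printf 'deny\n' ;;   # anything else: confirm with the user first
  esac
}
```

A "deny" result here means "ask the user before running", not that the command is forbidden outright.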
Requirements
- Checkout of this Nemotron repo with `src/nemotron/steps/` present.
- Invoke from the repo root. All paths in this document are repo-root-relative.
- User-provided model, data, hardware, backend, and output constraints before writing configs.
- Backend credentials only when the selected step needs them (translation, W&B, hosted endpoints).
Limitations
- Does not invent new catalog steps when an existing one fits.
- New Python/shell code only in Explorer mode after the gap is explicit.
- Post-training deployment-only requests are out of scope.
Invocation: `/nemotron-customize`. The repo under `src/nemotron/steps/` is the
source of truth; this skill orchestrates and does not duplicate per-step
knowledge.
Priority order: (1) reuse existing repo code, CLIs, recipes, steps, runners,
and configs; (2) add YAML configs for the user's request; (3) generate new
Python/shell only when the repo cannot satisfy the request, and name the gap
first.
For a command request: verify repo root, read the step catalog, read the
selected `step.toml`, verify the requested config exists, read the active env
TOML for any remote `--batch` profile, then emit the complete command. Do not
guess profiles from examples or naming conventions.
Quick Decision Tree
- AutoModel vs Megatron-Bridge: small GPU count, Hugging Face model,
  LoRA/PEFT, or OpenAI-style chat JSONL → AutoModel path `sft/automodel` (or
  the matching PEFT AutoModel step). Large distributed training, packed
  Parquet/binidx data, or full fine-tuning → Megatron-Bridge, but verify
  against `hardware.md` and the category README first.
- BYOB / MCQ benchmark inputs route to `byob/mcq`, NOT
  `translate/nemo_curator`. BYOB preserves the multiple-choice schema
  (question, choices, answer); the translate path would flatten or strip those
  fields. Trigger on phrases like "BYOB benchmark", "MCQ", "evaluation
  benchmark Parquet", "multiple-choice prep".
- Curate then translate: when the user says "curate and translate",
  "filter then translate", or "prep data before translating", chain
  `curate/nemo_curator` (filter raw JSONL) →
  `translate/nemo_curator` (translate curated JSONL). Do not skip the curate
  stage.
- Checkpoint conversion: route "Megatron to HF", "HF export", "convert
  checkpoint", or "iter_* to safetensors" to `convert/megatron_to_hf`; route
  "HF to Megatron" imports to `convert/hf_to_megatron`. Use a concrete
  `iter_*` source for Megatron exports.
- Existing endpoint or checkpoint eval: route hosted endpoint smoke tests
  and benchmark requests to `eval/model_eval`; use `tiny_chat` for hosted chat
  smoke and `default` for Megatron checkpoint evaluation.
- No env TOML profile present: do not invent Lepton or `--batch` profiles; ask
  the user or fall back to local execution.
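The phrase-based routes above can be sketched as a single `case` dispatch. This is a rough illustration only: `route_request` is a hypothetical name, the patterns cover just the trigger phrases quoted in the list, and the `ask` fallback stands in for asking the user. The BYOB/MCQ branch is checked first so that a benchmark request that mentions translation still avoids the translate path.

```bash
# Hypothetical router: map a request phrase to a catalog step ID.
route_request() {
  req=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$req" in
    *byob*|*mcq*|*multiple-choice*|*"evaluation benchmark parquet"*)
      printf 'byob/mcq\n' ;;
    *"curate and translate"*|*"filter then translate"*|*"prep data before translating"*)
      printf 'curate/nemo_curator -> translate/nemo_curator\n' ;;
    *"hf to megatron"*)
      printf 'convert/hf_to_megatron\n' ;;
    *"megatron to hf"*|*"hf export"*|*"convert checkpoint"*|*safetensors*)
      printf 'convert/megatron_to_hf\n' ;;
    *)
      printf 'ask\n' ;;   # no trigger phrase matched: ask the user
  esac
}
```

For example, `route_request "Filter then translate the raw JSONL"` prints the two-step chain, while an unmatched request prints `ask`.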
Required inputs before finalizing configs or commands:
- `model`, `input_path`, `output_dir`, hardware/GPU count, backend/env
  profile, and any needed API key environment variable name such as
  `HF_TOKEN` or an evaluator key.
- For translation commands, also collect `server.url`, target/source
  languages, and the runtime-visible input/output paths.
- For BYOB, collect benchmark/source document path, stage (`prepare`,
  `generate`, `translate`, or `all`), target/source languages when
  translating, and output directory.
- For conversion, collect source checkpoint path, output path, model/config
  source, and whether the source is HF, Megatron `iter_*`, or LoRA adapter.
- For eval, collect endpoint URL/model ID or checkpoint path, task IDs,
  endpoint type, API-key environment variable name, and sample limit.
Response shape for recommendations: `Decision`, `Why`, `Required inputs`,
`Config/command`, `Avoid`, and `Next step`. Always call out the stack to avoid
when the user's constraints make it a poor fit.
How information is split (and where to find it)
| Question | Look here |
|---|---|
| What does step X consume / produce / parameterize? | |
| When/why pick step X over its siblings? | |
| Which step in category C should I pick? | |
| What runner code does step X use? | |
| Cross-step constraint (tokenizer lock, sequence packing, data quality, ...) | |
| Artifact compatibility / | |
| GPU memory / parallelism heuristics | |
| Library API extracts for exceptional code generation | |
| Project scaffold rules, only when repo code cannot support the request | |
| Per-stage code rules, only when repo code cannot support the request | |
If two sources cover the same point and disagree, the deeper, more specific
one wins (`step.toml` > category `README.md` > this file).
Instructions
Pipeline workflow (≥2 stages): Orient → Plan → Act → Verify. Discover
candidate steps, propose a DAG with validated artifact wiring, wait for
approval, create the minimal YAML configs, and re-check before reporting done.
This is not general ML advice: `src/nemotron/steps/` is the source of truth.
Single-step command flow:
- Confirm the repo root has `pyproject.toml` and `src/nemotron/steps/`.
- Run `uv run nemotron steps list --json` when available; otherwise read
  `src/nemotron/steps/STEPS.md`.
- Read the selected step's `step.toml` and the requested checked-in config.
- For remote execution, read `NEMOTRON_ENV_FILE` or a repo-root `env*.toml`
  and pick an actual section whose profile matches the step.
- Emit the full command in one reply; then add brief rationale for the
  config/profile choices. For translation, also read
  `src/nemotron/steps/translate/README.md` and return `Decision`, `Config`,
  `Run`, `Output`, `Env`.
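The first step of the flow above can be sketched as a pre-flight check. This is a minimal sketch under stated assumptions: `check_repo_root` is a hypothetical helper, not a repo command, and the "Blocked" wording is borrowed from the source tiers rather than from any real CLI output.

```bash
# Hypothetical pre-flight check: confirm the directory looks like the repo root.
# $1 is the directory to test (defaults to the current directory).
check_repo_root() {
  root=${1:-.}
  if [ ! -f "$root/pyproject.toml" ]; then
    printf 'Blocked: missing %s/pyproject.toml\n' "$root"
    return 1
  fi
  if [ ! -d "$root/src/nemotron/steps" ]; then
    printf 'Blocked: missing %s/src/nemotron/steps/\n' "$root"
    return 1
  fi
  printf 'ok\n'
}
```

Only after this prints `ok` would the flow continue to `uv run nemotron steps list --json` or to reading `src/nemotron/steps/STEPS.md`.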
Source tiers for command answers — Verified (CLI + manifest + config +
env + dry-run all succeeded), Repo-grounded (manifest/config/env read, no
dry-run), Blocked (a required repo file or env TOML is missing — name it and
stop before guessing).
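The three tiers can be read as a small decision function. The sketch below is illustrative only: `source_tier` and its two 0/1 arguments are hypothetical, and it folds "CLI + manifest + config + env" into a single `files_read` flag for brevity.

```bash
# Hypothetical classifier for the source tiers described above.
#   $1 = 1 if every required repo file and env TOML was read, else 0
#   $2 = 1 if the --dry-run succeeded, else 0
source_tier() {
  files_read=$1
  dry_run_ok=$2
  if [ "$files_read" -ne 1 ]; then
    printf 'Blocked\n'         # a required file is missing: name it and stop
  elif [ "$dry_run_ok" -eq 1 ]; then
    printf 'Verified\n'        # everything read and the dry-run passed
  else
    printf 'Repo-grounded\n'   # files read, but no dry-run
  fi
}
```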
Canonical commands:
```bash
uv run nemotron steps run <step_id> -c <config-or-path> --dry-run
uv run nemotron steps run <step_id> -c <config-or-path> --dry-run --batch <profile>
uv run nemotron steps run <step_id> -c <config-or-path> --batch <profile>
```
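The three canonical forms differ only in whether `--dry-run` and `--batch <profile>` are present. As a rough sketch, a helper can assemble the command string for review before anything is executed. `step_cmd` and its `dry`/`run` argument are hypothetical conveniences, and the helper only prints, it does not run anything.

```bash
# Hypothetical helper: print (not run) the canonical command for a step.
# Usage: step_cmd <step_id> <config-or-path> <dry|run> [profile]
step_cmd() {
  cmd="uv run nemotron steps run $1 -c $2"
  if [ "$3" = "dry" ]; then
    cmd="$cmd --dry-run"
  fi
  if [ -n "${4:-}" ]; then
    cmd="$cmd --batch $4"   # profile must come from the active env TOML
  fi
  printf '%s\n' "$cmd"
}
```

Printing first keeps the confirm-before-execution rule intact: the user sees the exact command, including the profile, before it is submitted.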
Workflow
Four phases, in order: Orient → Plan → Act → Verify. Never skip Verify.
For detailed phase checklists and Explorer-mode implementation rules, read
`references/WORKFLOW.md`.
Operational Nuances
- Smoke configs (`tiny.yaml`, `tiny_chat.yaml`) are wiring tests, not quality
  evidence.
- `${art:...}` references belong in recipe-backed configs; standalone YAML
  uses plain paths.
- Keep pretraining `bin/idx` data and `blend.json` from the same Nemotron
  release.
Examples
- Single step: read manifest + config + env profile, then return a complete
  `uv run nemotron steps run <step_id> -c <config> --dry-run` command.
- Translate (one-shot command): for "translate EN → <lang>" requests, collect
  `server.url`, `model`, source/target language, `api_key_env`, and
  runtime-visible input/output paths first, then emit the full command in one
  reply (do not split across turns):
  ```bash
  uv run nemotron steps run translate/nemo_curator \
    -c <translate-config.yaml> \
    --batch <env-profile-from-env.toml>
  ```
- Curate then translate: chain `curate/nemo_curator` →
  `translate/nemo_curator`. The curate stage produces filtered JSONL that
  becomes the translate stage input. Both steps need YAML overlays; wire
  curate's `output_dir` to translate's `input_glob`.
- BYOB benchmark prep: route MCQ Parquet inputs through `byob/mcq`, not
  `translate/nemo_curator`, so the multiple-choice schema is preserved.
- SFT pipeline: plan the DAG (`data_prep` → `sft/megatron_bridge` or
  `sft/automodel`), validate artifact edges via `types.toml`, then create the
  YAML overlays.
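For the curate-then-translate example, the wiring rule (curate's `output_dir` feeds translate's `input_glob`) can be checked mechanically before writing the overlays. The sketch below is an assumption-laden illustration: the path `out/curated`, the `*.jsonl` suffix, and the helper `edge_is_wired` are all hypothetical, and the real key layout lives in each step's `step.toml`.

```bash
# Hypothetical wiring check: translate's input_glob must point inside
# curate's output_dir. Both values below are placeholder assumptions.
curate_output_dir="out/curated"
translate_input_glob="${curate_output_dir}/*.jsonl"

# Prints "wired" when the glob ($2) lives under the directory ($1).
edge_is_wired() {
  case "$2" in
    "$1"/*) printf 'wired\n' ;;
    *)      printf 'mismatch\n' ;;
  esac
}

edge_is_wired "$curate_output_dir" "$translate_input_glob"
```

Deriving the glob from the directory variable, rather than typing both paths by hand, is what keeps the two overlays from drifting apart.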
Two modes
Catalog mode: a step exists
Fast path: `STEPS.md → category/README.md → step.toml → step.py → adapt YAML config`.
Use whenever the user's request maps to a step in the catalog.
Explorer mode: no repo path supports it
Use only after confirming no existing step, runner, recipe, CLI, or YAML config
surface can satisfy the request. Follow `references/WORKFLOW.md`.
Choosing a mode
| User says | Mode |
|---|---|
| "SFT with Megatron-Bridge / AutoModel" | Catalog |
| "DPO / RLVR / GRPO / RLHF" | Catalog: |
| "Synthesize preference / SFT data" | Catalog: |
| "Translate EN → <lang> for training data" | Catalog: |
| "Curate and translate" / "filter then translate" | Catalog chain: |
| "Curate web text" | Catalog: |
| "BYOB benchmark" / "MCQ benchmark prep" | Catalog: |
| "Train with X exotic backend" | Explorer or ask |
| Post-training deployment-only request | Out of scope; redirect to a more appropriate workflow. |
| Ambiguous | Ask |
Boundaries
Do: build pipelines from existing steps and cite `step.toml` directly;
reuse repo CLIs/runners/recipes first; adapt configs (don't copy
`default.yaml` blindly); ask about hardware/data/backend/output path; surface
tradeoffs (Megatron-Bridge vs AutoModel, full FT vs LoRA); present the plan
and wait for approval.
Don't: invent steps; skip Plan for pipelines ≥2 stages; generate Python or
shell when YAML suffices; import modules outside the step's reference code;
add monitoring/W&B unless asked; tune parallelism beyond `hardware.md` and
`[[strategies]]`; assume GPU count; generate Slurm/Airflow/Kubeflow wrappers;
handle non-training requests in this skill; modify `src/nemotron/steps/`;
restate per-step rules here (link the step's `README.md` instead).
Troubleshooting
| Situation | Action |
|---|---|
| Artifact types do not chain | Recheck |
| Remote profile unclear / | Read the active env TOML; do not guess. |
| Config key unclear | Read the step config, |
| Strategy points to a missing skill file | Skip the load; use the |
| Hardware too small | Show |
| Two failed Act attempts | Stop, explain what was tried and what failed, ask the user how to proceed. |
| No existing repo path matches | Check libraries cited in |