nv-generate-mr-brain-finetune
NV-Generate-MR-Brain-Finetune
Purpose
- Used for finetuning the NV-Generate-CTMR `rflow-mr-brain` diffusion UNet from user-supplied NIfTI training volumes.
- Not for clinical interpretation, regulatory use, or approving synthetic data for production training.
- The wrapper stages the config glue locally and delegates execution to existing upstream scripts: `scripts.diff_model_create_training_data`, `scripts.diff_model_train`, and optionally `scripts.diff_model_infer`. It does not execute the notebook.
- Manifest I/O: inputs are `datalist` and `data_base_dir`; outputs are `finetuned_checkpoint`, optional `inference_outputs`, and `result_json`.
- The underlying training contract is the upstream config/env JSON (the same one driven from cell `[10]` of `train_diff_unet_tutorial.ipynb`). The wrapper stages those JSON files for you and exposes the most-tuned fields as CLI flags; the sections below document the fields, their defaults, and how to monitor/tune a run.
Instructions
- Read `skill_manifest.yaml` before changing arguments, side effects, or validation gates.
- Run `scripts/run_mr_brain_finetune.py` from the Medical AI Skills repo root.
- If a host agent exposes `run_script`, use `run_script("scripts/run_mr_brain_finetune.py", args=[...])`; otherwise run the Bash/Python command below.
- Use `--preflight` first when checking a new datalist; remove `--preflight` only when the user explicitly wants to launch GPU finetuning.
- For a staged preflight input bundle directory, use `BUNDLE/preflight_datalist.json` as the datalist and `BUNDLE/preflight_dataset` as `--data-base-dir` when those files are present.
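The bundle rule in the last instruction can be sketched as a small argument builder. This is a minimal sketch, not part of the skill: `preflight_args` is a hypothetical helper, and it assumes `preflight_dataset` is a directory inside the bundle. Only the file names and flags are taken from this document.

```python
from pathlib import Path


def preflight_args(bundle_dir, output_dir, modality="mri_t1"):
    """Build wrapper arguments for a preflight run from a staged bundle.

    Hypothetical helper: only the file names and flags come from this document.
    """
    bundle = Path(bundle_dir)
    datalist = bundle / "preflight_datalist.json"
    dataset = bundle / "preflight_dataset"
    # Use the bundle layout only when both expected entries are present.
    if not datalist.is_file() or not dataset.exists():
        raise FileNotFoundError(
            f"{bundle} is missing preflight_datalist.json or preflight_dataset"
        )
    return [
        str(datalist),
        "--data-base-dir", str(dataset),
        "--output-dir", str(output_dir),
        "--modality", modality,
        "--preflight",  # remove only when GPU finetuning is explicitly requested
    ]
```

The returned list can be passed as `args` to `run_script`, or appended to the Python command line when no host agent is available.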
Examples
Validate and stage a preflight finetune check from an input bundle (the recommended first step: no GPU, no training). This is the single canonical command; replace `INPUT_BUNDLE` and `OUT_DIR` with your paths:

```bash
export NV_GENERATE_ROOT="${NV_GENERATE_ROOT:-.workbench_data/upstreams/NV-Generate-CTMR}" && \
python skills/nv-generate-mr-brain-finetune/scripts/run_mr_brain_finetune.py \
  INPUT_BUNDLE/preflight_datalist.json \
  --data-base-dir INPUT_BUNDLE/preflight_dataset \
  --output-dir OUT_DIR \
  --modality mri_t1 \
  --preflight
```

For real GPU finetuning and other variations, see Usage below.
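The `${NV_GENERATE_ROOT:-...}` default used in the command can be mirrored in Python to check a checkout before launching anything. This is a sketch under assumptions: `resolve_upstream_root` is a hypothetical helper, and the wrapper's real lookup logic may differ. The default path and the three script paths are the ones this document names.

```python
import os
from pathlib import Path

DEFAULT_ROOT = ".workbench_data/upstreams/NV-Generate-CTMR"
REQUIRED_SCRIPTS = (
    "scripts/diff_model_create_training_data.py",
    "scripts/diff_model_train.py",
    "scripts/diff_model_infer.py",
)


def resolve_upstream_root(env=None):
    """Return (root, missing_scripts) for the upstream checkout.

    Hypothetical helper; an unset or empty NV_GENERATE_ROOT falls back to
    DEFAULT_ROOT, matching the shell ':-' expansion.
    """
    env = os.environ if env is None else env
    root = Path(env.get("NV_GENERATE_ROOT") or DEFAULT_ROOT)
    # Report every missing script at once instead of failing on the first.
    missing = [s for s in REQUIRED_SCRIPTS if not (root / s).is_file()]
    return root, missing
```

An empty `missing` list suggests the checkout has the scripts the wrapper delegates to; a non-empty one points at a stale or partial clone.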
Available Scripts
| Script | Purpose | Arguments |
|---|---|---|
| Primary entrypoint declared by | |
Prerequisites
- `NV_GENERATE_ROOT` may point to a current checkout of `https://github.com/NVIDIA-Medtech/NV-Generate-CTMR` containing `scripts/diff_model_create_training_data.py`, `scripts/diff_model_train.py`, and `scripts/diff_model_infer.py`.
- If `NV_GENERATE_ROOT` is unset, the wrapper searches `.workbench_data/upstreams/NV-Generate-CTMR`.
- `CUDA_VISIBLE_DEVICES` is optional and can be used to select the GPU for real training.
- Runtime requirements: NVIDIA CUDA GPU for real training, Python packages from the upstream `requirements.txt`, and downloaded MR-brain weights.
- Side effects: writes staged configs, embeddings, checkpoints, optional inference images, and logs under the caller-provided `--output-dir`; may write model caches under the upstream checkout and `~/.cache/huggingface/`; may contact `https://huggingface.co` for model assets and `https://github.com` for the upstream checkout.
- The datalist is a MONAI-style JSON object with `training[].image` paths relative to `--data-base-dir`. `training[].modality` is optional and defaults to `mri_t1`.
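The datalist contract in the last item can be checked with a few lines of Python before any preflight run. This is a minimal sketch, assuming the top-level key is `training` and each entry carries `image` plus an optional `modality`, as described above. `check_datalist` is a hypothetical helper and is not the wrapper's own validation.

```python
import json
from pathlib import Path


def check_datalist(datalist_path, data_base_dir, default_modality="mri_t1"):
    """Resolve every training entry against the data root.

    Hypothetical helper: returns a list of {"image", "modality"} dicts with
    absolute-or-rooted image paths and the default modality filled in.
    """
    data = json.loads(Path(datalist_path).read_text())
    entries = data.get("training", [])
    if not entries:
        raise ValueError("datalist has no 'training' entries")
    base = Path(data_base_dir)
    resolved = []
    for index, entry in enumerate(entries):
        image = base / entry["image"]  # paths are relative to --data-base-dir
        if not image.is_file():
            raise FileNotFoundError(f"training[{index}].image not found: {image}")
        resolved.append(
            {"image": str(image), "modality": entry.get("modality", default_modality)}
        )
    return resolved
```

A `FileNotFoundError` here usually means the datalist paths and the data root do not match, which is cheaper to find out before staging anything.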
1. Config and environment JSON (adapt to your data)
This is a thin wrapper around the upstream `train_diff_unet_tutorial.ipynb` flow. Each run performs four steps, delegating the heavy lifting to the model author's scripts:
- Stage configs: copy the three config JSONs and rewrite only the run-specific paths and `n_epochs` (notebook cell 15).
- `python -m scripts.diff_model_create_training_data` → latent `*_emb.nii.gz` embeddings (cell 17).
- Write embedding sidecars: a `<emb>.nii.gz.json` per embedding with `spacing`/`modality` (and body-region indices when the model uses them). This is the one piece of glue that lives in the notebook (cell 19), not in upstream `scripts/`, and `diff_model_train` requires it; the skill owns it.
- `python -m scripts.diff_model_train` (cell 21), optionally `python -m scripts.diff_model_infer`.
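The sidecar step is the least obvious of the four, so here is a minimal sketch of what writing one could look like. It assumes a flat JSON object with `spacing` and `modality` keys; the exact schema, including whether `modality` is stored as a name or an integer code and how body-region indices are named, is defined by the upstream notebook and is not reproduced here. `write_sidecar` and its `extra` parameter are hypothetical.

```python
import json
from pathlib import Path


def write_sidecar(embedding_path, spacing, modality, extra=None):
    """Write <emb>.nii.gz.json next to a latent embedding.

    Hypothetical helper; key names beyond 'spacing' and 'modality' are
    supplied by the caller through 'extra'.
    """
    emb = Path(embedding_path)
    # case_emb.nii.gz -> case_emb.nii.gz.json (suffix appended, not replaced)
    sidecar = emb.with_name(emb.name + ".json")
    payload = {"spacing": [float(s) for s in spacing], "modality": modality}
    if extra:
        payload.update(extra)  # e.g. body-region indices when the model uses them
    sidecar.write_text(json.dumps(payload, indent=2))
    return sidecar
```

Appending `.json` to the full file name, rather than swapping the suffix, keeps the sidecar discoverable from the embedding path alone.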
Tune by editing the config JSON, not by adding flags. All training/inference hyperparameters (`lr`, `batch_size`, `cache_rate`, inference `dim`/`spacing`/`num_inference_steps`/`cfg_guidance_scale`, …) live in `config_maisi_diff_model_rflow-mr-brain.json`. Edit the upstream copy, or pass your own with `--model-config FILE` (and `--env-config` / `--model-def` for the other two). The wrapper only ever rewrites the fields below.

Environment JSON (`environment_maisi_diff_model_rflow-mr-brain.json`): fields the wrapper rewrites per run:

| Field | Set from | Notes |
|---|---|---|
| | Root for relative |
| your datalist | Staged copy with per-entry |
| | Latent embeddings, checkpoints, inference images. |
| upstream | Maps modality name → integer code. |
| | Output checkpoint name (default |
| upstream weights / | Starting checkpoint; cleared by |
| upstream weights / | VAE used to encode/decode latents. |
Model config (`config_maisi_diff_model_rflow-mr-brain.json`): the only fields the wrapper touches:

| Field | Set from | Default | Notes |
|---|---|---|---|
| | | Convenience override (cell 15 does the same); wrapper default is small for verification. |
| | from | Kept consistent with the training modality for optional |
Everything else in that file (`lr`, `batch_size`, `cache_rate`, the rest of `diffusion_unet_inference`) is left exactly as written; edit the JSON to change it.

Runtime flags (not config fields): `--num-gpus N` (`>1` launches `torch.distributed.run`), `--no-amp` (disable mixed precision, passed through to `diff_model_train`).

Modality values for `--modality` come from `configs/modality_mapping.json` (`mri`, `mri_t1`, `mri_t2`, `mri_flair`, `mri_swi`, `*_skull_stripped`); a per-case `training[].modality` can be set in place of `--modality`.

For an end-to-end reference including example data download and checkpoint loading, see the upstream tutorial `train_diff_unet_tutorial.ipynb`.
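The "rewrite only a few fields, leave the rest alone" behavior described in this section can be illustrated with a short staging sketch. It assumes `n_epochs` sits under the `diffusion_unet_train` block, alongside `lr` and `batch_size`; that placement is an assumption, and `stage_model_config` is a hypothetical helper rather than the wrapper's code.

```python
import json
from pathlib import Path


def stage_model_config(src, dst, n_epochs):
    """Copy a model-config JSON to dst, overriding only n_epochs.

    Hypothetical helper; the nesting under 'diffusion_unet_train' is assumed.
    """
    config = json.loads(Path(src).read_text())
    # Touch a single field; lr, batch_size, cache_rate, etc. pass through as-is.
    config.setdefault("diffusion_unet_train", {})["n_epochs"] = int(n_epochs)
    dst_path = Path(dst)
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    dst_path.write_text(json.dumps(config, indent=2))
    return config
```

Writing to a separate staged file leaves the upstream copy unmodified, so a later run with different flags starts from the same baseline.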
2. Usage (one-line training)
Preflight only:

```bash
export NV_GENERATE_ROOT="${NV_GENERATE_ROOT:-.workbench_data/upstreams/NV-Generate-CTMR}" && \
python skills/nv-generate-mr-brain-finetune/scripts/run_mr_brain_finetune.py \
  PATH_TO_DATALIST.json \
  --data-base-dir PATH_TO_DATA_ROOT \
  --output-dir runs/nv_generate_mr_brain_finetune_preflight \
  --preflight
```

Preflight bundle input:

```bash
export NV_GENERATE_ROOT="${NV_GENERATE_ROOT:-.workbench_data/upstreams/NV-Generate-CTMR}" && \
python skills/nv-generate-mr-brain-finetune/scripts/run_mr_brain_finetune.py \
  PATH_TO_INPUT_BUNDLE/preflight_datalist.json \
  --data-base-dir PATH_TO_INPUT_BUNDLE/preflight_dataset \
  --output-dir runs/nv_generate_mr_brain_finetune_preflight \
  --preflight
```

GPU finetuning:

```bash
export NV_GENERATE_ROOT="${NV_GENERATE_ROOT:-.workbench_data/upstreams/NV-Generate-CTMR}" && \
python -m pip install -r "$NV_GENERATE_ROOT/requirements.txt" && \
python skills/nv-generate-mr-brain-finetune/scripts/run_mr_brain_finetune.py \
  PATH_TO_DATALIST.json \
  --data-base-dir PATH_TO_DATA_ROOT \
  --output-dir runs/nv_generate_mr_brain_finetune \
  --epochs 2 \
  --modality mri_t1 \
  --run-inference
```

Replace `PATH_TO_DATALIST.json` and `PATH_TO_DATA_ROOT` with the user's actual paths. Do not use the fixture datalist for real training; it is a preflight-only placeholder.
3. Monitor training (TensorBoard)
Point TensorBoard at the run's artifacts directory, which contains the `scripts.diff_model_train` `model_dir` (`OUT_DIR/artifacts/models`):

```bash
python -m pip install tensorboard && \
tensorboard --logdir runs/nv_generate_mr_brain_finetune/artifacts
```

The run summary is written to `OUT_DIR/artifacts/workflow_summary.json` (checkpoint path, embedding sidecars, inference outputs); the JSON the wrapper prints to stdout mirrors the same paths plus `exit_code` and a `stderr_tail` for quick triage.
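For quick triage, the stdout JSON can be reduced to a one-line status. This is a minimal sketch that assumes `exit_code` is an integer and `stderr_tail` is a single string of trailing stderr lines; only those two key names come from this document, and `triage` is a hypothetical helper.

```python
def triage(result):
    """Summarize a parsed wrapper result (a dict) in one line.

    Hypothetical helper; assumes 'exit_code' is an int and 'stderr_tail'
    is a newline-separated string.
    """
    code = result.get("exit_code")
    if code == 0:
        return "ok"
    tail = (result.get("stderr_tail") or "").strip()
    # The last non-empty stderr line is usually the most specific error.
    last_line = tail.splitlines()[-1] if tail else "(no stderr captured)"
    return f"failed (exit_code={code}): {last_line}"
```

Pass it the result of `json.loads` on the wrapper's stdout; the full `workflow_summary.json` remains the place to look up artifact paths.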
4. Hyperparameter tuning and common pitfalls
- Loss not decreasing / unstable: lower `diffusion_unet_train.lr` (default `1e-5`) in the model-config JSON, or keep AMP on (default); `--no-amp` is slower but more numerically stable on older GPUs.
- Out-of-memory: keep `diffusion_unet_train.batch_size` at `1` and `cache_rate` at `0` in the config JSON, and confirm the autoencoder/UNet fit your GPU before scaling. Multi-GPU (`--num-gpus N`) shards the batch via `torch.distributed.run`.
- Few cases / quick check: keep `--epochs` small (the wrapper default `2` is for verification, not convergence; the upstream config ships `1000`).
- Wrong modality conditioning: set `--modality` or per-case `training[].modality` to a value present in `configs/modality_mapping.json`; a mismatch produces a clear error rather than silently mislabeling latents.
- Slow startup on first run: `diff_model_create_training_data` precomputes latent embeddings once; reuse the same `--output-dir` to avoid recomputing them.
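The modality pitfall above amounts to a lookup that fails loudly. The sketch below shows that idea, assuming `configs/modality_mapping.json` is a flat JSON object from modality name to integer code. `modality_code` is a hypothetical helper, and the wrapper's actual error text is not reproduced here.

```python
import json
from pathlib import Path


def modality_code(modality, mapping_path):
    """Look up a modality name, raising a descriptive error on a mismatch.

    Hypothetical helper; assumes the mapping file is a flat name -> code object.
    """
    mapping = json.loads(Path(mapping_path).read_text())
    if modality not in mapping:
        known = ", ".join(sorted(mapping))
        raise ValueError(f"unknown modality {modality!r}; expected one of: {known}")
    return mapping[modality]
```

Running this against every `training[].modality` value before training catches typos such as `t1` instead of `mri_t1` without touching the GPU.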
5. Evaluate the finetuned model
Use the staged checkpoint (`OUT_DIR/artifacts/models/<model_filename>`) as the diffusion UNet for generation, then inspect the synthesized volumes:
- Pass `--run-inference` here for a quick built-in sanity render, or
- Point the `nv-generate-mr-brain` inference skill at the finetuned checkpoint to generate fresh brain MRI volumes for qualitative review.
This skill gates file accounting and command provenance only — anatomical realism and downstream utility must be judged by a domain expert on the generated images.
Limitations
- Requires a current upstream `NV-Generate-CTMR` checkout with the existing diffusion training scripts. The skill itself stages the required config and datalist glue locally and does not depend on the notebook or PR #33.
- Full training can be expensive and is not deterministic across hardware, CUDA, and package versions.
- The wrapper gates file accounting and command provenance, not anatomical realism or downstream model utility.
- Not for clinical deployment, clinical interpretation, autonomous diagnosis, regulatory submission, or production training-data approval.
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| | Clone or update |
| | Fix the datalist or pass the correct data root. |
| CUDA or MONAI import failure | Runtime environment lacks upstream dependencies. | Install |