Onboard 1-node GitHub MR functional tests for GB200 from existing mr-scoped 2-node tests.
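Install the skill:

    npx skill4agent add nvidia/skills onboard-gb200-1node-tests

The new 1-node tests run under the `mr-github` scope and are derived from existing `mr`-scoped tests. Recipe files live in `tests/test_utils/recipes/gb200/`:

| Recipe file | Notes |
|---|---|
| `gpt.yaml` | GPT dense tests |
| `moe.yaml` | MoE tests |
| `moe-1node.yaml` | Existing 1-node MoE tests |
| `gpt-1node.yaml` | 1-node GPT tests (create if not present) |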
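
Each test case keeps its config at `tests/functional_tests/test_cases/{model}/{test_case}/model_config.yaml`. The 1-node variant adds the `_1node` suffix to the test-case directory: `tests/functional_tests/test_cases/{model}/{test_case}_1node/model_config.yaml`.

Candidates are the `products:` entries in `gpt.yaml` and `moe.yaml` with `scope: [mr, ...]` or `scope: [mr-slim, ...]`. Check the `*-1node.yaml` recipes for tests that already have an `mr-github` entry. The scopes `nightly`, `weekly`, and `mr-broken` are not `mr` scopes.

For each candidate, read the parallelism and batch settings from its `model_config.yaml`:

    --tensor-model-parallel-size (TP)
    --pipeline-model-parallel-size (PP)
    --expert-model-parallel-size (EP)
    --expert-tensor-parallel-size (ETP)
    --context-parallel-size (CP)
    --global-batch-size
    --micro-batch-size

The 1-node layout (4 GPUs) must satisfy `world_size = TP × PP × DP` and `DP ≥ EP`.

| Condition | Action |
|---|---|
| TP × PP ≤ 4 and EP ≤ 1-node DP | Trivial copy. Config unchanged; DP is halved automatically. |
| TP × PP > 4 | Reduce PP. |
| EP > 1-node DP | Reduce EP. |
| Both of the above | Reduce both PP and EP as above. |
| ETP test (ep × etp ≤ TP × DP) | Check that the inequality still holds with the 1-node DP. |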
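
The flag list and the relation `world_size = TP × PP × DP` above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: the helper names (`read_parallelism`, `data_parallel_size`), the short labels, the `MODEL_ARGS:` wrapper, and the `--flag: value` line shape in the sample text are all assumptions about how a `model_config.yaml` might look.

```python
import re

# Flags named in the list above; the short labels are only for this sketch.
FLAGS = {
    "--tensor-model-parallel-size": "tp",
    "--pipeline-model-parallel-size": "pp",
    "--expert-model-parallel-size": "ep",
    "--expert-tensor-parallel-size": "etp",
    "--context-parallel-size": "cp",
    "--global-batch-size": "gbs",
    "--micro-batch-size": "mbs",
}

# Assumed line shape inside model_config.yaml: "  --flag-name: 4"
FLAG_LINE = re.compile(r"^\s*(--[a-z0-9-]+)\s*:\s*(\d+)\s*$")


def read_parallelism(config_text):
    """Return the integer flags found in the text, keyed by short label."""
    found = {}
    for line in config_text.splitlines():
        match = FLAG_LINE.match(line)
        if match and match.group(1) in FLAGS:
            found[FLAGS[match.group(1)]] = int(match.group(2))
    return found


def data_parallel_size(world_size, tp=1, pp=1):
    """Solve world_size = TP x PP x DP for DP; None if TP x PP does not divide it."""
    if world_size % (tp * pp) != 0:
        return None
    return world_size // (tp * pp)


# Hypothetical config fragment, used only to exercise the helpers.
sample = """
MODEL_ARGS:
  --tensor-model-parallel-size: 2
  --pipeline-model-parallel-size: 1
  --global-batch-size: 32
  --micro-batch-size: 4
"""
layout = read_parallelism(sample)
dp_two_nodes = data_parallel_size(8, layout["tp"], layout["pp"])
dp_one_node = data_parallel_size(4, layout["tp"], layout["pp"])
print(layout, dp_two_nodes, dp_one_node)
# {'tp': 2, 'pp': 1, 'gbs': 32, 'mbs': 4} 4 2
```

With TP and PP unchanged, moving from 8 to 4 GPUs halves DP, which is the "trivial copy" case. A `None` result means TP × PP no longer fits on one node.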

Create the `_1node` test case:

    # Trivial copy
    mkdir -p tests/functional_tests/test_cases/{model}/{test_case}_1node
    cp tests/functional_tests/test_cases/{model}/{test_case}/model_config.yaml \
       tests/functional_tests/test_cases/{model}/{test_case}_1node/model_config.yaml
    # Then apply any parallelism changes (EP or PP) with Edit tool

If `tests/test_utils/recipes/gb200/gpt-1node.yaml` does not exist, create it from `gpt.yaml` with `nodes: 1`:

    type: basic
    format_version: 1
    maintainers: [mcore]
    loggers: [stdout]
    spec:
      name: "{test_case}_{environment}_{platforms}"
      model: gpt  # or moe
      build: mcore-pyt-{environment}
      nodes: 1
      gpus: 4
      n_repeat: 5
      platforms: dgx_gb200
      script_setup: |  # copy verbatim from gpt.yaml / moe.yaml
        ...
      script: |-  # copy verbatim from gpt.yaml / moe.yaml
        ...

Use `moe-1node.yaml` for MoE tests. Set the scope to `scope: [mr-github, mr-github-slim]` or `scope: [mr-github]`, and add the test under `products:`:

    products:
      - test_case: [<test_case>_1node]
        products:
          - environment: [dev]
            scope: [mr-github, mr-github-slim]  # or [mr-github]
            platforms: [dgx_gb200]

| Original (8 GPUs) | 1-node config (4 GPUs) | Notes |
|---|---|---|
| tp1 pp1 ep1 → dp8 | tp1 pp1 ep1 → dp4 | trivial |
| tp2 pp1 ep1 → dp4 | tp2 pp1 ep1 → dp2 | trivial |
| tp1 pp2 ep1 → dp4 | tp1 pp2 ep1 → dp2 | trivial |
| tp4 pp1 ep1 → dp2 | tp4 pp1 ep1 → dp1 | trivial |
| tp1 pp4 ep1 → dp2 | tp1 pp4 ep1 → dp1 | trivial |
| tp1 pp1 ep8 → dp8 | tp1 pp1 ep4 → dp4 | ep 8→4 |
| tp4 pp2 ep2 etp2 → dp1 | tp4 pp1 ep2 etp2 → dp1 | pp 2→1 |
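
The rows above can be reproduced with a small planner. This is a sketch under stated assumptions: the function names are hypothetical, and halving PP and then EP until the layout fits is one possible reduction strategy chosen for illustration, not necessarily the one the skill prescribes. The two checks are the ones the document states: `DP ≥ EP`, and `ep × etp ≤ TP × DP` for ETP tests.

```python
def dp_for(world_size, tp, pp):
    """DP from world_size = TP x PP x DP, or None if TP x PP does not divide it."""
    if world_size % (tp * pp) != 0:
        return None
    return world_size // (tp * pp)


def expert_layout_ok(tp, dp, ep, etp=None):
    """DP >= EP for plain EP tests; ep x etp <= TP x DP when ETP is set."""
    if etp is None:
        return dp >= ep
    return ep * etp <= tp * dp


def plan_one_node(tp, pp, ep=1, etp=None, world_size=4):
    """Halve PP, then EP, until the layout fits (halving is an assumption)."""
    notes = []
    new_pp = pp
    while dp_for(world_size, tp, new_pp) is None and new_pp > 1:
        new_pp //= 2
    dp = dp_for(world_size, tp, new_pp)
    if dp is None:
        raise ValueError("TP alone does not fit on one node")
    if new_pp != pp:
        notes.append(f"pp {pp}->{new_pp}")

    new_ep = ep
    while not expert_layout_ok(tp, dp, new_ep, etp) and new_ep > 1:
        new_ep //= 2
    if not expert_layout_ok(tp, dp, new_ep, etp):
        raise ValueError("expert layout does not fit on one node")
    if new_ep != ep:
        notes.append(f"ep {ep}->{new_ep}")

    return {"tp": tp, "pp": new_pp, "ep": new_ep, "dp": dp,
            "note": ", ".join(notes) or "trivial"}


# Three rows taken from the table above.
rows = [
    dict(tp=1, pp=1, ep=1),
    dict(tp=1, pp=1, ep=8),
    dict(tp=4, pp=2, ep=2, etp=2),
]
plans = [plan_one_node(**row) for row in rows]
for plan in plans:
    print(plan)
```

For these three rows the planner yields `trivial`, `ep 8->4`, and `pp 2->1`, matching the Notes column.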
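
Checklist: the `mr`-scoped tests in `gpt.yaml` and `moe.yaml` have entries in a `*-1node.yaml` recipe; each has a `_1node/model_config.yaml`; the recipe sets `nodes: 1, gpus: 4`; each entry is scoped `mr-github`, with `mr-github-slim` added where applicable.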
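
One checklist item, the presence of every `_1node/model_config.yaml`, is easy to verify mechanically. The sketch below is an illustration only: `missing_one_node_configs` is a hypothetical helper, and the test-case names `case_a` and `case_b` are made up for the demo, which runs against a temporary directory rather than a real checkout.

```python
import tempfile
from pathlib import Path


def missing_one_node_configs(repo_root, model, test_cases):
    """Return the test cases whose {test_case}_1node/model_config.yaml is absent."""
    base = Path(repo_root) / "tests" / "functional_tests" / "test_cases" / model
    return [
        case for case in test_cases
        if not (base / f"{case}_1node" / "model_config.yaml").is_file()
    ]


# Demo against a throwaway tree; only case_a gets a 1-node config.
with tempfile.TemporaryDirectory() as root:
    done = Path(root) / "tests/functional_tests/test_cases/gpt/case_a_1node"
    done.mkdir(parents=True)
    (done / "model_config.yaml").write_text("MODEL_ARGS: {}\n")
    missing = missing_one_node_configs(root, "gpt", ["case_a", "case_b"])

print(missing)  # ['case_b']
```

Against a real checkout, `repo_root` would be the repository root and `test_cases` the names listed in the `*-1node.yaml` recipes without their `_1node` suffix.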