Testing reference for Megatron Bridge — unit and functional test layout, tier semantics (L0/L1/L2/flaky), script conventions, running tests locally, adding/moving/disabling tests, and pytest conventions.
npx skill4agent add nvidia/skills testing

    tests/
      unit_tests/                  # fast, isolated, no GPU required
      functional_tests/
        launch_scripts/
          h100/
            active/                # H100 tests that run in CI automatically
            flaky/                 # H100 tests quarantined from blocking CI
          gb200/
            active/                # GB200 tests that run in CI automatically
            flaky/                 # GB200 tests quarantined from blocking CI

Launch scripts are named `{Tier}_{Description}.sh`, for example `L0_Launch_training.sh`.

| Tier | Trigger | Blocking |
|---|---|---|
| L0 | Every PR, every push to | Yes — PR cannot merge if L0 fails |
| L1 | Push to | Yes |
| L2 | Schedule, | Yes (when triggered) |
| flaky | | No — failures are informational |
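
The naming convention and the tier table can be combined into a small classifier. The following is a minimal sketch, not part of the repository: the `LaunchScript` type and `classify` function are hypothetical names, and the regular expression assumes that tiers are limited to L0, L1, and L2 and that descriptions contain only letters, digits, and underscores.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple

# Assumption: the tier prefix is L0, L1, or L2, followed by an underscore.
_SCRIPT_NAME = re.compile(r"^(?P<tier>L[0-2])_(?P<description>[A-Za-z0-9_]+)\.sh$")


class LaunchScript(NamedTuple):
    hardware: str      # "h100" or "gb200"
    status: str        # "active" or "flaky"
    tier: str          # "L0", "L1", or "L2"
    description: str

    @property
    def blocking(self) -> bool:
        # Scripts quarantined under flaky/ are informational only.
        return self.status != "flaky"


def classify(path: str) -> LaunchScript:
    """Split a launch-script path into hardware, status, tier, and description."""
    p = PurePosixPath(path)
    match = _SCRIPT_NAME.match(p.name)
    if match is None:
        raise ValueError(f"{p.name!r} does not follow {{Tier}}_{{Description}}.sh")
    status, hardware = p.parent.name, p.parent.parent.name
    if status not in {"active", "flaky"} or hardware not in {"h100", "gb200"}:
        raise ValueError(f"unexpected location: {path!r}")
    return LaunchScript(hardware, status, match["tier"], match["description"])
```

A check like this could run before committing a new script to catch a misnamed file or a script placed outside `active/` or `flaky/`.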
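
The flaky tier corresponds to scripts under `flaky/`.

Run unit tests locally:

    uv run pytest tests/unit_tests/ -x -v

Run unit tests inside the container:

    docker run --rm --gpus all -v $(pwd):/workdir/ -w /workdir/ megatron-bridge \
      uv run pytest tests/unit_tests/

Run a functional test:

    bash tests/functional_tests/launch_scripts/h100/active/L0_Launch_training.sh

Adding a unit test:

- Location: `tests/unit_tests/<domain>/test_<name>.py`
- Marker: `@pytest.mark.unit`
- Run it with `uv run python -m pytest tests/unit_tests/<your_test>.py`

`setattr`: instead of a bare `setattr(config_obj, key, value)`, check that the field exists first:

    if not hasattr(config_obj, key):
        raise ValueError(f"Config has no field '{key}'")
    setattr(config_obj, key, value)

Adding a functional test:

- Location: `tests/functional_tests/launch_scripts/{h100,gb200}/active/`
- Timeout comment: `# CI_TIMEOUT=<minutes>`
- File name: `{Tier}_{CamelDescription}.sh`
- Make it executable: `chmod +x <file>`

Moving a test from `active/` to `flaky/`:

    # H100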
    git mv tests/functional_tests/launch_scripts/h100/active/L0_Foo.sh \
      tests/functional_tests/launch_scripts/h100/flaky/L0_Foo.sh
    # GB200 (if the test also exists there)
    git mv tests/functional_tests/launch_scripts/gb200/active/L0_Foo.sh \
      tests/functional_tests/launch_scripts/gb200/flaky/L0_Foo.sh

`test_suite=all`, `active/`

Pytest markers: `unit`, `integration`, `system`, `acceptance`, `docs`, `skipduringci`, `pleasefixme`

`CUDA_VISIBLE_DEVICES`

`uv run python -m pytest`, `pytest`

| GitHub Actions job | Hardware | Directory scanned |
|---|---|---|
|  | H100 |  |
|  | H100 |  |
|  | H100 |  |
|  | H100 |  |
|  | GB200 |  |
|  | GB200 |  |
|  | GB200 |  |
|  | GB200 |  |
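
To illustrate how a directory scan can map launch scripts to per-hardware, per-tier groups, here is a hedged sketch. It does not reproduce the matrix-generation logic in the CI workflow, which is not shown in this reference. The `discover` function is a hypothetical helper, and the glob pattern assumes tiers L0 through L2.

```python
from collections import defaultdict
from pathlib import Path


def discover(root: Path, status: str = "active") -> dict[tuple[str, str], list[str]]:
    """Group launch scripts under `root` by (hardware, tier).

    `root` is expected to be the launch_scripts directory, laid out as
    <root>/<hardware>/<status>/<Tier>_<Description>.sh.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for hardware in ("h100", "gb200"):
        directory = root / hardware / status
        if not directory.is_dir():
            continue  # hardware without scripts in this status is simply skipped
        for script in sorted(directory.glob("L[0-2]_*.sh")):
            tier = script.name.split("_", 1)[0]
            groups[(hardware, tier)].append(script.name)
    return dict(groups)
```

Calling `discover(Path("tests/functional_tests/launch_scripts"))` from the repository root would list the active scripts; passing `status="flaky"` lists the quarantined ones.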

`nemo-ci-{azure,aws}-gpu-x2`, `nemo-ci-gcp-gpu-x2`

| Component | Path |
|---|---|
| Matrix generation (H100) | @.github/workflows/cicd-main.yml job |
| Matrix generation (GB200) | @.github/workflows/cicd-main.yml job |
| Test runner action | @.github/actions/test-template/action.yml |
| Launch scripts root | |
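
The `hasattr` guard around `setattr` from the unit-test guidance can be wrapped in a small helper so that a misspelled override fails loudly instead of silently creating a new attribute. This is a sketch under assumptions: `DummyConfig` and `apply_overrides` are hypothetical names used only for illustration and are not taken from the Megatron Bridge code base.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class DummyConfig:
    # Hypothetical stand-in for a real config object.
    lr: float = 1e-4
    seq_length: int = 2048


def apply_overrides(config_obj: Any, overrides: Mapping[str, Any]) -> None:
    """Set each override on config_obj, rejecting keys that are not existing fields."""
    for key, value in overrides.items():
        if not hasattr(config_obj, key):
            raise ValueError(f"Config has no field '{key}'")
        setattr(config_obj, key, value)
```

In a unit test carrying `@pytest.mark.unit`, one would typically assert both that a valid key such as `lr` is updated and that an unknown key such as `learning_rate` raises `ValueError`.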