earth2studio-deterministic-forecast

Earth2Studio Deterministic Forecast Skill

Purpose

Guide users through building deterministic (single-member) weather forecast inference scripts with Earth2Studio. Covers model selection, data source compatibility, IO backend choice, nsteps calculation, and generating a complete script following `earth2studio.run.deterministic`.

Prerequisites

Earth2Studio installed (`pip install earth2studio` or `uv add earth2studio`)

CUDA-capable GPU with sufficient VRAM for the chosen model
Network access for model weight download and data fetching
Python 3.10+

Instructions

You are helping a user build a deterministic forecast inference script using Earth2Studio. The script follows the structure of `earth2studio.run.deterministic`, a pipeline that takes a prognostic model, fetches initial conditions from a data source, steps the model forward, and writes output to an IO backend.

Core principle: live docs drive every recommendation

Model availability, data source APIs, and IO backends change between releases. Before recommending any component, fetch the relevant live doc page to confirm it exists and check its current interface.

Live doc references (fetch only what the current step requires):

| Component | URL |
| --- | --- |
| Prognostic models | https://nvidia.github.io/earth2studio/modules/models_px.html |
| Data sources (analysis) | https://nvidia.github.io/earth2studio/modules/datasources_analysis.html |
| Data sources (forecast) | https://nvidia.github.io/earth2studio/modules/datasources_forecast.html |
| IO backends | https://nvidia.github.io/earth2studio/modules/io.html |
| `run.deterministic` source | https://github.com/NVIDIA/earth2studio/blob/main/earth2studio/run.py |
| Lexicon (variable compat) | https://github.com/NVIDIA/earth2studio/tree/main/earth2studio/lexicon |

Interaction protocol

Step 1. Understand forecast requirements

Ask the user about the topics below, skipping anything already answered and asking at most 3 questions:

Time horizon — how far ahead? Hours (nowcast), days (medium-range), weeks/months (seasonal)?
Variables of interest — what do they want to predict? (temperature, wind, geopotential, precipitation, etc.)
Region — global or regional (e.g. CONUS for HRRR-based models)?
Hardware — what GPU / VRAM do they have? (filters model choices)

Step 2. Select prognostic model

Fetch the prognostic models page. Filter candidates by:

Time horizon → model class badge (NWC, MR, S2S, CM)
Region → region badge (Global, NA, etc.)
VRAM → rec VRAM badge
Variables → check the model's `input_coords` / `output_coords` against what the user needs

Present 2–4 candidate models with tradeoffs (resolution, speed, accuracy, VRAM). Let the user choose.
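
The filtering described above can be sketched as a small helper. This is a hypothetical illustration rather than part of Earth2Studio: the `Candidate` record, its field names, and the three catalog entries are placeholders, and real values would have to be copied from the badges on the live prognostic models page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of one model's badges, copied from the live docs."""
    name: str
    model_class: str      # e.g. "NWC", "MR", "S2S", "CM"
    region: str           # e.g. "Global", "NA"
    rec_vram_gb: float    # recommended VRAM badge
    variables: frozenset  # variables the model can produce


def filter_candidates(candidates, model_class, region, vram_gb, wanted):
    """Keep candidates that match class and region, fit in VRAM, and cover `wanted`."""
    wanted = frozenset(wanted)
    return [
        c for c in candidates
        if c.model_class == model_class
        and c.region == region
        and c.rec_vram_gb <= vram_gb
        and wanted <= c.variables
    ]


# Placeholder entries only; they do not describe real models.
catalog = [
    Candidate("model_a", "MR", "Global", 40.0, frozenset({"t2m", "u10m", "v10m"})),
    Candidate("model_b", "MR", "Global", 16.0, frozenset({"t2m", "u10m"})),
    Candidate("model_c", "NWC", "NA", 24.0, frozenset({"t2m", "refc"})),
]

shortlist = filter_candidates(catalog, "MR", "Global", vram_gb=24.0, wanted=["t2m", "u10m"])
print([c.name for c in shortlist])
```

The shortlist is only a starting point: the tradeoffs (resolution, speed, accuracy) still need to be presented to the user, who makes the final choice.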

Once selected, note the model's:

Required input variables (from `input_coords["variable"]`)
Time step size (from `output_coords["lead_time"]`)
These determine `nsteps` and constrain which data sources work

Step 3. Select data source

The data source must provide the model's required input variables. Fetch the analysis data source page (or forecast source page if comparing against operational forecasts).

Verify compatibility:

Fetch the candidate source's lexicon from `earth2studio/lexicon/<source>.py`
Confirm all variables in the model's `input_coords["variable"]` exist as keys in the source's VOCAB
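
The compatibility check amounts to a set-membership test. The sketch below assumes the source's vocabulary is available as a plain dict, as the text describes. The helper name `missing_variables` and the sample inputs are hypothetical; in practice `required` would come from the model and `vocab` from the lexicon module.

```python
def missing_variables(required, vocab):
    """Return the required variable names that are not keys of `vocab`, in order."""
    return [name for name in required if name not in vocab]


# Placeholder inputs: real values come from the model's input coordinates
# and from the VOCAB mapping in the chosen source's lexicon file.
required = ["t2m", "u10m", "v10m", "z500"]
vocab = {"t2m": "...", "u10m": "...", "z500": "..."}

gaps = missing_variables(required, vocab)
if gaps:
    print(f"Source is missing: {gaps}")
else:
    print("Source covers every required variable")
```

A source with a non-empty result is not viable for that model and should be left out of the options presented to the user.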

Present viable options. Common pairings:

Global models (AIFS, Pangu, GraphCast, SFNO, etc.) → GFS, ARCO, CDS, WB2ERA5, IFS
Regional models (StormCast, HRRR-based) → HRRR
Historical/research runs → ARCO, CDS, WB2ERA5, NCAR_ERA5

Let the user choose. Confirm the initialization time(s) they want to forecast from.

Step 4. Select IO backend

Present the available IO backends (fetch the IO page to confirm current list):

| Backend | Best for |
| --- | --- |
| ZarrBackend | Large outputs, chunked storage, recommended default |
| AsyncZarrBackend | Same as Zarr but async writes for performance |
| NetCDF4Backend | Compatibility with legacy tools |
| XarrayBackend | In-memory, small runs, interactive exploration |
| KVBackend | Key-value dict, debugging |

Recommend ZarrBackend unless the user has a specific reason for another. Ask where they want output saved.

Step 5. Determine nsteps

Calculate `nsteps` from:

User's desired forecast horizon (e.g. 5 days)
Model's time step (e.g. 6 hours for most global models)

`nsteps = forecast_hours / model_step_hours`

Confirm with the user: "For a 5-day forecast with a 6-hour time step, that's 20 steps. Correct?"
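
The division can be sketched as a helper that also guards against horizons the model cannot hit exactly. The function name `compute_nsteps` is hypothetical, and rejecting inexact horizons rather than rounding them is a design assumption, not something Earth2Studio prescribes.

```python
from datetime import timedelta


def compute_nsteps(horizon: timedelta, model_step: timedelta) -> int:
    """Return how many model steps cover `horizon`, rejecting inexact fits."""
    if model_step <= timedelta(0):
        raise ValueError("model_step must be positive")
    if horizon < model_step:
        raise ValueError("forecast horizon is shorter than one model step")
    # divmod on two timedeltas yields (whole steps, leftover timedelta)
    steps, remainder = divmod(horizon, model_step)
    if remainder:
        raise ValueError(
            f"horizon {horizon} is not a whole multiple of the model step {model_step}"
        )
    return steps


print(compute_nsteps(timedelta(days=5), timedelta(hours=6)))   # 20
print(compute_nsteps(timedelta(days=5), timedelta(hours=24)))  # 5
```

Raising on a leftover remainder makes the mismatch visible, so the user can be asked whether to round the horizon up or down instead of having it changed silently.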

Step 6. Generate the inference script

Write a complete Python script following the `earth2studio.run.deterministic` pattern. The script structure:

```python
import datetime
from collections import OrderedDict

import numpy as np
import torch

from earth2studio.models.px import <ModelClass>
from earth2studio.data import <DataSourceClass>
from earth2studio.io import <IOBackendClass>
from earth2studio.run import deterministic

# 1. Initialize model
model = <ModelClass>.load_model(<ModelClass>.load_default_package())

# 2. Initialize data source
data = <DataSourceClass>()

# 3. Initialize IO backend
io = <IOBackendClass>("<output_path>")

# 4. (Optional) Subselect output variables/coords
output_coords = OrderedDict({
    "variable": np.array(["t2m", "u10m", ...]),  # only save these
})

# 5. Run deterministic forecast
io = deterministic(
    time=["YYYY-MM-DDTHH:MM:SS"],
    nsteps=<N>,
    prognostic=model,
    data=data,
    io=io,
    output_coords=output_coords,  # optional
    device=torch.device("cuda"),
)

# 6. Post-run: inspect results
print("Forecast complete. Output at: <output_path>")
```


**Before writing the script**, fetch the specific model's doc page
to confirm:

- The correct class import path
- How to load the model (`load_model` + `load_default_package()`
  is the standard pattern but verify)
- Any model-specific constructor arguments

Also fetch the data source's doc page to confirm constructor arguments
(some need cache paths, tokens, etc.).

Step 7. Explain the script and next steps

After delivering the script, explain:

How to change the forecast time (just edit the `time` list)
How to run multiple initializations (add more entries to `time`)
How to subset output variables via `output_coords`
Where the output is saved and how to read it back (e.g. `xr.open_zarr(...)`)
If they want to add diagnostics on top, point them to the `diagnostic` workflow pattern
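
Because `time` is a list, running several initializations only requires a longer list. The sketch below builds one with the standard library, using the same timestamp format as the script template. The helper name `init_times` and the start date are placeholders chosen for illustration.

```python
from datetime import datetime, timedelta


def init_times(start: datetime, count: int, spacing: timedelta) -> list[str]:
    """Return `count` ISO 8601 timestamps beginning at `start`, `spacing` apart."""
    return [(start + i * spacing).strftime("%Y-%m-%dT%H:%M:%S") for i in range(count)]


# Four initializations, one every 6 hours (the date is a placeholder).
times = init_times(datetime(2024, 1, 1, 0), count=4, spacing=timedelta(hours=6))
print(times)
```

The resulting list would be passed as `time=times` in the script. Whether the chosen data source actually has analyses at every one of those times still needs to be checked against its documentation.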

Ownership and out-of-scope

Owns: prognostic model selection for deterministic forecasts, data source compatibility verification, IO backend selection, nsteps calculation, generating the complete inference script following the `earth2studio.run.deterministic` structure.

Does not own: ensemble workflows, diagnostic model chaining, data-only fetch (earth2studio-data-fetch), installation (earth2studio-install), model training or fine-tuning, custom model development.

Examples

Typical invocation:

"Run a 5-day global forecast with Pangu-Weather starting from today's GFS analysis, saving output to Zarr."

The skill walks through Steps 1-7: confirms requirements, selects Pangu24, pairs with GFS data source, picks ZarrBackend, calculates nsteps=5 (24h steps), generates the script, and explains how to inspect results.

Limitations

Only deterministic (single-member) forecasts; use ensemble workflow for probabilistic runs
Cannot train or fine-tune models — inference only
Model weights require first-time download (several GB depending on model)
Regional models (e.g. StormCast) require matching regional data sources
GPU required; CPU-only inference is not supported for most models

Troubleshooting

| Error | Cause | Solution |
| --- | --- | --- |
| `KeyError` on variable | Lexicon is missing the variable | Check lexicon compatibility; pick a different source |
| `OutOfMemoryError` | VRAM exceeded | Use a smaller model or free the GPU cache |
| `FileNotFoundError` for package | Weights not cached | Call `load_default_package()` first |
| `TimeoutError` on data fetch | API slow or unreachable | Retry or use a cached source |
| `ValueError: nsteps` | Horizon shorter than the model step | Increase the horizon or pick a model with a finer time step |
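
For the timeout row, "retry" can be as simple as a wrapper with exponential backoff. This is a generic sketch, not an Earth2Studio feature: `fetch_with_retry` and `flaky_fetch` are hypothetical names, and it assumes the failure surfaces as Python's built-in `TimeoutError`, as the table suggests.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds, doubling the wait after each TimeoutError."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)


calls = {"n": 0}


def flaky_fetch():
    """Stand-in for a data request: times out twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated slow API")
    return "data"


result = fetch_with_retry(flaky_fetch, attempts=3, base_delay=0.01)
print(result, "after", calls["n"], "calls")
```

If the retries are exhausted, the error is re-raised unchanged, at which point switching to a cached or alternative source is the remaining option.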
