earth2studio-deterministic-forecast

Earth2Studio Deterministic Forecast Skill

Purpose

Guide users through building deterministic (single-member) weather forecast inference scripts with Earth2Studio. Covers model selection, data source compatibility, IO backend choice, `nsteps` calculation, and generating a complete script following `earth2studio.run.deterministic`.

Prerequisites

  • Earth2Studio installed (`pip install earth2studio` or `uv add earth2studio`)
  • CUDA-capable GPU with sufficient VRAM for the chosen model
  • Network access for model weight download and data fetching
  • Python 3.10+

Instructions

You are helping a user build a deterministic forecast inference script using Earth2Studio. The script follows the structure of `earth2studio.run.deterministic`, a pipeline that takes a prognostic model, fetches initial conditions from a data source, steps the model forward, and writes output to an IO backend.

Core principle: live docs drive every recommendation

Model availability, data source APIs, and IO backends change between releases. Before recommending any component, fetch the relevant live doc page to confirm it exists and check its current interface.
Live doc references (fetch only what the current step requires):

Interaction protocol

Step 1. Understand forecast requirements

Ask the user at most 3 of the following questions, skipping any that are already answered:
  1. Time horizon: how far ahead? Hours (nowcast), days (medium-range), or weeks/months (seasonal)?
  2. Variables of interest: what do they want to predict? (temperature, wind, geopotential, precipitation, etc.)
  3. Region: global or regional (e.g. CONUS for HRRR-based models)?
  4. Hardware: what GPU / VRAM do they have? (This filters the model choices.)
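
The answer to question 1 maps onto the model class badges used in Step 2. Below is a minimal sketch of that mapping. The cut-offs are illustrative assumptions, not Earth2Studio definitions: under 24 hours maps to NWC, up to 15 days to MR, and anything longer to S2S. `horizon_to_model_class` is a hypothetical helper, and the CM badge is not covered.

```python
def horizon_to_model_class(forecast_hours: float) -> str:
    """Map a forecast horizon in hours to a model class badge.

    The thresholds are illustrative assumptions, not Earth2Studio rules.
    """
    if forecast_hours <= 0:
        raise ValueError("forecast_hours must be positive")
    if forecast_hours < 24:
        return "NWC"  # hours ahead: nowcasting
    if forecast_hours <= 15 * 24:
        return "MR"  # days ahead: medium-range
    return "S2S"  # weeks to months ahead: subseasonal-to-seasonal


print(horizon_to_model_class(6))        # NWC
print(horizon_to_model_class(5 * 24))   # MR
print(horizon_to_model_class(45 * 24))  # S2S
```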
询问用户(最多3个问题,跳过已回答的内容):
  1. 时间范围 —— 预报时长?小时级(临近预报)、天级(中期预报)、周/月级(季节预报)?
  2. 关注变量 —— 需要预测哪些要素?(气温、风、位势、降水等)
  3. 区域 —— 全球还是区域(例如基于HRRR模型的CONUS区域)?
  4. 硬件 —— 使用的GPU/显存规格?(用于筛选模型选项)

Step 2. Select prognostic model

Fetch the prognostic models page. Filter candidates by:
  • Time horizon → model class badge (NWC, MR, S2S, CM)
  • Region → region badge (Global, NA, etc.)
  • VRAM → rec VRAM badge
  • Variables → check the model's `input_coords` / `output_coords` against what the user needs
Present 2–4 candidate models with tradeoffs (resolution, speed, accuracy, VRAM). Let the user choose.
Once a model is selected, note its:
  • Required input variables (from `input_coords["variable"]`)
  • Time step size (from `output_coords["lead_time"]`)
Together these determine `nsteps` and constrain which data sources work.
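
Both values can be read off the model's coordinate systems. The sketch below uses hand-built stand-in coordinates with hypothetical values, so it runs without a GPU or weights. With a real model you would pass `model.input_coords()` and `model.output_coords(model.input_coords())`. That calling convention is our reading of the prognostic-model interface and is worth confirming on the model's doc page. This sketch also assumes the step equals the last output lead time minus the last input lead time, which covers models whose input spans several lead times.

```python
from collections import OrderedDict

import numpy as np


def summarize_model_coords(input_coords, output_coords):
    """Return (required input variables, time step in hours) from coord systems."""
    variables = [str(v) for v in input_coords["variable"]]
    # Step = last output lead time minus last input lead time (sketch assumption)
    step = output_coords["lead_time"][-1] - input_coords["lead_time"][-1]
    return variables, float(step / np.timedelta64(1, "h"))


# Hypothetical stand-ins for what a 6-hourly global model might report
input_coords = OrderedDict(
    {
        "lead_time": np.array([np.timedelta64(0, "h")]),
        "variable": np.array(["t2m", "u10m", "v10m", "z500"]),
    }
)
output_coords = OrderedDict(
    {
        "lead_time": np.array([np.timedelta64(6, "h")]),
        "variable": input_coords["variable"],
    }
)

print(summarize_model_coords(input_coords, output_coords))
```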

Step 3. Select data source

The data source must provide the model's required input variables. Fetch the analysis data source page (or forecast source page if comparing against operational forecasts).
Verify compatibility:
  1. Fetch the candidate source's lexicon from `earth2studio/lexicon/<source>.py`
  2. Confirm all variables in the model's `input_coords["variable"]` exist as keys in the source's `VOCAB`
Present viable options. Common pairings:
  • Global models (AIFS, Pangu, GraphCast, SFNO, etc.) → GFS, ARCO, CDS, WB2ERA5, IFS
  • Regional models (StormCast, HRRR-based) → HRRR
  • Historical/research runs → ARCO, CDS, WB2ERA5, NCAR_ERA5
Let the user choose. Confirm the initialization time(s) they want to forecast from.
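
The compatibility check in Step 3 is a set difference: any required variable missing from the lexicon's `VOCAB` keys rules the source out. Below is a minimal sketch using hypothetical toy data, with `missing_variables` as a hypothetical helper. In a real environment, `required` would come from the model's `input_coords["variable"]` and `vocab` from the lexicon class. We assume, for example, that `from earth2studio.lexicon import GFSLexicon` exposes `GFSLexicon.VOCAB`; check `earth2studio/lexicon/gfs.py` to confirm.

```python
def missing_variables(required, vocab):
    """Return the model input variables that the lexicon cannot supply."""
    # Iterating a dict yields its keys, which are the Earth2Studio variable names
    return sorted(set(required) - set(vocab))


# Hypothetical stand-ins; values in a real VOCAB are source-specific keys
required = ["t2m", "u10m", "v10m", "tcwv"]
toy_vocab = {"t2m": "src_key_t2m", "u10m": "src_key_u10m", "v10m": "src_key_v10m"}

gaps = missing_variables(required, toy_vocab)
print(gaps if gaps else "compatible")  # ['tcwv']
```

An empty result means the source can supply every model input; otherwise pick a different source.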

Step 4. Select IO backend

Present the available IO backends (fetch the IO page to confirm current list):
| Backend | Best for |
| --- | --- |
| ZarrBackend | Large outputs, chunked storage; recommended default |
| AsyncZarrBackend | Same as Zarr, but with async writes for performance |
| NetCDF4Backend | Compatibility with legacy tools |
| XarrayBackend | In-memory, small runs, interactive exploration |
| KVBackend | Key-value dict, debugging |
Recommend ZarrBackend unless the user has a specific reason for another. Ask where they want output saved.
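
The recommendation logic in Step 4 can be sketched as a small decision function. `recommend_backend` is hypothetical, and the precedence among the flags (debugging, then in-memory, then legacy tools, then async writes) is an assumption of this sketch. It returns a class name string rather than an instance so it runs without Earth2Studio installed.

```python
def recommend_backend(
    in_memory: bool = False,
    legacy_tools: bool = False,
    async_writes: bool = False,
    debugging: bool = False,
) -> str:
    """Pick an IO backend class name following the Step 4 table (sketch)."""
    if debugging:
        return "KVBackend"  # key-value dict, easy to inspect
    if in_memory:
        return "XarrayBackend"  # small, interactive runs
    if legacy_tools:
        return "NetCDF4Backend"  # downstream tools expect NetCDF
    if async_writes:
        return "AsyncZarrBackend"  # Zarr layout with async writes
    return "ZarrBackend"  # recommended default


print(recommend_backend())                   # ZarrBackend
print(recommend_backend(legacy_tools=True))  # NetCDF4Backend
```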

Step 5. Determine nsteps

Calculate `nsteps` from:
  • User's desired forecast horizon (e.g. 5 days)
  • Model's time step (e.g. 6 hours for most global models)
  • `nsteps = forecast_hours / model_step_hours`
Confirm with the user: "For a 5-day forecast with a 6-hour time step, that's 20 steps. Correct?"
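
A worked version of the Step 5 arithmetic is shown below. `compute_nsteps` is a hypothetical helper. Rejecting horizons shorter than one step mirrors the `ValueError: nsteps` row in Troubleshooting. Rounding up when the horizon is not an exact multiple of the step is a choice of this sketch (so the rollout covers at least the requested horizon), not an Earth2Studio rule.

```python
import math


def compute_nsteps(forecast_hours: float, model_step_hours: float) -> int:
    """Number of model steps needed to cover forecast_hours (sketch)."""
    if model_step_hours <= 0:
        raise ValueError("model_step_hours must be positive")
    if forecast_hours < model_step_hours:
        # Horizon shorter than one model step: nothing to roll out
        raise ValueError(
            f"nsteps: horizon {forecast_hours}h is shorter than the "
            f"{model_step_hours}h model step"
        )
    # Round up so the rollout covers at least the requested horizon
    return math.ceil(forecast_hours / model_step_hours)


print(compute_nsteps(5 * 24, 6))   # 20 (5 days, 6-hour model)
print(compute_nsteps(5 * 24, 24))  # 5  (5 days, 24-hour model such as Pangu24)
```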

Step 6. Generate the inference script

Write a complete Python script following the `earth2studio.run.deterministic` pattern. The script structure:

```python
import datetime
from collections import OrderedDict

import numpy as np
import torch

from earth2studio.models.px import <ModelClass>
from earth2studio.data import <DataSourceClass>
from earth2studio.io import <IOBackendClass>
from earth2studio.run import deterministic

# 1. Initialize model
model = <ModelClass>.load_model(<ModelClass>.load_default_package())

# 2. Initialize data source
data = <DataSourceClass>()

# 3. Initialize IO backend (the path argument applies to file-based backends such as ZarrBackend)
io = <IOBackendClass>("<output_path>")

# 4. (Optional) Subselect output variables/coords
output_coords = OrderedDict(
    {
        "variable": np.array(["t2m", "u10m", ...]),  # only save these
    }
)

# 5. Run deterministic forecast
io = deterministic(
    time=["YYYY-MM-DDTHH:MM:SS"],
    nsteps=<N>,
    prognostic=model,
    data=data,
    io=io,
    output_coords=output_coords,  # optional
    device=torch.device("cuda"),
)

# 6. Post-run: inspect results
print("Forecast complete. Output at: <output_path>")
```

**Before writing the script**, fetch the specific model's doc page
to confirm:

- The correct class import path
- How to load the model (`load_model` + `load_default_package()`
  is the standard pattern but verify)
- Any model-specific constructor arguments

Also fetch the data source's doc page to confirm constructor arguments
(some need cache paths, tokens, etc.).
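
As a local cross-check (not a replacement for the live docs), Python's `inspect.signature` can list the constructor arguments of whatever class version is installed. Below is a minimal sketch: `constructor_args` and `ExampleSource` are hypothetical, and `ExampleSource` only stands in for a real data source or model class.

```python
import inspect

REQUIRED = "<required>"


def constructor_args(cls):
    """Return [(name, default)] for cls's constructor; REQUIRED marks no default."""
    params = inspect.signature(cls).parameters.values()
    return [
        (p.name, REQUIRED if p.default is inspect.Parameter.empty else p.default)
        for p in params
        # skip *args / **kwargs, which carry no named configuration
        if p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]


class ExampleSource:  # hypothetical stand-in for a data source class
    def __init__(self, cache_dir, cache: bool = True, verbose: bool = True):
        self.cache_dir = cache_dir
        self.cache = cache
        self.verbose = verbose


print(constructor_args(ExampleSource))
# [('cache_dir', '<required>'), ('cache', True), ('verbose', True)]
```

With Earth2Studio installed, calling `constructor_args` on the imported data source class shows which arguments (cache paths, tokens, etc.) the installed version expects.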

Step 7. Explain the script and next steps

After delivering the script, explain:
  • How to change the forecast time (just edit the `time` list)
  • How to run multiple initializations (add more entries to `time`)
  • How to subset output variables via `output_coords`
  • Where the output is saved and how to read it back (e.g. `xr.open_zarr(...)`)
  • If they want to add diagnostics on top, point them to the `diagnostic` workflow pattern
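
For several initializations, the `time` list just needs more entries. Below is a small sketch that builds evenly spaced strings in the `YYYY-MM-DDTHH:MM:SS` form used in the Step 6 template. `init_times` is a hypothetical helper. The 6-hour default spacing is an assumption, so match it to your data source's analysis cycle. The dates below are arbitrary examples.

```python
from datetime import datetime, timedelta


def init_times(start: str, end: str, every_hours: int = 6) -> list[str]:
    """Initialization times from start to end (inclusive) as ISO strings."""
    if every_hours <= 0:
        raise ValueError("every_hours must be positive")
    t = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    out = []
    while t <= stop:
        out.append(t.strftime("%Y-%m-%dT%H:%M:%S"))
        t += timedelta(hours=every_hours)
    return out


print(init_times("2024-01-01T00:00:00", "2024-01-01T18:00:00"))
# ['2024-01-01T00:00:00', '2024-01-01T06:00:00', '2024-01-01T12:00:00', '2024-01-01T18:00:00']
```

The returned list can be passed directly as the `time` argument in the Step 6 script.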

Ownership and out-of-scope

Owns: prognostic model selection for deterministic forecasts, data source compatibility verification, IO backend selection, `nsteps` calculation, and generating the complete inference script following the `earth2studio.run.deterministic` structure.
Does not own: ensemble workflows, diagnostic model chaining, data-only fetch (earth2studio-data-fetch), installation (earth2studio-install), model training or fine-tuning, custom model development.

Examples

Typical invocation:
"Run a 5-day global forecast with Pangu-Weather starting from today's GFS analysis, saving output to Zarr."
The skill walks through Steps 1-7: confirms requirements, selects Pangu24, pairs with GFS data source, picks ZarrBackend, calculates nsteps=5 (24h steps), generates the script, and explains how to inspect results.

Limitations

  • Only deterministic (single-member) forecasts; use ensemble workflow for probabilistic runs
  • Cannot train or fine-tune models — inference only
  • Model weights require first-time download (several GB depending on model)
  • Regional models (e.g. StormCast) require matching regional data sources
  • GPU required; CPU-only inference is not supported for most models

Troubleshooting

| Error | Cause | Solution |
| --- | --- | --- |
| `KeyError` on variable | Lexicon is missing the variable | Check compatibility (Step 3); pick a different source |
| `OutOfMemoryError` | VRAM exceeded | Use a smaller model or free cached GPU memory |
| `FileNotFoundError` on package | Weights not cached | Call `load_default_package()` first |
| `TimeoutError` on data fetch | API slow or unreachable | Retry or use a cached source |
| `ValueError: nsteps` | Horizon shorter than the model step | Increase the horizon or choose a model with a finer time step |
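
Retrying is the first suggested fix for a `TimeoutError` during data fetch. Below is a minimal retry wrapper with exponential backoff in plain Python. `fetch_with_retry` is hypothetical, and the retried exception types and delays are assumptions. It can wrap any zero-argument callable that fetches data, for example a lambda around a direct data source call when pre-fetching initial conditions.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0,
                     retry_on=(TimeoutError, ConnectionError)):
    """Call fetch(), retrying transient errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


# Demo with a hypothetical fetch that times out twice before succeeding
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("data source slow")
    return "initial conditions"


print(fetch_with_retry(flaky_fetch, base_delay=0.0))  # initial conditions
```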