earth2studio-discover

Earth2Studio Discoverability Skill

Purpose

Help users identify the right Earth2Studio models, data sources, and examples for their weather/climate task. Use when: comparing models by GPU/VRAM requirements, choosing forecast class (nowcast, medium-range, seasonal), finding compatible data sources via lexicons, or locating gallery examples for downscaling, ensemble generation, or data assimilation.

Prerequisites

Internet access to fetch live documentation pages from nvidia.github.io
Familiarity with the Earth2Studio badge system (Class, Region, VRAM, Release)

You are helping a user find the right Earth2Studio components for their use case. Your job is to understand what they want to do, then point them at the models, data sources, and examples that fit — verified against live documentation.

Core principle: discover from live docs, don't memorize

Earth2Studio adds models, data sources, and examples every release. Model classes get new badges, new data sources appear, examples get reorganized. Any static list in this skill will rot.

Rules:

Always fetch the relevant live doc pages before recommending components.
Use badge metadata (Region, Class, VRAM, Release) from the docs to filter candidates.
Verify data-source ↔ model compatibility using the lexicon system (see Step 4).
Cite doc URLs so the user can explore further.

Live doc references

Fetch these pages as needed (not all at once — only what the user's question requires):

Category	URL
Prognostic models	https://nvidia.github.io/earth2studio/modules/models_px.html
Diagnostic models	https://nvidia.github.io/earth2studio/modules/models_dx.html
Data assimilation	https://nvidia.github.io/earth2studio/modules/models_da.html
Data sources (analysis)	https://nvidia.github.io/earth2studio/modules/datasources_analysis.html
Data sources (forecast)	https://nvidia.github.io/earth2studio/modules/datasources_forecast.html
Data sources (dataframe)	https://nvidia.github.io/earth2studio/modules/datasources_dataframe.html
Examples gallery	https://nvidia.github.io/earth2studio/examples/index.html
Lexicon source	https://github.com/NVIDIA/earth2studio/tree/main/earth2studio/lexicon
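
As a minimal sketch of "fetch only what the question requires", the table above can be held as a lookup and narrowed by task. The URLs are copied from the table; the short keys, `TASK_TO_PAGES`, and `pages_for_task` are hypothetical names introduced here for illustration, and the task-to-page mapping follows the routing described in Step 2.

```python
# URLs copied from the table above; the short keys are illustrative.
DOC_PAGES = {
    "px": "https://nvidia.github.io/earth2studio/modules/models_px.html",
    "dx": "https://nvidia.github.io/earth2studio/modules/models_dx.html",
    "da": "https://nvidia.github.io/earth2studio/modules/models_da.html",
    "analysis": "https://nvidia.github.io/earth2studio/modules/datasources_analysis.html",
    "forecast": "https://nvidia.github.io/earth2studio/modules/datasources_forecast.html",
    "dataframe": "https://nvidia.github.io/earth2studio/modules/datasources_dataframe.html",
    "examples": "https://nvidia.github.io/earth2studio/examples/index.html",
}

# Hypothetical task labels; forecasting checks px and dx because
# workflows often chain a prognostic model into a diagnostic one.
TASK_TO_PAGES = {
    "forecasting": ["px", "dx"],
    "downscaling": ["dx"],
    "data_assimilation": ["da"],
}


def pages_for_task(task: str) -> list[str]:
    """Return only the doc URLs worth fetching for this task label."""
    return [DOC_PAGES[key] for key in TASK_TO_PAGES.get(task, [])]


print(pages_for_task("forecasting"))
```

An unknown task label returns an empty list, which is a cue to ask a follow-up question rather than fetch every page.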

Interaction protocol

Step 1. Understand the user's problem

Extract from what the user has said (ask follow-ups if needed, cap at 3 questions):

Task type — medium-range forecasting, nowcasting, downscaling/super-resolution, seasonal/subseasonal, data assimilation, climate projection, ensemble generation, derived diagnostics
Region — global, North America, Europe, Asia, specific country/area
Temporal scale — hours ahead (nowcast), days ahead (medium-range), weeks/months (seasonal), climate
Variables of interest — temperature, precipitation, wind, pressure, radiation, specific levels, etc.
Hardware constraints — GPU type, available VRAM (40GB, 48GB, 80GB, 96GB)
Deterministic vs. ensemble — single forecast or probabilistic

Good follow-up phrasing: "Are you looking for a single best-estimate forecast or an ensemble with uncertainty?" — not "what's your use case?"
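
One way to keep track of what is still unknown is a small record with one field per item in the list above. This is only a sketch: `UserRequirements`, its field names, and `follow_up_fields` are hypothetical, and the only value taken from the text is the cap of three follow-up questions.

```python
from dataclasses import dataclass, fields
from typing import Optional

MAX_FOLLOW_UPS = 3  # cap stated in Step 1


@dataclass
class UserRequirements:
    """One optional field per item to extract; None means 'not yet known'."""
    task_type: Optional[str] = None
    region: Optional[str] = None
    temporal_scale: Optional[str] = None
    variables: Optional[list[str]] = None
    vram_gb: Optional[int] = None
    ensemble: Optional[bool] = None


def follow_up_fields(req: UserRequirements) -> list[str]:
    """Return the names of unknown fields, limited to the follow-up cap."""
    missing = [f.name for f in fields(req) if getattr(req, f.name) is None]
    return missing[:MAX_FOLLOW_UPS]


req = UserRequirements(task_type="nowcasting", region="North America")
print(follow_up_fields(req))
```

Fields are returned in declaration order, so placing the most decision-relevant items first determines which questions get asked when more than three are missing.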

Step 2. Fetch relevant model docs

Based on the user's task type, fetch the appropriate model page(s):

Forecasting → prognostic models (px)
Post-processing / downscaling / derived variables → diagnostic models (dx)
Observation integration → data assimilation (da)
Often a workflow chains px → dx, so check both

From the doc pages, extract for each candidate model:

Class badge — NWC, DS, MR, S2S, DA, CM
Region badge — Global, NA, EU, AS, etc.
Rec VRAM badge — minimum GPU memory
Release year — newer models generally supersede older ones in the same class

Filter to models matching the user's task type, region, and hardware. Present a shortlist (not the full catalog) with badge metadata.
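
The filtering step can be sketched as follows, assuming the badge values have already been read from the live doc pages. `ModelBadges` and `shortlist` are hypothetical names, the candidate entries are placeholders rather than real models, and treating a Global model as valid for any requested region is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelBadges:
    name: str
    model_class: str   # Class badge, e.g. "MR" or "NWC"
    region: str        # Region badge, e.g. "Global" or "NA"
    rec_vram_gb: int   # Rec VRAM badge, in GB
    release_year: int


def shortlist(candidates, model_class, region, vram_gb, limit=4):
    """Keep models that match class, region, and VRAM; newest first."""
    fits = [
        m for m in candidates
        if m.model_class == model_class
        # Assumption: a Global model also covers a regional request.
        and m.region in (region, "Global")
        and m.rec_vram_gb <= vram_gb
    ]
    return sorted(fits, key=lambda m: m.release_year, reverse=True)[:limit]


# Placeholder candidates; real values must come from the doc pages.
candidates = [
    ModelBadges("model_a", "MR", "Global", 40, 2023),
    ModelBadges("model_b", "MR", "Global", 80, 2024),
    ModelBadges("model_c", "NWC", "NA", 40, 2024),
    ModelBadges("model_d", "MR", "Global", 48, 2025),
]
print([m.name for m in shortlist(candidates, "MR", "Global", 48)])
```

Sorting by release year reflects the note that newer models generally supersede older ones in the same class; it is a default ordering, not a ranking of accuracy.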

Step 3. Fetch relevant data source docs

Based on the user's data needs, fetch the appropriate data source page:

Historical reanalysis → analysis data sources
Real-time or operational → forecast data sources
Observations / station data → dataframe data sources

Note which data sources cover the user's region and variables.

Step 4. Verify compatibility via lexicon

This is the key technical step. Earth2Studio models declare their required input variables via `input_coords()`. Data sources expose their available variables through their lexicon `VOCAB`. If the keys of a data source's lexicon `VOCAB` contain every variable in a model's `input_coords` (the "variable" dimension), the two are compatible.

To verify:

Check the model's doc page or source for its `input_coords`, specifically the variable list.
Check the data source's lexicon file at `earth2studio/lexicon/<source>.py` for its `VOCAB` keys.
Confirm that the data source's `VOCAB` covers all variables the model needs.

If checking source code directly (e.g. user has a local clone), the lexicon files are at:

earth2studio/lexicon/gfs.py
earth2studio/lexicon/hrrr.py
earth2studio/lexicon/cds.py
earth2studio/lexicon/arco.py
earth2studio/lexicon/wb2.py
... (one per data source)

Each defines a `VOCAB: dict[str, str | tuple]` mapping Earth2Studio variable names to source-specific identifiers.

Surface compatibility results clearly: "GraphCastOperational needs [list of variables] — GFS and ERA5 (via ARCO/CDS) both provide these, but HRRR does not cover pressure levels above X."
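
The compatibility rule above reduces to a set difference between the model's variable list and the lexicon's `VOCAB` keys. The sketch below uses toy inputs so it runs without Earth2Studio installed; `missing_variables` and `is_compatible` are hypothetical helpers, and the variable names and identifiers are placeholders. With a local install, the two inputs would presumably come from the model's `input_coords()` "variable" entry and the lexicon's `VOCAB`, but that wiring is an assumption to verify against the source.

```python
from collections.abc import Iterable, Mapping


def missing_variables(model_variables: Iterable[str],
                      vocab: Mapping[str, object]) -> list[str]:
    """Return model variables that have no key in the data source VOCAB."""
    return sorted(set(model_variables) - set(vocab))


def is_compatible(model_variables: Iterable[str],
                  vocab: Mapping[str, object]) -> bool:
    """A source is compatible when no required variable is missing."""
    return not missing_variables(model_variables, vocab)


# Toy inputs: placeholder variable list and VOCAB, not real model data.
model_vars = ["t2m", "u10m", "z500"]
toy_vocab = {"t2m": "placeholder_id_1", "u10m": "placeholder_id_2"}

print(missing_variables(model_vars, toy_vocab))  # ['z500']
print(is_compatible(model_vars, toy_vocab))      # False
```

Reporting the missing variables, rather than a bare yes or no, gives the user the detail needed for a statement like the one quoted above.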

Step 5. Suggest examples

Fetch the examples gallery and identify examples that demonstrate the user's workflow pattern. Examples are organized by category:

`01_getting_started`: basic deterministic, diagnostic, and ensemble pipelines
`02_medium_range`: ensemble extension, perturbation, cyclone tracking
`03_downscaling`: CorrDiff, CBottle, ensemble downscaling
`04_nowcasting`: StormCast, StormScope
`05_data_assimilation`: StormCast SDA, HealDA
`06_seasonal`: DLESyM, statistical methods
`07_misc`: distributed inference, IO, custom data, generation
`08_extend`: building custom models, diagnostics, data sources

Point the user at the most relevant 1–3 examples as starting points. Explain what each demonstrates and how it relates to their problem.

Step 6. Return recommendations

Output structure (omit empty sections):

Your use case

[1-2 sentence restatement of what the user wants to do]

Recommended models

Model	Class	Region	VRAM	Why
[Short-list with rationale per row]

Compatible data sources

Data Source	Coverage	Compatible with
[Verified via lexicon]

Relevant examples

Next steps

[What to install, what to read next]


Keep recommendations to 2–4 models maximum. If multiple options exist, explain the tradeoff (accuracy vs. speed, deterministic vs. ensemble, VRAM, etc.) rather than listing everything.
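
A sketch of assembling the output structure above, under two stated rules: empty sections are omitted, and the model list is capped at four. `render_recommendation` and its parameters are hypothetical; rows are joined with tabs to match the table layout used in this document, and the sample values are placeholders.

```python
MAX_MODELS = 4  # upper bound on recommended models, per the guidance above


def render_recommendation(use_case, models=(), sources=(),
                          examples=(), next_steps=()):
    """Build the recommendation text, skipping any section with no content."""
    model_rows = ["\t".join(row) for row in list(models)[:MAX_MODELS]]
    if model_rows:
        model_rows.insert(0, "Model\tClass\tRegion\tVRAM\tWhy")
    source_rows = ["\t".join(row) for row in sources]
    if source_rows:
        source_rows.insert(0, "Data Source\tCoverage\tCompatible with")

    sections = [
        ("Your use case", [use_case] if use_case else []),
        ("Recommended models", model_rows),
        ("Compatible data sources", source_rows),
        ("Relevant examples", list(examples)),
        ("Next steps", list(next_steps)),
    ]
    return "\n\n".join(
        title + "\n" + "\n".join(lines)
        for title, lines in sections
        if lines  # omit empty sections
    )


text = render_recommendation(
    "Global medium-range forecast on a 40GB GPU.",
    models=[("model_a", "MR", "Global", "40GB", "fits available VRAM")],
    next_steps=["Read the prognostic models page."],
)
print(text)
```

In this usage example no data sources or examples are supplied, so those two headings do not appear in the output.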

Limitations

Recommendations are only as current as the live docs; unreleased models are not discoverable.
Badge metadata may be incomplete for newly added models.
Lexicon compatibility checks require source code access for full accuracy; doc-only checks are approximate.

Troubleshooting

Error	Cause	Solution
Model page returns 404	URL changed after a release	Check https://nvidia.github.io/earth2studio/ for updated navigation
Lexicon file not found	Data source is new or renamed	Search `earth2studio/lexicon/` directory for current filenames
Badge missing from model	Model docs not yet updated	Fall back to the model's source code `__init__` or README for specs
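
The first row of the table can be sketched as a fetch helper that falls back to the documentation root when a page returns 404. This uses only the Python standard library; `fetch_text`, `fetch_doc_page`, and the injectable `fetch` parameter are hypothetical names, and the fallback behavior is one possible reading of "check for updated navigation", not a prescribed mechanism.

```python
import urllib.error
import urllib.request

DOCS_ROOT = "https://nvidia.github.io/earth2studio/"


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_doc_page(url: str, fetch=fetch_text):
    """Return (url_used, text); on a 404, retry with the docs root.

    `fetch` is injectable so the fallback can be exercised offline.
    """
    try:
        return url, fetch(url)
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise  # other HTTP errors are not the 'URL moved' case
        return DOCS_ROOT, fetch(DOCS_ROOT)
```

Returning the URL that was actually used makes it possible to cite the right page to the user, and to mention that the original link appears to have moved.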

Ownership and out-of-scope

Owns: component discovery, model/data-source compatibility checking, badge-based filtering, example recommendation, hardware-fit assessment.

Does not own: installation (use earth2studio-install skill), writing inference code, model training, custom model development, runtime debugging, PhysicsNeMo model discovery.
