earth2studio-discover

Earth2Studio Discoverability Skill

Purpose

Help users identify the right Earth2Studio models, data sources, and examples for their weather/climate task. Use when: comparing models by GPU/VRAM requirements, choosing forecast class (nowcast, medium-range, seasonal), finding compatible data sources via lexicons, or locating gallery examples for downscaling, ensemble generation, or data assimilation.

Prerequisites

  • Internet access to fetch live documentation pages from nvidia.github.io
  • Familiarity with the Earth2Studio badge system (Class, Region, VRAM, Release)
You are helping a user find the right Earth2Studio components for their use case. Your job is to understand what they want to do, then point them at the models, data sources, and examples that fit — verified against live documentation.

Core principle: discover from live docs, don't memorize

Earth2Studio adds models, data sources, and examples every release. Model classes get new badges, new data sources appear, examples get reorganized. Any static list in this skill will rot.
Rules:
  1. Always fetch the relevant live doc pages before recommending components.
  2. Use badge metadata (Region, Class, VRAM, Release) from the docs to filter candidates.
  3. Verify data-source ↔ model compatibility using the lexicon system (see Step 4).
  4. Cite doc URLs so the user can explore further.

Live doc references

Interaction protocol

Step 1. Understand the user's problem

Extract from what the user has said (ask follow-ups if needed, cap at 3 questions):
  • Task type — medium-range forecasting, nowcasting, downscaling/super-resolution, seasonal/subseasonal, data assimilation, climate projection, ensemble generation, derived diagnostics
  • Region — global, North America, Europe, Asia, specific country/area
  • Temporal scale — hours ahead (nowcast), days ahead (medium-range), weeks/months (seasonal), climate
  • Variables of interest — temperature, precipitation, wind, pressure, radiation, specific levels, etc.
  • Hardware constraints — GPU type, available VRAM (40GB, 48GB, 80GB, 96GB)
  • Deterministic vs. ensemble — single forecast or probabilistic
Good follow-up phrasing: "Are you looking for a single best-estimate forecast or an ensemble with uncertainty?" — not "what's your use case?"
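The answers gathered in this step can be kept in a small record so the remaining follow-up questions are easy to spot. This is a minimal sketch: UseCase and open_questions are hypothetical names, not Earth2Studio APIs. The question wording, apart from the ensemble question above, is illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_FOLLOW_UPS = 3  # Step 1 caps follow-up questions at 3


@dataclass
class UseCase:
    """Hypothetical record of the Step 1 answers; not part of Earth2Studio."""

    task_type: str                                       # e.g. "medium-range", "nowcast"
    region: Optional[str] = None                         # e.g. "Global", "NA", "EU"
    variables: List[str] = field(default_factory=list)   # e.g. ["t2m", "u10m"]
    vram_gb: Optional[int] = None                        # available GPU memory in GB
    ensemble: Optional[bool] = None                      # True = probabilistic output

    def open_questions(self) -> List[str]:
        """Return at most MAX_FOLLOW_UPS questions for fields that are still unknown."""
        questions = []
        if self.ensemble is None:
            questions.append(
                "Are you looking for a single best-estimate forecast "
                "or an ensemble with uncertainty?"
            )
        if self.vram_gb is None:
            questions.append("How much GPU memory (VRAM) is available?")
        if self.region is None:
            questions.append("Which region do you need: global or a specific area?")
        if not self.variables:
            questions.append("Which variables matter most (temperature, precipitation, wind)?")
        return questions[:MAX_FOLLOW_UPS]
```

For example, UseCase(task_type="nowcast", region="NA", variables=["t2m"]) still has two open questions: ensemble vs. deterministic, and available VRAM.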

Step 2. Fetch relevant model docs

Based on the user's task type, fetch the appropriate model page(s):
  • Forecasting → prognostic models (px)
  • Post-processing / downscaling / derived variables → diagnostic models (dx)
  • Observation integration → data assimilation (da)
  • Often a workflow chains px → dx, so check both
From the doc pages, extract for each candidate model:
  • Class badge — NWC, DS, MR, S2S, DA, CM
  • Region badge — Global, NA, EU, AS, etc.
  • Rec VRAM badge — minimum GPU memory
  • Release year — newer models generally supersede older ones in the same class
Filter to models matching the user's task type, region, and hardware. Present a short-list (not the full catalog) with badge metadata.
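The filtering step can be sketched as follows. The sketch assumes the badge values have been copied by hand from the live model pages into a hypothetical ModelCard record. The entries in the usage lines are placeholders, not real Earth2Studio models or badge values. Two choices here are assumptions of the sketch, not rules from the docs: a Global model counts as covering any requested region, and ties are broken by preferring newer releases first and then the smaller VRAM footprint.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelCard:
    """Badge metadata copied from a live model doc page (hypothetical record)."""

    name: str
    model_class: str  # Class badge: NWC, DS, MR, S2S, DA, CM
    region: str       # Region badge: Global, NA, EU, AS, ...
    vram_gb: int      # Rec VRAM badge, in GB
    release: int      # Release year


def shortlist(cards: List[ModelCard], model_class: str, region: str,
              vram_gb: int, limit: int = 4) -> List[ModelCard]:
    """Keep cards that match the class, cover the region, and fit in the VRAM budget."""
    fits = [
        c for c in cards
        if c.model_class == model_class
        and c.region in (region, "Global")  # assumption: Global covers any region
        and c.vram_gb <= vram_gb
    ]
    # Newest release first; among equal years, the smaller VRAM footprint first.
    fits.sort(key=lambda c: (-c.release, c.vram_gb))
    return fits[:limit]


# Placeholder cards; replace with values read from the live docs.
cards = [
    ModelCard("ModelA", "MR", "Global", 40, 2023),
    ModelCard("ModelB", "MR", "Global", 80, 2024),
    ModelCard("ModelC", "NWC", "NA", 48, 2024),
]
print([c.name for c in shortlist(cards, "MR", "NA", 48)])  # ['ModelA']
```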

Step 3. Fetch relevant data source docs

Based on the user's data needs, fetch the appropriate data source page:
  • Historical reanalysis → analysis data sources
  • Real-time or operational → forecast data sources
  • Observations / station data → dataframe data sources
Note which data sources cover the user's region and variables.
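A minimal sketch of this routing is shown below. The need labels are illustrative assumptions, not Earth2Studio values. The sketch maps each stated need to one of the data-source doc categories listed above and returns each category only once. If a need is unrecognized, it raises an error so the user can be asked to clarify.

```python
from typing import Dict, List

# Illustrative need labels mapped to the data-source doc categories in Step 3.
NEED_TO_CATEGORY: Dict[str, str] = {
    "historical reanalysis": "analysis",
    "real-time": "forecast",
    "operational": "forecast",
    "observations": "dataframe",
    "station data": "dataframe",
}


def categories_to_fetch(needs: List[str]) -> List[str]:
    """Return the doc categories to fetch, in first-seen order, without repeats."""
    categories: List[str] = []
    for need in needs:
        category = NEED_TO_CATEGORY.get(need.strip().lower())
        if category is None:
            raise ValueError(f"unrecognized data need: {need!r}; ask the user to clarify")
        if category not in categories:
            categories.append(category)
    return categories


print(categories_to_fetch(["Real-time", "operational", "station data"]))
# ['forecast', 'dataframe']
```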

Step 4. Verify compatibility via lexicon

This is the key technical step. Earth2Studio models declare their required input variables via input_coords(). Data sources expose their available variables through their lexicon VOCAB. If a data source's lexicon VOCAB keys contain all variables in the "variable" dimension of a model's input_coords, the two are compatible.
To verify:
  1. Check the model's doc page or source for its input_coords, specifically the variable list.
  2. Check the data source's lexicon file at earth2studio/lexicon/<source>.py for its VOCAB keys.
  3. Confirm the data source's VOCAB covers all variables the model needs.
If checking source code directly (e.g., the user has a local clone), the lexicon files are at:
earth2studio/lexicon/gfs.py
earth2studio/lexicon/hrrr.py
earth2studio/lexicon/cds.py
earth2studio/lexicon/arco.py
earth2studio/lexicon/wb2.py
... (one per data source)
Each defines VOCAB: dict[str, str | tuple], mapping Earth2Studio variable names to source-specific identifiers.
Report compatibility results clearly, for example: "GraphCastOperational needs [list of variables]; GFS and ERA5 (via ARCO/CDS) both provide these, but HRRR does not cover pressure levels above X."
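The core of this check is a set comparison, which can be sketched in plain Python. The VOCAB dictionaries below are placeholders with made-up identifiers, and SourceA/SourceB are hypothetical names. With a local install, the required list would come from the model's input_coords()["variable"], and each vocabulary would come from the VOCAB defined in earth2studio/lexicon/<source>.py. Check those files for the exact class names rather than relying on this sketch.

```python
from typing import Iterable, List, Mapping


def missing_variables(required: Iterable[str], vocab: Mapping[str, object]) -> List[str]:
    """Return the model input variables that have no key in a lexicon VOCAB."""
    # str() also normalizes numpy string scalars, if the variable list is an array.
    return [str(v) for v in required if str(v) not in vocab]


def compatibility_report(required: Iterable[str],
                         vocabs: Mapping[str, Mapping[str, object]]) -> dict:
    """Map each data source to the variables it cannot supply ([] means compatible)."""
    required = list(required)  # allow a one-shot iterator to be reused per source
    return {name: missing_variables(required, vocab) for name, vocab in vocabs.items()}


# Placeholder inputs: variable names a model might list, and two made-up VOCABs.
required = ["u10m", "v10m", "t2m", "z500"]
vocabs = {
    "SourceA": {"u10m": "id_u", "v10m": "id_v", "t2m": "id_t", "z500": "id_z"},
    "SourceB": {"u10m": "id_u", "t2m": "id_t"},
}
print(compatibility_report(required, vocabs))
# {'SourceA': [], 'SourceB': ['v10m', 'z500']}
```

Reporting the missing variables per source, rather than a simple yes/no, gives exactly the detail needed for statements like the GraphCastOperational example above.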

Step 5. Suggest examples

Fetch the examples gallery and identify examples that demonstrate the user's workflow pattern. Examples are organized by category:
  • 01_getting_started: basic deterministic, diagnostic, and ensemble pipelines
  • 02_medium_range: ensemble extension, perturbation, cyclone tracking
  • 03_downscaling: CorrDiff, CBottle, ensemble downscaling
  • 04_nowcasting: StormCast, StormScope
  • 05_data_assimilation: StormCast SDA, HealDA
  • 06_seasonal: DLESyM, statistical methods
  • 07_misc: distributed inference, IO, custom data, generation
  • 08_extend: building custom models, diagnostics, and data sources
Point the user at the most relevant 1–3 examples as starting points. Explain what each demonstrates and how it relates to their problem.
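One way to narrow the gallery before reading individual examples is a lookup from task type to the categories above. This is a sketch. The task labels are hypothetical, while the category names come from the list above. Falling back to 01_getting_started for unlisted tasks is an assumption. The sketch returns at most three categories, in line with the 1–3 examples recommended.

```python
from typing import Dict, List

# Task labels (hypothetical) mapped to gallery categories from the list above.
TASK_TO_CATEGORIES: Dict[str, List[str]] = {
    "medium-range": ["02_medium_range", "01_getting_started"],
    "ensemble": ["02_medium_range", "01_getting_started"],
    "downscaling": ["03_downscaling"],
    "nowcast": ["04_nowcasting"],
    "data assimilation": ["05_data_assimilation"],
    "seasonal": ["06_seasonal"],
    "distributed inference": ["07_misc"],
    "custom model": ["08_extend"],
}


def categories_to_browse(tasks: List[str], limit: int = 3) -> List[str]:
    """Return up to `limit` gallery categories, in task order, without repeats."""
    picked: List[str] = []
    for task in tasks:
        for category in TASK_TO_CATEGORIES.get(task, ["01_getting_started"]):
            if category not in picked:
                picked.append(category)
    return picked[:limit]


print(categories_to_browse(["nowcast", "data assimilation"]))
# ['04_nowcasting', '05_data_assimilation']
```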

Step 6. Return recommendations

Output structure (omit empty sections):

Your use case

[1-2 sentence restatement of what the user wants to do]

Recommended models

Model | Class | Region | VRAM | Why
[Short-list with rationale per row]

Compatible data sources

Data Source | Coverage | Compatible with
[Verified via lexicon]

Relevant examples

  • Example name — what it demonstrates

Next steps

[What to install, what to read next]

Limit recommendations to 2–4 models. If several options fit, explain the tradeoffs (accuracy vs. speed, deterministic vs. ensemble, VRAM, etc.) rather than listing everything.
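If the recommendations are assembled programmatically, the model limit and the Recommended models columns can be enforced at render time. This is a minimal sketch using a hypothetical render_model_table helper. It caps the list at 4 models and, as an assumption of the sketch, allows a single model when only one fits. The row in the usage line is a placeholder, not a real model.

```python
from typing import List, Tuple

Row = Tuple[str, str, str, str, str]  # (model, class, region, VRAM, why)
HEADER = "Model | Class | Region | VRAM | Why"


def render_model_table(rows: List[Row], max_models: int = 4) -> str:
    """Render the Recommended models section as a pipe-separated table."""
    if not 1 <= len(rows) <= max_models:
        raise ValueError(f"expected 1 to {max_models} models, got {len(rows)}")
    lines = [HEADER]
    for model, model_class, region, vram, why in rows:
        lines.append(f"{model} | {model_class} | {region} | {vram} | {why}")
    return "\n".join(lines)


print(render_model_table([("ModelA", "MR", "Global", "40 GB", "fits the available VRAM")]))
```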

Limitations

  • Recommendations are only as current as the live docs; unreleased models are not discoverable.
  • Badge metadata may be incomplete for newly added models.
  • Lexicon compatibility checks require source code access for full accuracy; doc-only checks are approximate.

Troubleshooting

Error | Cause | Solution
Model page returns 404 | URL changed after a release | Check https://nvidia.github.io/earth2studio/ for updated navigation
Lexicon file not found | Data source is new or renamed | Search the earth2studio/lexicon/ directory for current filenames
Badge missing from model | Model docs not yet updated | Fall back to the model's source code (__init__) or README for specs
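For the "Lexicon file not found" case, a local clone can be searched for the current file names. This is a minimal sketch using only pathlib. lexicon_files and find_lexicon are hypothetical helpers, and the substring match is only a heuristic for spotting a renamed source.

```python
from pathlib import Path
from typing import List


def lexicon_files(repo_root: str) -> List[str]:
    """List module names in earth2studio/lexicon/ of a local clone, sorted."""
    lexicon_dir = Path(repo_root) / "earth2studio" / "lexicon"
    if not lexicon_dir.is_dir():
        raise FileNotFoundError(f"no lexicon directory at {lexicon_dir}")
    return sorted(p.stem for p in lexicon_dir.glob("*.py") if p.name != "__init__.py")


def find_lexicon(repo_root: str, source: str) -> List[str]:
    """Return lexicon module names containing `source`, case-insensitively."""
    needle = source.lower()
    return [name for name in lexicon_files(repo_root) if needle in name.lower()]
```

For example, find_lexicon("/path/to/clone", "hrrr") lists every lexicon module whose name contains "hrrr". The path here is a placeholder.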

Ownership and out-of-scope

Owns: component discovery, model/data-source compatibility checking, badge-based filtering, example recommendation, hardware-fit assessment.
Does not own: installation (use the earth2studio-install skill), writing inference code, model training, custom model development, runtime debugging, PhysicsNeMo model discovery.