earth2studio-discover
Earth2Studio Discoverability Skill
Purpose
Help users identify the right Earth2Studio models, data sources, and examples for
their weather/climate task. Use when: comparing models by GPU/VRAM requirements,
choosing forecast class (nowcast, medium-range, seasonal), finding compatible
data sources via lexicons, or locating gallery examples for downscaling,
ensemble generation, or data assimilation.
Prerequisites
- Internet access to fetch live documentation pages from nvidia.github.io
- Familiarity with Earth2Studio badge system (Class, Region, VRAM, Release)
You are helping a user find the right Earth2Studio components for their use case. Your job is to understand what they want to do, then point them at the models, data sources, and examples that fit — verified against live documentation.
Core principle: discover from live docs, don't memorize
Earth2Studio adds models, data sources, and examples every release. Model classes get new badges, new data sources appear, examples get reorganized. Any static list in this skill will rot.
Rules:
- Always fetch the relevant live doc pages before recommending components.
- Use badge metadata (Region, Class, VRAM, Release) from the docs to filter candidates.
- Verify data-source ↔ model compatibility using the lexicon system (see Step 4).
- Cite doc URLs so the user can explore further.
Live doc references
Fetch these pages as needed (not all at once — only what the user's question requires):
| Category | URL |
|---|---|
| Prognostic models | https://nvidia.github.io/earth2studio/modules/models_px.html |
| Diagnostic models | https://nvidia.github.io/earth2studio/modules/models_dx.html |
| Data assimilation | https://nvidia.github.io/earth2studio/modules/models_da.html |
| Data sources (analysis) | https://nvidia.github.io/earth2studio/modules/datasources_analysis.html |
| Data sources (forecast) | https://nvidia.github.io/earth2studio/modules/datasources_forecast.html |
| Data sources (dataframe) | https://nvidia.github.io/earth2studio/modules/datasources_dataframe.html |
| Examples gallery | https://nvidia.github.io/earth2studio/examples/index.html |
| Lexicon source | https://github.com/NVIDIA/earth2studio/tree/main/earth2studio/lexicon |
Interaction protocol
Step 1. Understand the user's problem
Extract from what the user has said (ask follow-ups if needed, cap at 3 questions):
- Task type — medium-range forecasting, nowcasting, downscaling/super-resolution, seasonal/subseasonal, data assimilation, climate projection, ensemble generation, derived diagnostics
- Region — global, North America, Europe, Asia, specific country/area
- Temporal scale — hours ahead (nowcast), days ahead (medium-range), weeks/months (seasonal), climate
- Variables of interest — temperature, precipitation, wind, pressure, radiation, specific levels, etc.
- Hardware constraints — GPU type, available VRAM (40GB, 48GB, 80GB, 96GB)
- Deterministic vs. ensemble — single forecast or probabilistic
Good follow-up phrasing: "Are you looking for a single best-estimate forecast or an ensemble with uncertainty?" — not "what's your use case?"
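The checklist above can be held in a small record so that it is obvious which items still need a follow-up question. This is a minimal sketch, not part of Earth2Studio: the `UseCase` class, its field names, and `missing_fields` are hypothetical and only mirror the six bullets and the three-question cap.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_FOLLOW_UPS = 3  # the text caps follow-up questions at 3


@dataclass
class UseCase:
    """Hypothetical record of what Step 1 extracts; None means 'not yet known'."""
    task_type: Optional[str] = None       # e.g. "nowcasting"
    region: Optional[str] = None          # e.g. "global"
    temporal_scale: Optional[str] = None  # e.g. "days ahead"
    variables: list[str] = field(default_factory=list)
    vram_gb: Optional[int] = None         # available GPU memory
    ensemble: Optional[bool] = None       # False = deterministic


def missing_fields(case: UseCase) -> list[str]:
    """Return the unknown fields worth asking about, capped at the follow-up limit."""
    gaps = [name for name, value in vars(case).items() if value in (None, [])]
    return gaps[:MAX_FOLLOW_UPS]
```

Calling `missing_fields(UseCase(task_type="nowcasting", vram_gb=40))` lists the remaining gaps in declaration order, which gives a natural order for the follow-up questions.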
Step 2. Fetch relevant model docs
Based on the user's task type, fetch the appropriate model page(s):
- Forecasting → prognostic models (px)
- Post-processing / downscaling / derived variables → diagnostic models (dx)
- Observation integration → data assimilation (da)
- Often a workflow chains px → dx, so check both
From the doc pages, extract for each candidate model:
- Class badge — NWC, DS, MR, S2S, DA, CM
- Region badge — Global, NA, EU, AS, etc.
- Rec VRAM badge — minimum GPU memory
- Release year — newer models generally supersede older ones in the same class
Filter to models matching the user's task type, region, and hardware. Present a short-list (not the full catalog) with badge metadata.
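Once badge metadata has been read from the live model pages, the filtering step can be sketched as below. Everything here is illustrative: `ModelBadges` and `short_list` are hypothetical names, the badge values must come from the docs rather than from memory, and treating a "Global" model as acceptable for a regional request is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelBadges:
    """Badge metadata for one candidate, as read from a model doc page."""
    name: str
    model_class: str   # Class badge, e.g. "MR", "NWC", "DS"
    region: str        # Region badge, e.g. "Global", "NA"
    rec_vram_gb: int   # Rec VRAM badge, in GB
    release_year: int


def short_list(candidates, model_class, region, vram_gb, limit=4):
    """Keep candidates that match class, region, and VRAM; newest first."""
    fits = [
        m for m in candidates
        if m.model_class == model_class
        # Assumption: a global model also covers a regional request.
        and m.region in (region, "Global")
        and m.rec_vram_gb <= vram_gb
    ]
    return sorted(fits, key=lambda m: m.release_year, reverse=True)[:limit]
```

Sorting by release year reflects the note that newer models generally supersede older ones in the same class; the `limit` keeps the result a short-list rather than a catalog.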
Step 3. Fetch relevant data source docs
Based on the user's data needs, fetch the appropriate data source page:
- Historical reanalysis → analysis data sources
- Real-time or operational → forecast data sources
- Observations / station data → dataframe data sources
Note which data sources cover the user's region and variables.
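The routing in Steps 2 and 3 amounts to a lookup from task type and data need to doc pages, so that only the required pages are fetched. The sketch below uses the page URLs from the reference table; the dictionary keys and the `pages_to_fetch` helper are hypothetical labels chosen for illustration.

```python
DOCS_MODULES = "https://nvidia.github.io/earth2studio/modules/"

# Hypothetical task labels mapped to the model pages named in Step 2.
MODEL_PAGES = {
    "forecasting": ["models_px.html"],
    "post-processing": ["models_dx.html"],
    "downscaling": ["models_dx.html"],
    "data assimilation": ["models_da.html"],
    # Workflows that chain px -> dx need both pages.
    "forecast + diagnostics": ["models_px.html", "models_dx.html"],
}

# Hypothetical data-need labels mapped to the data source pages named in Step 3.
DATA_PAGES = {
    "reanalysis": "datasources_analysis.html",
    "operational": "datasources_forecast.html",
    "observations": "datasources_dataframe.html",
}


def pages_to_fetch(task: str, data_need: str) -> list[str]:
    """Return only the doc URLs this request needs, models first."""
    names = MODEL_PAGES[task] + [DATA_PAGES[data_need]]
    return [DOCS_MODULES + name for name in names]
```

An unknown label raises `KeyError`, which is a reasonable signal to ask the user a follow-up question instead of guessing a page.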
Step 4. Verify compatibility via lexicon
This is the key technical step. Earth2Studio models declare their required input variables via `input_coords()`. Data sources expose available variables through their lexicon VOCAB. If a data source's lexicon VOCAB keys contain all variables in a model's `input_coords` (the "variable" dimension), they are compatible.
To verify:
- Check the model's doc page or source for its `input_coords`, specifically the variable list
- Check the data source's lexicon file at `earth2studio/lexicon/<source>.py` for its VOCAB keys
- Confirm the data source VOCAB covers all variables the model needs
If checking source code directly (e.g. user has a local clone), the lexicon files are at:
earth2studio/lexicon/gfs.py
earth2studio/lexicon/hrrr.py
earth2studio/lexicon/cds.py
earth2studio/lexicon/arco.py
earth2studio/lexicon/wb2.py
... (one per data source)
Each defines a `VOCAB: dict[str, str | tuple]` mapping Earth2Studio variable names to source-specific identifiers.
Surface compatibility results clearly: "GraphCastOperational needs [list of variables]; GFS and ERA5 (via ARCO/CDS) both provide these, but HRRR does not cover pressure levels above X."
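The compatibility rule reduces to a set difference: every variable the model needs must appear among the VOCAB keys. The sketch below works on plain Python values so it runs without Earth2Studio installed; the helper names are hypothetical, and wiring them to a real model's `input_coords()` "variable" entry and a real lexicon's VOCAB is left as an assumption.

```python
def missing_variables(model_variables, vocab):
    """Return the variables the model needs that the VOCAB keys do not cover."""
    return sorted(set(model_variables) - set(vocab))


def is_compatible(model_variables, vocab) -> bool:
    """A data source is compatible when nothing the model needs is missing."""
    return not missing_variables(model_variables, vocab)


# Illustrative values only: the identifiers on the right are placeholders,
# not entries copied from any real lexicon file.
example_vocab = {"t2m": "source_id_1", "u10m": "source_id_2", "z500": "source_id_3"}
example_needs = ["t2m", "z500", "q850"]

gaps = missing_variables(example_needs, example_vocab)
if gaps:
    print("Not compatible; missing:", ", ".join(gaps))
```

Reporting the missing variables, rather than a bare yes or no, gives the user the detail needed for the kind of compatibility statement quoted above.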
Step 5. Suggest examples
Fetch the examples gallery and identify examples that demonstrate the user's workflow pattern. Examples are organized by category:
- `01_getting_started`: basic deterministic, diagnostic, ensemble pipelines
- `02_medium_range`: ensemble extension, perturbation, cyclone tracking
- `03_downscaling`: CorrDiff, CBottle, ensemble downscaling
- `04_nowcasting`: StormCast, StormScope
- `05_data_assimilation`: StormCast SDA, HealDA
- `06_seasonal`: DLESyM, statistical methods
- `07_misc`: distributed inference, IO, custom data, generation
- `08_extend`: building custom models, diagnostics, data sources
Point the user at the most relevant 1–3 examples as starting points. Explain what each demonstrates and how it relates to their problem.
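As a rough aid for narrowing the gallery, a keyword can be matched against the category list above. This is only a sketch: the `GALLERY` dictionary copies the categories as listed in this skill and should be refreshed from the live gallery page, and `suggest_categories` is a hypothetical helper.

```python
# Snapshot of the category list in this skill; the live gallery is authoritative.
GALLERY = {
    "01_getting_started": "basic deterministic, diagnostic, ensemble pipelines",
    "02_medium_range": "ensemble extension, perturbation, cyclone tracking",
    "03_downscaling": "CorrDiff, CBottle, ensemble downscaling",
    "04_nowcasting": "StormCast, StormScope",
    "05_data_assimilation": "StormCast SDA, HealDA",
    "06_seasonal": "DLESyM, statistical methods",
    "07_misc": "distributed inference, IO, custom data, generation",
    "08_extend": "building custom models, diagnostics, data sources",
}


def suggest_categories(keyword: str, limit: int = 3) -> list[str]:
    """Return up to `limit` categories whose name or topics mention the keyword."""
    key = keyword.lower()
    hits = [
        name for name, topics in GALLERY.items()
        if key in name or key in topics.lower()
    ]
    return hits[:limit]
```

The default limit of three matches the guidance to point the user at one to three starting points.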
Step 6. Return recommendations
Output structure (omit empty sections):
Your use case
[1-2 sentence restatement of what the user wants to do]
Recommended models
| Model | Class | Region | VRAM | Why |
|---|---|---|---|---|
| [Short-list with rationale per row] |
Compatible data sources
| Data Source | Coverage | Compatible with |
|---|---|---|
| [Verified via lexicon] |
Relevant examples
- Example name — what it demonstrates
Next steps
[What to install, what to read next]
Keep recommendations to 2–4 models maximum. If multiple options exist, explain the tradeoff (accuracy vs. speed, deterministic vs. ensemble, VRAM, etc.) rather than listing everything.
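The output structure can be assembled mechanically, which makes the "omit empty sections" and model-cap rules hard to forget. The sketch below is one possible rendering, with hypothetical helper names; section titles and table headers follow the template above.

```python
MAX_MODELS = 4  # upper bound on recommended models stated in the text


def table(headers, rows):
    """Render a pipe table in the same style as the template."""
    lines = ["| " + " | ".join(headers) + " |", "|" + "---|" * len(headers)]
    lines += ["| " + " | ".join(str(cell) for cell in row) + " |" for row in rows]
    return "\n".join(lines)


def render_report(use_case, models=(), sources=(), examples=(), next_steps=""):
    """Build the recommendation text, skipping any section with no content."""
    model_rows = list(models)[:MAX_MODELS]
    sections = [
        ("Your use case", use_case),
        ("Recommended models",
         table(["Model", "Class", "Region", "VRAM", "Why"], model_rows) if model_rows else ""),
        ("Compatible data sources",
         table(["Data Source", "Coverage", "Compatible with"], sources) if sources else ""),
        ("Relevant examples", "\n".join(f"- {name}: {what}" for name, what in examples)),
        ("Next steps", next_steps),
    ]
    return "\n\n".join(f"{title}\n{text}" for title, text in sections if text)
```

Truncating to `MAX_MODELS` is a safety net only; the tradeoff explanation still has to be written into the "Why" column by whoever fills in the rows.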
Limitations
- Recommendations are only as current as the live docs; unreleased models are not discoverable.
- Badge metadata may be incomplete for newly added models.
- Lexicon compatibility checks require source code access for full accuracy; doc-only checks are approximate.
Troubleshooting
| Error | Cause | Solution |
|---|---|---|
| Model page returns 404 | URL changed after a release | Check https://nvidia.github.io/earth2studio/ for updated navigation |
| Lexicon file not found | Data source is new or renamed | Search |
| Badge missing from model | Model docs not yet updated | Fall back to the model's source code |
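The first row of the table can be handled in code by falling back to the documentation home page when a model page returns 404. This is a sketch using only the Python standard library; `read_url` and `fetch_doc` are hypothetical helpers, and the fetcher is injectable so the fallback logic can be exercised without network access.

```python
import urllib.error
import urllib.request

DOCS_HOME = "https://nvidia.github.io/earth2studio/"


def read_url(url: str) -> str:
    """Download a page and return its text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_doc(url: str, fetch=read_url):
    """Return (url_used, html); on a 404, fall back to the docs home page."""
    try:
        return url, fetch(url)
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise  # other HTTP errors are not a moved-page problem
        return DOCS_HOME, fetch(DOCS_HOME)
```

Returning the URL that was actually used lets the caller cite the right page and tell the user that the original link has moved.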
Ownership and out-of-scope
Owns: component discovery, model/data-source compatibility checking, badge-based filtering, example recommendation, hardware-fit assessment.
Does not own: installation (use earth2studio-install skill), writing inference code, model training, custom model development, runtime debugging, PhysicsNeMo model discovery.