rhino-sdk


# Rhino Health SDK — Workflow Planner & Code Expert

Plan-first skill for the `rhino-health` Python SDK (v2.1.x). Takes high-level research and analytics goals, decomposes them into phased execution plans, and generates complete runnable Python code.

## Context Loading

Before responding, read ALL reference files — planning requires the full SDK picture:

1. **API Reference** (`references/sdk_reference.md`) — Endpoint classes, methods, enums, CreateInput summaries, dataclass fields, import paths.
2. **Patterns & Gotchas** (`references/patterns_and_gotchas.md`) — Auth patterns, resource lookup, metrics execution, filtering, code objects, async, and pitfalls.
3. **Metrics Reference** (`references/metrics_reference.md`) — All 40+ federated metric classes with parameters, import paths, and a decision guide.
4. **Example Index** (`references/examples/INDEX.md`) — Mapping of use cases to working example files with key methods and difficulty levels.

For SDK questions that don't require planning, you may selectively load only the relevant files.

## Request Routing

Determine what the user needs and follow the appropriate workflow:

| User intent | Action |
|---|---|
| High-level goal, multi-step workflow, "plan", "design", "how should I approach" | Full planning workflow (Sections 3-6) |
| "Write code", "generate a script", single-task code generation | Code generation with validation (Section 6) |
| "How do I...", SDK concept question | Answer from reference files (Section 9) |
| Error, traceback, "why is this failing" | Error diagnosis (Section 8) |
| "Which metric for...", metric configuration | Metric selection (Section 7) |
| "Show me an example", "sample code" | Example matching from `references/examples/INDEX.md` |

## Planning Process

Follow these four steps for any multi-step goal:

### Step 1: Analyze the Goal

Extract from the user's request:

- **Data:** What data sources? Do datasets already exist, or need ingestion/creation?
- **Analysis:** What computation? Metrics, custom code, harmonization, or a combination?
- **Output:** What does the user want? Numbers, transformed datasets, trained models, exported files?
- **Constraints:** Filters (age > 50, gender = F), specific sites, time ranges, target data models (OMOP/FHIR)?

If any of these are unclear, ask the user before producing the plan.

### Step 2: Select Workflow Templates

Match the goal to one or more composable SDK pipeline templates:

#### Template A: Federated Analytics

Run statistical metrics across one or more sites without moving data.

`Auth → Project → Datasets → Metric Config → Execute → Results`

| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | Always first |
| Get project | `session.project.get_project_by_name()` | Check for None |
| Get datasets | `project.get_dataset_by_name()` or list all | One per site |
| Configure metric | `Mean(variable=...)`, `Cox(...)`, etc. | Add filters/group_by as needed |
| Execute per-site | `session.dataset.get_dataset_metric(uid, config)` | Single site |
| Execute aggregated | `session.project.aggregate_dataset_metric(uids, config)` | Cross-site, `List[str]` of UIDs |
| Execute joined | `session.project.joined_dataset_metric(config, query, filter)` | Federated join with shared identifiers |

**Use when:** descriptive stats, survival analysis, hypothesis tests, or any metric-based analysis.
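
The steps above can be wired into one helper. This is a minimal sketch, assuming a `session` object returned by `rh.login()`; the helper name `run_federated_metric` is ours, not the SDK's, and only the accessor paths and argument shapes come from the table:

```python
from typing import Any, List


def run_federated_metric(session: Any, project_name: str,
                         dataset_names: List[str], metric_config: Any):
    """Template A in one call: look up project and datasets, then run an
    aggregated metric across sites. Hypothetical helper, not an SDK method."""
    project = session.project.get_project_by_name(project_name)
    if project is None:  # every get_*_by_name() needs a None check
        raise ValueError(f"Project '{project_name}' not found")

    datasets = [project.get_dataset_by_name(n) for n in dataset_names]
    missing = [n for n, d in zip(dataset_names, datasets) if d is None]
    if missing:
        raise ValueError(f"Datasets not found: {missing}")

    # aggregate_dataset_metric takes a List[str] of UIDs, not Dataset objects
    dataset_uids = [str(d.uid) for d in datasets]
    return session.project.aggregate_dataset_metric(dataset_uids, metric_config)
```

For a single site, the same lookup would instead feed `session.dataset.get_dataset_metric(dataset_uids[0], metric_config)`.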

#### Template B: Code Object Execution

Run custom containerized or Python code across federated sites.

`Auth → Project → Data Schema → Code Object Create → Build → Run → Wait → Output Datasets`

| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | |
| Get/create project | `session.project.get_project_by_name()` | |
| Get/create schema | `session.data_schema.create_data_schema()` | Only if new data format |
| Create code object | `session.code_object.create_code_object()` | `GENERALIZED_COMPUTE` or `PYTHON_CODE` |
| Wait for build | `code_object.wait_for_build()` | Only for `GENERALIZED_COMPUTE` |
| Run | `session.code_object.run_code_object()` | `input_dataset_uids=[[uid]]`, double-nested |
| Wait for completion | `code_run.wait_for_completion()` | |
| Access outputs | `result.output_dataset_uids.root[0].root[0].root[0]` | Triply nested |

**Use when:** custom computation — train/test splits, feature engineering, model training, any logic that metrics alone cannot express.

#### Template C: Data Harmonization

Transform source data into a target data model (OMOP, FHIR, custom).

`Auth → Project → Vocabulary → Semantic Mapping → Syntactic Mapping → Config → Run → Output`

| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | |
| Get project | `session.project.get_project_by_name()` | |
| Create semantic mapping | `session.semantic_mapping.create_semantic_mapping()` | Optional; for vocabulary lookups |
| Wait for indexing | `semantic_mapping.wait_for_completion()` | Can be slow (minutes) |
| Create syntactic mapping | `session.syntactic_mapping.create_syntactic_mapping()` | Defines column transformations |
| Generate/set config | `session.syntactic_mapping.generate_config()` | LLM-based auto-generation or manual |
| Run harmonization | `session.syntactic_mapping.run_data_harmonization()` | Preferred path |
| Wait for completion | `code_run.wait_for_completion()` | |
| Access outputs | `result.output_dataset_uids.root[0].root[0].root[0]` | Triply nested |

**Key harmonization types:** `TransformationType.SPECIFIC_VALUE`, `SOURCE_DATA_VALUE`, `ROW_PYTHON`, `TABLE_PYTHON`, `SEMANTIC_MAPPING`, `VLOOKUP`, `DATE`, `SECURE_UUID`.

**Target models:** `SyntacticMappingDataModel.OMOP`, `.FHIR`, `.CUSTOM`.

**Use when:** source data needs transformation before analysis — different column names, value encodings, or target standards like OMOP/FHIR.

#### Template D: SQL Data Ingestion

Pull data from an on-prem database into the Rhino platform.

`Auth → Project → Connection Details → SQL Query → Import as Dataset → Verify`

| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | |
| Get project | `session.project.get_project_by_name()` | |
| Define connection | `ConnectionDetails(server_type=..., server_url=..., ...)` | PostgreSQL, MySQL, etc. |
| Run metrics on query | `session.sql_query.run_sql_query(SQLQueryInput(...))` | Does NOT return raw data |
| Import as dataset | `session.sql_query.import_dataset_from_sql_query(SQLQueryImportInput(...))` | Creates a Dataset from query results |
| Wait | `sql_query.wait_for_completion()` | |

**Use when:** data lives in a relational database and needs to be brought into the platform as a Dataset.

#### Template E: Model Training + Inference

Train a federated model, then run inference on new data. This is Template B applied twice:

1. **Train phase:** Code Object with training logic → produces model artifacts
2. **Inference phase:** `session.code_run.run_inference()` using the trained model

| Step | SDK Method | Notes |
|---|---|---|
| Train (Template B) | `create_code_object` → `run_code_object` → `wait_for_completion` | Full code object lifecycle |
| Run inference | `session.code_run.run_inference(code_run_uid, validation_dataset_uids, ...)` | Uses trained model |
| Get model params | `session.code_run.get_model_params(code_run_uid)` | Download model weights |

**Use when:** federated ML model training and validation.

#### Template F: Multi-Pipeline Composition

Chain 2+ templates when a single template cannot satisfy the goal:

| Goal pattern | Composition |
|---|---|
| Harmonize then analyze | Template C → Template A |
| Ingest from SQL then analyze | Template D → Template A |
| Harmonize then train model | Template C → Template E |
| Ingest, harmonize, analyze, train | Template D → Template C → Template A → Template E |
| Custom preprocessing then analytics | Template B → Template A |

**Chaining rule:** the output datasets of one phase become the input datasets of the next. Use `result.output_dataset_uids.root[0].root[0].root[0]` to extract UIDs and pass them forward.
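
The chaining rule can be sketched with plain nested lists standing in for the SDK's RootModel wrappers. Both helper names below are ours, purely for illustration:

```python
def first_output_uid(result: dict) -> str:
    """Mirror result.output_dataset_uids.root[0].root[0].root[0], using plain
    nested lists in place of the three RootModel layers."""
    return result["output_dataset_uids"][0][0][0]


def chain_phases(phases, initial_uid: str) -> str:
    """Run phases in dependency order; each phase's first output dataset UID
    becomes the input of the next phase."""
    uid = initial_uid
    for run_phase in phases:
        result = run_phase(uid)         # e.g. Template C, then Template A
        uid = first_output_uid(result)  # output of phase N feeds phase N+1
    return uid
```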

### Step 3: Compose the Plan

1. **Authentication is always Phase 0** — shared across all phases. Include project and workgroup discovery.
2. **One template per phase** — if the goal requires Templates C → A → B, that is three phases plus Phase 0.
3. **Chain outputs to inputs** — explicitly state which output from Phase N feeds into Phase N+1.
4. **Add checkpoints** — after each phase, include a verification step (print status, check dataset count, verify output exists).
5. **Surface prerequisites** — list what must already exist vs. what will be created.
6. **Note alternatives** — if there are multiple valid approaches, briefly state why you chose one.

### Step 4: Generate Implementation

After presenting the plan, generate the complete runnable code following ALL validation rules in Section 6.

## Plan Output Format

Structure every planning response as:

```markdown
## Goal

[1-2 sentence restatement]

## Prerequisites

- Must exist: [project, datasets, schemas, workgroup access]
- Created by this plan: [new code objects, schemas, harmonized datasets]

## Plan

### Phase 0: Setup

- Authenticate and discover project/workgroup/datasets
- Checkpoint: print project name and dataset count

### Phase 1: [Name] — Template [X]

- Step 1.1: [description] — `session.X.method()`
- Step 1.2: [description] — `session.Y.method()`
- Checkpoint: [how to verify]

### Phase 2: [Name] — Template [Y]

- Depends on: Phase 1 output datasets
- Step 2.1: ...
- Checkpoint: [how to verify]

## Alternatives Considered

[Other approaches and why this plan is preferred]

## Implementation

[Complete, runnable Python script]
```

## Decision Guidance

When the goal is ambiguous, use this table:

| User signal | Template | Reasoning |
|---|---|---|
| "analyze", "measure", "statistics", "compare" | A (Analytics) | Metric-based, no custom code needed |
| "run code", "custom analysis", "process data", "split", "transform" | B (Code Object) | Needs logic beyond built-in metrics |
| "harmonize", "OMOP", "FHIR", "map columns", "standardize" | C (Harmonization) | Data transformation to target model |
| "SQL", "database", "import from DB", "ingest" | D (SQL Ingestion) | Data lives in a relational database |
| "train model", "predict", "inference", "ML" | E (Model Train) | Federated model training + validation |
| Multiple of the above | F (Composition) | Chain templates in dependency order |
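
As a rough illustration, the signal table reads as a keyword lookup. The keyword lists and `suggest_templates` helper below are our own simplification (simple substring checks, several matches collapsing to Template F):

```python
from typing import List

# Our own condensed copy of the signal table above
TEMPLATE_SIGNALS = {
    "A (Analytics)": ["analyze", "measure", "statistics", "compare"],
    "B (Code Object)": ["run code", "custom analysis", "process data", "split", "transform"],
    "C (Harmonization)": ["harmonize", "omop", "fhir", "map columns", "standardize"],
    "D (SQL Ingestion)": ["sql", "database", "import from db", "ingest"],
    "E (Model Train)": ["train model", "predict", "inference"],
}


def suggest_templates(goal: str) -> List[str]:
    """Return matching templates; multiple matches mean Template F composition."""
    text = goal.lower()
    hits = [t for t, words in TEMPLATE_SIGNALS.items()
            if any(w in text for w in words)]
    return ["F (Composition)"] if len(hits) > 1 else hits
```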

## Validation Checklist

Apply every item to ALL generated code — plans and standalone scripts alike.

### Endpoint Accessors

| Operation | Correct accessor |
|---|---|
| Project-level operations, aggregate/joined metrics | `session.project` |
| Dataset-level operations, per-site metrics | `session.dataset` |
| Code objects, builds, runs, harmonization | `session.code_object` |
| Run status, inference results | `session.code_run` |
| SQL queries | `session.sql_query` |
| Semantic mappings, vocabularies | `session.semantic_mapping` |
| Syntactic mappings, harmonization config | `session.syntactic_mapping` |
| Data schemas | `session.data_schema` |

### Environment

- Default `rh.login()` connects to production. For dev/QA/staging, pass `rhino_api_url`: `rh.login(..., rhino_api_url=ApiEnvironment.DEV1_AWS_URL)`
- Import: `from rhino_health.lib.constants import ApiEnvironment`
- If the user mentions a dev1/dev2/QA/staging environment, ALWAYS add the `rhino_api_url` parameter

### Import Paths

| Wrong | Correct |
|---|---|
| `from rhino_health.metrics import X` | `from rhino_health.lib.metrics import X` |
| `from rhino_health.endpoints.X import Y` | `from rhino_health.lib.endpoints.X.X_dataclass import Y` |

### Metric Calls

- `aggregate_dataset_metric` takes a `List[str]` of UIDs: `[str(d.uid) for d in datasets]`
- `get_dataset_metric` takes a single `dataset_uid: str`
- `joined_dataset_metric` takes `query_datasets` and optional `filter_datasets` as `List[str]`
- Metric config objects require `data_column` (not `column` or `field`)
- `FilterVariable` uses keys: `data_column`, `filter_column`, `filter_value`, `filter_type`
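
The UID-shape rules are easy to check with stand-in objects; the `FakeDataset` class here is purely illustrative, not an SDK type:

```python
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Stand-in for an SDK Dataset object; only .uid matters here."""
    uid: str


datasets = [FakeDataset("1111-aaaa"), FakeDataset("2222-bbbb")]

# aggregate_dataset_metric wants List[str], never List[Dataset]:
aggregate_uids = [str(d.uid) for d in datasets]

# get_dataset_metric wants a single dataset_uid: str
single_uid = aggregate_uids[0]
```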

### CreateInput Alias Fields

| Field name | Alias (use this) |
|---|---|
| `project_uid` | `project` |
| `workgroup_uid` | `workgroup` |

### Nested Structures & RootModels

- `CodeObjectRunInput.input_dataset_uids` is `List[List[str]]`: `[[uid1, uid2]]`
- `output_dataset_uids` is a triply nested RootModel: access via `.root[0].root[0].root[0]`
- `DataSchema.schema_fields` is a `SchemaFields` RootModel: access the list via `.root`, names via `.field_names`
- `group_by` format: `{"groupings": [{"data_column": "col"}]}`
- `data_filters` list: `[FilterVariable(data_column="col", filter_column="col", filter_value="val", filter_type=FilterType.EQUALS)]`
- Enum display: use `.value` for clean strings (e.g. `status.value` → `'Approved'`)

### Async Operations

- Call `wait_for_build()` after creating Generalized Compute code objects
- Call `wait_for_completion()` after `run_code_object()`, `run_data_harmonization()`, `run_sql_query()`

### None Checks

Every `get_*_by_name()` call must be followed by a None check:

```python
dataset = project.get_dataset_by_name("Name")
if dataset is None:
    raise ValueError("Dataset not found")
```

## Code Template

Every generated script must follow this structure:

```python
import rhino_health as rh
from getpass import getpass
# ... additional imports ...

# For non-production environments, add rhino_api_url:
#   from rhino_health.lib.constants import ApiEnvironment
#   session = rh.login(username="my_email@example.com", password=getpass(),
#                      rhino_api_url=ApiEnvironment.DEV1_AWS_URL)
session = rh.login(username="my_email@example.com", password=getpass())

PROJECT_NAME = "My Project"
# ... constants ...

project = session.project.get_project_by_name(PROJECT_NAME)
if project is None:
    raise ValueError(f"Project '{PROJECT_NAME}' not found")

# ... core logic ...

print(result)
```

## Metric Selection Tree

Map natural language to the right metric class:

| User asks about... | Metric class | Category |
|---|---|---|
| Counts, frequencies | `Count` | Basic |
| Averages, means | `Mean` | Basic |
| Spread, variability | `StandardDeviation`, `Variance` | Basic |
| Totals, sums | `Sum` | Basic |
| Percentiles, medians, quartiles | `Percentile`, `NPercentile` | Quantile |
| Survival time, time-to-event | `KaplanMeier` | Survival |
| Hazard ratios, covariates + survival | `Cox` | Survival |
| ROC curves, AUC | `RocAuc` | ROC/AUC |
| ROC with confidence intervals | `RocAucWithCI` | ROC/AUC |
| Correlation between variables | `Pearson`, `Spearman` | Statistics |
| Inter-rater reliability | `ICC` | Statistics |
| Compare two group means | `TTest` | Statistics |
| Compare 3+ group means | `OneWayANOVA` | Statistics |
| Categorical association | `ChiSquare` | Statistics |
| 2x2 contingency table | `TwoByTwoTable` | Epidemiology |
| Odds ratio | `OddsRatio` | Epidemiology |
| Risk ratio / relative risk | `RiskRatio` | Epidemiology |
| Risk difference | `RiskDifference` | Epidemiology |
| Incidence rates | `Incidence` | Epidemiology |

All metrics: `from rhino_health.lib.metrics import ClassName`
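
The tree collapses naturally into a first-match phrase lookup. The phrase list and `pick_metric` helper below are our own sketch covering a subset of the table (order matters, so more specific phrases come first):

```python
from typing import Optional

# First-match wins, so "hazard ratio" must precede "survival"
PHRASE_TO_METRIC = [
    ("hazard ratio", "Cox"),
    ("survival", "KaplanMeier"),
    ("auc", "RocAuc"),
    ("odds ratio", "OddsRatio"),
    ("incidence", "Incidence"),
    ("correlation", "Pearson"),
    ("median", "Percentile"),
    ("mean", "Mean"),
    ("count", "Count"),
]


def pick_metric(question: str) -> Optional[str]:
    """Return the first metric class whose trigger phrase appears."""
    q = question.lower()
    for phrase, metric in PHRASE_TO_METRIC:
        if phrase in q:
            return metric
    return None
```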

### Execution modes

| Scope | Method |
|---|---|
| Single site | `session.dataset.get_dataset_metric(dataset_uid, config)` |
| Aggregated across sites | `session.project.aggregate_dataset_metric(dataset_uids, config)` — `List[str]` UIDs |
| Federated join | `session.project.joined_dataset_metric(config, query_datasets, filter_datasets)` |

### Filtering example

```python
from rhino_health.lib.metrics import Mean, FilterType, FilterVariable

config = Mean(
    variable="Height",
    data_filters=[
        FilterVariable(
            data_column="Gender",
            filter_column="Gender",
            filter_value="Female",
            filter_type=FilterType.EQUALS,
        )
    ],
    group_by={"groupings": ["Gender"]},
)
```

## Error-to-Fix Reference

When the user encounters an error, diagnose using this table:

| Error pattern | Root cause | Fix |
|---|---|---|
| `NotAuthenticatedError` / HTTP 401 | Token expired, wrong creds, or MFA | Re-login; pass `otp_code` if MFA enabled |
| HTTP 401 with correct credentials | Wrong environment URL | Add `rhino_api_url=ApiEnvironment.DEV1_AWS_URL` (or QA/staging). Default is production |
| `AttributeError: 'NoneType'` | `get_*_by_name()` returned None | Add None check after every `get_*_by_name()` |
| `ValidationError` (pydantic) | Wrong field names — alias confusion | Use aliases: `project` not `project_uid`, `workgroup` not `workgroup_uid` |
| `TypeError` in metric config | String where FilterVariable expected | Use `FilterVariable(data_column=..., filter_column=..., filter_value=..., filter_type=...)` |
| `ImportError` / `ModuleNotFoundError` | Wrong import path | `from rhino_health.lib.metrics import X` (NOT `rhino_health.metrics`) |
| `TypeError: aggregate_dataset_metric()` | `List[Dataset]` instead of `List[str]` | Convert: `[str(d.uid) for d in datasets]` |
| `IndexError` on `output_dataset_uids` | Accessing as flat list | Use `.root[0].root[0].root[0]` (triply nested RootModel) |
| `TypeError` / `AttributeError` on `schema_fields` | `SchemaFields` is a RootModel, not a list | Use `schema.schema_fields.root` for the list, `.field_names` for names |
| `TimeoutError` / operation hangs | Default timeout too low | Increase `timeout_seconds` in `wait_for_completion()` |
| `TypeError: input_dataset_uids` | `List[str]` instead of `List[List[str]]` | Must be double-nested: `[[uid1, uid2]]` |
| `KeyError` / None in metric results | Wrong `data_column` name | Verify column name matches dataset schema (case-sensitive) |
| Enum shows full path (e.g. `Status.APPROVED`) | Printing enum object directly | Use `.value` for clean string: `status.value` → `'Approved'` |
| `ValidationError` on enum field (e.g. `indexing_status`) | SDK/API version mismatch — backend added new value | Use the `session.get()` raw API escape hatch (§17 in patterns_and_gotchas.md), or `pip install --upgrade rhino-health` |

Diagnostic process: identify exception class → locate failing SDK call → cross-reference the correct signature in `references/sdk_reference.md` → check for compound errors.
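
The enum-display row is quick to verify with a stand-in enum; the `Status` class below is illustrative, not an SDK import:

```python
from enum import Enum


class Status(Enum):
    """Stand-in for an SDK status enum."""
    APPROVED = "Approved"
    PENDING = "Pending"


status = Status.APPROVED
full_path = str(status)   # "Status.APPROVED", what printing the enum shows
clean = status.value      # "Approved", the clean display string
```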

## Question Routing

For non-planning SDK questions, locate the right context section:

| Question type | Source file | Section |
|---|---|---|
| Authentication, login, MFA | patterns_and_gotchas.md | §1 |
| Finding projects/datasets by name | patterns_and_gotchas.md | §2 |
| Creating/updating resources (upsert) | patterns_and_gotchas.md | §3 |
| Running per-site or aggregated metrics | patterns_and_gotchas.md | §4 |
| Filtering data | patterns_and_gotchas.md | §5 |
| Group-by analysis | patterns_and_gotchas.md | §6 |
| Federated joins | patterns_and_gotchas.md | §7 |
| Code objects (create, build, run) | patterns_and_gotchas.md | §8 |
| Async operations / waiting | patterns_and_gotchas.md | §9 |
| Correct import paths | patterns_and_gotchas.md | §11 |
| Environment URL (dev1, QA, staging) | patterns_and_gotchas.md | §13 |
| RootModel access (SchemaFields, output UIDs) | patterns_and_gotchas.md | §14 |
| Semantic mapping entries / data | patterns_and_gotchas.md | §15 |
| Session persistence / SSO | patterns_and_gotchas.md | §16 |
| SDK crash on valid API data, ValidationError on enum | patterns_and_gotchas.md | §17 |
| Raw API calls, session.get(), bypassing Pydantic | patterns_and_gotchas.md | §17 |
| Vocabularies, vocabulary types | sdk_reference.md | §SemanticMappingEndpoints, §Key Enums |
| Data schema fields, column info | sdk_reference.md | §DataSchema, §SchemaFields |
| Specific endpoint methods | sdk_reference.md | §[EndpointName]Endpoints |
| Enums and constants | sdk_reference.md | §Key Enums |
| API environment URLs | sdk_reference.md | §ApiEnvironment |
| Metric configuration | metrics_reference.md | §[Category] |
| "Which metric for...?" | metrics_reference.md | §Quick Decision Guide |

## Working Examples

Match the user's goal to verified working examples from `references/examples/INDEX.md`:

| Template | Example files |
|---|---|
| A (Analytics) | `eda.py`, `cox.py`, `metrics_examples.py`, `roc_analysis.py`, `aggregate_quantile.py`, `federated_join.py` |
| B (Code Object) | `train_test_split.py`, `runtime_external_files.py` |
| C (Harmonization) | `fhir_pipeline.py` |
| D (SQL Ingestion) | `sql_data_ingestion.py` |
| E (Model Training) | `train_test_split.py` (training portion) |
| F (Composition) | `fhir_pipeline.py` (harmonization + code object + export) |

Read the relevant example file before generating code to follow its proven patterns.