rhino-sdk


# Rhino Health SDK — Workflow Planner & Code Expert

Plan-first skill for the `rhino-health` Python SDK (v2.1.x). Takes high-level research and analytics goals, decomposes them into phased execution plans, and generates complete runnable Python code.

## Context Loading

Before responding, read ALL reference files — planning requires the full SDK picture:

1. **API Reference** (`references/sdk_reference.md`) — Endpoint classes, methods, enums, CreateInput summaries, dataclass fields, import paths.
2. **Patterns & Gotchas** (`references/patterns_and_gotchas.md`) — Auth patterns, resource lookup, metrics execution, filtering, code objects, async, and pitfalls.
3. **Metrics Reference** (`references/metrics_reference.md`) — All 40+ federated metric classes with parameters, import paths, and a decision guide.
4. **Example Index** (`references/examples/INDEX.md`) — Mapping of use cases to working example files with key methods and difficulty levels.

For SDK questions that don't require planning, you may selectively load only the relevant files.

## Request Routing

Determine what the user needs and follow the appropriate workflow:

| User intent | Action |
|---|---|
| High-level goal, multi-step workflow, "plan", "design", "how should I approach" | Full planning workflow (Sections 3-6) |
| "Write code", "generate a script", single-task code generation | Code generation with validation (Section 6) |
| "How do I...", SDK concept question | Answer from reference files (Section 9) |
| Error, traceback, "why is this failing" | Error diagnosis (Section 8) |
| "Which metric for...", metric configuration | Metric selection (Section 7) |
| "Show me an example", "sample code" | Example matching from `references/examples/INDEX.md` |

## Planning Process

Follow these four steps for any multi-step goal:

### Step 1: Analyze the Goal

Extract from the user's request:

- **Data:** What data sources? Do datasets already exist, or need ingestion/creation?
- **Analysis:** What computation? Metrics, custom code, harmonization, or a combination?
- **Output:** What does the user want? Numbers, transformed datasets, trained models, exported files?
- **Constraints:** Filters (age > 50, gender = F), specific sites, time ranges, target data models (OMOP/FHIR)?

If any of these are unclear, ask the user before producing the plan.

### Step 2: Select Workflow Templates

Match the goal to one or more composable SDK pipeline templates:

#### Template A: Federated Analytics

Run statistical metrics across one or more sites without moving data.

`Auth → Project → Datasets → Metric Config → Execute → Results`

| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | Always first |
| Get project | `session.project.get_project_by_name()` | Check for None |
| Get datasets | `project.get_dataset_by_name()` or list all | One per site |
| Configure metric | `Mean(variable=...)`, `Cox(...)`, etc. | Add filters/group_by as needed |
| Execute per-site | `session.dataset.get_dataset_metric(uid, config)` | Single site |
| Execute aggregated | `session.project.aggregate_dataset_metric(uids, config)` | Cross-site, `List[str]` of UIDs |
| Execute joined | `session.project.joined_dataset_metric(config, query, filter)` | Federated join with shared identifiers |

**Use when:** descriptive stats, survival analysis, hypothesis tests, or any metric-based analysis.
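
The steps above can be wired into one helper. This is a minimal sketch, assuming a `session` object returned by `rh.login()`; the helper name `run_federated_metric` is ours, not the SDK's, and only the accessor paths and argument shapes come from the table:

```python
from typing import Any, List


def run_federated_metric(session: Any, project_name: str,
                         dataset_names: List[str], metric_config: Any):
    """Template A in one call: look up project and datasets, then run an
    aggregated metric across sites. Hypothetical helper, not an SDK method."""
    project = session.project.get_project_by_name(project_name)
    if project is None:  # every get_*_by_name() needs a None check
        raise ValueError(f"Project '{project_name}' not found")

    datasets = [project.get_dataset_by_name(n) for n in dataset_names]
    missing = [n for n, d in zip(dataset_names, datasets) if d is None]
    if missing:
        raise ValueError(f"Datasets not found: {missing}")

    # aggregate_dataset_metric takes a List[str] of UIDs, not Dataset objects
    dataset_uids = [str(d.uid) for d in datasets]
    return session.project.aggregate_dataset_metric(dataset_uids, metric_config)
```

For a single site, the same lookup would instead feed `session.dataset.get_dataset_metric(dataset_uids[0], metric_config)`.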

#### Template B: Code Object Execution

Run custom containerized or Python code across federated sites.

`Auth → Project → Data Schema → Code Object Create → Build → Run → Wait → Output Datasets`

| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | |
| Get/create project | `session.project.get_project_by_name()` | |
| Get/create schema | `session.data_schema.create_data_schema()` | Only if new data format |
| Create code object | `session.code_object.create_code_object()` | `GENERALIZED_COMPUTE` or `PYTHON_CODE` |
| Wait for build | `code_object.wait_for_build()` | Only for `GENERALIZED_COMPUTE` |
| Run | `session.code_object.run_code_object()` | `input_dataset_uids=[[uid]]`, double-nested |
| Wait for completion | `code_run.wait_for_completion()` | |
| Access outputs | `result.output_dataset_uids.root[0].root[0].root[0]` | Triply nested |

**Use when:** custom computation — train/test splits, feature engineering, model training, any logic that metrics alone cannot express.

#### Template C: Data Harmonization

Transform source data into a target data model (OMOP, FHIR, custom).

`Auth → Project → Vocabulary → Semantic Mapping → Syntactic Mapping → Config → Run → Output`

| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | |
| Get project | `session.project.get_project_by_name()` | |
| Create semantic mapping | `session.semantic_mapping.create_semantic_mapping()` | Optional; for vocabulary lookups |
| Wait for indexing | `semantic_mapping.wait_for_completion()` | Can be slow (minutes) |
| Create syntactic mapping | `session.syntactic_mapping.create_syntactic_mapping()` | Defines column transformations |
| Generate/set config | `session.syntactic_mapping.generate_config()` | LLM-based auto-generation or manual |
| Run harmonization | `session.syntactic_mapping.run_data_harmonization()` | Preferred path |
| Wait for completion | `code_run.wait_for_completion()` | |
| Access outputs | `result.output_dataset_uids.root[0].root[0].root[0]` | Triply nested |

**Key harmonization types:** `TransformationType.SPECIFIC_VALUE`, `SOURCE_DATA_VALUE`, `ROW_PYTHON`, `TABLE_PYTHON`, `SEMANTIC_MAPPING`, `VLOOKUP`, `DATE`, `SECURE_UUID`.

**Target models:** `SyntacticMappingDataModel.OMOP`, `.FHIR`, `.CUSTOM`.

**Use when:** source data needs transformation before analysis — different column names, value encodings, or target standards like OMOP/FHIR.

#### Template D: SQL Data Ingestion

Pull data from an on-prem database into the Rhino platform.

`Auth → Project → Connection Details → SQL Query → Import as Dataset → Verify`

| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | |
| Get project | `session.project.get_project_by_name()` | |
| Define connection | `ConnectionDetails(server_type=..., server_url=..., ...)` | PostgreSQL, MySQL, etc. |
| Run metrics on query | `session.sql_query.run_sql_query(SQLQueryInput(...))` | Does NOT return raw data |
| Import as dataset | `session.sql_query.import_dataset_from_sql_query(SQLQueryImportInput(...))` | Creates a Dataset from query results |
| Wait | `sql_query.wait_for_completion()` | |

**Use when:** data lives in a relational database and needs to be brought into the platform as a Dataset.

#### Template E: Model Training + Inference

Train a federated model, then run inference on new data. This is Template B applied twice:

1. **Train phase:** Code Object with training logic → produces model artifacts
2. **Inference phase:** `session.code_run.run_inference()` using the trained model

| Step | SDK Method | Notes |
|---|---|---|
| Train (Template B) | `create_code_object` → `run_code_object` → `wait_for_completion` | Full code object lifecycle |
| Run inference | `session.code_run.run_inference(code_run_uid, validation_dataset_uids, ...)` | Uses trained model |
| Get model params | `session.code_run.get_model_params(code_run_uid)` | Download model weights |

**Use when:** federated ML model training and validation.

#### Template F: Multi-Pipeline Composition

Chain 2+ templates when a single template cannot satisfy the goal:

| Goal pattern | Composition |
|---|---|
| Harmonize then analyze | Template C → Template A |
| Ingest from SQL then analyze | Template D → Template A |
| Harmonize then train model | Template C → Template E |
| Ingest, harmonize, analyze, train | Template D → Template C → Template A → Template E |
| Custom preprocessing then analytics | Template B → Template A |

**Chaining rule:** the output datasets of one phase become the input datasets of the next. Use `result.output_dataset_uids.root[0].root[0].root[0]` to extract UIDs and pass them forward.
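
The chaining rule can be sketched with plain nested lists standing in for the SDK's RootModel wrappers. Both helper names below are ours, purely for illustration:

```python
def first_output_uid(result: dict) -> str:
    """Mirror result.output_dataset_uids.root[0].root[0].root[0], using plain
    nested lists in place of the three RootModel layers."""
    return result["output_dataset_uids"][0][0][0]


def chain_phases(phases, initial_uid: str) -> str:
    """Run phases in dependency order; each phase's first output dataset UID
    becomes the input of the next phase."""
    uid = initial_uid
    for run_phase in phases:
        result = run_phase(uid)         # e.g. Template C, then Template A
        uid = first_output_uid(result)  # output of phase N feeds phase N+1
    return uid
```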

### Step 3: Compose the Plan

1. **Authentication is always Phase 0** — shared across all phases. Include project and workgroup discovery.
2. **One template per phase** — if the goal requires Templates C → A → B, that is three phases plus Phase 0.
3. **Chain outputs to inputs** — explicitly state which output from Phase N feeds into Phase N+1.
4. **Add checkpoints** — after each phase, include a verification step (print status, check dataset count, verify output exists).
5. **Surface prerequisites** — list what must already exist vs. what will be created.
6. **Note alternatives** — if there are multiple valid approaches, briefly state why you chose one.

### Step 4: Generate Implementation

After presenting the plan, generate the complete runnable code following ALL validation rules in Section 6.

## Plan Output Format

Structure every planning response as:

```markdown
## Goal

[1-2 sentence restatement]

## Prerequisites

- Must exist: [project, datasets, schemas, workgroup access]
- Created by this plan: [new code objects, schemas, harmonized datasets]

## Plan

### Phase 0: Setup

- Authenticate and discover project/workgroup/datasets
- Checkpoint: print project name and dataset count

### Phase 1: [Name] — Template [X]

- Step 1.1: [description] — `session.X.method()`
- Step 1.2: [description] — `session.Y.method()`
- Checkpoint: [how to verify]

### Phase 2: [Name] — Template [Y]

- Depends on: Phase 1 output datasets
- Step 2.1: ...
- Checkpoint: [how to verify]

## Alternatives Considered

[Other approaches and why this plan is preferred]

## Implementation

[Complete, runnable Python script]
```

## Decision Guidance

When the goal is ambiguous, use this table:

| User signal | Template | Reasoning |
|---|---|---|
| "analyze", "measure", "statistics", "compare" | A (Analytics) | Metric-based, no custom code needed |
| "run code", "custom analysis", "process data", "split", "transform" | B (Code Object) | Needs logic beyond built-in metrics |
| "harmonize", "OMOP", "FHIR", "map columns", "standardize" | C (Harmonization) | Data transformation to target model |
| "SQL", "database", "import from DB", "ingest" | D (SQL Ingestion) | Data lives in a relational database |
| "train model", "predict", "inference", "ML" | E (Model Train) | Federated model training + validation |
| Multiple of the above | F (Composition) | Chain templates in dependency order |
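
As a rough illustration, the signal table reads as a keyword lookup. The keyword lists and `suggest_templates` helper below are our own simplification (simple substring checks, several matches collapsing to Template F):

```python
from typing import List

# Our own condensed copy of the signal table above
TEMPLATE_SIGNALS = {
    "A (Analytics)": ["analyze", "measure", "statistics", "compare"],
    "B (Code Object)": ["run code", "custom analysis", "process data", "split", "transform"],
    "C (Harmonization)": ["harmonize", "omop", "fhir", "map columns", "standardize"],
    "D (SQL Ingestion)": ["sql", "database", "import from db", "ingest"],
    "E (Model Train)": ["train model", "predict", "inference"],
}


def suggest_templates(goal: str) -> List[str]:
    """Return matching templates; multiple matches mean Template F composition."""
    text = goal.lower()
    hits = [t for t, words in TEMPLATE_SIGNALS.items()
            if any(w in text for w in words)]
    return ["F (Composition)"] if len(hits) > 1 else hits
```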

## Validation Checklist

Apply every item to ALL generated code — plans and standalone scripts alike.

### Endpoint Accessors

| Operation | Correct accessor |
|---|---|
| Project-level operations, aggregate/joined metrics | `session.project` |
| Dataset-level operations, per-site metrics | `session.dataset` |
| Code objects, builds, runs, harmonization | `session.code_object` |
| Run status, inference results | `session.code_run` |
| SQL queries | `session.sql_query` |
| Semantic mappings, vocabularies | `session.semantic_mapping` |
| Syntactic mappings, harmonization config | `session.syntactic_mapping` |
| Data schemas | `session.data_schema` |

### Environment

- Default `rh.login()` connects to production. For dev/QA/staging, pass `rhino_api_url`: `rh.login(..., rhino_api_url=ApiEnvironment.DEV1_AWS_URL)`
- Import: `from rhino_health.lib.constants import ApiEnvironment`
- If the user mentions a dev1/dev2/QA/staging environment, ALWAYS add the `rhino_api_url` parameter

### Import Paths

| Wrong | Correct |
|---|---|
| `from rhino_health.metrics import X` | `from rhino_health.lib.metrics import X` |
| `from rhino_health.endpoints.X import Y` | `from rhino_health.lib.endpoints.X.X_dataclass import Y` |

### Metric Calls

- `aggregate_dataset_metric` takes a `List[str]` of UIDs: `[str(d.uid) for d in datasets]`
- `get_dataset_metric` takes a single `dataset_uid: str`
- `joined_dataset_metric` takes `query_datasets` and optional `filter_datasets` as `List[str]`
- Metric config objects require `data_column` (not `column` or `field`)
- `FilterVariable` uses keys: `data_column`, `filter_column`, `filter_value`, `filter_type`
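
The UID-shape rules are easy to check with stand-in objects; the `FakeDataset` class here is purely illustrative, not an SDK type:

```python
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Stand-in for an SDK Dataset object; only .uid matters here."""
    uid: str


datasets = [FakeDataset("1111-aaaa"), FakeDataset("2222-bbbb")]

# aggregate_dataset_metric wants List[str], never List[Dataset]:
aggregate_uids = [str(d.uid) for d in datasets]

# get_dataset_metric wants a single dataset_uid: str
single_uid = aggregate_uids[0]
```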

### CreateInput Alias Fields

| Field name | Alias (use this) |
|---|---|
| `project_uid` | `project` |
| `workgroup_uid` | `workgroup` |

### Nested Structures & RootModels

- `CodeObjectRunInput.input_dataset_uids` is `List[List[str]]`: `[[uid1, uid2]]`
- `output_dataset_uids` is a triply nested RootModel: access via `.root[0].root[0].root[0]`
- `DataSchema.schema_fields` is a `SchemaFields` RootModel: access the list via `.root`, names via `.field_names`
- `group_by` format: `{"groupings": [{"data_column": "col"}]}`
- `data_filters` list: `[FilterVariable(data_column="col", filter_column="col", filter_value="val", filter_type=FilterType.EQUALS)]`
- Enum display: use `.value` for clean strings (e.g. `status.value` → `'Approved'`)

### Async Operations

- Call `wait_for_build()` after creating Generalized Compute code objects
- Call `wait_for_completion()` after `run_code_object()`, `run_data_harmonization()`, `run_sql_query()`

### None Checks

Every `get_*_by_name()` call must be followed by a None check:

```python
dataset = project.get_dataset_by_name("Name")
if dataset is None:
    raise ValueError("Dataset not found")
```

## Code Template

Every generated script must follow this structure:

```python
import rhino_health as rh
from getpass import getpass
# ... additional imports ...

# For non-production environments, add rhino_api_url:
#   from rhino_health.lib.constants import ApiEnvironment
#   session = rh.login(username="my_email@example.com", password=getpass(),
#                      rhino_api_url=ApiEnvironment.DEV1_AWS_URL)
session = rh.login(username="my_email@example.com", password=getpass())

PROJECT_NAME = "My Project"
# ... constants ...

project = session.project.get_project_by_name(PROJECT_NAME)
if project is None:
    raise ValueError(f"Project '{PROJECT_NAME}' not found")

# ... core logic ...

print(result)
```

## Metric Selection Tree

Map natural language to the right metric class:

| User asks about... | Metric class | Category |
|---|---|---|
| Counts, frequencies | `Count` | Basic |
| Averages, means | `Mean` | Basic |
| Spread, variability | `StandardDeviation`, `Variance` | Basic |
| Totals, sums | `Sum` | Basic |
| Percentiles, medians, quartiles | `Percentile`, `NPercentile` | Quantile |
| Survival time, time-to-event | `KaplanMeier` | Survival |
| Hazard ratios, covariates + survival | `Cox` | Survival |
| ROC curves, AUC | `RocAuc` | ROC/AUC |
| ROC with confidence intervals | `RocAucWithCI` | ROC/AUC |
| Correlation between variables | `Pearson`, `Spearman` | Statistics |
| Inter-rater reliability | `ICC` | Statistics |
| Compare two group means | `TTest` | Statistics |
| Compare 3+ group means | `OneWayANOVA` | Statistics |
| Categorical association | `ChiSquare` | Statistics |
| 2x2 contingency table | `TwoByTwoTable` | Epidemiology |
| Odds ratio | `OddsRatio` | Epidemiology |
| Risk ratio / relative risk | `RiskRatio` | Epidemiology |
| Risk difference | `RiskDifference` | Epidemiology |
| Incidence rates | `Incidence` | Epidemiology |

All metrics: `from rhino_health.lib.metrics import ClassName`
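
The tree collapses naturally into a first-match phrase lookup. The phrase list and `pick_metric` helper below are our own sketch covering a subset of the table (order matters, so more specific phrases come first):

```python
from typing import Optional

# First-match wins, so "hazard ratio" must precede "survival"
PHRASE_TO_METRIC = [
    ("hazard ratio", "Cox"),
    ("survival", "KaplanMeier"),
    ("auc", "RocAuc"),
    ("odds ratio", "OddsRatio"),
    ("incidence", "Incidence"),
    ("correlation", "Pearson"),
    ("median", "Percentile"),
    ("mean", "Mean"),
    ("count", "Count"),
]


def pick_metric(question: str) -> Optional[str]:
    """Return the first metric class whose trigger phrase appears."""
    q = question.lower()
    for phrase, metric in PHRASE_TO_METRIC:
        if phrase in q:
            return metric
    return None
```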

### Execution modes

| Scope | Method |
|---|---|
| Single site | `session.dataset.get_dataset_metric(dataset_uid, config)` |
| Aggregated across sites | `session.project.aggregate_dataset_metric(dataset_uids, config)` — `List[str]` UIDs |
| Federated join | `session.project.joined_dataset_metric(config, query_datasets, filter_datasets)` |

### Filtering example

```python
from rhino_health.lib.metrics import Mean, FilterType, FilterVariable

config = Mean(
    variable="Height",
    data_filters=[
        FilterVariable(
            data_column="Gender",
            filter_column="Gender",
            filter_value="Female",
            filter_type=FilterType.EQUALS,
        )
    ],
    group_by={"groupings": ["Gender"]},
)
```

## Error-to-Fix Reference

When the user encounters an error, diagnose using this table:

| Error pattern | Root cause | Fix |
|---|---|---|
| `NotAuthenticatedError` / HTTP 401 | Token expired, wrong creds, or MFA | Re-login; pass `otp_code` if MFA enabled |
| HTTP 401 with correct credentials | Wrong environment URL | Add `rhino_api_url=ApiEnvironment.DEV1_AWS_URL` (or QA/staging). Default is production |
| `AttributeError: 'NoneType'` | `get_*_by_name()` returned None | Add None check after every `get_*_by_name()` |
| `ValidationError` (pydantic) | Wrong field names — alias confusion | Use aliases: `project` not `project_uid`, `workgroup` not `workgroup_uid` |
| `TypeError` in metric config | String where FilterVariable expected | Use `FilterVariable(data_column=..., filter_column=..., filter_value=..., filter_type=...)` |
| `ImportError` / `ModuleNotFoundError` | Wrong import path | `from rhino_health.lib.metrics import X` (NOT `rhino_health.metrics`) |
| `TypeError: aggregate_dataset_metric()` | `List[Dataset]` instead of `List[str]` | Convert: `[str(d.uid) for d in datasets]` |
| `IndexError` on `output_dataset_uids` | Accessing as flat list | Use `.root[0].root[0].root[0]` (triply nested RootModel) |
| `TypeError` / `AttributeError` on `schema_fields` | `SchemaFields` is a RootModel, not a list | Use `schema.schema_fields.root` for the list, `.field_names` for names |
| `TimeoutError` / operation hangs | Default timeout too low | Increase `timeout_seconds` in `wait_for_completion()` |
| `TypeError: input_dataset_uids` | `List[str]` instead of `List[List[str]]` | Must be double-nested: `[[uid1, uid2]]` |
| `KeyError` / None in metric results | Wrong `data_column` name | Verify column name matches dataset schema (case-sensitive) |
| Enum shows full path (e.g. `Status.APPROVED`) | Printing enum object directly | Use `.value` for clean string: `status.value` → `'Approved'` |
| `ValidationError` on enum field (e.g. `indexing_status`) | SDK/API version mismatch — backend added new value | Use the `session.get()` raw API escape hatch (§17 in patterns_and_gotchas.md), or `pip install --upgrade rhino-health` |

Diagnostic process: identify exception class → locate failing SDK call → cross-reference the correct signature in `references/sdk_reference.md` → check for compound errors.
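
The enum-display row is quick to verify with a stand-in enum; the `Status` class below is illustrative, not an SDK import:

```python
from enum import Enum


class Status(Enum):
    """Stand-in for an SDK status enum."""
    APPROVED = "Approved"
    PENDING = "Pending"


status = Status.APPROVED
full_path = str(status)   # "Status.APPROVED", what printing the enum shows
clean = status.value      # "Approved", the clean display string
```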

## Question Routing

For non-planning SDK questions, locate the right context section:

| Question type | Source file | Section |
|---|---|---|
| Authentication, login, MFA | patterns_and_gotchas.md | §1 |
| Finding projects/datasets by name | patterns_and_gotchas.md | §2 |
| Creating/updating resources (upsert) | patterns_and_gotchas.md | §3 |
| Running per-site or aggregated metrics | patterns_and_gotchas.md | §4 |
| Filtering data | patterns_and_gotchas.md | §5 |
| Group-by analysis | patterns_and_gotchas.md | §6 |
| Federated joins | patterns_and_gotchas.md | §7 |
| Code objects (create, build, run) | patterns_and_gotchas.md | §8 |
| Async operations / waiting | patterns_and_gotchas.md | §9 |
| Correct import paths | patterns_and_gotchas.md | §11 |
| Environment URL (dev1, QA, staging) | patterns_and_gotchas.md | §13 |
| RootModel access (SchemaFields, output UIDs) | patterns_and_gotchas.md | §14 |
| Semantic mapping entries / data | patterns_and_gotchas.md | §15 |
| Session persistence / SSO | patterns_and_gotchas.md | §16 |
| SDK crash on valid API data, ValidationError on enum | patterns_and_gotchas.md | §17 |
| Raw API calls, session.get(), bypassing Pydantic | patterns_and_gotchas.md | §17 |
| Vocabularies, vocabulary types | sdk_reference.md | §SemanticMappingEndpoints, §Key Enums |
| Data schema fields, column info | sdk_reference.md | §DataSchema, §SchemaFields |
| Specific endpoint methods | sdk_reference.md | §[EndpointName]Endpoints |
| Enums and constants | sdk_reference.md | §Key Enums |
| API environment URLs | sdk_reference.md | §ApiEnvironment |
| Metric configuration | metrics_reference.md | §[Category] |
| "Which metric for...?" | metrics_reference.md | §Quick Decision Guide |

## Working Examples

Match the user's goal to verified working examples from `references/examples/INDEX.md`:

| Template | Example files |
|---|---|
| A (Analytics) | `eda.py`, `cox.py`, `metrics_examples.py`, `roc_analysis.py`, `aggregate_quantile.py`, `federated_join.py` |
| B (Code Object) | `train_test_split.py`, `runtime_external_files.py` |
| C (Harmonization) | `fhir_pipeline.py` |
| D (SQL Ingestion) | `sql_data_ingestion.py` |
| E (Model Training) | `train_test_split.py` (training portion) |
| F (Composition) | `fhir_pipeline.py` (harmonization + code object + export) |

Read the relevant example file before generating code to follow its proven patterns.