tooluniverse-multiomic-disease-characterization

Multi-Omics Disease Characterization Pipeline

Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.

KEY PRINCIPLES:

Report-first approach - Create report file FIRST, then populate progressively
Disease disambiguation FIRST - Resolve all identifiers before omics analysis
Layer-by-layer analysis - Systematically cover all omics layers
Cross-layer integration - Identify genes/targets appearing in multiple layers
Evidence grading - Grade all evidence from T1 (human/clinical) to T4 (annotation/prediction only)
Tissue context - Emphasize disease-relevant tissues/organs
Quantitative scoring - Multi-Omics Confidence Score (0-100)
Druggable focus - Prioritize targets with therapeutic potential
Biomarker identification - Highlight diagnostic/prognostic markers
Mechanistic synthesis - Generate testable hypotheses
Source references - Every statement must cite tool/database
Completeness checklist - Mandatory section showing analysis coverage
English-first queries - Always use English terms in tool calls. Respond in user's language

When to Use This Skill

Apply when users:

Ask about disease mechanisms across omics layers
Need multi-omics characterization of a disease
Want to understand disease at the systems biology level
Ask "What pathways/genes/proteins are involved in [disease]?"
Need biomarker discovery for a disease
Want to identify druggable targets from disease profiling
Ask for integrated genomics + transcriptomics + proteomics analysis
Need cross-layer concordance analysis
Ask about disease network biology / hub genes

NOT for (use other skills instead):

Single gene/target validation -> Use `tooluniverse-drug-target-validation`
Drug safety profiling -> Use `tooluniverse-adverse-event-detection`
General disease overview -> Use `tooluniverse-disease-research`
Variant interpretation -> Use `tooluniverse-variant-interpretation`
GWAS-specific analysis -> Use `tooluniverse-gwas-*` skills
Pathway-only analysis -> Use `tooluniverse-systems-biology`

Input Parameters

Parameter	Required	Description	Example
disease	Yes	Disease name, OMIM ID, EFO ID, or MONDO ID	`Alzheimer disease` , `MONDO_0004975`
tissue	No	Tissue/organ of interest	`brain` , `liver` , `blood`
focus_layers	No	Specific omics layers to emphasize	`genomics` , `transcriptomics` , `pathways`

Multi-Omics Confidence Score (0-100)

Score Components

Data Availability (0-40 points):

Genomics data available (GWAS or rare variants): 10 points
Transcriptomics data available (DEGs or expression): 10 points
Protein data available (PPI or expression): 5 points
Pathway data available (enriched pathways): 10 points
Clinical/drug data available (approved drugs or trials): 5 points

Evidence Concordance (0-40 points):

Multi-layer genes (appear in 3+ layers): up to 20 points (2 per gene, max 10 genes)
Consistent direction (genetics + expression concordant): 10 points
Pathway-gene concordance (genes found in enriched pathways): 10 points

Evidence Quality (0-20 points):

Strong genetic evidence (GWAS p < 5e-8): 10 points
Clinical validation (approved drugs): 10 points

Score Interpretation

Score	Tier	Interpretation
80-100	Excellent	Comprehensive multi-omics coverage, high confidence, strong cross-layer concordance
60-79	Good	Good coverage across most layers, some gaps
40-59	Moderate	Moderate coverage, limited cross-layer integration
0-39	Limited	Limited data, single-layer analysis dominates
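
The rubric and tiers above reduce to simple arithmetic, sketched below. This is a minimal illustration, assuming the analysis has already established which layers returned data; `ScoreInputs`, `confidence_score`, and `score_tier` are hypothetical names, not part of any tool API.

```python
from dataclasses import dataclass


@dataclass
class ScoreInputs:
    """Flags and counts gathered during the analysis (hypothetical field names)."""
    has_genomics: bool = False             # GWAS or rare variants found
    has_transcriptomics: bool = False      # DEGs or expression data found
    has_protein: bool = False              # PPI or protein expression found
    has_pathways: bool = False             # enriched pathways found
    has_clinical: bool = False             # approved drugs or trials found
    multi_layer_genes: int = 0             # genes appearing in 3+ layers
    direction_concordant: bool = False     # genetics and expression agree
    pathway_gene_concordant: bool = False  # genes found in enriched pathways
    gwas_significant: bool = False         # any GWAS hit with p < 5e-8
    approved_drugs: bool = False           # clinical validation


def confidence_score(x: ScoreInputs) -> int:
    """Sum the three components: availability (40), concordance (40), quality (20)."""
    availability = (10 * x.has_genomics + 10 * x.has_transcriptomics
                    + 5 * x.has_protein + 10 * x.has_pathways
                    + 5 * x.has_clinical)
    # 2 points per multi-layer gene, capped at 10 genes (20 points).
    concordance = (min(2 * x.multi_layer_genes, 20)
                   + 10 * x.direction_concordant
                   + 10 * x.pathway_gene_concordant)
    quality = 10 * x.gwas_significant + 10 * x.approved_drugs
    return availability + concordance + quality


def score_tier(score: int) -> str:
    """Map a 0-100 score to the tier names in the interpretation table."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Limited"


partial = ScoreInputs(has_genomics=True, has_transcriptomics=True,
                      has_pathways=True, multi_layer_genes=4,
                      gwas_significant=True)
print(confidence_score(partial), score_tier(confidence_score(partial)))  # 48 Moderate
```

Keeping the inputs as explicit flags makes it easy to copy each component into the score table of the report.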

Evidence Grading System

Tier	Symbol	Criteria	Examples
T1	[T1]	Direct human evidence, clinical proof	FDA-approved drug, GWAS hit (p<5e-8), clinical trial result
T2	[T2]	Experimental evidence	Differential expression (validated), functional screen, mouse KO
T3	[T3]	Computational/database evidence	PPI network, pathway mapping, expression correlation
T4	[T4]	Annotation/prediction only	GO annotation, text-mined association, predicted interaction
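
One way to apply the tier symbols consistently is a small lookup that tags each report statement. The sketch below is illustrative only: the evidence-type keys are hypothetical labels derived from the examples in the table, and defaulting unknown evidence to T4 is an assumption rather than a rule stated here.

```python
from enum import Enum


class EvidenceTier(Enum):
    T1 = "Direct human evidence, clinical proof"
    T2 = "Experimental evidence"
    T3 = "Computational/database evidence"
    T4 = "Annotation/prediction only"

    @property
    def symbol(self) -> str:
        """Bracketed symbol used in the report, e.g. [T1]."""
        return f"[{self.name}]"


# Hypothetical evidence-type keys, mapped using the examples in the table.
TIER_BY_EVIDENCE = {
    "approved_drug": EvidenceTier.T1,
    "gwas_hit": EvidenceTier.T1,
    "clinical_trial": EvidenceTier.T1,
    "validated_differential_expression": EvidenceTier.T2,
    "functional_screen": EvidenceTier.T2,
    "mouse_knockout": EvidenceTier.T2,
    "ppi_network": EvidenceTier.T3,
    "pathway_mapping": EvidenceTier.T3,
    "expression_correlation": EvidenceTier.T3,
    "go_annotation": EvidenceTier.T4,
    "text_mining": EvidenceTier.T4,
    "predicted_interaction": EvidenceTier.T4,
}


def tag_statement(statement: str, evidence_type: str) -> str:
    """Append the tier symbol; unknown types fall back to T4 (assumption)."""
    tier = TIER_BY_EVIDENCE.get(evidence_type, EvidenceTier.T4)
    return f"{statement} {tier.symbol}"


print(tag_statement("GENE_A is associated with the disease", "gwas_hit"))
```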

Report Template

Create this file structure at the start:

{disease_name}_multiomic_report.md

Multi-Omics Disease Characterization: {Disease Name}

Report Generated: {date}
Disease Identifiers: (to be filled)
Multi-Omics Confidence Score: (to be calculated)

Executive Summary

(2-3 sentence disease mechanism synthesis - fill after all layers complete)

1. Disease Definition & Context

Disease Identifiers

System	ID	Source

Description

Synonyms

Disease Hierarchy (parents/children)

Affected Tissues/Organs

Therapeutic Areas

Sources: (tools used)

2. Genomics Layer

2.1 GWAS Associations

SNP	P-value	Effect	Gene	Study	Source

2.2 GWAS Studies Summary

Study ID	Trait	Sample Size	Year	Source

2.3 Associated Genes (Genetic Evidence)

Gene	Ensembl ID	Association Score	Evidence Type	Source

2.4 Rare Variants (ClinVar)

Variant	Gene	Clinical Significance	Source

Genomics Layer Summary

Total GWAS hits:
Top genes by genetic evidence:
Genetic architecture:

Sources: (tools used)

3. Transcriptomics Layer

3.1 Differential Expression Studies

Experiment	Condition	Up-regulated	Down-regulated	Source

3.2 Expression Atlas Disease Evidence

Gene	Score	Source

3.3 Tissue Expression Patterns (GTEx/HPA)

Gene	Tissue	Expression Level	Source

3.4 Biomarker Candidates (Expression-Based)

Gene	Tissue Specificity	Fold Change	Evidence	Source

Transcriptomics Layer Summary

Differential expression datasets:
Top DEGs:
Tissue-specific patterns:

Sources: (tools used)

4. Proteomics & Interaction Layer

4.1 Protein-Protein Interactions (STRING)

Protein A	Protein B	Score	Source

4.2 Hub Genes (Network Centrality)

Gene	Degree	Betweenness	Role	Source

4.3 Protein Complexes (IntAct)

Complex	Members	Function	Source

4.4 Tissue-Specific PPI Network

Gene	Interaction Score	Tissue	Source

Proteomics Layer Summary

Total PPIs:
Hub genes:
Network modules:

Sources: (tools used)

5. Pathway & Network Layer

5.1 Enriched Pathways (Enrichr/Reactome)

Pathway	Database	P-value	Genes	Source

5.2 Reactome Pathway Details

Pathway ID	Name	Genes Involved	Source

5.3 KEGG Pathways

Pathway ID	Name	Description	Source

5.4 WikiPathways

Pathway ID	Name	Organism	Source

Pathway Layer Summary

Top enriched pathways:
Key pathway nodes:
Cross-pathway connections:

Sources: (tools used)

6. Gene Ontology & Functional Annotation

6.1 Biological Processes

GO Term	Name	P-value	Genes	Source

6.2 Molecular Functions

GO Term	Name	P-value	Genes	Source

6.3 Cellular Components

GO Term	Name	P-value	Genes	Source

Sources: (tools used)

7. Therapeutic Landscape

7.1 Approved Drugs

Drug	ChEMBL ID	Mechanism	Target	Phase	Source

7.2 Druggable Targets

Gene	Tractability	Modality	Clinical Precedent	Source

7.3 Drug Repurposing Candidates

Drug	Original Indication	Mechanism	Target	Source

7.4 Clinical Trials

NCT ID	Title	Phase	Status	Intervention	Source

Therapeutic Summary

Approved drugs:
Clinical pipeline:
Novel targets:

Sources: (tools used)

8. Multi-Omics Integration

8.1 Cross-Layer Gene Concordance

Gene	Genomics	Transcriptomics	Proteomics	Pathways	Layers	Evidence Tier

8.2 Multi-Omics Hub Genes (Top 20)

Rank	Gene	Layers Found	Key Evidence	Druggable	Source

8.3 Biomarker Candidates

Biomarker	Type	Evidence Layers	Confidence	Source

8.4 Mechanistic Hypotheses

(Hypothesis with supporting evidence from multiple layers)
...

8.5 Systems-Level Insights

Key disrupted processes:
Critical pathway nodes:
Therapeutic intervention points:
Testable hypotheses:

Multi-Omics Confidence Score

Component	Points	Max	Details
Genomics data		10
Transcriptomics data		10
Protein data		5
Pathway data		10
Clinical data		5
Multi-layer genes		20
Direction concordance		10
Pathway-gene concordance		10
Genetic evidence quality		10
Clinical validation		10
TOTAL		100

Score: XX/100 - [Tier]

Data Availability Checklist

Omics Layer	Data Available	Tools Used	Findings
Genomics (GWAS)	Yes/No
Genomics (Rare Variants)	Yes/No
Transcriptomics (DEGs)	Yes/No
Transcriptomics (Expression)	Yes/No
Proteomics (PPI)	Yes/No
Proteomics (Expression)	Yes/No
Pathways (Enrichment)	Yes/No
Pathways (KEGG/Reactome)	Yes/No
Gene Ontology	Yes/No
Drugs/Therapeutics	Yes/No
Clinical Trials	Yes/No
Literature	Yes/No

Completeness Checklist

References

Data Sources Used

#	Tool	Parameters	Section	Items Retrieved

Database Versions

OpenTargets: (current)
GWAS Catalog: (current)
STRING: (current)
Reactome: (current)

---

Phase 0: Disease Disambiguation (ALWAYS FIRST)

Objective: Resolve disease to standard identifiers for all downstream queries.

Tools Used

OpenTargets_get_disease_id_description_by_name (primary):

Input: `diseaseName` (string) - Disease name
Output: `{data: {search: {hits: [{id, name, description}]}}}`
Use: Get MONDO/EFO IDs and description
CRITICAL: Disease IDs from OpenTargets use underscore format (e.g., `MONDO_0004975`), NOT colon format

OSL_get_efo_id_by_disease_name (secondary):

Input: `disease` (string) - Disease name
Output: `{efo_id, name}`
Use: Get EFO/MONDO ID

OpenTargets_get_disease_description_by_efoId:

Input: `efoId` (string) - Disease ID (e.g., `MONDO_0004975`)
Output: `{data: {disease: {id, name, description, dbXRefs}}}`
Use: Get full description, cross-references (OMIM, UMLS, DOID, etc.)

OpenTargets_get_disease_synonyms_by_efoId:

Input: `efoId` (string)
Output: `{data: {disease: {id, name, synonyms: [{relation, terms}]}}}`

OpenTargets_get_disease_therapeutic_areas_by_efoId:

Input: `efoId` (string)
Output: `{data: {disease: {id, name, therapeuticAreas: [{id, name}]}}}`

OpenTargets_get_disease_ancestors_parents_by_efoId:

Input: `efoId` (string)
Output: `{data: {disease: {id, name, ancestors: [{id, name}]}}}`

OpenTargets_get_disease_descendants_children_by_efoId:

Input: `efoId` (string)
Output: `{data: {disease: {id, name, descendants: [{id, name}]}}}`

OpenTargets_map_any_disease_id_to_all_other_ids:

Input: `inputId` (string) - Any known disease ID (e.g., `OMIM:104300`, `UMLS:C0002395`)
Output: `{data: {disease: {id, name, dbXRefs: [str], ...}}}`
Use: Cross-map between OMIM, UMLS, ICD10, DOID, etc.

Workflow

Search by disease name to get primary ID (OpenTargets)
Get full description and cross-references
Get synonyms for search term expansion
Get therapeutic areas for context
Get disease hierarchy (parents/children)
If user provided OMIM/other ID, map to MONDO/EFO first
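
The first five workflow steps can be sketched as one function. This is a hedged outline, not the actual ToolUniverse client: `call_tool` is a hypothetical callable that takes a tool name and an arguments dict and returns the parsed response in the shapes listed under Tools Used. Mapping a user-supplied OMIM or other ID (the last step) is left out.

```python
from typing import Any, Callable

# Hypothetical tool runner: (tool_name, arguments) -> parsed response dict.
ToolCaller = Callable[[str, dict], Any]


def disambiguate(disease_name: str, call_tool: ToolCaller) -> dict:
    """Run the name-based Phase 0 steps and collect the results."""
    search = call_tool("OpenTargets_get_disease_id_description_by_name",
                       {"diseaseName": disease_name})
    hits = search["data"]["search"]["hits"]
    if not hits:
        raise LookupError(f"No disease found for {disease_name!r}")
    # Simplification: take the first hit. Ambiguous names need the
    # handling described under Collision-Aware Search.
    efo_id = hits[0]["id"]

    def disease_field(tool: str, key: str):
        # Every follow-up tool takes the same efoId argument.
        response = call_tool(tool, {"efoId": efo_id})
        return response["data"]["disease"].get(key)

    return {
        "efo_id": efo_id,
        "disease_name": hits[0]["name"],
        "dbXRefs": disease_field(
            "OpenTargets_get_disease_description_by_efoId", "dbXRefs"),
        "synonyms": disease_field(
            "OpenTargets_get_disease_synonyms_by_efoId", "synonyms"),
        "therapeutic_areas": disease_field(
            "OpenTargets_get_disease_therapeutic_areas_by_efoId", "therapeuticAreas"),
        "parents": disease_field(
            "OpenTargets_get_disease_ancestors_parents_by_efoId", "ancestors"),
        "children": disease_field(
            "OpenTargets_get_disease_descendants_children_by_efoId", "descendants"),
    }
```

Passing the tool runner in as an argument keeps the sketch independent of how tools are actually invoked and makes it easy to exercise with a stub.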

Collision-Aware Search

When disease name returns multiple hits:

Check if user's input matches any hit exactly
If ambiguous, present top 3-5 options and ask user to select
Always prefer the most specific disease (not parent categories)
For cancer, prefer the specific tumor type over generic "cancer"
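
The first two rules can be sketched as a small selection helper. It assumes each hit is a dict with at least `id` and `name`, as in the search output; `pick_disease_hit` is a hypothetical name. Preferring the most specific disease needs the hierarchy tools, so here that decision is left to the user via the returned options.

```python
def pick_disease_hit(query: str, hits: list[dict], max_options: int = 5):
    """Return (chosen_hit, options).

    chosen_hit is set when the choice is unambiguous; otherwise it is None
    and options holds up to max_options hits to present to the user.
    """
    wanted = query.strip().casefold()
    exact = [h for h in hits if h.get("name", "").strip().casefold() == wanted]
    if len(exact) == 1:
        return exact[0], []   # user's input matches one hit exactly
    if len(hits) == 1:
        return hits[0], []    # only one candidate, nothing to disambiguate
    return None, hits[:max_options]


hits = [
    {"id": "MONDO_0004975", "name": "Alzheimer disease"},
    {"id": "EXAMPLE_0000001", "name": "Alzheimer disease, placeholder subtype"},
]
chosen, options = pick_disease_hit("alzheimer disease", hits)
print(chosen["id"] if chosen else [o["name"] for o in options])
```

The second hit uses a placeholder ID and name purely for illustration.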

Key Disease IDs to Track

After disambiguation, store these for all downstream queries:

`efo_id` - Primary ID for OpenTargets queries (e.g., `MONDO_0004975`)
`disease_name` - Canonical name (e.g., `Alzheimer disease`)
`synonyms` - For literature search expansion
`therapeutic_areas` - For context
`dbXRefs` - Cross-references (OMIM, UMLS, DOID, etc.)
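
A small container can hold these values and enforce the underscore ID format in one place. This is a sketch under assumptions: `DiseaseContext` and `to_opentargets_id` are hypothetical names, and the list element types are guesses. The normalization applies only to the ID sent to OpenTargets `efoId` queries; cross-reference IDs such as `OMIM:104300` keep their colon form.

```python
from dataclasses import dataclass, field


def to_opentargets_id(disease_id: str) -> str:
    """Convert colon form (MONDO:0004975) to underscore form (MONDO_0004975)."""
    return disease_id.strip().replace(":", "_", 1)


@dataclass
class DiseaseContext:
    """Identifiers resolved in Phase 0 and reused by every later phase."""
    efo_id: str
    disease_name: str
    synonyms: list = field(default_factory=list)
    therapeutic_areas: list = field(default_factory=list)
    dbXRefs: list = field(default_factory=list)

    def __post_init__(self):
        # Guard against colon-format IDs slipping into OpenTargets queries.
        self.efo_id = to_opentargets_id(self.efo_id)


ctx = DiseaseContext(efo_id="MONDO:0004975", disease_name="Alzheimer disease")
print(ctx.efo_id)  # MONDO_0004975
```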

Phase 1: Genomics Layer

Objective: Identify genetic variants, GWAS associations, and genetically implicated genes.

Tools Used

OpenTargets_get_associated_targets_by_disease_efoId (primary):

Input: `efoId` (string) - Disease EFO/MONDO ID
Output: `{data: {disease: {id, name, associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}`
Use: Get ALL disease-associated genes ranked by overall evidence score
NOTE: Returns top 25 by default. For comprehensive analysis, note the total `count`

OpenTargets_get_evidence_by_datasource:

Input: `efoId` (string), `ensemblId` (string), optional `datasourceIds` (array), `size` (int, default 50)
Output: `{data: {disease: {evidences: {count, rows: [{...evidence details}]}}}}`
Use: Get specific evidence types. Key datasourceIds for genomics:
- `['ot_genetics_portal']` - GWAS/genetics
- `['gene2phenotype', 'genomics_england', 'orphanet']` - Rare variants
- `['eva']` - ClinVar variants

gwas_search_associations (GWAS Catalog):

Input: `disease_trait` (string), `size` (int, default 20)
Output: `{data: [{association_id, p_value, or_per_copy_num, or_value, beta, risk_frequency, efo_traits: [{...}], ...}], metadata: {pagination: {totalElements}}}`
Use: Get genome-wide significant associations
NOTE: Use disease name (e.g., "Alzheimer"), not ID. Returns paginated results
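
Because the results are paginated and may include weaker signals, it can help to filter one page of associations against the genome-wide threshold used in this document (p < 5e-8). The sketch below assumes each row carries a `p_value` that parses as a float, which is an assumption about the response; `significant_associations` is a hypothetical helper.

```python
GENOME_WIDE_P = 5e-8  # threshold used for T1 evidence and the quality score


def significant_associations(response: dict,
                             threshold: float = GENOME_WIDE_P) -> list[dict]:
    """Keep rows with p_value below the threshold, most significant first."""
    kept = []
    for row in response.get("data", []):
        try:
            p = float(row["p_value"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable p-value
        if p < threshold:
            kept.append(row)
    return sorted(kept, key=lambda r: float(r["p_value"]))


# Illustrative rows only; real rows carry more fields.
page = {"data": [
    {"association_id": "a", "p_value": 3e-10},
    {"association_id": "b", "p_value": "1e-6"},
    {"association_id": "c", "p_value": None},
]}
print([r["association_id"] for r in significant_associations(page)])  # ['a']
```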

gwas_get_studies_for_trait:

Input: `disease_trait` (string), `size` (int)
Output: `{data: [...studies], metadata: {pagination}}`
NOTE: May return empty if trait name does not match exactly. Try synonyms

gwas_get_variants_for_trait:

Input: `disease_trait` (string), `size` (int)
Output: `{data: [...variants], metadata: {pagination}}`

GWAS_search_associations_by_gene:

Input: `gene_name` (string)
Output: Associations for a specific gene

OpenTargets_search_gwas_studies_by_disease:

Input: `diseaseIds` (array of strings), `enableIndirect` (bool, default true), `size` (int, default 10)
Output: `{data: {studies: {count, rows: [{id, studyType, traitFromSource, publicationFirstAuthor, publicationDate, pubmedId, nSamples, nCases, nControls, ...}]}}}`
Use: Get GWAS studies from OpenTargets genetics portal

clinvar_search_variants:

Input: `condition` (string) or `gene` (string), optional `max_results` (int)
Output: List of ClinVar variants with clinical significance
Use: Rare variant / monogenic disease evidence

Workflow

Get associated genes from OpenTargets (overall scores)
For top 10-15 genes, get genetic evidence specifically via `OpenTargets_get_evidence_by_datasource`
Search GWAS Catalog for associations
Search OpenTargets GWAS studies
Search ClinVar for rare variants
For top GWAS genes, check `GWAS_search_associations_by_gene`

Gene Tracking

Maintain a dictionary of genes found in genomics layer:

python

genomics_genes = {
    'PSEN1': {'score': 0.87, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000080815', 'layer': 'genomics'},
    'APP': {'score': 0.82, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000142192', 'layer': 'genomics'},
    # ...
}

Phase 2: Transcriptomics Layer

Objective: Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.

Tools Used

ExpressionAtlas_search_differential:

Input: optional `gene` (string), `condition` (string), `species` (string, default 'homo sapiens')
Output: Differential expression studies and results
Use: Find studies where genes are differentially expressed in disease

ExpressionAtlas_search_experiments:

Input: optional `gene` (string), `condition` (string), `species` (string)
Output: Expression experiments relevant to condition
Use: Find all Expression Atlas experiments for the disease

expression_atlas_disease_target_score:

Input: `efoId` (string), `pageSize` (int, required)
Output: Genes scored by expression evidence for the disease
Use: Get expression-based disease-gene association scores

europepmc_disease_target_score:

Input: `efoId` (string), `pageSize` (int, required)
Output: Genes scored by literature evidence for the disease
Use: Complement expression evidence with literature-mined associations

HPA_get_rna_expression_by_source (Human Protein Atlas):

Input: `gene_name` (string), `source_type` (string: 'tissue', 'blood', 'brain'), `source_name` (string: e.g., 'brain', 'liver')
Output: `{status, data: {gene_name, source_type, source_name, expression_value, expression_level, expression_unit}}`
NOTE: ALL 3 params required. `source_type` options: 'tissue', 'blood', 'brain', 'cell_line', 'single_cell'

HPA_get_rna_expression_in_specific_tissues:

Input: `gene_name` (string), `tissues` (array of strings)
Output: Expression across specified tissues

HPA_get_cancer_prognostics_by_gene:

Input: `gene_name` (string)
Output: Cancer prognostic data (if cancer context)

HPA_get_subcellular_location:

Input: `gene_name` (string)
Output: Subcellular localization data

HPA_search_genes_by_query:

Input: `query` (string)
Output: Matching genes in HPA

Workflow

Search Expression Atlas for differential expression studies
Get expression-based disease scores
Get literature-based disease scores (EuropePMC)
For top 10-15 genes from genomics layer, check tissue expression via HPA
Check disease-relevant tissue expression patterns
For cancer: check prognostic biomarkers

Gene Tracking

Add transcriptomics genes to tracking:

python

transcriptomics_genes = {
    'APOE': {'expression_score': 0.75, 'tissues': ['brain'], 'evidence': 'differential_expression', 'layer': 'transcriptomics'},
    # ...
}
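
Since every record carries a `layer` key, the per-layer dictionaries can be merged into one view that counts how many layers support each gene. The helper below is a minimal sketch with a hypothetical name (`merge_layers`); the gene names and scores are placeholders, not real results.

```python
from collections import defaultdict


def merge_layers(*layer_dicts: dict) -> dict:
    """Combine per-layer gene dicts into {gene: {'layers': set, 'records': list}}."""
    merged = defaultdict(lambda: {"layers": set(), "records": []})
    for layer in layer_dicts:
        for gene, record in layer.items():
            merged[gene]["layers"].add(record["layer"])
            merged[gene]["records"].append(record)
    return dict(merged)


# Placeholder data in the same shape as the tracking dictionaries.
genomics_example = {
    "GENE_A": {"score": 0.9, "evidence": "genetic", "layer": "genomics"},
    "GENE_B": {"score": 0.7, "evidence": "genetic", "layer": "genomics"},
}
transcriptomics_example = {
    "GENE_A": {"expression_score": 0.6, "tissues": ["brain"],
               "evidence": "differential_expression", "layer": "transcriptomics"},
}

tracked = merge_layers(genomics_example, transcriptomics_example)
multi_layer = sorted(g for g, v in tracked.items() if len(v["layers"]) >= 2)
print(multi_layer)  # ['GENE_A']
```

The same merged structure can later absorb proteomics and pathway genes, which is what the multi-layer gene count in the confidence score relies on.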

Phase 3: Proteomics & Interaction Layer

Objective: Map protein-protein interactions, identify hub genes, and characterize interaction networks.

Tools Used

STRING_get_interaction_partners (primary PPI):

Input: `protein_ids` (array of strings - gene names work), `species` (int, default 9606), `confidence_score` (float, default 0.4), `limit` (int, default 20)
Output: `{status: 'success', data: [{stringId_A, stringId_B, preferredName_A, preferredName_B, ncbiTaxonId, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}`
Use: Get interaction partners for disease genes
NOTE: `protein_ids` is an array, NOT string. Gene symbols like `['APOE']` work

STRING_get_network:

Input: `protein_ids` (array), `species` (int), `confidence_score` (float)
Output: Network of interactions between input proteins
Use: Build disease-specific PPI network

STRING_functional_enrichment:

Input: `protein_ids` (array), `species` (int)
Output: Functional enrichment results (GO, KEGG, etc.)
Use: Functional characterization of disease gene set

STRING_ppi_enrichment:

Input: `protein_ids` (array), `species` (int)
Output: Statistical test for PPI enrichment (more interactions than expected)
Use: Test if disease genes form a connected module

intact_get_interactions:

Input: `identifier` (string - UniProt ID or gene name)
Output: Molecular interaction data from IntAct

intact_search_interactions:

Input: `query` (string), `first` (int, default 0), `max` (int, default 25)
Output: Search results for interactions

HPA_get_protein_interactions_by_gene:

Input: `gene_name` (string)
Output: `{gene, interactions, interactor_count, interactors: [...]}`

humanbase_ppi_analysis:

Input: `gene_list` (array), `tissue` (string), `max_node` (int), `interaction` (string), `string_mode` (bool)
Output: Tissue-specific PPI network
NOTE: ALL params required. `interaction` options: 'coexpression', 'interaction', 'coexpression_and_interaction'. `string_mode`: true/false

Workflow

Take top 15-20 genes from genomics + transcriptomics layers
Query STRING for interaction partners of each gene
Build composite PPI network using STRING_get_network
Test PPI enrichment (are genes more connected than random?)
Get functional enrichment from STRING
For disease-relevant tissue, get tissue-specific network (HumanBase)
Identify hub genes (highest degree centrality)
Check IntAct for experimentally validated interactions

Hub Gene Analysis

Calculate network centrality metrics:

Degree: Number of interaction partners
Betweenness: Number of shortest paths through node
Hub score: Genes with degree > mean + 1 SD are hubs
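
The degree and hub rules can be computed directly from STRING-style interaction rows. The sketch below assumes rows with `preferredName_A` and `preferredName_B` keys and uses the population standard deviation, since the text does not say which SD is meant; `degree_by_gene` and `hub_genes` are hypothetical names. Betweenness is not computed here because it needs a shortest-path algorithm or a graph library.

```python
from collections import defaultdict
from statistics import mean, pstdev


def degree_by_gene(interactions: list[dict]) -> dict[str, int]:
    """Count distinct interaction partners per gene, ignoring self-loops."""
    neighbors = defaultdict(set)
    for row in interactions:
        a, b = row["preferredName_A"], row["preferredName_B"]
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


def hub_genes(degrees: dict[str, int]) -> list[str]:
    """Genes with degree > mean + 1 SD, highest degree first."""
    values = list(degrees.values())
    if len(values) < 2:
        return []
    cutoff = mean(values) + pstdev(values)
    hubs = [g for g, d in degrees.items() if d > cutoff]
    return sorted(hubs, key=lambda g: (-degrees[g], g))


# Placeholder star-shaped network: one center with five partners.
rows = [{"preferredName_A": "CENTER", "preferredName_B": f"P{i}"} for i in range(5)]
degrees = degree_by_gene(rows)
print(hub_genes(degrees))  # ['CENTER']
```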

Phase 4: Pathway & Network Layer

阶段4：通路与网络层面

Objective: Identify enriched biological pathways and cross-pathway connections.

目标: 识别富集的生物通路及通路间的关联。

Tools Used

使用工具

enrichr_gene_enrichment_analysis (primary enrichment):

Input:
```
gene_list
```
(array of gene symbols, min 2),
```
libs
```
(array of library names)

Output:

{status: 'success', data: '{...JSON string with enrichment results...}'}

Key libraries:

['KEGG_2021_Human']

['Reactome_2022']

['WikiPathway_2023_Human']

['GO_Biological_Process_2023']

['GO_Molecular_Function_2023']

['GO_Cellular_Component_2023']

NOTE: The `data` field is a JSON string and needs parsing. It contains `connected_paths` and per-library results.
NOTE: `libs` is REQUIRED and must be an array.
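
Because `data` arrives as a JSON string, it has to be decoded before use. The following is a minimal sketch of that step, assuming the response has already been obtained as a Python dict in the shape shown above; the helper name `parse_enrichr_response` is hypothetical, and returning an empty dict on failure is a design choice rather than documented tool behavior.

```python
import json


def parse_enrichr_response(response):
    """Decode the JSON string held in response['data']; return {} on failure."""
    if response.get("status") != "success":
        return {}
    data = response.get("data")
    if isinstance(data, dict):
        return data  # already decoded, nothing to do
    try:
        parsed = json.loads(data)
    except (TypeError, json.JSONDecodeError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


# Example response in the documented shape (placeholder values kept as-is).
example = {
    "status": "success",
    "data": '{"connected_paths": {"Path: ...": "Total Weight: ..."}}',
}
result = parse_enrichr_response(example)
connected_paths = result.get("connected_paths", {})
```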

ReactomeAnalysis_pathway_enrichment:

Input: `identifiers` (string - space-separated gene list), optional `page_size` (int, default 20), `include_disease` (bool), `projection` (bool)

Output:

{data: {token, analysis_type, pathways_found, pathways: [{pathway_id, name, species, is_disease, is_lowest_level, entities_found, entities_total, entities_ratio, p_value, fdr, reactions_found, reactions_total}]}}

Use: Reactome-specific pathway enrichment with statistical testing
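
As a rough illustration of working with this tool's input and output, the sketch below builds the space-separated `identifiers` string and filters the returned pathways by FDR. It assumes the response is already a Python dict in the shape shown above; the helper names and the 0.05 FDR cutoff are assumptions for illustration, not values prescribed by the pipeline.

```python
def build_identifiers(genes):
    """Join unique, non-empty gene symbols into a space-separated string."""
    unique = []
    for gene in genes:
        gene = gene.strip()
        if gene and gene not in unique:
            unique.append(gene)
    return " ".join(unique)


def significant_pathways(response, fdr_cutoff=0.05, lowest_level_only=False):
    """Return pathways with fdr <= cutoff, sorted by ascending FDR."""
    pathways = (response.get("data") or {}).get("pathways") or []
    kept = []
    for pathway in pathways:
        fdr = pathway.get("fdr")
        if fdr is None or fdr > fdr_cutoff:
            continue
        if lowest_level_only and not pathway.get("is_lowest_level"):
            continue
        kept.append(pathway)
    return sorted(kept, key=lambda p: p["fdr"])


identifiers = build_identifiers(["APOE", "PSEN1", "APOE", " APP "])

# First entry mirrors the documented example; the second is a made-up
# non-significant pathway used only to exercise the filter.
response = {
    "data": {
        "pathways": [
            {"pathway_id": "R-HSA-1251985", "fdr": 0.00068, "is_lowest_level": True},
            {"pathway_id": "R-HSA-EXAMPLE", "fdr": 0.2, "is_lowest_level": False},
        ]
    }
}
top = significant_pathways(response)
```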

Reactome_map_uniprot_to_pathways:

Input: `id` (string - UniProt accession)
Output: List of Reactome pathways containing this protein
Use: Map individual proteins to pathways

Reactome_get_pathway:

Input: `stId` (string - Reactome stable ID, e.g., 'R-HSA-73817')
Output: Pathway details

Reactome_get_pathway_reactions:

Input: `stId` (string)
Output: Reactions within pathway

kegg_search_pathway:

Input: `keyword` (string)
Output: Array of KEGG pathway matches

kegg_get_pathway_info:

Input: `pathway_id` (string, e.g., 'hsa04930')
Output: Detailed pathway information

WikiPathways_search:

Input: `query` (string), optional `organism` (string, e.g., 'Homo sapiens')
Output: Matching community-curated pathways

Workflow

Collect the top 20-30 genes from the genomics and transcriptomics layers
Run Enrichr enrichment for KEGG, Reactome, WikiPathways
Run ReactomeAnalysis for more detailed Reactome enrichment with p-values
Search KEGG for disease-specific pathways
Search WikiPathways for disease pathways
For top Reactome pathways, get detailed reactions
Identify cross-pathway connections (genes in multiple pathways)
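
The last step, finding genes that sit in more than one enriched pathway, amounts to inverting a pathway-to-genes mapping. A minimal sketch follows, assuming such a mapping has already been assembled (the enrichment output above reports only counts, so membership would have to come from another lookup such as per-protein pathway mapping). The function name, the `min_pathways` default, and the toy pathway and gene labels are illustrative assumptions.

```python
from collections import defaultdict


def cross_pathway_genes(pathway_members, min_pathways=2):
    """Map each gene found in >= min_pathways pathways to those pathways."""
    gene_to_pathways = defaultdict(set)
    for pathway, genes in pathway_members.items():
        for gene in genes:
            gene_to_pathways[gene].add(pathway)
    return {
        gene: sorted(pathways)
        for gene, pathways in gene_to_pathways.items()
        if len(pathways) >= min_pathways
    }


# Placeholder pathway and gene labels, not real annotations.
toy_members = {
    "pathway_1": {"GENE_A", "GENE_B"},
    "pathway_2": {"GENE_A", "GENE_C"},
    "pathway_3": {"GENE_A", "GENE_C", "GENE_D"},
}
shared = cross_pathway_genes(toy_members)
```

Here `GENE_A` connects all three toy pathways and `GENE_C` connects two, while genes seen in a single pathway are dropped.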

Phase 5: Gene Ontology & Functional Annotation

Objective: Characterize biological processes, molecular functions, and cellular components.

Tools Used

enrichr_gene_enrichment_analysis (GO enrichment):

Use with `libs=['GO_Biological_Process_2023']` for BP
Use with `libs=['GO_Molecular_Function_2023']` for MF
Use with `libs=['GO_Cellular_Component_2023']` for CC

GO_get_annotations_for_gene:

Input: `gene_id` (string - gene symbol or UniProt ID)
Output: List of GO annotations with terms, aspects, evidence codes

GO_search_terms:

Input: `query` (string)
Output: Matching GO terms

QuickGO_annotations_by_gene:

Input: `gene_product_id` (string - UniProt accession, e.g., 'UniProtKB:P02649'), optional `aspect` (string: 'biological_process', 'molecular_function', 'cellular_component'), `taxon_id` (int: 9606), `limit` (int: 25)
Output: GO annotations with evidence codes

OpenTargets_get_target_gene_ontology_by_ensemblID:

Input: `ensemblId` (string)
Output: GO terms associated with target

Workflow

Run Enrichr GO enrichment for all 3 aspects using combined gene list
For top 5 genes, get detailed GO annotations from QuickGO
For top genes, get OpenTargets GO terms
Summarize key biological processes, molecular functions, cellular components

Phase 6: Therapeutic Landscape

Objective: Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.

Tools Used

OpenTargets_get_associated_drugs_by_disease_efoId (primary):

Input: `efoId` (string), `size` (int, REQUIRED - use 100)

Output:

{data: {disease: {knownDrugs: {count, rows: [{drug: {id, name, tradeNames, maximumClinicalTrialPhase, isApproved, hasBeenWithdrawn}, phase, mechanismOfAction, target: {id, approvedSymbol}, disease: {id, name}, urls: [{url, name}]}]}}}}

Use: All drugs associated with disease (approved + investigational)

OpenTargets_get_target_tractability_by_ensemblID:

Input: `ensemblId` (string)
Output: Tractability assessment (small molecule, antibody, PROTAC, etc.)

OpenTargets_get_associated_drugs_by_target_ensemblID:

Input: `ensemblId` (string), `size` (int, REQUIRED)
Output: Drugs targeting this gene/protein

search_clinical_trials:

Input: `query_term` (string, REQUIRED), optional `condition` (string), `intervention` (string), `pageSize` (int, default 10)
Output: Clinical trial results
NOTE: `query_term` is REQUIRED even if `condition` is provided

OpenTargets_get_drug_mechanisms_of_action_by_chemblId:

Input: `chemblId` (string)
Output: Mechanism of action details

Workflow

Get all drugs for disease from OpenTargets
For top disease-associated genes, check tractability
For top genes with no approved drugs, identify repurposing candidates
Search clinical trials for disease
For top approved drugs, get mechanism of action

Drug Tracking

```python
drug_targets = {
    'PSEN1': {'drugs': ['Semagacestat'], 'tractability': 'small_molecule', 'clinical_phase': 3},
    'ACHE': {'drugs': ['Donepezil', 'Galantamine'], 'tractability': 'small_molecule', 'clinical_phase': 4},
    # ...
}
```
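
A tracking dictionary of this shape could be filled from the `knownDrugs.rows` list returned by the disease-level drug query. The sketch below is one possible way to do that, assuming the response is a Python dict in the documented shape; the helper names `_dig` and `build_drug_targets` are hypothetical, and `tractability` is left out because it comes from a separate per-target query.

```python
def _dig(obj, *keys):
    """Follow nested dict keys, returning None if any level is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def build_drug_targets(response):
    """Group drug names by target symbol and keep the highest phase seen."""
    rows = _dig(response, "data", "disease", "knownDrugs", "rows") or []
    drug_targets = {}
    for row in rows:
        symbol = _dig(row, "target", "approvedSymbol")
        name = _dig(row, "drug", "name")
        if not symbol or not name:
            continue  # skip rows missing either side of the pairing
        entry = drug_targets.setdefault(symbol, {"drugs": [], "clinical_phase": 0})
        if name not in entry["drugs"]:
            entry["drugs"].append(name)
        entry["clinical_phase"] = max(entry["clinical_phase"], row.get("phase") or 0)
    return drug_targets


# Rows reuse the drug names and phases from the example dictionary above.
response = {"data": {"disease": {"knownDrugs": {"count": 3, "rows": [
    {"drug": {"name": "Donepezil"}, "phase": 4, "target": {"approvedSymbol": "ACHE"}},
    {"drug": {"name": "Galantamine"}, "phase": 4, "target": {"approvedSymbol": "ACHE"}},
    {"drug": {"name": "Semagacestat"}, "phase": 3, "target": {"approvedSymbol": "PSEN1"}},
]}}}}
tracked = build_drug_targets(response)
```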

Phase 7: Multi-Omics Integration

Objective: Integrate findings across all layers to identify cross-layer genes, calculate concordance, and generate mechanistic hypotheses.

Cross-Layer Gene Concordance Analysis

This is the core integrative step. For each gene found in the analysis:

Count layers: In how many omics layers does this gene appear?
- Genomics (GWAS, rare variants, genetic association)
- Transcriptomics (DEGs, expression score)
- Proteomics (PPI hub, protein expression)
- Pathways (enriched pathway member)
- Therapeutics (drug target)
Score genes: Genes appearing in 3+ layers are "multi-omics hub genes"
Direction concordance: Do genetics and expression agree?
- Risk allele + upregulated = concordant gain-of-function
- Risk allele + downregulated = concordant loss-of-function
- Discordant = needs investigation
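
The counting and direction rules above can be sketched as follows. This is an illustrative sketch under stated assumptions: each layer's result is taken to be a collection of gene symbols, and direction is encoded with a predicted risk-allele effect (`'gain'` or `'loss'`) and an observed expression change (`'up'` or `'down'`). Those encodings, the function names, and the toy gene labels are not defined by the pipeline.

```python
LAYERS = ("genomics", "transcriptomics", "proteomics", "pathways", "therapeutics")


def layer_membership(layer_genes):
    """Return {gene: [layers it appears in]} across the five layers."""
    membership = {}
    for layer in LAYERS:
        for gene in set(layer_genes.get(layer, ())):
            membership.setdefault(gene, []).append(layer)
    return membership


def multi_omics_hubs(layer_genes, min_layers=3):
    """Genes appearing in at least min_layers layers (3+ per the text)."""
    return {
        gene: layers
        for gene, layers in layer_membership(layer_genes).items()
        if len(layers) >= min_layers
    }


def direction_concordance(risk_allele_effect, expression_change):
    """Classify agreement between genetic effect and expression direction."""
    if risk_allele_effect == "gain" and expression_change == "up":
        return "concordant gain-of-function"
    if risk_allele_effect == "loss" and expression_change == "down":
        return "concordant loss-of-function"
    return "discordant: needs investigation"


# Placeholder gene labels, not real findings.
toy_layers = {
    "genomics": ["GENE_A", "GENE_B"],
    "transcriptomics": ["GENE_A", "GENE_C"],
    "proteomics": ["GENE_A"],
    "pathways": ["GENE_B", "GENE_C"],
    "therapeutics": ["GENE_B"],
}
hub_genes = multi_omics_hubs(toy_layers)
```

In the toy input, `GENE_A` and `GENE_B` each appear in three layers and are kept, while `GENE_C` appears in two and is not.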

Biomarker Identification

For each multi-omics hub gene, assess biomarker potential:

Diagnostic: Gene expression distinguishes disease vs healthy
Prognostic: Expression/variant predicts outcome (cancer prognostics from HPA)
Predictive: Variant/expression predicts treatment response (pharmacogenomics)
Evidence level: Number of supporting omics layers

Mechanistic Hypothesis Generation

From the integrated data:

Identify the most supported biological processes (GO + pathways)
Map causal chain: genetic variant -> gene expression -> protein function -> pathway disruption -> disease
Identify intervention points (druggable nodes in the causal chain)
Generate testable hypotheses

Confidence Score Calculation

Calculate the Multi-Omics Confidence Score (0-100) based on:

Data availability across layers
Cross-layer concordance
Evidence quality
Clinical validation
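
This section names the four components but not how they are weighted, so the sketch below should be read only as one possible way to combine them into a 0-100 score. The equal weights of 25 points each, the 0-1 scale for each component, and the function name are all placeholder assumptions, not the pipeline's scoring rubric.

```python
# Placeholder weights: 25 points per component (an assumption, not a rubric).
WEIGHTS = {
    "data_availability": 25,
    "cross_layer_concordance": 25,
    "evidence_quality": 25,
    "clinical_validation": 25,
}


def confidence_score(components):
    """Combine component values in [0, 1] into an integer score in [0, 100]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = components.get(name, 0.0)
        value = min(max(value, 0.0), 1.0)  # clamp out-of-range inputs
        total += weight * value
    return round(total)


example_score = confidence_score({
    "data_availability": 1.0,
    "cross_layer_concordance": 0.5,
    "evidence_quality": 0.5,
    "clinical_validation": 0.0,
})
```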

Phase 8: Report Finalization

Executive Summary

Write a 2-3 sentence synthesis covering:

Disease mechanism in systems terms
Key genes/pathways identified
Therapeutic opportunities

Final Report Quality Checklist

Tool Parameter Quick Reference

| Tool | Key Parameters | Notes |
| --- | --- | --- |
| `OpenTargets_get_disease_id_description_by_name` | `diseaseName` | Primary disambiguation |
| `OSL_get_efo_id_by_disease_name` | `disease` | Secondary disambiguation |
| `OpenTargets_get_associated_targets_by_disease_efoId` | `efoId` | Returns top 25 genes |
| `OpenTargets_get_evidence_by_datasource` | `efoId`, `ensemblId`, `datasourceIds[]`, `size` | Per-gene evidence |
| `OpenTargets_search_gwas_studies_by_disease` | `diseaseIds[]`, `size` | GWAS studies |
| `gwas_search_associations` | `disease_trait`, `size` | GWAS Catalog |
| `clinvar_search_variants` | `condition` or `gene`, `max_results` | Rare variants |
| `ExpressionAtlas_search_differential` | `condition`, `species` | DEGs |
| `expression_atlas_disease_target_score` | `efoId`, `pageSize` (REQUIRED) | Expression scores |
| `europepmc_disease_target_score` | `efoId`, `pageSize` (REQUIRED) | Literature scores |
| `HPA_get_rna_expression_by_source` | `gene_name`, `source_type`, `source_name` (ALL REQUIRED) | Tissue expression |
| `STRING_get_interaction_partners` | `protein_ids[]`, `species` (9606), `limit` | PPI partners |
| `STRING_get_network` | `protein_ids[]`, `species` | PPI network |
| `STRING_functional_enrichment` | `protein_ids[]`, `species` | Functional enrichment |
| `STRING_ppi_enrichment` | `protein_ids[]`, `species` | Network significance |
| `intact_search_interactions` | `query`, `max` | Experimental PPIs |
| `humanbase_ppi_analysis` | `gene_list[]`, `tissue`, `max_node`, `interaction`, `string_mode` (ALL REQUIRED) | Tissue PPI |
| `enrichr_gene_enrichment_analysis` | `gene_list[]`, `libs[]` (BOTH REQUIRED) | Pathway/GO enrichment |
| `ReactomeAnalysis_pathway_enrichment` | `identifiers` (space-separated string) | Reactome enrichment |
| `Reactome_map_uniprot_to_pathways` | `id` (UniProt accession) | Protein-pathway mapping |
| `kegg_search_pathway` | `keyword` | KEGG pathway search |
| `WikiPathways_search` | `query`, `organism` | WikiPathways search |
| `GO_get_annotations_for_gene` | `gene_id` | GO annotations |
| `QuickGO_annotations_by_gene` | `gene_product_id` (e.g., 'UniProtKB:P02649') | Detailed GO |
| `OpenTargets_get_associated_drugs_by_disease_efoId` | `efoId`, `size` (REQUIRED) | Disease drugs |
| `OpenTargets_get_target_tractability_by_ensemblID` | `ensemblId` | Druggability |
| `search_clinical_trials` | `query_term` (REQUIRED), `condition`, `pageSize` | Clinical trials |
| `PubMed_search_articles` | `query`, `limit` | Literature |
| `ensembl_lookup_gene` | `gene_id`, `species` ('homo_sapiens' REQUIRED) | Gene lookup |
| `MyGene_query_genes` | `query`, `species`, `fields`, `size` | Gene info |
| `OpenTargets_get_similar_entities_by_disease_efoId` | `efoId`, `threshold`, `size` (ALL REQUIRED) | Similar diseases |

Response Format Notes (Verified)

OpenTargets Associated Targets

```json
{
  "data": {
    "disease": {
      "id": "MONDO_0004975",
      "name": "Alzheimer disease",
      "associatedTargets": {
        "count": 2456,
        "rows": [
          {
            "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
            "score": 0.87
          }
        ]
      }
    }
  }
}
```

GWAS Catalog Associations

```json
{
  "data": [
    {
      "association_id": 216440893,
      "p_value": 2e-09,
      "or_per_copy_num": 0.94,
      "or_value": "0.94",
      "efo_traits": [{"..."}],
      "risk_frequency": "NR"
    }
  ],
  "metadata": {"pagination": {"totalElements": 1061816}}
}
```

STRING Interactions

```json
{
  "status": "success",
  "data": [
    {
      "stringId_A": "9606.ENSP00000252486",
      "stringId_B": "9606.ENSP00000466775",
      "preferredName_A": "APOE",
      "preferredName_B": "APOC2",
      "score": 0.999
    }
  ]
}
```

Reactome Enrichment

```json
{
  "data": {
    "token": "...",
    "pathways_found": 154,
    "pathways": [
      {
        "pathway_id": "R-HSA-1251985",
        "name": "Nuclear signaling by ERBB4",
        "species": "Homo sapiens",
        "is_disease": false,
        "is_lowest_level": true,
        "entities_found": 3,
        "entities_total": 47,
        "entities_ratio": 0.00291,
        "p_value": 4.0e-06,
        "fdr": 0.00068,
        "reactions_found": 3,
        "reactions_total": 34
      }
    ]
  }
}
```

HPA RNA Expression

```json
{
  "status": "success",
  "data": {
    "gene_name": "APOE",
    "source_type": "tissue",
    "source_name": "brain",
    "expression_value": "2714.9",
    "expression_level": "very high",
    "expression_unit": "nTPM"
  }
}
```

Enrichr Results

```json
{
  "status": "success",
  "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}"
}
```

NOTE: The `data` field is a JSON string that needs parsing.

Common Use Patterns

1. Comprehensive Disease Profiling

User: "Characterize Alzheimer's disease across omics layers"
-> Run all 8 phases
-> Produce full multi-omics report

2. Therapeutic Target Discovery

User: "What are druggable targets for rheumatoid arthritis?"
-> Emphasize Phase 1 (genomics), Phase 6 (therapeutics), Phase 7 (integration)
-> Focus on tractability and clinical precedent

3. Biomarker Identification

User: "Find diagnostic biomarkers for pancreatic cancer"
-> Emphasize Phase 2 (transcriptomics), Phase 3 (proteomics), Phase 7 (biomarkers)
-> Focus on tissue-specific expression and diagnostic potential

4. Mechanism Elucidation

User: "What pathways are dysregulated in Crohn's disease?"
-> Emphasize Phase 4 (pathways), Phase 5 (GO), Phase 7 (mechanistic hypotheses)
-> Focus on pathway enrichment and cross-pathway connections

5. Drug Repurposing

User: "What existing drugs could be repurposed for ALS?"
-> Emphasize Phase 1 (genetics), Phase 6 (therapeutic landscape), Phase 7 (repurposing)
-> Focus on drugs targeting disease-associated genes

6. Systems Biology

User: "What are the hub genes and key pathways in type 2 diabetes?"
-> Emphasize Phase 3 (PPI network), Phase 4 (pathways), Phase 7 (network analysis)
-> Focus on hub genes and network modules

Edge Case Handling

Rare Diseases (limited data)

Genomics layer may dominate (single gene)
Limited GWAS data (monogenic)
Focus on ClinVar variants, pathway consequences
Confidence score will be lower (less cross-layer data)

Common Diseases (overwhelming data)

Thousands of GWAS associations
Prioritize by effect size and significance
Focus on top 20-30 genes for downstream analysis
Use strict significance thresholds (p < 5e-8)
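
The prioritization described above can be sketched against the GWAS Catalog response shape documented earlier (`p_value`, `or_per_copy_num`). This is a minimal sketch under assumptions: effect size is taken as the absolute log odds ratio so that protective and risk alleles are ranked on the same scale, and the function name and `top_n` default are illustrative. Mapping the retained associations to genes is a separate step not shown here.

```python
import math


def prioritize_associations(response, p_cutoff=5e-8, top_n=30):
    """Keep genome-wide significant rows, ranked by |ln(OR)| then p-value."""
    rows = [
        row for row in response.get("data", [])
        if row.get("p_value") is not None and row["p_value"] < p_cutoff
    ]

    def effect(row):
        odds_ratio = row.get("or_per_copy_num")
        if not odds_ratio or odds_ratio <= 0:
            return 0.0  # missing or unusable odds ratio ranks last
        return abs(math.log(odds_ratio))

    rows.sort(key=lambda row: (-effect(row), row["p_value"]))
    return rows[:top_n]


# Toy rows with placeholder IDs; only the field names follow the real format.
toy_response = {"data": [
    {"association_id": 1, "p_value": 2e-09, "or_per_copy_num": 0.94},
    {"association_id": 2, "p_value": 1e-10, "or_per_copy_num": 1.5},
    {"association_id": 3, "p_value": 1e-03, "or_per_copy_num": 3.0},
]}
ranked = prioritize_associations(toy_response)
```

The third toy row has the largest odds ratio but fails the p < 5e-8 threshold, so it is excluded before ranking.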

Cancer

Include somatic mutations (if CIViC/cBioPortal available)
Check cancer prognostics via HPA
Include tumor-specific expression patterns
Clinical trial landscape may be extensive

Monogenic Diseases

Single gene dominates
ClinVar/OMIM evidence is primary
Pathway analysis reveals downstream effects
Therapeutic landscape may be limited (gene therapy, enzyme replacement)

Polygenic Diseases

Many weak genetic signals
GWAS provides the gene list
Pathway enrichment reveals convergent biology
Network analysis identifies hub genes

Tissue Ambiguity

Diseases affecting multiple tissues
Query HPA for all relevant tissues
Compare tissue-specific expression patterns
Use tissue context from disease ontology

Fallback Strategies

If disease name not found

Try synonyms
Try broader disease category
Try OMIM/UMLS ID mapping
Report disambiguation failure and ask user
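
The fallback order above (original name, synonyms, broader category, then give up and ask) can be expressed as a simple loop over candidate names. In this sketch `resolver` is a hypothetical callable that wraps whichever disambiguation lookup is in use and returns an identifier or a falsy value; the function name and the stub lookup table are assumptions for illustration only.

```python
def resolve_disease(candidates, resolver):
    """Try candidate names in order; return (name, identifier) or None."""
    for name in candidates:
        try:
            identifier = resolver(name)
        except Exception:
            identifier = None  # treat lookup errors as "not found"
        if identifier:
            return name, identifier
    return None  # caller should report the failure and ask the user


# Stub resolver standing in for a real lookup; the ID is the one shown in
# the OpenTargets example in this document.
_known = {"alzheimer disease": "MONDO_0004975"}


def stub_resolver(name):
    return _known.get(name.lower())


match = resolve_disease(["Alzheimers", "Alzheimer disease", "dementia"], stub_resolver)
no_match = resolve_disease(["unknown condition"], stub_resolver)
```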

If no GWAS data

Check ClinVar for rare variants
Use OpenTargets genetic evidence
Note in report as "Limited genetic data"
Adjust confidence score accordingly

If no expression data

Try different disease name/synonym
Check HPA for individual gene expression
Use OpenTargets expression evidence
Note as "Limited transcriptomics data"

If no pathway enrichment

Reduce gene list stringency
Try different pathway databases
Map individual genes to pathways via Reactome
Note as "No significant pathway enrichment"

If no drugs found

Check if disease is rare/orphan
Look for drugs targeting individual genes
Check clinical trials for investigational therapies
Note as "No approved drugs - novel therapeutic opportunity"
