tooluniverse-multiomic-disease-characterization
Multi-Omics Disease Characterization Pipeline
Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Disease disambiguation FIRST - Resolve all identifiers before omics analysis
- Layer-by-layer analysis - Systematically cover all omics layers
- Cross-layer integration - Identify genes/targets appearing in multiple layers
- Evidence grading - Grade all evidence as T1 (human/clinical) to T4 (computational)
- Tissue context - Emphasize disease-relevant tissues/organs
- Quantitative scoring - Multi-Omics Confidence Score (0-100)
- Druggable focus - Prioritize targets with therapeutic potential
- Biomarker identification - Highlight diagnostic/prognostic markers
- Mechanistic synthesis - Generate testable hypotheses
- Source references - Every statement must cite tool/database
- Completeness checklist - Mandatory section showing analysis coverage
- English-first queries - Always use English terms in tool calls. Respond in user's language
When to Use This Skill
Apply when users:
- Ask about disease mechanisms across omics layers
- Need multi-omics characterization of a disease
- Want to understand disease at the systems biology level
- Ask "What pathways/genes/proteins are involved in [disease]?"
- Need biomarker discovery for a disease
- Want to identify druggable targets from disease profiling
- Ask for integrated genomics + transcriptomics + proteomics analysis
- Need cross-layer concordance analysis
- Ask about disease network biology / hub genes
NOT for (use other skills instead):
- Single gene/target validation -> Use `tooluniverse-drug-target-validation`
- Drug safety profiling -> Use `tooluniverse-adverse-event-detection`
- General disease overview -> Use `tooluniverse-disease-research`
- Variant interpretation -> Use `tooluniverse-variant-interpretation`
- GWAS-specific analysis -> Use `tooluniverse-gwas-*` skills
- Pathway-only analysis -> Use `tooluniverse-systems-biology`
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| disease | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | |
| tissue | No | Tissue/organ of interest | |
| focus_layers | No | Specific omics layers to emphasize | |
Multi-Omics Confidence Score (0-100)
Score Components
Data Availability (0-40 points):
- Genomics data available (GWAS or rare variants): 10 points
- Transcriptomics data available (DEGs or expression): 10 points
- Protein data available (PPI or expression): 5 points
- Pathway data available (enriched pathways): 10 points
- Clinical/drug data available (approved drugs or trials): 5 points
Evidence Concordance (0-40 points):
- Multi-layer genes (appear in 3+ layers): up to 20 points (2 per gene, max 10 genes)
- Consistent direction (genetics + expression concordant): 10 points
- Pathway-gene concordance (genes found in enriched pathways): 10 points
Evidence Quality (0-20 points):
- Strong genetic evidence (GWAS p < 5e-8): 10 points
- Clinical validation (approved drugs): 10 points
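The point allocation above can be sketched as a small function. This is only an illustration under stated assumptions: the `OmicsEvidence` class and all of its field names are hypothetical, and the two 10-point concordance items are treated as all-or-nothing because the text does not describe partial credit.

```python
from dataclasses import dataclass


@dataclass
class OmicsEvidence:
    """Hypothetical summary of the analysis; field names are illustrative only."""
    has_genomics: bool = False
    has_transcriptomics: bool = False
    has_protein: bool = False
    has_pathways: bool = False
    has_clinical: bool = False
    multi_layer_gene_count: int = 0       # genes appearing in 3+ layers
    direction_concordant: bool = False    # genetics + expression agree
    pathway_gene_concordant: bool = False
    has_gwas_significant: bool = False    # any GWAS hit with p < 5e-8
    has_approved_drug: bool = False


def confidence_score(ev: OmicsEvidence) -> dict:
    """Return the three component subtotals and the 0-100 total."""
    # Data Availability (0-40 points)
    availability = (10 * ev.has_genomics + 10 * ev.has_transcriptomics
                    + 5 * ev.has_protein + 10 * ev.has_pathways
                    + 5 * ev.has_clinical)
    # Evidence Concordance (0-40 points): 2 points per gene, capped at 10 genes
    concordance = (2 * min(max(ev.multi_layer_gene_count, 0), 10)
                   + 10 * ev.direction_concordant
                   + 10 * ev.pathway_gene_concordant)
    # Evidence Quality (0-20 points)
    quality = 10 * ev.has_gwas_significant + 10 * ev.has_approved_drug
    return {
        "availability": availability,
        "concordance": concordance,
        "quality": quality,
        "total": availability + concordance + quality,
    }
```

For example, a disease with genomics and transcriptomics data, 12 multi-layer genes, and a genome-wide significant GWAS hit would score 20 + 20 + 10 = 50 under this sketch.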
Score Interpretation
| Score | Tier | Interpretation |
|---|---|---|
| 80-100 | Excellent | Comprehensive multi-omics coverage, high confidence, strong cross-layer concordance |
| 60-79 | Good | Good coverage across most layers, some gaps |
| 40-59 | Moderate | Moderate coverage, limited cross-layer integration |
| 0-39 | Limited | Limited data, single-layer analysis dominates |
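The tier boundaries in the table map directly onto a lookup. A minimal sketch follows; the function name `score_tier` is hypothetical and not part of any tool described here.

```python
def score_tier(score: int) -> str:
    """Map a Multi-Omics Confidence Score (0-100) to its tier label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Limited"
```

The result can be used to fill the `Score: XX/100 - [Tier]` line of the report.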
Evidence Grading System
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Direct human evidence, clinical proof | FDA-approved drug, GWAS hit (p<5e-8), clinical trial result |
| T2 | [T2] | Experimental evidence | Differential expression (validated), functional screen, mouse KO |
| T3 | [T3] | Computational/database evidence | PPI network, pathway mapping, expression correlation |
| T4 | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |
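One way to keep grading consistent while filling the report is to tag each statement with its tier symbol and source at the moment it is written. The helper names and the `(Source: ...)` suffix format below are assumptions for illustration; only the tier symbols and criteria come from the table.

```python
EVIDENCE_TIERS = {
    "T1": "Direct human evidence, clinical proof",
    "T2": "Experimental evidence",
    "T3": "Computational/database evidence",
    "T4": "Annotation/prediction only",
}


def tag_evidence(statement: str, tier: str, source: str) -> str:
    """Append the tier symbol and the originating tool/database to a statement."""
    if tier not in EVIDENCE_TIERS:
        raise ValueError(f"unknown evidence tier: {tier!r}")
    return f"{statement} [{tier}] (Source: {source})"


def strongest_tier(tiers: list[str]) -> str:
    """Return the strongest tier in a list (T1 is strongest, T4 weakest)."""
    for tier in tiers:
        if tier not in EVIDENCE_TIERS:
            raise ValueError(f"unknown evidence tier: {tier!r}")
    return min(tiers, key=lambda t: int(t[1:]))
```

`strongest_tier` is intended for the Evidence Tier column of the cross-layer concordance table, where a gene may carry evidence of several grades.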
Report Template
Create this file structure at the start, in `{disease_name}_multiomic_report.md`:
Multi-Omics Disease Characterization: {Disease Name}
Report Generated: {date}
Disease Identifiers: (to be filled)
Multi-Omics Confidence Score: (to be calculated)
Executive Summary
(2-3 sentence disease mechanism synthesis - fill after all layers complete)
1. Disease Definition & Context
Disease Identifiers
| System | ID | Source |
|---|---|---|
Description
Synonyms
Disease Hierarchy (parents/children)
Affected Tissues/Organs
Therapeutic Areas
Sources: (tools used)
2. Genomics Layer
2.1 GWAS Associations
| SNP | P-value | Effect | Gene | Study | Source |
|---|---|---|---|---|---|
2.2 GWAS Studies Summary
| Study ID | Trait | Sample Size | Year | Source |
|---|---|---|---|---|
2.3 Associated Genes (Genetic Evidence)
| Gene | Ensembl ID | Association Score | Evidence Type | Source |
|---|---|---|---|---|
2.4 Rare Variants (ClinVar)
| Variant | Gene | Clinical Significance | Source |
|---|---|---|---|
Genomics Layer Summary
- Total GWAS hits:
- Top genes by genetic evidence:
- Genetic architecture:
Sources: (tools used)
3. Transcriptomics Layer
3.1 Differential Expression Studies
| Experiment | Condition | Up-regulated | Down-regulated | Source |
|---|---|---|---|---|
3.2 Expression Atlas Disease Evidence
| Gene | Score | Source |
|---|---|---|
3.3 Tissue Expression Patterns (GTEx/HPA)
| Gene | Tissue | Expression Level | Source |
|---|---|---|---|
3.4 Biomarker Candidates (Expression-Based)
| Gene | Tissue Specificity | Fold Change | Evidence | Source |
|---|---|---|---|---|
Transcriptomics Layer Summary
- Differential expression datasets:
- Top DEGs:
- Tissue-specific patterns:
Sources: (tools used)
4. Proteomics & Interaction Layer
4.1 Protein-Protein Interactions (STRING)
| Protein A | Protein B | Score | Source |
|---|---|---|---|
4.2 Hub Genes (Network Centrality)
| Gene | Degree | Betweenness | Role | Source |
|---|---|---|---|---|
4.3 Protein Complexes (IntAct)
| Complex | Members | Function | Source |
|---|---|---|---|
4.4 Tissue-Specific PPI Network
| Gene | Interaction Score | Tissue | Source |
|---|---|---|---|
Proteomics Layer Summary
- Total PPIs:
- Hub genes:
- Network modules:
Sources: (tools used)
5. Pathway & Network Layer
5.1 Enriched Pathways (Enrichr/Reactome)
| Pathway | Database | P-value | Genes | Source |
|---|---|---|---|---|
5.2 Reactome Pathway Details
| Pathway ID | Name | Genes Involved | Source |
|---|---|---|---|
5.3 KEGG Pathways
| Pathway ID | Name | Description | Source |
|---|---|---|---|
5.4 WikiPathways
| Pathway ID | Name | Organism | Source |
|---|---|---|---|
Pathway Layer Summary
- Top enriched pathways:
- Key pathway nodes:
- Cross-pathway connections:
Sources: (tools used)
6. Gene Ontology & Functional Annotation
6.1 Biological Processes
| GO Term | Name | P-value | Genes | Source |
|---|---|---|---|---|
6.2 Molecular Functions
| GO Term | Name | P-value | Genes | Source |
|---|---|---|---|---|
6.3 Cellular Components
| GO Term | Name | P-value | Genes | Source |
|---|---|---|---|---|
Sources: (tools used)
7. Therapeutic Landscape
7.1 Approved Drugs
| Drug | ChEMBL ID | Mechanism | Target | Phase | Source |
|---|---|---|---|---|---|
7.2 Druggable Targets
| Gene | Tractability | Modality | Clinical Precedent | Source |
|---|---|---|---|---|
7.3 Drug Repurposing Candidates
| Drug | Original Indication | Mechanism | Target | Source |
|---|---|---|---|---|
7.4 Clinical Trials
| NCT ID | Title | Phase | Status | Intervention | Source |
|---|---|---|---|---|---|
Therapeutic Summary
- Approved drugs:
- Clinical pipeline:
- Novel targets:
Sources: (tools used)
8. Multi-Omics Integration
8.1 Cross-Layer Gene Concordance
| Gene | Genomics | Transcriptomics | Proteomics | Pathways | Layers | Evidence Tier |
|---|---|---|---|---|---|---|
8.2 Multi-Omics Hub Genes (Top 20)
| Rank | Gene | Layers Found | Key Evidence | Druggable | Source |
|---|---|---|---|---|---|
8.3 Biomarker Candidates
| Biomarker | Type | Evidence Layers | Confidence | Source |
|---|---|---|---|---|
8.4 Mechanistic Hypotheses
- (Hypothesis with supporting evidence from multiple layers)
- ...
8.5 Systems-Level Insights
- Key disrupted processes:
- Critical pathway nodes:
- Therapeutic intervention points:
- Testable hypotheses:
Multi-Omics Confidence Score
| Component | Points | Max | Details |
|---|---|---|---|
| Genomics data | | 10 | |
| Transcriptomics data | | 10 | |
| Protein data | | 5 | |
| Pathway data | | 10 | |
| Clinical data | | 5 | |
| Multi-layer genes | | 20 | |
| Direction concordance | | 10 | |
| Pathway-gene concordance | | 10 | |
| Genetic evidence quality | | 10 | |
| Clinical validation | | 10 | |
| TOTAL | | 100 | |
Score: XX/100 - [Tier]
Data Availability Checklist
| Omics Layer | Data Available | Tools Used | Findings |
|---|---|---|---|
| Genomics (GWAS) | Yes/No | | |
| Genomics (Rare Variants) | Yes/No | | |
| Transcriptomics (DEGs) | Yes/No | | |
| Transcriptomics (Expression) | Yes/No | | |
| Proteomics (PPI) | Yes/No | | |
| Proteomics (Expression) | Yes/No | | |
| Pathways (Enrichment) | Yes/No | | |
| Pathways (KEGG/Reactome) | Yes/No | | |
| Gene Ontology | Yes/No | | |
| Drugs/Therapeutics | Yes/No | | |
| Clinical Trials | Yes/No | | |
| Literature | Yes/No | | |
Completeness Checklist
- Disease disambiguation complete (IDs resolved)
- Genomics layer analyzed (GWAS + variants)
- Transcriptomics layer analyzed (DEGs + expression)
- Proteomics layer analyzed (PPI + interactions)
- Pathway layer analyzed (enrichment + mapping)
- Gene Ontology analyzed (BP + MF + CC)
- Therapeutic landscape analyzed (drugs + targets + trials)
- Cross-layer integration complete (concordance analysis)
- Multi-Omics Confidence Score calculated
- Biomarker candidates identified
- Hub genes identified
- Mechanistic hypotheses generated
- Executive summary written
- All sections have source citations
References
Data Sources Used
| # | Tool | Parameters | Section | Items Retrieved |
|---|---|---|---|---|
Database Versions
- OpenTargets: (current)
- GWAS Catalog: (current)
- STRING: (current)
- Reactome: (current)
---
Phase 0: Disease Disambiguation (ALWAYS FIRST)
Objective: Resolve disease to standard identifiers for all downstream queries.
Tools Used
OpenTargets_get_disease_id_description_by_name (primary):
- Input: `diseaseName` (string) - Disease name
- Output: `{data: {search: {hits: [{id, name, description}]}}}`
- Use: Get MONDO/EFO IDs and description
- CRITICAL: Disease IDs from OpenTargets use underscore format (e.g., `MONDO_0004975`), NOT colon format
OSL_get_efo_id_by_disease_name (secondary):
- Input: `disease` (string) - Disease name
- Output: `{efo_id, name}`
- Use: Get EFO/MONDO ID
OpenTargets_get_disease_description_by_efoId:
- Input: `efoId` (string) - Disease ID (e.g., `MONDO_0004975`)
- Output: `{data: {disease: {id, name, description, dbXRefs}}}`
- Use: Get full description, cross-references (OMIM, UMLS, DOID, etc.)
OpenTargets_get_disease_synonyms_by_efoId:
- Input: `efoId` (string)
- Output: `{data: {disease: {id, name, synonyms: [{relation, terms}]}}}`
OpenTargets_get_disease_therapeutic_areas_by_efoId:
- Input: `efoId` (string)
- Output: `{data: {disease: {id, name, therapeuticAreas: [{id, name}]}}}`
OpenTargets_get_disease_ancestors_parents_by_efoId:
- Input: `efoId` (string)
- Output: `{data: {disease: {id, name, ancestors: [{id, name}]}}}`
OpenTargets_get_disease_descendants_children_by_efoId:
- Input: `efoId` (string)
- Output: `{data: {disease: {id, name, descendants: [{id, name}]}}}`
OpenTargets_map_any_disease_id_to_all_other_ids:
- Input: `inputId` (string) - Any known disease ID (e.g., `OMIM:104300`, `UMLS:C0002395`)
- Output: `{data: {disease: {id, name, dbXRefs: [str], ...}}}`
- Use: Cross-map between OMIM, UMLS, ICD10, DOID, etc.
Workflow
- Search by disease name to get primary ID (OpenTargets)
- Get full description and cross-references
- Get synonyms for search term expansion
- Get therapeutic areas for context
- Get disease hierarchy (parents/children)
- If user provided OMIM/other ID, map to MONDO/EFO first
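Because OpenTargets expects the underscore form of a disease ID, it can help to normalize user input before any query and to flag IDs that need cross-mapping first. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the set of accepted prefixes is limited to the two the text names (MONDO and EFO).

```python
OPENTARGETS_PREFIXES = ("MONDO", "EFO")  # the only primary ID systems named in the text


def to_opentargets_id(disease_id: str) -> str:
    """Convert 'MONDO:0004975' or 'MONDO_0004975' to the underscore form.

    Raises ValueError for other ID systems (OMIM, UMLS, ...), which should
    first be mapped with OpenTargets_map_any_disease_id_to_all_other_ids.
    """
    # Accept either separator, then split into prefix and local part.
    prefix, sep, local = disease_id.strip().replace("_", ":", 1).partition(":")
    if not sep or not local:
        raise ValueError(f"not a prefixed disease ID: {disease_id!r}")
    if prefix.upper() not in OPENTARGETS_PREFIXES:
        raise ValueError(f"{prefix} IDs must be mapped to MONDO/EFO first")
    return f"{prefix.upper()}_{local}"
```

With this helper, `MONDO:0004975` becomes `MONDO_0004975`, while `OMIM:104300` is rejected so that the mapping step is not skipped by accident.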
Collision-Aware Search
When disease name returns multiple hits:
- Check if user's input matches any hit exactly
- If ambiguous, present top 3-5 options and ask user to select
- Always prefer the most specific disease (not parent categories)
- For cancer, prefer the specific tumor type over generic "cancer"
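The first two rules can be sketched as a small resolver over the `hits` list returned by the name search. The function name and the returned status labels are hypothetical. Choosing the most specific disease needs the hierarchy tools and is deliberately left out of this sketch.

```python
def resolve_hits(query: str, hits: list[dict], max_options: int = 5) -> dict:
    """Pick an exact name match, or return the top options for the user to choose."""
    if not hits:
        return {"status": "not_found", "options": []}
    normalized = query.strip().lower()
    exact = [h for h in hits if h.get("name", "").strip().lower() == normalized]
    if len(exact) == 1:
        return {"status": "resolved", "hit": exact[0]}
    if len(hits) == 1:
        return {"status": "resolved", "hit": hits[0]}
    # Ambiguous: surface the top few hits instead of guessing.
    return {"status": "ambiguous", "options": hits[:max_options]}
```

An `ambiguous` result is the cue to present the options and ask the user to select before continuing to the omics layers.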
Key Disease IDs to Track
After disambiguation, store these for all downstream queries:
- `efo_id` - Primary ID for OpenTargets queries (e.g., `MONDO_0004975`)
- `disease_name` - Canonical name (e.g., `Alzheimer disease`)
- `synonyms` - For literature search expansion
- `therapeutic_areas` - For context
- `dbXRefs` - Cross-references (OMIM, UMLS, DOID, etc.)
Phase 1: Genomics Layer
Objective: Identify genetic variants, GWAS associations, and genetically implicated genes.
Tools Used
OpenTargets_get_associated_targets_by_disease_efoId (primary):
- Input: `efoId` (string) - Disease EFO/MONDO ID
- Output: `{data: {disease: {id, name, associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}`
- Use: Get ALL disease-associated genes ranked by overall evidence score
- NOTE: Returns top 25 by default. For comprehensive analysis, note the total `count`
OpenTargets_get_evidence_by_datasource:
- Input: `efoId` (string), `ensemblId` (string), optional `datasourceIds` (array), `size` (int, default 50)
- Output: `{data: {disease: {evidences: {count, rows: [{...evidence details}]}}}}`
- Use: Get specific evidence types. Key `datasourceIds` for genomics:
  - `['ot_genetics_portal']` - GWAS/genetics
  - `['gene2phenotype', 'genomics_england', 'orphanet']` - Rare variants
  - `['eva']` - ClinVar variants
gwas_search_associations (GWAS Catalog):
- Input: `disease_trait` (string), `size` (int, default 20)
- Output: `{data: [{association_id, p_value, or_per_copy_num, or_value, beta, risk_frequency, efo_traits: [{...}], ...}], metadata: {pagination: {totalElements}}}`
- Use: Get genome-wide significant associations
- NOTE: Use disease name (e.g., "Alzheimer"), not ID. Returns paginated results
gwas_get_studies_for_trait:
- Input: `disease_trait` (string), `size` (int)
- Output: `{data: [...studies], metadata: {pagination}}`
- NOTE: May return empty if trait name does not match exactly. Try synonyms
gwas_get_variants_for_trait:
- Input: `disease_trait` (string), `size` (int)
- Output: `{data: [...variants], metadata: {pagination}}`
GWAS_search_associations_by_gene:
- Input: `gene_name` (string)
- Output: Associations for a specific gene
OpenTargets_search_gwas_studies_by_disease:
- Input: `diseaseIds` (array of strings), `enableIndirect` (bool, default true), `size` (int, default 10)
- Output: `{data: {studies: {count, rows: [{id, studyType, traitFromSource, publicationFirstAuthor, publicationDate, pubmedId, nSamples, nCases, nControls, ...}]}}}`
- Use: Get GWAS studies from OpenTargets genetics portal
clinvar_search_variants:
- Input: `condition` (string) or `gene` (string), optional `max_results` (int)
- Output: List of ClinVar variants with clinical significance
- Use: Rare variant / monogenic disease evidence
Workflow
- Get associated genes from OpenTargets (overall scores)
- For top 10-15 genes, get genetic evidence specifically via `OpenTargets_get_evidence_by_datasource`
- Search GWAS Catalog for associations
- Search OpenTargets GWAS studies
- Search ClinVar for rare variants
- For top GWAS genes, check `GWAS_search_associations_by_gene`
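When filling the GWAS table and the "strong genetic evidence" score item, associations can be filtered at the genome-wide significance threshold used throughout this skill (p < 5e-8). The sketch below assumes a response shaped like the `gwas_search_associations` output listed above. The function name is hypothetical, and whether `p_value` arrives as a number or a string is not stated in the text, so both are handled.

```python
GENOME_WIDE_SIGNIFICANCE = 5e-8


def significant_associations(response: dict,
                             threshold: float = GENOME_WIDE_SIGNIFICANCE) -> list[dict]:
    """Keep associations with p_value below the threshold, most significant first."""
    kept = []
    for record in response.get("data", []):
        try:
            p_value = float(record["p_value"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records with a missing or unparsable p-value
        if p_value < threshold:
            kept.append(record)
    return sorted(kept, key=lambda r: float(r["p_value"]))
```

Since results are paginated, the filter only covers the page that was fetched; `metadata.pagination.totalElements` indicates how many associations exist in total.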
Gene Tracking
Maintain a dictionary of genes found in the genomics layer:
```python
genomics_genes = {
    'PSEN1': {'score': 0.87, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000080815', 'layer': 'genomics'},
    'APP': {'score': 0.82, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000142192', 'layer': 'genomics'},
    # ...
}
```
Phase 2: Transcriptomics Layer
Objective: Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.
Tools Used
ExpressionAtlas_search_differential:
- Input: optional `gene` (string), `condition` (string), `species` (string, default 'homo sapiens')
- Output: Differential expression studies and results
- Use: Find studies where genes are differentially expressed in disease
ExpressionAtlas_search_experiments:
- Input: optional `gene` (string), `condition` (string), `species` (string)
- Output: Expression experiments relevant to condition
- Use: Find all Expression Atlas experiments for the disease
expression_atlas_disease_target_score:
- Input: `efoId` (string), `pageSize` (int, required)
- Output: Genes scored by expression evidence for the disease
- Use: Get expression-based disease-gene association scores
europepmc_disease_target_score:
- Input: `efoId` (string), `pageSize` (int, required)
- Output: Genes scored by literature evidence for the disease
- Use: Complement expression evidence with literature-mined associations
HPA_get_rna_expression_by_source (Human Protein Atlas):
- Input: `gene_name` (string), `source_type` (string: 'tissue', 'blood', 'brain'), `source_name` (string: e.g., 'brain', 'liver')
- Output: `{status, data: {gene_name, source_type, source_name, expression_value, expression_level, expression_unit}}`
- NOTE: ALL 3 params required. `source_type` options: 'tissue', 'blood', 'brain', 'cell_line', 'single_cell'
HPA_get_rna_expression_in_specific_tissues:
- Input: `gene_name` (string), `tissues` (array of strings)
- Output: Expression across specified tissues
HPA_get_cancer_prognostics_by_gene:
- Input: `gene_name` (string)
- Output: Cancer prognostic data (if cancer context)
HPA_get_subcellular_location:
- Input: `gene_name` (string)
- Output: Subcellular localization data
HPA_search_genes_by_query:
- Input: `query` (string)
- Output: Matching genes in HPA
Workflow
- Search Expression Atlas for differential expression studies
- Get expression-based disease scores
- Get literature-based disease scores (EuropePMC)
- For top 10-15 genes from genomics layer, check tissue expression via HPA
- Check disease-relevant tissue expression patterns
- For cancer: check prognostic biomarkers
Gene Tracking
Add transcriptomics genes to tracking:
```python
transcriptomics_genes = {
    'APOE': {'expression_score': 0.75, 'tissues': ['brain'], 'evidence': 'differential_expression', 'layer': 'transcriptomics'},
    # ...
}
```
Phase 3: Proteomics & Interaction Layer
Objective: Map protein-protein interactions, identify hub genes, and characterize interaction networks.
Tools Used
STRING_get_interaction_partners (primary PPI):
- Input: `protein_ids` (array of strings - gene names work), `species` (int, default 9606), `confidence_score` (float, default 0.4), `limit` (int, default 20)
- Output: `{status: 'success', data: [{stringId_A, stringId_B, preferredName_A, preferredName_B, ncbiTaxonId, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}`
- Use: Get interaction partners for disease genes
- NOTE: `protein_ids` is an array, NOT string. Gene symbols like `['APOE']` work
STRING_get_network:
- Input: `protein_ids` (array), `species` (int), `confidence_score` (float)
- Output: Network of interactions between input proteins
- Use: Build disease-specific PPI network
STRING_functional_enrichment:
- Input: `protein_ids` (array), `species` (int)
- Output: Functional enrichment results (GO, KEGG, etc.)
- Use: Functional characterization of disease gene set
STRING_ppi_enrichment:
- Input: `protein_ids` (array), `species` (int)
- Output: Statistical test for PPI enrichment (more interactions than expected)
- Use: Test if disease genes form a connected module
intact_get_interactions:
- Input: `identifier` (string - UniProt ID or gene name)
- Output: Molecular interaction data from IntAct
intact_search_interactions:
- Input: `query` (string), `first` (int, default 0), `max` (int, default 25)
- Output: Search results for interactions
HPA_get_protein_interactions_by_gene:
- Input: `gene_name` (string)
- Output: `{gene, interactions, interactor_count, interactors: [...]}`
humanbase_ppi_analysis:
- Input: `gene_list` (array), `tissue` (string), `max_node` (int), `interaction` (string), `string_mode` (bool)
- Output: Tissue-specific PPI network
- NOTE: ALL params required. `interaction` options: 'coexpression', 'interaction', 'coexpression_and_interaction'. `string_mode`: true/false
Workflow
- Take top 15-20 genes from genomics + transcriptomics layers
- Query STRING for interaction partners of each gene
- Build composite PPI network using STRING_get_network
- Test PPI enrichment (are genes more connected than random?)
- Get functional enrichment from STRING
- For disease-relevant tissue, get tissue-specific network (HumanBase)
- Identify hub genes (highest degree centrality)
- Check IntAct for experimentally validated interactions
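The first workflow step, choosing the top genes across the genomics and transcriptomics layers, can be sketched by merging the per-layer tracking dictionaries and ranking genes by the number of layers they appear in. The function names are hypothetical, the ranking rule (layer count, then alphabetical) is an assumption, and the dictionaries are expected to carry a `'layer'` key as in the Gene Tracking examples.

```python
from collections import defaultdict


def merge_layers(*layer_dicts: dict) -> dict:
    """Map each gene symbol to the set of omics layers in which it was found."""
    merged = defaultdict(set)
    for layer in layer_dicts:
        for gene, info in layer.items():
            merged[gene].add(info["layer"])
    return dict(merged)


def top_genes(merged: dict, n: int = 20) -> list[str]:
    """Rank genes by number of layers (descending), then alphabetically."""
    return sorted(merged, key=lambda g: (-len(merged[g]), g))[:n]
```

The merged mapping also gives the Layers column of the cross-layer concordance table, and counting genes with three or more layers feeds the multi-layer gene item of the confidence score.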
Hub Gene Analysis
Calculate network centrality metrics:
- Degree: Number of interaction partners
- Betweenness: Number of shortest paths through node
- Hub score: Genes with degree > mean + 1 SD are hubs
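The degree-based hub rule can be sketched in plain Python. This is a minimal illustration, assuming the network has already been reduced to an undirected list of gene pairs (for example from the `preferredName_A` / `preferredName_B` fields of STRING results). The function name `find_hub_genes`, the toy gene labels, and the choice of population standard deviation are assumptions, not part of the pipeline.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_hub_genes(edges):
    """Return (degree, hubs) for an undirected edge list of (gene_a, gene_b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {gene: len(partners) for gene, partners in neighbors.items()}
    if not degree:
        return {}, []
    values = list(degree.values())
    # Hub rule from the text: degree > mean + 1 SD (population SD is an assumption)
    threshold = mean(values) + pstdev(values)
    hubs = sorted((g for g, d in degree.items() if d > threshold),
                  key=lambda g: -degree[g])
    return degree, hubs


# Toy edge list with placeholder gene labels (illustrative only, not real data)
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G1", "G5"), ("G3", "G5"), ("G4", "G2")]
degree, hubs = find_hub_genes(edges)
print(hubs)
```

Betweenness is not computed here; a graph library would be the usual choice for that metric.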
Phase 4: Pathway & Network Layer
Objective: Identify enriched biological pathways and cross-pathway connections.
Tools Used
enrichr_gene_enrichment_analysis (primary enrichment):
- Input: `gene_list` (array of gene symbols, min 2), `libs` (array of library names)
- Output: `{status: 'success', data: '{...JSON string with enrichment results...}'}`
- Key libraries: `['KEGG_2021_Human']`, `['Reactome_2022']`, `['WikiPathway_2023_Human']`, `['GO_Biological_Process_2023']`, `['GO_Molecular_Function_2023']`, `['GO_Cellular_Component_2023']`
- NOTE: `data` field is a JSON string, needs parsing. Contains `connected_paths` and per-library results
- NOTE: `libs` is REQUIRED as array
ReactomeAnalysis_pathway_enrichment:
- Input: `identifiers` (string - space-separated gene list), optional `page_size` (int, default 20), `include_disease` (bool), `projection` (bool)
- Output: `{data: {token, analysis_type, pathways_found, pathways: [{pathway_id, name, species, is_disease, is_lowest_level, entities_found, entities_total, entities_ratio, p_value, fdr, reactions_found, reactions_total}]}}`
- Use: Reactome-specific pathway enrichment with statistical testing
Reactome_map_uniprot_to_pathways:
- Input: `id` (string - UniProt accession)
- Output: List of Reactome pathways containing this protein
- Use: Map individual proteins to pathways
Reactome_get_pathway:
- Input: `stId` (string - Reactome stable ID, e.g., 'R-HSA-73817')
- Output: Pathway details
Reactome_get_pathway_reactions:
- Input: `stId` (string)
- Output: Reactions within pathway
kegg_search_pathway:
- Input: `keyword` (string)
- Output: Array of KEGG pathway matches
kegg_get_pathway_info:
- Input: `pathway_id` (string, e.g., 'hsa04930')
- Output: Detailed pathway information
WikiPathways_search:
- Input: `query` (string), optional `organism` (string, e.g., 'Homo sapiens')
- Output: Matching community-curated pathways
Workflow
- Collect all genes from genomics + transcriptomics layers (top 20-30)
- Run Enrichr enrichment for KEGG, Reactome, WikiPathways
- Run ReactomeAnalysis for more detailed Reactome enrichment with p-values
- Search KEGG for disease-specific pathways
- Search WikiPathways for disease pathways
- For top Reactome pathways, get detailed reactions
- Identify cross-pathway connections (genes in multiple pathways)
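The last step, finding genes that sit in several enriched pathways, amounts to inverting a pathway-to-genes mapping. The sketch below assumes the enrichment results have already been reduced to a dictionary of pathway name to member genes; the function name `cross_pathway_genes` and the placeholder pathway and gene labels are hypothetical.

```python
from collections import defaultdict


def cross_pathway_genes(pathway_members, min_pathways=2):
    """Return {gene: sorted pathways} for genes found in at least min_pathways pathways."""
    gene_to_pathways = defaultdict(set)
    for pathway, genes in pathway_members.items():
        for gene in genes:
            gene_to_pathways[gene].add(pathway)
    return {gene: sorted(paths)
            for gene, paths in gene_to_pathways.items()
            if len(paths) >= min_pathways}


# Placeholder input (illustrative only): enriched pathway -> overlapping input genes
pathway_members = {
    "pathway_1": ["G1", "G2", "G3"],
    "pathway_2": ["G2", "G4"],
    "pathway_3": ["G2", "G3", "G5"],
}
shared = cross_pathway_genes(pathway_members)
print(shared)
```

Genes returned here are candidates for the cross-layer integration step, since they link otherwise separate pathways.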
Phase 5: Gene Ontology & Functional Annotation
Objective: Characterize biological processes, molecular functions, and cellular components.
Tools Used
enrichr_gene_enrichment_analysis (GO enrichment):
- Use with `libs=['GO_Biological_Process_2023']` for BP
- Use with `libs=['GO_Molecular_Function_2023']` for MF
- Use with `libs=['GO_Cellular_Component_2023']` for CC
GO_get_annotations_for_gene:
- Input: `gene_id` (string - gene symbol or UniProt ID)
- Output: List of GO annotations with terms, aspects, evidence codes
GO_search_terms:
- Input: `query` (string)
- Output: Matching GO terms
QuickGO_annotations_by_gene:
- Input: `gene_product_id` (string - UniProt accession, e.g., 'UniProtKB:P02649'), optional `aspect` (string: 'biological_process', 'molecular_function', 'cellular_component'), `taxon_id` (int: 9606), `limit` (int: 25)
- Output: GO annotations with evidence codes
OpenTargets_get_target_gene_ontology_by_ensemblID:
- Input: `ensemblId` (string)
- Output: GO terms associated with target
Workflow
- Run Enrichr GO enrichment for all 3 aspects using combined gene list
- For top 5 genes, get detailed GO annotations from QuickGO
- For top genes, get OpenTargets GO terms
- Summarize key biological processes, molecular functions, cellular components
Phase 6: Therapeutic Landscape
Objective: Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.
Tools Used
OpenTargets_get_associated_drugs_by_disease_efoId (primary):
- Input: `efoId` (string), `size` (int, REQUIRED - use 100)
- Output: `{data: {disease: {knownDrugs: {count, rows: [{drug: {id, name, tradeNames, maximumClinicalTrialPhase, isApproved, hasBeenWithdrawn}, phase, mechanismOfAction, target: {id, approvedSymbol}, disease: {id, name}, urls: [{url, name}]}]}}}}`
- Use: All drugs associated with disease (approved + investigational)
OpenTargets_get_target_tractability_by_ensemblID:
- Input: `ensemblId` (string)
- Output: Tractability assessment (small molecule, antibody, PROTAC, etc.)
OpenTargets_get_associated_drugs_by_target_ensemblID:
- Input: `ensemblId` (string), `size` (int, REQUIRED)
- Output: Drugs targeting this gene/protein
search_clinical_trials:
- Input: `query_term` (string, REQUIRED), optional `condition` (string), `intervention` (string), `pageSize` (int, default 10)
- Output: Clinical trial results
- NOTE: `query_term` is REQUIRED even if `condition` is provided
OpenTargets_get_drug_mechanisms_of_action_by_chemblId:
- Input: `chemblId` (string)
- Output: Mechanism of action details
Workflow
- Get all drugs for disease from OpenTargets
- For top disease-associated genes, check tractability
- For top genes with no approved drugs, identify repurposing candidates
- Search clinical trials for disease
- For top approved drugs, get mechanism of action
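One way to shortlist genes for the repurposing step is to compare each top gene's highest clinical phase against an approval cutoff. The sketch below assumes a per-gene dictionary with `drugs` and `clinical_phase` keys and treats phase 4 as "approved"; the function name `repurposing_candidates`, the cutoff, and the placeholder gene `GENE_X` are assumptions for illustration only.

```python
def repurposing_candidates(drug_targets, top_genes, approved_phase=4):
    """Return (gene, best_phase) for top genes without a drug at the approved phase.

    Genes missing from drug_targets are reported with phase 0.
    """
    candidates = []
    for gene in top_genes:
        entry = drug_targets.get(gene)
        phase = entry.get("clinical_phase", 0) if entry else 0
        if phase < approved_phase:
            candidates.append((gene, phase))
    return candidates


# Tracking records in the assumed shape; GENE_X is a hypothetical gene with no record
drug_targets = {
    "PSEN1": {"drugs": ["Semagacestat"], "tractability": "small_molecule", "clinical_phase": 3},
    "ACHE": {"drugs": ["Donepezil", "Galantamine"], "tractability": "small_molecule", "clinical_phase": 4},
}
candidates = repurposing_candidates(drug_targets, ["PSEN1", "ACHE", "GENE_X"])
print(candidates)
```

A gene appearing here is only a starting point; tractability and drugs approved for other indications still need to be checked with the tools listed above.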
Drug Tracking
```python
# Per-gene record of known drugs, tractability modality, and highest clinical phase
drug_targets = {
    'PSEN1': {'drugs': ['Semagacestat'], 'tractability': 'small_molecule', 'clinical_phase': 3},
    'ACHE': {'drugs': ['Donepezil', 'Galantamine'], 'tractability': 'small_molecule', 'clinical_phase': 4},
    # ...
}
```
Phase 7: Multi-Omics Integration
Objective: Integrate findings across all layers to identify cross-layer genes, calculate concordance, and generate mechanistic hypotheses.
Cross-Layer Gene Concordance Analysis
This is the core integrative step. For each gene found in the analysis:
- Count layers: In how many omics layers does this gene appear?
  - Genomics (GWAS, rare variants, genetic association)
  - Transcriptomics (DEGs, expression score)
  - Proteomics (PPI hub, protein expression)
  - Pathways (enriched pathway member)
  - Therapeutics (drug target)
- Score genes: Genes appearing in 3+ layers are "multi-omics hub genes"
- Direction concordance: Do genetics and expression agree?
  - Risk allele + upregulated = concordant gain-of-function
  - Risk allele + downregulated = concordant loss-of-function
  - Discordant = needs investigation
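The layer count and the 3+ layer rule can be sketched as follows. This assumes each layer's findings have been reduced to a list of gene symbols; the names `layer_concordance` and `direction_label`, the layer keys, and the placeholder gene labels are hypothetical, and the direction labels simply mirror the rules listed above.

```python
LAYERS = ("genomics", "transcriptomics", "proteomics", "pathways", "therapeutics")


def layer_concordance(layer_genes, hub_min_layers=3):
    """Return ({gene: [layers it appears in]}, [multi-omics hub genes])."""
    gene_layers = {}
    for layer in LAYERS:
        for gene in set(layer_genes.get(layer, ())):
            gene_layers.setdefault(gene, []).append(layer)
    hubs = sorted(g for g, layers in gene_layers.items()
                  if len(layers) >= hub_min_layers)
    return gene_layers, hubs


def direction_label(has_risk_allele, expression_change):
    """Label direction concordance; expression_change is 'up', 'down', or None."""
    if has_risk_allele and expression_change == "up":
        return "concordant gain-of-function"
    if has_risk_allele and expression_change == "down":
        return "concordant loss-of-function"
    return "needs investigation"


# Placeholder gene lists per layer (illustrative only)
layer_genes = {
    "genomics": ["G1", "G2"],
    "transcriptomics": ["G1", "G3"],
    "proteomics": ["G1", "G2"],
    "pathways": ["G2", "G3"],
    "therapeutics": ["G1"],
}
gene_layers, hubs = layer_concordance(layer_genes)
print(hubs, gene_layers["G1"])
```

The per-gene layer list doubles as the row content for a cross-layer concordance table.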
Biomarker Identification
For each multi-omics hub gene, assess biomarker potential:
- Diagnostic: Gene expression distinguishes disease vs healthy
- Prognostic: Expression/variant predicts outcome (cancer prognostics from HPA)
- Predictive: Variant/expression predicts treatment response (pharmacogenomics)
- Evidence level: Number of supporting omics layers
Mechanistic Hypothesis Generation
From the integrated data:
- Identify the most supported biological processes (GO + pathways)
- Map causal chain: genetic variant -> gene expression -> protein function -> pathway disruption -> disease
- Identify intervention points (druggable nodes in the causal chain)
- Generate testable hypotheses
Confidence Score Calculation
Calculate the Multi-Omics Confidence Score (0-100) based on:
- Data availability across layers
- Cross-layer concordance
- Evidence quality
- Clinical validation
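The four components above do not come with weights in this section, so the following is only a sketch of how they might be combined. It assumes each component has already been scored on a 0 to 1 scale and uses equal weights of 25 points; the function name `multi_omics_confidence`, the scale, and the weights are all assumptions rather than the pipeline's own rubric.

```python
def multi_omics_confidence(data_availability, concordance, evidence_quality,
                           clinical_validation, weights=(25, 25, 25, 25)):
    """Combine four component scores in [0, 1] into a 0-100 score (assumed weighting)."""
    components = (data_availability, concordance, evidence_quality, clinical_validation)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("each component must be in [0, 1]")
    if sum(weights) != 100:
        raise ValueError("weights must sum to 100")
    return round(sum(w * c for w, c in zip(weights, components)), 1)


# Hypothetical component scores for illustration
score = multi_omics_confidence(0.8, 0.5, 0.75, 0.25)
print(score)
```

Keeping the weights as a parameter makes it easy to lower the contribution of a layer when data are missing, as in the rare-disease case.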
Phase 8: Report Finalization
Executive Summary
Write a 2-3 sentence synthesis covering:
- Disease mechanism in systems terms
- Key genes/pathways identified
- Therapeutic opportunities
Final Report Quality Checklist
Before presenting to user, verify:
- All 8 sections have content (or marked as "No data available")
- Every data point has a source citation
- Executive summary reflects key findings
- Multi-Omics Confidence Score calculated
- Top 20 genes ranked by multi-omics evidence
- Top 10 enriched pathways listed
- Biomarker candidates identified
- Cross-layer concordance table complete
- Therapeutic opportunities summarized
- Mechanistic hypotheses generated
- Data Availability Checklist complete
- Completeness Checklist complete
- References section lists all tools used
Tool Parameter Quick Reference
| Tool | Key Parameters | Notes |
|---|---|---|
| | | Primary disambiguation |
| | | Secondary disambiguation |
| | | Returns top 25 genes |
| | | Per-gene evidence |
| | | GWAS studies |
| | | GWAS Catalog |
| | | Rare variants |
| | | DEGs |
| | | Expression scores |
| | | Literature scores |
| | | Tissue expression |
| | | PPI partners |
| | | PPI network |
| | | Functional enrichment |
| | | Network significance |
| | | Experimental PPIs |
| | | Tissue PPI |
| | | Pathway/GO enrichment |
| | | Reactome enrichment |
| | | Protein-pathway mapping |
| | | KEGG pathway search |
| | | WikiPathways search |
| | | GO annotations |
| | | Detailed GO |
| | | Disease drugs |
| | | Druggability |
| | | Clinical trials |
| | | Literature |
| | | Gene lookup |
| | | Gene info |
| | | Similar diseases |
Response Format Notes (Verified)
OpenTargets Associated Targets
```json
{
  "data": {
    "disease": {
      "id": "MONDO_0004975",
      "name": "Alzheimer disease",
      "associatedTargets": {
        "count": 2456,
        "rows": [
          {
            "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
            "score": 0.87
          }
        ]
      }
    }
  }
}
```
GWAS Catalog Associations
```json
{
  "data": [
    {
      "association_id": 216440893,
      "p_value": 2e-09,
      "or_per_copy_num": 0.94,
      "or_value": "0.94",
      "efo_traits": [{"..."}],
      "risk_frequency": "NR"
    }
  ],
  "metadata": {"pagination": {"totalElements": 1061816}}
}
```
STRING Interactions
```json
{
  "status": "success",
  "data": [
    {
      "stringId_A": "9606.ENSP00000252486",
      "stringId_B": "9606.ENSP00000466775",
      "preferredName_A": "APOE",
      "preferredName_B": "APOC2",
      "score": 0.999
    }
  ]
}
```
Reactome Enrichment
```json
{
  "data": {
    "token": "...",
    "pathways_found": 154,
    "pathways": [
      {
        "pathway_id": "R-HSA-1251985",
        "name": "Nuclear signaling by ERBB4",
        "species": "Homo sapiens",
        "is_disease": false,
        "is_lowest_level": true,
        "entities_found": 3,
        "entities_total": 47,
        "entities_ratio": 0.00291,
        "p_value": 4.0e-06,
        "fdr": 0.00068,
        "reactions_found": 3,
        "reactions_total": 34
      }
    ]
  }
}
```
HPA RNA Expression
```json
{
  "status": "success",
  "data": {
    "gene_name": "APOE",
    "source_type": "tissue",
    "source_name": "brain",
    "expression_value": "2714.9",
    "expression_level": "very high",
    "expression_unit": "nTPM"
  }
}
```
Enrichr Results
```json
{
  "status": "success",
  "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}"
}
```
NOTE: The `data` field is a JSON string that needs parsing.
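Because the Enrichr `data` field arrives as a JSON string rather than an object, it has to be decoded before use. The sketch below uses only the standard `json` module; the helper name `parse_enrichr_data` is hypothetical, and the sample response reuses the placeholder values from the format note.

```python
import json


def parse_enrichr_data(response):
    """Decode the JSON string in response['data']; return {} if missing or invalid."""
    if response.get("status") != "success":
        return {}
    raw = response.get("data")
    if isinstance(raw, dict):
        return raw  # already decoded
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        return {}


# Sample response with placeholder content, matching the documented shape
response = {
    "status": "success",
    "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}",
}
parsed = parse_enrichr_data(response)
paths = parsed.get("connected_paths", {})
print(paths)
```

Returning an empty dictionary on failure lets the report mark the section as "No data available" instead of aborting.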
Common Use Patterns
1. Comprehensive Disease Profiling
User: "Characterize Alzheimer's disease across omics layers"
-> Run all 8 phases
-> Produce full multi-omics report
2. Therapeutic Target Discovery
User: "What are druggable targets for rheumatoid arthritis?"
-> Emphasize Phase 1 (genomics), Phase 6 (therapeutics), Phase 7 (integration)
-> Focus on tractability and clinical precedent
3. Biomarker Identification
User: "Find diagnostic biomarkers for pancreatic cancer"
-> Emphasize Phase 2 (transcriptomics), Phase 3 (proteomics), Phase 7 (biomarkers)
-> Focus on tissue-specific expression and diagnostic potential
4. Mechanism Elucidation
User: "What pathways are dysregulated in Crohn's disease?"
-> Emphasize Phase 4 (pathways), Phase 5 (GO), Phase 7 (mechanistic hypotheses)
-> Focus on pathway enrichment and cross-pathway connections
5. Drug Repurposing
User: "What existing drugs could be repurposed for ALS?"
-> Emphasize Phase 1 (genetics), Phase 6 (therapeutic landscape), Phase 7 (repurposing)
-> Focus on drugs targeting disease-associated genes
6. Systems Biology
User: "What are the hub genes and key pathways in type 2 diabetes?"
-> Emphasize Phase 3 (PPI network), Phase 4 (pathways), Phase 7 (network analysis)
-> Focus on hub genes and network modules
Edge Case Handling
Rare Diseases (limited data)
- Genomics layer may dominate (single gene)
- Limited GWAS data (monogenic)
- Focus on ClinVar variants, pathway consequences
- Confidence score will be lower (less cross-layer data)
Common Diseases (overwhelming data)
- Thousands of GWAS associations
- Prioritize by effect size and significance
- Focus on top 20-30 genes for downstream analysis
- Use strict significance thresholds (p < 5e-8)
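Filtering and ranking a large association set can be sketched as below. The field names `p_value` and `or_per_copy_num` follow the GWAS Catalog response format shown in this document; the function name `prioritize_associations`, the use of |ln(OR)| as the effect-size measure, and the toy records are assumptions.

```python
import math

GENOME_WIDE_P = 5e-8  # threshold stated in the text


def prioritize_associations(associations, p_threshold=GENOME_WIDE_P, top_n=30):
    """Keep genome-wide significant hits, ranked by effect size, then by p-value."""
    significant = [a for a in associations
                   if a.get("p_value") is not None and a["p_value"] < p_threshold]

    def effect(assoc):
        odds = assoc.get("or_per_copy_num")
        # |ln(OR)| treats protective and risk effects symmetrically (an assumption)
        return abs(math.log(odds)) if odds and odds > 0 else 0.0

    significant.sort(key=lambda a: (-effect(a), a["p_value"]))
    return significant[:top_n]


# Toy records (illustrative only, not real GWAS Catalog entries)
associations = [
    {"association_id": 1, "p_value": 2e-9, "or_per_copy_num": 0.94},
    {"association_id": 2, "p_value": 1e-6, "or_per_copy_num": 1.50},
    {"association_id": 3, "p_value": 4e-12, "or_per_copy_num": 1.20},
    {"association_id": 4, "p_value": 3e-8, "or_per_copy_num": None},
]
ranked_ids = [a["association_id"] for a in prioritize_associations(associations)]
print(ranked_ids)
```

Records without an odds ratio are kept but ranked last, so significant hits reported only with beta coefficients are not silently dropped.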
Cancer
- Include somatic mutations (if CIViC/cBioPortal available)
- Check cancer prognostics via HPA
- Include tumor-specific expression patterns
- Clinical trial landscape may be extensive
Monogenic Diseases
- Single gene dominates
- ClinVar/OMIM evidence is primary
- Pathway analysis reveals downstream effects
- Therapeutic landscape may be limited (gene therapy, enzyme replacement)
Polygenic Diseases
- Many weak genetic signals
- GWAS provides the gene list
- Pathway enrichment reveals convergent biology
- Network analysis identifies hub genes
Tissue Ambiguity
- Diseases affecting multiple tissues
- Query HPA for all relevant tissues
- Compare tissue-specific expression patterns
- Use tissue context from disease ontology
Fallback Strategies
If disease name not found
- Try synonyms
- Try broader disease category
- Try OMIM/UMLS ID mapping
- Report disambiguation failure and ask user
If no GWAS data
- Check ClinVar for rare variants
- Use OpenTargets genetic evidence
- Note in report as "Limited genetic data"
- Adjust confidence score accordingly
If no expression data
- Try different disease name/synonym
- Check HPA for individual gene expression
- Use OpenTargets expression evidence
- Note as "Limited transcriptomics data"
If no pathway enrichment
- Reduce gene list stringency
- Try different pathway databases
- Map individual genes to pathways via Reactome
- Note as "No significant pathway enrichment"
If no drugs found
- Check if disease is rare/orphan
- Look for drugs targeting individual genes
- Check clinical trials for investigational therapies
- Note as "No approved drugs - novel therapeutic opportunity"