tooluniverse-multiomic-disease-characterization

Multi-Omics Disease Characterization Pipeline

Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.
KEY PRINCIPLES:
  1. Report-first approach - Create report file FIRST, then populate progressively
  2. Disease disambiguation FIRST - Resolve all identifiers before omics analysis
  3. Layer-by-layer analysis - Systematically cover all omics layers
  4. Cross-layer integration - Identify genes/targets appearing in multiple layers
  5. Evidence grading - Grade all evidence from T1 (direct human/clinical evidence) to T4 (annotation/prediction only)
  6. Tissue context - Emphasize disease-relevant tissues/organs
  7. Quantitative scoring - Multi-Omics Confidence Score (0-100)
  8. Druggable focus - Prioritize targets with therapeutic potential
  9. Biomarker identification - Highlight diagnostic/prognostic markers
  10. Mechanistic synthesis - Generate testable hypotheses
  11. Source references - Every statement must cite tool/database
  12. Completeness checklist - Mandatory section showing analysis coverage
  13. English-first queries - Always use English terms in tool calls. Respond in user's language

When to Use This Skill

Apply when users:
  • Ask about disease mechanisms across omics layers
  • Need multi-omics characterization of a disease
  • Want to understand disease at the systems biology level
  • Ask "What pathways/genes/proteins are involved in [disease]?"
  • Need biomarker discovery for a disease
  • Want to identify druggable targets from disease profiling
  • Ask for integrated genomics + transcriptomics + proteomics analysis
  • Need cross-layer concordance analysis
  • Ask about disease network biology / hub genes
NOT for (use other skills instead):
  • Single gene/target validation -> Use tooluniverse-drug-target-validation
  • Drug safety profiling -> Use tooluniverse-adverse-event-detection
  • General disease overview -> Use tooluniverse-disease-research
  • Variant interpretation -> Use tooluniverse-variant-interpretation
  • GWAS-specific analysis -> Use tooluniverse-gwas-* skills
  • Pathway-only analysis -> Use tooluniverse-systems-biology

Input Parameters

| Parameter | Required | Description | Example |
|---|---|---|---|
| disease | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | Alzheimer disease, MONDO_0004975 |
| tissue | No | Tissue/organ of interest | brain, liver, blood |
| focus_layers | No | Specific omics layers to emphasize | genomics, transcriptomics, pathways |

Multi-Omics Confidence Score (0-100)

Score Components

Data Availability (0-40 points):
  • Genomics data available (GWAS or rare variants): 10 points
  • Transcriptomics data available (DEGs or expression): 10 points
  • Protein data available (PPI or expression): 5 points
  • Pathway data available (enriched pathways): 10 points
  • Clinical/drug data available (approved drugs or trials): 5 points
Evidence Concordance (0-40 points):
  • Multi-layer genes (appear in 3+ layers): up to 20 points (2 per gene, max 10 genes)
  • Consistent direction (genetics + expression concordant): 10 points
  • Pathway-gene concordance (genes found in enriched pathways): 10 points
Evidence Quality (0-20 points):
  • Strong genetic evidence (GWAS p < 5e-8): 10 points
  • Clinical validation (approved drugs): 10 points
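
The point rules above can be sketched as a small function. This is a minimal illustration rather than part of the skill: the function name, the argument names, and the choice of boolean/count inputs are assumptions made for the example; only the point values come from the lists above.

```python
def multiomics_confidence_score(
    has_genomics: bool,
    has_transcriptomics: bool,
    has_protein: bool,
    has_pathways: bool,
    has_clinical: bool,
    n_multilayer_genes: int,
    direction_concordant: bool,
    pathway_gene_concordant: bool,
    has_gwas_significant_hit: bool,
    has_approved_drug: bool,
) -> dict:
    """Return the three component subtotals and the 0-100 total."""
    # Data Availability (0-40 points)
    availability = (
        10 * has_genomics
        + 10 * has_transcriptomics
        + 5 * has_protein
        + 10 * has_pathways
        + 5 * has_clinical
    )
    # Evidence Concordance (0-40 points):
    # 2 points per gene seen in 3+ layers, capped at 10 genes (20 points).
    concordance = (
        2 * min(max(n_multilayer_genes, 0), 10)
        + 10 * direction_concordant
        + 10 * pathway_gene_concordant
    )
    # Evidence Quality (0-20 points)
    quality = 10 * has_gwas_significant_hit + 10 * has_approved_drug
    return {
        "data_availability": availability,
        "evidence_concordance": concordance,
        "evidence_quality": quality,
        "total": availability + concordance + quality,
    }
```

For instance, a disease with data in all five categories, four multi-layer genes, concordant direction, no pathway-gene concordance, a genome-wide significant hit, and an approved drug would score 40 + 18 + 20 = 78.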

Score Interpretation

| Score | Tier | Interpretation |
|---|---|---|
| 80-100 | Excellent | Comprehensive multi-omics coverage, high confidence, strong cross-layer concordance |
| 60-79 | Good | Good coverage across most layers, some gaps |
| 40-59 | Moderate | Moderate coverage, limited cross-layer integration |
| 0-39 | Limited | Limited data, single-layer analysis dominates |
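
The tier boundaries in this table map directly onto a lookup. The sketch below is illustrative only; the function name `score_tier` is an assumption, and rejecting out-of-range scores is a design choice of the example rather than something the skill specifies.

```python
def score_tier(score: int) -> str:
    """Map a 0-100 Multi-Omics Confidence Score to its tier label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Limited"
```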

Evidence Grading System

| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Direct human evidence, clinical proof | FDA-approved drug, GWAS hit (p<5e-8), clinical trial result |
| T2 | [T2] | Experimental evidence | Differential expression (validated), functional screen, mouse KO |
| T3 | [T3] | Computational/database evidence | PPI network, pathway mapping, expression correlation |
| T4 | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |

Report Template

Create this file structure at the start:
{disease_name}_multiomic_report.md

Multi-Omics Disease Characterization: {Disease Name}

Report Generated: {date}
Disease Identifiers: (to be filled)
Multi-Omics Confidence Score: (to be calculated)

Executive Summary

(2-3 sentence disease mechanism synthesis - fill after all layers complete)

1. Disease Definition & Context

Disease Identifiers

| System | ID | Source |
|---|---|---|

Description

Synonyms

Disease Hierarchy (parents/children)

Affected Tissues/Organs

Therapeutic Areas

Sources: (tools used)

2. Genomics Layer

2.1 GWAS Associations

| SNP | P-value | Effect | Gene | Study | Source |
|---|---|---|---|---|---|

2.2 GWAS Studies Summary

| Study ID | Trait | Sample Size | Year | Source |
|---|---|---|---|---|

2.3 Associated Genes (Genetic Evidence)

| Gene | Ensembl ID | Association Score | Evidence Type | Source |
|---|---|---|---|---|

2.4 Rare Variants (ClinVar)

| Variant | Gene | Clinical Significance | Source |
|---|---|---|---|

Genomics Layer Summary

  • Total GWAS hits:
  • Top genes by genetic evidence:
  • Genetic architecture:
Sources: (tools used)

3. Transcriptomics Layer

3.1 Differential Expression Studies

| Experiment | Condition | Up-regulated | Down-regulated | Source |
|---|---|---|---|---|

3.2 Expression Atlas Disease Evidence

| Gene | Score | Source |
|---|---|---|

3.3 Tissue Expression Patterns (GTEx/HPA)

| Gene | Tissue | Expression Level | Source |
|---|---|---|---|

3.4 Biomarker Candidates (Expression-Based)

| Gene | Tissue Specificity | Fold Change | Evidence | Source |
|---|---|---|---|---|

Transcriptomics Layer Summary

  • Differential expression datasets:
  • Top DEGs:
  • Tissue-specific patterns:
Sources: (tools used)

4. Proteomics & Interaction Layer

4.1 Protein-Protein Interactions (STRING)

| Protein A | Protein B | Score | Source |
|---|---|---|---|

4.2 Hub Genes (Network Centrality)

| Gene | Degree | Betweenness | Role | Source |
|---|---|---|---|---|

4.3 Protein Complexes (IntAct)

| Complex | Members | Function | Source |
|---|---|---|---|

4.4 Tissue-Specific PPI Network

| Gene | Interaction Score | Tissue | Source |
|---|---|---|---|

Proteomics Layer Summary

  • Total PPIs:
  • Hub genes:
  • Network modules:
Sources: (tools used)

5. Pathway & Network Layer

5.1 Enriched Pathways (Enrichr/Reactome)

| Pathway | Database | P-value | Genes | Source |
|---|---|---|---|---|

5.2 Reactome Pathway Details

| Pathway ID | Name | Genes Involved | Source |
|---|---|---|---|

5.3 KEGG Pathways

| Pathway ID | Name | Description | Source |
|---|---|---|---|

5.4 WikiPathways

| Pathway ID | Name | Organism | Source |
|---|---|---|---|

Pathway Layer Summary

  • Top enriched pathways:
  • Key pathway nodes:
  • Cross-pathway connections:
Sources: (tools used)

6. Gene Ontology & Functional Annotation

6.1 Biological Processes

| GO Term | Name | P-value | Genes | Source |
|---|---|---|---|---|

6.2 Molecular Functions

| GO Term | Name | P-value | Genes | Source |
|---|---|---|---|---|

6.3 Cellular Components

| GO Term | Name | P-value | Genes | Source |
|---|---|---|---|---|

Sources: (tools used)

7. Therapeutic Landscape

7.1 Approved Drugs

| Drug | ChEMBL ID | Mechanism | Target | Phase | Source |
|---|---|---|---|---|---|

7.2 Druggable Targets

| Gene | Tractability | Modality | Clinical Precedent | Source |
|---|---|---|---|---|

7.3 Drug Repurposing Candidates

| Drug | Original Indication | Mechanism | Target | Source |
|---|---|---|---|---|

7.4 Clinical Trials

| NCT ID | Title | Phase | Status | Intervention | Source |
|---|---|---|---|---|---|

Therapeutic Summary

  • Approved drugs:
  • Clinical pipeline:
  • Novel targets:
Sources: (tools used)

8. Multi-Omics Integration

8.1 Cross-Layer Gene Concordance

| Gene | Genomics | Transcriptomics | Proteomics | Pathways | Layers | Evidence Tier |
|---|---|---|---|---|---|---|

8.2 Multi-Omics Hub Genes (Top 20)

| Rank | Gene | Layers Found | Key Evidence | Druggable | Source |
|---|---|---|---|---|---|

8.3 Biomarker Candidates

| Biomarker | Type | Evidence Layers | Confidence | Source |
|---|---|---|---|---|

8.4 Mechanistic Hypotheses

  1. (Hypothesis with supporting evidence from multiple layers)
  2. ...

8.5 Systems-Level Insights

  • Key disrupted processes:
  • Critical pathway nodes:
  • Therapeutic intervention points:
  • Testable hypotheses:

Multi-Omics Confidence Score

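| Component | Points | Max | Details |
|---|---|---|---|
| Genomics data | | 10 | |
| Transcriptomics data | | 10 | |
| Protein data | | 5 | |
| Pathway data | | 10 | |
| Clinical data | | 5 | |
| Multi-layer genes | | 20 | |
| Direction concordance | | 10 | |
| Pathway-gene concordance | | 10 | |
| Genetic evidence quality | | 10 | |
| Clinical validation | | 10 | |
| TOTAL | | 100 | |

Score: XX/100 - [Tier]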

Data Availability Checklist

| Omics Layer | Data Available | Tools Used | Findings |
|---|---|---|---|
| Genomics (GWAS) | Yes/No | | |
| Genomics (Rare Variants) | Yes/No | | |
| Transcriptomics (DEGs) | Yes/No | | |
| Transcriptomics (Expression) | Yes/No | | |
| Proteomics (PPI) | Yes/No | | |
| Proteomics (Expression) | Yes/No | | |
| Pathways (Enrichment) | Yes/No | | |
| Pathways (KEGG/Reactome) | Yes/No | | |
| Gene Ontology | Yes/No | | |
| Drugs/Therapeutics | Yes/No | | |
| Clinical Trials | Yes/No | | |
| Literature | Yes/No | | |

Completeness Checklist

  • Disease disambiguation complete (IDs resolved)
  • Genomics layer analyzed (GWAS + variants)
  • Transcriptomics layer analyzed (DEGs + expression)
  • Proteomics layer analyzed (PPI + interactions)
  • Pathway layer analyzed (enrichment + mapping)
  • Gene Ontology analyzed (BP + MF + CC)
  • Therapeutic landscape analyzed (drugs + targets + trials)
  • Cross-layer integration complete (concordance analysis)
  • Multi-Omics Confidence Score calculated
  • Biomarker candidates identified
  • Hub genes identified
  • Mechanistic hypotheses generated
  • Executive summary written
  • All sections have source citations

References

Data Sources Used

| # | Tool | Parameters | Section | Items Retrieved |
|---|---|---|---|---|

Database Versions

  • OpenTargets: (current)
  • GWAS Catalog: (current)
  • STRING: (current)
  • Reactome: (current)
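
The report-first principle (create the report file before running any analysis, then fill it in layer by layer) can be sketched as below. This is a hedged illustration: the function name `create_report_skeleton`, the `SECTION_HEADINGS` list, the "(to be filled)" placeholder under each heading, and the replacement of spaces with underscores in the file name are all assumptions of the example. Only the `{disease_name}_multiomic_report.md` naming pattern and the top-level section titles come from the template above.

```python
from datetime import date
from pathlib import Path

# Top-level section titles taken from the report template.
SECTION_HEADINGS = [
    "Executive Summary",
    "1. Disease Definition & Context",
    "2. Genomics Layer",
    "3. Transcriptomics Layer",
    "4. Proteomics & Interaction Layer",
    "5. Pathway & Network Layer",
    "6. Gene Ontology & Functional Annotation",
    "7. Therapeutic Landscape",
    "8. Multi-Omics Integration",
    "Multi-Omics Confidence Score",
    "Data Availability Checklist",
    "Completeness Checklist",
    "References",
]


def create_report_skeleton(disease_name: str, out_dir: str = ".") -> Path:
    """Write an empty report with placeholders and return its path."""
    # Assumption: spaces become underscores so the file name is shell-friendly.
    safe_name = disease_name.strip().replace(" ", "_")
    path = Path(out_dir) / f"{safe_name}_multiomic_report.md"
    lines = [
        f"Multi-Omics Disease Characterization: {disease_name}",
        "",
        f"Report Generated: {date.today().isoformat()}",
        "Disease Identifiers: (to be filled)",
        "Multi-Omics Confidence Score: (to be calculated)",
        "",
    ]
    for heading in SECTION_HEADINGS:
        lines += [heading, "", "(to be filled)", ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Each phase can then replace its own "(to be filled)" placeholder as results arrive, so a partially completed run still leaves a readable report.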

---

Phase 0: Disease Disambiguation (ALWAYS FIRST)

Objective: Resolve disease to standard identifiers for all downstream queries.

Tools Used

OpenTargets_get_disease_id_description_by_name (primary):
  • Input: diseaseName (string) - Disease name
  • Output: {data: {search: {hits: [{id, name, description}]}}}
  • Use: Get MONDO/EFO IDs and description
  • CRITICAL: Disease IDs from OpenTargets use underscore format (e.g., MONDO_0004975), NOT colon format
OSL_get_efo_id_by_disease_name (secondary):
  • Input: disease (string) - Disease name
  • Output: {efo_id, name}
  • Use: Get EFO/MONDO ID
OpenTargets_get_disease_description_by_efoId:
  • Input: efoId (string) - Disease ID (e.g., MONDO_0004975)
  • Output: {data: {disease: {id, name, description, dbXRefs}}}
  • Use: Get full description, cross-references (OMIM, UMLS, DOID, etc.)
OpenTargets_get_disease_synonyms_by_efoId:
  • Input: efoId (string)
  • Output: {data: {disease: {id, name, synonyms: [{relation, terms}]}}}
OpenTargets_get_disease_therapeutic_areas_by_efoId:
  • Input: efoId (string)
  • Output: {data: {disease: {id, name, therapeuticAreas: [{id, name}]}}}
OpenTargets_get_disease_ancestors_parents_by_efoId:
  • Input: efoId (string)
  • Output: {data: {disease: {id, name, ancestors: [{id, name}]}}}
OpenTargets_get_disease_descendants_children_by_efoId:
  • Input: efoId (string)
  • Output: {data: {disease: {id, name, descendants: [{id, name}]}}}
OpenTargets_map_any_disease_id_to_all_other_ids:
  • Input: inputId (string) - Any known disease ID (e.g., OMIM:104300, UMLS:C0002395)
  • Output: {data: {disease: {id, name, dbXRefs: [str], ...}}}
  • Use: Cross-map between OMIM, UMLS, ICD10, DOID, etc.
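
Because the efoId parameters expect the underscore format while users often paste colon-style IDs, a small normalizer can guard the downstream calls. The helper below is a sketch under assumptions: the name `to_opentargets_id` is hypothetical, and limiting the conversion to the MONDO and EFO prefixes is a choice made for the example, so that cross-reference IDs such as OMIM:104300 (which the mapping tool accepts in colon form) are left untouched.

```python
# Assumption: only the ontology prefixes named in this document are converted.
OPENTARGETS_PREFIXES = ("MONDO", "EFO")


def to_opentargets_id(disease_id: str) -> str:
    """Convert 'MONDO:0004975' to 'MONDO_0004975'; leave other IDs as they are."""
    cleaned = disease_id.strip()
    prefix, sep, local = cleaned.partition(":")
    if sep and prefix.upper() in OPENTARGETS_PREFIXES:
        return f"{prefix.upper()}_{local}"
    return cleaned
```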

Workflow

  1. Search by disease name to get primary ID (OpenTargets)
  2. Get full description and cross-references
  3. Get synonyms for search term expansion
  4. Get therapeutic areas for context
  5. Get disease hierarchy (parents/children)
  6. If user provided OMIM/other ID, map to MONDO/EFO first

Collision-Aware Search

When disease name returns multiple hits:
  • Check if user's input matches any hit exactly
  • If ambiguous, present top 3-5 options and ask user to select
  • Always prefer the most specific disease (not parent categories)
  • For cancer, prefer the specific tumor type over generic "cancer"
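
The first two rules above (exact match wins; otherwise offer a short list) can be sketched as a function over the `hits` list returned by the name search. The function name `pick_disease_hit` and its return convention are assumptions of this example. Preferring the most specific disease needs hierarchy information and is left to the caller here.

```python
def pick_disease_hit(query: str, hits: list, max_options: int = 5):
    """Return (chosen_hit, options).

    chosen_hit is None when the match is ambiguous; options then holds the
    top hits to present to the user.
    """
    wanted = query.strip().lower()
    exact = [h for h in hits if h.get("name", "").strip().lower() == wanted]
    if len(exact) == 1:
        return exact[0], []
    if len(hits) == 1:
        return hits[0], []
    # Ambiguous (or no hits): hand back the top few so the caller can ask.
    return None, hits[:max_options]
```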

Key Disease IDs to Track

After disambiguation, store these for all downstream queries:
  • efo_id - Primary ID for OpenTargets queries (e.g., MONDO_0004975)
  • disease_name - Canonical name (e.g., Alzheimer disease)
  • synonyms - For literature search expansion
  • therapeutic_areas - For context
  • dbXRefs - Cross-references (OMIM, UMLS, DOID, etc.)
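
One way to carry these values through the later phases is a small record type. The sketch below is an assumption about structure, not something the skill prescribes: the class name `DiseaseContext`, the snake_case field `db_xrefs`, and the `search_terms` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DiseaseContext:
    efo_id: str                    # e.g. "MONDO_0004975"
    disease_name: str              # canonical name
    synonyms: list = field(default_factory=list)
    therapeutic_areas: list = field(default_factory=list)
    db_xrefs: list = field(default_factory=list)

    def search_terms(self) -> list:
        """Canonical name first, then synonyms, case-insensitively de-duplicated."""
        seen, terms = set(), []
        for term in [self.disease_name, *self.synonyms]:
            key = term.lower()
            if key not in seen:
                seen.add(key)
                terms.append(term)
        return terms
```

The `search_terms` list is convenient for name-based tools that may return nothing on an inexact trait name and need a retry with synonyms.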

Phase 1: Genomics Layer

Objective: Identify genetic variants, GWAS associations, and genetically implicated genes.

Tools Used

OpenTargets_get_associated_targets_by_disease_efoId (primary):
  • Input: efoId (string) - Disease EFO/MONDO ID
  • Output: {data: {disease: {id, name, associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}
  • Use: Get ALL disease-associated genes ranked by overall evidence score
  • NOTE: Returns top 25 by default. For comprehensive analysis, note the total count
OpenTargets_get_evidence_by_datasource:
  • Input: efoId (string), ensemblId (string), optional datasourceIds (array), size (int, default 50)
  • Output: {data: {disease: {evidences: {count, rows: [{...evidence details}]}}}}
  • Use: Get specific evidence types. Key datasourceIds for genomics:
    • ['ot_genetics_portal'] - GWAS/genetics
    • ['gene2phenotype', 'genomics_england', 'orphanet'] - Rare variants
    • ['eva'] - ClinVar variants
gwas_search_associations (GWAS Catalog):
  • Input: disease_trait (string), size (int, default 20)
  • Output: {data: [{association_id, p_value, or_per_copy_num, or_value, beta, risk_frequency, efo_traits: [{...}], ...}], metadata: {pagination: {totalElements}}}
  • Use: Get genome-wide significant associations
  • NOTE: Use disease name (e.g., "Alzheimer"), not ID. Returns paginated results
gwas_get_studies_for_trait:
  • Input: disease_trait (string), size (int)
  • Output: {data: [...studies], metadata: {pagination}}
  • NOTE: May return empty if trait name does not match exactly. Try synonyms
gwas_get_variants_for_trait:
  • Input: disease_trait (string), size (int)
  • Output: {data: [...variants], metadata: {pagination}}
GWAS_search_associations_by_gene:
  • Input: gene_name (string)
  • Output: Associations for a specific gene
OpenTargets_search_gwas_studies_by_disease:
  • Input: diseaseIds (array of strings), enableIndirect (bool, default true), size (int, default 10)
  • Output: {data: {studies: {count, rows: [{id, studyType, traitFromSource, publicationFirstAuthor, publicationDate, pubmedId, nSamples, nCases, nControls, ...}]}}}
  • Use: Get GWAS studies from OpenTargets genetics portal
clinvar_search_variants:
  • Input: condition (string) or gene (string), optional max_results (int)
  • Output: List of ClinVar variants with clinical significance
  • Use: Rare variant / monogenic disease evidence

Workflow

  1. Get associated genes from OpenTargets (overall scores)
  2. For top 10-15 genes, get genetic evidence specifically via OpenTargets_get_evidence_by_datasource
  3. Search GWAS Catalog for associations
  4. Search OpenTargets GWAS studies
  5. Search ClinVar for rare variants
  6. For top GWAS genes, check GWAS_search_associations_by_gene
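
For step 3, the associations that count as strong genetic evidence are those below the genome-wide significance threshold used elsewhere in this document (p < 5e-8). The sketch below filters a response shaped like the gwas_search_associations output described above. It does not call the tool itself; the function name and the handling of missing or string-typed p-values are assumptions of the example.

```python
GENOME_WIDE_SIGNIFICANCE = 5e-8


def significant_associations(response: dict, threshold: float = GENOME_WIDE_SIGNIFICANCE) -> list:
    """Return associations with p_value < threshold, most significant first."""
    kept = []
    for assoc in response.get("data", []):
        try:
            p_value = float(assoc["p_value"])  # tolerate numeric strings
        except (KeyError, TypeError, ValueError):
            continue  # skip records without a usable p-value
        if p_value < threshold:
            kept.append(assoc)
    return sorted(kept, key=lambda a: float(a["p_value"]))
```

Since the tool returns paginated results, the filter would need to be applied to each page before combining.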

Gene Tracking

Maintain a dictionary of genes found in genomics layer:
```python
genomics_genes = {
    'PSEN1': {'score': 0.87, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000080815', 'layer': 'genomics'},
    'APP': {'score': 0.82, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000142192', 'layer': 'genomics'},
    # ...
}
```

Phase 2: Transcriptomics Layer

Objective: Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.

Tools Used

ExpressionAtlas_search_differential:
  • Input: optional gene (string), condition (string), species (string, default 'homo sapiens')
  • Output: Differential expression studies and results
  • Use: Find studies where genes are differentially expressed in disease
ExpressionAtlas_search_experiments:
  • Input: optional gene (string), condition (string), species (string)
  • Output: Expression experiments relevant to condition
  • Use: Find all Expression Atlas experiments for the disease
expression_atlas_disease_target_score:
  • Input: efoId (string), pageSize (int, required)
  • Output: Genes scored by expression evidence for the disease
  • Use: Get expression-based disease-gene association scores
europepmc_disease_target_score:
  • Input: efoId (string), pageSize (int, required)
  • Output: Genes scored by literature evidence for the disease
  • Use: Complement expression evidence with literature-mined associations
HPA_get_rna_expression_by_source (Human Protein Atlas):
  • Input: gene_name (string), source_type (string, e.g., 'tissue', 'blood', 'brain'), source_name (string, e.g., 'brain', 'liver')
  • Output: {status, data: {gene_name, source_type, source_name, expression_value, expression_level, expression_unit}}
  • NOTE: ALL 3 params required. source_type options: 'tissue', 'blood', 'brain', 'cell_line', 'single_cell'
HPA_get_rna_expression_in_specific_tissues:
  • Input: gene_name (string), tissues (array of strings)
  • Output: Expression across specified tissues
HPA_get_cancer_prognostics_by_gene:
  • Input: gene_name (string)
  • Output: Cancer prognostic data (if cancer context)
HPA_get_subcellular_location:
  • Input: gene_name (string)
  • Output: Subcellular localization data
HPA_search_genes_by_query:
  • Input: query (string)
  • Output: Matching genes in HPA

Workflow

  1. Search Expression Atlas for differential expression studies
  2. Get expression-based disease scores
  3. Get literature-based disease scores (EuropePMC)
  4. For top 10-15 genes from genomics layer, check tissue expression via HPA
  5. Check disease-relevant tissue expression patterns
  6. For cancer: check prognostic biomarkers

Gene Tracking

Add transcriptomics genes to tracking:
```python
transcriptomics_genes = {
    'APOE': {'expression_score': 0.75, 'tissues': ['brain'], 'evidence': 'differential_expression', 'layer': 'transcriptomics'},
    # ...
}
```
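
Per-layer dictionaries like these feed the cross-layer integration step, where genes appearing in several layers are identified. A minimal sketch of that merge follows. The function name `cross_layer_genes` and its input shape (layer name mapped to a gene dictionary) are assumptions; the default of three layers mirrors the "appear in 3+ layers" rule in the scoring section.

```python
from collections import defaultdict


def cross_layer_genes(layer_dicts: dict, min_layers: int = 3) -> dict:
    """Map each gene to the sorted list of layers it appears in.

    Only genes found in at least min_layers layers are returned.
    """
    layers_by_gene = defaultdict(set)
    for layer_name, genes in layer_dicts.items():
        for symbol in genes:
            layers_by_gene[symbol].add(layer_name)
    return {
        gene: sorted(layers)
        for gene, layers in layers_by_gene.items()
        if len(layers) >= min_layers
    }
```

The length of the returned dictionary (capped at 10) is the gene count that the "Multi-layer genes" score component multiplies by 2.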

Phase 3: Proteomics & Interaction Layer

Objective: Map protein-protein interactions, identify hub genes, and characterize interaction networks.

Tools Used

STRING_get_interaction_partners (primary PPI):
  • Input: protein_ids (array of strings - gene names work), species (int, default 9606), confidence_score (float, default 0.4), limit (int, default 20)
  • Output: {status: 'success', data: [{stringId_A, stringId_B, preferredName_A, preferredName_B, ncbiTaxonId, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}
  • Use: Get interaction partners for disease genes
  • NOTE: protein_ids is an array, NOT string. Gene symbols like ['APOE'] work
STRING_get_network:
  • Input: protein_ids (array), species (int), confidence_score (float)
  • Output: Network of interactions between input proteins
  • Use: Build disease-specific PPI network
STRING_functional_enrichment:
  • Input: protein_ids (array), species (int)
  • Output: Functional enrichment results (GO, KEGG, etc.)
  • Use: Functional characterization of disease gene set
STRING_ppi_enrichment:
  • Input: protein_ids (array), species (int)
  • Output: Statistical test for PPI enrichment (more interactions than expected)
  • Use: Test if disease genes form a connected module
intact_get_interactions:
  • Input: identifier (string - UniProt ID or gene name)
  • Output: Molecular interaction data from IntAct
intact_search_interactions:
  • Input: query (string), first (int, default 0), max (int, default 25)
  • Output: Search results for interactions
HPA_get_protein_interactions_by_gene:
  • Input: gene_name (string)
  • Output: {gene, interactions, interactor_count, interactors: [...]}
humanbase_ppi_analysis:
  • Input: gene_list (array), tissue (string), max_node (int), interaction (string), string_mode (bool)
  • Output: Tissue-specific PPI network
  • NOTE: ALL params required. interaction options: 'coexpression', 'interaction', 'coexpression_and_interaction'. string_mode: true/false

Workflow

  1. Take top 15-20 genes from genomics + transcriptomics layers
  2. Query STRING for interaction partners of each gene
  3. Build composite PPI network using STRING_get_network
  4. Test PPI enrichment (are genes more connected than random?)
  5. Get functional enrichment from STRING
  6. For disease-relevant tissue, get tissue-specific network (HumanBase)
  7. Identify hub genes (highest degree centrality)
  8. Check IntAct for experimentally validated interactions

Hub Gene Analysis

Calculate network centrality metrics:
  • Degree: Number of interaction partners
  • Betweenness: Number of shortest paths through node
  • Hub score: Genes with degree > mean + 1 SD are hubs
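The degree and hub-score rules above can be sketched in a few lines. This is a minimal illustration, assuming edge records shaped like the STRING response (preferredName_A, preferredName_B). The function name find_hub_genes, the placeholder gene names, and the choice of the population standard deviation are assumptions of this sketch, and betweenness is left out.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_hub_genes(edges):
    """Return (degree, hubs) for a list of STRING-style interaction records."""
    neighbors = defaultdict(set)
    for edge in edges:
        a, b = edge["preferredName_A"], edge["preferredName_B"]
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {gene: len(partners) for gene, partners in neighbors.items()}
    if not degree:
        return {}, []
    # Hub rule from the text: degree > mean + 1 SD (population SD assumed here)
    cutoff = mean(degree.values()) + pstdev(degree.values())
    hubs = sorted((g for g, d in degree.items() if d > cutoff),
                  key=lambda g: (-degree[g], g))
    return degree, hubs


# Placeholder edges for illustration only (not real STRING output)
edges = [
    {"preferredName_A": "GENE_HUB", "preferredName_B": "GENE_A", "score": 0.9},
    {"preferredName_A": "GENE_HUB", "preferredName_B": "GENE_B", "score": 0.8},
    {"preferredName_A": "GENE_HUB", "preferredName_B": "GENE_C", "score": 0.7},
    {"preferredName_A": "GENE_HUB", "preferredName_B": "GENE_D", "score": 0.7},
]
degree, hubs = find_hub_genes(edges)
```

With very small networks the mean + 1 SD cutoff can select no hubs at all, so it may be worth reporting the degree table alongside the hub list.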

Phase 4: Pathway & Network Layer

Objective: Identify enriched biological pathways and cross-pathway connections.

Tools Used

enrichr_gene_enrichment_analysis (primary enrichment):
  • Input: gene_list (array of gene symbols, min 2), libs (array of library names)
  • Output: {status: 'success', data: '{...JSON string with enrichment results...}'}
  • Key libraries: ['KEGG_2021_Human'], ['Reactome_2022'], ['WikiPathway_2023_Human'], ['GO_Biological_Process_2023'], ['GO_Molecular_Function_2023'], ['GO_Cellular_Component_2023']
  • NOTE: data field is a JSON string, needs parsing. Contains connected_paths and per-library results
  • NOTE: libs is REQUIRED as array
ReactomeAnalysis_pathway_enrichment:
  • Input: identifiers (string - space-separated gene list), optional page_size (int, default 20), include_disease (bool), projection (bool)
  • Output: {data: {token, analysis_type, pathways_found, pathways: [{pathway_id, name, species, is_disease, is_lowest_level, entities_found, entities_total, entities_ratio, p_value, fdr, reactions_found, reactions_total}]}}
  • Use: Reactome-specific pathway enrichment with statistical testing
Reactome_map_uniprot_to_pathways:
  • Input: id (string - UniProt accession)
  • Output: List of Reactome pathways containing this protein
  • Use: Map individual proteins to pathways
Reactome_get_pathway:
  • Input: stId (string - Reactome stable ID, e.g., 'R-HSA-73817')
  • Output: Pathway details
Reactome_get_pathway_reactions:
  • Input: stId (string)
  • Output: Reactions within pathway
kegg_search_pathway:
  • Input: keyword (string)
  • Output: Array of KEGG pathway matches
kegg_get_pathway_info:
  • Input: pathway_id (string, e.g., 'hsa04930')
  • Output: Detailed pathway information
WikiPathways_search:
  • Input: query (string), optional organism (string, e.g., 'Homo sapiens')
  • Output: Matching community-curated pathways

Workflow

  1. Collect all genes from genomics + transcriptomics layers (top 20-30)
  2. Run Enrichr enrichment for KEGG, Reactome, WikiPathways
  3. Run ReactomeAnalysis for more detailed Reactome enrichment with p-values
  4. Search KEGG for disease-specific pathways
  5. Search WikiPathways for disease pathways
  6. For top Reactome pathways, get detailed reactions
  7. Identify cross-pathway connections (genes in multiple pathways)
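Step 7 above (genes in multiple pathways) amounts to inverting a pathway-to-genes mapping. The sketch below assumes the enrichment results have already been reduced to such a mapping; the function name, the min_pathways parameter, and the placeholder pathway and gene names are all hypothetical.

```python
from collections import defaultdict


def genes_in_multiple_pathways(pathway_members, min_pathways=2):
    """Map each gene to the pathways it belongs to, keeping shared genes only."""
    membership = defaultdict(set)
    for pathway, genes in pathway_members.items():
        for gene in genes:
            membership[gene].add(pathway)
    return {
        gene: sorted(pathways)
        for gene, pathways in membership.items()
        if len(pathways) >= min_pathways
    }


# Placeholder input: pathway name -> member genes found in the disease gene list
pathway_members = {
    "pathway_A": ["GENE1", "GENE2"],
    "pathway_B": ["GENE2", "GENE3"],
    "pathway_C": ["GENE2", "GENE1"],
}
shared = genes_in_multiple_pathways(pathway_members)
```

Genes that survive this filter are natural candidates for the cross-pathway connection table in the report.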

Phase 5: Gene Ontology & Functional Annotation

Objective: Characterize biological processes, molecular functions, and cellular components.

Tools Used

enrichr_gene_enrichment_analysis (GO enrichment):
  • Use with libs=['GO_Biological_Process_2023'] for BP
  • Use with libs=['GO_Molecular_Function_2023'] for MF
  • Use with libs=['GO_Cellular_Component_2023'] for CC
GO_get_annotations_for_gene:
  • Input: gene_id (string - gene symbol or UniProt ID)
  • Output: List of GO annotations with terms, aspects, evidence codes
GO_search_terms:
  • Input: query (string)
  • Output: Matching GO terms
QuickGO_annotations_by_gene:
  • Input: gene_product_id (string - UniProt accession, e.g., 'UniProtKB:P02649'), optional aspect (string: 'biological_process', 'molecular_function', 'cellular_component'), taxon_id (int: 9606), limit (int: 25)
  • Output: GO annotations with evidence codes
OpenTargets_get_target_gene_ontology_by_ensemblID:
  • Input: ensemblId (string)
  • Output: GO terms associated with target

Workflow

  1. Run Enrichr GO enrichment for all 3 aspects using combined gene list
  2. For top 5 genes, get detailed GO annotations from QuickGO
  3. For top genes, get OpenTargets GO terms
  4. Summarize key biological processes, molecular functions, cellular components

Phase 6: Therapeutic Landscape

Objective: Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.

Tools Used

OpenTargets_get_associated_drugs_by_disease_efoId (primary):
  • Input: efoId (string), size (int, REQUIRED - use 100)
  • Output: {data: {disease: {knownDrugs: {count, rows: [{drug: {id, name, tradeNames, maximumClinicalTrialPhase, isApproved, hasBeenWithdrawn}, phase, mechanismOfAction, target: {id, approvedSymbol}, disease: {id, name}, urls: [{url, name}]}]}}}}
  • Use: All drugs associated with disease (approved + investigational)
OpenTargets_get_target_tractability_by_ensemblID:
  • Input: ensemblId (string)
  • Output: Tractability assessment (small molecule, antibody, PROTAC, etc.)
OpenTargets_get_associated_drugs_by_target_ensemblID:
  • Input: ensemblId (string), size (int, REQUIRED)
  • Output: Drugs targeting this gene/protein
search_clinical_trials:
  • Input: query_term (string, REQUIRED), optional condition (string), intervention (string), pageSize (int, default 10)
  • Output: Clinical trial results
  • NOTE: query_term is REQUIRED even if condition is provided
OpenTargets_get_drug_mechanisms_of_action_by_chemblId:
  • Input: chemblId (string)
  • Output: Mechanism of action details

Workflow

  1. Get all drugs for disease from OpenTargets
  2. For top disease-associated genes, check tractability
  3. For top genes with no approved drugs, identify repurposing candidates
  4. Search clinical trials for disease
  5. For top approved drugs, get mechanism of action

Drug Tracking

python
drug_targets = {
    'PSEN1': {'drugs': ['Semagacestat'], 'tractability': 'small_molecule', 'clinical_phase': 3},
    'ACHE': {'drugs': ['Donepezil', 'Galantamine'], 'tractability': 'small_molecule', 'clinical_phase': 4},
    # ...
}
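
A tracking dictionary like the one above can feed workflow step 3 (genes with no approved drugs). The sketch below is one possible way to split targets, assuming clinical_phase 4 denotes an approved drug; that threshold and the helper name split_by_approval are assumptions rather than part of the tool contract.

```python
# Same structure as the tracking dictionary shown above
drug_targets = {
    "PSEN1": {"drugs": ["Semagacestat"], "tractability": "small_molecule", "clinical_phase": 3},
    "ACHE": {"drugs": ["Donepezil", "Galantamine"], "tractability": "small_molecule", "clinical_phase": 4},
}


def split_by_approval(targets, approved_phase=4):
    """Separate targets with an approved drug from those without one.

    Assumes clinical_phase >= approved_phase means approved (an assumption).
    """
    approved, unapproved = [], []
    for gene, info in targets.items():
        bucket = approved if info.get("clinical_phase", 0) >= approved_phase else unapproved
        bucket.append(gene)
    return sorted(approved), sorted(unapproved)


approved, repurposing_candidates = split_by_approval(drug_targets)
```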

Phase 7: Multi-Omics Integration

Objective: Integrate findings across all layers to identify cross-layer genes, calculate concordance, and generate mechanistic hypotheses.

Cross-Layer Gene Concordance Analysis

This is the core integrative step. For each gene found in the analysis:
  1. Count layers: In how many omics layers does this gene appear?
    • Genomics (GWAS, rare variants, genetic association)
    • Transcriptomics (DEGs, expression score)
    • Proteomics (PPI hub, protein expression)
    • Pathways (enriched pathway member)
    • Therapeutics (drug target)
  2. Score genes: Genes appearing in 3+ layers are "multi-omics hub genes"
  3. Direction concordance: Do genetics and expression agree?
    • Risk allele + upregulated = concordant gain-of-function
    • Risk allele + downregulated = concordant loss-of-function
    • Discordant = needs investigation
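
The layer-counting and 3+ layer rule above can be sketched as follows. The input is assumed to be a mapping from layer name to the genes found in that layer; the function names and the placeholder gene names are hypothetical, and direction concordance is not covered here.

```python
LAYERS = ("genomics", "transcriptomics", "proteomics", "pathways", "therapeutics")


def count_layers(layer_genes):
    """Return gene -> set of layers in which the gene appears."""
    support = {}
    for layer in LAYERS:
        for gene in layer_genes.get(layer, ()):
            support.setdefault(gene, set()).add(layer)
    return support


def multi_omics_hubs(layer_genes, min_layers=3):
    """Genes supported by at least min_layers layers, best-supported first."""
    support = count_layers(layer_genes)
    hubs = [g for g, layers in support.items() if len(layers) >= min_layers]
    return sorted(hubs, key=lambda g: (-len(support[g]), g))


# Placeholder input for illustration only (not real findings)
layer_genes = {
    "genomics": ["GENE_A", "GENE_B", "GENE_C"],
    "transcriptomics": ["GENE_A", "GENE_C"],
    "proteomics": ["GENE_A"],
    "pathways": ["GENE_A", "GENE_B"],
    "therapeutics": ["GENE_B"],
}
hub_genes = multi_omics_hubs(layer_genes)
```

The per-gene layer sets returned by count_layers can also populate the cross-layer concordance table directly.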

Biomarker Identification

For each multi-omics hub gene, assess biomarker potential:
  • Diagnostic: Gene expression distinguishes disease vs healthy
  • Prognostic: Expression/variant predicts outcome (cancer prognostics from HPA)
  • Predictive: Variant/expression predicts treatment response (pharmacogenomics)
  • Evidence level: Number of supporting omics layers

Mechanistic Hypothesis Generation

From the integrated data:
  1. Identify the most supported biological processes (GO + pathways)
  2. Map causal chain: genetic variant -> gene expression -> protein function -> pathway disruption -> disease
  3. Identify intervention points (druggable nodes in the causal chain)
  4. Generate testable hypotheses

Confidence Score Calculation

Calculate the Multi-Omics Confidence Score (0-100) based on:
  • Data availability across layers
  • Cross-layer concordance
  • Evidence quality
  • Clinical validation
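
This section lists the four inputs but not how they are weighted. The sketch below shows one possible shape for the calculation, assuming each component has been normalized to the range 0-1 and that all four carry equal weight (25 points each). The weights, the component keys, and the function name are assumptions for illustration, not the pipeline's defined rubric.

```python
# Hypothetical equal weighting; replace with the rubric actually in use
WEIGHTS = {
    "data_availability": 25,
    "cross_layer_concordance": 25,
    "evidence_quality": 25,
    "clinical_validation": 25,
}


def confidence_score(components, weights=WEIGHTS):
    """Combine normalized component scores (0-1) into a 0-100 score."""
    total = 0.0
    for name, weight in weights.items():
        value = components.get(name, 0.0)  # a missing component contributes nothing
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        total += weight * value
    return round(total)


example = confidence_score({
    "data_availability": 0.8,
    "cross_layer_concordance": 0.4,
    "evidence_quality": 0.6,
    "clinical_validation": 0.2,
})
```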

Phase 8: Report Finalization

Executive Summary

Write a 2-3 sentence synthesis covering:
  • Disease mechanism in systems terms
  • Key genes/pathways identified
  • Therapeutic opportunities

Final Report Quality Checklist

Before presenting to user, verify:
  • All 8 sections have content (or marked as "No data available")
  • Every data point has a source citation
  • Executive summary reflects key findings
  • Multi-Omics Confidence Score calculated
  • Top 20 genes ranked by multi-omics evidence
  • Top 10 enriched pathways listed
  • Biomarker candidates identified
  • Cross-layer concordance table complete
  • Therapeutic opportunities summarized
  • Mechanistic hypotheses generated
  • Data Availability Checklist complete
  • Completeness Checklist complete
  • References section lists all tools used

Tool Parameter Quick Reference

| Tool | Key Parameters | Notes |
|---|---|---|
| OpenTargets_get_disease_id_description_by_name | diseaseName | Primary disambiguation |
| OSL_get_efo_id_by_disease_name | disease | Secondary disambiguation |
| OpenTargets_get_associated_targets_by_disease_efoId | efoId | Returns top 25 genes |
| OpenTargets_get_evidence_by_datasource | efoId, ensemblId, datasourceIds[], size | Per-gene evidence |
| OpenTargets_search_gwas_studies_by_disease | diseaseIds[], size | GWAS studies |
| gwas_search_associations | disease_trait, size | GWAS Catalog |
| clinvar_search_variants | condition or gene, max_results | Rare variants |
| ExpressionAtlas_search_differential | condition, species | DEGs |
| expression_atlas_disease_target_score | efoId, pageSize (REQUIRED) | Expression scores |
| europepmc_disease_target_score | efoId, pageSize (REQUIRED) | Literature scores |
| HPA_get_rna_expression_by_source | gene_name, source_type, source_name (ALL REQUIRED) | Tissue expression |
| STRING_get_interaction_partners | protein_ids[], species (9606), limit | PPI partners |
| STRING_get_network | protein_ids[], species | PPI network |
| STRING_functional_enrichment | protein_ids[], species | Functional enrichment |
| STRING_ppi_enrichment | protein_ids[], species | Network significance |
| intact_search_interactions | query, max | Experimental PPIs |
| humanbase_ppi_analysis | gene_list[], tissue, max_node, interaction, string_mode (ALL REQUIRED) | Tissue PPI |
| enrichr_gene_enrichment_analysis | gene_list[], libs[] (BOTH REQUIRED) | Pathway/GO enrichment |
| ReactomeAnalysis_pathway_enrichment | identifiers (space-separated string) | Reactome enrichment |
| Reactome_map_uniprot_to_pathways | id (UniProt accession) | Protein-pathway mapping |
| kegg_search_pathway | keyword | KEGG pathway search |
| WikiPathways_search | query, organism | WikiPathways search |
| GO_get_annotations_for_gene | gene_id | GO annotations |
| QuickGO_annotations_by_gene | gene_product_id (e.g., 'UniProtKB:P02649') | Detailed GO |
| OpenTargets_get_associated_drugs_by_disease_efoId | efoId, size (REQUIRED) | Disease drugs |
| OpenTargets_get_target_tractability_by_ensemblID | ensemblId | Druggability |
| search_clinical_trials | query_term (REQUIRED), condition, pageSize | Clinical trials |
| PubMed_search_articles | query, limit | Literature |
| ensembl_lookup_gene | gene_id, species ('homo_sapiens' REQUIRED) | Gene lookup |
| MyGene_query_genes | query, species, fields, size | Gene info |
| OpenTargets_get_similar_entities_by_disease_efoId | efoId, threshold, size (ALL REQUIRED) | Similar diseases |

Response Format Notes (Verified)

OpenTargets Associated Targets

json
{
  "data": {
    "disease": {
      "id": "MONDO_0004975",
      "name": "Alzheimer disease",
      "associatedTargets": {
        "count": 2456,
        "rows": [
          {
            "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
            "score": 0.87
          }
        ]
      }
    }
  }
}
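
A response of this shape can be flattened into a ranked gene list for the downstream layers. The following is a minimal sketch that reads only the fields shown above; the helper name top_targets and the default n=20 are assumptions.

```python
def top_targets(response, n=20):
    """Return [(symbol, ensembl_id, score)] sorted by descending association score."""
    rows = (
        response.get("data", {})
        .get("disease", {})
        .get("associatedTargets", {})
        .get("rows", [])
    )
    ranked = sorted(rows, key=lambda r: r.get("score", 0.0), reverse=True)
    return [
        (r["target"]["approvedSymbol"], r["target"]["id"], r.get("score", 0.0))
        for r in ranked[:n]
    ]


# Sample mirroring the response format shown above
response = {
    "data": {
        "disease": {
            "id": "MONDO_0004975",
            "name": "Alzheimer disease",
            "associatedTargets": {
                "count": 2456,
                "rows": [
                    {"target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"}, "score": 0.87}
                ],
            },
        }
    }
}
genes = top_targets(response)
```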

GWAS Catalog Associations

json
{
  "data": [
    {
      "association_id": 216440893,
      "p_value": 2e-09,
      "or_per_copy_num": 0.94,
      "or_value": "0.94",
      "efo_traits": [{"..."}],
      "risk_frequency": "NR"
    }
  ],
  "metadata": {"pagination": {"totalElements": 1061816}}
}

STRING Interactions

json
{
  "status": "success",
  "data": [
    {
      "stringId_A": "9606.ENSP00000252486",
      "stringId_B": "9606.ENSP00000466775",
      "preferredName_A": "APOE",
      "preferredName_B": "APOC2",
      "score": 0.999
    }
  ]
}

Reactome Enrichment

json
{
  "data": {
    "token": "...",
    "pathways_found": 154,
    "pathways": [
      {
        "pathway_id": "R-HSA-1251985",
        "name": "Nuclear signaling by ERBB4",
        "species": "Homo sapiens",
        "is_disease": false,
        "is_lowest_level": true,
        "entities_found": 3,
        "entities_total": 47,
        "entities_ratio": 0.00291,
        "p_value": 4.0e-06,
        "fdr": 0.00068,
        "reactions_found": 3,
        "reactions_total": 34
      }
    ]
  }
}
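
To build the "top enriched pathways" list from a response like this, one option is to filter on the fdr field and sort. The sketch below assumes an FDR cutoff of 0.05, which is a common convention but not something this document specifies; the helper name and its parameters are likewise hypothetical.

```python
def significant_pathways(response, max_fdr=0.05, lowest_level_only=False):
    """Return pathways passing the FDR cutoff, most significant first."""
    pathways = response.get("data", {}).get("pathways", [])
    kept = [
        p for p in pathways
        if p.get("fdr", 1.0) <= max_fdr
        and (p.get("is_lowest_level", False) or not lowest_level_only)
    ]
    return sorted(kept, key=lambda p: p["fdr"])


# Sample mirroring the response format shown above (one pathway)
response = {
    "data": {
        "token": "...",
        "pathways_found": 154,
        "pathways": [
            {
                "pathway_id": "R-HSA-1251985",
                "name": "Nuclear signaling by ERBB4",
                "is_disease": False,
                "is_lowest_level": True,
                "entities_found": 3,
                "entities_total": 47,
                "p_value": 4.0e-06,
                "fdr": 0.00068,
            }
        ],
    }
}
top = significant_pathways(response)
```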

HPA RNA Expression

json
{
  "status": "success",
  "data": {
    "gene_name": "APOE",
    "source_type": "tissue",
    "source_name": "brain",
    "expression_value": "2714.9",
    "expression_level": "very high",
    "expression_unit": "nTPM"
  }
}
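
Note that expression_value arrives as a string in this format, so it needs converting before any numeric comparison across tissues. A small defensive sketch follows; the helper name expression_ntpm is hypothetical.

```python
def expression_ntpm(response):
    """Return the expression value as a float, or None if it is unavailable."""
    if response.get("status") != "success":
        return None
    data = response.get("data") or {}
    try:
        return float(data["expression_value"])
    except (KeyError, TypeError, ValueError):
        return None


# Sample mirroring the response format shown above
response = {
    "status": "success",
    "data": {
        "gene_name": "APOE",
        "source_type": "tissue",
        "source_name": "brain",
        "expression_value": "2714.9",
        "expression_level": "very high",
        "expression_unit": "nTPM",
    },
}
value = expression_ntpm(response)
```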

Enrichr Results

json
{
  "status": "success",
  "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}"
}
NOTE: The data field is a JSON string that needs parsing.
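
Parsing the nested string is a one-step json.loads call. The sketch below wraps it with a status check; the helper name parse_enrichr is hypothetical, and the sample mirrors the elided response shown above.

```python
import json


def parse_enrichr(response):
    """Decode the JSON string held in the data field of an Enrichr tool response."""
    if response.get("status") != "success":
        raise ValueError(f"Enrichr call failed: {response.get('status')!r}")
    data = response["data"]
    # The tool returns data as a JSON-encoded string; tolerate an already-parsed dict
    return json.loads(data) if isinstance(data, str) else data


response = {
    "status": "success",
    "data": '{"connected_paths": {"Path: ...": "Total Weight: ..."}}',
}
parsed = parse_enrichr(response)
```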

Common Use Patterns

1. Comprehensive Disease Profiling

User: "Characterize Alzheimer's disease across omics layers"
-> Run all 8 phases
-> Produce full multi-omics report

2. Therapeutic Target Discovery

User: "What are druggable targets for rheumatoid arthritis?"
-> Emphasize Phase 1 (genomics), Phase 6 (therapeutics), Phase 7 (integration)
-> Focus on tractability and clinical precedent

3. Biomarker Identification

User: "Find diagnostic biomarkers for pancreatic cancer"
-> Emphasize Phase 2 (transcriptomics), Phase 3 (proteomics), Phase 7 (biomarkers)
-> Focus on tissue-specific expression and diagnostic potential

4. Mechanism Elucidation

User: "What pathways are dysregulated in Crohn's disease?"
-> Emphasize Phase 4 (pathways), Phase 5 (GO), Phase 7 (mechanistic hypotheses)
-> Focus on pathway enrichment and cross-pathway connections

5. Drug Repurposing

User: "What existing drugs could be repurposed for ALS?"
-> Emphasize Phase 1 (genetics), Phase 6 (therapeutic landscape), Phase 7 (repurposing)
-> Focus on drugs targeting disease-associated genes

6. Systems Biology

User: "What are the hub genes and key pathways in type 2 diabetes?"
-> Emphasize Phase 3 (PPI network), Phase 4 (pathways), Phase 7 (network analysis)
-> Focus on hub genes and network modules

Edge Case Handling

Rare Diseases (limited data)

  • Genomics layer may dominate (single gene)
  • Limited GWAS data (monogenic)
  • Focus on ClinVar variants, pathway consequences
  • Confidence score will be lower (less cross-layer data)

Common Diseases (overwhelming data)

  • Thousands of GWAS associations
  • Prioritize by effect size and significance
  • Focus on top 20-30 genes for downstream analysis
  • Use strict significance thresholds (p < 5e-8)
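
The thresholding and prioritization above can be sketched against records shaped like the GWAS Catalog response (p_value, or_per_copy_num). Using |log(OR)| as the effect-size measure and p-value as the primary sort key are assumptions of this sketch, as is the function name; the sample records are placeholders.

```python
import math

GENOME_WIDE = 5e-8  # genome-wide significance threshold named in the text


def prioritize_associations(associations, p_threshold=GENOME_WIDE):
    """Keep genome-wide significant hits, ordered by p-value then effect size."""
    def effect(assoc):
        odds = assoc.get("or_per_copy_num")
        # |log(OR)| treats protective and risk effects symmetrically (assumption)
        return abs(math.log(odds)) if odds and odds > 0 else 0.0

    kept = [
        a for a in associations
        if a.get("p_value") is not None and a["p_value"] < p_threshold
    ]
    return sorted(kept, key=lambda a: (a["p_value"], -effect(a)))


# Placeholder records for illustration only
associations = [
    {"association_id": 1, "p_value": 2e-09, "or_per_copy_num": 0.94},
    {"association_id": 2, "p_value": 1e-05, "or_per_copy_num": 1.5},
    {"association_id": 3, "p_value": 2e-09, "or_per_copy_num": 1.3},
]
ranked = prioritize_associations(associations)
```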

Cancer

  • Include somatic mutations (if CIViC/cBioPortal available)
  • Check cancer prognostics via HPA
  • Include tumor-specific expression patterns
  • Clinical trial landscape may be extensive

Monogenic Diseases

  • Single gene dominates
  • ClinVar/OMIM evidence is primary
  • Pathway analysis reveals downstream effects
  • Therapeutic landscape may be limited (gene therapy, enzyme replacement)

Polygenic Diseases

  • Many weak genetic signals
  • GWAS provides the gene list
  • Pathway enrichment reveals convergent biology
  • Network analysis identifies hub genes

Tissue Ambiguity

  • Diseases affecting multiple tissues
  • Query HPA for all relevant tissues
  • Compare tissue-specific expression patterns
  • Use tissue context from disease ontology

Fallback Strategies

If disease name not found

  1. Try synonyms
  2. Try broader disease category
  3. Try OMIM/UMLS ID mapping
  4. Report disambiguation failure and ask user
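
The ordered fallback above can be expressed as a chain of resolver functions tried in turn. In this sketch the resolvers are toy lookups standing in for real tool calls; the function name, the strategy labels, and the lookup tables are all hypothetical.

```python
def resolve_with_fallbacks(name, strategies):
    """Try each (label, resolver) pair in order and return the first hit."""
    attempts = []
    for label, resolver in strategies:
        attempts.append(label)
        result = resolver(name)
        if result:
            return {"resolved": result, "via": label, "attempts": attempts}
    # Nothing matched: the caller should report the failure and ask the user
    return {"resolved": None, "via": None, "attempts": attempts}


# Toy lookup tables standing in for real disambiguation tools
known_ids = {"alzheimer disease": "MONDO_0004975"}
synonyms = {"alzheimer's": "alzheimer disease"}

strategies = [
    ("exact name", lambda n: known_ids.get(n.lower())),
    ("synonym", lambda n: known_ids.get(synonyms.get(n.lower(), ""))),
]
hit = resolve_with_fallbacks("Alzheimer's", strategies)
miss = resolve_with_fallbacks("unknown condition", strategies)
```

Recording the attempts list makes it straightforward to cite in the report which strategies were tried before a match or a failure.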

If no GWAS data

  1. Check ClinVar for rare variants
  2. Use OpenTargets genetic evidence
  3. Note in report as "Limited genetic data"
  4. Adjust confidence score accordingly

If no expression data

  1. Try different disease name/synonym
  2. Check HPA for individual gene expression
  3. Use OpenTargets expression evidence
  4. Note as "Limited transcriptomics data"

If no pathway enrichment

  1. Reduce gene list stringency
  2. Try different pathway databases
  3. Map individual genes to pathways via Reactome
  4. Note as "No significant pathway enrichment"

If no drugs found

  1. Check if disease is rare/orphan
  2. Look for drugs targeting individual genes
  3. Check clinical trials for investigational therapies
  4. Note as "No approved drugs - novel therapeutic opportunity"