Multi-Omics Disease Characterization Pipeline
Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Disease disambiguation FIRST - Resolve all identifiers before omics analysis
- Layer-by-layer analysis - Systematically cover all omics layers
- Cross-layer integration - Identify genes/targets appearing in multiple layers
- Evidence grading - Grade all evidence from T1 (human/clinical) to T4 (annotation/prediction only)
- Tissue context - Emphasize disease-relevant tissues/organs
- Quantitative scoring - Multi-Omics Confidence Score (0-100)
- Druggable focus - Prioritize targets with therapeutic potential
- Biomarker identification - Highlight diagnostic/prognostic markers
- Mechanistic synthesis - Generate testable hypotheses
- Source references - Every statement must cite tool/database
- Completeness checklist - Mandatory section showing analysis coverage
- English-first queries - Always use English terms in tool calls. Respond in user's language
When to Use This Skill
Apply when users:
- Ask about disease mechanisms across omics layers
- Need multi-omics characterization of a disease
- Want to understand disease at the systems biology level
- Ask "What pathways/genes/proteins are involved in [disease]?"
- Need biomarker discovery for a disease
- Want to identify druggable targets from disease profiling
- Ask for integrated genomics + transcriptomics + proteomics analysis
- Need cross-layer concordance analysis
- Ask about disease network biology / hub genes
NOT for (use other skills instead):
- Single gene/target validation -> Use tooluniverse-drug-target-validation
- Drug safety profiling -> Use tooluniverse-adverse-event-detection
- General disease overview -> Use tooluniverse-disease-research
- Variant interpretation -> Use tooluniverse-variant-interpretation
- GWAS-specific analysis -> Use skills
- Pathway-only analysis -> Use tooluniverse-systems-biology
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| disease | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | , |
| tissue | No | Tissue/organ of interest | , , |
| focus_layers | No | Specific omics layers to emphasize | , , |
Multi-Omics Confidence Score (0-100)
Score Components
Data Availability (0-40 points):
- Genomics data available (GWAS or rare variants): 10 points
- Transcriptomics data available (DEGs or expression): 10 points
- Protein data available (PPI or expression): 5 points
- Pathway data available (enriched pathways): 10 points
- Clinical/drug data available (approved drugs or trials): 5 points
Evidence Concordance (0-40 points):
- Multi-layer genes (appear in 3+ layers): up to 20 points (2 per gene, max 10 genes)
- Consistent direction (genetics + expression concordant): 10 points
- Pathway-gene concordance (genes found in enriched pathways): 10 points
Evidence Quality (0-20 points):
- Strong genetic evidence (GWAS p < 5e-8): 10 points
- Clinical validation (approved drugs): 10 points
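The rubric above can be tallied mechanically. The following is a minimal sketch, assuming each availability and quality item is awarded all-or-nothing (the rubric does not say whether partial credit applies). `ScoreInputs` and `confidence_score` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoreInputs:
    """Hypothetical container for the facts the rubric asks about."""
    has_genomics: bool = False            # GWAS or rare variants found
    has_transcriptomics: bool = False     # DEGs or expression data found
    has_protein: bool = False             # PPI or protein expression found
    has_pathways: bool = False            # enriched pathways found
    has_clinical: bool = False            # approved drugs or trials found
    n_multilayer_genes: int = 0           # genes appearing in 3+ layers
    direction_concordant: bool = False    # genetics and expression agree
    pathway_gene_concordant: bool = False # genes sit in enriched pathways
    has_gwas_significant: bool = False    # any GWAS hit with p < 5e-8
    has_approved_drug: bool = False       # at least one approved drug


def confidence_score(x: ScoreInputs) -> dict:
    """Return the three component subtotals and the 0-100 total."""
    availability = (
        10 * x.has_genomics
        + 10 * x.has_transcriptomics
        + 5 * x.has_protein
        + 10 * x.has_pathways
        + 5 * x.has_clinical
    )
    # 2 points per multi-layer gene, capped at 10 genes (20 points).
    capped_genes = max(0, min(x.n_multilayer_genes, 10))
    concordance = (
        2 * capped_genes
        + 10 * x.direction_concordant
        + 10 * x.pathway_gene_concordant
    )
    quality = 10 * x.has_gwas_significant + 10 * x.has_approved_drug
    return {
        "availability": availability,
        "concordance": concordance,
        "quality": quality,
        "total": availability + concordance + quality,
    }
```

Returning the subtotals alongside the total makes it easier to fill in the per-component rows of the score table in the report.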
Score Interpretation
| Score | Tier | Interpretation |
|---|---|---|
| 80-100 | Excellent | Comprehensive multi-omics coverage, high confidence, strong cross-layer concordance |
| 60-79 | Good | Good coverage across most layers, some gaps |
| 40-59 | Moderate | Moderate coverage, limited cross-layer integration |
| 0-39 | Limited | Limited data, single-layer analysis dominates |
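The tier boundaries in the table translate into a simple threshold lookup. A small sketch, with `score_tier` as a hypothetical helper name:

```python
def score_tier(score: int) -> str:
    """Map a 0-100 Multi-Omics Confidence Score to its tier label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Limited"
```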
Evidence Grading System
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Direct human evidence, clinical proof | FDA-approved drug, GWAS hit (p<5e-8), clinical trial result |
| T2 | [T2] | Experimental evidence | Differential expression (validated), functional screen, mouse KO |
| T3 | [T3] | Computational/database evidence | PPI network, pathway mapping, expression correlation |
| T4 | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |
Report Template
Create this file at the start, {disease_name}_multiomic_report.md, using the following markdown template:
# Multi-Omics Disease Characterization: {Disease Name}
**Report Generated**: {date}
**Disease Identifiers**: (to be filled)
**Multi-Omics Confidence Score**: (to be calculated)
---
## Executive Summary
(2-3 sentence disease mechanism synthesis - fill after all layers complete)
---
## 1. Disease Definition & Context
### Disease Identifiers
|--------|-----|--------|
### Description
### Synonyms
### Disease Hierarchy (parents/children)
### Affected Tissues/Organs
### Therapeutic Areas
**Sources**: (tools used)
---
## 2. Genomics Layer
### 2.1 GWAS Associations
|-----|---------|--------|------|-------|--------|
### 2.2 GWAS Studies Summary
|----------|-------|-------------|------|--------|
### 2.3 Associated Genes (Genetic Evidence)
|------|------------|-------------------|---------------|--------|
### 2.4 Rare Variants (ClinVar)
|---------|------|-----------------------|--------|
### Genomics Layer Summary
- Total GWAS hits:
- Top genes by genetic evidence:
- Genetic architecture:
**Sources**: (tools used)
---
## 3. Transcriptomics Layer
### 3.1 Differential Expression Studies
|------------|-----------|--------------|----------------|--------|
### 3.2 Expression Atlas Disease Evidence
|------|-------|--------|
### 3.3 Tissue Expression Patterns (GTEx/HPA)
|------|--------|-----------------|--------|
### 3.4 Biomarker Candidates (Expression-Based)
|------|-------------------|-------------|----------|--------|
### Transcriptomics Layer Summary
- Differential expression datasets:
- Top DEGs:
- Tissue-specific patterns:
**Sources**: (tools used)
---
## 4. Proteomics & Interaction Layer
### 4.1 Protein-Protein Interactions (STRING)
|-----------|-----------|-------|--------|
### 4.2 Hub Genes (Network Centrality)
|------|--------|-------------|------|--------|
### 4.3 Protein Complexes (IntAct)
|---------|---------|----------|--------|
### 4.4 Tissue-Specific PPI Network
|------|-------------------|--------|--------|
### Proteomics Layer Summary
- Total PPIs:
- Hub genes:
- Network modules:
**Sources**: (tools used)
---
## 5. Pathway & Network Layer
### 5.1 Enriched Pathways (Enrichr/Reactome)
|---------|----------|---------|-------|--------|
### 5.2 Reactome Pathway Details
|------------|------|----------------|--------|
### 5.3 KEGG Pathways
|------------|------|-------------|--------|
### 5.4 WikiPathways
|------------|------|----------|--------|
### Pathway Layer Summary
- Top enriched pathways:
- Key pathway nodes:
- Cross-pathway connections:
**Sources**: (tools used)
---
## 6. Gene Ontology & Functional Annotation
### 6.1 Biological Processes
|---------|------|---------|-------|--------|
### 6.2 Molecular Functions
|---------|------|---------|-------|--------|
### 6.3 Cellular Components
|---------|------|---------|-------|--------|
**Sources**: (tools used)
---
## 7. Therapeutic Landscape
### 7.1 Approved Drugs
|------|-----------|-----------|--------|-------|--------|
### 7.2 Druggable Targets
|------|-------------|----------|-------------------|--------|
### 7.3 Drug Repurposing Candidates
|------|---------------------|-----------|--------|--------|
### 7.4 Clinical Trials
|--------|-------|-------|--------|--------------|--------|
### Therapeutic Summary
- Approved drugs:
- Clinical pipeline:
- Novel targets:
**Sources**: (tools used)
---
## 8. Multi-Omics Integration
### 8.1 Cross-Layer Gene Concordance
|------|----------|-----------------|------------|----------|--------|---------------|
### 8.2 Multi-Omics Hub Genes (Top 20)
|------|------|-------------|--------------|-----------|--------|
### 8.3 Biomarker Candidates
|-----------|------|-----------------|------------|--------|
### 8.4 Mechanistic Hypotheses
1. (Hypothesis with supporting evidence from multiple layers)
2. ...
### 8.5 Systems-Level Insights
- Key disrupted processes:
- Critical pathway nodes:
- Therapeutic intervention points:
- Testable hypotheses:
---
## Multi-Omics Confidence Score
|-----------|--------|-----|---------|
| Genomics data | | 10 | |
| Transcriptomics data | | 10 | |
| Protein data | | 5 | |
| Pathway data | | 10 | |
| Clinical data | | 5 | |
| Multi-layer genes | | 20 | |
| Direction concordance | | 10 | |
| Pathway-gene concordance | | 10 | |
| Genetic evidence quality | | 10 | |
| Clinical validation | | 10 | |
| **TOTAL** | | **100** | |
**Score**: XX/100 - [Tier]
---
## Data Availability Checklist
|-------------|---------------|------------|----------|
| Genomics (GWAS) | Yes/No | | |
| Genomics (Rare Variants) | Yes/No | | |
| Transcriptomics (DEGs) | Yes/No | | |
| Transcriptomics (Expression) | Yes/No | | |
| Proteomics (PPI) | Yes/No | | |
| Proteomics (Expression) | Yes/No | | |
| Pathways (Enrichment) | Yes/No | | |
| Pathways (KEGG/Reactome) | Yes/No | | |
| Gene Ontology | Yes/No | | |
| Drugs/Therapeutics | Yes/No | | |
| Clinical Trials | Yes/No | | |
| Literature | Yes/No | | |
---
## Completeness Checklist
- [ ] Disease disambiguation complete (IDs resolved)
- [ ] Genomics layer analyzed (GWAS + variants)
- [ ] Transcriptomics layer analyzed (DEGs + expression)
- [ ] Proteomics layer analyzed (PPI + interactions)
- [ ] Pathway layer analyzed (enrichment + mapping)
- [ ] Gene Ontology analyzed (BP + MF + CC)
- [ ] Therapeutic landscape analyzed (drugs + targets + trials)
- [ ] Cross-layer integration complete (concordance analysis)
- [ ] Multi-Omics Confidence Score calculated
- [ ] Biomarker candidates identified
- [ ] Hub genes identified
- [ ] Mechanistic hypotheses generated
- [ ] Executive summary written
- [ ] All sections have source citations
---
## References
### Data Sources Used
|---|------|------------|---------|-----------------|
### Database Versions
- OpenTargets: (current)
- GWAS Catalog: (current)
- STRING: (current)
- Reactome: (current)
Phase 0: Disease Disambiguation (ALWAYS FIRST)
Objective: Resolve disease to standard identifiers for all downstream queries.
Tools Used
OpenTargets_get_disease_id_description_by_name (primary):
- Input: (string) - Disease name
- Output: {data: {search: {hits: [{id, name, description}]}}}
- Use: Get MONDO/EFO IDs and description
- CRITICAL: Disease IDs from OpenTargets use underscore format (e.g., ), NOT colon format
OSL_get_efo_id_by_disease_name (secondary):
- Input: (string) - Disease name
- Output:
- Use: Get EFO/MONDO ID
OpenTargets_get_disease_description_by_efoId:
- Input: (string) - Disease ID (e.g., )
- Output: {data: {disease: {id, name, description, dbXRefs}}}
- Use: Get full description, cross-references (OMIM, UMLS, DOID, etc.)
OpenTargets_get_disease_synonyms_by_efoId:
- Input: (string)
- Output: {data: {disease: {id, name, synonyms: [{relation, terms}]}}}
OpenTargets_get_disease_therapeutic_areas_by_efoId:
- Input: (string)
- Output: {data: {disease: {id, name, therapeuticAreas: [{id, name}]}}}
OpenTargets_get_disease_ancestors_parents_by_efoId:
- Input: (string)
- Output: {data: {disease: {id, name, ancestors: [{id, name}]}}}
OpenTargets_get_disease_descendants_children_by_efoId:
- Input: (string)
- Output: {data: {disease: {id, name, descendants: [{id, name}]}}}
OpenTargets_map_any_disease_id_to_all_other_ids:
- Input: (string) - Any known disease ID (e.g., , )
- Output: {data: {disease: {id, name, dbXRefs: [str], ...}}}
- Use: Cross-map between OMIM, UMLS, ICD10, DOID, etc.
Workflow
- Search by disease name to get primary ID (OpenTargets)
- Get full description and cross-references
- Get synonyms for search term expansion
- Get therapeutic areas for context
- Get disease hierarchy (parents/children)
- If user provided OMIM/other ID, map to MONDO/EFO first
Collision-Aware Search
When disease name returns multiple hits:
- Check if user's input matches any hit exactly
- If ambiguous, present top 3-5 options and ask user to select
- Always prefer the most specific disease (not parent categories)
- For cancer, prefer the specific tumor type over generic "cancer"
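One way to apply the first two rules is sketched below. It assumes hits shaped like the `{id, name, description}` entries returned by the name search. `pick_hit` is a hypothetical helper, and the placeholder hits in the usage example are invented for illustration. Preferring the most specific disease still needs the hierarchy lookups, which this sketch does not attempt.

```python
def pick_hit(query, hits, max_options=5):
    """Return (chosen_hit, options_to_show).

    chosen_hit is None when the result is ambiguous; in that case
    options_to_show holds the top candidates to present to the user.
    """
    wanted = query.strip().lower()
    exact = [h for h in hits if h.get("name", "").strip().lower() == wanted]
    if len(exact) == 1:
        return exact[0], []      # user's input matches one hit exactly
    if len(hits) == 1:
        return hits[0], []       # only one candidate, nothing to resolve
    return None, hits[:max_options]


# Usage with placeholder hits (IDs and names are illustrative only).
example_hits = [
    {"id": "EXAMPLE_1", "name": "Example disease"},
    {"id": "EXAMPLE_2", "name": "Example disease, familial"},
]
chosen, options = pick_hit("example disease", example_hits)
```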
Key Disease IDs to Track
After disambiguation, store these for all downstream queries:
- - Primary ID for OpenTargets queries (e.g., )
- - Canonical name (e.g., )
- - For literature search expansion
- - For context
- - Cross-references (OMIM, UMLS, DOID, etc.)
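Because user-supplied IDs often arrive in colon format, a small normalizer can help before any OpenTargets call. This is a sketch under the assumption that only MONDO and EFO prefixes need rewriting. `to_opentargets_id` is a hypothetical helper, and other ID types are returned unchanged so they can be passed to the mapping tool instead.

```python
import re

# Assumption: only these ontology prefixes are rewritten to underscore form.
_OT_ID = re.compile(r"(MONDO|EFO)[:_](\d+)", re.IGNORECASE)


def to_opentargets_id(disease_id: str) -> str:
    """Rewrite 'MONDO:0004975' as 'MONDO_0004975'; leave other IDs as-is."""
    cleaned = disease_id.strip()
    match = _OT_ID.fullmatch(cleaned)
    if match is None:
        return cleaned
    prefix, digits = match.groups()
    return f"{prefix.upper()}_{digits}"
```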
Phase 1: Genomics Layer
Objective: Identify genetic variants, GWAS associations, and genetically implicated genes.
Tools Used
OpenTargets_get_associated_targets_by_disease_efoId (primary):
- Input: (string) - Disease EFO/MONDO ID
- Output: {data: {disease: {id, name, associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}
- Use: Get ALL disease-associated genes ranked by overall evidence score
- NOTE: Returns the top 25 by default. For comprehensive analysis, record the total reported in associatedTargets.count
OpenTargets_get_evidence_by_datasource:
- Input: (string), (string), optional (array), (int, default 50)
- Output: {data: {disease: {evidences: {count, rows: [{...evidence details}]}}}}
- Use: Get specific evidence types. Key datasourceIds for genomics:
- - GWAS/genetics
- ['gene2phenotype', 'genomics_england', 'orphanet'] - Rare variants
- - ClinVar variants
gwas_search_associations (GWAS Catalog):
- Input: (string), (int, default 20)
- Output: {data: [{association_id, p_value, or_per_copy_num, or_value, beta, risk_frequency, efo_traits: [{...}], ...}], metadata: {pagination: {totalElements}}}
- Use: Get genome-wide significant associations
- NOTE: Use disease name (e.g., "Alzheimer"), not ID. Returns paginated results
gwas_get_studies_for_trait:
- Input: (string), (int)
- Output: {data: [...studies], metadata: {pagination}}
- NOTE: May return empty if trait name does not match exactly. Try synonyms
gwas_get_variants_for_trait:
- Input: (string), (int)
- Output: {data: [...variants], metadata: {pagination}}
GWAS_search_associations_by_gene:
- Input: (string)
- Output: Associations for a specific gene
OpenTargets_search_gwas_studies_by_disease:
- Input: (array of strings), (bool, default true), (int, default 10)
- Output: {data: {studies: {count, rows: [{id, studyType, traitFromSource, publicationFirstAuthor, publicationDate, pubmedId, nSamples, nCases, nControls, ...}]}}}
- Use: Get GWAS studies from OpenTargets genetics portal
clinvar_search_variants:
- Input: (string) or (string), optional (int)
- Output: List of ClinVar variants with clinical significance
- Use: Rare variant / monogenic disease evidence
Workflow
- Get associated genes from OpenTargets (overall scores)
- For top 10-15 genes, get genetic evidence specifically via OpenTargets_get_evidence_by_datasource
- Search GWAS Catalog for associations
- Search OpenTargets GWAS studies
- Search ClinVar for rare variants
- For top GWAS genes, check GWAS_search_associations_by_gene
Gene Tracking
Maintain a dictionary of genes found in genomics layer:
```python
# Genes found in the genomics layer, keyed by approved gene symbol
genomics_genes = {
    'PSEN1': {'score': 0.87, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000080815', 'layer': 'genomics'},
    'APP': {'score': 0.82, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000142192', 'layer': 'genomics'},
    # ...
}
```
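A tracking dictionary of this shape could be filled directly from the associated-targets response. The sketch below assumes the response layout documented for OpenTargets_get_associated_targets_by_disease_efoId. `genes_from_associated_targets` is a hypothetical helper, and the `evidence` label is simply passed through by the caller, since the response itself only carries an overall score.

```python
def genes_from_associated_targets(response, top_n=15, evidence="genetic"):
    """Build a genomics tracking dict from an associated-targets response."""
    disease = (response.get("data") or {}).get("disease") or {}
    rows = (disease.get("associatedTargets") or {}).get("rows") or []
    # Highest association score first, then keep the top_n rows.
    ranked = sorted(rows, key=lambda r: r.get("score") or 0, reverse=True)
    genes = {}
    for row in ranked[:top_n]:
        target = row.get("target") or {}
        symbol = target.get("approvedSymbol")
        if not symbol:
            continue  # skip rows without a usable gene symbol
        genes[symbol] = {
            "score": row.get("score"),
            "evidence": evidence,
            "ensembl_id": target.get("id"),
            "layer": "genomics",
        }
    return genes
```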
Phase 2: Transcriptomics Layer
Objective: Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.
Tools Used
ExpressionAtlas_search_differential:
- Input: optional (string), (string), (string, default 'homo sapiens')
- Output: Differential expression studies and results
- Use: Find studies where genes are differentially expressed in disease
ExpressionAtlas_search_experiments:
- Input: optional (string), (string), (string)
- Output: Expression experiments relevant to condition
- Use: Find all Expression Atlas experiments for the disease
expression_atlas_disease_target_score:
- Input: (string), (int, required)
- Output: Genes scored by expression evidence for the disease
- Use: Get expression-based disease-gene association scores
europepmc_disease_target_score:
- Input: (string), (int, required)
- Output: Genes scored by literature evidence for the disease
- Use: Complement expression evidence with literature-mined associations
HPA_get_rna_expression_by_source (Human Protein Atlas):
- Input: (string), (string: 'tissue', 'blood', 'brain'), (string: e.g., 'brain', 'liver')
- Output: {status, data: {gene_name, source_type, source_name, expression_value, expression_level, expression_unit}}
- NOTE: ALL 3 params required. options: 'tissue', 'blood', 'brain', 'cell_line', 'single_cell'
HPA_get_rna_expression_in_specific_tissues:
- Input: (string), (array of strings)
- Output: Expression across specified tissues
HPA_get_cancer_prognostics_by_gene:
- Input: (string)
- Output: Cancer prognostic data (if cancer context)
HPA_get_subcellular_location:
- Input: (string)
- Output: Subcellular localization data
HPA_search_genes_by_query:
- Input: (string)
- Output: Matching genes in HPA
Workflow
- Search Expression Atlas for differential expression studies
- Get expression-based disease scores
- Get literature-based disease scores (EuropePMC)
- For top 10-15 genes from genomics layer, check tissue expression via HPA
- Check disease-relevant tissue expression patterns
- For cancer: check prognostic biomarkers
Gene Tracking
Add transcriptomics genes to tracking:
```python
# Genes found in the transcriptomics layer, keyed by gene symbol
transcriptomics_genes = {
    'APOE': {'expression_score': 0.75, 'tissues': ['brain'], 'evidence': 'differential_expression', 'layer': 'transcriptomics'},
    # ...
}
```
Phase 3: Proteomics & Interaction Layer
Objective: Map protein-protein interactions, identify hub genes, and characterize interaction networks.
Tools Used
STRING_get_interaction_partners (primary PPI):
- Input: (array of strings - gene names work), (int, default 9606), (float, default 0.4), (int, default 20)
- Output: {status: 'success', data: [{stringId_A, stringId_B, preferredName_A, preferredName_B, ncbiTaxonId, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}
- Use: Get interaction partners for disease genes
- NOTE: is an array, NOT string. Gene symbols like work
STRING_get_network:
- Input: (array), (int), (float)
- Output: Network of interactions between input proteins
- Use: Build disease-specific PPI network
STRING_functional_enrichment:
- Input: (array), (int)
- Output: Functional enrichment results (GO, KEGG, etc.)
- Use: Functional characterization of disease gene set
STRING_ppi_enrichment:
- Input: (array), (int)
- Output: Statistical test for PPI enrichment (more interactions than expected)
- Use: Test if disease genes form a connected module
intact_get_interactions:
- Input: (string - UniProt ID or gene name)
- Output: Molecular interaction data from IntAct
intact_search_interactions:
- Input: (string), (int, default 0), (int, default 25)
- Output: Search results for interactions
HPA_get_protein_interactions_by_gene:
- Input: (string)
- Output: {gene, interactions, interactor_count, interactors: [...]}
humanbase_ppi_analysis:
- Input: (array), (string), (int), (string), (bool)
- Output: Tissue-specific PPI network
- NOTE: ALL params required. options: 'coexpression', 'interaction', 'coexpression_and_interaction'. : true/false
Workflow
- Take top 15-20 genes from genomics + transcriptomics layers
- Query STRING for interaction partners of each gene
- Build composite PPI network using STRING_get_network
- Test PPI enrichment (are genes more connected than random?)
- Get functional enrichment from STRING
- For disease-relevant tissue, get tissue-specific network (HumanBase)
- Identify hub genes (highest degree centrality)
- Check IntAct for experimentally validated interactions
Hub Gene Analysis
Calculate network centrality metrics:
- Degree: Number of interaction partners
- Betweenness: Number of shortest paths through node
- Hub score: Genes with degree > mean + 1 SD are hubs
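The degree-based hub rule can be computed from STRING edge records without a graph library. A minimal sketch, assuming edges carry the `preferredName_A` and `preferredName_B` fields shown in the STRING output and that "SD" means the population standard deviation (the text does not specify which). `find_hubs` is a hypothetical name, and betweenness is not computed here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_hubs(edges):
    """Return (degree_by_gene, hubs) using the degree > mean + 1 SD rule."""
    neighbors = defaultdict(set)
    for edge in edges:
        a, b = edge["preferredName_A"], edge["preferredName_B"]
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Degree = number of distinct interaction partners.
    degree = {gene: len(partners) for gene, partners in neighbors.items()}
    if not degree:
        return {}, []
    threshold = mean(degree.values()) + pstdev(degree.values())
    hubs = [gene for gene, d in degree.items() if d > threshold]
    hubs.sort(key=lambda gene: (-degree[gene], gene))
    return degree, hubs
```

Using sets of partners means duplicate edges returned for the same pair are counted once.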
Phase 4: Pathway & Network Layer
Objective: Identify enriched biological pathways and cross-pathway connections.
Tools Used
enrichr_gene_enrichment_analysis (primary enrichment):
- Input: (array of gene symbols, min 2), (array of library names)
- Output: {status: 'success', data: '{...JSON string with enrichment results...}'}
- Key libraries: , , ['WikiPathway_2023_Human'], ['GO_Biological_Process_2023'], ['GO_Molecular_Function_2023'], ['GO_Cellular_Component_2023']
- NOTE: field is a JSON string, needs parsing. Contains and per-library results
- NOTE: is REQUIRED as array
ReactomeAnalysis_pathway_enrichment:
- Input: (string - space-separated gene list), optional (int, default 20), (bool), (bool)
- Output: {data: {token, analysis_type, pathways_found, pathways: [{pathway_id, name, species, is_disease, is_lowest_level, entities_found, entities_total, entities_ratio, p_value, fdr, reactions_found, reactions_total}]}}
- Use: Reactome-specific pathway enrichment with statistical testing
Reactome_map_uniprot_to_pathways:
- Input: (string - UniProt accession)
- Output: List of Reactome pathways containing this protein
- Use: Map individual proteins to pathways
Reactome_get_pathway:
- Input: (string - Reactome stable ID, e.g., 'R-HSA-73817')
- Output: Pathway details
Reactome_get_pathway_reactions:
- Input: (string)
- Output: Reactions within pathway
kegg_search_pathway:
- Input: (string)
- Output: Array of KEGG pathway matches
kegg_get_pathway_info:
- Input: (string, e.g., 'hsa04930')
- Output: Detailed pathway information
WikiPathways_search:
- Input: (string), optional (string, e.g., 'Homo sapiens')
- Output: Matching community-curated pathways
Workflow
- Collect all genes from genomics + transcriptomics layers (top 20-30)
- Run Enrichr enrichment for KEGG, Reactome, WikiPathways
- Run ReactomeAnalysis for more detailed Reactome enrichment with p-values
- Search KEGG for disease-specific pathways
- Search WikiPathways for disease pathways
- For top Reactome pathways, get detailed reactions
- Identify cross-pathway connections (genes in multiple pathways)
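The last step, finding genes that sit in more than one pathway, amounts to inverting a pathway-to-genes mapping. The sketch below assumes such a mapping has already been assembled from the enrichment results; its shape and the `cross_pathway_genes` name are illustrative assumptions, not a tool output format.

```python
from collections import defaultdict


def cross_pathway_genes(pathway_genes, min_pathways=2):
    """Return {gene: sorted pathway names} for genes in >= min_pathways."""
    membership = defaultdict(set)
    for pathway, genes in pathway_genes.items():
        for gene in genes:
            membership[gene].add(pathway)
    return {
        gene: sorted(pathways)
        for gene, pathways in membership.items()
        if len(pathways) >= min_pathways
    }
```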
Phase 5: Gene Ontology & Functional Annotation
Objective: Characterize biological processes, molecular functions, and cellular components.
Tools Used
enrichr_gene_enrichment_analysis (GO enrichment):
- Use with libs=['GO_Biological_Process_2023'] for BP
- Use with libs=['GO_Molecular_Function_2023'] for MF
- Use with libs=['GO_Cellular_Component_2023'] for CC
GO_get_annotations_for_gene:
- Input: (string - gene symbol or UniProt ID)
- Output: List of GO annotations with terms, aspects, evidence codes
GO_search_terms:
- Input: (string)
- Output: Matching GO terms
QuickGO_annotations_by_gene:
- Input: (string - UniProt accession, e.g., 'UniProtKB:P02649'), optional (string: 'biological_process', 'molecular_function', 'cellular_component'), (int: 9606), (int: 25)
- Output: GO annotations with evidence codes
OpenTargets_get_target_gene_ontology_by_ensemblID:
- Input: (string)
- Output: GO terms associated with target
Workflow
- Run Enrichr GO enrichment for all 3 aspects using combined gene list
- For top 5 genes, get detailed GO annotations from QuickGO
- For top genes, get OpenTargets GO terms
- Summarize key biological processes, molecular functions, cellular components
Phase 6: Therapeutic Landscape
Objective: Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.
Tools Used
OpenTargets_get_associated_drugs_by_disease_efoId (primary):
- Input: (string), (int, REQUIRED - use 100)
- Output: {data: {disease: {knownDrugs: {count, rows: [{drug: {id, name, tradeNames, maximumClinicalTrialPhase, isApproved, hasBeenWithdrawn}, phase, mechanismOfAction, target: {id, approvedSymbol}, disease: {id, name}, urls: [{url, name}]}]}}}}
- Use: All drugs associated with disease (approved + investigational)
OpenTargets_get_target_tractability_by_ensemblID:
- Input: (string)
- Output: Tractability assessment (small molecule, antibody, PROTAC, etc.)
OpenTargets_get_associated_drugs_by_target_ensemblID:
- Input: (string), (int, REQUIRED)
- Output: Drugs targeting this gene/protein
search_clinical_trials:
- Input: (string, REQUIRED), optional (string), (string), (int, default 10)
- Output: Clinical trial results
- NOTE: is REQUIRED even if is provided
OpenTargets_get_drug_mechanisms_of_action_by_chemblId:
- Input: (string)
- Output: Mechanism of action details
Workflow
- Get all drugs for disease from OpenTargets
- For top disease-associated genes, check tractability
- For top genes with no approved drugs, identify repurposing candidates
- Search clinical trials for disease
- For top approved drugs, get mechanism of action
Drug Tracking
```python
# Drug evidence per target gene, keyed by gene symbol
drug_targets = {
    'PSEN1': {'drugs': ['Semagacestat'], 'tractability': 'small_molecule', 'clinical_phase': 3},
    'ACHE': {'drugs': ['Donepezil', 'Galantamine'], 'tractability': 'small_molecule', 'clinical_phase': 4},
    # ...
}
```
Phase 7: Multi-Omics Integration
Objective: Integrate findings across all layers to identify cross-layer genes, calculate concordance, and generate mechanistic hypotheses.
Cross-Layer Gene Concordance Analysis
This is the core integrative step. For each gene found in the analysis:
- Count layers: In how many omics layers does this gene appear?
  - Genomics (GWAS, rare variants, genetic association)
  - Transcriptomics (DEGs, expression score)
  - Proteomics (PPI hub, protein expression)
  - Pathways (enriched pathway member)
  - Therapeutics (drug target)
- Score genes: Genes appearing in 3+ layers are "multi-omics hub genes"
- Direction concordance: Do genetics and expression agree?
  - Risk allele + upregulated = concordant gain-of-function
  - Risk allele + downregulated = concordant loss-of-function
  - Discordant = needs investigation
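The layer counting and hub selection can be sketched as follows. The input shape (a dict from layer name to a collection of gene symbols) and the function names are assumptions for illustration. For direction concordance, the sketch additionally assumes the caller already knows the risk allele's predicted functional effect ("gain" or "loss"), which the text above leaves implicit.

```python
LAYERS = ("genomics", "transcriptomics", "proteomics", "pathways", "therapeutics")


def layers_per_gene(layer_genes):
    """Map each gene to the list of layers it appears in, in LAYERS order."""
    found = {}
    for layer in LAYERS:
        for gene in set(layer_genes.get(layer, ())):
            found.setdefault(gene, []).append(layer)
    return found


def multi_omics_hubs(layer_genes, min_layers=3):
    """Genes present in at least min_layers layers, most layers first."""
    found = layers_per_gene(layer_genes)
    hubs = {g: ls for g, ls in found.items() if len(ls) >= min_layers}
    return dict(sorted(hubs.items(), key=lambda kv: (-len(kv[1]), kv[0])))


def direction_concordance(predicted_effect, expression_change):
    """Classify one gene; inputs are 'gain'/'loss' and 'up'/'down'."""
    pair = (predicted_effect, expression_change)
    if pair == ("gain", "up"):
        return "concordant gain-of-function"
    if pair == ("loss", "down"):
        return "concordant loss-of-function"
    return "discordant"  # flag for investigation
```

The count of genes returned by `multi_omics_hubs` is the number the Evidence Concordance component needs for its "2 per gene, max 10 genes" item.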
Biomarker Identification
For each multi-omics hub gene, assess biomarker potential:
- Diagnostic: Gene expression distinguishes disease vs healthy
- Prognostic: Expression/variant predicts outcome (cancer prognostics from HPA)
- Predictive: Variant/expression predicts treatment response (pharmacogenomics)
- Evidence level: Number of supporting omics layers
Mechanistic Hypothesis Generation
From the integrated data:
- Identify the most supported biological processes (GO + pathways)
- Map causal chain: genetic variant -> gene expression -> protein function -> pathway disruption -> disease
- Identify intervention points (druggable nodes in the causal chain)
- Generate testable hypotheses
Confidence Score Calculation
Calculate the Multi-Omics Confidence Score (0-100) based on:
- Data availability across layers
- Cross-layer concordance
- Evidence quality
- Clinical validation
Phase 8: Report Finalization
Executive Summary
Write a 2-3 sentence synthesis covering:
- Disease mechanism in systems terms
- Key genes/pathways identified
- Therapeutic opportunities
Final Report Quality Checklist
Before presenting to user, verify:
Tool Parameter Quick Reference
| Tool | Key Parameters | Notes |
|---|---|---|
| OpenTargets_get_disease_id_description_by_name | | Primary disambiguation |
| OSL_get_efo_id_by_disease_name | | Secondary disambiguation |
| OpenTargets_get_associated_targets_by_disease_efoId | | Returns top 25 genes |
| OpenTargets_get_evidence_by_datasource | , , , | Per-gene evidence |
| OpenTargets_search_gwas_studies_by_disease | , | GWAS studies |
| | , | GWAS Catalog |
| | or , | Rare variants |
| ExpressionAtlas_search_differential | , | DEGs |
| expression_atlas_disease_target_score | , (REQUIRED) | Expression scores |
| europepmc_disease_target_score | , (REQUIRED) | Literature scores |
| HPA_get_rna_expression_by_source | , , (ALL REQUIRED) | Tissue expression |
| STRING_get_interaction_partners | , (9606), | PPI partners |
| | , | PPI network |
| STRING_functional_enrichment | , | Functional enrichment |
| | , | Network significance |
| intact_search_interactions | , | Experimental PPIs |
| | , , , , (ALL REQ) | Tissue PPI |
| enrichr_gene_enrichment_analysis | , (BOTH REQUIRED) | Pathway/GO enrichment |
| ReactomeAnalysis_pathway_enrichment | (space-sep string) | Reactome enrichment |
| Reactome_map_uniprot_to_pathways | (UniProt accession) | Protein-pathway mapping |
| | | KEGG pathway search |
| | , | WikiPathways search |
| GO_get_annotations_for_gene | | GO annotations |
| QuickGO_annotations_by_gene | (e.g., 'UniProtKB:P02649') | Detailed GO |
| OpenTargets_get_associated_drugs_by_disease_efoId | , (REQUIRED) | Disease drugs |
| OpenTargets_get_target_tractability_by_ensemblID | | Druggability |
| | (REQUIRED), , | Clinical trials |
| | , | Literature |
| | , ('homo_sapiens' REQUIRED) | Gene lookup |
| | , , , | Gene info |
| OpenTargets_get_similar_entities_by_disease_efoId | , , (ALL REQUIRED) | Similar diseases |
Response Format Notes (Verified)
OpenTargets Associated Targets
```json
{
  "data": {
    "disease": {
      "id": "MONDO_0004975",
      "name": "Alzheimer disease",
      "associatedTargets": {
        "count": 2456,
        "rows": [
          {
            "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
            "score": 0.87
          }
        ]
      }
    }
  }
}
```
GWAS Catalog Associations
```json
{
  "data": [
    {
      "association_id": 216440893,
      "p_value": 2e-09,
      "or_per_copy_num": 0.94,
      "or_value": "0.94",
      "efo_traits": [{"..."}],
      "risk_frequency": "NR"
    }
  ],
  "metadata": {"pagination": {"totalElements": 1061816}}
}
```
STRING Interactions
```json
{
  "status": "success",
  "data": [
    {
      "stringId_A": "9606.ENSP00000252486",
      "stringId_B": "9606.ENSP00000466775",
      "preferredName_A": "APOE",
      "preferredName_B": "APOC2",
      "score": 0.999
    }
  ]
}
```
Reactome Enrichment
```json
{
  "data": {
    "token": "...",
    "pathways_found": 154,
    "pathways": [
      {
        "pathway_id": "R-HSA-1251985",
        "name": "Nuclear signaling by ERBB4",
        "species": "Homo sapiens",
        "is_disease": false,
        "is_lowest_level": true,
        "entities_found": 3,
        "entities_total": 47,
        "entities_ratio": 0.00291,
        "p_value": 4.0e-06,
        "fdr": 0.00068,
        "reactions_found": 3,
        "reactions_total": 34
      }
    ]
  }
}
```
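Given a response of this shape, the pathways worth reporting can be selected with a short filter. This is a sketch only: the FDR cutoff of 0.05 and the default restriction to lowest-level pathways are assumptions, not thresholds stated in this document, and `significant_pathways` and `to_gene_string` are hypothetical helper names.

```python
def to_gene_string(genes):
    """Join gene symbols into the space-separated string the tool expects."""
    unique = dict.fromkeys(g.strip() for g in genes if g and g.strip())
    return " ".join(unique)  # order preserved, duplicates dropped


def significant_pathways(response, max_fdr=0.05, lowest_level_only=True):
    """Return pathways passing the FDR cutoff, most significant first."""
    pathways = (response.get("data") or {}).get("pathways") or []
    kept = [
        p for p in pathways
        if p.get("fdr") is not None
        and p["fdr"] <= max_fdr
        and (p.get("is_lowest_level") or not lowest_level_only)
    ]
    return sorted(kept, key=lambda p: p["fdr"])
```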
HPA RNA Expression
```json
{
  "status": "success",
  "data": {
    "gene_name": "APOE",
    "source_type": "tissue",
    "source_name": "brain",
    "expression_value": "2714.9",
    "expression_level": "very high",
    "expression_unit": "nTPM"
  }
}
```
Enrichr Results
```json
{
  "status": "success",
  "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}"
}
```
NOTE: The data field is a JSON string that needs parsing.
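Parsing that string takes one `json.loads` call. A defensive sketch, with `parse_enrichr` as a hypothetical helper name; it also tolerates the case where the payload has already been decoded into a dict.

```python
import json


def parse_enrichr(response):
    """Return the decoded Enrichr payload, or {} if unavailable."""
    if response.get("status") != "success":
        return {}
    payload = response.get("data")
    if isinstance(payload, str):
        return json.loads(payload)  # the payload arrives as a JSON string
    return payload or {}
```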
Common Use Patterns
1. Comprehensive Disease Profiling
User: "Characterize Alzheimer's disease across omics layers"
-> Run all phases (Phase 0 through Phase 8)
-> Produce full multi-omics report
2. Therapeutic Target Discovery
User: "What are druggable targets for rheumatoid arthritis?"
-> Emphasize Phase 1 (genomics), Phase 6 (therapeutics), Phase 7 (integration)
-> Focus on tractability and clinical precedent
3. Biomarker Identification
User: "Find diagnostic biomarkers for pancreatic cancer"
-> Emphasize Phase 2 (transcriptomics), Phase 3 (proteomics), Phase 7 (biomarkers)
-> Focus on tissue-specific expression and diagnostic potential
4. Mechanism Elucidation
User: "What pathways are dysregulated in Crohn's disease?"
-> Emphasize Phase 4 (pathways), Phase 5 (GO), Phase 7 (mechanistic hypotheses)
-> Focus on pathway enrichment and cross-pathway connections
5. Drug Repurposing
User: "What existing drugs could be repurposed for ALS?"
-> Emphasize Phase 1 (genetics), Phase 6 (therapeutic landscape), Phase 7 (repurposing)
-> Focus on drugs targeting disease-associated genes
6. Systems Biology
User: "What are the hub genes and key pathways in type 2 diabetes?"
-> Emphasize Phase 3 (PPI network), Phase 4 (pathways), Phase 7 (network analysis)
-> Focus on hub genes and network modules
Edge Case Handling
Rare Diseases (limited data)
- Genomics layer may dominate (single gene)
- Limited GWAS data (monogenic)
- Focus on ClinVar variants, pathway consequences
- Confidence score will be lower (less cross-layer data)
Common Diseases (overwhelming data)
- Thousands of GWAS associations
- Prioritize by effect size and significance
- Focus on top 20-30 genes for downstream analysis
- Use strict significance thresholds (p < 5e-8)
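Applied to a GWAS Catalog response, this prioritization might look like the sketch below. It assumes the `p_value` and `or_per_copy_num` fields shown in the response format notes, ranks by p-value first and by the absolute log odds ratio second, and treats a missing odds ratio as zero effect. `prioritize_associations` is a hypothetical name, and the ordering of the two criteria is a design assumption.

```python
import math


def _effect_size(association):
    """Absolute log odds ratio, or 0.0 when the OR is missing or invalid."""
    try:
        odds_ratio = float(association.get("or_per_copy_num"))
    except (TypeError, ValueError):
        return 0.0
    return abs(math.log(odds_ratio)) if odds_ratio > 0 else 0.0


def prioritize_associations(response, p_threshold=5e-8, top_n=30):
    """Keep genome-wide significant hits, strongest evidence first."""
    ranked = []
    for association in response.get("data") or []:
        try:
            p_value = float(association.get("p_value"))
        except (TypeError, ValueError):
            continue  # skip records without a usable p-value
        if p_value < p_threshold:
            ranked.append((p_value, -_effect_size(association), association))
    ranked.sort(key=lambda item: item[:2])
    return [association for _, _, association in ranked[:top_n]]
```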
Cancer
- Include somatic mutations (if CIViC/cBioPortal available)
- Check cancer prognostics via HPA
- Include tumor-specific expression patterns
- Clinical trial landscape may be extensive
Monogenic Diseases
- Single gene dominates
- ClinVar/OMIM evidence is primary
- Pathway analysis reveals downstream effects
- Therapeutic landscape may be limited (gene therapy, enzyme replacement)
Polygenic Diseases
- Many weak genetic signals
- GWAS provides the gene list
- Pathway enrichment reveals convergent biology
- Network analysis identifies hub genes
Tissue Ambiguity
- Diseases affecting multiple tissues
- Query HPA for all relevant tissues
- Compare tissue-specific expression patterns
- Use tissue context from disease ontology
Fallback Strategies
If disease name not found
- Try synonyms
- Try broader disease category
- Try OMIM/UMLS ID mapping
- Report disambiguation failure and ask user
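The try-in-order pattern above can be written as a small loop. In this sketch, `search` stands for any callable that takes a term and returns a list of hits (for example, a wrapper around the name search tool); both it and `search_with_fallback` are hypothetical names.

```python
def search_with_fallback(search, terms):
    """Try each term in order; return (term_that_worked, hits).

    Returns (None, []) when every term fails, which is the signal to
    report the disambiguation failure and ask the user.
    """
    for term in dict.fromkeys(terms):  # preserve order, skip duplicates
        hits = search(term)
        if hits:
            return term, hits
    return None, []
```

The caller decides the order of `terms`, typically the original name, then synonyms, then the broader category.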
If no GWAS data
- Check ClinVar for rare variants
- Use OpenTargets genetic evidence
- Note in report as "Limited genetic data"
- Adjust confidence score accordingly
If no expression data
- Try different disease name/synonym
- Check HPA for individual gene expression
- Use OpenTargets expression evidence
- Note as "Limited transcriptomics data"
If no pathway enrichment
- Reduce gene list stringency
- Try different pathway databases
- Map individual genes to pathways via Reactome
- Note as "No significant pathway enrichment"
If no drugs found
- Check if disease is rare/orphan
- Look for drugs targeting individual genes
- Check clinical trials for investigational therapies
- Note as "No approved drugs - novel therapeutic opportunity"