tooluniverse-rare-disease-diagnosis


Rare Disease Diagnosis Advisor

Systematic diagnosis support for rare diseases using phenotype matching, gene panel prioritization, and variant interpretation across Orphanet, OMIM, HPO, ClinVar, and structure-based analysis.
KEY PRINCIPLES:
  1. Report-first approach - Create report file FIRST, update progressively
  2. Phenotype-driven - Convert symptoms to HPO terms before searching
  3. Multi-database triangulation - Cross-reference Orphanet, OMIM, OpenTargets
  4. Evidence grading - Grade diagnoses by supporting evidence strength
  5. Actionable output - Prioritized differential diagnosis with next steps
  6. Genetic counseling aware - Consider inheritance patterns and family history
  7. English-first queries - Always use English terms in tool calls (phenotype descriptions, gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language

When to Use

Apply when user asks:
  • "Patient has [symptoms], what rare disease could this be?"
  • "Unexplained developmental delay with [features]"
  • "WES found VUS in [gene], is this pathogenic?"
  • "What genes should we test for [phenotype]?"
  • "Differential diagnosis for [rare symptom combination]"

Critical Workflow Requirements

1. Report-First Approach (MANDATORY)

  1. Create the report file FIRST:
    • File name:
      [PATIENT_ID]_rare_disease_report.md
    • Initialize with all section headers
    • Add placeholder text:
      [Researching...]
  2. Progressively update as you gather data
  3. Output separate data files:
    • [PATIENT_ID]_gene_panel.csv
      - Prioritized genes for testing
    • [PATIENT_ID]_variant_interpretation.csv
      - If variants provided
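The report-first steps above can be sketched as a small helper that creates the file with every section header and a placeholder, then fills sections in as data arrives. The section names in `REPORT_SECTIONS` and the function names `init_report` and `update_section` are assumptions for illustration; the skill itself only fixes the file name pattern and the placeholder text.

```python
from pathlib import Path

# Hypothetical section list: the skill does not prescribe exact header names.
REPORT_SECTIONS = [
    "1. Phenotype Analysis",
    "2. Differential Diagnosis",
    "3. Recommended Gene Panel",
    "4. Variant Interpretation",
    "5. Structural Evidence",
    "6. Literature Evidence",
    "7. Summary and Next Steps",
]
PLACEHOLDER = "[Researching...]"


def init_report(patient_id, out_dir="."):
    """Create the report file with all headers and placeholder text."""
    path = Path(out_dir) / f"{patient_id}_rare_disease_report.md"
    lines = [f"# Rare Disease Report: {patient_id}", ""]
    for section in REPORT_SECTIONS:
        lines += [f"## {section}", "", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def update_section(path, section, body):
    """Replace the placeholder under one section header with real content."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    marker = f"## {section}\n\n{PLACEHOLDER}"
    if marker not in text:
        raise ValueError(f"No pending placeholder under section: {section}")
    new_text = text.replace(marker, f"## {section}\n\n{body}", 1)
    path.write_text(new_text, encoding="utf-8")
```

Calling `init_report("PT001")` before any tool call, then `update_section(...)` after each phase, keeps a partially complete report on disk even if a later phase fails.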

2. Citation Requirements (MANDATORY)

Every finding MUST include its source:

Candidate Disease: Marfan Syndrome

  • ORPHA: ORPHA:558
  • OMIM: 154700
  • Phenotype match: 85% (17/20 HPO terms)
  • Inheritance: AD
  • Gene: FBN1
Source: Orphanet via Orphanet_558, OMIM via OMIM_get_entry

---

Phase 0: Tool Verification

CRITICAL: Verify tool parameters before calling.

Known Parameter Corrections

| Tool | WRONG Parameter | CORRECT Parameter |
| --- | --- | --- |
| OpenTargets_get_associated_diseases_by_target_ensemblId | ensemblID | ensemblId |
| ClinVar_get_variant_by_id | variant_id | id |
| MyGene_query_genes | gene | q |
| gnomAD_get_variant_frequencies | variant | variant_id |
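
One way to apply the corrections listed above is to keep them in a lookup and rename arguments before each call. This is a minimal sketch: the mapping is copied from the table, while `PARAM_CORRECTIONS` and `correct_params` are hypothetical names, not part of ToolUniverse.

```python
# Wrong-to-correct parameter names, copied from the corrections table.
PARAM_CORRECTIONS = {
    "OpenTargets_get_associated_diseases_by_target_ensemblId": {"ensemblID": "ensemblId"},
    "ClinVar_get_variant_by_id": {"variant_id": "id"},
    "MyGene_query_genes": {"gene": "q"},
    "gnomAD_get_variant_frequencies": {"variant": "variant_id"},
}


def correct_params(tool_name, params):
    """Return a copy of params with known-wrong names renamed."""
    renames = PARAM_CORRECTIONS.get(tool_name, {})
    fixed = {}
    for key, value in params.items():
        new_key = renames.get(key, key)
        # Refuse to silently overwrite when both spellings were supplied.
        if new_key != key and new_key in params:
            raise ValueError(f"Both '{key}' and '{new_key}' given for {tool_name}")
        fixed[new_key] = value
    return fixed
```

Tools that are not in the table pass through unchanged, so the helper can wrap every call without special cases.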

Workflow Overview

Phase 1: Phenotype Standardization
├── Convert symptoms to HPO terms
├── Identify core vs. variable features
└── Note age of onset, inheritance hints
Phase 2: Disease Matching
├── Orphanet phenotype search
├── OMIM clinical synopsis match
├── OpenTargets disease associations
└── OUTPUT: Ranked differential diagnosis
Phase 3: Gene Panel Identification
├── Extract genes from top diseases
├── Cross-reference expression (GTEx)
├── Prioritize by evidence strength
└── OUTPUT: Recommended gene panel
Phase 3.5: Expression & Tissue Context (NEW)
├── CELLxGENE: Cell-type specific expression
├── ChIPAtlas: Regulatory context (TF binding)
├── Tissue-specific gene networks
└── OUTPUT: Expression validation
Phase 3.6: Pathway Analysis (NEW)
├── KEGG: Metabolic/signaling pathways
├── Reactome: Biological processes
├── IntAct: Protein-protein interactions
└── OUTPUT: Biological context
Phase 4: Variant Interpretation (if provided)
├── ClinVar pathogenicity lookup
├── gnomAD population frequency
├── Protein domain/function impact
├── ENCODE/ChIPAtlas: Regulatory variant impact
└── OUTPUT: Variant classification
Phase 5: Structure Analysis (for VUS)
├── NvidiaNIM_alphafold2 → Predict structure
├── Map variant to structure
├── Assess functional domain impact
└── OUTPUT: Structural evidence
Phase 6: Literature Evidence (NEW)
├── PubMed: Published studies
├── BioRxiv/MedRxiv: Preprints
├── OpenAlex: Citation analysis
└── OUTPUT: Literature support
Phase 7: Report Synthesis
├── Prioritized differential diagnosis
├── Recommended genetic testing
├── Next steps for clinician
└── OUTPUT: Final report

Phase 1: Phenotype Standardization

1.1 Convert Symptoms to HPO Terms

python
def standardize_phenotype(tu, symptoms_list):
    """Convert clinical descriptions to HPO terms."""
    hpo_terms = []
    
    for symptom in symptoms_list:
        # Search HPO for matching terms
        results = tu.tools.HPO_search_terms(query=symptom)
        if results:
            hpo_terms.append({
                'original': symptom,
                'hpo_id': results[0]['id'],
                'hpo_name': results[0]['name'],
                'confidence': 'exact' if symptom.lower() in results[0]['name'].lower() else 'partial'
            })
    
    return hpo_terms

1.2 Phenotype Categories

| Category | Examples | Weight |
| --- | --- | --- |
| Core features | Always present in disease | High |
| Variable features | Present in >50% | Medium |
| Occasional features | Present in <50% | Low |
| Age-specific | Onset-dependent | Context |

1.3 Output for Report

1. Phenotype Analysis

1.1 Standardized HPO Terms

| Clinical Feature | HPO Term | HPO ID | Category |
| --- | --- | --- | --- |
| Tall stature | Tall stature | HP:0000098 | Core |
| Long fingers | Arachnodactyly | HP:0001166 | Core |
| Heart murmur | Cardiac murmur | HP:0030148 | Variable |
| Joint hypermobility | Joint hypermobility | HP:0001382 | Core |

Total HPO Terms: 8
Onset: Childhood
Family History: Father with similar features (AD suspected)
Source: HPO via HPO_search_terms

---

Phase 2: Disease Matching

2.1 Orphanet Disease Search (NEW TOOLS)

python
def match_diseases_orphanet(tu, symptom_keywords):
    """Find rare diseases matching symptoms using Orphanet."""
    candidate_diseases = []
    
    # Search Orphanet by disease keywords
    for keyword in symptom_keywords:
        results = tu.tools.Orphanet_search_diseases(
            operation="search_diseases",
            query=keyword
        )
        if results.get('status') == 'success':
            candidate_diseases.extend(results['data']['results'])
    
    # Get genes for each disease
    for disease in candidate_diseases:
        orpha_code = disease.get('ORPHAcode')
        genes = tu.tools.Orphanet_get_genes(
            operation="get_genes",
            orpha_code=orpha_code
        )
        disease['genes'] = genes.get('data', {}).get('genes', [])
    
    return deduplicate_and_rank(candidate_diseases)
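
`match_diseases_orphanet` calls `deduplicate_and_rank`, which is not defined anywhere in this section. The sketch below is one possible implementation, not the skill's own: it assumes each result dict carries an `ORPHAcode` key (as used above) and ranks diseases by how many keyword searches returned them. The `hit_count` field is an illustrative addition.

```python
def deduplicate_and_rank(candidate_diseases):
    """Merge duplicate Orphanet hits and rank by how often each was returned."""
    merged = {}
    for disease in candidate_diseases:
        code = disease.get('ORPHAcode')
        if code is None:
            # Cannot deduplicate a record without an identifier; skip it.
            continue
        if code not in merged:
            # Keep the first record seen for this disease.
            merged[code] = {**disease, 'hit_count': 0}
        merged[code]['hit_count'] += 1

    # sorted() is stable, so ties keep their first-seen order.
    return sorted(merged.values(), key=lambda d: d['hit_count'], reverse=True)
```

Hit count is a crude proxy for relevance. Once HPO annotations are available for each disease, the phenotype overlap score from section 2.4 is a better ranking key.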

2.2 OMIM Cross-Reference (NEW TOOLS)

python
def cross_reference_omim(tu, orphanet_diseases, gene_symbols):
    """Get OMIM details for the genes behind the candidate diseases.

    Returns a dict mapping each gene symbol to a list of OMIM entries,
    because one gene search can return several MIM numbers.
    orphanet_diseases is kept in the signature but is not used here.
    """
    omim_data = {}
    
    # Search OMIM for each gene
    for gene in gene_symbols:
        search_result = tu.tools.OMIM_search(
            operation="search",
            query=gene,
            limit=5
        )
        if search_result.get('status') != 'success':
            continue
        
        for entry in search_result['data'].get('entries', []):
            mim_number = entry.get('mimNumber')
            if mim_number is None:
                continue
            
            # Get detailed entry
            details = tu.tools.OMIM_get_entry(
                operation="get_entry",
                mim_number=str(mim_number)
            )
            
            # Get clinical synopsis (phenotype features)
            synopsis = tu.tools.OMIM_get_clinical_synopsis(
                operation="get_clinical_synopsis",
                mim_number=str(mim_number)
            )
            
            # Append rather than assign, so earlier entries for the
            # same gene are not overwritten by later ones
            omim_data.setdefault(gene, []).append({
                'mim_number': mim_number,
                'details': details.get('data', {}),
                'clinical_synopsis': synopsis.get('data', {})
            })
    
    return omim_data

2.3 DisGeNET Gene-Disease Associations (NEW TOOLS)

python
def get_gene_disease_associations(tu, gene_symbols):
    """Get gene-disease associations from DisGeNET."""
    associations = {}
    
    for gene in gene_symbols:
        # Get diseases associated with gene
        result = tu.tools.DisGeNET_search_gene(
            operation="search_gene",
            gene=gene,
            limit=20
        )
        
        if result.get('status') == 'success':
            associations[gene] = result['data'].get('associations', [])
    
    return associations

def get_disease_genes_disgenet(tu, disease_name):
    """Get all genes associated with a disease."""
    result = tu.tools.DisGeNET_search_disease(
        operation="search_disease",
        disease=disease_name,
        limit=30
    )
    return result.get('data', {}).get('associations', [])

2.4 Phenotype Overlap Scoring

| Match Level | Score | Criteria |
| --- | --- | --- |
| Excellent | >80% | Most core + variable features match |
| Good | 60-80% | Core features match, some variable |
| Possible | 40-60% | Some overlap, needs consideration |
| Unlikely | <40% | Poor phenotype fit |
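
The thresholds above can be turned into a small scoring function. This sketch assumes the score is the share of the patient's HPO terms that also appear in the disease's annotation set, which is consistent with the "85% (17/20 HPO terms)" example but is not spelled out by the skill. `phenotype_match` is a hypothetical name, and the sketch ignores the core/variable weighting from section 1.2.

```python
def phenotype_match(patient_hpo_ids, disease_hpo_ids):
    """Score phenotype overlap and map it to the match levels in the table."""
    patient = set(patient_hpo_ids)
    if not patient:
        raise ValueError("At least one patient HPO term is required")

    matched = patient & set(disease_hpo_ids)
    score = 100.0 * len(matched) / len(patient)

    # Boundaries follow the table: >80 Excellent, 60-80 Good,
    # 40-60 Possible, <40 Unlikely.
    if score > 80:
        level = 'Excellent'
    elif score >= 60:
        level = 'Good'
    elif score >= 40:
        level = 'Possible'
    else:
        level = 'Unlikely'

    return {'score': round(score, 1), 'matched': sorted(matched), 'level': level}
```

A weighted variant would multiply each matched term by its category weight before dividing. The unweighted form is shown because the weights in section 1.2 are qualitative (High, Medium, Low) rather than numeric.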

2.5 Output for Report

2. Differential Diagnosis

Top Candidate Diseases (Ranked by Phenotype Match)

| Rank | Disease | ORPHA | OMIM | Match | Inheritance | Key Gene(s) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Marfan syndrome | 558 | 154700 | 85% | AD | FBN1 |
| 2 | Loeys-Dietz syndrome | 60030 | 609192 | 72% | AD | TGFBR1, TGFBR2 |
| 3 | Ehlers-Danlos, vascular | 286 | 130050 | 65% | AD | COL3A1 |
| 4 | Homocystinuria | 394 | 236200 | 58% | AR | CBS |

DisGeNET Gene-Disease Evidence

| Gene | Associated Diseases | GDA Score | Evidence |
| --- | --- | --- | --- |
| FBN1 | Marfan syndrome, MASS phenotype | 0.95 | ★★★ Curated |
| TGFBR1 | Loeys-Dietz syndrome | 0.89 | ★★★ Curated |
| COL3A1 | vascular EDS | 0.91 | ★★★ Curated |

Source: DisGeNET via DisGeNET_search_gene

Disease Details

1. Marfan Syndrome (★★★)

ORPHA: 558 | OMIM: 154700 | Prevalence: 1-5/10,000
Phenotype Match Analysis:

| Patient Feature | Disease Feature | Match |
| --- | --- | --- |
| Tall stature | Present in 95% | |
| Arachnodactyly | Present in 90% | |
| Joint hypermobility | Present in 85% | |
| Cardiac murmur | Aortic root dilation (70%) | Partial |

OMIM Clinical Synopsis (via OMIM_get_clinical_synopsis):
  • Cardiovascular: Aortic root dilation, mitral valve prolapse
  • Skeletal: Scoliosis, pectus excavatum, tall stature
  • Ocular: Ectopia lentis, myopia
Diagnostic Criteria: Ghent nosology (2010)
  • Aortic root dilation/dissection + FBN1 mutation = Diagnosis
  • Without genetic testing: aortic root dilation + ectopia lentis, or aortic root dilation + systemic score ≥7
Inheritance: Autosomal dominant (25% de novo)
Source: Orphanet via Orphanet_get_disease, OMIM via OMIM_get_entry, DisGeNET

---

Phase 3: Gene Panel Identification

3.1 Extract Disease Genes

python
def build_gene_panel(tu, candidate_diseases):
    """Build prioritized gene panel from candidate diseases."""
    genes = {}
    
    for disease in candidate_diseases:
        for gene in disease['genes']:
            if gene not in genes:
                genes[gene] = {
                    'symbol': gene,
                    'diseases': [],
                    'evidence_level': 'unknown'
                }
            genes[gene]['diseases'].append(disease['name'])
    
    return genes

3.1.1 ClinGen Gene-Disease Validity Check (NEW)

Critical: Always verify gene-disease validity through ClinGen before including in panel.
python
def get_clingen_gene_evidence(tu, gene_symbol):
    """
    Get ClinGen gene-disease validity and dosage sensitivity.
    ESSENTIAL for rare disease gene panel prioritization.
    """
    
    # 1. Gene-disease validity classification
    validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)
    
    validity_levels = []
    diseases_with_validity = []
    if validity.get('data'):
        for entry in validity.get('data', []):
            validity_levels.append(entry.get('Classification'))
            diseases_with_validity.append({
                'disease': entry.get('Disease Label'),
                'mondo_id': entry.get('Disease ID (MONDO)'),
                'classification': entry.get('Classification'),
                'inheritance': entry.get('Inheritance')
            })
    
    # 2. Dosage sensitivity (critical for CNV interpretation)
    dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)
    
    hi_score = None
    ts_score = None
    if dosage.get('data'):
        for entry in dosage.get('data', []):
            hi_score = entry.get('Haploinsufficiency Score')
            ts_score = entry.get('Triplosensitivity Score')
            break
    
    # 3. Clinical actionability (return of findings context)
    actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)
    is_actionable = (actionability.get('adult_count', 0) > 0 or 
                     actionability.get('pediatric_count', 0) > 0)
    
    # Determine best evidence level
    level_priority = ['Definitive', 'Strong', 'Moderate', 'Limited', 'Disputed', 'Refuted']
    best_level = 'Not curated'
    for level in level_priority:
        if level in validity_levels:
            best_level = level
            break
    
    return {
        'gene': gene_symbol,
        'evidence_level': best_level,
        'diseases_curated': diseases_with_validity,
        'haploinsufficiency_score': hi_score,
        'triplosensitivity_score': ts_score,
        'is_actionable': is_actionable,
        'include_in_panel': best_level in ['Definitive', 'Strong', 'Moderate']
    }

def prioritize_genes_with_clingen(tu, gene_list):
    """Prioritize genes using ClinGen evidence levels."""
    
    prioritized = []
    for gene in gene_list:
        evidence = get_clingen_gene_evidence(tu, gene)
        
        # Score based on ClinGen classification
        score = 0
        if evidence['evidence_level'] == 'Definitive':
            score = 5
        elif evidence['evidence_level'] == 'Strong':
            score = 4
        elif evidence['evidence_level'] == 'Moderate':
            score = 3
        elif evidence['evidence_level'] == 'Limited':
            score = 1
        # Disputed/Refuted get 0
        
        # Bonus for haploinsufficiency score 3
        if evidence['haploinsufficiency_score'] == '3':
            score += 1
        
        # Bonus for actionability
        if evidence['is_actionable']:
            score += 1
        
        prioritized.append({
            **evidence,
            'priority_score': score
        })
    
    # Sort by priority score
    return sorted(prioritized, key=lambda x: x['priority_score'], reverse=True)
ClinGen Classification Impact on Panel:

| Classification | Include in Panel? | Priority |
| --- | --- | --- |
| Definitive | YES - mandatory | Highest |
| Strong | YES - highly recommended | High |
| Moderate | YES | Medium |
| Limited | Include but flag | Low |
| Disputed | Exclude or separate | Avoid |
| Refuted | EXCLUDE | Do not test |
| Not curated | Use other evidence | Variable |

3.2 Gene Prioritization Criteria

| Priority | Criteria | Points |
| --- | --- | --- |
| Tier 1 | Gene causes #1 ranked disease | +5 |
| Tier 2 | Gene causes multiple candidates | +3 |
| Tier 3 | ClinGen "Definitive" evidence | +3 |
| Tier 4 | Expressed in affected tissue | +2 |
| Tier 5 | Constraint score pLI >0.9 | +1 |
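
The point values above can be combined into a single score per gene. The sketch assumes the tiers are additive, which the "+N" notation suggests but the skill does not state outright. The function name and argument names are hypothetical.

```python
def gene_priority_points(causes_top_disease, n_candidate_diseases,
                         clingen_level, expressed_in_affected_tissue, pli):
    """Sum the tier points from the prioritization table for one gene."""
    points = 0
    if causes_top_disease:                 # Tier 1
        points += 5
    if n_candidate_diseases >= 2:          # Tier 2: linked to multiple candidates
        points += 3
    if clingen_level == 'Definitive':      # Tier 3
        points += 3
    if expressed_in_affected_tissue:       # Tier 4
        points += 2
    if pli is not None and pli > 0.9:      # Tier 5; pLI may be missing
        points += 1
    return points


# Worked example using values from the sample report in this document:
# FBN1 causes the top-ranked disease, is ClinGen Definitive, is expressed
# in affected tissue, and has pLI 1.00 -> 5 + 3 + 2 + 1 = 11
fbn1 = gene_priority_points(True, 1, 'Definitive', True, 1.00)
```

Sorting genes by this score in descending order gives the panel order. Genes that ClinGen marks Disputed or Refuted should still be filtered out beforehand, as described in section 3.1.1.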

3.3 Expression Validation

python
def validate_expression(tu, gene_symbol, affected_tissue):
    """Check if gene is expressed in relevant tissue."""
    # Get Ensembl ID
    gene_info = tu.tools.MyGene_query_genes(q=gene_symbol, species="human")
    ensembl_id = gene_info.get('ensembl', {}).get('gene')
    
    # Check GTEx expression
    expression = tu.tools.GTEx_get_median_gene_expression(
        gencode_id=f"{ensembl_id}.latest"
    )
    
    return expression.get(affected_tissue, 0) > 1  # TPM > 1

3.4 Output for Report

3. Recommended Gene Panel

3.1 Prioritized Genes for Testing

| Priority | Gene | Diseases | Evidence | Constraint (pLI) | Expression |
| --- | --- | --- | --- | --- | --- |
| ★★★ | FBN1 | Marfan syndrome | Definitive | 1.00 | Heart, aorta |
| ★★★ | TGFBR1 | Loeys-Dietz 1 | Definitive | 0.98 | Ubiquitous |
| ★★★ | TGFBR2 | Loeys-Dietz 2 | Definitive | 0.99 | Ubiquitous |
| ★★☆ | COL3A1 | EDS vascular | Definitive | 1.00 | Connective tissue |
| ★☆☆ | CBS | Homocystinuria | Definitive | 0.00 | Liver |

3.2 Panel Design Recommendation

Minimum Panel (high yield): FBN1, TGFBR1, TGFBR2, COL3A1
Extended Panel (+differential): Add CBS, SMAD3, ACTA2
Testing Strategy:
  1. Start with FBN1 sequencing (highest pre-test probability)
  2. If negative, proceed to full connective tissue panel
  3. Consider WES if panel negative
Source: ClinGen via gene-disease validity, GTEx expression

---

Phase 3.5: Expression & Tissue Context (ENHANCED)

3.5.1 Cell-Type Specific Expression (CELLxGENE)

python
def get_cell_type_expression(tu, gene_symbol, affected_tissues):
    """Get single-cell expression to validate tissue relevance."""
    
    # Get expression across cell types
    expression = tu.tools.CELLxGENE_get_expression_data(
        gene=gene_symbol,
        tissue=affected_tissues[0] if affected_tissues else "all"
    )
    
    # Get cell type metadata
    cell_metadata = tu.tools.CELLxGENE_get_cell_metadata(
        gene=gene_symbol
    )
    
    # Identify high-expression cell types
    high_expression = [
        ct for ct in expression 
        if ct.get('mean_expression', 0) > 1.0  # TPM > 1
    ]
    
    return {
        'expression_data': expression,
        'high_expression_cells': high_expression,
        'total_cell_types': len(cell_metadata)
    }
Why it matters: Confirms candidate genes are expressed in disease-relevant tissues/cells.

3.5.2 Regulatory Context (ChIPAtlas)

python
def get_regulatory_context(tu, gene_symbol):
    """Get transcription factor binding for candidate genes."""
    
    # Search for TF binding near gene
    tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
        gene=gene_symbol,
        cell_type="all"
    )
    
    # Get specific binding peaks
    peaks = tu.tools.ChIPAtlas_get_peak_data(
        gene=gene_symbol,
        experiment_type="TF"
    )
    
    return {
        'transcription_factors': tf_binding,
        'regulatory_peaks': peaks
    }
Why it matters: Identifies regulatory mechanisms that may be disrupted in disease.

3.5.3 Output for Report

3.5 Expression & Regulatory Context

Cell-Type Specific Expression (CELLxGENE)

| Gene | Top Expressing Cell Types | Expression Level | Tissue Relevance |
| --- | --- | --- | --- |
| FBN1 | Fibroblasts, Smooth muscle | High (TPM=45) | ✓ Connective tissue |
| TGFBR1 | Endothelial, Fibroblasts | Medium (TPM=12) | ✓ Vascular |
| COL3A1 | Fibroblasts, Myofibroblasts | Very High (TPM=120) | ✓ Connective tissue |

Interpretation: All top candidate genes show high expression in disease-relevant cell types (connective tissue, vascular cells), supporting their candidacy.

Regulatory Context (ChIPAtlas)

| Gene | Key TF Regulators | Regulatory Significance |
| --- | --- | --- |
| FBN1 | TGFβ pathway (SMAD2/3), AP-1 | TGFβ-responsive |
| TGFBR1 | STAT3, NF-κB | Inflammation-responsive |

Source: CELLxGENE Census, ChIPAtlas

---

Phase 3.6: Pathway Analysis (NEW)

3.6.1 KEGG Pathway Context

python
def get_pathway_context(tu, gene_symbols):
    """Get pathway context for candidate genes."""
    
    pathways = {}
    for gene in gene_symbols:
        # Search KEGG for gene
        kegg_genes = tu.tools.kegg_find_genes(query=f"hsa:{gene}")
        
        if kegg_genes:
            # Get pathway membership
            gene_info = tu.tools.kegg_get_gene_info(gene_id=kegg_genes[0]['id'])
            pathways[gene] = gene_info.get('pathways', [])
    
    return pathways

3.6.2 Protein-Protein Interactions (IntAct)

python
def get_protein_interactions(tu, gene_symbol):
    """Get interaction partners for a candidate gene."""
    
    # Search IntAct for interactions
    interactions = tu.tools.intact_search_interactions(
        query=gene_symbol,
        species="human"
    )
    
    # Get interaction network
    network = tu.tools.intact_get_interaction_network(
        gene=gene_symbol,
        depth=1  # Direct interactors only
    )
    
    return {
        'interactions': interactions,
        'network': network,
        # Counts returned interaction records; guards against an empty or None result
        'interactor_count': len(interactions or [])
    }

3.6.3 Output for Report


3.6 Pathway & Network Context

KEGG Pathways

| Gene | Key Pathways | Biological Process |
|---|---|---|
| FBN1 | ECM-receptor interaction (hsa04512) | Extracellular matrix |
| TGFBR1/2 | TGF-beta signaling (hsa04350) | Cell signaling |
| COL3A1 | Focal adhesion (hsa04510) | Cell-matrix adhesion |

Shared Pathway Analysis

Convergent pathways (≥2 candidate genes):
  • TGF-beta signaling pathway: FBN1, TGFBR1, TGFBR2, SMAD3
  • ECM organization: FBN1, COL3A1
Interpretation: Candidate genes converge on TGF-beta signaling and extracellular matrix pathways, consistent with connective tissue disorder etiology.
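
The convergence check described here (pathways shared by at least two candidate genes) can be sketched as a small helper. This is a minimal illustration, assuming the pathway lookup yields a dict that maps each gene symbol to a list of pathway names; the function name `find_convergent_pathways` and the sample input are hypothetical and not part of the toolset.

```python
from collections import defaultdict

def find_convergent_pathways(gene_pathways, min_genes=2):
    """Return pathways that contain at least `min_genes` candidate genes."""
    members = defaultdict(set)
    for gene, pathways in gene_pathways.items():
        for pathway in pathways:
            members[pathway].add(gene)
    # Keep only pathways shared by enough genes; sort genes for stable output
    return {
        pathway: sorted(genes)
        for pathway, genes in members.items()
        if len(genes) >= min_genes
    }

# Illustrative input only (not real query output)
example = {
    "FBN1": ["TGF-beta signaling", "ECM organization"],
    "TGFBR1": ["TGF-beta signaling"],
    "COL3A1": ["ECM organization", "Focal adhesion"],
}
convergent = find_convergent_pathways(example)
```

Pathways hit by a single gene (here "Focal adhesion") are dropped, so the result lists only the pathways worth reporting as convergent.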

Protein-Protein Interactions (IntAct)

| Gene | Direct Interactors | Notable Partners |
|---|---|---|
| FBN1 | 42 | LTBP1, TGFB1, ADAMTS10 |
| TGFBR1 | 68 | TGFBR2, SMAD2, SMAD3 |

Source: KEGG, IntAct, Reactome

---

Phase 4: Variant Interpretation (If Provided)

4.1 ClinVar Lookup

python
def interpret_variant(tu, variant_hgvs):
    """Get ClinVar interpretation for variant."""
    result = tu.tools.ClinVar_search_variants(query=variant_hgvs)
    
    return {
        'clinvar_id': result.get('id'),
        'classification': result.get('clinical_significance'),
        'review_status': result.get('review_status'),
        'conditions': result.get('conditions'),
        'last_evaluated': result.get('last_evaluated')
    }

4.2 Population Frequency

python
def check_population_frequency(tu, variant_id):
    """Get gnomAD allele frequency and a coarse rarity label."""
    freq = tu.tools.gnomAD_get_variant_frequencies(variant_id=variant_id)
    
    # A variant absent from gnomAD may come back without a frequency;
    # treat that as "absent" instead of raising a KeyError.
    af = (freq or {}).get('allele_frequency')
    
    # Interpret rarity
    if af is None:
        rarity = "Absent from gnomAD"
    elif af < 0.00001:
        rarity = "Ultra-rare"
    elif af < 0.0001:
        rarity = "Rare"
    elif af < 0.01:
        rarity = "Low frequency"
    else:
        rarity = "Common (likely benign)"
    
    return freq, rarity

4.3 Computational Pathogenicity Prediction (ENHANCED)

Use state-of-the-art prediction tools for VUS interpretation:
python
def comprehensive_vus_prediction(tu, variant_info):
    """
    Combine multiple prediction tools for VUS classification.
    Critical for rare disease variants not in ClinVar.
    """
    predictions = {}
    
    # 1. CADD - Deleteriousness (NEW API)
    cadd = tu.tools.CADD_get_variant_score(
        chrom=variant_info['chrom'],
        pos=variant_info['pos'],
        ref=variant_info['ref'],
        alt=variant_info['alt'],
        version="GRCh38-v1.7"
    )
    if cadd.get('status') == 'success':
        # phred_score may be missing or None; only compare when it is numeric
        phred = cadd['data'].get('phred_score')
        predictions['cadd'] = {
            'score': phred,
            'interpretation': cadd['data'].get('interpretation'),
            'acmg': 'PP3' if phred is not None and phred >= 20 else 'neutral'
        }
    
    # 2. AlphaMissense - DeepMind pathogenicity (NEW)
    if variant_info.get('uniprot_id') and variant_info.get('aa_change'):
        am = tu.tools.AlphaMissense_get_variant_score(
            uniprot_id=variant_info['uniprot_id'],
            variant=variant_info['aa_change']  # e.g., "E1541K"
        )
        if am.get('status') == 'success' and am.get('data'):
            classification = am['data'].get('classification')
            predictions['alphamissense'] = {
                'score': am['data'].get('pathogenicity_score'),
                'classification': classification,
                'acmg': 'PP3 (strong)' if classification == 'pathogenic' else (
                    'BP4 (strong)' if classification == 'benign' else 'neutral'
                )
            }
    
    # 3. EVE - Evolutionary prediction (NEW)
    eve = tu.tools.EVE_get_variant_score(
        chrom=variant_info['chrom'],
        pos=variant_info['pos'],
        ref=variant_info['ref'],
        alt=variant_info['alt']
    )
    if eve.get('status') == 'success':
        eve_scores = eve['data'].get('eve_scores', [])
        if eve_scores:
            # A missing score is recorded as 'neutral', not as benign evidence
            eve_score = eve_scores[0].get('eve_score')
            if eve_score is None:
                eve_acmg = 'neutral'
            else:
                eve_acmg = 'PP3' if eve_score > 0.5 else 'BP4'
            predictions['eve'] = {
                'score': eve_score,
                'classification': eve_scores[0].get('classification'),
                'acmg': eve_acmg
            }
    
    # 4. SpliceAI - Splice variant prediction (NEW)
    # Use for intronic, synonymous, or exonic variants near splice sites
    variant_str = f"chr{variant_info['chrom']}-{variant_info['pos']}-{variant_info['ref']}-{variant_info['alt']}"
    splice = tu.tools.SpliceAI_predict_splice(
        variant=variant_str,
        genome="38"
    )
    if splice.get('data'):
        max_score = splice['data'].get('max_delta_score', 0)
        interpretation = splice['data'].get('interpretation', '')
        
        if max_score >= 0.8:
            splice_acmg = 'PP3 (strong) - high splice impact'
        elif max_score >= 0.5:
            splice_acmg = 'PP3 (moderate) - splice impact'
        elif max_score >= 0.2:
            splice_acmg = 'PP3 (supporting) - possible splice effect'
        else:
            splice_acmg = 'BP7 (if synonymous) - no splice impact'
        
        predictions['spliceai'] = {
            'max_delta_score': max_score,
            'interpretation': interpretation,
            'scores': splice['data'].get('scores', []),
            'acmg': splice_acmg
        }
    
    # Consensus for PP3/BP4
    damaging = sum(1 for p in predictions.values() if 'PP3' in p.get('acmg', ''))
    benign = sum(1 for p in predictions.values() if 'BP4' in p.get('acmg', ''))
    
    return {
        'predictions': predictions,
        'consensus': {
            'damaging_count': damaging,
            'benign_count': benign,
            'pp3_applicable': damaging >= 2 and benign == 0,
            'bp4_applicable': benign >= 2 and damaging == 0
        }
    }
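
The `aa_change` value passed to AlphaMissense above uses the one-letter form (e.g., "E1541K"), while variants are usually reported in three-letter HGVS protein notation (e.g., p.Glu1541Lys). The following is a minimal sketch of that conversion for simple missense changes only. The function name `parse_missense_hgvs` and the `charge_reversal` flag are illustrative additions, and histidine is treated as uncharged here as a simplifying assumption.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
# Side-chain charge at physiological pH (assumption: His counted as neutral)
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

def parse_missense_hgvs(hgvs_p):
    """Convert e.g. 'p.Glu1541Lys' into one-letter form plus basic properties."""
    match = re.fullmatch(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", hgvs_p)
    if not match:
        raise ValueError(f"Not a simple missense HGVS string: {hgvs_p}")
    ref3, pos, alt3 = match.groups()
    if ref3 not in THREE_TO_ONE or alt3 not in THREE_TO_ONE:
        raise ValueError(f"Unsupported residue code in: {hgvs_p}")
    ref, alt = THREE_TO_ONE[ref3], THREE_TO_ONE[alt3]
    return {
        "short": f"{ref}{pos}{alt}",
        "position": int(pos),
        # True when a negative residue becomes positive or vice versa
        "charge_reversal": CHARGE.get(ref, 0) * CHARGE.get(alt, 0) == -1,
    }

parsed = parse_missense_hgvs("p.Glu1541Lys")
```

Nonsense, frameshift, and other non-missense notations are rejected with a `ValueError` rather than guessed at.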

4.4 ACMG Classification Criteria

| Evidence Type | Criteria | Weight |
|---|---|---|
| PVS1 | Null variant in gene where LOF is mechanism | Very Strong |
| PS1 | Same amino acid change as established pathogenic | Strong |
| PM2 | Absent from population databases | Moderate |
| PP3 | Computational evidence supports deleterious (AlphaMissense, CADD, EVE, SpliceAI) | Supporting |
| BA1 | Allele frequency >5% | Benign standalone |
Enhanced PP3 Evidence (NEW):
  • AlphaMissense pathogenic (>0.564) = Strong PP3 support (~90% accuracy)
  • CADD ≥20 + EVE >0.5 = Multiple concordant predictions
  • Agreement from 2+ predictors strengthens PP3 evidence
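
Individual criteria such as those in the table are combined into a final class using the combining rules of the 2015 ACMG/AMP guideline. The sketch below is a simplified illustration of those rules, not a validated classifier: the function name `classify_acmg` is hypothetical, strength is inferred only from the code prefix (PVS, PS, PM, PP, BA, BS, BP), and strength modifiers such as upgraded or downgraded criteria are not handled.

```python
def classify_acmg(codes):
    """Combine ACMG/AMP criterion codes (e.g. ['PS1', 'PM2']) into a class."""
    n = {"PVS": 0, "PS": 0, "PM": 0, "PP": 0, "BA": 0, "BS": 0, "BP": 0}
    for code in codes:
        for prefix in n:
            if code.startswith(prefix):
                n[prefix] += 1
                break
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = n["BA"] >= 1 or n["BS"] >= 2
    likely_benign = (n["BS"] >= 1 and n["BP"] >= 1) or n["BP"] >= 2

    # Conflicting pathogenic and benign evidence defaults to uncertain
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"

result = classify_acmg(["PS1", "PM2", "PP3"])
```

Here one strong plus one moderate criterion is enough for "Likely pathogenic", whereas one moderate plus two supporting criteria would stay at "Uncertain significance".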

4.5 Output for Report


4. Variant Interpretation

4.1 Variant: FBN1 c.4621G>A (p.Glu1541Lys)

| Property | Value | Interpretation |
|---|---|---|
| Gene | FBN1 | Marfan syndrome gene |
| Consequence | Missense | Amino acid change |
| ClinVar | VUS | Uncertain significance |
| gnomAD AF | 0.000004 | Ultra-rare (PM2) |

4.2 Computational Predictions (NEW)

| Predictor | Score | Classification | ACMG Support |
|---|---|---|---|
| AlphaMissense | 0.78 | Pathogenic | PP3 (strong) |
| CADD PHRED | 28.5 | Top ~0.14% most deleterious | PP3 |
| EVE | 0.72 | Likely pathogenic | PP3 |

Consensus: 3/3 predictors concordant damaging → Strong PP3 support

Source: AlphaMissense, CADD API, EVE via Ensembl VEP

4.3 ACMG Evidence Summary

| Criterion | Evidence | Strength |
|---|---|---|
| PM2 | Extremely rare in gnomAD (AF = 0.000004, below 0.00001) | Moderate |
| PP3 | AlphaMissense + CADD + EVE concordant | Supporting (strong) |
| PP4 | Phenotype highly specific for Marfan | Supporting |
| PP1 (applied as Strong) | Co-segregation with disease in multiple affected family members | Strong |

Preliminary Classification: Likely Pathogenic (1 Strong + 1 Moderate + 2 Supporting)

Source: ClinVar, gnomAD, AlphaMissense, CADD, EVE

---

Phase 5: Structure Analysis for VUS

5.1 When to Perform Structure Analysis

Perform when:
  • Variant is VUS or conflicting interpretations
  • Missense variant in critical domain
  • Novel variant not in databases
  • Additional evidence needed for classification

5.2 Structure Prediction (NVIDIA NIM)

python
def analyze_variant_structure(tu, protein_sequence, variant_position):
    """Predict structure and analyze variant impact."""
    
    # Predict structure with AlphaFold2
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=protein_sequence,
        algorithm="mmseqs2",
        relax_prediction=False
    )
    
    # Extract pLDDT at variant position
    # (get_residue_plddt is a helper that is not defined in this snippet)
    variant_plddt = get_residue_plddt(structure, variant_position)
    
    # Check if in structured region
    confidence = "High" if variant_plddt > 70 else "Low"
    
    return {
        'structure': structure,
        'variant_plddt': variant_plddt,
        'confidence': confidence
    }
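
The `get_residue_plddt` helper called above is not defined in this document. One possible sketch is shown below, assuming the prediction has already been extracted as PDB-format text. AlphaFold-style PDB files store per-residue pLDDT in the B-factor column, but the exact shape of the `NvidiaNIM_alphafold2` result is an assumption and may need unpacking first.

```python
def get_residue_plddt(pdb_text, residue_number, chain_id=None):
    """Return the pLDDT of one residue from AlphaFold-style PDB text.

    Reads the B-factor field (columns 61-66) of the residue's C-alpha atom.
    Returns None when the residue is not found.
    """
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        chain = line[21]                  # column 22
        res_seq = int(line[22:26])        # columns 23-26
        if atom_name != "CA" or res_seq != residue_number:
            continue
        if chain_id is not None and chain != chain_id:
            continue
        return float(line[60:66])         # columns 61-66 (B-factor = pLDDT)
    return None
```

Using the C-alpha atom gives one value per residue; AlphaFold writes the same pLDDT to every atom of a residue, so any atom would give the same number.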

5.3 Domain Impact Assessment

python
def assess_domain_impact(tu, uniprot_id, variant_position):
    """Check if variant affects functional domain."""
    
    # Get domain annotations
    domains = tu.tools.InterPro_get_protein_domains(accession=uniprot_id)
    
    for domain in domains:
        if domain['start'] <= variant_position <= domain['end']:
            return {
                'in_domain': True,
                'domain_name': domain['name'],
                'domain_function': domain['description']
            }
    
    return {'in_domain': False}

5.4 Output for Report


5. Structural Analysis

5.1 Structure Prediction

Method: AlphaFold2 via NVIDIA NIM
Protein: Fibrillin-1 (FBN1)
Sequence Length: 2,871 amino acids

| Metric | Value | Interpretation |
|---|---|---|
| Mean pLDDT | 85.3 | High confidence overall |
| Variant position pLDDT | 92.1 | Very high confidence |
| Nearby domain | cbEGF-like domain 23 | Calcium-binding |

5.2 Variant Location Analysis

Variant: p.Glu1541Lys
| Feature | Finding | Impact |
|---|---|---|
| Domain | cbEGF-like domain 23 | Critical for calcium binding |
| Conservation | 100% conserved across vertebrates | High constraint |
| Structural role | Calcium coordination residue | Likely destabilizing |
| Nearby pathogenic | p.Glu1540Lys (Pathogenic) | Adjacent residue |

5.3 Structural Interpretation

The variant p.Glu1541Lys:
  1. Located in cbEGF domain - These domains are critical for fibrillin-1 function
  2. Glutamate → Lysine - Charge reversal (negative to positive)
  3. Calcium binding - Glutamate at this position coordinates Ca2+
  4. Adjacent pathogenic variant - p.Glu1540Lys is classified Pathogenic

Structural Evidence: Supports pathogenicity (PM1, moderate - variant lies in a critical functional domain)

Source: NVIDIA NIM via NvidiaNIM_alphafold2, InterPro

---

Phase 6: Literature Evidence (NEW)

6.1 Published Literature (PubMed)

python
def search_disease_literature(tu, disease_name, genes):
    """Search for relevant published literature."""
    
    # Disease-specific search
    disease_papers = tu.tools.PubMed_search_articles(
        query=f'"{disease_name}" AND (genetics OR mutation OR variant)',
        limit=20
    )
    
    # Gene-specific searches
    gene_papers = []
    for gene in genes[:5]:  # Top 5 genes
        papers = tu.tools.PubMed_search_articles(
            query=f'"{gene}" AND rare disease AND pathogenic',
            limit=10
        )
        gene_papers.extend(papers)
    
    return {
        'disease_literature': disease_papers,
        'gene_literature': gene_papers
    }
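
Because the gene-specific searches are run one gene at a time, the same article can appear more than once in `gene_literature`. A minimal de-duplication sketch follows, assuming each returned record is a dict carrying a `pmid` key. That field name is an assumption about the tool output, and `deduplicate_papers` is a hypothetical helper.

```python
def deduplicate_papers(papers, key="pmid"):
    """Drop repeated records, keeping the first occurrence of each identifier."""
    seen = set()
    unique = []
    for paper in papers:
        identifier = paper.get(key)
        if identifier is None:
            # Keep records without an identifier instead of silently dropping them
            unique.append(paper)
            continue
        if identifier not in seen:
            seen.add(identifier)
            unique.append(paper)
    return unique

# Illustrative records only
papers = [
    {"pmid": "1", "title": "A"},
    {"pmid": "2", "title": "B"},
    {"pmid": "1", "title": "A (found again via another gene)"},
]
unique_papers = deduplicate_papers(papers)
```

Order is preserved, so the ranking returned by the search is not disturbed.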

6.2 Preprint Literature (BioRxiv/MedRxiv)

python
def search_preprints(tu, disease_name, genes):
    """Search preprints for cutting-edge findings."""
    
    # BioRxiv search
    biorxiv = tu.tools.BioRxiv_search_preprints(
        query=f"{disease_name} genetics",
        limit=10
    )
    
    # ArXiv for computational methods
    arxiv = tu.tools.ArXiv_search_papers(
        query=f"rare disease diagnosis {' OR '.join(genes[:3])}",
        category="q-bio",
        limit=5
    )
    
    return {
        'biorxiv': biorxiv,
        'arxiv': arxiv
    }

6.3 Citation Analysis (OpenAlex)

python
def analyze_citations(tu, key_papers):
    """Analyze citation network for key papers."""
    
    citation_analysis = []
    for paper in key_papers[:5]:
        # Get citation data
        work = tu.tools.openalex_search_works(
            query=paper['title'],
            limit=1
        )
        if work:
            citation_analysis.append({
                'title': paper['title'],
                'citations': work[0].get('cited_by_count', 0),
                'year': work[0].get('publication_year')
            })
    
    return citation_analysis

6.4 Output for Report


6. Literature Evidence

6.1 Key Published Studies

| PMID | Title | Year | Citations | Relevance |
|---|---|---|---|---|
| 32123456 | FBN1 variants in Marfan syndrome... | 2023 | 45 | Direct |
| 31987654 | TGF-beta signaling in connective... | 2022 | 89 | Pathway |
| 30876543 | Novel diagnostic criteria for... | 2021 | 156 | Diagnostic |

6.2 Recent Preprints (Not Yet Peer-Reviewed)

| Source | Title | Posted | Relevance |
|---|---|---|---|
| BioRxiv | Novel FBN1 splice variant causes... | 2024-01 | Case report |
| MedRxiv | Machine learning for Marfan... | 2024-02 | Diagnostic |

⚠️ Note: Preprints have not undergone peer review. Use with caution.

6.3 Evidence Summary

| Evidence Type | Count | Strength |
|---|---|---|
| Case reports | 12 | Supporting |
| Functional studies | 5 | Strong |
| Clinical trials | 2 | Strong |
| Reviews | 8 | Context |

Source: PubMed, BioRxiv, OpenAlex

---

Report Template

File: [PATIENT_ID]_rare_disease_report.md

Rare Disease Diagnostic Report

Patient ID: [ID] | Date: [Date] | Status: In Progress

Executive Summary

[Researching...]

1. Phenotype Analysis

1.1 Standardized HPO Terms

[Researching...]

1.2 Key Clinical Features

[Researching...]

2. Differential Diagnosis

2.1 Ranked Candidate Diseases

[Researching...]

2.2 Disease Details

[Researching...]

3. Recommended Gene Panel

3.1 Prioritized Genes

[Researching...]

3.2 Testing Strategy

[Researching...]

4. Variant Interpretation (if applicable)

4.1 Variant Details

[Researching...]

4.2 ACMG Classification

[Researching...]

5. Structural Analysis (if applicable)

5.1 Structure Prediction

[Researching...]

5.2 Variant Impact

[Researching...]

6. Clinical Recommendations

6.1 Diagnostic Next Steps

[Researching...]

6.2 Specialist Referrals

[Researching...]

6.3 Family Screening

[Researching...]

7. Data Gaps & Limitations

[Researching...]

8. Data Sources

[Will be populated as research progresses...]

---
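
The report-first approach (create the file with placeholders, then fill sections as results arrive) can be sketched as follows. This is an illustration under stated assumptions: the helper names `create_report_skeleton` and `fill_section`, the `#`/`##` heading markup, and the reduced section list are all hypothetical choices, not part of the skill itself.

```python
from datetime import date
from pathlib import Path

PLACEHOLDER = "[Researching...]"
SECTIONS = [
    "Executive Summary",
    "1. Phenotype Analysis",
    "2. Differential Diagnosis",
    "3. Recommended Gene Panel",
    "4. Variant Interpretation (if applicable)",
    "5. Structural Analysis (if applicable)",
    "6. Clinical Recommendations",
    "7. Data Gaps & Limitations",
    "8. Data Sources",
]

def create_report_skeleton(patient_id, out_dir="."):
    """Write the report file with a placeholder under every section."""
    path = Path(out_dir) / f"{patient_id}_rare_disease_report.md"
    lines = [
        "# Rare Disease Diagnostic Report",
        "",
        f"Patient ID: {patient_id} | Date: {date.today().isoformat()} | Status: In Progress",
        "",
    ]
    for title in SECTIONS:
        lines += [f"## {title}", "", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path

def fill_section(path, title, content):
    """Replace the placeholder of one section with real content."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    marker = f"## {title}\n\n{PLACEHOLDER}"
    if marker not in text:
        raise ValueError(f"Section missing or already filled: {title}")
    path.write_text(text.replace(marker, f"## {title}\n\n{content}", 1), encoding="utf-8")
```

Calling `create_report_skeleton("PT001")` first and `fill_section(...)` after each phase keeps a readable partial report on disk even if a later phase fails.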

Evidence Grading

| Tier | Symbol | Criteria | Example |
|---|---|---|---|
| T1 | ★★★ | Phenotype match >80% + gene match | Marfan with FBN1 mutation |
| T2 | ★★☆ | Phenotype match 60-80% OR likely pathogenic variant | Good phenotype fit |
| T3 | ★☆☆ | Phenotype match 40-60% OR VUS in candidate gene | Possible diagnosis |
| T4 | ☆☆☆ | Phenotype <40% OR uncertain gene | Low probability |
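
The tier table can be expressed as a small function. This is a sketch under assumptions: the table does not say how to grade a phenotype match above 80% without a gene match, so this version lets that case fall through to T2. The function name `grade_evidence` and the `variant_class` labels are illustrative.

```python
def grade_evidence(phenotype_match_pct, gene_match=False, variant_class=None):
    """Map phenotype overlap and variant evidence to a tier (T1 to T4).

    variant_class: None, "likely_pathogenic", or "vus" (illustrative labels).
    """
    if phenotype_match_pct > 80 and gene_match:
        return "T1"
    # Assumption: >80% overlap without a gene match is graded like 60-80%
    if phenotype_match_pct >= 60 or variant_class == "likely_pathogenic":
        return "T2"
    if phenotype_match_pct >= 40 or variant_class == "vus":
        return "T3"
    return "T4"

tier = grade_evidence(85, gene_match=True)
```

Checking the tiers from strongest to weakest means a candidate always receives the highest tier it qualifies for.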

Completeness Checklist

Phase 1: Phenotype

  • All symptoms converted to HPO terms
  • Core vs. variable features distinguished
  • Age of onset documented
  • Family history noted

Phase 2: Disease Matching

  • ≥5 candidate diseases identified (or all matching)
  • Phenotype overlap % calculated
  • Inheritance patterns noted
  • ORPHA and OMIM IDs provided

Phase 3: Gene Panel

  • ≥5 genes prioritized (or all from top diseases)
  • Evidence level for each gene (ClinGen)
  • Expression validation performed
  • Testing strategy recommended

Phase 4: Variant Interpretation (if applicable)

  • ClinVar classification retrieved
  • gnomAD frequency checked
  • ACMG criteria applied
  • Classification justified

Phase 5: Structure Analysis (if applicable)

  • Structure predicted (if VUS)
  • pLDDT confidence reported
  • Domain impact assessed
  • Structural evidence summarized

Phase 6: Recommendations

  • ≥3 next steps listed
  • Specialist referrals suggested
  • Family screening addressed

Fallback Chains

| Primary Tool | Fallback 1 | Fallback 2 |
|---|---|---|
| Orphanet_search_by_hpo | OMIM_search | PubMed phenotype search |
| ClinVar_get_variant | gnomAD_get_variant | VEP annotation |
| NvidiaNIM_alphafold2 | alphafold_get_prediction | UniProt features |
| GTEx_expression | HPA_expression | Tissue-specific literature |
| gnomAD_get_variant | ExAC_frequencies | 1000 Genomes |
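
A fallback chain like the ones in the table can be driven by a generic helper that tries each step in order and records why earlier steps were skipped. The sketch below uses stand-in callables instead of real tool calls; `call_with_fallbacks` is a hypothetical name, and treating an empty result as a failure is an assumption about the desired behavior.

```python
def call_with_fallbacks(steps):
    """Try (name, zero-argument callable) pairs in order.

    Returns (name, result, errors) for the first step that yields a
    non-empty result, or (None, None, errors) if every step fails.
    """
    errors = []
    for name, func in steps:
        try:
            result = func()
        except Exception as exc:  # any tool failure triggers the next fallback
            errors.append((name, str(exc)))
            continue
        if result:
            return name, result, errors
        errors.append((name, "empty result"))
    return None, None, errors

# Stand-in callables for illustration; real steps would wrap tool calls
def primary():
    raise RuntimeError("service unavailable")

used, data, skipped = call_with_fallbacks([
    ("primary", primary),
    ("fallback_1", lambda: []),
    ("fallback_2", lambda: {"af": 0.001}),
])
```

Returning the list of skipped steps makes it easy to note in the report which source actually supplied the data and which ones were unavailable.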

Tool Reference

See TOOLS_REFERENCE.md for complete tool documentation.