tooluniverse-rare-disease-diagnosis

Rare Disease Diagnosis Advisor

Systematic diagnosis support for rare diseases using phenotype matching, gene panel prioritization, and variant interpretation across Orphanet, OMIM, HPO, ClinVar, and structure-based analysis.

KEY PRINCIPLES:

- Report-first approach: Create report file FIRST, update progressively
- Phenotype-driven: Convert symptoms to HPO terms before searching
- Multi-database triangulation: Cross-reference Orphanet, OMIM, OpenTargets
- Evidence grading: Grade diagnoses by supporting evidence strength
- Actionable output: Prioritized differential diagnosis with next steps
- Genetic counseling aware: Consider inheritance patterns and family history
- English-first queries: Always use English terms in tool calls (phenotype descriptions, gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language.

When to Use

Apply when the user asks:

- "Patient has [symptoms], what rare disease could this be?"
- "Unexplained developmental delay with [features]"
- "WES found VUS in [gene], is this pathogenic?"
- "What genes should we test for [phenotype]?"
- "Differential diagnosis for [rare symptom combination]"

Critical Workflow Requirements

1. Report-First Approach (MANDATORY)

Create the report file FIRST:
- File name: `[PATIENT_ID]_rare_disease_report.md`
- Initialize with all section headers
- Add placeholder text: `[Researching...]`

Progressively update as you gather data.

Output separate data files:
- `[PATIENT_ID]_gene_panel.csv` - Prioritized genes for testing
- `[PATIENT_ID]_variant_interpretation.csv` - If variants provided
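
The report-first step can be sketched as below. This is a minimal illustration, not part of the skill itself: the function names `init_report` and `update_section`, the `#`/`##` heading levels, and the report title line are assumptions. The section titles are the report headings used later in this document.

```python
from pathlib import Path

# Section titles follow the report headings used in this document.
# Heading levels and function names are illustrative assumptions.
REPORT_SECTIONS = [
    "1. Phenotype Analysis",
    "2. Differential Diagnosis",
    "3. Recommended Gene Panel",
    "3.5 Expression & Regulatory Context",
    "3.6 Pathway & Network Context",
]
PLACEHOLDER = "[Researching...]"


def init_report(patient_id, out_dir="."):
    """Create the report skeleton before any tool call is made."""
    path = Path(out_dir) / f"{patient_id}_rare_disease_report.md"
    lines = [f"# Rare Disease Report: {patient_id}", ""]
    for title in REPORT_SECTIONS:
        lines += [f"## {title}", "", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def update_section(path, title, body):
    """Replace the placeholder under one section header with real content."""
    text = Path(path).read_text(encoding="utf-8")
    marker = f"## {title}\n\n{PLACEHOLDER}"
    if marker not in text:
        raise ValueError(f"section not found or already filled: {title}")
    text = text.replace(marker, f"## {title}\n\n{body}", 1)
    Path(path).write_text(text, encoding="utf-8")
```

Calling `init_report("PT001")` once at the start and `update_section(...)` after each phase keeps a readable partial report on disk even if a later tool call fails.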

2. Citation Requirements (MANDATORY)

Every finding MUST include source:

Candidate Disease: Marfan Syndrome

ORPHA: ORPHA:558
OMIM: 154700
Phenotype match: 85% (17/20 HPO terms)
Inheritance: AD
Gene: FBN1

Source: Orphanet via `Orphanet_558`, OMIM via `OMIM_get_entry`
---

Phase 0: Tool Verification

CRITICAL: Verify tool parameters before calling.

Known Parameter Corrections

Tool	WRONG Parameter	CORRECT Parameter
`OpenTargets_get_associated_diseases_by_target_ensemblId`	`ensemblID`	`ensemblId`
`ClinVar_get_variant_by_id`	`variant_id`	`id`
`MyGene_query_genes`	`gene`	`q`
`gnomAD_get_variant_frequencies`	`variant`	`variant_id`
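
One way to apply the corrections in the table is to rename parameters just before a call is made. The sketch below copies the mapping from the table; the helper name `correct_params` is an assumption and is not part of ToolUniverse.

```python
# Mapping copied from the table above: tool name -> {wrong name: correct name}.
PARAM_CORRECTIONS = {
    "OpenTargets_get_associated_diseases_by_target_ensemblId": {"ensemblID": "ensemblId"},
    "ClinVar_get_variant_by_id": {"variant_id": "id"},
    "MyGene_query_genes": {"gene": "q"},
    "gnomAD_get_variant_frequencies": {"variant": "variant_id"},
}


def correct_params(tool_name, params):
    """Return a copy of params with known-wrong keys renamed for this tool."""
    renames = PARAM_CORRECTIONS.get(tool_name, {})
    fixed = {}
    for key, value in params.items():
        # Keys without a known correction pass through unchanged
        fixed[renames.get(key, key)] = value
    return fixed
```

The corrections are keyed by tool because the same name can be wrong for one tool and right for another: `variant_id` is wrong for `ClinVar_get_variant_by_id` but correct for `gnomAD_get_variant_frequencies`.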

Workflow Overview

Phase 1: Phenotype Standardization
├── Convert symptoms to HPO terms
├── Identify core vs. variable features
└── Note age of onset, inheritance hints
    ↓
Phase 2: Disease Matching
├── Orphanet phenotype search
├── OMIM clinical synopsis match
├── OpenTargets disease associations
└── OUTPUT: Ranked differential diagnosis
    ↓
Phase 3: Gene Panel Identification
├── Extract genes from top diseases
├── Cross-reference expression (GTEx)
├── Prioritize by evidence strength
└── OUTPUT: Recommended gene panel
    ↓
Phase 3.5: Expression & Tissue Context (NEW)
├── CELLxGENE: Cell-type specific expression
├── ChIPAtlas: Regulatory context (TF binding)
├── Tissue-specific gene networks
└── OUTPUT: Expression validation
    ↓
Phase 3.6: Pathway Analysis (NEW)
├── KEGG: Metabolic/signaling pathways
├── Reactome: Biological processes
├── IntAct: Protein-protein interactions
└── OUTPUT: Biological context
    ↓
Phase 4: Variant Interpretation (if provided)
├── ClinVar pathogenicity lookup
├── gnomAD population frequency
├── Protein domain/function impact
├── ENCODE/ChIPAtlas: Regulatory variant impact
└── OUTPUT: Variant classification
    ↓
Phase 5: Structure Analysis (for VUS)
├── NvidiaNIM_alphafold2 → Predict structure
├── Map variant to structure
├── Assess functional domain impact
└── OUTPUT: Structural evidence
    ↓
Phase 6: Literature Evidence (NEW)
├── PubMed: Published studies
├── BioRxiv/MedRxiv: Preprints
├── OpenAlex: Citation analysis
└── OUTPUT: Literature support
    ↓
Phase 7: Report Synthesis
├── Prioritized differential diagnosis
├── Recommended genetic testing
├── Next steps for clinician
└── OUTPUT: Final report

Phase 1: Phenotype Standardization

1.1 Convert Symptoms to HPO Terms

```python
def standardize_phenotype(tu, symptoms_list):
    """Convert clinical descriptions to HPO terms."""
    hpo_terms = []

    for symptom in symptoms_list:
        # Search HPO for matching terms; symptoms with no hit are skipped
        results = tu.tools.HPO_search_terms(query=symptom)
        if results:
            top = results[0]  # use the first hit
            hpo_terms.append({
                'original': symptom,
                'hpo_id': top['id'],
                'hpo_name': top['name'],
                # 'exact' when the symptom text appears in the HPO label
                'confidence': 'exact' if symptom.lower() in top['name'].lower() else 'partial'
            })

    return hpo_terms
```

1.2 Phenotype Categories

Category	Examples	Weight
Core features	Always present in disease	High
Variable features	Present in >50%	Medium
Occasional features	Present in <50%	Low
Age-specific	Onset-dependent	Context

1.3 Output for Report

1. Phenotype Analysis

1.1 Standardized HPO Terms

Clinical Feature	HPO Term	HPO ID	Category
Tall stature	Tall stature	HP:0000098	Core
Long fingers	Arachnodactyly	HP:0001166	Core
Heart murmur	Cardiac murmur	HP:0030148	Variable
Joint hypermobility	Joint hypermobility	HP:0001382	Core

Total HPO Terms: 8
Onset: Childhood
Family History: Father with similar features (AD suspected)

Source: HPO via `HPO_search_terms`

---

Phase 2: Disease Matching

2.1 Orphanet Disease Search (NEW TOOLS)

```python
def match_diseases_orphanet(tu, symptom_keywords):
    """Find rare diseases matching symptoms using Orphanet."""
    candidate_diseases = []

    # Search Orphanet once per keyword and pool the hits
    for keyword in symptom_keywords:
        results = tu.tools.Orphanet_search_diseases(
            operation="search_diseases",
            query=keyword
        )
        if results.get('status') == 'success':
            candidate_diseases.extend(results['data']['results'])

    # Attach the associated genes to each candidate disease
    for disease in candidate_diseases:
        orpha_code = disease.get('ORPHAcode')
        if orpha_code is None:
            disease['genes'] = []  # no identifier to look up
            continue
        genes = tu.tools.Orphanet_get_genes(
            operation="get_genes",
            orpha_code=orpha_code
        )
        disease['genes'] = genes.get('data', {}).get('genes', [])

    # deduplicate_and_rank must be available in the calling module;
    # it merges repeated hits and orders the candidates
    return deduplicate_and_rank(candidate_diseases)
```
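
The function above returns through `deduplicate_and_rank`, which is not defined anywhere in this document. A minimal sketch of what such a helper could look like follows. It assumes each hit is a dict carrying an `ORPHAcode` key, and it ranks by how many keyword searches returned the disease, which is an illustrative choice rather than an Orphanet-defined score.

```python
def deduplicate_and_rank(candidate_diseases):
    """Merge duplicate Orphanet hits and rank by number of keyword matches.

    Hypothetical helper: the 'keyword_hits' field and the ranking rule
    are assumptions made for this sketch.
    """
    merged = {}
    for disease in candidate_diseases:
        code = disease.get("ORPHAcode")
        if code is None:
            continue  # cannot deduplicate without an identifier
        if code not in merged:
            merged[code] = {**disease, "keyword_hits": 0}
        merged[code]["keyword_hits"] += 1
    # sorted() is stable, so ties keep their first-seen order
    return sorted(merged.values(), key=lambda d: d["keyword_hits"], reverse=True)
```

A disease returned by several symptom keywords rises to the top, which fits the phenotype-driven principle, though a real ranking would more likely use the overlap score from section 2.4.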

2.2 OMIM Cross-Reference (NEW TOOLS)

```python
def cross_reference_omim(tu, orphanet_diseases, gene_symbols):
    """Get OMIM details for candidate genes.

    Returns a dict mapping each gene symbol to a list of OMIM entries.
    orphanet_diseases is kept in the signature but is not used here.
    """
    omim_data = {}

    # Search OMIM for each gene
    for gene in gene_symbols:
        search_result = tu.tools.OMIM_search(
            operation="search",
            query=gene,
            limit=5
        )
        if search_result.get('status') != 'success':
            continue

        for entry in search_result['data'].get('entries', []):
            mim_number = entry.get('mimNumber')

            # Get detailed entry
            details = tu.tools.OMIM_get_entry(
                operation="get_entry",
                mim_number=str(mim_number)
            )

            # Get clinical synopsis (phenotype features)
            synopsis = tu.tools.OMIM_get_clinical_synopsis(
                operation="get_clinical_synopsis",
                mim_number=str(mim_number)
            )

            # Append rather than assign, so earlier entries for the same
            # gene are not overwritten by later ones
            omim_data.setdefault(gene, []).append({
                'mim_number': mim_number,
                'details': details.get('data', {}),
                'clinical_synopsis': synopsis.get('data', {})
            })

    return omim_data
```

2.3 DisGeNET Gene-Disease Associations (NEW TOOLS)

```python
def get_gene_disease_associations(tu, gene_symbols):
    """Get gene-disease associations from DisGeNET."""
    associations = {}

    for gene in gene_symbols:
        # Get diseases associated with gene
        result = tu.tools.DisGeNET_search_gene(
            operation="search_gene",
            gene=gene,
            limit=20
        )

        if result.get('status') == 'success':
            associations[gene] = result['data'].get('associations', [])

    return associations


def get_disease_genes_disgenet(tu, disease_name):
    """Get all genes associated with a disease."""
    result = tu.tools.DisGeNET_search_disease(
        operation="search_disease",
        disease=disease_name,
        limit=30
    )
    return result.get('data', {}).get('associations', [])
```

2.4 Phenotype Overlap Scoring

Match Level	Score	Criteria
Excellent	>80%	Most core + variable features match
Good	60-80%	Core features match, some variable
Possible	40-60%	Some overlap, needs consideration
Unlikely	<40%	Poor phenotype fit
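
The document does not say how the match percentage is computed. One plausible reading, sketched below, weights each patient HPO term by its category from section 1.2 and reports the matched share. The numeric weights (3/2/1), the function names, and the handling of exact boundaries (80, 60, 40) are all assumptions.

```python
# Assumed numeric weights for the High/Medium/Low categories in section 1.2
CATEGORY_WEIGHTS = {"core": 3.0, "variable": 2.0, "occasional": 1.0}


def overlap_score(patient_terms, disease_terms):
    """Weighted percentage of patient HPO terms found in a disease annotation.

    patient_terms: dict mapping HPO ID -> category ('core', 'variable', 'occasional')
    disease_terms: set of HPO IDs annotated to the disease
    """
    total = sum(CATEGORY_WEIGHTS[c] for c in patient_terms.values())
    if total == 0:
        return 0.0
    matched = sum(
        CATEGORY_WEIGHTS[c]
        for hpo_id, c in patient_terms.items()
        if hpo_id in disease_terms
    )
    return 100.0 * matched / total


def match_level(score):
    """Map a percentage to the match levels in the table above."""
    if score > 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Possible"
    return "Unlikely"
```

With equal weights this reduces to the plain fraction of matched terms, the form used in the "85% (17/20 HPO terms)" example earlier.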

2.5 Output for Report

2. Differential Diagnosis

Top Candidate Diseases (Ranked by Phenotype Match)

Rank	Disease	ORPHA	OMIM	Match	Inheritance	Key Gene(s)
1	Marfan syndrome	558	154700	85%	AD	FBN1
2	Loeys-Dietz syndrome	60030	609192	72%	AD	TGFBR1, TGFBR2
3	Ehlers-Danlos, vascular	286	130050	65%	AD	COL3A1
4	Homocystinuria	394	236200	58%	AR	CBS

DisGeNET Gene-Disease Evidence

Gene	Associated Diseases	GDA Score	Evidence
FBN1	Marfan syndrome, MASS phenotype	0.95	★★★ Curated
TGFBR1	Loeys-Dietz syndrome	0.89	★★★ Curated
COL3A1	vascular EDS	0.91	★★★ Curated

Source: DisGeNET via `DisGeNET_search_gene`

Disease Details

1. Marfan Syndrome (★★★)

ORPHA: 558 | OMIM: 154700 | Prevalence: 1-5/10,000

Phenotype Match Analysis:

Patient Feature	Disease Feature	Match
Tall stature	Present in 95%	✓
Arachnodactyly	Present in 90%	✓
Joint hypermobility	Present in 85%	✓
Cardiac murmur	Aortic root dilation (70%)	Partial

OMIM Clinical Synopsis (via `OMIM_get_clinical_synopsis`):

Cardiovascular: Aortic root dilation, mitral valve prolapse
Skeletal: Scoliosis, pectus excavatum, tall stature
Ocular: Ectopia lentis, myopia

Diagnostic Criteria: Ghent nosology (2010)

Aortic root dilation/dissection + FBN1 mutation = Diagnosis
Without genetic testing: aortic root dilation/dissection + ectopia lentis, or aortic root dilation/dissection + systemic score ≥7

Inheritance: Autosomal dominant (25% de novo)

Source: Orphanet via `Orphanet_get_disease`, OMIM via `OMIM_get_entry`, DisGeNET

---

Phase 3: Gene Panel Identification

3.1 Extract Disease Genes

```python
def build_gene_panel(tu, candidate_diseases):
    """Build prioritized gene panel from candidate diseases."""
    genes = {}

    for disease in candidate_diseases:
        # Each entry in disease['genes'] is used as a dict key, so it is
        # expected to be a gene symbol string
        for gene in disease['genes']:
            if gene not in genes:
                genes[gene] = {
                    'symbol': gene,
                    'diseases': [],
                    'evidence_level': 'unknown'
                }
            genes[gene]['diseases'].append(disease['name'])

    return genes
```

3.1.1 ClinGen Gene-Disease Validity Check (NEW)

Critical: Always verify gene-disease validity through ClinGen before including in panel.

```python
def get_clingen_gene_evidence(tu, gene_symbol):
    """
    Get ClinGen gene-disease validity and dosage sensitivity.
    ESSENTIAL for rare disease gene panel prioritization.
    """
    # 1. Gene-disease validity classification
    validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)

    validity_levels = []
    diseases_with_validity = []
    for entry in validity.get('data') or []:
        validity_levels.append(entry.get('Classification'))
        diseases_with_validity.append({
            'disease': entry.get('Disease Label'),
            'mondo_id': entry.get('Disease ID (MONDO)'),
            'classification': entry.get('Classification'),
            'inheritance': entry.get('Inheritance')
        })

    # 2. Dosage sensitivity (critical for CNV interpretation)
    dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)

    hi_score = None
    ts_score = None
    dosage_entries = dosage.get('data') or []
    if dosage_entries:
        first = dosage_entries[0]  # only the first record is used
        hi_score = first.get('Haploinsufficiency Score')
        ts_score = first.get('Triplosensitivity Score')

    # 3. Clinical actionability (return of findings context)
    actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)
    is_actionable = (actionability.get('adult_count', 0) > 0 or
                     actionability.get('pediatric_count', 0) > 0)

    # Determine best evidence level (first match in priority order wins)
    level_priority = ['Definitive', 'Strong', 'Moderate', 'Limited', 'Disputed', 'Refuted']
    best_level = 'Not curated'
    for level in level_priority:
        if level in validity_levels:
            best_level = level
            break

    return {
        'gene': gene_symbol,
        'evidence_level': best_level,
        'diseases_curated': diseases_with_validity,
        'haploinsufficiency_score': hi_score,
        'triplosensitivity_score': ts_score,
        'is_actionable': is_actionable,
        'include_in_panel': best_level in ['Definitive', 'Strong', 'Moderate']
    }


# Points per ClinGen classification; Disputed, Refuted, and Not curated get 0
CLINGEN_SCORES = {'Definitive': 5, 'Strong': 4, 'Moderate': 3, 'Limited': 1}


def prioritize_genes_with_clingen(tu, gene_list):
    """Prioritize genes using ClinGen evidence levels."""
    prioritized = []
    for gene in gene_list:
        evidence = get_clingen_gene_evidence(tu, gene)

        # Score based on ClinGen classification
        score = CLINGEN_SCORES.get(evidence['evidence_level'], 0)

        # Bonus for haploinsufficiency score 3; str() accepts both 3 and '3'
        if str(evidence['haploinsufficiency_score']) == '3':
            score += 1

        # Bonus for actionability
        if evidence['is_actionable']:
            score += 1

        prioritized.append({
            **evidence,
            'priority_score': score
        })

    # Sort by priority score, highest first
    return sorted(prioritized, key=lambda x: x['priority_score'], reverse=True)
```

ClinGen Classification Impact on Panel:

Classification	Include in Panel?	Priority
Definitive	YES - mandatory	Highest
Strong	YES - highly recommended	High
Moderate	YES	Medium
Limited	Include but flag	Low
Disputed	Exclude or separate	Avoid
Refuted	EXCLUDE	Do not test
Not curated	Use other evidence	Variable

3.2 Gene Prioritization Criteria

Priority	Criteria	Points
Tier 1	Gene causes #1 ranked disease	+5
Tier 2	Gene causes multiple candidates	+3
Tier 3	ClinGen "Definitive" evidence	+3
Tier 4	Expressed in affected tissue	+2
Tier 5	Constraint score pLI >0.9	+1
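
Read as additive points, the criteria above can be combined into a single score per gene. The sketch below assumes the tiers are cumulative, which the "+" notation suggests but the document does not state. The record keys (`diseases`, `clingen_level`, `expressed_in_affected_tissue`, `pli`) and the function name are hypothetical.

```python
def priority_points(gene, top_disease):
    """Sum the points from the criteria table for one gene record.

    gene: dict with illustrative keys (all assumptions):
        diseases: list of candidate disease names the gene causes
        clingen_level: ClinGen classification string
        expressed_in_affected_tissue: bool
        pli: float or None
    top_disease: name of the #1 ranked candidate disease
    """
    points = 0
    diseases = gene.get("diseases", [])
    if top_disease in diseases:
        points += 5  # Tier 1: gene causes the top-ranked disease
    if len(diseases) > 1:
        points += 3  # Tier 2: gene causes multiple candidates
    if gene.get("clingen_level") == "Definitive":
        points += 3  # Tier 3: ClinGen "Definitive" evidence
    if gene.get("expressed_in_affected_tissue"):
        points += 2  # Tier 4: expressed in affected tissue
    pli = gene.get("pli")
    if pli is not None and pli > 0.9:
        points += 1  # Tier 5: constraint score pLI > 0.9
    return points
```

Under these assumptions, a gene that causes the top-ranked disease, has Definitive evidence, is expressed in the affected tissue, and has pLI above 0.9 scores 11.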

3.3 Expression Validation

```python
def validate_expression(tu, gene_symbol, affected_tissue):
    """Check if gene is expressed in relevant tissue."""
    # Get Ensembl ID
    gene_info = tu.tools.MyGene_query_genes(q=gene_symbol, species="human")
    ensembl_id = gene_info.get('ensembl', {}).get('gene')
    if not ensembl_id:
        return False  # no Ensembl ID resolved, so expression cannot be checked

    # Check GTEx expression
    # Note: GTEx identifies genes by versioned GENCODE IDs; confirm that the
    # tool accepts the ".latest" suffix used here
    expression = tu.tools.GTEx_get_median_gene_expression(
        gencode_id=f"{ensembl_id}.latest"
    )

    return expression.get(affected_tissue, 0) > 1  # TPM > 1
```

3.4 Output for Report

3. Recommended Gene Panel

3.1 Prioritized Genes for Testing

Priority	Gene	Diseases	Evidence	Constraint (pLI)	Expression
★★★	FBN1	Marfan syndrome	Definitive	1.00	Heart, aorta
★★★	TGFBR1	Loeys-Dietz 1	Definitive	0.98	Ubiquitous
★★★	TGFBR2	Loeys-Dietz 2	Definitive	0.99	Ubiquitous
★★☆	COL3A1	EDS vascular	Definitive	1.00	Connective tissue
★☆☆	CBS	Homocystinuria	Definitive	0.00	Liver

3.2 Panel Design Recommendation

Minimum Panel (high yield): FBN1, TGFBR1, TGFBR2, COL3A1

Extended Panel (+differential): Add CBS, SMAD3, ACTA2

Testing Strategy:

1. Start with FBN1 sequencing (highest pre-test probability)
2. If negative, proceed to full connective tissue panel
3. Consider WES if panel negative

Source: ClinGen via gene-disease validity, GTEx expression

---

Phase 3.5: Expression & Tissue Context (ENHANCED)

3.5.1 Cell-Type Specific Expression (CELLxGENE)

```python
def get_cell_type_expression(tu, gene_symbol, affected_tissues):
    """Get single-cell expression to validate tissue relevance."""

    # Get expression across cell types (first affected tissue, else all)
    expression = tu.tools.CELLxGENE_get_expression_data(
        gene=gene_symbol,
        tissue=affected_tissues[0] if affected_tissues else "all"
    )

    # Get cell type metadata
    cell_metadata = tu.tools.CELLxGENE_get_cell_metadata(
        gene=gene_symbol
    )

    # Identify high-expression cell types
    high_expression = [
        ct for ct in expression
        if ct.get('mean_expression', 0) > 1.0  # TPM > 1
    ]

    return {
        'expression_data': expression,
        'high_expression_cells': high_expression,
        'total_cell_types': len(cell_metadata)
    }
```

Why it matters: Confirms candidate genes are expressed in disease-relevant tissues/cells.

3.5.2 Regulatory Context (ChIPAtlas)

```python
def get_regulatory_context(tu, gene_symbol):
    """Get transcription factor binding for candidate genes."""

    # Search for TF binding near gene
    tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
        gene=gene_symbol,
        cell_type="all"
    )

    # Get specific binding peaks
    peaks = tu.tools.ChIPAtlas_get_peak_data(
        gene=gene_symbol,
        experiment_type="TF"
    )

    return {
        'transcription_factors': tf_binding,
        'regulatory_peaks': peaks
    }
```

Why it matters: Identifies regulatory mechanisms that may be disrupted in disease.

3.5.3 Output for Report

3.5 Expression & Regulatory Context

Cell-Type Specific Expression (CELLxGENE)

Gene	Top Expressing Cell Types	Expression Level	Tissue Relevance
FBN1	Fibroblasts, Smooth muscle	High (TPM=45)	✓ Connective tissue
TGFBR1	Endothelial, Fibroblasts	Medium (TPM=12)	✓ Vascular
COL3A1	Fibroblasts, Myofibroblasts	Very High (TPM=120)	✓ Connective tissue

Interpretation: All top candidate genes show high expression in disease-relevant cell types (connective tissue, vascular cells), supporting their candidacy.

Regulatory Context (ChIPAtlas)

Gene	Key TF Regulators	Regulatory Significance
FBN1	TGFβ pathway (SMAD2/3), AP-1	TGFβ-responsive
TGFBR1	STAT3, NF-κB	Inflammation-responsive

Source: CELLxGENE Census, ChIPAtlas

---

Phase 3.6: Pathway Analysis (NEW)

3.6.1 KEGG Pathway Context

python

def get_pathway_context(tu, gene_symbols):
    """Get pathway context for candidate genes."""
    
    pathways = {}
    for gene in gene_symbols:
        # Search KEGG for gene
        kegg_genes = tu.tools.kegg_find_genes(query=f"hsa:{gene}")
        
        if kegg_genes:
            # Get pathway membership
            gene_info = tu.tools.kegg_get_gene_info(gene_id=kegg_genes[0]['id'])
            pathways[gene] = gene_info.get('pathways', [])
    
    return pathways

python

def get_pathway_context(tu, gene_symbols):
    """获取候选基因的通路背景信息。"""
    
    pathways = {}
    for gene in gene_symbols:
        # 搜索KEGG基因信息
        kegg_genes = tu.tools.kegg_find_genes(query=f"hsa:{gene}")
        
        if kegg_genes:
            # 获取通路成员信息
            gene_info = tu.tools.kegg_get_gene_info(gene_id=kegg_genes[0]['id'])
            pathways[gene] = gene_info.get('pathways', [])
    
    return pathways
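The per-gene pathway lists returned by `get_pathway_context` can be inverted to find pathways shared by two or more candidate genes, which is what the shared pathway analysis in the report needs. The following is a minimal sketch, assuming each pathway entry is a plain string identifier (the actual KEGG tool output may be structured differently). `find_convergent_pathways` is a hypothetical helper, not a ToolUniverse tool, and the example input is illustrative rather than real KEGG output.

```python
from collections import defaultdict


def find_convergent_pathways(pathways_by_gene, min_genes=2):
    """Return {pathway: sorted gene list} for pathways hit by >= min_genes genes.

    Assumes pathways_by_gene maps gene symbol -> list of pathway identifiers
    given as plain strings (an assumption made for illustration).
    """
    genes_by_pathway = defaultdict(set)
    for gene, pathways in pathways_by_gene.items():
        for pathway in pathways or []:
            genes_by_pathway[pathway].add(gene)

    return {
        pathway: sorted(genes)
        for pathway, genes in genes_by_pathway.items()
        if len(genes) >= min_genes
    }


# Illustrative input only; not real KEGG output
example = {
    "FBN1": ["hsa04350", "hsa04512"],
    "TGFBR1": ["hsa04350"],
    "COL3A1": ["hsa04512", "hsa04510"],
}
shared = find_convergent_pathways(example)
print(shared)
```

Pathways hit by a single gene (here `hsa04510`) are dropped, so the result lists only the convergent ones.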

3.6.2 Protein-Protein Interactions (IntAct)

```python
def get_protein_interactions(tu, gene_symbol):
    """Get interaction partners for a candidate gene."""

    # Search IntAct for interactions; fall back to an empty list if none
    interactions = tu.tools.intact_search_interactions(
        query=gene_symbol,
        species="human"
    ) or []

    # Get interaction network
    network = tu.tools.intact_get_interaction_network(
        gene=gene_symbol,
        depth=1  # Direct interactors only
    )

    return {
        'interactions': interactions,
        'network': network,
        'interactor_count': len(interactions)
    }
```

3.6.3 Output for Report

3.6 Pathway & Network Context

KEGG Pathways

| Gene | Key Pathways | Biological Process |
|------|--------------|--------------------|
| FBN1 | ECM-receptor interaction (hsa04512) | Extracellular matrix |
| TGFBR1/2 | TGF-beta signaling (hsa04350) | Cell signaling |
| COL3A1 | Focal adhesion (hsa04510) | Cell-matrix adhesion |

Shared Pathway Analysis

Convergent pathways (≥2 candidate genes):

- TGF-beta signaling pathway: FBN1, TGFBR1, TGFBR2, SMAD3
- ECM organization: FBN1, COL3A1

Interpretation: Candidate genes converge on TGF-beta signaling and extracellular matrix pathways, consistent with connective tissue disorder etiology.

Protein-Protein Interactions (IntAct)

| Gene | Direct Interactors | Notable Partners |
|------|--------------------|------------------|
| FBN1 | 42 | LTBP1, TGFB1, ADAMTS10 |
| TGFBR1 | 68 | TGFBR2, SMAD2, SMAD3 |

Source: KEGG, IntAct, Reactome

---

Phase 4: Variant Interpretation (If Provided)

4.1 ClinVar Lookup

```python
def interpret_variant(tu, variant_hgvs):
    """Get the ClinVar interpretation for a variant given in HGVS notation."""
    result = tu.tools.ClinVar_search_variants(query=variant_hgvs)

    # A search may return a list of hits or nothing; use the first hit if so
    if isinstance(result, list):
        result = result[0] if result else {}
    result = result or {}

    return {
        'clinvar_id': result.get('id'),
        'classification': result.get('clinical_significance'),
        'review_status': result.get('review_status'),
        'conditions': result.get('conditions'),
        'last_evaluated': result.get('last_evaluated')
    }
```

4.2 Population Frequency

```python
def check_population_frequency(tu, variant_id):
    """Get the gnomAD allele frequency and classify rarity."""
    freq = tu.tools.gnomAD_get_variant_frequencies(variant_id=variant_id) or {}
    af = freq.get('allele_frequency')

    # Interpret rarity; a variant missing from gnomAD has no frequency at all
    if af is None:
        rarity = "Absent from gnomAD"
    elif af < 0.00001:
        rarity = "Ultra-rare"
    elif af < 0.0001:
        rarity = "Rare"
    elif af < 0.01:
        rarity = "Low frequency"
    else:
        rarity = "Common (likely benign)"

    return freq, rarity
```
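The allele frequency also maps onto the frequency-based ACMG criteria used later in this workflow. The sketch below is a hypothetical helper, `frequency_to_acmg`, written under two assumptions: BA1 applies above 5% (as in the criteria table in section 4.4), and the PM2 cut-off of 0.00001 follows the example report in this document rather than any universal threshold.

```python
def frequency_to_acmg(allele_frequency, pm2_max_af=0.00001, ba1_min_af=0.05):
    """Map a population allele frequency to a frequency-based ACMG criterion.

    Returns "PM2", "BA1", or None when neither applies.
    allele_frequency=None means the variant is absent from the database.
    The pm2_max_af default is an assumption taken from this document's example.
    """
    if allele_frequency is None:
        return "PM2"  # absent from population databases
    if allele_frequency > ba1_min_af:
        return "BA1"  # too common to cause a rare Mendelian disease
    if allele_frequency < pm2_max_af:
        return "PM2"  # extremely rare
    return None


for af in (None, 0.000004, 0.001, 0.06):
    print(af, frequency_to_acmg(af))
```

Disease-specific guidance often sets different cut-offs, so both thresholds are exposed as parameters.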

4.3 Computational Pathogenicity Prediction (ENHANCED)

Combine several in silico predictors (CADD, AlphaMissense, EVE, SpliceAI) when interpreting a VUS:

```python
def comprehensive_vus_prediction(tu, variant_info):
    """
    Combine multiple prediction tools for VUS classification.
    Critical for rare disease variants not in ClinVar.
    """
    predictions = {}

    # 1. CADD - Deleteriousness (NEW API)
    cadd = tu.tools.CADD_get_variant_score(
        chrom=variant_info['chrom'],
        pos=variant_info['pos'],
        ref=variant_info['ref'],
        alt=variant_info['alt'],
        version="GRCh38-v1.7"
    )
    if cadd.get('status') == 'success':
        phred = cadd['data'].get('phred_score')
        predictions['cadd'] = {
            'score': phred,
            'interpretation': cadd['data'].get('interpretation'),
            # A missing score counts as neutral instead of being compared to 20
            'acmg': 'PP3' if phred is not None and phred >= 20 else 'neutral'
        }

    # 2. AlphaMissense - DeepMind pathogenicity (NEW)
    if variant_info.get('uniprot_id') and variant_info.get('aa_change'):
        am = tu.tools.AlphaMissense_get_variant_score(
            uniprot_id=variant_info['uniprot_id'],
            variant=variant_info['aa_change']  # e.g., "E1541K"
        )
        if am.get('status') == 'success' and am.get('data'):
            classification = am['data'].get('classification')
            predictions['alphamissense'] = {
                'score': am['data'].get('pathogenicity_score'),
                'classification': classification,
                'acmg': 'PP3 (strong)' if classification == 'pathogenic' else (
                    'BP4 (strong)' if classification == 'benign' else 'neutral'
                )
            }

    # 3. EVE - Evolutionary prediction (NEW)
    eve = tu.tools.EVE_get_variant_score(
        chrom=variant_info['chrom'],
        pos=variant_info['pos'],
        ref=variant_info['ref'],
        alt=variant_info['alt']
    )
    if eve.get('status') == 'success':
        eve_scores = eve['data'].get('eve_scores', [])
        if eve_scores:
            eve_score = eve_scores[0].get('eve_score')
            if eve_score is None:
                eve_acmg = 'neutral'
            else:
                eve_acmg = 'PP3' if eve_score > 0.5 else 'BP4'
            predictions['eve'] = {
                'score': eve_score,
                'classification': eve_scores[0].get('classification'),
                'acmg': eve_acmg
            }

    # 4. SpliceAI - Splice variant prediction (NEW)
    # Use for intronic, synonymous, or exonic variants near splice sites
    chrom = str(variant_info['chrom'])
    if not chrom.startswith('chr'):
        chrom = f"chr{chrom}"  # avoid producing "chrchr1" for prefixed input
    variant_str = f"{chrom}-{variant_info['pos']}-{variant_info['ref']}-{variant_info['alt']}"
    splice = tu.tools.SpliceAI_predict_splice(
        variant=variant_str,
        genome="38"
    )
    if splice.get('data'):
        max_score = splice['data'].get('max_delta_score') or 0
        interpretation = splice['data'].get('interpretation', '')

        if max_score >= 0.8:
            splice_acmg = 'PP3 (strong) - high splice impact'
        elif max_score >= 0.5:
            splice_acmg = 'PP3 (moderate) - splice impact'
        elif max_score >= 0.2:
            splice_acmg = 'PP3 (supporting) - possible splice effect'
        else:
            splice_acmg = 'BP7 (if synonymous) - no splice impact'

        predictions['spliceai'] = {
            'max_delta_score': max_score,
            'interpretation': interpretation,
            'scores': splice['data'].get('scores', []),
            'acmg': splice_acmg
        }

    # Consensus for PP3/BP4: count predictors whose label contains each code
    damaging = sum(1 for p in predictions.values() if 'PP3' in p.get('acmg', ''))
    benign = sum(1 for p in predictions.values() if 'BP4' in p.get('acmg', ''))

    return {
        'predictions': predictions,
        'consensus': {
            'damaging_count': damaging,
            'benign_count': benign,
            'pp3_applicable': damaging >= 2 and benign == 0,
            'bp4_applicable': benign >= 2 and damaging == 0
        }
    }
```
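The CADD threshold of 20 used above is easier to interpret once the PHRED scaling is made explicit: a PHRED-scaled score of 10 puts a variant in the top 10% of all scored variants, 20 in the top 1%, and 30 in the top 0.1%. The helper below is an illustrative conversion (`cadd_phred_to_top_fraction` is a hypothetical name, not part of any tool API).

```python
def cadd_phred_to_top_fraction(phred):
    """Convert a CADD PHRED-scaled score to the top fraction of variants it implies.

    PHRED scaling means phred = -10 * log10(rank fraction),
    so the fraction is 10 ** (-phred / 10).
    """
    return 10 ** (-phred / 10)


for score in (10, 20, 28.5, 30):
    print(f"PHRED {score}: top {cadd_phred_to_top_fraction(score):.2%}")
```

This is useful when wording the report: a score should be described by the percentile it actually implies, not rounded to the nearest power of ten.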

4.4 ACMG Classification Criteria

| Evidence Type | Criteria | Weight |
|---------------|----------|--------|
| PVS1 | Null variant in a gene where loss of function (LOF) is a known disease mechanism | Very Strong |
| PS1 | Same amino acid change as an established pathogenic variant | Strong |
| PM2 | Absent from (or extremely rare in) population databases | Moderate |
| PP3 | Computational evidence supports a deleterious effect (AlphaMissense, CADD, EVE, SpliceAI) | Supporting |
| BA1 | Allele frequency >5% | Benign standalone |

Enhanced PP3 Evidence (NEW):

- AlphaMissense pathogenic (score >0.564) = strong PP3 support (threshold calibrated to ~90% precision on ClinVar)
- CADD ≥20 + EVE >0.5 = multiple concordant predictions
- Agreement from 2+ predictors strengthens PP3 evidence
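Once individual criteria have been assigned, they are combined into a five-tier classification. The sketch below encodes the combining rules from the 2015 ACMG/AMP guidelines as a function over counts of met criteria. `combine_acmg_evidence` is a hypothetical helper for illustration; it does not cover later ClinGen refinements such as point-based scoring, and its output is a preliminary suggestion that needs expert review.

```python
def combine_acmg_evidence(pvs=0, ps=0, pm=0, pp=0, ba=0, bs=0, bp=0):
    """Apply the ACMG/AMP 2015 combining rules to counts of met criteria.

    Each argument is the number of criteria met at that strength
    (pvs=Very Strong, ps=Strong, pm=Moderate, pp=Supporting pathogenic;
    ba=Stand-alone, bs=Strong, bp=Supporting benign). Criteria whose
    strength was upgraded or downgraded are counted at their final strength.
    """
    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Contradictory pathogenic and benign evidence stays uncertain
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


# One Strong, one Moderate and two Supporting pathogenic criteria
print(combine_acmg_evidence(ps=1, pm=1, pp=2))
```

Taking counts instead of criterion names keeps the function independent of how each criterion was weighted upstream.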

4.5 Output for Report

4. Variant Interpretation

4.1 Variant: FBN1 c.4621G>A (p.Glu1541Lys)

| Property | Value | Interpretation |
|----------|-------|----------------|
| Gene | FBN1 | Marfan syndrome gene |
| Consequence | Missense | Amino acid change |
| ClinVar | VUS | Uncertain significance |
| gnomAD AF | 0.000004 | Ultra-rare (PM2) |

4.2 Computational Predictions (NEW)

| Predictor | Score | Classification | ACMG Support |
|-----------|-------|----------------|--------------|
| AlphaMissense | 0.78 | Pathogenic | PP3 (strong) |
| CADD PHRED | 28.5 | Top ~0.14% most deleterious | PP3 |
| EVE | 0.72 | Likely pathogenic | PP3 |

Consensus: 3/3 predictors concordant damaging → Strong PP3 support

Source: AlphaMissense, CADD API, EVE via Ensembl VEP

4.3 ACMG Evidence Summary

| Criterion | Evidence | Strength |
|-----------|----------|----------|
| PM2 | Extremely rare in gnomAD (AF = 0.000004, below 0.00001) | Moderate |
| PP3 | AlphaMissense + CADD + EVE concordant | Supporting (3/3 predictors concordant) |
| PP4 | Phenotype highly specific for Marfan | Supporting |
| PP1 | Cosegregation with disease in multiple affected family members | Strong (PP1 upgraded; requires sufficient segregation data) |

Preliminary Classification: Likely Pathogenic (1 Strong + 1 Moderate + 2 Supporting)

Source: ClinVar, gnomAD, AlphaMissense, CADD, EVE

---

Phase 5: Structure Analysis for VUS

5.1 When to Perform Structure Analysis

Perform when:

- The variant is a VUS or has conflicting interpretations
- A missense variant lies in a critical domain
- The variant is novel (not in databases)
- Additional evidence is needed for classification

5.2 Structure Prediction (NVIDIA NIM)

```python
def analyze_variant_structure(tu, protein_sequence, variant_position):
    """Predict structure and report model confidence at the variant position."""

    # Predict structure with AlphaFold2
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=protein_sequence,
        algorithm="mmseqs2",
        relax_prediction=False
    )

    # Extract pLDDT at the variant position.
    # get_residue_plddt is a local helper (not a ToolUniverse tool) and must
    # be defined separately.
    variant_plddt = get_residue_plddt(structure, variant_position)

    # pLDDT > 70 indicates a confidently modelled (structured) region
    confidence = "High" if variant_plddt > 70 else "Low"

    return {
        'structure': structure,
        'variant_plddt': variant_plddt,
        'confidence': confidence
    }
```
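`analyze_variant_structure` calls `get_residue_plddt`, whose definition is not shown here. One possible implementation is sketched below. It assumes the predicted structure is available as PDB-format text with pLDDT stored in the B-factor column, which is the AlphaFold convention. The response format of `NvidiaNIM_alphafold2` is not shown in this document, so extracting the PDB text from the tool result is left to the caller.

```python
def get_residue_plddt(pdb_text, residue_number, chain_id=None):
    """Return the pLDDT of one residue from AlphaFold-style PDB text, or None.

    Reads the B-factor field (columns 61-66 of ATOM records) of the
    C-alpha atom, where AlphaFold writes the per-residue pLDDT.
    Assumes pdb_text is a PDB-format string (an assumption about the input).
    """
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        if chain_id is not None and line[21] != chain_id:  # chain, column 22
            continue
        if int(line[22:26]) == residue_number:  # residue number, columns 23-26
            return float(line[60:66])
    return None
```

Returning `None` for a missing residue lets the caller distinguish "residue not found" from a genuinely low confidence value, so the caller should check for `None` before comparing against the pLDDT threshold.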

5.3 Domain Impact Assessment

```python
def assess_domain_impact(tu, uniprot_id, variant_position):
    """Check if variant affects functional domain."""

    # Get domain annotations
    domains = tu.tools.InterPro_get_protein_domains(accession=uniprot_id)

    for domain in domains:
        if domain['start'] <= variant_position <= domain['end']:
            return {
                'in_domain': True,
                'domain_name': domain['name'],
                'domain_function': domain['description']
            }

    return {'in_domain': False}
```

5.4 Output for Report

5. Structural Analysis

5.1 Structure Prediction

Method: AlphaFold2 via NVIDIA NIM
Protein: Fibrillin-1 (FBN1)
Sequence Length: 2,871 amino acids

| Metric | Value | Interpretation |
|--------|-------|----------------|
| Mean pLDDT | 85.3 | High confidence overall |
| Variant position pLDDT | 92.1 | Very high confidence |
| Nearby domain | cbEGF-like domain 23 | Calcium-binding |

5.2 Variant Location Analysis

Variant: p.Glu1541Lys

| Feature | Finding | Impact |
|---------|---------|--------|
| Domain | cbEGF-like domain 23 | Critical for calcium binding |
| Conservation | 100% conserved across vertebrates | High constraint |
| Structural role | Calcium coordination residue | Likely destabilizing |
| Nearby pathogenic | p.Glu1540Lys (Pathogenic) | Adjacent residue |

5.3 Structural Interpretation

The variant p.Glu1541Lys:

- Located in cbEGF domain - These domains are critical for fibrillin-1 function
- Glutamate → Lysine - Charge reversal (negative to positive)
- Calcium binding - Glutamate at this position coordinates Ca2+
- Adjacent pathogenic variant - p.Glu1540Lys is classified Pathogenic

Structural Evidence: Strong support for pathogenicity (PM1 - critical domain)

Source: NVIDIA NIM via `NvidiaNIM_alphafold2`, InterPro

---

Phase 6: Literature Evidence (NEW)

6.1 Published Literature (PubMed)

```python
def search_disease_literature(tu, disease_name, genes):
    """Search for relevant published literature."""

    # Disease-specific search
    disease_papers = tu.tools.PubMed_search_articles(
        query=f'"{disease_name}" AND (genetics OR mutation OR variant)',
        limit=20
    )

    # Gene-specific searches
    gene_papers = []
    for gene in genes[:5]:  # Top 5 genes
        papers = tu.tools.PubMed_search_articles(
            query=f'"{gene}" AND rare disease AND pathogenic',
            limit=10
        )
        # Guard against an empty (None) result before extending
        gene_papers.extend(papers or [])

    return {
        'disease_literature': disease_papers,
        'gene_literature': gene_papers
    }
```
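Because the gene-specific searches run one query per gene, the same article can appear several times in the combined list. A small de-duplication step avoids double counting in the evidence summary. The sketch below assumes each paper is a dict carrying an identifier under a `pmid` key; that field name is an assumption about the search tool's output, and `deduplicate_papers` is a hypothetical helper.

```python
def deduplicate_papers(papers, key="pmid"):
    """Drop repeated papers, keeping the first occurrence of each identifier.

    Assumes each paper is a dict; the default key "pmid" is an assumed
    field name. Papers without the identifier are kept unchanged.
    """
    seen = set()
    unique = []
    for paper in papers:
        ident = paper.get(key)
        if ident is not None:
            if ident in seen:
                continue
            seen.add(ident)
        unique.append(paper)
    return unique


# Illustrative records, not real search results
hits = [
    {"pmid": "1", "title": "A"},
    {"pmid": "2", "title": "B"},
    {"pmid": "1", "title": "A (found again via another gene)"},
]
print(len(deduplicate_papers(hits)))
```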

6.2 Preprint Literature (BioRxiv/MedRxiv)

```python
def search_preprints(tu, disease_name, genes):
    """Search preprints for cutting-edge findings."""

    # BioRxiv search
    biorxiv = tu.tools.BioRxiv_search_preprints(
        query=f"{disease_name} genetics",
        limit=10
    )

    # ArXiv for computational methods
    arxiv = tu.tools.ArXiv_search_papers(
        query=f"rare disease diagnosis {' OR '.join(genes[:3])}",
        category="q-bio",
        limit=5
    )

    return {
        'biorxiv': biorxiv,
        'arxiv': arxiv
    }
```

6.3 Citation Analysis (OpenAlex)

```python
def analyze_citations(tu, key_papers):
    """Look up citation counts for the top key papers."""

    citation_analysis = []
    for paper in key_papers[:5]:
        # Get citation data
        work = tu.tools.openalex_search_works(
            query=paper['title'],
            limit=1
        )
        if work:
            citation_analysis.append({
                'title': paper['title'],
                'citations': work[0].get('cited_by_count', 0),
                'year': work[0].get('publication_year')
            })

    return citation_analysis
```

6.4 Output for Report

6. Literature Evidence

6.1 Key Published Studies

| PMID | Title | Year | Citations | Relevance |
|------|-------|------|-----------|-----------|
| 32123456 | FBN1 variants in Marfan syndrome... | 2023 | 45 | Direct |
| 31987654 | TGF-beta signaling in connective... | 2022 | 89 | Pathway |
| 30876543 | Novel diagnostic criteria for... | 2021 | 156 | Diagnostic |

6.2 Recent Preprints (Not Yet Peer-Reviewed)

| Source | Title | Posted | Relevance |
|--------|-------|--------|-----------|
| BioRxiv | Novel FBN1 splice variant causes... | 2024-01 | Case report |
| MedRxiv | Machine learning for Marfan... | 2024-02 | Diagnostic |

⚠️ Note: Preprints have not undergone peer review. Use with caution.

6.3 Evidence Summary

| Evidence Type | Count | Strength |
|---------------|-------|----------|
| Case reports | 12 | Supporting |
| Functional studies | 5 | Strong |
| Clinical trials | 2 | Strong |
| Reviews | 8 | Context |

Source: PubMed, BioRxiv, OpenAlex

---

Report Template

File: `[PATIENT_ID]_rare_disease_report.md`

[Researching...]

7. Data Gaps & Limitations

[Researching...]

8. Data Sources

[Will be populated as research progresses...]

---

Evidence Grading

| Tier | Symbol | Criteria | Example |
|------|--------|----------|---------|
| T1 | ★★★ | Phenotype match >80% + gene match | Marfan with FBN1 mutation |
| T2 | ★★☆ | Phenotype match 60-80% OR likely pathogenic variant | Good phenotype fit |
| T3 | ★☆☆ | Phenotype match 40-60% OR VUS in candidate gene | Possible diagnosis |
| T4 | ☆☆☆ | Phenotype <40% OR uncertain gene | Low probability |
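The tier table can be applied mechanically once a phenotype overlap percentage and the variant status are known. The function below is one reading of the table, not a definitive rule set. The table does not say how to treat boundary values or a high phenotype match without a gene match, so this sketch assigns exactly 80% and 60% to T2, exactly 40% to T3, and a >80% match without gene confirmation to T2. The `variant_class` labels are also assumptions.

```python
def assign_evidence_tier(phenotype_match_pct, gene_match=False, variant_class=None):
    """Map phenotype overlap and variant evidence to tiers T1-T4.

    variant_class is one of None, "VUS", "likely_pathogenic", "pathogenic"
    (labels assumed for illustration). Boundary handling is an assumption:
    the source table does not specify it.
    """
    if phenotype_match_pct > 80 and gene_match:
        return "T1"
    if phenotype_match_pct >= 60 or variant_class in ("likely_pathogenic", "pathogenic"):
        return "T2"
    if phenotype_match_pct >= 40 or variant_class == "VUS":
        return "T3"
    return "T4"


print(assign_evidence_tier(85, gene_match=True))
print(assign_evidence_tier(30, variant_class="VUS"))
```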

Completeness Checklist

Phase 1: Phenotype

- All symptoms converted to HPO terms
- Core vs. variable features distinguished
- Age of onset documented
- Family history noted

Phase 2: Disease Matching

- ≥5 candidate diseases identified (or all matching)
- Phenotype overlap % calculated
- Inheritance patterns noted
- ORPHA and OMIM IDs provided

Phase 3: Gene Panel

- ≥5 genes prioritized (or all from top diseases)
- Evidence level for each gene (ClinGen)
- Expression validation performed
- Testing strategy recommended

Phase 4: Variant Interpretation (if applicable)

Phase 5: Structure Analysis (if applicable)

Phase 6: Recommendations

Fallback Chains

| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| `Orphanet_search_by_hpo` | `OMIM_search` | PubMed phenotype search |
| `ClinVar_get_variant` | `gnomAD_get_variant` | VEP annotation |
| `NvidiaNIM_alphafold2` | `alphafold_get_prediction` | UniProt features |
| `GTEx_expression` | `HPA_expression` | Tissue-specific literature |
| `gnomAD_get_variant` | `ExAC_frequencies` | 1000 Genomes |
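A fallback chain like the ones in the table can be run with a small generic helper that tries each tool in order and stops at the first usable result. This is a minimal sketch: `run_with_fallbacks` is a hypothetical helper, and the stub functions below stand in for real tool calls, whose parameters are not shown in this section.

```python
def run_with_fallbacks(steps):
    """Try each (name, callable) in order and return (name, result) for the
    first call that neither raises nor returns an empty result.

    Returns (None, None) if every step fails or comes back empty.
    """
    for name, call in steps:
        try:
            result = call()
        except Exception:
            # A failing tool should not abort the chain; move to the next one
            continue
        if result:
            return name, result
    return None, None


# Stubs standing in for real tool calls (hypothetical behaviour)
def primary_tool():
    raise RuntimeError("service unavailable")


def first_fallback():
    return []  # tool responded but found nothing


def second_fallback():
    return {"allele_frequency": 0.000004}


source, data = run_with_fallbacks([
    ("primary", primary_tool),
    ("fallback_1", first_fallback),
    ("fallback_2", second_fallback),
])
print(source, data)
```

Returning the name of the tool that succeeded makes it possible to cite the actual data source in the report.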

Tool Reference


See TOOLS_REFERENCE.md for complete tool documentation.
