tooluniverse-rare-disease-diagnosis
Rare Disease Diagnosis Advisor
Systematic diagnosis support for rare diseases using phenotype matching, gene panel prioritization, and variant interpretation across Orphanet, OMIM, HPO, ClinVar, and structure-based analysis.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, update progressively
- Phenotype-driven - Convert symptoms to HPO terms before searching
- Multi-database triangulation - Cross-reference Orphanet, OMIM, OpenTargets
- Evidence grading - Grade diagnoses by supporting evidence strength
- Actionable output - Prioritized differential diagnosis with next steps
- Genetic counseling aware - Consider inheritance patterns and family history
- English-first queries - Always use English terms in tool calls (phenotype descriptions, gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language
When to Use
Apply when user asks:
- "Patient has [symptoms], what rare disease could this be?"
- "Unexplained developmental delay with [features]"
- "WES found VUS in [gene], is this pathogenic?"
- "What genes should we test for [phenotype]?"
- "Differential diagnosis for [rare symptom combination]"
Critical Workflow Requirements
1. Report-First Approach (MANDATORY)
- Create the report file FIRST:
  - File name: `[PATIENT_ID]_rare_disease_report.md`
  - Initialize with all section headers
  - Add placeholder text: `[Researching...]`
- Progressively update as you gather data
- Output separate data files:
  - `[PATIENT_ID]_gene_panel.csv` - Prioritized genes for testing
  - `[PATIENT_ID]_variant_interpretation.csv` - If variants provided
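The report-first steps above can be sketched as a small helper that writes the skeleton before any tool call and then fills sections in as data arrives. This is a minimal sketch, not part of the skill itself: the function names `init_report` and `update_section`, the `#`/`##` header levels, and the exact section list are assumptions chosen for illustration (the section titles follow the report templates used later in this document).

```python
from pathlib import Path

# Assumption: section titles mirror the report templates in this document.
REPORT_SECTIONS = [
    "1. Phenotype Analysis",
    "2. Differential Diagnosis",
    "3. Recommended Gene Panel",
    "3.5 Expression & Regulatory Context",
    "3.6 Pathway & Network Context",
]
PLACEHOLDER = "[Researching...]"


def init_report(patient_id, out_dir="."):
    """Create the report skeleton with every section set to the placeholder."""
    path = Path(out_dir) / f"{patient_id}_rare_disease_report.md"
    lines = [f"# Rare Disease Report: {patient_id}", ""]
    for section in REPORT_SECTIONS:
        lines += [f"## {section}", "", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def update_section(path, section, content):
    """Replace the placeholder under one section header with gathered content."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    marker = f"## {section}\n\n{PLACEHOLDER}"
    if marker not in text:
        raise ValueError(f"No pending placeholder for section: {section}")
    updated = text.replace(marker, f"## {section}\n\n{content}", 1)
    path.write_text(updated, encoding="utf-8")
```

Any section still showing `[Researching...]` at the end of a run is a visible signal that a phase was skipped or failed.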
2. Citation Requirements (MANDATORY)
Every finding MUST include source:
markdown
Candidate Disease: Marfan Syndrome
- ORPHA: ORPHA:558
- OMIM: 154700
- Phenotype match: 85% (17/20 HPO terms)
- Inheritance: AD
- Gene: FBN1
Source: Orphanet via `Orphanet_558`, OMIM via `OMIM_get_entry`
---
Phase 0: Tool Verification
CRITICAL: Verify tool parameters before calling.
Known Parameter Corrections
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
| | |
| | |
| | |
| | |
Workflow Overview
Phase 1: Phenotype Standardization
├── Convert symptoms to HPO terms
├── Identify core vs. variable features
└── Note age of onset, inheritance hints
↓
Phase 2: Disease Matching
├── Orphanet phenotype search
├── OMIM clinical synopsis match
├── OpenTargets disease associations
└── OUTPUT: Ranked differential diagnosis
↓
Phase 3: Gene Panel Identification
├── Extract genes from top diseases
├── Cross-reference expression (GTEx)
├── Prioritize by evidence strength
└── OUTPUT: Recommended gene panel
↓
Phase 3.5: Expression & Tissue Context (NEW)
├── CELLxGENE: Cell-type specific expression
├── ChIPAtlas: Regulatory context (TF binding)
├── Tissue-specific gene networks
└── OUTPUT: Expression validation
↓
Phase 3.6: Pathway Analysis (NEW)
├── KEGG: Metabolic/signaling pathways
├── Reactome: Biological processes
├── IntAct: Protein-protein interactions
└── OUTPUT: Biological context
↓
Phase 4: Variant Interpretation (if provided)
├── ClinVar pathogenicity lookup
├── gnomAD population frequency
├── Protein domain/function impact
├── ENCODE/ChIPAtlas: Regulatory variant impact
└── OUTPUT: Variant classification
↓
Phase 5: Structure Analysis (for VUS)
├── NvidiaNIM_alphafold2 → Predict structure
├── Map variant to structure
├── Assess functional domain impact
└── OUTPUT: Structural evidence
↓
Phase 6: Literature Evidence (NEW)
├── PubMed: Published studies
├── BioRxiv/MedRxiv: Preprints
├── OpenAlex: Citation analysis
└── OUTPUT: Literature support
↓
Phase 7: Report Synthesis
├── Prioritized differential diagnosis
├── Recommended genetic testing
├── Next steps for clinician
└── OUTPUT: Final report
Phase 1: Phenotype Standardization
1.1 Convert Symptoms to HPO Terms
python
def standardize_phenotype(tu, symptoms_list):
    """Convert clinical descriptions to HPO terms."""
    hpo_terms = []
    for symptom in symptoms_list:
        # Search HPO for matching terms
        results = tu.tools.HPO_search_terms(query=symptom)
        if results:
            hpo_terms.append({
                'original': symptom,
                'hpo_id': results[0]['id'],
                'hpo_name': results[0]['name'],
                'confidence': 'exact' if symptom.lower() in results[0]['name'].lower() else 'partial'
            })
    return hpo_terms
1.2 Phenotype Categories
| Category | Definition | Weight |
|---|---|---|
| Core features | Always present in disease | High |
| Variable features | Present in >50% | Medium |
| Occasional features | Present in <50% | Low |
| Age-specific | Onset-dependent | Context |
1.3 Output for Report
markdown
1. Phenotype Analysis
1.1 Standardized HPO Terms
| Clinical Feature | HPO Term | HPO ID | Category |
|---|---|---|---|
| Tall stature | Tall stature | HP:0000098 | Core |
| Long fingers | Arachnodactyly | HP:0001166 | Core |
| Heart murmur | Cardiac murmur | HP:0030148 | Variable |
| Joint hypermobility | Joint hypermobility | HP:0001382 | Core |
Total HPO Terms: 8
Onset: Childhood
Family History: Father with similar features (AD suspected)
Source: HPO via `HPO_search_terms`
---
Phase 2: Disease Matching
2.1 Orphanet Disease Search (NEW TOOLS)
python
def match_diseases_orphanet(tu, symptom_keywords):
    """Find rare diseases matching symptoms using Orphanet."""
    candidate_diseases = []
    # Search Orphanet by disease keywords
    for keyword in symptom_keywords:
        results = tu.tools.Orphanet_search_diseases(
            operation="search_diseases",
            query=keyword
        )
        if results.get('status') == 'success':
            candidate_diseases.extend(results['data']['results'])
    # Get genes for each disease
    for disease in candidate_diseases:
        orpha_code = disease.get('ORPHAcode')
        genes = tu.tools.Orphanet_get_genes(
            operation="get_genes",
            orpha_code=orpha_code
        )
        disease['genes'] = genes.get('data', {}).get('genes', [])
    return deduplicate_and_rank(candidate_diseases)
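`match_diseases_orphanet` calls `deduplicate_and_rank`, which is not defined anywhere in this document. The following is one possible sketch, under two assumptions that the source does not confirm: each result dict carries an `ORPHAcode` key (as used in the loop above), and a disease returned by more keyword searches should rank higher. The `keyword_hits` field is a hypothetical name added for illustration.

```python
from collections import Counter


def deduplicate_and_rank(candidate_diseases):
    """Collapse repeated hits by ORPHAcode and rank by number of hits.

    Assumption: ranking by hit count is a stand-in heuristic, not the
    skill's documented ranking method.
    """
    hits = Counter()
    first_seen = {}
    for disease in candidate_diseases:
        code = disease.get('ORPHAcode')
        if code is None:
            continue  # cannot deduplicate an entry without an identifier
        hits[code] += 1
        first_seen.setdefault(code, disease)
    # sorted() is stable, so ties keep their first-seen order
    ranked = sorted(first_seen.values(),
                    key=lambda d: hits[d['ORPHAcode']],
                    reverse=True)
    return [{**d, 'keyword_hits': hits[d['ORPHAcode']]} for d in ranked]
```

Hit count is only a coarse proxy; the phenotype overlap score described in section 2.4 is the ranking criterion used in the report.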
2.2 OMIM Cross-Reference (NEW TOOLS)
python
def cross_reference_omim(tu, orphanet_diseases, gene_symbols):
    """Get OMIM details for diseases and genes."""
    omim_data = {}
    # Search OMIM for each disease/gene
    for gene in gene_symbols:
        search_result = tu.tools.OMIM_search(
            operation="search",
            query=gene,
            limit=5
        )
        if search_result.get('status') == 'success':
            for entry in search_result['data'].get('entries', []):
                mim_number = entry.get('mimNumber')
                # Get detailed entry
                details = tu.tools.OMIM_get_entry(
                    operation="get_entry",
                    mim_number=str(mim_number)
                )
                # Get clinical synopsis (phenotype features)
                synopsis = tu.tools.OMIM_get_clinical_synopsis(
                    operation="get_clinical_synopsis",
                    mim_number=str(mim_number)
                )
                omim_data[gene] = {
                    'mim_number': mim_number,
                    'details': details.get('data', {}),
                    'clinical_synopsis': synopsis.get('data', {})
                }
    return omim_data
2.3 DisGeNET Gene-Disease Associations (NEW TOOLS)
python
def get_gene_disease_associations(tu, gene_symbols):
    """Get gene-disease associations from DisGeNET."""
    associations = {}
    for gene in gene_symbols:
        # Get diseases associated with gene
        result = tu.tools.DisGeNET_search_gene(
            operation="search_gene",
            gene=gene,
            limit=20
        )
        if result.get('status') == 'success':
            associations[gene] = result['data'].get('associations', [])
    return associations

def get_disease_genes_disgenet(tu, disease_name):
    """Get all genes associated with a disease."""
    result = tu.tools.DisGeNET_search_disease(
        operation="search_disease",
        disease=disease_name,
        limit=30
    )
    return result.get('data', {}).get('associations', [])
2.4 Phenotype Overlap Scoring
| Match Level | Score | Criteria |
|---|---|---|
| Excellent | >80% | Most core + variable features match |
| Good | 60-80% | Core features match, some variable |
| Possible | 40-60% | Some overlap, needs consideration |
| Unlikely | <40% | Poor phenotype fit |
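The thresholds in the table above can be applied mechanically once the patient and disease phenotypes are both expressed as HPO IDs. The sketch below assumes the score is the share of the patient's HPO terms found in the disease annotation, which matches the "85% (17/20 HPO terms)" style used in the citation example; the function name is hypothetical. The table's ranges share their endpoints, so this sketch assigns exactly 60% to Good and exactly 40% to Possible, which is also an assumption.

```python
def score_phenotype_overlap(patient_hpo_ids, disease_hpo_ids):
    """Return matched terms, percent overlap, and the match level."""
    patient = set(patient_hpo_ids)
    if not patient:
        raise ValueError("At least one patient HPO term is required")
    matched = patient & set(disease_hpo_ids)
    percent = 100.0 * len(matched) / len(patient)
    if percent > 80:
        level = "Excellent"
    elif percent >= 60:  # assumption: boundary value goes to the higher level
        level = "Good"
    elif percent >= 40:
        level = "Possible"
    else:
        level = "Unlikely"
    return {
        'matched': sorted(matched),
        'percent': percent,
        'level': level,
    }
```

This counts every term equally. Weighting core features above occasional ones, as section 1.2 suggests, would need per-feature frequencies that this sketch does not model.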
2.5 Output for Report
markdown
2. Differential Diagnosis
Top Candidate Diseases (Ranked by Phenotype Match)
| Rank | Disease | ORPHA | OMIM | Match | Inheritance | Key Gene(s) |
|---|---|---|---|---|---|---|
| 1 | Marfan syndrome | 558 | 154700 | 85% | AD | FBN1 |
| 2 | Loeys-Dietz syndrome | 60030 | 609192 | 72% | AD | TGFBR1, TGFBR2 |
| 3 | Ehlers-Danlos, vascular | 286 | 130050 | 65% | AD | COL3A1 |
| 4 | Homocystinuria | 394 | 236200 | 58% | AR | CBS |
DisGeNET Gene-Disease Evidence
| Gene | Associated Diseases | GDA Score | Evidence |
|---|---|---|---|
| FBN1 | Marfan syndrome, MASS phenotype | 0.95 | ★★★ Curated |
| TGFBR1 | Loeys-Dietz syndrome | 0.89 | ★★★ Curated |
| COL3A1 | vascular EDS | 0.91 | ★★★ Curated |
Source: DisGeNET via `DisGeNET_search_gene`
Disease Details
1. Marfan Syndrome (★★★)
ORPHA: 558 | OMIM: 154700 | Prevalence: 1-5/10,000
Phenotype Match Analysis:
| Patient Feature | Disease Feature | Match |
|---|---|---|
| Tall stature | Present in 95% | ✓ |
| Arachnodactyly | Present in 90% | ✓ |
| Joint hypermobility | Present in 85% | ✓ |
| Cardiac murmur | Aortic root dilation (70%) | Partial |
OMIM Clinical Synopsis (via `OMIM_get_clinical_synopsis`):
- Cardiovascular: Aortic root dilation, mitral valve prolapse
- Skeletal: Scoliosis, pectus excavatum, tall stature
- Ocular: Ectopia lentis, myopia
Diagnostic Criteria: Ghent nosology (2010)
- Aortic root dilation/dissection + FBN1 mutation = Diagnosis
- Without genetic testing: aortic root dilation + ectopia lentis, or aortic root dilation + systemic score ≥7
Inheritance: Autosomal dominant (25% de novo)
Source: Orphanet via `Orphanet_get_disease`, OMIM via `OMIM_get_entry`, DisGeNET
---
Phase 3: Gene Panel Identification
3.1 Extract Disease Genes
python
def build_gene_panel(tu, candidate_diseases):
    """Build prioritized gene panel from candidate diseases."""
    genes = {}
    for disease in candidate_diseases:
        for gene in disease['genes']:
            if gene not in genes:
                genes[gene] = {
                    'symbol': gene,
                    'diseases': [],
                    'evidence_level': 'unknown'
                }
            genes[gene]['diseases'].append(disease['name'])
    return genes
3.1.1 ClinGen Gene-Disease Validity Check (NEW)
Critical: Always verify gene-disease validity through ClinGen before including in panel.
python
def get_clingen_gene_evidence(tu, gene_symbol):
    """
    Get ClinGen gene-disease validity and dosage sensitivity.
    ESSENTIAL for rare disease gene panel prioritization.
    """
    # 1. Gene-disease validity classification
    validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)
    validity_levels = []
    diseases_with_validity = []
    if validity.get('data'):
        for entry in validity.get('data', []):
            validity_levels.append(entry.get('Classification'))
            diseases_with_validity.append({
                'disease': entry.get('Disease Label'),
                'mondo_id': entry.get('Disease ID (MONDO)'),
                'classification': entry.get('Classification'),
                'inheritance': entry.get('Inheritance')
            })
    # 2. Dosage sensitivity (critical for CNV interpretation)
    dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)
    hi_score = None
    ts_score = None
    if dosage.get('data'):
        for entry in dosage.get('data', []):
            hi_score = entry.get('Haploinsufficiency Score')
            ts_score = entry.get('Triplosensitivity Score')
            break
    # 3. Clinical actionability (return of findings context)
    actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)
    is_actionable = (actionability.get('adult_count', 0) > 0 or
                     actionability.get('pediatric_count', 0) > 0)
    # Determine best evidence level
    level_priority = ['Definitive', 'Strong', 'Moderate', 'Limited', 'Disputed', 'Refuted']
    best_level = 'Not curated'
    for level in level_priority:
        if level in validity_levels:
            best_level = level
            break
    return {
        'gene': gene_symbol,
        'evidence_level': best_level,
        'diseases_curated': diseases_with_validity,
        'haploinsufficiency_score': hi_score,
        'triplosensitivity_score': ts_score,
        'is_actionable': is_actionable,
        'include_in_panel': best_level in ['Definitive', 'Strong', 'Moderate']
    }

def prioritize_genes_with_clingen(tu, gene_list):
    """Prioritize genes using ClinGen evidence levels."""
    prioritized = []
    for gene in gene_list:
        evidence = get_clingen_gene_evidence(tu, gene)
        # Score based on ClinGen classification
        score = 0
        if evidence['evidence_level'] == 'Definitive':
            score = 5
        elif evidence['evidence_level'] == 'Strong':
            score = 4
        elif evidence['evidence_level'] == 'Moderate':
            score = 3
        elif evidence['evidence_level'] == 'Limited':
            score = 1
        # Disputed/Refuted get 0
        # Bonus for haploinsufficiency score 3
        if evidence['haploinsufficiency_score'] == '3':
            score += 1
        # Bonus for actionability
        if evidence['is_actionable']:
            score += 1
        prioritized.append({
            **evidence,
            'priority_score': score
        })
    # Sort by priority score
    return sorted(prioritized, key=lambda x: x['priority_score'], reverse=True)
ClinGen Classification Impact on Panel:
| Classification | Include in Panel? | Priority |
|---|---|---|
| Definitive | YES - mandatory | Highest |
| Strong | YES - highly recommended | High |
| Moderate | YES | Medium |
| Limited | Include but flag | Low |
| Disputed | Exclude or separate | Avoid |
| Refuted | EXCLUDE | Do not test |
| Not curated | Use other evidence | Variable |
3.2 Gene Prioritization Criteria
| Priority | Criteria | Points |
|---|---|---|
| Tier 1 | Gene causes #1 ranked disease | +5 |
| Tier 2 | Gene causes multiple candidates | +3 |
| Tier 3 | ClinGen "Definitive" evidence | +3 |
| Tier 4 | Expressed in affected tissue | +2 |
| Tier 5 | Constraint score pLI >0.9 | +1 |
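The point values in the table above can be combined into a single score per gene. The sketch below assumes the points are additive across tiers (suggested by the `+` signs, but not stated), and reads "causes multiple candidates" as two or more candidate diseases. The function and parameter names are hypothetical.

```python
def gene_priority_points(causes_top_disease, n_candidate_diseases,
                         clingen_level, expressed_in_affected_tissue, pli):
    """Sum the tier points from the prioritization table for one gene."""
    points = 0
    if causes_top_disease:                 # Tier 1
        points += 5
    if n_candidate_diseases >= 2:          # Tier 2 (assumption: two or more)
        points += 3
    if clingen_level == "Definitive":      # Tier 3
        points += 3
    if expressed_in_affected_tissue:       # Tier 4
        points += 2
    if pli is not None and pli > 0.9:      # Tier 5
        points += 1
    return points
```

For example, a gene that causes the top-ranked disease, has Definitive ClinGen evidence, is expressed in the affected tissue, and has pLI above 0.9 scores 5 + 3 + 2 + 1 = 11 under these assumptions.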
3.3 Expression Validation
python
def validate_expression(tu, gene_symbol, affected_tissue):
    """Check if gene is expressed in relevant tissue."""
    # Get Ensembl ID
    gene_info = tu.tools.MyGene_query_genes(q=gene_symbol, species="human")
    ensembl_id = gene_info.get('ensembl', {}).get('gene')
    # Check GTEx expression
    expression = tu.tools.GTEx_get_median_gene_expression(
        gencode_id=f"{ensembl_id}.latest"
    )
    return expression.get(affected_tissue, 0) > 1  # TPM > 1
3.4 Output for Report
markdown
3. Recommended Gene Panel
3.1 Prioritized Genes for Testing
| Priority | Gene | Diseases | Evidence | Constraint (pLI) | Expression |
|---|---|---|---|---|---|
| ★★★ | FBN1 | Marfan syndrome | Definitive | 1.00 | Heart, aorta |
| ★★★ | TGFBR1 | Loeys-Dietz 1 | Definitive | 0.98 | Ubiquitous |
| ★★★ | TGFBR2 | Loeys-Dietz 2 | Definitive | 0.99 | Ubiquitous |
| ★★☆ | COL3A1 | EDS vascular | Definitive | 1.00 | Connective tissue |
| ★☆☆ | CBS | Homocystinuria | Definitive | 0.00 | Liver |
3.2 Panel Design Recommendation
Minimum Panel (high yield): FBN1, TGFBR1, TGFBR2, COL3A1
Extended Panel (+differential): Add CBS, SMAD3, ACTA2
Testing Strategy:
- Start with FBN1 sequencing (highest pre-test probability)
- If negative, proceed to full connective tissue panel
- Consider WES if panel negative
Source: ClinGen via gene-disease validity, GTEx expression
---
Phase 3.5: Expression & Tissue Context (ENHANCED)
3.5.1 Cell-Type Specific Expression (CELLxGENE)
python
def get_cell_type_expression(tu, gene_symbol, affected_tissues):
    """Get single-cell expression to validate tissue relevance."""
    # Get expression across cell types
    expression = tu.tools.CELLxGENE_get_expression_data(
        gene=gene_symbol,
        tissue=affected_tissues[0] if affected_tissues else "all"
    )
    # Get cell type metadata
    cell_metadata = tu.tools.CELLxGENE_get_cell_metadata(
        gene=gene_symbol
    )
    # Identify high-expression cell types
    high_expression = [
        ct for ct in expression
        if ct.get('mean_expression', 0) > 1.0  # TPM > 1
    ]
    return {
        'expression_data': expression,
        'high_expression_cells': high_expression,
        'total_cell_types': len(cell_metadata)
    }
Why it matters: Confirms candidate genes are expressed in disease-relevant tissues/cells.
3.5.2 Regulatory Context (ChIPAtlas)
python
def get_regulatory_context(tu, gene_symbol):
    """Get transcription factor binding for candidate genes."""
    # Search for TF binding near gene
    tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
        gene=gene_symbol,
        cell_type="all"
    )
    # Get specific binding peaks
    peaks = tu.tools.ChIPAtlas_get_peak_data(
        gene=gene_symbol,
        experiment_type="TF"
    )
    return {
        'transcription_factors': tf_binding,
        'regulatory_peaks': peaks
    }
Why it matters: Identifies regulatory mechanisms that may be disrupted in disease.
3.5.3 Output for Report
markdown
3.5 Expression & Regulatory Context
Cell-Type Specific Expression (CELLxGENE)
| Gene | Top Expressing Cell Types | Expression Level | Tissue Relevance |
|---|---|---|---|
| FBN1 | Fibroblasts, Smooth muscle | High (TPM=45) | ✓ Connective tissue |
| TGFBR1 | Endothelial, Fibroblasts | Medium (TPM=12) | ✓ Vascular |
| COL3A1 | Fibroblasts, Myofibroblasts | Very High (TPM=120) | ✓ Connective tissue |
Interpretation: All top candidate genes show high expression in disease-relevant cell types (connective tissue, vascular cells), supporting their candidacy.
Regulatory Context (ChIPAtlas)
| Gene | Key TF Regulators | Regulatory Significance |
|---|---|---|
| FBN1 | TGFβ pathway (SMAD2/3), AP-1 | TGFβ-responsive |
| TGFBR1 | STAT3, NF-κB | Inflammation-responsive |
Source: CELLxGENE Census, ChIPAtlas
---
Phase 3.6: Pathway Analysis (NEW)
3.6.1 KEGG Pathway Context
python
def get_pathway_context(tu, gene_symbols):
    """Get pathway context for candidate genes."""
    pathways = {}
    for gene in gene_symbols:
        # Search KEGG for gene
        kegg_genes = tu.tools.kegg_find_genes(query=f"hsa:{gene}")
        if kegg_genes:
            # Get pathway membership
            gene_info = tu.tools.kegg_get_gene_info(gene_id=kegg_genes[0]['id'])
            pathways[gene] = gene_info.get('pathways', [])
    return pathways
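The report template in 3.6.3 lists convergent pathways, meaning pathways shared by at least two candidate genes. A minimal sketch of that step is shown here, assuming the dict returned by `get_pathway_context` maps each gene symbol to a list of pathway identifier strings; the actual shape of the KEGG tool output is not shown in this document, and the function name is hypothetical.

```python
from collections import defaultdict


def find_convergent_pathways(pathways, min_genes=2):
    """Invert a gene -> pathways mapping and keep pathways shared by
    at least `min_genes` candidate genes."""
    members = defaultdict(set)
    for gene, gene_pathways in pathways.items():
        for pathway in gene_pathways:
            members[pathway].add(gene)
    return {
        pathway: sorted(genes)
        for pathway, genes in members.items()
        if len(genes) >= min_genes
    }
```

If the tool returns pathway dicts rather than strings, the inner loop would need to extract an identifier field first.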
3.6.2 Protein-Protein Interactions (IntAct)
python
def get_protein_interactions(tu, gene_symbol):
    """Get interaction partners for candidate genes."""
    # Search IntAct for interactions
    interactions = tu.tools.intact_search_interactions(
        query=gene_symbol,
        species="human"
    )
    # Get interaction network
    network = tu.tools.intact_get_interaction_network(
        gene=gene_symbol,
        depth=1  # Direct interactors only
    )
    return {
        'interactions': interactions,
        'network': network,
        'interactor_count': len(interactions)
    }
3.6.3 Output for Report
markdown
3.6 Pathway & Network Context
KEGG Pathways
| Gene | Key Pathways | Biological Process |
|---|---|---|
| FBN1 | ECM-receptor interaction (hsa04512) | Extracellular matrix |
| TGFBR1/2 | TGF-beta signaling (hsa04350) | Cell signaling |
| COL3A1 | Focal adhesion (hsa04510) | Cell-matrix adhesion |
Shared Pathway Analysis
Convergent pathways (≥2 candidate genes):
- TGF-beta signaling pathway: FBN1, TGFBR1, TGFBR2, SMAD3
- ECM organization: FBN1, COL3A1
Interpretation: Candidate genes converge on TGF-beta signaling and extracellular matrix pathways, consistent with connective tissue disorder etiology.
Protein-Protein Interactions (IntAct)
| Gene | Direct Interactors | Notable Partners |
|---|---|---|
| FBN1 | 42 | LTBP1, TGFB1, ADAMTS10 |
| TGFBR1 | 68 | TGFBR2, SMAD2, SMAD3 |
Source: KEGG, IntAct, Reactome
---
Phase 4: Variant Interpretation (If Provided)
4.1 ClinVar Lookup
python
def interpret_variant(tu, variant_hgvs):
    """Get ClinVar interpretation for variant."""
    result = tu.tools.ClinVar_search_variants(query=variant_hgvs)
    return {
        'clinvar_id': result.get('id'),
        'classification': result.get('clinical_significance'),
        'review_status': result.get('review_status'),
        'conditions': result.get('conditions'),
        'last_evaluated': result.get('last_evaluated')
    }
4.2 Population Frequency
```python
def check_population_frequency(tu, variant_id):
    """Get gnomAD allele frequency."""
    freq = tu.tools.gnomAD_get_variant_frequencies(variant_id=variant_id)
    # A variant absent from gnomAD may come back without a frequency;
    # treat that as AF = 0 rather than raising KeyError/TypeError.
    af = (freq or {}).get('allele_frequency') or 0.0
    # Interpret rarity
    if af < 0.00001:
        rarity = "Ultra-rare"
    elif af < 0.0001:
        rarity = "Rare"
    elif af < 0.01:
        rarity = "Low frequency"
    else:
        rarity = "Common (likely benign)"
    return freq, rarity
```
4.3 Computational Pathogenicity Prediction (ENHANCED)
Use state-of-the-art prediction tools for VUS interpretation:
```python
def comprehensive_vus_prediction(tu, variant_info):
    """
    Combine multiple prediction tools for VUS classification.
    Critical for rare disease variants not in ClinVar.
    """
    predictions = {}

    # 1. CADD - Deleteriousness (NEW API)
    cadd = tu.tools.CADD_get_variant_score(
        chrom=variant_info['chrom'],
        pos=variant_info['pos'],
        ref=variant_info['ref'],
        alt=variant_info['alt'],
        version="GRCh38-v1.7"
    )
    if cadd.get('status') == 'success':
        phred = cadd['data'].get('phred_score')
        predictions['cadd'] = {
            'score': phred,
            'interpretation': cadd['data'].get('interpretation'),
            # A missing (None) score gives no evidence instead of a TypeError
            'acmg': 'PP3' if phred is not None and phred >= 20 else 'neutral'
        }

    # 2. AlphaMissense - DeepMind pathogenicity (NEW)
    if variant_info.get('uniprot_id') and variant_info.get('aa_change'):
        am = tu.tools.AlphaMissense_get_variant_score(
            uniprot_id=variant_info['uniprot_id'],
            variant=variant_info['aa_change']  # e.g., "E1541K"
        )
        if am.get('status') == 'success' and am.get('data'):
            classification = am['data'].get('classification')
            predictions['alphamissense'] = {
                'score': am['data'].get('pathogenicity_score'),
                'classification': classification,
                'acmg': 'PP3 (strong)' if classification == 'pathogenic' else (
                    'BP4 (strong)' if classification == 'benign' else 'neutral'
                )
            }

    # 3. EVE - Evolutionary prediction (NEW)
    eve = tu.tools.EVE_get_variant_score(
        chrom=variant_info['chrom'],
        pos=variant_info['pos'],
        ref=variant_info['ref'],
        alt=variant_info['alt']
    )
    if eve.get('status') == 'success':
        eve_scores = eve['data'].get('eve_scores', [])
        if eve_scores:
            eve_score = eve_scores[0].get('eve_score')
            if eve_score is None:
                eve_acmg = 'neutral'  # no score returned, so no evidence either way
            else:
                eve_acmg = 'PP3' if eve_score > 0.5 else 'BP4'
            predictions['eve'] = {
                'score': eve_score,
                'classification': eve_scores[0].get('classification'),
                'acmg': eve_acmg
            }

    # 4. SpliceAI - Splice variant prediction (NEW)
    # Use for intronic, synonymous, or exonic variants near splice sites
    variant_str = f"chr{variant_info['chrom']}-{variant_info['pos']}-{variant_info['ref']}-{variant_info['alt']}"
    splice = tu.tools.SpliceAI_predict_splice(
        variant=variant_str,
        genome="38"
    )
    if splice.get('data'):
        max_score = splice['data'].get('max_delta_score') or 0
        interpretation = splice['data'].get('interpretation', '')
        if max_score >= 0.8:
            splice_acmg = 'PP3 (strong) - high splice impact'
        elif max_score >= 0.5:
            splice_acmg = 'PP3 (moderate) - splice impact'
        elif max_score >= 0.2:
            splice_acmg = 'PP3 (supporting) - possible splice effect'
        else:
            splice_acmg = 'BP7 (if synonymous) - no splice impact'
        predictions['spliceai'] = {
            'max_delta_score': max_score,
            'interpretation': interpretation,
            'scores': splice['data'].get('scores', []),
            'acmg': splice_acmg
        }

    # Consensus for PP3/BP4: count tools voting damaging vs. benign
    damaging = sum(1 for p in predictions.values() if 'PP3' in p.get('acmg', ''))
    benign = sum(1 for p in predictions.values() if 'BP4' in p.get('acmg', ''))
    return {
        'predictions': predictions,
        'consensus': {
            'damaging_count': damaging,
            'benign_count': benign,
            'pp3_applicable': damaging >= 2 and benign == 0,
            'bp4_applicable': benign >= 2 and damaging == 0
        }
    }
```
4.4 ACMG Classification Criteria
| Evidence Type | Criteria | Weight |
|---|---|---|
| PVS1 | Null variant in a gene where loss of function (LOF) is the disease mechanism | Very Strong |
| PS1 | Same amino acid change as an established pathogenic variant | Strong |
| PM2 | Absent from population databases | Moderate |
| PP3 | Computational evidence supports a deleterious effect (AlphaMissense, CADD, EVE, SpliceAI) | Supporting |
| BA1 | Allele frequency >5% | Benign standalone |
Enhanced PP3 Evidence (NEW):
- AlphaMissense pathogenic (>0.564) = Strong PP3 support (~90% accuracy)
- CADD ≥20 + EVE >0.5 = Multiple concordant predictions
- Agreement from 2+ predictors strengthens PP3 evidence
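Once individual criteria are assigned, they still have to be combined into a classification. The sketch below encodes the pathogenic-side combining rules from the ACMG/AMP 2015 guideline as counts of evidence by strength. It is a simplified illustration: `combine_pathogenic_evidence` is a hypothetical helper, benign criteria (BA1, BS, BP) are ignored entirely, and a real workflow would also need to resolve conflicting evidence.

```python
def combine_pathogenic_evidence(very_strong=0, strong=0, moderate=0, supporting=0):
    """Combine counts of pathogenic criteria (ACMG/AMP 2015 rules, pathogenic side only)."""
    pathogenic = (
        # 1 very strong plus enough additional evidence
        (very_strong >= 1 and (
            strong >= 1
            or moderate >= 2
            or (moderate >= 1 and supporting >= 1)
            or supporting >= 2
        ))
        # 2 or more strong
        or strong >= 2
        # 1 strong plus moderate/supporting combinations
        or (strong == 1 and (
            moderate >= 3
            or (moderate == 2 and supporting >= 2)
            or (moderate == 1 and supporting >= 4)
        ))
    )
    if pathogenic:
        return "Pathogenic"

    likely_pathogenic = (
        (very_strong == 1 and moderate == 1)
        or (strong == 1 and 1 <= moderate <= 2)
        or (strong == 1 and supporting >= 2)
        or moderate >= 3
        or (moderate == 2 and supporting >= 2)
        or (moderate == 1 and supporting >= 4)
    )
    if likely_pathogenic:
        return "Likely pathogenic"
    return "Uncertain significance"
```

For example, `combine_pathogenic_evidence(strong=1, moderate=1, supporting=2)` falls under the "1 strong + 1 to 2 moderate" rule and returns "Likely pathogenic", while one moderate plus three supporting criteria stays at "Uncertain significance".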
4.5 Output for Report
4. Variant Interpretation
4.1 Variant: FBN1 c.4621G>A (p.Glu1541Lys)
| Property | Value | Interpretation |
|---|---|---|
| Gene | FBN1 | Marfan syndrome gene |
| Consequence | Missense | Amino acid change |
| ClinVar | VUS | Uncertain significance |
| gnomAD AF | 0.000004 | Ultra-rare (PM2) |
4.2 Computational Predictions (NEW)
| Predictor | Score | Classification | ACMG Support |
|---|---|---|---|
| AlphaMissense | 0.78 | Pathogenic | PP3 (strong) |
| CADD PHRED | 28.5 | Top ~0.14% most deleterious | PP3 |
| EVE | 0.72 | Likely pathogenic | PP3 |
Consensus: 3/3 predictors concordant damaging → Strong PP3 support
Source: AlphaMissense, CADD API, EVE via Ensembl VEP
4.3 ACMG Evidence Summary
| Criterion | Evidence | Strength |
|---|---|---|
| PM2 | Ultra-rare in gnomAD (AF 0.000004, below the 0.00001 cutoff) | Moderate |
| PP3 | AlphaMissense + CADD + EVE concordant | Supporting (strong) |
| PP4 | Phenotype highly specific for Marfan | Supporting |
| PS4 | Multiple affected family members | Strong |
Preliminary Classification: Likely Pathogenic (1 Strong + 1 Moderate + 2 Supporting)
Source: ClinVar, gnomAD, AlphaMissense, CADD, EVE
---
Phase 5: Structure Analysis for VUS
5.1 When to Perform Structure Analysis
Perform when:
- Variant is a VUS or has conflicting interpretations
- Missense variant in critical domain
- Novel variant not in databases
- Additional evidence needed for classification
5.2 Structure Prediction (NVIDIA NIM)
```python
def analyze_variant_structure(tu, protein_sequence, variant_position):
    """Predict structure and analyze variant impact."""
    # Predict structure with AlphaFold2
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=protein_sequence,
        algorithm="mmseqs2",
        relax_prediction=False
    )
    # Extract pLDDT at variant position
    # (get_residue_plddt is a local helper, not a ToolUniverse call)
    variant_plddt = get_residue_plddt(structure, variant_position)
    # Check if in structured region
    confidence = "High" if variant_plddt > 70 else "Low"
    return {
        'structure': structure,
        'variant_plddt': variant_plddt,
        'confidence': confidence
    }
```
5.3 Domain Impact Assessment
```python
def assess_domain_impact(tu, uniprot_id, variant_position):
    """Check if variant affects functional domain."""
    # Get domain annotations
    domains = tu.tools.InterPro_get_protein_domains(accession=uniprot_id)
    # Report the first annotated domain that spans the variant position
    for domain in domains:
        if domain['start'] <= variant_position <= domain['end']:
            return {
                'in_domain': True,
                'domain_name': domain['name'],
                'domain_function': domain['description']
            }
    return {'in_domain': False}
```
5.4 Output for Report
5. Structural Analysis
5.1 Structure Prediction
Method: AlphaFold2 via NVIDIA NIM
Protein: Fibrillin-1 (FBN1)
Sequence Length: 2,871 amino acids
| Metric | Value | Interpretation |
|---|---|---|
| Mean pLDDT | 85.3 | High confidence overall |
| Variant position pLDDT | 92.1 | Very high confidence |
| Nearby domain | cbEGF-like domain 23 | Calcium-binding |
5.2 Variant Location Analysis
Variant: p.Glu1541Lys
| Feature | Finding | Impact |
|---|---|---|
| Domain | cbEGF-like domain 23 | Critical for calcium binding |
| Conservation | 100% conserved across vertebrates | High constraint |
| Structural role | Calcium coordination residue | Likely destabilizing |
| Nearby pathogenic | p.Glu1540Lys (Pathogenic) | Adjacent residue |
5.3 Structural Interpretation
The variant p.Glu1541Lys:
- Located in cbEGF domain - These domains are critical for fibrillin-1 function
- Glutamate → Lysine - Charge reversal (negative to positive)
- Calcium binding - Glutamate at this position coordinates Ca2+
- Adjacent pathogenic variant - p.Glu1540Lys is classified Pathogenic
Structural Evidence: Strong support for pathogenicity (PM1 - critical domain)
Source: NVIDIA NIM via `NvidiaNIM_alphafold2`, InterPro
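The charge-reversal observation in the interpretation above (Glu to Lys, negative to positive) is easy to derive mechanically from the protein-level HGVS string. The following sketch is illustrative only: `parse_protein_change` and `describe_charge_change` are hypothetical helpers, only simple three-letter missense notation is handled, and histidine is treated as neutral for simplicity.

```python
import re

# Side-chain charge at physiological pH; anything not listed is treated as neutral
SIDE_CHAIN_CHARGE = {"Asp": -1, "Glu": -1, "Lys": 1, "Arg": 1}


def parse_protein_change(hgvs_p):
    """Split a simple missense string such as 'p.Glu1541Lys' into (ref, position, alt)."""
    match = re.fullmatch(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", hgvs_p)
    if match is None:
        raise ValueError(f"Unsupported protein change: {hgvs_p}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


def describe_charge_change(ref_aa, alt_aa):
    """Classify how a substitution changes the side-chain charge."""
    ref = SIDE_CHAIN_CHARGE.get(ref_aa, 0)
    alt = SIDE_CHAIN_CHARGE.get(alt_aa, 0)
    if ref == alt:
        return "no charge change"
    if ref * alt < 0:
        return "charge reversal"
    # Exactly one side is neutral at this point
    return "charge gain" if ref == 0 else "charge loss"


ref_aa, position, alt_aa = parse_protein_change("p.Glu1541Lys")
charge_effect = describe_charge_change(ref_aa, alt_aa)
```

A check like this only flags the physicochemical change; whether the residue actually coordinates calcium still has to come from the domain and structure analysis.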
---
Phase 6: Literature Evidence (NEW)
6.1 Published Literature (PubMed)
```python
def search_disease_literature(tu, disease_name, genes):
    """Search for relevant published literature."""
    # Disease-specific search
    disease_papers = tu.tools.PubMed_search_articles(
        query=f'"{disease_name}" AND (genetics OR mutation OR variant)',
        limit=20
    )
    # Gene-specific searches
    gene_papers = []
    for gene in genes[:5]:  # Top 5 genes
        papers = tu.tools.PubMed_search_articles(
            query=f'"{gene}" AND rare disease AND pathogenic',
            limit=10
        )
        gene_papers.extend(papers)
    return {
        'disease_literature': disease_papers,
        'gene_literature': gene_papers
    }
```
6.2 Preprint Literature (BioRxiv/MedRxiv)
```python
def search_preprints(tu, disease_name, genes):
    """Search preprints for cutting-edge findings."""
    # BioRxiv search
    biorxiv = tu.tools.BioRxiv_search_preprints(
        query=f"{disease_name} genetics",
        limit=10
    )
    # ArXiv for computational methods
    arxiv = tu.tools.ArXiv_search_papers(
        query=f"rare disease diagnosis {' OR '.join(genes[:3])}",
        category="q-bio",
        limit=5
    )
    return {
        'biorxiv': biorxiv,
        'arxiv': arxiv
    }
```
6.3 Citation Analysis (OpenAlex)
```python
def analyze_citations(tu, key_papers):
    """Analyze citation network for key papers."""
    citation_analysis = []
    for paper in key_papers[:5]:
        # Get citation data for the best title match
        work = tu.tools.openalex_search_works(
            query=paper['title'],
            limit=1
        )
        if work:
            citation_analysis.append({
                'title': paper['title'],
                'citations': work[0].get('cited_by_count', 0),
                'year': work[0].get('publication_year')
            })
    return citation_analysis
```
6.4 Output for Report
6. Literature Evidence
6.1 Key Published Studies
| PMID | Title | Year | Citations | Relevance |
|---|---|---|---|---|
| 32123456 | FBN1 variants in Marfan syndrome... | 2023 | 45 | Direct |
| 31987654 | TGF-beta signaling in connective... | 2022 | 89 | Pathway |
| 30876543 | Novel diagnostic criteria for... | 2021 | 156 | Diagnostic |
6.2 Recent Preprints (Not Yet Peer-Reviewed)
| Source | Title | Posted | Relevance |
|---|---|---|---|
| BioRxiv | Novel FBN1 splice variant causes... | 2024-01 | Case report |
| MedRxiv | Machine learning for Marfan... | 2024-02 | Diagnostic |
⚠️ Note: Preprints have not undergone peer review. Use with caution.
6.3 Evidence Summary
| Evidence Type | Count | Strength |
|---|---|---|
| Case reports | 12 | Supporting |
| Functional studies | 5 | Strong |
| Clinical trials | 2 | Strong |
| Reviews | 8 | Context |
Source: PubMed, BioRxiv, OpenAlex
---
Report Template
File: `[PATIENT_ID]_rare_disease_report.md`
Rare Disease Diagnostic Report
Patient ID: [ID] | Date: [Date] | Status: In Progress
Executive Summary
[Researching...]
1. Phenotype Analysis
1.1 Standardized HPO Terms
[Researching...]
1.2 Key Clinical Features
[Researching...]
2. Differential Diagnosis
2.1 Ranked Candidate Diseases
[Researching...]
2.2 Disease Details
[Researching...]
3. Recommended Gene Panel
3.1 Prioritized Genes
[Researching...]
3.2 Testing Strategy
[Researching...]
4. Variant Interpretation (if applicable)
4.1 Variant Details
[Researching...]
4.2 ACMG Classification
[Researching...]
5. Structural Analysis (if applicable)
5.1 Structure Prediction
[Researching...]
5.2 Variant Impact
[Researching...]
6. Clinical Recommendations
6.1 Diagnostic Next Steps
[Researching...]
6.2 Specialist Referrals
[Researching...]
6.3 Family Screening
[Researching...]
7. Data Gaps & Limitations
[Researching...]
8. Data Sources
[Will be populated as research progresses...]
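Since the workflow requires the report file to exist before any research starts, the template can be generated programmatically. The sketch below builds the skeleton with every section set to its placeholder. The helper names are hypothetical, and the `#`/`##`/`###` heading levels are an assumption about how the template is meant to render as markdown.

```python
from pathlib import Path

PLACEHOLDER = "[Researching...]"

# (section title, subsection titles) in template order
REPORT_SECTIONS = [
    ("Executive Summary", []),
    ("1. Phenotype Analysis", ["1.1 Standardized HPO Terms", "1.2 Key Clinical Features"]),
    ("2. Differential Diagnosis", ["2.1 Ranked Candidate Diseases", "2.2 Disease Details"]),
    ("3. Recommended Gene Panel", ["3.1 Prioritized Genes", "3.2 Testing Strategy"]),
    ("4. Variant Interpretation (if applicable)", ["4.1 Variant Details", "4.2 ACMG Classification"]),
    ("5. Structural Analysis (if applicable)", ["5.1 Structure Prediction", "5.2 Variant Impact"]),
    ("6. Clinical Recommendations",
     ["6.1 Diagnostic Next Steps", "6.2 Specialist Referrals", "6.3 Family Screening"]),
    ("7. Data Gaps & Limitations", []),
]


def build_report_skeleton(patient_id, date):
    """Return the report template text with every section set to its placeholder."""
    lines = [
        "# Rare Disease Diagnostic Report",
        "",
        f"Patient ID: {patient_id} | Date: {date} | Status: In Progress",
        "",
    ]
    for title, subsections in REPORT_SECTIONS:
        lines += [f"## {title}", ""]
        if not subsections:
            lines += [PLACEHOLDER, ""]
        for sub in subsections:
            lines += [f"### {sub}", "", PLACEHOLDER, ""]
    lines += ["## 8. Data Sources", "", "[Will be populated as research progresses...]", ""]
    return "\n".join(lines)


def create_report_file(patient_id, date, directory="."):
    """Write the skeleton to [PATIENT_ID]_rare_disease_report.md and return its path."""
    path = Path(directory) / f"{patient_id}_rare_disease_report.md"
    path.write_text(build_report_skeleton(patient_id, date), encoding="utf-8")
    return path


skeleton = build_report_skeleton("PT001", "2024-01-01")
```

Calling `create_report_file("PT001", "2024-01-01")` at the start of a session would then satisfy the report-first requirement, with later phases replacing placeholders as results arrive.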
---
Evidence Grading
| Tier | Symbol | Criteria | Example |
|---|---|---|---|
| T1 | ★★★ | Phenotype match >80% + gene match | Marfan with FBN1 mutation |
| T2 | ★★☆ | Phenotype match 60-80% OR likely pathogenic variant | Good phenotype fit |
| T3 | ★☆☆ | Phenotype match 40-60% OR VUS in candidate gene | Possible diagnosis |
| T4 | ☆☆☆ | Phenotype <40% OR uncertain gene | Low probability |
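The tier table can be turned into a small decision function. This is one possible reading of the table rather than a definitive rule set: `grade_evidence` and its parameters are hypothetical, tiers are checked from strongest to weakest, and a phenotype match above 80% without a gene match is assumed to fall to T2 because the table does not cover that case explicitly.

```python
TIER_SYMBOLS = {"T1": "★★★", "T2": "★★☆", "T3": "★☆☆", "T4": "☆☆☆"}


def grade_evidence(phenotype_match_pct, gene_match=False, variant_class=None):
    """Map phenotype overlap and variant evidence to an evidence tier.

    variant_class is None, "vus", or "likely_pathogenic" (labels chosen for this sketch).
    """
    if phenotype_match_pct > 80 and gene_match:
        tier = "T1"
    elif phenotype_match_pct >= 60 or variant_class == "likely_pathogenic":
        tier = "T2"
    elif phenotype_match_pct >= 40 or variant_class == "vus":
        tier = "T3"
    else:
        tier = "T4"
    return tier, TIER_SYMBOLS[tier]


marfan_grade = grade_evidence(85, gene_match=True)
```

Returning the symbol alongside the tier makes it straightforward to print the grade directly into the differential diagnosis table.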
Completeness Checklist
Phase 1: Phenotype
- All symptoms converted to HPO terms
- Core vs. variable features distinguished
- Age of onset documented
- Family history noted
Phase 2: Disease Matching
- ≥5 candidate diseases identified (or all matching)
- Phenotype overlap % calculated
- Inheritance patterns noted
- ORPHA and OMIM IDs provided
Phase 3: Gene Panel
- ≥5 genes prioritized (or all from top diseases)
- Evidence level for each gene (ClinGen)
- Expression validation performed
- Testing strategy recommended
Phase 4: Variant Interpretation (if applicable)
- ClinVar classification retrieved
- gnomAD frequency checked
- ACMG criteria applied
- Classification justified
Phase 5: Structure Analysis (if applicable)
- Structure predicted (if VUS)
- pLDDT confidence reported
- Domain impact assessed
- Structural evidence summarized
Phase 6: Recommendations
- ≥3 next steps listed
- Specialist referrals suggested
- Family screening addressed
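Part of this checklist can be automated by scanning the report for sections that still hold the template placeholder. The sketch below assumes the report uses `#`-style markdown headings and the `[Researching...]` placeholder; `find_unfinished_sections` is a hypothetical helper and does not judge the quality of sections that have been filled in.

```python
def find_unfinished_sections(report_text, placeholder="[Researching...]"):
    """List headings whose content is still the template placeholder."""
    unfinished = []
    current_heading = None
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Remember the most recent heading, without its leading '#' marks
            current_heading = stripped.lstrip("#").strip()
        elif stripped == placeholder and current_heading is not None:
            unfinished.append(current_heading)
    return unfinished


# Hypothetical partially completed report
sample_report = "\n".join([
    "## 1. Phenotype Analysis",
    "### 1.1 Standardized HPO Terms",
    "Tall stature, arachnodactyly",
    "### 1.2 Key Clinical Features",
    "[Researching...]",
    "## 7. Data Gaps & Limitations",
    "[Researching...]",
])
remaining = find_unfinished_sections(sample_report)
```

An empty result means no placeholders remain; the item-level checks above (for example, at least three next steps) still need manual review.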
Fallback Chains
| Primary Tool | Fallback 1 | Fallback 2 |
|---|---|---|
| | PubMed phenotype search |
| | VEP annotation |
| | UniProt features |
| | Tissue-specific literature |
| | 1000 Genomes |
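A fallback chain like the ones in this table can be implemented as an ordered list of callables that are tried until one returns usable data. The sketch below is generic and makes no assumption about specific ToolUniverse tool names: `call_with_fallbacks` is a hypothetical helper, and the lambdas stand in for real tool calls.

```python
def call_with_fallbacks(steps):
    """Try (name, zero-argument callable) pairs in order.

    Returns (name, result, errors) for the first step that yields a non-empty
    result, or (None, None, errors) if every step fails or comes back empty.
    """
    errors = []
    for name, fn in steps:
        try:
            result = fn()
        except Exception as exc:  # a failing tool should not abort the chain
            errors.append((name, repr(exc)))
            continue
        if result:
            return name, result, errors
        errors.append((name, "empty result"))
    return None, None, errors


def _failing_primary():
    raise RuntimeError("service unavailable")


# Stand-ins for a primary tool and its two fallbacks
used, data, errors = call_with_fallbacks([
    ("primary", _failing_primary),
    ("fallback_1", lambda: []),
    ("fallback_2", lambda: {"allele_frequency": 0.01}),
])
```

Keeping the `errors` list lets the report's "Data Gaps & Limitations" section record which primary sources were unavailable and which fallback supplied the data.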
Tool Reference
See TOOLS_REFERENCE.md for complete tool documentation.