tooluniverse-variant-interpretation
name: tooluniverse-variant-interpretation
description: Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.
Clinical Variant Interpreter
Systematic variant interpretation skill using ToolUniverse - from raw variant calls to ACMG-classified clinical recommendations with structural impact analysis.
Problem This Skill Solves
Clinical labs and researchers face critical challenges in variant interpretation:
- Variant classification uncertainty - VUS (Variants of Uncertain Significance) comprise 40-60% of clinical variants
- Evidence aggregation burden - Must integrate data from 10+ databases per variant
- Structural context missing - Traditional annotation ignores 3D protein impact
- Clinical actionability unclear - How does classification translate to patient care?
This skill provides: A systematic workflow that combines population databases, functional predictions, structural analysis (via AlphaFold2), and literature evidence into ACMG-compliant interpretations with clear clinical recommendations.
Key Principles
- ACMG-Guided Classification - Follow ACMG/AMP 2015 guidelines with explicit evidence codes
- Structural Evidence Integration - Use AlphaFold2 for novel structural impact analysis
- Population Context - gnomAD frequencies with ancestry-specific data
- Gene-Disease Validity - ClinGen curation status for clinical relevance
- Actionable Output - Clear recommendations, not just classifications
- English-first queries - Always use English terms in tool calls (gene names, variant descriptions, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language
Triggers
Use this skill when users:
- Ask about variant interpretation or classification
- Have VCF data needing clinical annotation
- Ask "what does this variant mean clinically?"
- Need ACMG classification for variants
- Want structural impact analysis for missense variants
- Ask about pathogenicity of specific variants
Workflow Overview
┌─────────────────────────────────────────────────────────────────┐
│ VARIANT INTERPRETATION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: VARIANT IDENTITY │
│ ├── Normalize variant notation (HGVS) │
│ ├── Map to gene, transcript, protein │
│ └── Get consequence type (missense, nonsense, etc.) │
│ │
│ Phase 2: CLINICAL DATABASES │
│ ├── ClinVar: Existing classifications │
│ ├── gnomAD: Population frequencies (all + ancestry) │
│ ├── OMIM: Gene-disease associations │
│ ├── ClinGen: Gene validity + dosage sensitivity (ENHANCED) │
│ │ └─ ClinGen_search_gene_validity, ClinGen_search_dosage │
│ └── SpliceAI: Splice variant prediction (NEW) │
│ │
│ Phase 2.5: REGULATORY CONTEXT (NEW - for non-coding variants) │
│ ├── ChIPAtlas: TF binding at position │
│ ├── ENCODE: Regulatory elements (enhancers, promoters) │
│ ├── Conservation in regulatory regions │
│ └── Functional annotation of regulatory impact │
│ │
│ Phase 3: COMPUTATIONAL PREDICTIONS │
│ ├── SIFT/PolyPhen: Damaging predictions │
│ ├── CADD: Deleteriousness score │
│ ├── SpliceAI: Splice impact (if applicable) │
│ └── Conservation: Cross-species alignment │
│ │
│ Phase 4: STRUCTURAL ANALYSIS (for VUS/novel missense) │
│ ├── Get protein structure (PDB or AlphaFold2) │
│ ├── Map variant to structure │
│ ├── Assess domain/functional site impact │
│ └── Predict structural destabilization │
│ │
│ Phase 4.5: EXPRESSION CONTEXT (NEW) │
│ ├── CELLxGENE: Cell-type specific expression │
│ ├── Tissue relevance to phenotype │
│ └── Expression validation │
│ │
│ Phase 5: LITERATURE EVIDENCE │
│ ├── PubMed: Functional studies │
│ ├── BioRxiv/MedRxiv: Recent preprints (NEW) │
│ ├── Case reports: Phenotype correlations │
│ └── Segregation data (if in literature) │
│ │
│ Phase 6: ACMG CLASSIFICATION │
│ ├── Apply evidence codes (PVS1, PM2, PP3, etc.) │
│ ├── Calculate classification │
│ ├── Identify limiting factors │
│ └── Generate clinical recommendations │
│ │
└─────────────────────────────────────────────────────────────────┘
Phase Details
Phase 1: Variant Identity & Normalization
Goal: Standardize variant notation and determine molecular consequence
Tools:
| Tool | Purpose |
|---|---|
| | Get variant annotations from MyVariant.info |
| | Variant effect predictor data |
| | Gene information |
Key Information to Capture:
- HGVS notation (c. and p.)
- Gene symbol and Ensembl ID
- Transcript (canonical/MANE Select)
- Consequence type
- Amino acid change (for missense)
- Exon/intron location
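Later phases accept the amino acid change in short form (for example `R123H` or `p.R123H`), so it helps to normalize the protein-level notation once at this stage. The following is a minimal sketch, assuming only simple substitutions need to be handled; `ProteinChange` and `parse_protein_change` are illustrative names and are not part of ToolUniverse.

```python
import re
from typing import NamedTuple, Optional

# Three-letter to one-letter amino acid codes (standard 20, plus Ter for stop).
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}
ONE_LETTER = set(AA3_TO_1.values())


class ProteinChange(NamedTuple):
    ref: str  # reference residue, one-letter code
    pos: int  # residue position
    alt: str  # alternate residue, one-letter code ('*' for stop)

    def short(self) -> str:
        """Return the one-letter form, e.g. 'R123H'."""
        return f"{self.ref}{self.pos}{self.alt}"


# Optional 'p.' prefix, residue (1 or 3 letters), position, residue or '*'.
_PATTERN = re.compile(r"^(?:p\.)?([A-Za-z]{1,3})(\d+)([A-Za-z]{1,3}|\*)$")


def _to_one_letter(token: str) -> Optional[str]:
    """Convert a one- or three-letter residue token to one-letter code."""
    if token == "*":
        return "*"
    if len(token) == 1:
        upper = token.upper()
        return upper if upper in ONE_LETTER else None
    if len(token) == 3:
        return AA3_TO_1.get(token.capitalize())
    return None


def parse_protein_change(text: str) -> Optional[ProteinChange]:
    """Parse a simple substitution; return None for anything else."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    ref = _to_one_letter(match.group(1))
    alt = _to_one_letter(match.group(3))
    if ref is None or alt is None:
        return None
    return ProteinChange(ref, int(match.group(2)), alt)
```

Calling `parse_protein_change("p.Arg123His").short()` yields the short form used by the missense predictors. Anything that is not a plain substitution (frameshifts, deletions, coding-level `c.` notation) returns `None` and should be routed to a full HGVS normalizer instead.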
Phase 2: Clinical Database Queries
Goal: Aggregate existing clinical knowledge
Tools:
| Tool | Purpose | Key Data |
|---|---|---|
| | Existing classifications | Classification, review status, submissions |
| | Population frequency | AF, ancestry-specific AFs, homozygotes |
| | Gene-disease | Inheritance, phenotypes |
| | Curation status | Gene-disease validity level |
| | Somatic mutations (NEW) | Cancer frequency, histology |
| | Gene-disease associations (NEW) | Evidence scores, sources |
2.1 COSMIC for Somatic Context (NEW)
For cancer variants, check COSMIC for somatic mutation frequency:
```python
from collections import Counter


def get_somatic_context(tu, gene_symbol, variant_aa):
    """Get somatic mutation context from COSMIC."""
    # Search for specific mutation
    cosmic = tu.tools.COSMIC_search_mutations(
        operation="search",
        terms=f"{gene_symbol} {variant_aa}",
        max_results=20,
        genome_build=38
    )
    # Get all gene mutations for context
    gene_mutations = tu.tools.COSMIC_get_mutations_by_gene(
        operation="get_by_gene",
        gene=gene_symbol,
        max_results=100
    )
    # Determine if it's a hotspot (among the 10 most frequent changes in the gene)
    mutation_counts = Counter(m['MutationAA'] for m in gene_mutations.get('results', []))
    is_hotspot = variant_aa in [m[0] for m in mutation_counts.most_common(10)]
    return {
        'cosmic_hits': cosmic.get('results', []),
        'is_somatic_hotspot': is_hotspot,
        'cancer_types': [m['PrimarySite'] for m in cosmic.get('results', [])],
        'total_cosmic_count': cosmic.get('total_count', 0)
    }
```
2.2 OMIM Gene-Disease Context (NEW)
```python
def get_omim_context(tu, gene_symbol):
    """Get OMIM gene-disease associations."""
    # Search OMIM for gene
    search = tu.tools.OMIM_search(
        operation="search",
        query=gene_symbol,
        limit=5
    )
    omim_data = []
    for entry in search.get('data', {}).get('entries', []):
        mim = entry.get('mimNumber')
        # Get detailed entry
        details = tu.tools.OMIM_get_entry(
            operation="get_entry",
            mim_number=str(mim)
        )
        # Get clinical synopsis
        synopsis = tu.tools.OMIM_get_clinical_synopsis(
            operation="get_clinical_synopsis",
            mim_number=str(mim)
        )
        omim_data.append({
            'mim_number': mim,
            'title': details.get('data', {}).get('titles', {}),
            'inheritance': synopsis.get('data', {}).get('inheritance'),
            'clinical_features': synopsis.get('data', {})
        })
    return omim_data
```
2.3 DisGeNET Gene-Disease Evidence (NEW)
```python
def get_disgenet_context(tu, gene_symbol, variant_rsid=None):
    """Get gene-disease associations from DisGeNET."""
    # Gene-disease associations
    gda = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=gene_symbol,
        limit=20
    )
    # Variant-disease associations (if rsID available)
    vda = None
    if variant_rsid:
        vda = tu.tools.DisGeNET_get_vda(
            operation="get_vda",
            variant=variant_rsid,
            limit=20
        )
    return {
        'gene_associations': gda.get('data', {}).get('associations', []),
        'variant_associations': vda.get('data', {}).get('associations', []) if vda else []
    }
```
2.4 ClinGen Gene Validity & Dosage Sensitivity (NEW)
ClinGen provides authoritative curation of gene-disease relationships:
```python
def get_clingen_evidence(tu, gene_symbol):
    """
    Get ClinGen gene validity and dosage sensitivity data.

    CRITICAL for ACMG classification - establishes gene-disease validity.
    """
    # 1. Gene-disease validity (Definitive/Strong/Moderate/Limited)
    validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)
    validity_data = []
    if validity.get('data'):
        for entry in validity.get('data', []):
            validity_data.append({
                'disease': entry.get('Disease Label'),
                'classification': entry.get('Classification'),  # Definitive, Strong, etc.
                'inheritance': entry.get('Inheritance'),
                'mondo_id': entry.get('Disease ID (MONDO)')
            })
    # 2. Dosage sensitivity (haploinsufficiency, triplosensitivity)
    dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)
    dosage_data = {}
    if dosage.get('data'):
        for entry in dosage.get('data', []):
            dosage_data = {
                'haploinsufficiency_score': entry.get('Haploinsufficiency Score'),
                'triplosensitivity_score': entry.get('Triplosensitivity Score'),
                'disease': entry.get('Disease')
            }
            break  # Usually one entry per gene
    # 3. Clinical actionability (for incidental findings context)
    actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)
    return {
        'gene_validity': validity_data,
        'dosage_sensitivity': dosage_data,
        'actionability': actionability.get('data', {}),
        'has_definitive_validity': any(v['classification'] == 'Definitive' for v in validity_data),
        'is_haploinsufficient': dosage_data.get('haploinsufficiency_score') == '3'
    }
```
ClinGen Validity Levels (for ACMG PM1/PP4):
| Classification | Meaning | ACMG Impact |
|---|---|---|
| Definitive | Multiple concordant studies | Strong gene-disease support |
| Strong | Extensive evidence | Moderate-strong support |
| Moderate | Some evidence | Moderate support |
| Limited | Minimal evidence | Weak support, use caution |
| Disputed | Conflicting evidence | Do not use for classification |
| Refuted | Evidence against | Gene NOT associated |
Dosage Sensitivity Scores (for CNV interpretation):
| Score | Meaning | Interpretation |
|---|---|---|
| 3 | Sufficient evidence | Haploinsufficiency/triplosensitivity established |
| 2 | Emerging evidence | Some support, not definitive |
| 1 | Little evidence | Minimal support |
| 0 | No evidence | Unknown |
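The two tables above can be encoded as lookups so that downstream code does not repeat string comparisons. This is a minimal sketch that mirrors the table rows only; the function names are illustrative, and the assumption that a dosage score may arrive as either a string or an integer is mine, not something the ClinGen tools are documented here to guarantee.

```python
from typing import Optional

# Validity classification -> ACMG impact, copied from the table above.
VALIDITY_IMPACT = {
    "Definitive": "Strong gene-disease support",
    "Strong": "Moderate-strong support",
    "Moderate": "Moderate support",
    "Limited": "Weak support, use caution",
    "Disputed": "Do not use for classification",
    "Refuted": "Gene NOT associated",
}
# Classifications that should not contribute to variant classification.
EXCLUDED = {"Disputed", "Refuted"}

# Dosage sensitivity score -> meaning, copied from the table above.
DOSAGE_MEANING = {
    3: "Sufficient evidence",
    2: "Emerging evidence",
    1: "Little evidence",
    0: "No evidence",
}


def describe_validity(classification: str) -> dict:
    """Look up the ACMG impact for a ClinGen validity classification."""
    known = classification in VALIDITY_IMPACT
    return {
        "classification": classification,
        "acmg_impact": VALIDITY_IMPACT.get(classification),
        "usable_for_classification": known and classification not in EXCLUDED,
    }


def describe_dosage(score) -> Optional[str]:
    """Translate a dosage score (int or numeric string) to its meaning."""
    try:
        return DOSAGE_MEANING.get(int(score))
    except (TypeError, ValueError):
        return None
```

Scores or classifications that are not listed in the tables return `None` rather than a guessed interpretation, which keeps unknown values visible in the final report.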
2.5 SpliceAI Splice Variant Prediction (NEW)
Roughly 15% of pathogenic variants affect splicing. SpliceAI is a widely used deep-learning predictor of splice effects:
```python
def get_spliceai_prediction(tu, chrom, pos, ref, alt, genome="38"):
    """
    Get SpliceAI splice effect predictions.

    Delta scores:
    - DS_AG: Acceptor gain
    - DS_AL: Acceptor loss
    - DS_DG: Donor gain
    - DS_DL: Donor loss

    Thresholds:
    - ≥0.8: High pathogenicity (strong PP3)
    - 0.5-0.8: Moderate (supporting PP3)
    - 0.2-0.5: Low (weak evidence)
    - <0.2: Likely benign
    """
    # Format variant for SpliceAI
    variant = f"chr{chrom}-{pos}-{ref}-{alt}"
    # Get full splice predictions
    result = tu.tools.SpliceAI_predict_splice(
        variant=variant,
        genome=genome
    )
    if result.get('data'):
        max_score = result['data'].get('max_delta_score', 0)
        interpretation = result['data'].get('interpretation', '')
        # Determine ACMG support
        if max_score >= 0.8:
            acmg = 'PP3 (strong) - high splice impact'
        elif max_score >= 0.5:
            acmg = 'PP3 (supporting) - moderate splice impact'
        elif max_score >= 0.2:
            acmg = 'PP3 (weak) - possible splice impact'
        else:
            acmg = 'BP7 (if synonymous) - splice benign'
        return {
            'max_delta_score': max_score,
            'interpretation': interpretation,
            'acmg_support': acmg,
            'scores': result['data'].get('scores', [])
        }
    return None


def quick_splice_check(tu, variant, genome="38"):
    """Quick triage using max delta score only."""
    result = tu.tools.SpliceAI_get_max_delta(
        variant=variant,
        genome=genome
    )
    return result.get('data', {})
```
When to Use SpliceAI:
- Intronic variants near splice sites (±50bp)
- Synonymous variants (may still affect splicing)
- Exonic variants near splice junctions
- Variants creating cryptic splice sites
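When the four delta scores are already in hand (for example from a cached annotation), the maximum score and its threshold band can be derived without another tool call. The sketch below applies the same cut-offs as the docstring in `get_spliceai_prediction`; the function name and the plain dictionary input are assumptions made for illustration.

```python
def summarize_delta_scores(scores: dict) -> dict:
    """Pick the largest SpliceAI delta score and assign its threshold band.

    `scores` maps score type (DS_AG, DS_AL, DS_DG, DS_DL) to a value in [0, 1].
    """
    if not scores:
        raise ValueError("at least one delta score is required")
    # Select the score type with the highest delta value.
    score_type, max_score = max(scores.items(), key=lambda item: item[1])
    if max_score >= 0.8:
        band = "high"
    elif max_score >= 0.5:
        band = "moderate"
    elif max_score >= 0.2:
        band = "low"
    else:
        band = "likely benign"
    return {"max_type": score_type, "max_delta_score": max_score, "band": band}


# Values taken from the example report in this section.
example = summarize_delta_scores(
    {"DS_AG": 0.02, "DS_AL": 0.85, "DS_DG": 0.01, "DS_DL": 0.03}
)
```

For the example report values, the summary selects `DS_AL` at 0.85, which falls in the high band and matches the reported interpretation of acceptor loss.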
Report Section for Splice Variants:
```markdown
### Splice Impact Analysis (SpliceAI)

| Score Type | Value | Position | Interpretation |
|---|---|---|---|
| DS_AG | 0.02 | +15 | Acceptor gain unlikely |
| DS_AL | 0.85 | -2 | High acceptor loss |
| DS_DG | 0.01 | +8 | Donor gain unlikely |
| DS_DL | 0.03 | +1 | Donor loss unlikely |

Max Delta Score: 0.85 (DS_AL)
Interpretation: High impact - likely disrupts acceptor site
ACMG Support: PP3 (strong) for splice-altering effect

Source: SpliceAI via `SpliceAI_predict_splice`
```
**ClinVar Classification Map**:
| ClinVar | Interpretation |
|---------|----------------|
| Pathogenic | Disease-causing |
| Likely pathogenic | 90%+ confidence pathogenic |
| VUS | Uncertain significance |
| Likely benign | 90%+ confidence benign |
| Benign | Not disease-causing |
| Conflicting | Multiple interpretations |
**gnomAD Thresholds (for rare disease)**:
| Frequency | ACMG Code | Interpretation |
|-----------|-----------|----------------|
| Absent | PM2_Supporting | Absent from controls |
| <0.00001 | PM2_Supporting | Extremely rare |
| <0.0001 | - | Rare (use with caution) |
| >0.01 | BS1/BA1 | Too common for rare disease |
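The frequency thresholds above translate directly into a small lookup. This is a sketch of the table as written, not a full ACMG frequency model: the function name is illustrative, `None` is assumed to mean the variant is absent from gnomAD, and frequencies between 0.0001 and 0.01 return no code because the table does not assign one.

```python
from typing import Optional


def gnomad_acmg_code(allele_frequency: Optional[float]) -> Optional[str]:
    """Map a gnomAD allele frequency to the ACMG code listed in the table.

    Returns None where the table assigns no code.
    """
    # Absent from gnomAD (assumed to be passed as None or 0).
    if allele_frequency is None or allele_frequency == 0:
        return "PM2_Supporting"
    # Extremely rare.
    if allele_frequency < 0.00001:
        return "PM2_Supporting"
    # Too common for a rare disease.
    if allele_frequency > 0.01:
        return "BS1/BA1"
    # Rare or intermediate frequency: no code, interpret with caution.
    return None
```

In practice the highest ancestry-specific frequency is usually the more informative input than the global frequency, since a variant can be common in one population and rare overall.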
**COSMIC Somatic Evidence (NEW)**:
| COSMIC Finding | Interpretation | ACMG Support |
|----------------|----------------|--------------|
| Recurrent hotspot (>100 samples) | Known oncogenic driver | PS3 (functional) |
| Moderate frequency (10-100) | Likely oncogenic | PM1 (hotspot) |
| Rare somatic (<10) | Unknown significance | No support |
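To apply the COSMIC tiers, recurrence first has to be counted per amino acid change. The sketch below counts records by their `MutationAA` field (the field name used in `get_somatic_context`) and maps each count to the tier in the table; the function names and the sample records are synthetic placeholders, not real COSMIC data.

```python
from collections import Counter


def cosmic_evidence_tier(sample_count: int) -> tuple:
    """Return (interpretation, ACMG support) for a COSMIC sample count."""
    if sample_count > 100:
        return ("Known oncogenic driver", "PS3 (functional)")
    if sample_count >= 10:
        return ("Likely oncogenic", "PM1 (hotspot)")
    return ("Unknown significance", "No support")


def tier_mutations(records: list) -> dict:
    """Count records per amino acid change and assign each a tier."""
    counts = Counter(r["MutationAA"] for r in records if r.get("MutationAA"))
    return {aa: cosmic_evidence_tier(n) for aa, n in counts.items()}


# Synthetic records for illustration only.
records = (
    [{"MutationAA": "p.A10T"}] * 120
    + [{"MutationAA": "p.G25D"}] * 15
    + [{"MutationAA": "p.L40P"}] * 2
)
tiers = tier_mutations(records)
```

Note that a gene-level query capped at 100 results, as in `get_somatic_context`, can never reach the top tier on its own, so the total count reported by the search should be preferred when it is available.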
**DisGeNET Score Interpretation (NEW)**:
| GDA Score | Evidence Level | ACMG Support |
|-----------|----------------|--------------|
| >0.7 | Strong | PP4 (phenotype) |
| 0.4-0.7 | Moderate | Supporting |
| <0.4 | Weak | Insufficient |
Phase 2.5: Regulatory Context (NEW - for Non-Coding Variants)
Goal: Assess regulatory impact for non-coding, intronic, and promoter variants
When to Apply:
- Intronic variants (not splice site)
- Promoter variants
- 5'UTR / 3'UTR variants
- Intergenic variants near disease genes
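The decision to run this phase can be driven by the consequence terms captured in Phase 1. The sketch below assumes consequences are reported as Sequence Ontology terms (as Ensembl VEP does); the mapping of "promoter" to `upstream_gene_variant` and `regulatory_region_variant` is an approximation, and the function name is illustrative.

```python
# Sequence Ontology terms treated as candidates for regulatory analysis.
REGULATORY_TERMS = {
    "intron_variant",
    "upstream_gene_variant",      # approximation for promoter variants
    "regulatory_region_variant",  # approximation for promoter/enhancer variants
    "5_prime_UTR_variant",
    "3_prime_UTR_variant",
    "intergenic_variant",
}
# Canonical splice-site terms are handled by the splice analysis instead.
SPLICE_SITE_TERMS = {"splice_donor_variant", "splice_acceptor_variant"}


def needs_regulatory_context(consequences) -> bool:
    """Return True if Phase 2.5 applies to a variant's consequence terms."""
    terms = set(consequences)
    if terms & SPLICE_SITE_TERMS:
        return False
    return bool(terms & REGULATORY_TERMS)
```

A variant annotated as both `intron_variant` and `splice_donor_variant` is therefore routed to splice prediction only, matching the "not splice site" condition in the list above.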
Tools:
| Tool | Purpose | Key Data |
|---|---|---|
| | TF binding at position | Bound TFs, cell types |
| | ChIP-seq peaks | Peak coordinates, scores |
| | Regulatory elements | Enhancers, promoters, DHS |
| | Experiment details | Assay type, targets |
Regulatory Impact Assessment:
```python
def assess_regulatory_impact(tu, variant_position, gene_symbol):
    """Assess regulatory impact of non-coding variant."""
    # Check TF binding at position
    tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
        gene=gene_symbol,
        cell_type="all"
    )
    # Get ChIP-seq peaks overlapping variant
    peaks = tu.tools.ChIPAtlas_get_peak_data(
        gene=gene_symbol,
        experiment_type="TF"
    )
    # Search ENCODE for regulatory annotations
    encode_data = tu.tools.ENCODE_search_experiments(
        assay_title="ATAC-seq",
        biosample="all"
    )
    # Assess if variant disrupts TF binding.
    # check_motif_disruption is not defined in this document and must be
    # supplied by the caller.
    binding_disrupted = check_motif_disruption(variant_position, peaks)
    return {
        'tf_binding': tf_binding,
        'regulatory_peaks': peaks,
        'encode_annotations': encode_data,
        'likely_regulatory': binding_disrupted
    }
```
Regulatory Impact Categories:
| Category | Criteria | ACMG Support |
|---|---|---|
| High impact | Disrupts known TF binding motif | PP3 (supporting) |
| Moderate impact | In active regulatory region | Consider context |
| Low impact | No regulatory annotation | No support |
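The categories above depend on whether the variant position falls inside an annotated peak, which reduces to an interval overlap test. The sketch below is hypothetical throughout: it assumes peaks are dictionaries with 1-based inclusive `start` and `end` keys on the same chromosome as the variant, which is not a documented ChIPAtlas output format, and it takes motif disruption as a flag computed elsewhere.

```python
def overlapping_peaks(variant_pos: int, peaks: list) -> list:
    """Return peaks whose [start, end] interval contains the variant."""
    return [p for p in peaks if p["start"] <= variant_pos <= p["end"]]


def regulatory_category(variant_pos: int, peaks: list,
                        motif_disrupted: bool = False) -> str:
    """Assign the impact category from the table above."""
    if motif_disrupted:
        return "High impact"
    if overlapping_peaks(variant_pos, peaks):
        return "Moderate impact"
    return "Low impact"


# Hypothetical peak records for illustration only.
example_peaks = [
    {"start": 1000, "end": 1500, "target": "CTCF"},
    {"start": 4000, "end": 4200, "target": "TF2"},
]
category = regulatory_category(1200, example_peaks)
```

The same overlap test applies to ENCODE regulatory element coordinates once they are converted to the assumed `start`/`end` shape and the same genome build.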
Output for Report:
```markdown
### 2.5 Regulatory Context (for Non-Coding Variants)

| Feature | Finding | Significance |
|---|---|---|
| Variant location | Intron 5, 120bp from exon 6 | Not canonical splice |
| TF binding site | CTCF binding peak (ChIPAtlas) | May affect insulation |
| ENCODE annotation | Active enhancer (H3K27ac) | Regulatory function |
| Conservation | PhyloP = 2.8 | Moderate conservation |

Regulatory Interpretation: Variant overlaps CTCF binding site in active enhancer region. Potential impact on gene regulation.

Source: ChIPAtlas, ENCODE
```
Phase 3: Computational Predictions (ENHANCED)
Goal: Assess in silico pathogenicity predictions using state-of-the-art models
Tools:
| Tool | Purpose | Score Range |
|---|---|---|
| Deleteriousness score (NEW API) | PHRED 0-99 |
| DeepMind pathogenicity (NEW) | 0-1 |
| Evolutionary pathogenicity (NEW) | 0-1 |
| Aggregated predictions | SIFT, PolyPhen |
| VEP predictions | SIFT, PolyPhen |
3.1 CADD Deleteriousness Scoring (NEW)
```python
def get_cadd_score(tu, chrom, pos, ref, alt):
    """Get CADD deleteriousness score for a variant."""
    result = tu.tools.CADD_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt,
        version="GRCh38-v1.7"
    )
    if result.get('status') == 'success':
        phred = result['data'].get('phred_score')
        if phred is None:
            return None  # No score returned; avoid comparing None with a number
        return {
            'score': phred,
            'interpretation': result['data'].get('interpretation'),
            'acmg_support': 'PP3' if phred >= 20 else ('BP4' if phred < 15 else 'neutral')
        }
    return None
```
3.2 AlphaMissense Pathogenicity (NEW)
DeepMind's AlphaMissense provides state-of-the-art missense pathogenicity prediction:
```python
def get_alphamissense_score(tu, uniprot_id, variant):
    """
    Get AlphaMissense pathogenicity score.

    variant format: 'R123H' or 'p.R123H'

    Thresholds:
    - Pathogenic: score > 0.564
    - Ambiguous: 0.34-0.564
    - Benign: score < 0.34
    """
    result = tu.tools.AlphaMissense_get_variant_score(
        uniprot_id=uniprot_id,
        variant=variant
    )
    if result.get('status') == 'success' and result.get('data'):
        score = result['data'].get('pathogenicity_score')
        classification = result['data'].get('classification')
        # Map to ACMG
        if classification == 'pathogenic':
            acmg = 'PP3 (strong)'  # AlphaMissense has high accuracy
        elif classification == 'benign':
            acmg = 'BP4 (strong)'
        else:
            acmg = 'neutral'
        return {
            'score': score,
            'classification': classification,
            'acmg_support': acmg
        }
    return None
```
3.3 EVE Evolutionary Prediction (NEW)
EVE uses unsupervised learning on evolutionary data:
```python
def get_eve_score(tu, chrom, pos, ref, alt):
    """
    Get EVE evolutionary pathogenicity score.

    Threshold: >0.5 indicates likely pathogenic
    """
    result = tu.tools.EVE_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt
    )
    if result.get('status') == 'success':
        eve_scores = result['data'].get('eve_scores', [])
        if eve_scores:
            best_score = eve_scores[0]
            score = best_score.get('eve_score')
            # A missing score gives no support in either direction
            if score is None:
                acmg = 'neutral'
            else:
                acmg = 'PP3' if score > 0.5 else 'BP4'
            return {
                'score': score,
                'classification': best_score.get('classification'),
                'gene': best_score.get('gene_symbol'),
                'acmg_support': acmg
            }
    return None
```
3.4 Integrated Prediction Strategy
For VUS (Variants of Uncertain Significance), combine multiple predictors:
```python
def comprehensive_pathogenicity_assessment(tu, variant_info):
    """
    Combine all prediction tools for robust classification.
    """
    chrom = variant_info['chrom']
    pos = variant_info['pos']
    ref = variant_info['ref']
    alt = variant_info['alt']
    uniprot_id = variant_info.get('uniprot_id')
    aa_change = variant_info.get('aa_change')  # e.g., 'R123H'
    predictions = {}

    # 1. CADD (works for all variant types)
    cadd = get_cadd_score(tu, chrom, pos, ref, alt)
    if cadd:
        predictions['cadd'] = cadd

    # 2. AlphaMissense (missense only, requires UniProt ID)
    if uniprot_id and aa_change:
        am = get_alphamissense_score(tu, uniprot_id, aa_change)
        if am:
            predictions['alphamissense'] = am

    # 3. EVE (missense only)
    eve = get_eve_score(tu, chrom, pos, ref, alt)
    if eve:
        predictions['eve'] = eve

    # Consensus assessment
    damaging_count = sum(1 for p in predictions.values()
                         if 'PP3' in p.get('acmg_support', ''))
    benign_count = sum(1 for p in predictions.values()
                       if 'BP4' in p.get('acmg_support', ''))
    if damaging_count >= 2 and benign_count == 0:
        consensus = 'likely_damaging'
        acmg = 'PP3 (multiple predictors concordant)'
    elif benign_count >= 2 and damaging_count == 0:
        consensus = 'likely_benign'
        acmg = 'BP4 (multiple predictors concordant)'
    else:
        consensus = 'uncertain'
        acmg = 'neutral (discordant predictions)'
    return {
        'predictions': predictions,
        'consensus': consensus,
        'acmg_recommendation': acmg
    }
```

Prediction Interpretation (Updated):
| Predictor | Damaging | Benign |
|---|---|---|
| AlphaMissense | >0.564 | <0.34 |
| CADD PHRED | ≥20 (top 1%) | <15 |
| EVE | >0.5 | ≤0.5 |
| SIFT | <0.05 | ≥0.05 |
| PolyPhen2 | >0.85 (probably) | <0.15 (benign) |
ACMG Application (Enhanced):
- PP3: Multiple concordant damaging predictions (AlphaMissense + CADD + EVE agreement = strong PP3)
- BP4: Multiple concordant benign predictions
- Note: AlphaMissense alone achieves ~90% accuracy on ClinVar pathogenic variants
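The thresholds in the Prediction Interpretation table can be applied mechanically before any consensus step. The sketch below is illustrative only: `THRESHOLDS`, `call_predictor`, and `summarize_calls` are hypothetical names, not ToolUniverse functions, and scores that fall between the damaging and benign cutoffs are labeled ambiguous.

```python
# Hypothetical helper: cutoffs copied from the Prediction Interpretation table.
# Each entry maps a predictor to (is_damaging, is_benign) tests on its raw score.
THRESHOLDS = {
    'alphamissense': (lambda s: s > 0.564, lambda s: s < 0.34),
    'cadd_phred':    (lambda s: s >= 20,   lambda s: s < 15),
    'eve':           (lambda s: s > 0.5,   lambda s: s <= 0.5),
    'sift':          (lambda s: s < 0.05,  lambda s: s >= 0.05),
    'polyphen2':     (lambda s: s > 0.85,  lambda s: s < 0.15),
}


def call_predictor(name, score):
    """Return 'damaging', 'benign', or 'ambiguous' for one predictor score."""
    is_damaging, is_benign = THRESHOLDS[name]
    if is_damaging(score):
        return 'damaging'
    if is_benign(score):
        return 'benign'
    return 'ambiguous'  # score falls between the two cutoffs


def summarize_calls(scores):
    """scores: dict of predictor name -> raw score. Returns per-predictor calls."""
    return {name: call_predictor(name, s) for name, s in scores.items()}
```

Keeping the cutoffs in one table-like dict makes it easy to audit them against the table when a predictor's recommended thresholds change.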
Phase 4: Structural Analysis
Goal: Assess protein structural impact (especially for VUS)
Tools:
| Tool | Purpose |
|---|---|
| | Find experimental structures |
| | Predict structure if no PDB |
| | Get AlphaFold DB structure |
| | Domain annotations |
| | Functional sites |
Structural Impact Categories:
| Impact Level | Description | ACMG Support |
|---|---|---|
| Critical | Active site, catalytic residue | PM1 (strong) |
| High | Buried residue, disulfide, structural core | PM1 (moderate) |
| Moderate | Domain interface, binding site | PM1 (supporting) |
| Low | Surface, flexible region | No support |
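As a rough sketch of how the Structural Impact Categories table might be applied, the function below maps a few boolean residue features to an impact level and its PM1 support. The function name and its flags are hypothetical inputs that an upstream structural analysis would have to supply, and checking the categories from most to least severe is an assumption.

```python
# PM1 support per impact level, copied from the Structural Impact Categories table.
PM1_BY_IMPACT = {
    'Critical': 'PM1 (strong)',
    'High': 'PM1 (moderate)',
    'Moderate': 'PM1 (supporting)',
    'Low': None,  # no ACMG support
}


def structural_impact(in_active_site=False, is_buried=False,
                      in_disulfide=False, at_interface=False):
    """Return (impact_level, pm1_support) for one residue (hypothetical flags)."""
    if in_active_site:
        level = 'Critical'
    elif is_buried or in_disulfide:
        level = 'High'
    elif at_interface:
        level = 'Moderate'
    else:
        level = 'Low'
    return level, PM1_BY_IMPACT[level]
```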
Using AlphaFold2 for VUS:
1. Get wildtype structure (PDB or AlphaFold)
2. Identify residue location:
   - pLDDT at position (confidence)
   - Solvent accessibility
   - Secondary structure
3. Assess structural context:
   - Distance to functional sites
   - Interaction partners
   - Conservation in structure
4. Predict impact:
   - Side chain burial
   - Hydrogen bond disruption
   - Charge changes in buried positions
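Step 2 reads the pLDDT at the variant position. AlphaFold model files store the per-residue pLDDT in the B-factor column of each ATOM record, so a minimal sketch can pull it out with fixed-column parsing. The function name `plddt_at_residue` and the choice of the CA atom are illustrative assumptions, not part of ToolUniverse.

```python
def plddt_at_residue(pdb_text, resseq, chain="A"):
    """Return the pLDDT of one residue from AlphaFold PDB text, or None if absent.

    Assumes standard fixed-width PDB ATOM records, where AlphaFold writes
    pLDDT into the B-factor field (columns 61-66).
    """
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA":   # one value per residue is enough
            continue
        if line[21] == chain and int(line[22:26]) == resseq:
            return float(line[60:66])
    return None
```

For an experimental PDB entry the same column holds a crystallographic B-factor rather than a confidence score, so this reading only applies to predicted models.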
Phase 4.5: Expression Context (NEW)
Goal: Validate gene expression in disease-relevant tissues/cells
Tools:
| Tool | Purpose | Key Data |
|---|---|---|
| | Cell-type specific expression | TPM per cell type |
| | Cell type annotations | Tissue, disease state |
| | Tissue expression | TPM per tissue |
Expression Validation:
```python
def validate_expression_context(tu, gene_symbol, phenotype_tissues):
    """Validate gene is expressed in phenotype-relevant tissues."""
    # Single-cell expression
    sc_expression = tu.tools.CELLxGENE_get_expression_data(
        gene=gene_symbol,
        tissue=phenotype_tissues[0] if phenotype_tissues else "all"
    )
    # Bulk tissue expression (GTEx)
    gtex = tu.tools.GTEx_get_median_gene_expression(
        gene=gene_symbol
    )
    # Check expression in relevant tissues
    relevant_expression = {
        tissue: gtex.get(tissue, 0)
        for tissue in phenotype_tissues
    }
    return {
        'single_cell': sc_expression,
        'gtex': relevant_expression,
        # TPM > 1 in any phenotype tissue counts as expressed
        'expressed_in_phenotype_tissue': any(v > 1 for v in relevant_expression.values())
    }
```

Why it matters:
- Confirms gene is expressed where disease manifests
- Supports PP4 (phenotype-specific) if highly restricted expression
- Can challenge classification if not expressed in affected tissue
Output for Report:
```markdown
### 4.5 Expression Context

| Tissue | Expression (TPM) | Relevance |
|---|---|---|
| Heart | 45.2 | ✓ Primary disease tissue |
| Skeletal muscle | 38.7 | ✓ Secondary involvement |
| Liver | 2.1 | Low expression |
| Brain | 0.5 | Not expressed |

Single-Cell Analysis (CELLxGENE):
- Cardiomyocytes: High expression (TPM=85)
- Cardiac fibroblasts: Low expression (TPM=5)

Interpretation: Gene highly expressed in cardiomyocytes, supporting cardiac phenotype association.

Source: GTEx, CELLxGENE Census
```

Phase 5: Literature Evidence (ENHANCED)
Goal: Find functional studies, case reports, and cutting-edge preprints
Tools:
| Tool | Purpose | Coverage |
|---|---|---|
| | Peer-reviewed studies | Comprehensive |
| | Additional literature | Europe PMC |
| | Biology preprints | Recent findings |
| | Clinical preprints | Clinical studies |
| | Citation analysis | Impact metrics |
| | AI-ranked search | Relevance |
Search Strategies:
```python
def comprehensive_literature_search(tu, gene, variant, phenotype):
    """Search across all literature sources."""
    # 1. PubMed: Peer-reviewed
    pubmed = tu.tools.PubMed_search(
        query=f'"{gene}" AND ("{variant}" OR functional)',
        max_results=30
    )
    # 2. BioRxiv: Recent preprints
    biorxiv = tu.tools.BioRxiv_search_preprints(
        query=f"{gene} {phenotype}",
        limit=10
    )
    # 3. MedRxiv: Clinical preprints
    medrxiv = tu.tools.MedRxiv_search_preprints(
        query=f"{gene} variant {phenotype}",
        limit=10
    )
    # 4. Citation analysis
    key_papers = pubmed[:5]  # Top papers
    for paper in key_papers:
        citations = tu.tools.openalex_search_works(
            query=paper['title'],
            limit=1
        )
        paper['citation_count'] = citations[0].get('cited_by_count', 0) if citations else 0
    return {
        'pubmed': pubmed,
        'preprints': biorxiv + medrxiv,
        'key_papers_with_citations': key_papers
    }
```

Search Queries:
```
# Gene + variant specific
"{GENE} AND ({HGVS_p} OR {AA_change})"

# Functional studies
"{GENE} AND (functional OR functional study OR mutagenesis)"

# Clinical reports
"{GENE} AND (case report OR patient) AND {phenotype}"

# Preprint-specific
"{GENE} genetics 2024" (for recent preprints)
```
⚠️ Preprint Warning: Always flag preprints as NOT peer-reviewed in reports.
Evidence Types:
| Evidence | ACMG Code | Weight |
|---|---|---|
| Functional study (null) | PS3 | Strong |
| Functional study (reduced) | PS3_Moderate | Moderate |
| Case reports with segregation | PP1 | Supporting to Moderate |
| Co-occurrence with pathogenic | BP2 | Supporting against |
Phase 6: ACMG Classification
Goal: Systematic classification with explicit evidence
ACMG Evidence Codes:
Pathogenic:
| Code | Strength | Description |
|---|---|---|
| PVS1 | Very Strong | Null variant in gene where LOF is mechanism |
| PS1 | Strong | Same amino acid change as known pathogenic |
| PS3 | Strong | Well-established functional studies |
| PM1 | Moderate | Mutational hot spot / functional domain |
| PM2 | Moderate | Absent from controls |
| PM5 | Moderate | Different missense at same residue as pathogenic |
| PP3 | Supporting | Multiple computational predictions |
| PP5 | Supporting | Reputable source reports pathogenic |
Benign:
| Code | Strength | Description |
|---|---|---|
| BA1 | Stand-alone | MAF >5% |
| BS1 | Strong | MAF greater than expected |
| BS3 | Strong | Functional studies show no effect |
| BP4 | Supporting | Multiple computational predictions benign |
| BP7 | Supporting | Synonymous with no splice impact |
Classification Algorithm:
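| Classification | Evidence Required |
|---|---|
| Pathogenic | 1 Very Strong + (≥1 Strong, OR ≥2 Moderate, OR 1 Moderate + 1 Supporting, OR ≥2 Supporting); OR ≥2 Strong; OR 1 Strong + (≥3 Moderate, OR 2 Moderate + ≥2 Supporting, OR 1 Moderate + ≥4 Supporting) |
| Likely Pathogenic | 1 Very Strong + 1 Moderate; OR 1 Strong + 1-2 Moderate; OR 1 Strong + ≥2 Supporting; OR ≥3 Moderate; OR 2 Moderate + ≥2 Supporting; OR 1 Moderate + ≥4 Supporting |
| Likely Benign | 1 Strong + 1 Supporting; OR ≥2 Supporting |
| Benign | 1 Stand-alone; OR ≥2 Strong |
| VUS | Criteria not met, or pathogenic and benign criteria conflict |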
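The combining step can be sketched as a small function. This is an illustrative implementation of the combining rules from the 2015 ACMG/AMP guidelines, not code from this skill. The name `classify_acmg` and the `(code, strength)` input format are assumptions, and evidence direction is inferred from the first letter of the code (`P` for pathogenic, `B` for benign).

```python
from collections import Counter


def classify_acmg(evidence):
    """Classify from (code, strength) pairs, e.g. ('PS3', 'strong').

    strength is one of: 'very_strong', 'strong', 'moderate',
    'supporting', 'stand_alone'.
    """
    path = Counter(s for code, s in evidence if code.upper().startswith('P'))
    ben = Counter(s for code, s in evidence if code.upper().startswith('B'))
    pvs, ps, pm, pp = (path['very_strong'], path['strong'],
                       path['moderate'], path['supporting'])
    ba, bs, bp = ben['stand_alone'], ben['strong'], ben['supporting']

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    path_call = ('Pathogenic' if pathogenic
                 else 'Likely Pathogenic' if likely_pathogenic else None)
    ben_call = 'Benign' if benign else 'Likely Benign' if likely_benign else None
    if path_call and ben_call:
        return 'VUS'  # contradictory evidence defaults to uncertain
    return path_call or ben_call or 'VUS'
```

Passing strength separately from the code lets modified-strength evidence such as `PS3_Moderate` be counted at its adjusted weight.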
Output Structure
Report Sections

```markdown
# Variant Interpretation Report: {GENE} {VARIANT}

## Executive Summary
- Variant: {HGVS notation}
- Gene: {gene symbol}
- Classification: {Pathogenic/Likely Pathogenic/VUS/Likely Benign/Benign}
- Evidence Strength: {strong/moderate/limited}
- Key Finding: {one-sentence summary}

## 1. Variant Identity
{gene, transcript, protein change, consequence}

## 2. Population Data
{gnomAD frequencies, ancestry breakdown}

## 3. Clinical Database Evidence
{ClinVar, ClinGen, OMIM}

## 4. Computational Predictions
{SIFT, PolyPhen, CADD scores}

## 5. Structural Analysis
{Domain location, functional site proximity, AlphaFold confidence}

## 6. Literature Evidence
{Functional studies, case reports}

## 7. ACMG Classification
{Evidence codes applied, classification rationale}

## 8. Clinical Recommendations
{Testing, management, family screening}

## 9. Limitations & Uncertainties
{Missing data, conflicting evidence}

## Data Sources
{All tools and databases queried}
```

Evidence Grading
Classification Confidence
| Symbol | Classification | Evidence Level |
|---|---|---|
| ★★★ | High confidence | Multiple independent lines |
| ★★☆ | Moderate confidence | Some supporting evidence |
| ★☆☆ | Limited confidence | Minimal evidence |
| VUS | Uncertain | Insufficient data |
Structural Impact Confidence
| pLDDT Range | Interpretation |
|---|---|
| >90 | Very high confidence in position |
| 70-90 | High confidence |
| 50-70 | Moderate (often loops) |
| <50 | Low confidence (disorder) |
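The pLDDT bands in this table translate directly into a lookup. The sketch below uses a hypothetical function name, and it assumes that the shared boundary values 90, 70, and 50 fall into the lower-confidence side of each cutoff except at 70 and 50, which are included in the band above; the table itself does not say where exact boundary values belong.

```python
def plddt_band(plddt):
    """Map a pLDDT value (0-100) to the interpretation used in the table."""
    if plddt > 90:
        return 'Very high confidence in position'
    if plddt >= 70:
        return 'High confidence'
    if plddt >= 50:
        return 'Moderate (often loops)'
    return 'Low confidence (disorder)'
```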
Special Scenarios
Scenario 1: Novel Missense VUS
Additional workflow:
- Check for other pathogenic variants at the same residue (supports PM5)
- Get AlphaFold2 structure
- Analyze:
  - Is the residue buried or on the surface?
  - What secondary structure is it in?
  - How close is it to active/binding sites?
  - Is it conserved across species?
- Apply PM1 if in functional domain
- Apply PP3 if predictions concordant
Scenario 2: Truncating Variant
Additional workflow:
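- Check whether LOF is an established disease mechanism for the gene
- Determine whether the variant escapes NMD (for example, a premature stop in the last exon or the final ~50 nucleotides of the penultimate exon)
- Check for alternative isoforms
- Review ClinGen LOF curation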
PVS1 Application:
| Scenario | PVS1 Strength |
|---|---|
| Canonical LOF gene, NMD predicted | Very Strong |
| LOF gene, last exon | Moderate |
| Non-LOF gene | Not applicable |
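The PVS1 table reduces to two questions, which the sketch below encodes. It is a deliberate simplification with a hypothetical function name: it treats "not in the last exon" as equivalent to "NMD predicted", which is an assumption rather than something the table states.

```python
def pvs1_strength(lof_is_mechanism, in_last_exon):
    """Return the PVS1 strength for a truncating variant, per the table above."""
    if not lof_is_mechanism:
        return 'Not applicable'   # non-LOF gene
    if in_last_exon:
        return 'Moderate'         # LOF gene, last exon
    return 'Very Strong'          # canonical LOF gene, NMD predicted
```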
Scenario 3: Splice Variant
Additional workflow:
- Check SpliceAI scores (if available)
- Determine canonical splice site distance
- Review for in-frame skipping potential
- Check for cryptic splice activation
Quantified Minimums
| Section | Requirement |
|---|---|
| Population frequency | gnomAD overall + ≥3 ancestry groups |
| Predictions | ≥3 computational predictors |
| Literature search | ≥2 search strategies |
| ACMG codes | All applicable codes listed |
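The countable minimums in this table can be checked before a report is delivered. The following is a sketch under stated assumptions: the `report` dict and its keys are hypothetical, and the "all applicable ACMG codes listed" requirement is left to manual review because it cannot be verified by counting.

```python
# Minimum counts copied from the Quantified Minimums table.
MINIMUMS = {
    'ancestry_groups': 3,     # gnomAD ancestry groups reported with the overall AF
    'predictors': 3,          # computational predictors
    'search_strategies': 2,   # literature search strategies
}


def check_minimums(report):
    """Return a list of unmet requirements; an empty list means all are met.

    `report` is a hypothetical dict with 'gnomad_overall_af' (float or None)
    and list-valued 'ancestry_groups', 'predictors', 'search_strategies'.
    """
    problems = []
    if report.get('gnomad_overall_af') is None:
        problems.append('missing gnomAD overall frequency')
    for key, minimum in MINIMUMS.items():
        found = len(report.get(key, []))
        if found < minimum:
            problems.append(f'{key}: {found} found, {minimum} required')
    return problems
```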
NVIDIA NIM Integration
When to Use AlphaFold2 for Variants
Use Case: VUS missense variants where structural context aids interpretation
Workflow:
```python
# 1. Get protein sequence
protein_seq = tu.tools.UniProt_get_protein_sequence(accession=uniprot_id)

# 2. Get/predict structure
try:
    pdb_hits = tu.tools.PDB_search_by_uniprot(uniprot_id=uniprot_id)
    structure = tu.tools.PDB_get_structure(pdb_id=pdb_hits[0]['pdb_id'])
except Exception:
    # No usable PDB entry (empty hit list or failed lookup): predict with AlphaFold2
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=protein_seq['sequence'],
        algorithm="mmseqs2"
    )

# 3. Analyze variant position
# - Extract pLDDT at residue position
# - Calculate solvent accessibility
# - Check for nearby functional sites
```
Structural Features to Report:
- pLDDT at variant position
- Secondary structure (helix/sheet/coil)
- Solvent accessibility (buried/exposed)
- Distance to active site (if applicable)
- Interactions disrupted (H-bonds, salt bridges)

Report File Naming
```
{GENE}_{VARIANT}_interpretation_report.md

Examples:
BRCA1_c.5266dupC_interpretation_report.md
TP53_p.R273H_interpretation_report.md
```
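The naming pattern is a plain string template, sketched here with a hypothetical helper name. Variant strings are passed through unchanged, so any character that is awkward in a file name, such as the `>` in a substitution, would need separate handling that the pattern does not specify.

```python
def report_filename(gene, variant):
    """Build the report file name from a gene symbol and a variant string."""
    return f"{gene}_{variant}_interpretation_report.md"
```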
Clinical Recommendations Framework
For Pathogenic/Likely Pathogenic
| Disease Context | Recommendations |
|---|---|
| Cancer predisposition | Enhanced screening, risk-reducing options |
| Pharmacogenomics | Drug dosing adjustment |
| Carrier status | Reproductive counseling |
| Predictive testing | Family cascade screening |
For VUS
| Action | Details |
|---|---|
| Clinical management | Do not use for medical decisions |
| Follow-up | Reinterpret in 1-2 years |
| Research | Functional studies if available |
| Family | Segregation data valuable |
For Benign/Likely Benign
| Action | Details |
|---|---|
| Clinical | Not expected to cause disease |
| Family | No cascade testing needed |
| Documentation | Include in report for completeness |
See Also
- CHECKLIST.md - Pre-delivery verification
- EXAMPLES.md - Sample interpretations
- TOOLS_REFERENCE.md - Tool parameters and fallbacks