tooluniverse-variant-interpretation

name: tooluniverse-variant-interpretation
description: Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.


Clinical Variant Interpreter

Systematic variant interpretation skill using ToolUniverse - from raw variant calls to ACMG-classified clinical recommendations with structural impact analysis.

Problem This Skill Solves

Clinical labs and researchers face critical challenges in variant interpretation:
  1. Variant classification uncertainty - VUS (Variants of Uncertain Significance) comprise 40-60% of clinical variants
  2. Evidence aggregation burden - Must integrate data from 10+ databases per variant
  3. Structural context missing - Traditional annotation ignores 3D protein impact
  4. Clinical actionability unclear - How does classification translate to patient care?
This skill provides: A systematic workflow that combines population databases, functional predictions, structural analysis (via AlphaFold2), and literature evidence into ACMG-compliant interpretations with clear clinical recommendations.

Key Principles

  1. ACMG-Guided Classification - Follow ACMG/AMP 2015 guidelines with explicit evidence codes
  2. Structural Evidence Integration - Use AlphaFold2 for novel structural impact analysis
  3. Population Context - gnomAD frequencies with ancestry-specific data
  4. Gene-Disease Validity - ClinGen curation status for clinical relevance
  5. Actionable Output - Clear recommendations, not just classifications
  6. English-first queries - Always use English terms in tool calls (gene names, variant descriptions, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language

Triggers

Use this skill when users:
  • Ask about variant interpretation or classification
  • Have VCF data needing clinical annotation
  • Ask "what does this variant mean clinically?"
  • Need ACMG classification for variants
  • Want structural impact analysis for missense variants
  • Ask about pathogenicity of specific variants

Workflow Overview

┌─────────────────────────────────────────────────────────────────┐
│                    VARIANT INTERPRETATION                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1: VARIANT IDENTITY                                       │
│  ├── Normalize variant notation (HGVS)                          │
│  ├── Map to gene, transcript, protein                           │
│  └── Get consequence type (missense, nonsense, etc.)            │
│                                                                  │
│  Phase 2: CLINICAL DATABASES                                     │
│  ├── ClinVar: Existing classifications                          │
│  ├── gnomAD: Population frequencies (all + ancestry)            │
│  ├── OMIM: Gene-disease associations                            │
│  ├── ClinGen: Gene validity + dosage sensitivity (ENHANCED)     │
│  │   └─ ClinGen_search_gene_validity, ClinGen_search_dosage     │
│  └── SpliceAI: Splice variant prediction (NEW)                  │
│                                                                  │
│  Phase 2.5: REGULATORY CONTEXT (NEW - for non-coding variants)  │
│  ├── ChIPAtlas: TF binding at position                          │
│  ├── ENCODE: Regulatory elements (enhancers, promoters)         │
│  ├── Conservation in regulatory regions                         │
│  └── Functional annotation of regulatory impact                 │
│                                                                  │
│  Phase 3: COMPUTATIONAL PREDICTIONS                              │
│  ├── SIFT/PolyPhen: Damaging predictions                        │
│  ├── CADD: Deleteriousness score                                │
│  ├── SpliceAI: Splice impact (if applicable)                    │
│  └── Conservation: Cross-species alignment                      │
│                                                                  │
│  Phase 4: STRUCTURAL ANALYSIS (for VUS/novel missense)          │
│  ├── Get protein structure (PDB or AlphaFold2)                  │
│  ├── Map variant to structure                                   │
│  ├── Assess domain/functional site impact                       │
│  └── Predict structural destabilization                         │
│                                                                  │
│  Phase 4.5: EXPRESSION CONTEXT (NEW)                            │
│  ├── CELLxGENE: Cell-type specific expression                   │
│  ├── Tissue relevance to phenotype                              │
│  └── Expression validation                                       │
│                                                                  │
│  Phase 5: LITERATURE EVIDENCE                                    │
│  ├── PubMed: Functional studies                                 │
│  ├── BioRxiv/MedRxiv: Recent preprints (NEW)                   │
│  ├── Case reports: Phenotype correlations                       │
│  └── Segregation data (if in literature)                        │
│                                                                  │
│  Phase 6: ACMG CLASSIFICATION                                    │
│  ├── Apply evidence codes (PVS1, PM2, PP3, etc.)               │
│  ├── Calculate classification                                   │
│  ├── Identify limiting factors                                  │
│  └── Generate clinical recommendations                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
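
The diagram marks two phases as conditional: Phase 2.5 applies to non-coding variants and Phase 4 to VUS or novel missense variants. A minimal sketch of that routing follows. The `select_phases` helper and its arguments are hypothetical and not part of ToolUniverse; the consequence labels are Sequence Ontology terms as reported by Ensembl VEP.

```python
# Hypothetical routing helper (an assumption for illustration, not a ToolUniverse API).
# Consequence labels are Sequence Ontology terms as used by Ensembl VEP.
NON_CODING = {
    "intron_variant",
    "upstream_gene_variant",
    "5_prime_UTR_variant",
    "3_prime_UTR_variant",
    "intergenic_variant",
}

def select_phases(consequence, clinvar_class=None):
    """Return the ordered workflow phases to run for one variant."""
    phases = ["1", "2"]
    if consequence in NON_CODING:
        phases.append("2.5")  # regulatory context only for non-coding variants
    phases.append("3")
    # Treat a variant with no ClinVar record (None) as novel.
    unresolved = clinvar_class in (None, "VUS")
    if consequence == "missense_variant" and unresolved:
        phases.append("4")  # structural analysis for VUS/novel missense
    phases += ["4.5", "5", "6"]
    return phases
```

Keeping the routing in one function makes it easy to see which evidence sources were skipped for a given variant, which matters when reporting limiting factors in Phase 6.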

Phase Details

Phase 1: Variant Identity & Normalization

Goal: Standardize variant notation and determine molecular consequence
Tools:

| Tool | Purpose |
|------|---------|
| `myvariant_query` | Get variant annotations from MyVariant.info |
| `Ensembl_get_variant_info` | Variant effect predictor data |
| `NCBI_gene_search` | Gene information |

Key Information to Capture:
  • HGVS notation (c. and p.)
  • Gene symbol and Ensembl ID
  • Transcript (canonical/MANE Select)
  • Consequence type
  • Amino acid change (for missense)
  • Exon/intron location
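
The fields listed above can be held in a small record, and the amino acid change can be pulled out of the protein-level HGVS string. The sketch below assumes simple three-letter missense notation such as `p.Arg175His`; the `VariantIdentity` class and `parse_missense` function are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Standard three-letter to one-letter amino acid codes.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

@dataclass
class VariantIdentity:
    """Hypothetical container for the Phase 1 fields."""
    gene: str
    hgvs_c: str
    hgvs_p: Optional[str] = None
    transcript: Optional[str] = None
    consequence: Optional[str] = None
    location: Optional[str] = None  # e.g. "exon 5" or "intron 5"

_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")

def parse_missense(hgvs_p: str) -> Optional[Tuple[str, int, str]]:
    """Return (ref, position, alt) in one-letter codes, or None if not a simple missense."""
    match = _MISSENSE.match(hgvs_p)
    if not match:
        return None
    ref, pos, alt = match.groups()
    # Stop codons (Ter) and unknown residues are not in the table, so they are rejected here.
    if ref not in AA3_TO_1 or alt not in AA3_TO_1 or ref == alt:
        return None
    return AA3_TO_1[ref], int(pos), AA3_TO_1[alt]
```

Nonsense, synonymous, and frameshift notations deliberately return `None`, so callers can route them to the appropriate non-missense handling.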

Phase 2: Clinical Database Queries

Goal: Aggregate existing clinical knowledge
Tools:

| Tool | Purpose | Key Data |
|------|---------|----------|
| `clinvar_search` | Existing classifications | Classification, review status, submissions |
| `gnomad_search` | Population frequency | AF, ancestry-specific AFs, homozygotes |
| `OMIM_search`, `OMIM_get_entry` | Gene-disease | Inheritance, phenotypes |
| `ClinGen_gene_validity` | Curation status | Gene-disease validity level |
| `COSMIC_search_mutations` | Somatic mutations (NEW) | Cancer frequency, histology |
| `DisGeNET_search_gene` | Gene-disease associations (NEW) | Evidence scores, sources |

2.1 COSMIC for Somatic Context (NEW)

For cancer variants, check COSMIC for somatic mutation frequency:
```python
from collections import Counter

def get_somatic_context(tu, gene_symbol, variant_aa):
    """Get somatic mutation context from COSMIC."""

    # Search for the specific mutation
    cosmic = tu.tools.COSMIC_search_mutations(
        operation="search",
        terms=f"{gene_symbol} {variant_aa}",
        max_results=20,
        genome_build=38
    )

    # Get all gene mutations for context
    gene_mutations = tu.tools.COSMIC_get_mutations_by_gene(
        operation="get_by_gene",
        gene=gene_symbol,
        max_results=100
    )

    # Determine if it's a hotspot: is the change among the 10 most frequent
    # amino acid changes returned for this gene?
    mutation_counts = Counter(m['MutationAA'] for m in gene_mutations.get('results', []))
    is_hotspot = variant_aa in [m[0] for m in mutation_counts.most_common(10)]

    return {
        'cosmic_hits': cosmic.get('results', []),
        'is_somatic_hotspot': is_hotspot,
        'cancer_types': [m['PrimarySite'] for m in cosmic.get('results', [])],
        'total_cosmic_count': cosmic.get('total_count', 0)
    }
```
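
The hotspot test in `get_somatic_context` is rank-based: a change counts as a hotspot if it is among the ten most frequent amino acid changes returned for the gene. For a sparsely mutated gene this can flag a change seen only once. The sketch below isolates that logic and adds an optional `min_count` guard; `is_hotspot`, `top_n`, and `min_count` are hypothetical names, and the records are made up for illustration rather than taken from COSMIC.

```python
from collections import Counter

def is_hotspot(variant_aa, gene_mutations, top_n=10, min_count=1):
    """Rank-based hotspot check with an optional minimum recurrence count."""
    counts = Counter(m["MutationAA"] for m in gene_mutations if m.get("MutationAA"))
    top = {aa for aa, n in counts.most_common(top_n) if n >= min_count}
    return variant_aa in top

# Made-up records for illustration only; not real COSMIC data.
records = [{"MutationAA": "p.V600E"}] * 5 + [{"MutationAA": "p.K601E"}]

print(is_hotspot("p.K601E", records))               # rank alone flags a singleton
print(is_hotspot("p.K601E", records, min_count=2))  # the guard filters it out
```

With the default `min_count=1` the function reproduces the original rank-only behaviour.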

2.2 OMIM Gene-Disease Context (NEW)

```python
def get_omim_context(tu, gene_symbol):
    """Get OMIM gene-disease associations."""

    # Search OMIM for gene
    search = tu.tools.OMIM_search(
        operation="search",
        query=gene_symbol,
        limit=5
    )

    omim_data = []
    for entry in search.get('data', {}).get('entries', []):
        mim = entry.get('mimNumber')
        if mim is None:
            continue  # skip entries without a MIM number

        # Get detailed entry
        details = tu.tools.OMIM_get_entry(
            operation="get_entry",
            mim_number=str(mim)
        )

        # Get clinical synopsis
        synopsis = tu.tools.OMIM_get_clinical_synopsis(
            operation="get_clinical_synopsis",
            mim_number=str(mim)
        )

        omim_data.append({
            'mim_number': mim,
            'title': details.get('data', {}).get('titles', {}),
            'inheritance': synopsis.get('data', {}).get('inheritance'),
            'clinical_features': synopsis.get('data', {})
        })

    return omim_data
```

2.3 DisGeNET Gene-Disease Evidence (NEW)

```python
def get_disgenet_context(tu, gene_symbol, variant_rsid=None):
    """Get gene-disease associations from DisGeNET."""

    # Gene-disease associations
    gda = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=gene_symbol,
        limit=20
    )

    # Variant-disease associations (if rsID available)
    vda = None
    if variant_rsid:
        vda = tu.tools.DisGeNET_get_vda(
            operation="get_vda",
            variant=variant_rsid,
            limit=20
        )

    return {
        'gene_associations': gda.get('data', {}).get('associations', []),
        'variant_associations': vda.get('data', {}).get('associations', []) if vda else []
    }
```

2.4 ClinGen Gene Validity & Dosage Sensitivity (NEW)

ClinGen provides authoritative curation of gene-disease relationships:
```python
def get_clingen_evidence(tu, gene_symbol):
    """
    Get ClinGen gene validity and dosage sensitivity data.
    CRITICAL for ACMG classification - establishes gene-disease validity.
    """

    # 1. Gene-disease validity (Definitive/Strong/Moderate/Limited)
    validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)

    validity_data = []
    if validity.get('data'):
        for entry in validity.get('data', []):
            validity_data.append({
                'disease': entry.get('Disease Label'),
                'classification': entry.get('Classification'),  # Definitive, Strong, etc.
                'inheritance': entry.get('Inheritance'),
                'mondo_id': entry.get('Disease ID (MONDO)')
            })

    # 2. Dosage sensitivity (haploinsufficiency, triplosensitivity)
    dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)

    dosage_data = {}
    if dosage.get('data'):
        for entry in dosage.get('data', []):
            dosage_data = {
                'haploinsufficiency_score': entry.get('Haploinsufficiency Score'),
                'triplosensitivity_score': entry.get('Triplosensitivity Score'),
                'disease': entry.get('Disease')
            }
            break  # Usually one entry per gene

    # 3. Clinical actionability (for incidental findings context)
    actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)

    return {
        'gene_validity': validity_data,
        'dosage_sensitivity': dosage_data,
        'actionability': actionability.get('data', {}),
        'has_definitive_validity': any(v['classification'] == 'Definitive' for v in validity_data),
        # Compare as a string so that both 3 and '3' are recognised
        'is_haploinsufficient': str(dosage_data.get('haploinsufficiency_score')) == '3'
    }
```
**ClinGen Validity Levels (for ACMG PM1/PP4)**:
| Classification | Meaning | ACMG Impact |
|----------------|---------|-------------|
| Definitive | Multiple concordant studies | Strong gene-disease support |
| Strong | Extensive evidence | Moderate-strong support |
| Moderate | Some evidence | Moderate support |
| Limited | Minimal evidence | Weak support, use caution |
| Disputed | Conflicting evidence | Do not use for classification |
| Refuted | Evidence against | Gene NOT associated |

**Dosage Sensitivity Scores (for CNV interpretation)**:
| Score | Meaning | Interpretation |
|-------|---------|----------------|
| 3 | Sufficient evidence | Haploinsufficiency/triplosensitivity established |
| 2 | Emerging evidence | Some support, not definitive |
| 1 | Little evidence | Minimal support |
| 0 | No evidence | Unknown |
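
A gene can carry several ClinGen curations, one per disease, so it helps to pick the strongest usable one before applying gene-level evidence. The sketch below ranks the `gene_validity` entries returned by `get_clingen_evidence` using the order of the validity table and ignores Disputed and Refuted curations. The numeric ranks and the `best_validity` helper are assumptions for illustration.

```python
# Ranks follow the order of the validity levels; the numbers themselves are illustrative.
VALIDITY_RANK = {"Definitive": 4, "Strong": 3, "Moderate": 2, "Limited": 1}

def best_validity(validity_data):
    """Return the strongest usable classification, or None if none is usable.

    Disputed and Refuted curations are absent from VALIDITY_RANK and are
    therefore never returned.
    """
    usable = [
        v["classification"]
        for v in validity_data
        if v.get("classification") in VALIDITY_RANK
    ]
    return max(usable, key=VALIDITY_RANK.get, default=None)
```

Returning `None` for genes with only Disputed or Refuted curations keeps those genes from contributing gene-level support downstream.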

2.5 SpliceAI Splice Variant Prediction (NEW)

An estimated 15% of pathogenic variants affect splicing. SpliceAI is a widely used deep-learning predictor of splice impact:
```python
def get_spliceai_prediction(tu, chrom, pos, ref, alt, genome="38"):
    """
    Get SpliceAI splice effect predictions.

    Delta scores:
    - DS_AG: Acceptor gain
    - DS_AL: Acceptor loss
    - DS_DG: Donor gain
    - DS_DL: Donor loss

    Thresholds:
    - ≥0.8: High pathogenicity (strong PP3)
    - 0.5-0.8: Moderate (supporting PP3)
    - 0.2-0.5: Low (weak evidence)
    - <0.2: Likely benign
    """

    # Format variant for SpliceAI
    variant = f"chr{chrom}-{pos}-{ref}-{alt}"

    # Get full splice predictions
    result = tu.tools.SpliceAI_predict_splice(
        variant=variant,
        genome=genome
    )

    if result.get('data'):
        max_score = result['data'].get('max_delta_score', 0)
        interpretation = result['data'].get('interpretation', '')

        # Determine ACMG support
        if max_score >= 0.8:
            acmg = 'PP3 (strong) - high splice impact'
        elif max_score >= 0.5:
            acmg = 'PP3 (supporting) - moderate splice impact'
        elif max_score >= 0.2:
            acmg = 'PP3 (weak) - possible splice impact'
        else:
            acmg = 'BP7 (if synonymous) - splice benign'

        return {
            'max_delta_score': max_score,
            'interpretation': interpretation,
            'acmg_support': acmg,
            'scores': result['data'].get('scores', [])
        }
    return None

def quick_splice_check(tu, variant, genome="38"):
    """Quick triage using max delta score only."""

    result = tu.tools.SpliceAI_get_max_delta(
        variant=variant,
        genome=genome
    )

    return result.get('data', {})
```
When to Use SpliceAI:
  • Intronic variants near splice sites (±50bp)
  • Synonymous variants (may still affect splicing)
  • Exonic variants near splice junctions
  • Variants creating cryptic splice sites
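
When only the four raw delta scores are available, the maximum score and its tier can be derived locally. The sketch below applies the thresholds from the `get_spliceai_prediction` docstring; `summarize_delta_scores` is a hypothetical helper, and it assumes the scores arrive as a dict keyed by `DS_AG`, `DS_AL`, `DS_DG`, and `DS_DL`.

```python
def summarize_delta_scores(scores):
    """Return (score_name, value, tier) for the largest SpliceAI delta score.

    Assumes `scores` maps DS_AG/DS_AL/DS_DG/DS_DL to floats in [0, 1].
    """
    name, value = max(scores.items(), key=lambda kv: kv[1])
    if value >= 0.8:
        tier = "high"
    elif value >= 0.5:
        tier = "moderate"
    elif value >= 0.2:
        tier = "low"
    else:
        tier = "likely benign"
    return name, value, tier
```

Reporting which of the four scores drives the maximum tells the reader whether an acceptor or donor site is affected and whether a site is lost or gained.
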
Report Section for Splice Variants:

Splice Impact Analysis (SpliceAI)

| Score Type | Value | Position | Interpretation |
|------------|-------|----------|----------------|
| DS_AG | 0.02 | +15 | Acceptor gain unlikely |
| DS_AL | 0.85 | -2 | High acceptor loss |
| DS_DG | 0.01 | +8 | Donor gain unlikely |
| DS_DL | 0.03 | +1 | Donor loss unlikely |

**Max Delta Score**: 0.85 (DS_AL)
**Interpretation**: High impact - likely disrupts acceptor site
**ACMG Support**: PP3 (strong) for splice-altering effect

Source: SpliceAI via `SpliceAI_predict_splice`
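
A report section like the one above can be generated from per-score rows. This is a minimal sketch, assuming each row is a `(score_type, value, position, interpretation)` tuple; `render_splice_table` is a hypothetical helper and the row layout is not a documented SpliceAI or ToolUniverse output format.

```python
def render_splice_table(rows):
    """Render SpliceAI rows as a markdown table.

    rows: iterable of (score_type, value, position, interpretation), where
    position is the signed offset in bp from the variant.
    """
    lines = [
        "| Score Type | Value | Position | Interpretation |",
        "|------------|-------|----------|----------------|",
    ]
    for score_type, value, position, note in rows:
        # :+d always prints the sign, matching offsets such as +15 and -2
        lines.append(f"| {score_type} | {value:.2f} | {position:+d} | {note} |")
    return "\n".join(lines)
```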

**ClinVar Classification Map**:
| ClinVar | Interpretation |
|---------|----------------|
| Pathogenic | Disease-causing |
| Likely pathogenic | 90%+ confidence pathogenic |
| VUS | Uncertain significance |
| Likely benign | 90%+ confidence benign |
| Benign | Not disease-causing |
| Conflicting | Multiple interpretations |

**gnomAD Thresholds (for rare disease)**:
| Frequency | ACMG Code | Interpretation |
|-----------|-----------|----------------|
| Absent | PM2_Supporting | Absent from controls |
| <0.00001 | PM2_Supporting | Extremely rare |
| <0.0001 | - | Rare (use with caution) |
| >0.01 | BS1/BA1 | Too common for rare disease |

**COSMIC Somatic Evidence (NEW)**:
| COSMIC Finding | Interpretation | ACMG Support |
|----------------|----------------|--------------|
| Recurrent hotspot (>100 samples) | Known oncogenic driver | PS3 (functional) |
| Moderate frequency (10-100) | Likely oncogenic | PM1 (hotspot) |
| Rare somatic (<10) | Unknown significance | No support |

**DisGeNET Score Interpretation (NEW)**:
| GDA Score | Evidence Level | ACMG Support |
|-----------|----------------|--------------|
| >0.7 | Strong | PP4 (phenotype) |
| 0.4-0.7 | Moderate | Supporting |
| <0.4 | Weak | Insufficient |

Phase 2.5: Regulatory Context (NEW - for Non-Coding Variants)

Goal: Assess regulatory impact for non-coding, intronic, and promoter variants
When to Apply:
  • Intronic variants (not splice site)
  • Promoter variants
  • 5'UTR / 3'UTR variants
  • Intergenic variants near disease genes
Tools:

| Tool | Purpose | Key Data |
|------|---------|----------|
| `ChIPAtlas_enrichment_analysis` | TF binding at position | Bound TFs, cell types |
| `ChIPAtlas_get_peak_data` | ChIP-seq peaks | Peak coordinates, scores |
| `ENCODE_search_experiments` | Regulatory elements | Enhancers, promoters, DHS |
| `ENCODE_get_experiment` | Experiment details | Assay type, targets |

Regulatory Impact Assessment:
```python
def assess_regulatory_impact(tu, variant_position, gene_symbol):
    """Assess regulatory impact of non-coding variant."""

    # Check TF binding at position
    tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
        gene=gene_symbol,
        cell_type="all"
    )

    # Get ChIP-seq peaks overlapping variant
    peaks = tu.tools.ChIPAtlas_get_peak_data(
        gene=gene_symbol,
        experiment_type="TF"
    )

    # Search ENCODE for regulatory annotations
    encode_data = tu.tools.ENCODE_search_experiments(
        assay_title="ATAC-seq",
        biosample="all"
    )

    # Assess if variant disrupts TF binding.
    # NOTE: check_motif_disruption is not defined in this document and is not a
    # ToolUniverse tool; the caller must supply it.
    binding_disrupted = check_motif_disruption(variant_position, peaks)

    return {
        'tf_binding': tf_binding,
        'regulatory_peaks': peaks,
        'encode_annotations': encode_data,
        'likely_regulatory': binding_disrupted
    }
```
**Regulatory Impact Categories**:
| Category | Criteria | ACMG Support |
|----------|----------|--------------|
| High impact | Disrupts known TF binding motif | PP3 (supporting) |
| Moderate impact | In active regulatory region | Consider context |
| Low impact | No regulatory annotation | No support |

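
`assess_regulatory_impact` calls a `check_motif_disruption` helper that is not defined in this document. The sketch below is one possible stand-in, not the original implementation. It is only a coarse proxy: it tests whether the variant falls inside a ChIP-seq peak interval, which indicates location in a bound region but does not show that a motif is disrupted. It assumes the peaks have already been extracted into a list of dicts with integer `start` and `end` keys; the real shape of the `ChIPAtlas_get_peak_data` response is not specified here.

```python
def check_motif_disruption(variant_position, peaks):
    """Coarse proxy: True if the variant lies inside any peak interval.

    Assumes `peaks` is a list of dicts with integer 'start' and 'end'
    coordinates (inclusive) on the same chromosome as the variant.
    Peak overlap is weaker evidence than a confirmed motif disruption.
    """
    return any(p["start"] <= variant_position <= p["end"] for p in peaks)
```

Because overlap alone corresponds to the "In active regulatory region" row rather than the "Disrupts known TF binding motif" row, a positive result from this stand-in is better read as moderate impact unless motif-level analysis confirms disruption.
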
Output for Report:

2.5 Regulatory Context (for Non-Coding Variants)

| Feature | Finding | Significance |
|---------|---------|--------------|
| Variant location | Intron 5, 120 bp from exon 6 | Not canonical splice |
| TF binding site | CTCF binding peak (ChIPAtlas) | May affect insulation |
| ENCODE annotation | Active enhancer (H3K27ac) | Regulatory function |
| Conservation | PhyloP = 2.8 | Moderate conservation |

Regulatory Interpretation: Variant overlaps CTCF binding site in active enhancer region. Potential impact on gene regulation.

Source: ChIPAtlas, ENCODE

Phase 3: Computational Predictions (ENHANCED)

Goal: Assess in silico pathogenicity predictions using state-of-the-art models
Tools:
| Tool | Purpose | Score Range |
|------|---------|-------------|
| `CADD_get_variant_score` | Deleteriousness score (NEW API) | PHRED 0-99 |
| `AlphaMissense_get_variant_score` | DeepMind pathogenicity (NEW) | 0-1 |
| `EVE_get_variant_score` | Evolutionary pathogenicity (NEW) | 0-1 |
| `myvariant_query` | Aggregated predictions | SIFT, PolyPhen |
| `Ensembl_get_variant_info` | VEP predictions | SIFT, PolyPhen |

3.1 CADD Deleteriousness Scoring (NEW)

```python
def get_cadd_score(tu, chrom, pos, ref, alt):
    """Get CADD deleteriousness score for a variant."""

    result = tu.tools.CADD_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt,
        version="GRCh38-v1.7"
    )

    if result.get('status') == 'success':
        phred = result['data'].get('phred_score')
        if phred is None:
            # No score returned; avoid comparing None against the thresholds
            return None
        # PHRED >= 20 supports PP3, PHRED < 15 supports BP4, 15-20 is neutral
        if phred >= 20:
            acmg_support = 'PP3'
        elif phred < 15:
            acmg_support = 'BP4'
        else:
            acmg_support = 'neutral'
        return {
            'score': phred,
            'interpretation': result['data'].get('interpretation'),
            'acmg_support': acmg_support
        }
    return None
```

3.2 AlphaMissense Pathogenicity (NEW)

DeepMind's AlphaMissense provides state-of-the-art missense pathogenicity prediction:
```python
def get_alphamissense_score(tu, uniprot_id, variant):
    """
    Get AlphaMissense pathogenicity score.
    variant format: 'R123H' or 'p.R123H'

    Thresholds:
    - Pathogenic: score > 0.564
    - Ambiguous: 0.34-0.564
    - Benign: score < 0.34
    """

    result = tu.tools.AlphaMissense_get_variant_score(
        uniprot_id=uniprot_id,
        variant=variant
    )

    if result.get('status') == 'success' and result.get('data'):
        score = result['data'].get('pathogenicity_score')
        classification = result['data'].get('classification')

        # Map to ACMG. The "(strong)" label is this workflow's own weighting;
        # PP3 and BP4 are listed as Supporting in the Phase 6 evidence tables.
        if classification == 'pathogenic':
            acmg = 'PP3 (strong)'  # AlphaMissense has high accuracy
        elif classification == 'benign':
            acmg = 'BP4 (strong)'
        else:
            # Ambiguous or unrecognized class: no computational support
            acmg = 'neutral'

        return {
            'score': score,
            'classification': classification,
            'acmg_support': acmg
        }
    return None
```

3.3 EVE Evolutionary Prediction (NEW)

EVE uses unsupervised learning on evolutionary data:
```python
def get_eve_score(tu, chrom, pos, ref, alt):
    """
    Get EVE evolutionary pathogenicity score.

    Threshold: >0.5 indicates likely pathogenic
    """

    result = tu.tools.EVE_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt
    )

    if result.get('status') == 'success':
        eve_scores = result['data'].get('eve_scores', [])
        if eve_scores:
            # Use the first returned entry
            best_score = eve_scores[0]
            score = best_score.get('eve_score')
            if score is None:
                # A missing score must not be counted as benign evidence
                acmg_support = 'neutral'
            else:
                acmg_support = 'PP3' if score > 0.5 else 'BP4'
            return {
                'score': score,
                'classification': best_score.get('classification'),
                'gene': best_score.get('gene_symbol'),
                'acmg_support': acmg_support
            }
    return None
```

3.4 Integrated Prediction Strategy

For VUS (Variants of Uncertain Significance), combine multiple predictors:
```python
def comprehensive_pathogenicity_assessment(tu, variant_info):
    """
    Combine all prediction tools for robust classification.
    """
    chrom = variant_info['chrom']
    pos = variant_info['pos']
    ref = variant_info['ref']
    alt = variant_info['alt']
    uniprot_id = variant_info.get('uniprot_id')
    aa_change = variant_info.get('aa_change')  # e.g., 'R123H'

    predictions = {}

    # 1. CADD (works for all variant types)
    cadd = get_cadd_score(tu, chrom, pos, ref, alt)
    if cadd:
        predictions['cadd'] = cadd

    # 2. AlphaMissense (missense only, requires UniProt ID)
    if uniprot_id and aa_change:
        am = get_alphamissense_score(tu, uniprot_id, aa_change)
        if am:
            predictions['alphamissense'] = am

    # 3. EVE (missense only)
    eve = get_eve_score(tu, chrom, pos, ref, alt)
    if eve:
        predictions['eve'] = eve

    # Consensus assessment: count predictors whose support string names PP3 or BP4
    damaging_count = sum(1 for p in predictions.values()
                         if 'PP3' in p.get('acmg_support', ''))
    benign_count = sum(1 for p in predictions.values()
                       if 'BP4' in p.get('acmg_support', ''))

    if damaging_count >= 2 and benign_count == 0:
        consensus = 'likely_damaging'
        acmg = 'PP3 (multiple predictors concordant)'
    elif benign_count >= 2 and damaging_count == 0:
        consensus = 'likely_benign'
        acmg = 'BP4 (multiple predictors concordant)'
    else:
        # Reached when predictors disagree or fewer than two give a call
        consensus = 'uncertain'
        acmg = 'neutral (discordant or insufficient predictions)'

    return {
        'predictions': predictions,
        'consensus': consensus,
        'acmg_recommendation': acmg
    }
```
Prediction Interpretation (Updated):
| Predictor | Damaging | Benign |
|-----------|----------|--------|
| AlphaMissense | >0.564 | <0.34 |
| CADD PHRED | ≥20 (top 1%) | <15 |
| EVE | >0.5 | ≤0.5 |
| SIFT | <0.05 | ≥0.05 |
| PolyPhen2 | >0.85 (probably) | <0.15 (benign) |

ACMG Application (Enhanced):
  • PP3: Multiple concordant damaging predictions (AlphaMissense + CADD + EVE agreement = strong PP3)
  • BP4: Multiple concordant benign predictions
  • Note: AlphaMissense alone reportedly achieves ~90% accuracy on ClinVar pathogenic variants
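The functions in this phase cover CADD, AlphaMissense, and EVE, while the interpretation table also lists SIFT and PolyPhen2. As a hedged sketch, the thresholds from that table can be held in one lookup so that any raw score is called the same way. `THRESHOLDS` and `call_predictor` are hypothetical names, and treating scores between the two cutoffs as "ambiguous" is an assumption.

```python
# Thresholds copied from the Prediction Interpretation table.
# Each entry holds (test for a damaging call, test for a benign call).
THRESHOLDS = {
    "AlphaMissense": (lambda s: s > 0.564, lambda s: s < 0.34),
    "CADD": (lambda s: s >= 20, lambda s: s < 15),
    "EVE": (lambda s: s > 0.5, lambda s: s <= 0.5),
    "SIFT": (lambda s: s < 0.05, lambda s: s >= 0.05),  # lower SIFT = more damaging
    "PolyPhen2": (lambda s: s > 0.85, lambda s: s < 0.15),
}


def call_predictor(name, score):
    """Return 'damaging', 'benign', 'ambiguous', or 'missing' for one score."""
    if score is None:
        return "missing"
    is_damaging, is_benign = THRESHOLDS[name]
    if is_damaging(score):
        return "damaging"
    if is_benign(score):
        return "benign"
    return "ambiguous"  # assumption: scores between the cutoffs give no support


if __name__ == "__main__":
    # Illustrative scores only, not taken from any real variant
    scores = {"CADD": 27.3, "SIFT": 0.01, "PolyPhen2": 0.5, "EVE": None}
    print({name: call_predictor(name, s) for name, s in scores.items()})
```

Keeping the direction of each score inside the table avoids the common mistake of reading a low SIFT score as benign.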

Phase 4: Structural Analysis

Goal: Assess protein structural impact (especially for VUS)
Tools:
| Tool | Purpose |
|------|---------|
| `PDB_search_by_uniprot` | Find experimental structures |
| `NvidiaNIM_alphafold2` | Predict structure if no PDB |
| `alphafold_get_prediction` | Get AlphaFold DB structure |
| `InterPro_get_protein_domains` | Domain annotations |
| `UniProt_get_protein_function` | Functional sites |

Structural Impact Categories:
| Impact Level | Description | ACMG Support |
|--------------|-------------|--------------|
| Critical | Active site, catalytic residue | PM1 (strong) |
| High | Buried residue, disulfide, structural core | PM1 (moderate) |
| Moderate | Domain interface, binding site | PM1 (supporting) |
| Low | Surface, flexible region | No support |
Using AlphaFold2 for VUS:
1. Get wildtype structure (PDB or AlphaFold)
2. Identify residue location:
   - pLDDT at position (confidence)
   - Solvent accessibility
   - Secondary structure
3. Assess structural context:
   - Distance to functional sites
   - Interaction partners
   - Conservation in structure
4. Predict impact:
   - Side chain burial
   - Hydrogen bond disruption
   - Charge changes in buried positions
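Once the residue's context is known, the Structural Impact Categories can be assigned mechanically. The following is a minimal sketch under stated assumptions: `structural_impact_level` and its boolean flags are hypothetical, and checking the most severe category first is an assumed precedence for residues that match several rows.

```python
def structural_impact_level(
    in_active_site=False,
    buried_or_core=False,
    in_disulfide=False,
    at_interface_or_binding_site=False,
):
    """Return (impact level, ACMG support) for one residue.

    The flags are hypothetical inputs; computing them from a structure
    (for example burial from solvent accessibility) is not shown here.
    """
    if in_active_site:
        return ("Critical", "PM1 (strong)")
    if buried_or_core or in_disulfide:
        return ("High", "PM1 (moderate)")
    if at_interface_or_binding_site:
        return ("Moderate", "PM1 (supporting)")
    return ("Low", "No support")


if __name__ == "__main__":
    # A buried residue that also sits at a domain interface is reported as High
    print(structural_impact_level(buried_or_core=True, at_interface_or_binding_site=True))
```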

Phase 4.5: Expression Context (NEW)

Goal: Validate gene expression in disease-relevant tissues/cells
Tools:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `CELLxGENE_get_expression_data` | Cell-type specific expression | TPM per cell type |
| `CELLxGENE_get_cell_metadata` | Cell type annotations | Tissue, disease state |
| `GTEx_get_median_gene_expression` | Tissue expression | TPM per tissue |

Expression Validation:
```python
def validate_expression_context(tu, gene_symbol, phenotype_tissues):
    """Validate gene is expressed in phenotype-relevant tissues."""
    # Treat a missing tissue list as empty so the lookups below do not fail
    phenotype_tissues = phenotype_tissues or []

    # Single-cell expression
    sc_expression = tu.tools.CELLxGENE_get_expression_data(
        gene=gene_symbol,
        tissue=phenotype_tissues[0] if phenotype_tissues else "all"
    )

    # Bulk tissue expression (GTEx)
    gtex = tu.tools.GTEx_get_median_gene_expression(
        gene=gene_symbol
    )

    # Check expression in relevant tissues.
    # Assumes the GTEx result maps tissue name -> median TPM.
    relevant_expression = {
        tissue: gtex.get(tissue, 0)
        for tissue in phenotype_tissues
    }

    return {
        'single_cell': sc_expression,
        'gtex': relevant_expression,
        # TPM > 1 in any phenotype tissue counts as expressed
        'expressed_in_phenotype_tissue': any(v > 1 for v in relevant_expression.values())
    }
```
Why it matters:
  • Confirms gene is expressed where disease manifests
  • Supports PP4 (phenotype-specific) if highly restricted expression
  • Can challenge classification if not expressed in affected tissue
Output for Report:

4.5 Expression Context

| Tissue | Expression (TPM) | Relevance |
|--------|------------------|-----------|
| Heart | 45.2 | ✓ Primary disease tissue |
| Skeletal muscle | 38.7 | ✓ Secondary involvement |
| Liver | 2.1 | Low expression |
| Brain | 0.5 | Not expressed |

Single-Cell Analysis (CELLxGENE):
  • Cardiomyocytes: High expression (TPM=85)
  • Cardiac fibroblasts: Low expression (TPM=5)

Interpretation: Gene highly expressed in cardiomyocytes, supporting cardiac phenotype association.

Source: GTEx, CELLxGENE Census

Phase 5: Literature Evidence (ENHANCED)

Goal: Find functional studies, case reports, and cutting-edge preprints
Tools:
| Tool | Purpose | Coverage |
|------|---------|----------|
| `PubMed_search` | Peer-reviewed studies | Comprehensive |
| `EuropePMC_search` | Additional literature | Europe PMC |
| `BioRxiv_search_preprints` | Biology preprints | Recent findings |
| `MedRxiv_search_preprints` | Clinical preprints | Clinical studies |
| `openalex_search_works` | Citation analysis | Impact metrics |
| `SemanticScholar_search_papers` | AI-ranked search | Relevance |

Search Strategies:
```python
def comprehensive_literature_search(tu, gene, variant, phenotype):
    """Search across all literature sources.

    Assumes each search tool returns a list of dicts (or None when empty).
    """

    # 1. PubMed: Peer-reviewed
    pubmed = tu.tools.PubMed_search(
        query=f'"{gene}" AND ("{variant}" OR functional)',
        max_results=30
    ) or []

    # 2. BioRxiv: Recent preprints
    biorxiv = tu.tools.BioRxiv_search_preprints(
        query=f"{gene} {phenotype}",
        limit=10
    ) or []

    # 3. MedRxiv: Clinical preprints
    medrxiv = tu.tools.MedRxiv_search_preprints(
        query=f"{gene} variant {phenotype}",
        limit=10
    ) or []

    # 4. Citation analysis for the top five PubMed hits
    key_papers = pubmed[:5]
    for paper in key_papers:
        citations = tu.tools.openalex_search_works(
            query=paper.get('title', ''),
            limit=1
        )
        paper['citation_count'] = citations[0].get('cited_by_count', 0) if citations else 0

    return {
        'pubmed': pubmed,
        'preprints': biorxiv + medrxiv,
        'key_papers_with_citations': key_papers
    }
```
Search Queries:

Gene + variant specific:
"{GENE} AND ({HGVS_p} OR {AA_change})"

Functional studies:
"{GENE} AND (functional OR functional study OR mutagenesis)"

Clinical reports:
"{GENE} AND (case report OR patient) AND {phenotype}"

Preprint-specific:
"{GENE} genetics 2024" (for recent preprints)

**⚠️ Preprint Warning**: Always flag preprints as NOT peer-reviewed in reports.

**Evidence Types**:
| Evidence | ACMG Code | Weight |
|----------|-----------|--------|
| Functional study (null) | PS3 | Strong |
| Functional study (reduced) | PS3_Moderate | Moderate |
| Case reports with segregation | PP1 | Supporting to Moderate |
| Co-occurrence with pathogenic | BP2 | Supporting against |
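The query templates in this phase can be filled in programmatically so that every variant is searched with the same strategies. This is a sketch, not part of ToolUniverse: `build_literature_queries` is a hypothetical helper, and the `year` parameter stands in for the hard-coded "2024" of the preprint template.

```python
def build_literature_queries(gene, hgvs_p, aa_change, phenotype, year=2024):
    """Fill the four query templates for one variant.

    Returns a dict keyed by strategy name; the key names are illustrative.
    """
    return {
        "variant_specific": f"{gene} AND ({hgvs_p} OR {aa_change})",
        "functional": f"{gene} AND (functional OR functional study OR mutagenesis)",
        "clinical": f"{gene} AND (case report OR patient) AND {phenotype}",
        "preprint": f"{gene} genetics {year}",
    }


if __name__ == "__main__":
    # Example values are for illustration only
    queries = build_literature_queries(
        gene="TP53",
        hgvs_p="p.Arg273His",
        aa_change="R273H",
        phenotype="Li-Fraumeni syndrome",
    )
    for name, query in queries.items():
        print(f"{name}: {query}")
```

Any result obtained with the `preprint` query should still be flagged as not peer-reviewed, as the warning above requires.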

Phase 6: ACMG Classification

Goal: Systematic classification with explicit evidence
ACMG Evidence Codes:
Pathogenic:
| Code | Strength | Description |
|------|----------|-------------|
| PVS1 | Very Strong | Null variant in gene where LOF is mechanism |
| PS1 | Strong | Same amino acid change as known pathogenic |
| PS3 | Strong | Well-established functional studies |
| PM1 | Moderate | Mutational hot spot / functional domain |
| PM2 | Moderate | Absent from controls |
| PM5 | Moderate | Different missense at same residue as pathogenic |
| PP3 | Supporting | Multiple computational predictions |
| PP5 | Supporting | Reputable source reports pathogenic |

Benign:
| Code | Strength | Description |
|------|----------|-------------|
| BA1 | Stand-alone | MAF >5% |
| BS1 | Strong | MAF greater than expected |
| BS3 | Strong | Functional studies show no effect |
| BP4 | Supporting | Multiple computational predictions benign |
| BP7 | Supporting | Synonymous with no splice impact |

Classification Algorithm (a simplified subset of the ACMG/AMP 2015 combining rules):
| Classification | Evidence Required |
|----------------|-------------------|
| Pathogenic | 1 Very Strong + 1 Strong; OR 2 Strong; OR 1 Strong + 3 Moderate |
| Likely Pathogenic | 1 Very Strong + 1 Moderate; OR 1 Strong + 2 Moderate; OR 1 Strong + 2 Supporting |
| Likely Benign | 1 Strong + 1 Supporting; OR 2 Supporting |
| Benign | 1 Stand-alone; OR 2 Strong |
| VUS | Criteria not met |
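The classification table can be applied as a counting rule over the evidence codes. The sketch below encodes only the combinations listed in this section, under three assumptions that are not stated in the table: each count is read as a minimum, Pathogenic is tested before Likely Pathogenic, and a variant that meets both a pathogenic and a benign rule is returned as VUS. `classify_acmg` and `CODE_STRENGTH` are hypothetical names, and modified strengths such as PS3_Moderate are not handled.

```python
from collections import Counter

# Strength of each code, copied from the evidence tables in this phase.
CODE_STRENGTH = {
    "PVS1": "very_strong",
    "PS1": "strong", "PS3": "strong",
    "PM1": "moderate", "PM2": "moderate", "PM5": "moderate",
    "PP3": "supporting", "PP5": "supporting",
    "BA1": "stand_alone",
    "BS1": "strong", "BS3": "strong",
    "BP4": "supporting", "BP7": "supporting",
}


def classify_acmg(codes):
    """Classify a variant from a list of applied evidence codes."""
    # Pathogenic codes start with "P", benign codes with "B"
    path = Counter(CODE_STRENGTH[c] for c in codes if c.startswith("P"))
    ben = Counter(CODE_STRENGTH[c] for c in codes if c.startswith("B"))
    vs, s, m, p = (path[k] for k in ("very_strong", "strong", "moderate", "supporting"))

    if (vs >= 1 and s >= 1) or s >= 2 or (s >= 1 and m >= 3):
        path_call = "Pathogenic"
    elif (vs >= 1 and m >= 1) or (s >= 1 and m >= 2) or (s >= 1 and p >= 2):
        path_call = "Likely Pathogenic"
    else:
        path_call = None

    if ben["stand_alone"] >= 1 or ben["strong"] >= 2:
        ben_call = "Benign"
    elif (ben["strong"] >= 1 and ben["supporting"] >= 1) or ben["supporting"] >= 2:
        ben_call = "Likely Benign"
    else:
        ben_call = None

    if path_call and ben_call:
        return "VUS"  # assumption: conflicting evidence stays uncertain
    return path_call or ben_call or "VUS"


if __name__ == "__main__":
    print(classify_acmg(["PS3", "PM1", "PM2"]))
```

Returning the applied codes alongside the call, as the report template asks for, is left to the caller.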


Output Structure

Report Sections

Variant Interpretation Report: {GENE} {VARIANT}

Executive Summary

  • Variant: {HGVS notation}
  • Gene: {gene symbol}
  • Classification: {Pathogenic/Likely Pathogenic/VUS/Likely Benign/Benign}
  • Evidence Strength: {strong/moderate/limited}
  • Key Finding: {one-sentence summary}

1. Variant Identity

{gene, transcript, protein change, consequence}

2. Population Data

{gnomAD frequencies, ancestry breakdown}

3. Clinical Database Evidence

{ClinVar, ClinGen, OMIM}

4. Computational Predictions

{SIFT, PolyPhen, CADD scores}

5. Structural Analysis

{Domain location, functional site proximity, AlphaFold confidence}

6. Literature Evidence

{Functional studies, case reports}

7. ACMG Classification

{Evidence codes applied, classification rationale}

8. Clinical Recommendations

{Testing, management, family screening}

9. Limitations & Uncertainties

{Missing data, conflicting evidence}

Data Sources

{All tools and databases queried}

---

Evidence Grading

Classification Confidence

| Symbol | Classification | Evidence Level |
|--------|----------------|----------------|
| ★★★ | High confidence | Multiple independent lines |
| ★★☆ | Moderate confidence | Some supporting evidence |
| ★☆☆ | Limited confidence | Minimal evidence |
| VUS | Uncertain | Insufficient data |

Structural Impact Confidence

| pLDDT Range | Interpretation |
|-------------|----------------|
| >90 | Very high confidence in position |
| 70-90 | High confidence |
| 50-70 | Moderate (often loops) |
| <50 | Low confidence (disorder) |

Special Scenarios

Scenario 1: Novel Missense VUS

Additional workflow:
  1. Check if other pathogenic variants at same residue
  2. Get AlphaFold2 structure
  3. Analyze:
    • Is residue buried or surface?
    • What secondary structure?
    • Proximity to active/binding sites?
    • Conservation across species?
  4. Apply PM1 if in functional domain
  5. Apply PP3 if predictions concordant

Scenario 2: Truncating Variant

Additional workflow:
  1. Check if LOF is mechanism for gene
  2. Determine if escapes NMD (last exon)
  3. Check for alternative isoforms
  4. Review ClinGen LOF curation
PVS1 Application:
| Scenario | PVS1 Strength |
|----------|---------------|
| Canonical LOF gene, NMD predicted | Very Strong |
| LOF gene, last exon | Moderate |
| Non-LOF gene | Not applicable |
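The PVS1 table reduces to two questions: is loss of function the disease mechanism for the gene, and is the transcript expected to undergo NMD. A minimal sketch, assuming both answers are already available as booleans; `pvs1_strength` is a hypothetical helper, and a last-exon variant is treated as escaping NMD, as in the workflow above.

```python
def pvs1_strength(lof_is_disease_mechanism, in_last_exon):
    """Return the PVS1 strength for a truncating variant per the table above."""
    if not lof_is_disease_mechanism:
        return "Not applicable"
    if in_last_exon:
        # Assumed to escape nonsense-mediated decay, so the code is downgraded
        return "Moderate"
    return "Very Strong"


if __name__ == "__main__":
    print(pvs1_strength(lof_is_disease_mechanism=True, in_last_exon=False))
```

Alternative isoforms and ClinGen LOF curation (steps 3 and 4) are not modeled here and can still change the final strength.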

Scenario 3: Splice Variant

Additional workflow:
  1. Check SpliceAI scores (if available)
  2. Determine canonical splice site distance
  3. Review for in-frame skipping potential
  4. Check for cryptic splice activation


Quantified Minimums

| Section | Requirement |
|---------|-------------|
| Population frequency | gnomAD overall + ≥3 ancestry groups |
| Predictions | ≥3 computational predictors |
| Literature search | ≥2 search strategies |
| ACMG codes | All applicable codes listed |
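These minimums lend themselves to an automated pre-delivery check. The sketch below assumes a report summary held in a plain dict; the key names (`gnomad_overall_af`, `ancestry_afs`, `predictors`, `search_strategies`, `acmg_codes`) are hypothetical and not defined elsewhere in this document. Whether all applicable ACMG codes were listed cannot be verified mechanically, so the check only confirms that a code list is present.

```python
def check_minimums(report):
    """Return a list of unmet minimums; an empty list means all are met."""
    problems = []
    if report.get("gnomad_overall_af") is None:
        problems.append("missing gnomAD overall frequency")
    if len(report.get("ancestry_afs", {})) < 3:
        problems.append("fewer than 3 ancestry groups")
    if len(report.get("predictors", [])) < 3:
        problems.append("fewer than 3 computational predictors")
    if len(report.get("search_strategies", [])) < 2:
        problems.append("fewer than 2 literature search strategies")
    if report.get("acmg_codes") is None:
        problems.append("ACMG codes not listed")
    return problems


if __name__ == "__main__":
    # Illustrative values only
    draft = {
        "gnomad_overall_af": 0.0001,
        "ancestry_afs": {"afr": 0.0, "nfe": 0.0002},
        "predictors": ["CADD", "AlphaMissense", "EVE"],
        "search_strategies": ["variant_specific"],
        "acmg_codes": ["PM2", "PP3"],
    }
    for problem in check_minimums(draft):
        print("UNMET:", problem)
```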


NVIDIA NIM Integration

When to Use AlphaFold2 for Variants

Use Case: VUS missense variants where structural context aids interpretation
Workflow:
```python
# 1. Get protein sequence
protein_seq = tu.tools.UniProt_get_protein_sequence(accession=uniprot_id)

# 2. Get/predict structure
try:
    pdb_hits = tu.tools.PDB_search_by_uniprot(uniprot_id=uniprot_id)
    # Raises IndexError when no experimental structure is found
    structure = tu.tools.PDB_get_structure(pdb_id=pdb_hits[0]['pdb_id'])
except Exception:
    # Predict with AlphaFold2
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=protein_seq['sequence'],
        algorithm="mmseqs2"
    )

# 3. Analyze variant position
# - Extract pLDDT at residue position
# - Calculate solvent accessibility
# - Check for nearby functional sites
```


**Structural Features to Report**:
- pLDDT at variant position
- Secondary structure (helix/sheet/coil)
- Solvent accessibility (buried/exposed)
- Distance to active site (if applicable)
- Interactions disrupted (H-bonds, salt bridges)
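For the first feature in this list, AlphaFold model files in PDB format carry the per-residue pLDDT in the B-factor column (columns 61-66 of each ATOM record). The sketch below reads that value for one residue and labels it with the pLDDT ranges from the Structural Impact Confidence table. It assumes the structure is available as PDB-format text, which this document does not confirm for the tools above; `plddt_at_residue`, `interpret_plddt`, and the demo helper `make_atom_line` are hypothetical names, and assigning the boundary values 50, 70, and 90 to the higher band is an assumption.

```python
def plddt_at_residue(pdb_text, residue_number, chain_id="A"):
    """Return the B-factor (pLDDT in AlphaFold models) of a residue's CA atom."""
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA":      # atom name, columns 13-16
            continue
        if line[21] != chain_id:             # chain ID, column 22
            continue
        if int(line[22:26]) != residue_number:  # residue number, columns 23-26
            continue
        return float(line[60:66])            # B-factor, columns 61-66
    return None


def interpret_plddt(plddt):
    """Map a pLDDT value onto the confidence bands used in this document."""
    if plddt > 90:
        return "Very high confidence in position"
    if plddt >= 70:
        return "High confidence"
    if plddt >= 50:
        return "Moderate (often loops)"
    return "Low confidence (disorder)"


def make_atom_line(serial, atom, res_name, chain, res_seq, b_factor):
    """Build one fixed-column ATOM record (coordinates zeroed) for the demo."""
    return (
        f"ATOM  {serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}"
    )


if __name__ == "__main__":
    # Synthetic two-residue model, not taken from a real structure
    demo = "\n".join([
        make_atom_line(1, "N", "MET", "A", 1, 45.2),
        make_atom_line(2, "CA", "MET", "A", 1, 45.2),
        make_atom_line(3, "CA", "ARG", "A", 2, 92.5),
    ])
    value = plddt_at_residue(demo, 2)
    print(value, interpret_plddt(value))
```

For experimental PDB entries the same column holds a crystallographic B-factor, so this reading applies only to predicted models.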

---


Report File Naming

{GENE}_{VARIANT}_interpretation_report.md

Examples:
BRCA1_c.5266dupC_interpretation_report.md
TP53_p.R273H_interpretation_report.md
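A small helper can apply this naming pattern consistently. This is a sketch: `report_filename` is a hypothetical function, and replacing characters outside letters, digits, `.`, `_`, and `-` with an underscore is an assumption added for file-system safety (it affects substitutions written with `>`), not a rule stated in this document.

```python
import re


def report_filename(gene, variant):
    """Build '{GENE}_{VARIANT}_interpretation_report.md' for one variant."""
    # Assumption: characters unsafe in file names are replaced with "_"
    safe_variant = re.sub(r"[^A-Za-z0-9._-]", "_", variant)
    return f"{gene}_{safe_variant}_interpretation_report.md"


if __name__ == "__main__":
    print(report_filename("BRCA1", "c.5266dupC"))
    print(report_filename("TP53", "p.R273H"))
```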


Clinical Recommendations Framework

For Pathogenic/Likely Pathogenic

| Disease Context | Recommendations |
|-----------------|-----------------|
| Cancer predisposition | Enhanced screening, risk-reducing options |
| Pharmacogenomics | Drug dosing adjustment |
| Carrier status | Reproductive counseling |
| Predictive testing | Family cascade screening |

For VUS

| Action | Details |
|--------|---------|
| Clinical management | Do not use for medical decisions |
| Follow-up | Reinterpret in 1-2 years |
| Research | Functional studies if available |
| Family | Segregation data valuable |

For Benign/Likely Benign

| Action | Details |
|--------|---------|
| Clinical | Not expected to cause disease |
| Family | No cascade testing needed |
| Documentation | Include in report for completeness |

See Also

  • CHECKLIST.md - Pre-delivery verification
  • EXAMPLES.md - Sample interpretations
  • TOOLS_REFERENCE.md - Tool parameters and fallbacks