tooluniverse-variant-interpretation

name: tooluniverse-variant-interpretation
description: Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

Clinical Variant Interpreter

Systematic variant interpretation skill using ToolUniverse - from raw variant calls to ACMG-classified clinical recommendations with structural impact analysis.

Problem This Skill Solves

Clinical labs and researchers face critical challenges in variant interpretation:

Variant classification uncertainty - VUS (Variants of Uncertain Significance) comprise 40-60% of clinical variants
Evidence aggregation burden - Must integrate data from 10+ databases per variant
Structural context missing - Traditional annotation ignores 3D protein impact
Clinical actionability unclear - How does classification translate to patient care?

This skill provides: A systematic workflow that combines population databases, functional predictions, structural analysis (via AlphaFold2), and literature evidence into ACMG-compliant interpretations with clear clinical recommendations.

Key Principles

ACMG-Guided Classification - Follow ACMG/AMP 2015 guidelines with explicit evidence codes
Structural Evidence Integration - Use AlphaFold2 for novel structural impact analysis
Population Context - gnomAD frequencies with ancestry-specific data
Gene-Disease Validity - ClinGen curation status for clinical relevance
Actionable Output - Clear recommendations, not just classifications
English-first queries - Always use English terms in tool calls (gene names, variant descriptions, disease names), even if the user writes in another language. Try original-language terms only as a fallback. Respond in the user's language.

Triggers

Use this skill when users:

Ask about variant interpretation or classification
Have VCF data needing clinical annotation
Ask "what does this variant mean clinically?"
Need ACMG classification for variants
Want structural impact analysis for missense variants
Ask about pathogenicity of specific variants

Workflow Overview

┌─────────────────────────────────────────────────────────────────┐
│                    VARIANT INTERPRETATION                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1: VARIANT IDENTITY                                       │
│  ├── Normalize variant notation (HGVS)                          │
│  ├── Map to gene, transcript, protein                           │
│  └── Get consequence type (missense, nonsense, etc.)            │
│                                                                  │
│  Phase 2: CLINICAL DATABASES                                     │
│  ├── ClinVar: Existing classifications                          │
│  ├── gnomAD: Population frequencies (all + ancestry)            │
│  ├── OMIM: Gene-disease associations                            │
│  ├── ClinGen: Gene validity + dosage sensitivity (ENHANCED)     │
│  │   └─ ClinGen_search_gene_validity, ClinGen_search_dosage     │
│  └── SpliceAI: Splice variant prediction (NEW)                  │
│                                                                  │
│  Phase 2.5: REGULATORY CONTEXT (NEW - for non-coding variants)  │
│  ├── ChIPAtlas: TF binding at position                          │
│  ├── ENCODE: Regulatory elements (enhancers, promoters)         │
│  ├── Conservation in regulatory regions                         │
│  └── Functional annotation of regulatory impact                 │
│                                                                  │
│  Phase 3: COMPUTATIONAL PREDICTIONS                              │
│  ├── SIFT/PolyPhen: Damaging predictions                        │
│  ├── CADD: Deleteriousness score                                │
│  ├── SpliceAI: Splice impact (if applicable)                    │
│  └── Conservation: Cross-species alignment                      │
│                                                                  │
│  Phase 4: STRUCTURAL ANALYSIS (for VUS/novel missense)          │
│  ├── Get protein structure (PDB or AlphaFold2)                  │
│  ├── Map variant to structure                                   │
│  ├── Assess domain/functional site impact                       │
│  └── Predict structural destabilization                         │
│                                                                  │
│  Phase 4.5: EXPRESSION CONTEXT (NEW)                            │
│  ├── CELLxGENE: Cell-type specific expression                   │
│  ├── Tissue relevance to phenotype                              │
│  └── Expression validation                                       │
│                                                                  │
│  Phase 5: LITERATURE EVIDENCE                                    │
│  ├── PubMed: Functional studies                                 │
│  ├── BioRxiv/MedRxiv: Recent preprints (NEW)                   │
│  ├── Case reports: Phenotype correlations                       │
│  └── Segregation data (if in literature)                        │
│                                                                  │
│  Phase 6: ACMG CLASSIFICATION                                    │
│  ├── Apply evidence codes (PVS1, PM2, PP3, etc.)               │
│  ├── Calculate classification                                   │
│  ├── Identify limiting factors                                  │
│  └── Generate clinical recommendations                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Phase Details

Phase 1: Variant Identity & Normalization

Goal: Standardize variant notation and determine molecular consequence

Tools:

Tool	Purpose
`myvariant_query`	Get variant annotations from MyVariant.info
`Ensembl_get_variant_info`	Variant effect predictor data
`NCBI_gene_search`	Gene information

Key Information to Capture:

HGVS notation (c. and p.)
Gene symbol and Ensembl ID
Transcript (canonical/MANE Select)
Consequence type
Amino acid change (for missense)
Exon/intron location
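
As an illustration of the "amino acid change" and "consequence type" fields listed above, the following is a minimal sketch that splits a simple three-letter protein HGVS substitution into its parts. The function name `parse_protein_hgvs` and the returned dictionary keys are hypothetical and not part of ToolUniverse. It deliberately handles only plain substitutions and returns `None` for anything else (frameshifts, deletions, one-letter notation).

```python
import re

# Standard three-letter to one-letter amino acid codes, plus Ter for a stop codon.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Matches forms like p.Arg175His or the predicted form p.(Arg175His).
_PROTEIN_SUB = re.compile(r"^p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\)?$")


def parse_protein_hgvs(p_hgvs):
    """Parse a simple protein substitution; return None if it is not one."""
    match = _PROTEIN_SUB.match(p_hgvs.strip())
    if not match:
        return None
    ref3, pos, alt3 = match.groups()
    if ref3 not in AA3_TO_1 or alt3 not in AA3_TO_1:
        return None
    # A substitution to Ter is a nonsense variant; otherwise treat as missense.
    consequence = "nonsense" if alt3 == "Ter" else "missense"
    return {
        "ref": AA3_TO_1[ref3],
        "position": int(pos),
        "alt": AA3_TO_1[alt3],
        "short": f"{AA3_TO_1[ref3]}{pos}{AA3_TO_1[alt3]}",
        "consequence": consequence,
    }
```

The short form (for example `R175H`) is the kind of string passed as `variant_aa` to database searches later in the workflow. In practice the consequence type should come from the annotation tools above rather than from string parsing.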

Phase 2: Clinical Database Queries

Goal: Aggregate existing clinical knowledge

Tools:

Tool	Purpose	Key Data
`clinvar_search`	Existing classifications	Classification, review status, submissions
`gnomad_search`	Population frequency	AF, ancestry-specific AFs, homozygotes
`OMIM_search` , `OMIM_get_entry`	Gene-disease	Inheritance, phenotypes
`ClinGen_search_gene_validity`	Curation status	Gene-disease validity level
`COSMIC_search_mutations`	Somatic mutations (NEW)	Cancer frequency, histology
`DisGeNET_search_gene`	Gene-disease associations (NEW)	Evidence scores, sources

2.1 COSMIC for Somatic Context (NEW)

For cancer variants, check COSMIC for somatic mutation frequency:

python

from collections import Counter  # needed for the hotspot tally below


def get_somatic_context(tu, gene_symbol, variant_aa):
    """Get somatic mutation context from COSMIC."""
    
    # Search for specific mutation
    cosmic = tu.tools.COSMIC_search_mutations(
        operation="search",
        terms=f"{gene_symbol} {variant_aa}",
        max_results=20,
        genome_build=38
    )
    
    # Get all gene mutations for context
    gene_mutations = tu.tools.COSMIC_get_mutations_by_gene(
        operation="get_by_gene",
        gene=gene_symbol,
        max_results=100
    )
    
    # Determine if it's a hotspot
    mutation_counts = Counter(m['MutationAA'] for m in gene_mutations.get('results', []))
    is_hotspot = variant_aa in [m[0] for m in mutation_counts.most_common(10)]
    
    return {
        'cosmic_hits': cosmic.get('results', []),
        'is_somatic_hotspot': is_hotspot,
        'cancer_types': [m['PrimarySite'] for m in cosmic.get('results', [])],
        'total_cosmic_count': cosmic.get('total_count', 0)
    }

2.2 OMIM Gene-Disease Context (NEW)

python

def get_omim_context(tu, gene_symbol):
    """Get OMIM gene-disease associations."""
    
    # Search OMIM for gene
    search = tu.tools.OMIM_search(
        operation="search",
        query=gene_symbol,
        limit=5
    )
    
    omim_data = []
    for entry in search.get('data', {}).get('entries', []):
        mim = entry.get('mimNumber')
        
        # Get detailed entry
        details = tu.tools.OMIM_get_entry(
            operation="get_entry",
            mim_number=str(mim)
        )
        
        # Get clinical synopsis
        synopsis = tu.tools.OMIM_get_clinical_synopsis(
            operation="get_clinical_synopsis",
            mim_number=str(mim)
        )
        
        omim_data.append({
            'mim_number': mim,
            'title': details.get('data', {}).get('titles', {}),
            'inheritance': synopsis.get('data', {}).get('inheritance'),
            'clinical_features': synopsis.get('data', {})
        })
    
    return omim_data

2.3 DisGeNET Gene-Disease Evidence (NEW)

python

def get_disgenet_context(tu, gene_symbol, variant_rsid=None):
    """Get gene-disease associations from DisGeNET."""
    
    # Gene-disease associations
    gda = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=gene_symbol,
        limit=20
    )
    
    # Variant-disease associations (if rsID available)
    vda = None
    if variant_rsid:
        vda = tu.tools.DisGeNET_get_vda(
            operation="get_vda",
            variant=variant_rsid,
            limit=20
        )
    
    return {
        'gene_associations': gda.get('data', {}).get('associations', []),
        'variant_associations': vda.get('data', {}).get('associations', []) if vda else []
    }

2.4 ClinGen Gene Validity & Dosage Sensitivity (NEW)

ClinGen provides authoritative curation of gene-disease relationships:

python

def get_clingen_evidence(tu, gene_symbol):
    """
    Get ClinGen gene validity and dosage sensitivity data.
    CRITICAL for ACMG classification - establishes gene-disease validity.
    """
    
    # 1. Gene-disease validity (Definitive/Strong/Moderate/Limited)
    validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)
    
    validity_data = []
    if validity.get('data'):
        for entry in validity.get('data', []):
            validity_data.append({
                'disease': entry.get('Disease Label'),
                'classification': entry.get('Classification'),  # Definitive, Strong, etc.
                'inheritance': entry.get('Inheritance'),
                'mondo_id': entry.get('Disease ID (MONDO)')
            })
    
    # 2. Dosage sensitivity (haploinsufficiency, triplosensitivity)
    dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)
    
    dosage_data = {}
    if dosage.get('data'):
        for entry in dosage.get('data', []):
            dosage_data = {
                'haploinsufficiency_score': entry.get('Haploinsufficiency Score'),
                'triplosensitivity_score': entry.get('Triplosensitivity Score'),
                'disease': entry.get('Disease')
            }
            break  # Usually one entry per gene
    
    # 3. Clinical actionability (for incidental findings context)
    actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)
    
    return {
        'gene_validity': validity_data,
        'dosage_sensitivity': dosage_data,
        'actionability': actionability.get('data', {}),
        'has_definitive_validity': any(v['classification'] == 'Definitive' for v in validity_data),
        'is_haploinsufficient': dosage_data.get('haploinsufficiency_score') == '3'
    }

ClinGen Validity Levels (for ACMG PM1/PP4):

Classification	Meaning	ACMG Impact
Definitive	Multiple concordant studies	Strong gene-disease support
Strong	Extensive evidence	Moderate-strong support
Moderate	Some evidence	Moderate support
Limited	Minimal evidence	Weak support, use caution
Disputed	Conflicting evidence	Do not use for classification
Refuted	Evidence against	Gene NOT associated

Dosage Sensitivity Scores (for CNV interpretation):

Score	Meaning	Interpretation
3	Sufficient evidence	Haploinsufficiency/triplosensitivity established
2	Emerging evidence	Some support, not definitive
1	Little evidence	Minimal support
0	No evidence	Unknown
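
The two tables above can be encoded as small lookups. The sketch below is one possible way to triage the `gene_validity` list built by `get_clingen_evidence`. The names `VALIDITY_HANDLING`, `triage_gene_validity`, and `is_dosage_established` are hypothetical, and the three handling labels ("use", "caution", "exclude") are a simplification of the "ACMG Impact" column rather than ClinGen terminology.

```python
# How each ClinGen validity level is treated, following the validity table.
VALIDITY_HANDLING = {
    "Definitive": "use",
    "Strong": "use",
    "Moderate": "use",
    "Limited": "caution",
    "Disputed": "exclude",
    "Refuted": "exclude",
}


def triage_gene_validity(validity_data):
    """Group curated diseases by how their gene-disease validity may be used.

    validity_data: list of dicts with 'disease' and 'classification' keys.
    Unrecognized or missing classifications land in the 'unknown' bucket.
    """
    triaged = {"use": [], "caution": [], "exclude": [], "unknown": []}
    for entry in validity_data:
        handling = VALIDITY_HANDLING.get(entry.get("classification"), "unknown")
        triaged[handling].append(entry.get("disease"))
    return triaged


def is_dosage_established(score):
    """True when a dosage score equals 3, whether it arrives as int or str."""
    return str(score).strip() == "3"
```

Comparing the score via `str()` avoids a silent `False` if the tool returns the integer `3` instead of the string `'3'`; which of the two the tool actually returns is not confirmed here.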

2.5 SpliceAI Splice Variant Prediction (NEW)

An estimated ~15% of pathogenic variants affect splicing. SpliceAI is a widely used deep-learning predictor of splice impact:

python

def get_spliceai_prediction(tu, chrom, pos, ref, alt, genome="38"):
    """
    Get SpliceAI splice effect predictions.
    
    Delta scores:
    - DS_AG: Acceptor gain
    - DS_AL: Acceptor loss  
    - DS_DG: Donor gain
    - DS_DL: Donor loss
    
    Thresholds:
    - ≥0.8: High pathogenicity (strong PP3)
    - 0.5-0.8: Moderate (supporting PP3)
    - 0.2-0.5: Low (weak evidence)
    - <0.2: Likely benign
    """
    
    # Format variant for SpliceAI
    variant = f"chr{chrom}-{pos}-{ref}-{alt}"
    
    # Get full splice predictions
    result = tu.tools.SpliceAI_predict_splice(
        variant=variant,
        genome=genome
    )
    
    if result.get('data'):
        max_score = result['data'].get('max_delta_score', 0)
        interpretation = result['data'].get('interpretation', '')
        
        # Determine ACMG support
        if max_score >= 0.8:
            acmg = 'PP3 (strong) - high splice impact'
        elif max_score >= 0.5:
            acmg = 'PP3 (supporting) - moderate splice impact'
        elif max_score >= 0.2:
            acmg = 'PP3 (weak) - possible splice impact'
        else:
            acmg = 'BP7 (if synonymous) - splice benign'
        
        return {
            'max_delta_score': max_score,
            'interpretation': interpretation,
            'acmg_support': acmg,
            'scores': result['data'].get('scores', [])
        }
    return None

def quick_splice_check(tu, variant, genome="38"):
    """Quick triage using max delta score only."""
    
    result = tu.tools.SpliceAI_get_max_delta(
        variant=variant,
        genome=genome
    )
    
    return result.get('data', {})

When to Use SpliceAI:

Intronic variants near splice sites (±50bp)
Synonymous variants (may still affect splicing)
Exonic variants near splice junctions
Variants creating cryptic splice sites

Report Section for Splice Variants:

Splice Impact Analysis (SpliceAI)

Score Type	Value	Position	Interpretation
DS_AG	0.02	+15	Acceptor gain unlikely
DS_AL	0.85	-2	High acceptor loss
DS_DG	0.01	+8	Donor gain unlikely
DS_DL	0.03	+1	Donor loss unlikely

Max Delta Score: 0.85 (DS_AL)
Interpretation: High impact - likely disrupts acceptor site
ACMG Support: PP3 (strong) for splice-altering effect

Source: SpliceAI via `SpliceAI_predict_splice`
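
The max delta score in the report is the largest of the four delta scores, and its tier follows the thresholds given in the `get_spliceai_prediction` docstring. The following is a minimal sketch of that reduction; the function name `summarize_delta_scores` and the input dictionary shape are assumptions for illustration, not the format returned by the SpliceAI tool.

```python
def summarize_delta_scores(scores):
    """Reduce four SpliceAI delta scores to the max score and its tier.

    scores: dict mapping 'DS_AG', 'DS_AL', 'DS_DG', 'DS_DL' to floats in [0, 1]
    (assumed input shape).
    """
    # Pick the score type with the highest delta.
    score_type, value = max(scores.items(), key=lambda item: item[1])

    # Tiers follow the thresholds in the docstring: 0.8, 0.5, 0.2.
    if value >= 0.8:
        tier = "high"
    elif value >= 0.5:
        tier = "moderate"
    elif value >= 0.2:
        tier = "low"
    else:
        tier = "likely benign"

    return {"max_delta_score": value, "max_delta_type": score_type, "tier": tier}


# Values taken from the example report table.
example = {"DS_AG": 0.02, "DS_AL": 0.85, "DS_DG": 0.01, "DS_DL": 0.03}
summary = summarize_delta_scores(example)
```

For the example table this selects `DS_AL` at 0.85, matching the "Max Delta Score" line of the report.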


**ClinVar Classification Map**:
| ClinVar | Interpretation |
|---------|----------------|
| Pathogenic | Disease-causing |
| Likely pathogenic | 90%+ confidence pathogenic |
| VUS | Uncertain significance |
| Likely benign | 90%+ confidence benign |
| Benign | Not disease-causing |
| Conflicting | Multiple interpretations |

**gnomAD Thresholds (for rare disease)**:
| Frequency | ACMG Code | Interpretation |
|-----------|-----------|----------------|
| Absent | PM2_Supporting | Absent from controls |
| <0.00001 | PM2_Supporting | Extremely rare |
| <0.0001 | - | Rare (use with caution) |
| >0.01 | BS1/BA1 | Too common for rare disease |
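
The gnomAD thresholds can be applied mechanically. The sketch below mirrors the table row by row; the function name `gnomad_acmg_code` is hypothetical. The table does not cover frequencies from 0.0001 up to 0.01, so the sketch returns no code for that range instead of guessing one.

```python
def gnomad_acmg_code(allele_frequency):
    """Map a gnomAD allele frequency to (ACMG code or None, interpretation).

    allele_frequency: float, or None/0 when the variant is absent from gnomAD.
    """
    if allele_frequency is None or allele_frequency == 0:
        return ("PM2_Supporting", "Absent from controls")
    if allele_frequency < 0.00001:
        return ("PM2_Supporting", "Extremely rare")
    if allele_frequency < 0.0001:
        return (None, "Rare (use with caution)")
    if allele_frequency > 0.01:
        return ("BS1/BA1", "Too common for rare disease")
    # 0.0001 <= AF <= 0.01 is not covered by the table.
    return (None, "Not covered by the table; needs disease-specific review")
```

Ancestry-specific frequencies can be passed through the same function, for example by evaluating the highest population frequency rather than the global one.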

**COSMIC Somatic Evidence (NEW)**:
| COSMIC Finding | Interpretation | ACMG Support |
|----------------|----------------|--------------|
| Recurrent hotspot (>100 samples) | Known oncogenic driver | PS3 (functional) |
| Moderate frequency (10-100) | Likely oncogenic | PM1 (hotspot) |
| Rare somatic (<10) | Unknown significance | No support |

**DisGeNET Score Interpretation (NEW)**:
| GDA Score | Evidence Level | ACMG Support |
|-----------|----------------|--------------|
| >0.7 | Strong | PP4 (phenotype) |
| 0.4-0.7 | Moderate | Supporting |
| <0.4 | Weak | Insufficient |

Phase 2.5: Regulatory Context (NEW - for Non-Coding Variants)

Goal: Assess regulatory impact for non-coding, intronic, and promoter variants

When to Apply:

Intronic variants (not splice site)
Promoter variants
5'UTR / 3'UTR variants
Intergenic variants near disease genes

Tools:

Tool	Purpose	Key Data
`ChIPAtlas_enrichment_analysis`	TF binding at position	Bound TFs, cell types
`ChIPAtlas_get_peak_data`	ChIP-seq peaks	Peak coordinates, scores
`ENCODE_search_experiments`	Regulatory elements	Enhancers, promoters, DHS
`ENCODE_get_experiment`	Experiment details	Assay type, targets

Regulatory Impact Assessment:

python

def assess_regulatory_impact(tu, variant_position, gene_symbol):
    """Assess regulatory impact of non-coding variant."""
    
    # Check TF binding at position
    tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
        gene=gene_symbol,
        cell_type="all"
    )
    
    # Get ChIP-seq peaks overlapping variant
    peaks = tu.tools.ChIPAtlas_get_peak_data(
        gene=gene_symbol,
        experiment_type="TF"
    )
    
    # Search ENCODE for regulatory annotations
    encode_data = tu.tools.ENCODE_search_experiments(
        assay_title="ATAC-seq",
        biosample="all"
    )
    
    # Assess if variant disrupts TF binding.
    # NOTE: check_motif_disruption is not defined in this section; supply your own helper.
    binding_disrupted = check_motif_disruption(variant_position, peaks)
    
    return {
        'tf_binding': tf_binding,
        'regulatory_peaks': peaks,
        'encode_annotations': encode_data,
        'likely_regulatory': binding_disrupted
    }

Regulatory Impact Categories:

Category	Criteria	ACMG Support
High impact	Disrupts known TF binding motif	PP3 (supporting)
Moderate impact	In active regulatory region	Consider context
Low impact	No regulatory annotation	No support
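
The categories above depend on whether the variant falls inside an annotated regulatory interval. As a first-pass stand-in for the undefined `check_motif_disruption` helper, the sketch below only tests positional overlap with peaks and then assigns a category. The names `overlaps_any_peak` and `regulatory_category` are hypothetical, and the peak format (dicts with `chrom`, `start`, `end`, 1-based inclusive) is an assumption, not the documented ChIPAtlas output. Overlap alone does not show motif disruption, so that is passed in as a separate flag.

```python
def overlaps_any_peak(chrom, position, peaks):
    """Return the peaks whose interval contains the variant position.

    peaks: iterable of dicts with 'chrom', 'start', 'end' keys
    (assumed format, 1-based inclusive coordinates).
    """
    return [
        peak for peak in peaks
        if peak["chrom"] == chrom and peak["start"] <= position <= peak["end"]
    ]


def regulatory_category(hits, motif_disrupted=False):
    """Assign an impact category from the table above.

    hits: overlapping peaks from overlaps_any_peak.
    motif_disrupted: result of a separate motif analysis, if one was run.
    """
    if hits and motif_disrupted:
        return "High impact"
    if hits:
        return "Moderate impact"
    return "Low impact"
```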

Output for Report:

2.5 Regulatory Context (for Non-Coding Variants)

Feature	Finding	Significance
Variant location	Intron 5, 120bp from exon 6	Not canonical splice
TF binding site	CTCF binding peak (ChIPAtlas)	May affect insulation
ENCODE annotation	Active enhancer (H3K27ac)	Regulatory function
Conservation	PhyloP = 2.8	Moderate conservation

Regulatory Interpretation: Variant overlaps CTCF binding site in active enhancer region. Potential impact on gene regulation.

Source: ChIPAtlas, ENCODE

Phase 3: Computational Predictions (ENHANCED)

Goal: Assess in silico pathogenicity predictions using state-of-the-art models

Tools:

Tool	Purpose	Score Range
`CADD_get_variant_score`	Deleteriousness score (NEW API)	PHRED 0-99
`AlphaMissense_get_variant_score`	DeepMind pathogenicity (NEW)	0-1
`EVE_get_variant_score`	Evolutionary pathogenicity (NEW)	0-1
`myvariant_query`	Aggregated predictions	SIFT, PolyPhen
`Ensembl_get_variant_info`	VEP predictions	SIFT, PolyPhen

3.1 CADD Deleteriousness Scoring (NEW)

python

def get_cadd_score(tu, chrom, pos, ref, alt):
    """Get CADD deleteriousness score for a variant."""
    
    result = tu.tools.CADD_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt,
        version="GRCh38-v1.7"
    )
    
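    if result.get('status') == 'success':
        data = result.get('data') or {}
        phred = data.get('phred_score')
        if phred is None:
            return None  # No score returned; avoid comparing None to a number
        return {
            'score': phred,
            'interpretation': data.get('interpretation'),
            # Heuristic cut-offs used in this workflow: >=20 damaging, <15 benign
            'acmg_support': 'PP3' if phred >= 20 else ('BP4' if phred < 15 else 'neutral')
        }
    return None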

3.2 AlphaMissense Pathogenicity (NEW)

DeepMind's AlphaMissense provides state-of-the-art missense pathogenicity prediction:

python

def get_alphamissense_score(tu, uniprot_id, variant):
    """
    Get AlphaMissense pathogenicity score.
    variant format: 'R123H' or 'p.R123H'
    
    Thresholds:
    - Pathogenic: score > 0.564
    - Ambiguous: 0.34-0.564
    - Benign: score < 0.34
    """
    
    result = tu.tools.AlphaMissense_get_variant_score(
        uniprot_id=uniprot_id,
        variant=variant
    )
    
    if result.get('status') == 'success' and result.get('data'):
        score = result['data'].get('pathogenicity_score')
        classification = result['data'].get('classification')
        label = (classification or '').lower()
        
        # Map to ACMG. PP3/BP4 are supporting-level criteria by default, so a
        # single predictor is not upgraded to "strong" here.
        # Substring matching also handles labels such as 'likely_pathogenic'.
        if 'benign' in label:
            acmg = 'BP4'
        elif 'pathogenic' in label:
            acmg = 'PP3'
        else:
            acmg = 'neutral'
        
        return {
            'score': score,
            'classification': classification,
            'acmg_support': acmg
        }
    return None

3.3 EVE Evolutionary Prediction (NEW)

EVE (Evolutionary model of Variant Effect) is an unsupervised model trained on multiple sequence alignments across species; it uses no clinical labels:

python

def get_eve_score(tu, chrom, pos, ref, alt):
    """
    Get EVE evolutionary pathogenicity score.
    
    Threshold: >0.5 indicates likely pathogenic
    """
    
    result = tu.tools.EVE_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt
    )
    
    if result.get('status') == 'success':
        eve_scores = result['data'].get('eve_scores', [])
        if eve_scores:
            best_score = eve_scores[0]  # First entry returned by the tool
            score = best_score.get('eve_score')
            if score is None:
                acmg = 'neutral'  # Missing score must not default to BP4
            else:
                acmg = 'PP3' if score > 0.5 else 'BP4'
            return {
                'score': score,
                'classification': best_score.get('classification'),
                'gene': best_score.get('gene_symbol'),
                'acmg_support': acmg
            }
    return None

3.4 Integrated Prediction Strategy

For VUS (Variants of Uncertain Significance), combine multiple predictors:

python

def comprehensive_pathogenicity_assessment(tu, variant_info):
    """
    Combine all prediction tools for robust classification.
    """
    chrom = variant_info['chrom']
    pos = variant_info['pos']
    ref = variant_info['ref']
    alt = variant_info['alt']
    uniprot_id = variant_info.get('uniprot_id')
    aa_change = variant_info.get('aa_change')  # e.g., 'R123H'
    
    predictions = {}
    
    # 1. CADD (works for all variant types)
    cadd = get_cadd_score(tu, chrom, pos, ref, alt)
    if cadd:
        predictions['cadd'] = cadd
    
    # 2. AlphaMissense (missense only, requires UniProt ID)
    if uniprot_id and aa_change:
        am = get_alphamissense_score(tu, uniprot_id, aa_change)
        if am:
            predictions['alphamissense'] = am
    
    # 3. EVE (missense only)
    eve = get_eve_score(tu, chrom, pos, ref, alt)
    if eve:
        predictions['eve'] = eve
    
    # Consensus assessment
    damaging_count = sum(1 for p in predictions.values() 
                         if 'PP3' in p.get('acmg_support', ''))
    benign_count = sum(1 for p in predictions.values() 
                       if 'BP4' in p.get('acmg_support', ''))
    
    if damaging_count >= 2 and benign_count == 0:
        consensus = 'likely_damaging'
        acmg = 'PP3 (multiple predictors concordant)'
    elif benign_count >= 2 and damaging_count == 0:
        consensus = 'likely_benign'
        acmg = 'BP4 (multiple predictors concordant)'
    else:
        # Reached for discordant calls and when fewer than two predictors agree
        consensus = 'uncertain'
        acmg = 'neutral (discordant or insufficient predictions)'
    
    return {
        'predictions': predictions,
        'consensus': consensus,
        'acmg_recommendation': acmg
    }

Prediction Interpretation (Updated):

Predictor	Damaging	Benign
AlphaMissense	>0.564	<0.34
CADD PHRED	≥20 (top 1%)	<15
EVE	>0.5	≤0.5
SIFT	<0.05	≥0.05
PolyPhen2	>0.85 (probably)	<0.15 (benign)

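ACMG Application (Enhanced):

PP3: Multiple concordant damaging predictions; agreement among AlphaMissense, CADD, and EVE gives the strongest case. PP3 is applied once per variant, not once per predictor.
BP4: Multiple concordant benign predictions, likewise applied once per variant
Note: AlphaMissense's published class cut-offs were chosen to give about 90% precision on ClinVar variants. Treat its call as computational evidence, not as a stand-alone classification.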
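
The cut-offs in the Prediction Interpretation table can be applied uniformly to raw scores. A minimal sketch, assuming scores are already numeric; `classify_prediction` is a hypothetical helper, and the "ambiguous" label for scores between the two columns is an assumption where the table gives no value:

```python
def classify_prediction(predictor: str, score: float) -> str:
    """Return 'damaging', 'benign', or 'ambiguous' using the table's cut-offs."""
    if predictor == "AlphaMissense":
        return "damaging" if score > 0.564 else "benign" if score < 0.34 else "ambiguous"
    if predictor == "CADD":  # PHRED-scaled score
        return "damaging" if score >= 20 else "benign" if score < 15 else "ambiguous"
    if predictor == "EVE":
        return "damaging" if score > 0.5 else "benign"
    if predictor == "SIFT":  # lower SIFT scores are more damaging
        return "damaging" if score < 0.05 else "benign"
    if predictor == "PolyPhen2":
        return "damaging" if score > 0.85 else "benign" if score < 0.15 else "ambiguous"
    raise ValueError(f"Unknown predictor: {predictor}")
```

Note that SIFT runs in the opposite direction to the other scores, which is an easy source of sign errors when thresholds are hand-coded.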

Phase 4: Structural Analysis

Goal: Assess protein structural impact (especially for VUS)

Tools:

Tool	Purpose
`PDB_search_by_uniprot`	Find experimental structures
`NvidiaNIM_alphafold2`	Predict structure if no PDB
`alphafold_get_prediction`	Get AlphaFold DB structure
`InterPro_get_protein_domains`	Domain annotations
`UniProt_get_protein_function`	Functional sites

Structural Impact Categories:

Impact Level	Description	ACMG Support
Critical	Active site, catalytic residue	PM1 (strong)
High	Buried residue, disulfide, structural core	PM1 (moderate)
Moderate	Domain interface, binding site	PM1 (supporting)
Low	Surface, flexible region	No support

Using AlphaFold2 for VUS:

1. Get wildtype structure (PDB or AlphaFold)
2. Identify residue location:
   - pLDDT at position (confidence)
   - Solvent accessibility
   - Secondary structure
3. Assess structural context:
   - Distance to functional sites
   - Interaction partners
   - Conservation in structure
4. Predict impact:
   - Side chain burial
   - Hydrogen bond disruption
   - Charge changes in buried positions

Phase 4.5: Expression Context (NEW)

Goal: Validate gene expression in disease-relevant tissues/cells

Tools:

Tool	Purpose	Key Data
`CELLxGENE_get_expression_data`	Cell-type specific expression	TPM per cell type
`CELLxGENE_get_cell_metadata`	Cell type annotations	Tissue, disease state
`GTEx_get_median_gene_expression`	Tissue expression	TPM per tissue

Expression Validation:

python

def validate_expression_context(tu, gene_symbol, phenotype_tissues):
    """Validate gene is expressed in phenotype-relevant tissues."""
    
    # Single-cell expression
    sc_expression = tu.tools.CELLxGENE_get_expression_data(
        gene=gene_symbol,
        tissue=phenotype_tissues[0] if phenotype_tissues else "all"
    )
    
    # Bulk tissue expression (GTEx)
    gtex = tu.tools.GTEx_get_median_gene_expression(
        gene=gene_symbol
    )
    
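    # Check expression in relevant tissues.
    # Assumes `gtex` behaves like a {tissue_name: median_TPM} mapping; adapt
    # this lookup if the tool returns a different structure.
    relevant_expression = {
        tissue: gtex.get(tissue) or 0
        for tissue in phenotype_tissues
    }
    
    return {
        'single_cell': sc_expression,
        'gtex': relevant_expression,
        # TPM > 1 is the minimal "expressed" cut-off used here
        'expressed_in_phenotype_tissue': any(v > 1 for v in relevant_expression.values())
    }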

Why it matters:

Confirms gene is expressed where disease manifests
Supports PP4 (phenotype-specific) if highly restricted expression
Can challenge classification if not expressed in affected tissue

Output for Report:

markdown

4.5 Expression Context

Tissue	Expression (TPM)	Relevance
Heart	45.2	✓ Primary disease tissue
Skeletal muscle	38.7	✓ Secondary involvement
Liver	2.1	Low expression
Brain	0.5	Not expressed

Single-Cell Analysis (CELLxGENE):

Cardiomyocytes: High expression (TPM=85)
Cardiac fibroblasts: Low expression (TPM=5)

Interpretation: Gene highly expressed in cardiomyocytes, supporting cardiac phenotype association.

Source: GTEx, CELLxGENE Census
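
The TPM > 1 check used in `validate_expression_context` can be applied to a whole table of tissues. A minimal sketch using the illustrative values from the example report above; `label_expression` and its `expressed_cutoff` parameter are hypothetical, and the finer "low expression" wording in the report remains a judgement call:

```python
def label_expression(tpm_by_tissue: dict, expressed_cutoff: float = 1.0) -> dict:
    """Label each tissue as expressed or not using a TPM cut-off."""
    return {
        tissue: "expressed" if tpm > expressed_cutoff else "not expressed"
        for tissue, tpm in tpm_by_tissue.items()
    }

# Illustrative values copied from the example report table
example = {"Heart": 45.2, "Skeletal muscle": 38.7, "Liver": 2.1, "Brain": 0.5}
labels = label_expression(example)
```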

Phase 5: Literature Evidence (ENHANCED)

Goal: Find functional studies, case reports, and cutting-edge preprints

Tools:

Tool	Purpose	Coverage
`PubMed_search`	Peer-reviewed studies	Comprehensive
`EuropePMC_search`	Additional literature	Europe PMC
`BioRxiv_search_preprints`	Biology preprints	Recent findings
`MedRxiv_search_preprints`	Clinical preprints	Clinical studies
`openalex_search_works`	Citation analysis	Impact metrics
`SemanticScholar_search_papers`	AI-ranked search	Relevance

Search Strategies:

python

def comprehensive_literature_search(tu, gene, variant, phenotype):
    """Search across all literature sources."""
    
    # 1. PubMed: Peer-reviewed
    pubmed = tu.tools.PubMed_search(
        query=f'"{gene}" AND ("{variant}" OR functional)',
        max_results=30
    )
    
    # 2. BioRxiv: Recent preprints
    biorxiv = tu.tools.BioRxiv_search_preprints(
        query=f"{gene} {phenotype}",
        limit=10
    )
    
    # 3. MedRxiv: Clinical preprints
    medrxiv = tu.tools.MedRxiv_search_preprints(
        query=f"{gene} variant {phenotype}",
        limit=10
    )
    
    # 4. Citation analysis
    # Assumes PubMed_search returns a list of dicts that each carry a 'title'
    key_papers = pubmed[:5]  # Top papers
    for paper in key_papers:
        citations = tu.tools.openalex_search_works(
            query=paper['title'],
            limit=1
        )
        paper['citation_count'] = citations[0].get('cited_by_count', 0) if citations else 0
    
    return {
        'pubmed': pubmed,
        'preprints': biorxiv + medrxiv,
        'key_papers_with_citations': key_papers
    }

Search Queries:

Gene + variant specific:
"{GENE} AND ({HGVS_p} OR {AA_change})"

Functional studies:
"{GENE} AND (functional OR functional study OR mutagenesis)"

Clinical reports:
"{GENE} AND (case report OR patient) AND {phenotype}"

Preprint-specific (for recent preprints):
"{GENE} genetics 2024"

⚠️ Preprint Warning: Always flag preprints as NOT peer-reviewed in reports.

Evidence Types:

Evidence	ACMG Code	Weight
Functional study (null)	PS3	Strong
Functional study (reduced)	PS3_Moderate	Moderate
Case reports with segregation	PP1	Supporting to Moderate
Co-occurrence with pathogenic	BP2	Supporting (benign)
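
The query templates above can be filled in programmatically so that every variant is searched the same way. A minimal sketch; `build_literature_queries` is a hypothetical helper that only formats strings and does not call any search tool:

```python
def build_literature_queries(gene: str, hgvs_p: str, aa_change: str,
                             phenotype: str) -> dict:
    """Fill the search-query templates for one variant."""
    return {
        "variant_specific": f"{gene} AND ({hgvs_p} OR {aa_change})",
        "functional": f"{gene} AND (functional OR functional study OR mutagenesis)",
        "clinical": f"{gene} AND (case report OR patient) AND {phenotype}",
    }

queries = build_literature_queries("TP53", "p.Arg273His", "R273H",
                                   "Li-Fraumeni syndrome")
```

Running at least two of these queries satisfies the "≥2 search strategies" minimum stated later in this document.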

Phase 6: ACMG Classification

Goal: Systematic classification with explicit evidence

ACMG Evidence Codes:

Pathogenic:

Code	Strength	Description
PVS1	Very Strong	Null variant in gene where LOF is mechanism
PS1	Strong	Same amino acid change as known pathogenic
PS3	Strong	Well-established functional studies
PM1	Moderate	Mutational hot spot / functional domain
PM2	Moderate	Absent from controls
PM5	Moderate	Different missense at same residue as pathogenic
PP3	Supporting	Multiple computational predictions
PP5	Supporting	Reputable source reports pathogenic

Benign:

Code	Strength	Description
BA1	Stand-alone	MAF >5%
BS1	Strong	MAF greater than expected
BS3	Strong	Functional studies show no effect
BP4	Supporting	Multiple computational predictions benign
BP7	Supporting	Synonymous with no splice impact

Classification Algorithm:

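Classification	Evidence Required
Pathogenic	1 Very Strong + (≥1 Strong, OR ≥2 Moderate, OR 1 Moderate + 1 Supporting, OR ≥2 Supporting); OR ≥2 Strong; OR 1 Strong + (≥3 Moderate, OR 2 Moderate + ≥2 Supporting, OR 1 Moderate + ≥4 Supporting)
Likely Pathogenic	1 Very Strong + 1 Moderate; OR 1 Strong + 1-2 Moderate; OR 1 Strong + ≥2 Supporting; OR ≥3 Moderate; OR 2 Moderate + ≥2 Supporting; OR 1 Moderate + ≥4 Supporting
Likely Benign	1 Strong + 1 Supporting; OR ≥2 Supporting
Benign	1 Stand-alone; OR ≥2 Strong
VUS	Criteria not met, or pathogenic and benign criteria conflict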
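
The combining rules can be encoded directly from counts of met criteria. A minimal sketch following the ACMG/AMP 2015 combinations; `classify_acmg` and its parameter names are hypothetical, and it assumes any strength adjustments (for example PS3_Moderate counted as a moderate criterion) were made before counting:

```python
def classify_acmg(pvs=0, ps=0, pm=0, pp=0, ba=0, bs=0, bp=0) -> str:
    """Combine counts of met criteria into a five-tier classification."""
    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp >= 1) or bp >= 2

    # Conflicting pathogenic and benign evidence defaults to VUS
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "VUS"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely Pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely Benign"
    return "VUS"
```

Pathogenic is tested before Likely Pathogenic because several combinations satisfy both rule sets, and the stronger classification should win.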

Output Structure

Report Sections

markdown

Variant Interpretation Report: {GENE} {VARIANT}

Executive Summary

Variant: {HGVS notation}
Gene: {gene symbol}
Classification: {Pathogenic/Likely Pathogenic/VUS/Likely Benign/Benign}
Evidence Strength: {strong/moderate/limited}
Key Finding: {one-sentence summary}

1. Variant Identity

{gene, transcript, protein change, consequence}

2. Population Data

{gnomAD frequencies, ancestry breakdown}

3. Clinical Database Evidence

{ClinVar, ClinGen, OMIM}

4. Computational Predictions

{SIFT, PolyPhen, CADD scores}

5. Structural Analysis

{Domain location, functional site proximity, AlphaFold confidence}

6. Literature Evidence

{Functional studies, case reports}

7. ACMG Classification

{Evidence codes applied, classification rationale}

8. Clinical Recommendations

{Testing, management, family screening}

9. Limitations & Uncertainties

{Missing data, conflicting evidence}

Data Sources

{All tools and databases queried}

---

Evidence Grading

Classification Confidence

Symbol	Classification	Evidence Level
★★★	High confidence	Multiple independent lines
★★☆	Moderate confidence	Some supporting evidence
★☆☆	Limited confidence	Minimal evidence
VUS	Uncertain	Insufficient data

Structural Impact Confidence

pLDDT Range	Interpretation
>90	Very high confidence in position
70-90	High confidence
50-70	Moderate (often loops)
<50	Low confidence (disorder)
Special Scenarios

Scenario 1: Novel Missense VUS

Additional workflow:

1. Check whether other pathogenic variants are reported at the same residue (supports PM5)
2. Get AlphaFold2 structure
3. Analyze:
   - Is residue buried or surface?
   - What secondary structure?
   - Proximity to active/binding sites?
   - Conservation across species?
4. Apply PM1 if in functional domain
5. Apply PP3 if predictions concordant
Scenario 2: Truncating Variant

Additional workflow:

1. Check if LOF is the disease mechanism for the gene
2. Determine whether the variant escapes NMD (last exon, or roughly the last 50 nt of the penultimate exon)
3. Check for alternative isoforms
4. Review ClinGen LOF curation

PVS1 Application:

Scenario	PVS1 Strength
Canonical LOF gene, NMD predicted	Very Strong
LOF gene, last exon	Moderate
Non-LOF gene	Not applicable
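
The NMD check and the PVS1 table can be combined in a short helper. A minimal sketch, assuming the commonly used ~50-nucleotide rule and positions expressed in coding-sequence coordinates; both function names are hypothetical, and the simplification ignores UTR exons and the finer branches of gene-specific PVS1 guidance:

```python
def escapes_nmd(ptc_cds_pos: int, exon_cds_ends: list[int], rule_nt: int = 50) -> bool:
    """
    Predict NMD escape for a premature termination codon (PTC).

    ptc_cds_pos:   1-based coding position of the premature stop
    exon_cds_ends: 1-based coding position of the last base of each
                   coding exon, in transcript order
    """
    if len(exon_cds_ends) < 2:
        return True  # single coding exon: no downstream exon junction
    last_junction = exon_cds_ends[-2]  # end of the penultimate exon
    # Escape if the stop is in the last exon or within `rule_nt`
    # nucleotides upstream of the final exon-exon junction
    return ptc_cds_pos > last_junction - rule_nt


def pvs1_strength(lof_is_mechanism: bool, nmd_escape: bool) -> str:
    """Map the two findings to the PVS1 strengths in the table above."""
    if not lof_is_mechanism:
        return "Not applicable"
    return "Moderate" if nmd_escape else "Very Strong"
```

For a transcript whose coding exons end at positions 300, 600, and 900, a stop at position 400 is predicted to trigger NMD, while a stop at 570 or 700 is predicted to escape.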

Scenario 3: Splice Variant

Additional workflow:

1. Check SpliceAI scores (if available)
2. Determine canonical splice site distance
3. Review for in-frame skipping potential
4. Check for cryptic splice activation

Quantified Minimums

Section	Requirement
Population frequency	gnomAD overall + ≥3 ancestry groups
Predictions	≥3 computational predictors
Literature search	≥2 search strategies
ACMG codes	All applicable codes listed
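
The numeric minimums can be checked before a report is delivered. A minimal sketch; `check_minimums` and its parameters are hypothetical, and the "all applicable ACMG codes listed" requirement is left to manual review because it cannot be counted:

```python
def check_minimums(has_gnomad_overall: bool, n_ancestry_groups: int,
                   n_predictors: int, n_search_strategies: int) -> list[str]:
    """Return a list of unmet requirements (empty when all minimums are met)."""
    problems = []
    if not has_gnomad_overall or n_ancestry_groups < 3:
        problems.append("Population frequency: need gnomAD overall + >=3 ancestry groups")
    if n_predictors < 3:
        problems.append("Predictions: need >=3 computational predictors")
    if n_search_strategies < 2:
        problems.append("Literature search: need >=2 search strategies")
    return problems
```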

NVIDIA NIM Integration

When to Use AlphaFold2 for Variants

Use Case: VUS missense variants where structural context aids interpretation

Workflow:

python

# 1. Get protein sequence
protein_seq = tu.tools.UniProt_get_protein_sequence(accession=uniprot_id)

# 2. Get/predict structure
try:
    pdb_hits = tu.tools.PDB_search_by_uniprot(uniprot_id=uniprot_id)
    structure = tu.tools.PDB_get_structure(pdb_id=pdb_hits[0]['pdb_id'])
except Exception:
    # No usable experimental structure: predict with AlphaFold2
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=protein_seq['sequence'],
        algorithm="mmseqs2"
    )

# 3. Analyze variant position
# - Extract pLDDT at residue position
# - Calculate solvent accessibility
# - Check for nearby functional sites

Structural Features to Report:

- pLDDT at variant position
- Secondary structure (helix/sheet/coil)
- Solvent accessibility (buried/exposed)
- Distance to active site (if applicable)
- Interactions disrupted (H-bonds, salt bridges)

---
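
Extracting the pLDDT at the variant position can be done with plain text parsing. A minimal sketch, assuming the predicted structure is available as PDB-format text with pLDDT stored in the B-factor column (as in AlphaFold model files); how to obtain that text from the `NvidiaNIM_alphafold2` result is not shown and depends on the tool's return format. Both helper names are hypothetical:

```python
def plddt_at_residue(pdb_text: str, residue_number: int, chain: str = "A"):
    """Return the pLDDT of one residue, read from the B-factor of its CA atom."""
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()   # PDB columns 13-16
        chain_id = line[21]               # PDB column 22
        res_seq = int(line[22:26])        # PDB columns 23-26
        if atom_name == "CA" and chain_id == chain and res_seq == residue_number:
            return float(line[60:66])     # PDB columns 61-66 (B-factor)
    return None  # residue not found


def plddt_band(plddt: float) -> str:
    """Interpret a pLDDT value using the Structural Impact Confidence table."""
    if plddt > 90:
        return "Very high confidence in position"
    if plddt >= 70:
        return "High confidence"
    if plddt >= 50:
        return "Moderate (often loops)"
    return "Low confidence (disorder)"
```

Reading only the CA atom gives one value per residue; AlphaFold writes the same pLDDT to every atom of a residue, so no averaging is needed.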

Report File Naming

{GENE}_{VARIANT}_interpretation_report.md

Examples:
BRCA1_c.5266dupC_interpretation_report.md
TP53_p.R273H_interpretation_report.md
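
The naming pattern can be generated from the gene and variant strings. A minimal sketch; `report_filename` is a hypothetical helper, and replacing unsafe characters with underscores is an assumption added here because HGVS strings such as `c.123A>G` contain characters that are not valid in file names on every platform:

```python
import re

def report_filename(gene: str, variant: str) -> str:
    """Build '{GENE}_{VARIANT}_interpretation_report.md'."""
    # Keep letters, digits, '.', '_', '+', '-'; replace anything else with '_'
    safe_variant = re.sub(r"[^A-Za-z0-9._+\-]", "_", variant)
    return f"{gene}_{safe_variant}_interpretation_report.md"
```

The two documented examples pass through unchanged because they contain only safe characters.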

Clinical Recommendations Framework

For Pathogenic/Likely Pathogenic

Disease Context	Recommendations
Cancer predisposition	Enhanced screening, risk-reducing options
Pharmacogenomics	Drug dosing adjustment
Carrier status	Reproductive counseling
Predictive testing	Family cascade screening

For VUS

Action	Details
Clinical management	Do not use for medical decisions
Follow-up	Reinterpret in 1-2 years
Research	Functional studies if available
Family	Segregation data valuable

For Benign/Likely Benign

Action	Details
Clinical	Not expected to cause disease
Family	No cascade testing needed
Documentation	Include in report for completeness

Reference Documents

`CHECKLIST.md` - Pre-delivery verification
`EXAMPLES.md` - Sample interpretations
`TOOLS_REFERENCE.md` - Tool parameters and fallbacks