tooluniverse-gwas-finemapping

GWAS Fine-Mapping & Causal Variant Prioritization

Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.

Overview

Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. Fine-mapping uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.
This skill provides tools to:
  • Prioritize causal variants using fine-mapping posterior probabilities
  • Link variants to genes using locus-to-gene (L2G) predictions
  • Annotate variants with functional consequences
  • Suggest validation strategies based on fine-mapping results

Key Concepts

Credible Sets

A credible set is a minimal set of variants that contains the causal variant with high confidence (typically 95% or 99%). Each variant in the set has a posterior probability of being causal, computed using methods like:
  • SuSiE (Sum of Single Effects)
  • FINEMAP (Bayesian fine-mapping)
  • PAINTOR (Probabilistic Annotation INtegraTOR)
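To make the definition concrete, here is a minimal sketch of how a credible set can be assembled once per-variant posterior probabilities are available. It assumes a single causal variant at the locus and uses made-up variant IDs and probabilities; tools such as SuSiE build one credible set per modeled effect, so this is an illustration of the idea rather than any tool's actual procedure.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest set of variants whose posteriors sum to >= coverage.

    posteriors: dict mapping variant ID -> posterior probability of causality.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant_id, probability in ranked:
        selected.append(variant_id)
        cumulative += probability
        if cumulative >= coverage:
            break
    return selected, cumulative


# Hypothetical posterior probabilities for five variants at one locus
posteriors = {"rsA": 0.62, "rsB": 0.25, "rsC": 0.09, "rsD": 0.03, "rsE": 0.01}
variants, covered = credible_set(posteriors)
print(variants, round(covered, 2))
```

Sorting by descending probability before accumulating is what keeps the set minimal: any other order would need at least as many variants to reach the same coverage.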

Posterior Probability

The probability that a specific variant is causal, given the GWAS data and LD structure. Higher posterior probability = more likely to be causal.
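As a rough illustration of where such probabilities come from, the sketch below turns per-variant effect sizes and standard errors into posteriors using Wakefield-style approximate Bayes factors. It assumes exactly one causal variant per locus, equal prior probability for every variant, and a prior effect standard deviation of 0.2; the input values are invented. SuSiE and FINEMAP additionally model LD and multiple causal variants, so this is not what those tools compute.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor in favour of association."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the true effect (an assumption)
    z = beta / se
    shrink = w / (v + w)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink


def posterior_probabilities(stats, prior_sd=0.2):
    """stats: dict of variant ID -> (beta, standard error)."""
    log_bfs = {vid: log_abf(b, s, prior_sd) for vid, (b, s) in stats.items()}
    # Subtract the maximum before exponentiating to avoid overflow
    max_log = max(log_bfs.values())
    weights = {vid: math.exp(lb - max_log) for vid, lb in log_bfs.items()}
    total = sum(weights.values())
    return {vid: wgt / total for vid, wgt in weights.items()}


# Hypothetical summary statistics for three variants in one locus
stats = {"rsA": (0.12, 0.015), "rsB": (0.11, 0.015), "rsC": (0.05, 0.015)}
pp = posterior_probabilities(stats)
for vid, p in sorted(pp.items(), key=lambda item: -item[1]):
    print(f"{vid}: {p:.3f}")
```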

Locus-to-Gene (L2G) Predictions

L2G scores integrate multiple data types to predict which gene is affected by a variant:
  • Distance to gene (closer = higher score)
  • eQTL evidence (expression changes)
  • Chromatin interactions (Hi-C, promoter capture)
  • Functional annotations (coding variants, regulatory regions)
L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.

Use Cases

1. Prioritize Variants at a Known Locus

Question: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"

```python
from python_implementation import prioritize_causal_variants

# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())

# Output shows:
# - Credible sets containing TCF7L2 variants
# - Posterior probabilities (via fine-mapping methods)
# - Top L2G genes (which genes are likely affected)
# - Associated traits
```

2. Fine-Map a Specific Variant

Question: "What do we know about rs429358 (APOE4) from fine-mapping?"

```python
from python_implementation import prioritize_causal_variants

# Fine-map a specific variant
result = prioritize_causal_variants("rs429358")

# Check which credible sets contain this variant
for cs in result.credible_sets:
    print(f"Trait: {cs.trait}")
    print(f"Fine-mapping method: {cs.finemapping_method}")
    print(f"Top gene: {cs.l2g_genes[0] if cs.l2g_genes else 'N/A'}")
    print(f"Confidence: {cs.confidence}")
```

3. Explore All Loci from a GWAS Study

Question: "What are all the causal loci from the recent T2D meta-analysis?"

```python
from python_implementation import get_credible_sets_for_study

# Get all fine-mapped loci from a study
credible_sets = get_credible_sets_for_study("GCST90029024")  # T2D GWAS
print(f"Found {len(credible_sets)} independent loci")

# Examine each locus
for cs in credible_sets:
    print(f"\nRegion: {cs.region}")
    # Guard against a missing lead variant or an empty rsID list
    lead = cs.lead_variant
    lead_id = lead.rs_ids[0] if lead and lead.rs_ids else 'N/A'
    print(f"Lead variant: {lead_id}")

    if cs.l2g_genes:
        top_gene = cs.l2g_genes[0]
        print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")
```

4. Find GWAS Studies for a Disease

Question: "What GWAS studies exist for Alzheimer's disease?"

```python
from python_implementation import search_gwas_studies_for_disease

# Search by disease name
studies = search_gwas_studies_for_disease("Alzheimer's disease")
for study in studies[:5]:
    print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
    print(f"  Author: {study.get('publicationFirstAuthor', 'N/A')}")
    print(f"  Has summary stats: {study.get('hasSumstats', False)}")

# Or use precise disease ontology IDs
studies = search_gwas_studies_for_disease(
    "Alzheimer's disease",
    disease_id="EFO_0000249"  # EFO ID for Alzheimer's
)
```

5. Get Validation Suggestions

Question: "How should we validate the top causal variant?"

```python
from python_implementation import prioritize_causal_variants

result = prioritize_causal_variants("APOE", "alzheimer")

# Get experimental validation suggestions
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)

# Output includes:
# - CRISPR knock-in experiments
# - Reporter assays
# - eQTL analysis
# - Colocalization studies
```

Workflow Example: Complete Fine-Mapping Analysis

```python
from python_implementation import (
    prioritize_causal_variants,
    search_gwas_studies_for_disease,
    get_credible_sets_for_study
)

# Step 1: Find relevant GWAS studies
print("Step 1: Finding T2D GWAS studies...")
studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
largest_study = max(studies, key=lambda s: s.get('nSamples', 0) or 0)
print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")

# Step 2: Get all fine-mapped loci from the study
print("\nStep 2: Getting fine-mapped loci...")
credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
print(f"Found {len(credible_sets)} credible sets")

# Step 3: Find loci near genes of interest
print("\nStep 3: Finding TCF7L2 loci...")
tcf7l2_loci = [
    cs for cs in credible_sets
    if any(gene.gene_symbol == "TCF7L2" for gene in cs.l2g_genes)
]
print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")

# Step 4: Prioritize variants at TCF7L2
print("\nStep 4: Prioritizing TCF7L2 variants...")
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")

# Step 5: Print summary and validation plan
print("\n" + "="*60)
print("FINE-MAPPING SUMMARY")
print("="*60)
print(result.get_summary())

print("\n" + "="*60)
print("VALIDATION STRATEGY")
print("="*60)
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)
```

Data Classes

FineMappingResult

Main result object containing:
  • query_variant: Variant annotation
  • query_gene: Gene symbol (if queried by gene)
  • credible_sets: List of fine-mapped loci
  • associated_traits: All associated traits
  • top_causal_genes: L2G genes ranked by score
Methods:
  • get_summary(): Human-readable summary
  • get_validation_suggestions(): Experimental validation strategies

CredibleSet

Represents a fine-mapped locus:
  • study_locus_id: Unique identifier
  • region: Genomic region (e.g., "10:112861809-113404438")
  • lead_variant: Top variant by posterior probability
  • finemapping_method: Statistical method used (SuSiE, FINEMAP, etc.)
  • l2g_genes: Locus-to-gene predictions
  • confidence: Credible set confidence (95%, 99%)
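The region field is a plain string in chromosome:start-end form, so downstream code usually has to split it before comparing loci. The helper below is a hypothetical sketch (the names parse_region and Region are not part of the skill) that assumes the format of the example given for the region field.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


def parse_region(region: str) -> Region:
    """Split a 'chromosome:start-end' string into typed parts."""
    chromosome, _, span = region.partition(":")
    start, _, end = span.partition("-")
    return Region(chromosome, int(start), int(end))


locus = parse_region("10:112861809-113404438")
print(locus.chromosome, locus.end - locus.start)
```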

L2GGene

Locus-to-gene prediction:
  • gene_symbol: Gene name (e.g., "TCF7L2")
  • gene_id: Ensembl gene ID
  • l2g_score: Probability score (0-1)

VariantAnnotation

Functional annotation for a variant:
  • variant_id: Open Targets format (chr_pos_ref_alt)
  • rs_ids: dbSNP identifiers
  • chromosome, position: Genomic coordinates
  • most_severe_consequence: Functional impact
  • allele_frequencies: Population-specific MAFs
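Because variant_id packs four fields into one underscore-separated string, a small parser is often handy. This is a sketch only: parse_variant_id is a hypothetical helper, and the ID used below is an illustrative value in chr_pos_ref_alt form.

```python
def parse_variant_id(variant_id):
    """Split a 'chr_pos_ref_alt' identifier into its four components."""
    parts = variant_id.split("_")
    if len(parts) != 4:
        raise ValueError(f"Expected chr_pos_ref_alt, got: {variant_id!r}")
    chromosome, position, ref, alt = parts
    return {
        "chromosome": chromosome,
        "position": int(position),
        "ref": ref,
        "alt": alt,
    }


parsed = parse_variant_id("10_112998590_C_T")  # illustrative ID
print(parsed)
```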

Tools Used

Open Targets Genetics (GraphQL)

  • OpenTargets_get_variant_info: Variant details and allele frequencies
  • OpenTargets_get_variant_credible_sets: Credible sets containing a variant
  • OpenTargets_get_credible_set_detail: Detailed credible set information
  • OpenTargets_get_study_credible_sets: All loci from a GWAS study
  • OpenTargets_search_gwas_studies_by_disease: Find studies by disease

GWAS Catalog (REST API)

  • gwas_search_snps: Find SNPs by gene or rsID
  • gwas_get_snp_by_id: Detailed SNP information
  • gwas_get_associations_for_snp: All trait associations for a variant
  • gwas_search_studies: Find studies by disease/trait

Understanding Fine-Mapping Output

Interpreting Posterior Probabilities

  • > 0.5: Very likely causal (strong candidate)
  • 0.1 - 0.5: Plausible causal variant
  • 0.01 - 0.1: Possible but uncertain
  • < 0.01: Unlikely to be causal

Interpreting L2G Scores

  • > 0.7: High confidence gene-variant link
  • 0.5 - 0.7: Moderate confidence
  • 0.3 - 0.5: Weak but possible link
  • < 0.3: Low confidence
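The L2G bands above translate directly into a lookup. The sketch below picks the highest-scoring gene at a locus and labels it; the function names, the (symbol, score) tuple layout, the example scores, and the choice to place boundary values in the lower band are all assumptions made for illustration.

```python
def l2g_confidence(score):
    """Map an L2G score (0-1) to the confidence bands listed above."""
    if score > 0.7:
        return "high"
    if score >= 0.5:
        return "moderate"
    if score >= 0.3:
        return "weak"
    return "low"


def top_gene(l2g_genes):
    """l2g_genes: list of (gene_symbol, l2g_score) tuples; may be empty."""
    if not l2g_genes:
        return None
    symbol, score = max(l2g_genes, key=lambda gene: gene[1])
    return symbol, score, l2g_confidence(score)


# Hypothetical scores for two genes near one locus
best = top_gene([("VTI1A", 0.12), ("TCF7L2", 0.86)])
print(best)
```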

Fine-Mapping Methods Compared

| Method | Approach | Strengths | Use Case |
|---|---|---|---|
| SuSiE | Sum of Single Effects | Handles multiple causal variants | Multi-signal loci |
| FINEMAP | Bayesian shotgun stochastic search | Fast, scalable | Large studies |
| PAINTOR | Functional annotations | Integrates epigenomics | Regulatory variants |
| CAVIAR / eCAVIAR | Probabilistic fine-mapping (CAVIAR); colocalization (eCAVIAR) | Finds shared causal variants | eQTL overlap |

Common Questions

Q: Why don't all variants have credible sets?
A: Fine-mapping requires:
  1. GWAS summary statistics (not just top hits)
  2. LD reference panel
  3. Sufficient signal strength (p < 5e-8)
  4. Computational resources

Q: Can a variant be in multiple credible sets?
A: Yes. A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.

Q: What if the top L2G gene is far from the variant?
A: This suggests a long-range regulatory effect, for example through a distal enhancer. Check:
  • eQTL evidence in relevant tissues
  • Chromatin interaction data (Hi-C)
  • Regulatory element annotations (Roadmap, ENCODE)

Q: How do I choose between variants in a credible set?
A: Prioritize by:
  1. Posterior probability (higher = better)
  2. Functional consequence (coding > regulatory > intergenic)
  3. eQTL evidence
  4. Evolutionary conservation
  5. Experimental feasibility
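The first three prioritization criteria for choosing among variants in a credible set can be expressed as a multi-key sort. The sketch below is one possible reading, not the skill's implementation: the dictionary fields, the three coarse consequence labels, and the decision to bin posterior probabilities to one decimal place (so that near-ties are broken by consequence) are all assumptions.

```python
# Lower rank = higher priority; labels are simplified, hypothetical categories
CONSEQUENCE_RANK = {"coding": 0, "regulatory": 1, "intergenic": 2}


def rank_candidates(variants):
    """Order variants by binned posterior, then consequence, then eQTL support."""
    def sort_key(variant):
        return (
            -round(variant["posterior"], 1),  # higher posterior first
            CONSEQUENCE_RANK.get(variant["consequence"], len(CONSEQUENCE_RANK)),
            not variant["has_eqtl"],          # eQTL-supported variants first
        )
    return sorted(variants, key=sort_key)


# Hypothetical credible set members
candidates = [
    {"id": "rsA", "posterior": 0.42, "consequence": "intergenic", "has_eqtl": False},
    {"id": "rsB", "posterior": 0.41, "consequence": "coding", "has_eqtl": False},
    {"id": "rsC", "posterior": 0.10, "consequence": "regulatory", "has_eqtl": True},
]
ranked_ids = [v["id"] for v in rank_candidates(candidates)]
print(ranked_ids)
```

Without the binning step, the later criteria would only matter for exactly equal posteriors, which is rare in practice; whether that trade-off is appropriate depends on the analysis.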

Limitations

  1. LD-dependent: Fine-mapping accuracy depends on the LD reference matching the LD structure of the study population
  2. Requires summary stats: Not all studies provide full summary statistics
  3. Computationally intensive: Fine-mapping large studies takes significant resources
  4. Prior assumptions: Bayesian methods depend on priors (number of causal variants, effect sizes)
  5. Missing data: Not all GWAS loci have been fine-mapped in Open Targets

Best Practices

  1. Start with study-level queries when exploring a new disease
  2. Check multiple studies for replication of signals
  3. Combine with functional data (eQTLs, chromatin, CRISPR screens)
  4. Consider ancestry - LD differs across populations
  5. Validate experimentally - fine-mapping provides candidates, not proof

References

  1. Wang et al. (2020) "A simple new approach to variable selection in regression, with application to genetic fine mapping." JRSS-B (SuSiE)
  2. Benner et al. (2016) "FINEMAP: efficient variable selection using summary data from genome-wide association studies." Bioinformatics
  3. Ghoussaini et al. (2021) "Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics." NAR
  4. Mountjoy et al. (2021) "An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci." Nat Genet

Related Skills

  • tooluniverse-gwas-explorer: Broader GWAS analysis
  • tooluniverse-eqtl-colocalization: Link variants to gene expression
  • tooluniverse-gene-prioritization: Systematic gene ranking