tooluniverse-gwas-finemapping

GWAS Fine-Mapping & Causal Variant Prioritization

Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.

Overview

Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. Fine-mapping uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.

This skill provides tools to:

Prioritize causal variants using fine-mapping posterior probabilities
Link variants to genes using locus-to-gene (L2G) predictions
Annotate variants with functional consequences
Suggest validation strategies based on fine-mapping results

Key Concepts

Credible Sets

A credible set is a minimal set of variants that contains the causal variant with high confidence (typically 95% or 99%). Each variant in the set has a posterior probability of being causal, computed using methods like:

SuSiE (Sum of Single Effects)
FINEMAP (Bayesian fine-mapping)
PAINTOR (Probabilistic Annotation INtegraTOR)
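
As a rough illustration of how a credible set follows from posterior probabilities, the sketch below ranks variants by posterior probability and keeps adding them until the cumulative probability reaches the requested coverage. It assumes a single causal variant per locus, so the probabilities sum to about 1. The function name `credible_set` and the example rsIDs and values are hypothetical and are not part of this skill; methods such as SuSiE build their credible sets differently.

```python
from typing import Dict, List, Tuple


def credible_set(pips: Dict[str, float], coverage: float = 0.95) -> List[Tuple[str, float]]:
    """Return the smallest top-ranked set of variants whose posterior
    probabilities sum to at least `coverage` (single-causal-variant assumption)."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    # Rank variants from most to least likely to be causal.
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[Tuple[str, float]] = []
    cumulative = 0.0
    for variant, pip in ranked:
        selected.append((variant, pip))
        cumulative += pip
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for five variants at one locus.
example_pips = {"rsA": 0.62, "rsB": 0.25, "rsC": 0.09, "rsD": 0.03, "rsE": 0.01}
cs95 = credible_set(example_pips, coverage=0.95)
print([variant for variant, _ in cs95])
```

With these made-up values, the 95% set needs the three highest-ranked variants; a tighter LD block with flatter probabilities would produce a larger set.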

Posterior Probability

The probability that a specific variant is causal, given the GWAS data and LD structure. Higher posterior probability = more likely to be causal.
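
To make the idea concrete, here is a minimal sketch of one classical way to turn summary statistics into posterior probabilities: Wakefield's approximate Bayes factor, normalized across the variants at a locus. It assumes exactly one causal variant, equal prior probability for every variant, and a prior effect-size standard deviation of 0.2 (an illustrative choice). The function names and input values are hypothetical. This is not how SuSiE or FINEMAP work internally, since they model LD and multiple causal variants.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor (association vs. null) for one variant."""
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # prior variance of the true effect (assumed)
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z ** 2


def posterior_probabilities(
    betas: Sequence[float], ses: Sequence[float], prior_sd: float = 0.2
) -> List[float]:
    """Normalize Bayes factors into per-variant posterior probabilities."""
    log_bfs = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    # Subtract the maximum before exponentiating to avoid overflow.
    m = max(log_bfs)
    weights = [math.exp(x - m) for x in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical effect sizes and standard errors for three variants.
pps = posterior_probabilities([0.30, 0.28, 0.05], [0.04, 0.04, 0.04])
for name, pp in zip(["variant_1", "variant_2", "variant_3"], pps):
    print(f"{name}: {pp:.4f}")
```

Because the probabilities are normalized within the locus, they describe which variant is most likely causal given that one of them is, not whether the locus is truly associated.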

Locus-to-Gene (L2G) Predictions

L2G scores integrate multiple data types to predict which gene is affected by a variant:

Distance to gene (closer = higher score)
eQTL evidence (expression changes)
Chromatin interactions (Hi-C, promoter capture)
Functional annotations (coding variants, regulatory regions)

L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.

Use Cases

1. Prioritize Variants at a Known Locus

Question: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"

```python
from python_implementation import prioritize_causal_variants

# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())

# Output shows:
# - Credible sets containing TCF7L2 variants
# - Posterior probabilities (via fine-mapping methods)
# - Top L2G genes (which genes are likely affected)
# - Associated traits
```

2. Fine-Map a Specific Variant

Question: "What do we know about rs429358 (APOE4) from fine-mapping?"

```python
from python_implementation import prioritize_causal_variants

# Fine-map a specific variant
result = prioritize_causal_variants("rs429358")

# Check which credible sets contain this variant
for cs in result.credible_sets:
    print(f"Trait: {cs.trait}")
    print(f"Fine-mapping method: {cs.finemapping_method}")
    print(f"Top gene: {cs.l2g_genes[0] if cs.l2g_genes else 'N/A'}")
    print(f"Confidence: {cs.confidence}")
```

3. Explore All Loci from a GWAS Study

Question: "What are all the causal loci from the recent T2D meta-analysis?"

```python
from python_implementation import get_credible_sets_for_study

# Get all fine-mapped loci from a study
credible_sets = get_credible_sets_for_study("GCST90029024")  # T2D GWAS

print(f"Found {len(credible_sets)} independent loci")

# Examine each locus
for cs in credible_sets:
    print(f"\nRegion: {cs.region}")
    print(f"Lead variant: {cs.lead_variant.rs_ids[0] if cs.lead_variant else 'N/A'}")

    if cs.l2g_genes:
        top_gene = cs.l2g_genes[0]
        print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")
```

4. Find GWAS Studies for a Disease

Question: "What GWAS studies exist for Alzheimer's disease?"

```python
from python_implementation import search_gwas_studies_for_disease

# Search by disease name
studies = search_gwas_studies_for_disease("Alzheimer's disease")

for study in studies[:5]:
    print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
    print(f"  Author: {study.get('publicationFirstAuthor', 'N/A')}")
    print(f"  Has summary stats: {study.get('hasSumstats', False)}")

# Or use precise disease ontology IDs
studies = search_gwas_studies_for_disease(
    "Alzheimer's disease",
    disease_id="EFO_0000249"  # EFO ID for Alzheimer's
)
```

5. Get Validation Suggestions

Question: "How should we validate the top causal variant?"

```python
from python_implementation import prioritize_causal_variants

result = prioritize_causal_variants("APOE", "alzheimer")

# Get experimental validation suggestions
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)

# Output includes:
# - CRISPR knock-in experiments
# - Reporter assays
# - eQTL analysis
# - Colocalization studies
```

Workflow Example: Complete Fine-Mapping Analysis

```python
from python_implementation import (
    prioritize_causal_variants,
    search_gwas_studies_for_disease,
    get_credible_sets_for_study
)

# Step 1: Find relevant GWAS studies
print("Step 1: Finding T2D GWAS studies...")
studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
largest_study = max(studies, key=lambda s: s.get('nSamples', 0) or 0)
print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")

# Step 2: Get all fine-mapped loci from the study
print("\nStep 2: Getting fine-mapped loci...")
credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
print(f"Found {len(credible_sets)} credible sets")

# Step 3: Find loci near genes of interest
print("\nStep 3: Finding TCF7L2 loci...")
tcf7l2_loci = [
    cs for cs in credible_sets
    if any(gene.gene_symbol == "TCF7L2" for gene in cs.l2g_genes)
]
print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")

# Step 4: Prioritize variants at TCF7L2
print("\nStep 4: Prioritizing TCF7L2 variants...")
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")

# Step 5: Print summary and validation plan
print("\n" + "="*60)
print("FINE-MAPPING SUMMARY")
print("="*60)
print(result.get_summary())

print("\n" + "="*60)
print("VALIDATION STRATEGY")
print("="*60)
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)
```

Data Classes

FineMappingResult

Main result object containing:

`query_variant`: Variant annotation
`query_gene`: Gene symbol (if queried by gene)
`credible_sets`: List of fine-mapped loci
`associated_traits`: All associated traits
`top_causal_genes`: L2G genes ranked by score

Methods:

`get_summary()`: Human-readable summary
`get_validation_suggestions()`: Experimental validation strategies

CredibleSet

Represents a fine-mapped locus:

`study_locus_id`: Unique identifier
`region`: Genomic region (e.g., "10:112861809-113404438")
`lead_variant`: Top variant by posterior probability
`finemapping_method`: Statistical method used (SuSiE, FINEMAP, etc.)
`l2g_genes`: Locus-to-gene predictions
`confidence`: Credible set confidence (95%, 99%)

L2GGene

Locus-to-gene prediction:

`gene_symbol`: Gene name (e.g., "TCF7L2")
`gene_id`: Ensembl gene ID
`l2g_score`: Probability score (0-1)

VariantAnnotation

Functional annotation for a variant:

`variant_id`: Open Targets format (chr_pos_ref_alt)
`rs_ids`: dbSNP identifiers
`chromosome`, `position`: Genomic coordinates
`most_severe_consequence`: Functional impact
`allele_frequencies`: Population-specific MAFs
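
The string formats described above (a `chr_pos_ref_alt` variant ID and a `chr:start-end` region) are easy to parse by hand. The sketch below shows one way to do so and to check whether a variant falls inside a credible set's region. The helper names (`parse_variant_id`, `parse_region`, `in_region`) and the example variant ID are illustrative assumptions, not functions provided by this skill.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chromosome: str
    position: int
    ref: str
    alt: str


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


def parse_variant_id(variant_id: str) -> Variant:
    """Parse an ID of the form 'chr_pos_ref_alt', e.g. '10_112998590_C_T'."""
    chrom, pos, ref, alt = variant_id.split("_")
    return Variant(chrom, int(pos), ref, alt)


def parse_region(region: str) -> Region:
    """Parse a region of the form 'chr:start-end'."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return Region(chrom, int(start), int(end))


def in_region(variant: Variant, region: Region) -> bool:
    """True if the variant lies on the same chromosome within [start, end]."""
    return (
        variant.chromosome == region.chromosome
        and region.start <= variant.position <= region.end
    )


# Illustrative identifier in chr_pos_ref_alt form, checked against the example region.
variant = parse_variant_id("10_112998590_C_T")
region = parse_region("10:112861809-113404438")
print(variant, in_region(variant, region))
```

Chromosomes are kept as strings so that values such as "X" or "MT" do not need special handling.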

Tools Used

Open Targets Genetics (GraphQL)

`OpenTargets_get_variant_info`: Variant details and allele frequencies
`OpenTargets_get_variant_credible_sets`: Credible sets containing a variant
`OpenTargets_get_credible_set_detail`: Detailed credible set information
`OpenTargets_get_study_credible_sets`: All loci from a GWAS study
`OpenTargets_search_gwas_studies_by_disease`: Find studies by disease

GWAS Catalog (REST API)

`gwas_search_snps`: Find SNPs by gene or rsID
`gwas_get_snp_by_id`: Detailed SNP information
`gwas_get_associations_for_snp`: All trait associations for a variant
`gwas_search_studies`: Find studies by disease/trait

Understanding Fine-Mapping Output

Interpreting Posterior Probabilities

> 0.5: Very likely causal (strong candidate)
0.1 - 0.5: Plausible causal variant
0.01 - 0.1: Possible but uncertain
< 0.01: Unlikely to be causal

Interpreting L2G Scores

> 0.7: High confidence gene-variant link
0.5 - 0.7: Moderate confidence
0.3 - 0.5: Weak but possible link
< 0.3: Low confidence
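
The posterior-probability and L2G thresholds above can be encoded as small helper functions when triaging many results at once. The sketch below is one possible encoding. The function names and label strings are illustrative, and the handling of values that fall exactly on a boundary (for example, 0.5 is treated as "plausible" because the top tier is written as "> 0.5") is an assumption.

```python
def posterior_tier(pp: float) -> str:
    """Map a posterior probability to the interpretation tiers listed above."""
    if not 0.0 <= pp <= 1.0:
        raise ValueError("posterior probability must be in [0, 1]")
    if pp > 0.5:
        return "very likely causal"
    if pp >= 0.1:
        return "plausible"
    if pp >= 0.01:
        return "possible but uncertain"
    return "unlikely"


def l2g_tier(score: float) -> str:
    """Map an L2G score to the confidence tiers listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("L2G score must be in [0, 1]")
    if score > 0.7:
        return "high confidence"
    if score >= 0.5:
        return "moderate confidence"
    if score >= 0.3:
        return "weak but possible"
    return "low confidence"


# Hypothetical values for one variant and its top gene.
print(posterior_tier(0.62), "|", l2g_tier(0.45))
```

These cutoffs are rules of thumb for triage; a variant with a modest posterior probability can still be worth following up if functional evidence is strong.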

Fine-Mapping Methods Compared

| Method | Approach | Strengths | Use Case |
| --- | --- | --- | --- |
| SuSiE | Sum of Single Effects | Handles multiple causal variants | Multi-signal loci |
| FINEMAP | Bayesian shotgun stochastic search | Fast, scalable | Large studies |
| PAINTOR | Functional annotations | Integrates epigenomics | Regulatory variants |
| eCAVIAR | Colocalization (extends CAVIAR fine-mapping) | Finds shared causal variants | eQTL overlap |

Common Questions

Q: Why don't all variants have credible sets?
A: Fine-mapping requires:

GWAS summary statistics (not just top hits)
LD reference panel
Sufficient signal strength (p < 5e-8)
Computational resources

Q: Can a variant be in multiple credible sets?
A: Yes! A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.

Q: What if the top L2G gene is far from the variant?
A: This suggests regulatory effects (enhancers, promoters). Check:

eQTL evidence in relevant tissues
Chromatin interaction data (Hi-C)
Regulatory element annotations (Roadmap, ENCODE)

Q: How do I choose between variants in a credible set?
A: Prioritize by:

Posterior probability (higher = better)
Functional consequence (coding > regulatory > intergenic)
eQTL evidence
Evolutionary conservation
Experimental feasibility
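
The first three criteria in the last answer can be turned into a simple sort key. The sketch below ranks candidates by posterior probability, breaks ties by consequence class (coding before regulatory before intergenic), and then prefers variants with eQTL support. The `Candidate` class, its field names, and the example data are hypothetical and are not part of this skill. Conservation and experimental feasibility are left out because they need project-specific scoring.

```python
from dataclasses import dataclass
from typing import List

# Lower rank = higher priority, following "coding > regulatory > intergenic".
CONSEQUENCE_RANK = {"coding": 0, "regulatory": 1, "intergenic": 2}


@dataclass
class Candidate:
    rs_id: str
    posterior_probability: float
    consequence_class: str
    has_eqtl: bool = False


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidates from highest to lowest follow-up priority."""
    return sorted(
        candidates,
        key=lambda c: (
            -c.posterior_probability,
            # Unknown consequence classes sort after the known ones.
            CONSEQUENCE_RANK.get(c.consequence_class, len(CONSEQUENCE_RANK)),
            not c.has_eqtl,
        ),
    )


# Hypothetical credible set with two ties on posterior probability.
candidates = [
    Candidate("rsA", 0.40, "intergenic"),
    Candidate("rsB", 0.40, "coding"),
    Candidate("rsC", 0.15, "regulatory", has_eqtl=False),
    Candidate("rsD", 0.15, "regulatory", has_eqtl=True),
]
ranked = rank_candidates(candidates)
print([c.rs_id for c in ranked])
```

A strict lexicographic sort like this lets posterior probability dominate; a weighted score would be a reasonable alternative when functional evidence should be able to outrank a small difference in probability.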

Limitations

LD-dependent: Fine-mapping accuracy depends on LD structure matching the study population
Requires summary stats: Not all studies provide full summary statistics
Computationally intensive: Fine-mapping large studies takes significant resources
Prior assumptions: Bayesian methods depend on priors (number of causal variants, effect sizes)
Missing data: Not all GWAS loci have been fine-mapped in Open Targets

Best Practices

Start with study-level queries when exploring a new disease
Check multiple studies for replication of signals
Combine with functional data (eQTLs, chromatin, CRISPR screens)
Consider ancestry - LD differs across populations
Validate experimentally - fine-mapping provides candidates, not proof

References

Wang et al. (2020) "A simple new approach to variable selection in regression, with application to genetic fine mapping." JRSS-B (SuSiE)
Benner et al. (2016) "FINEMAP: efficient variable selection using summary data from genome-wide association studies." Bioinformatics
Ghoussaini et al. (2021) "Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics." NAR
Mountjoy et al. (2021) "An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci." Nat Genet
