tooluniverse-gwas-finemapping
GWAS Fine-Mapping & Causal Variant Prioritization
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.
Overview
Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. Fine-mapping uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.
This skill provides tools to:
- Prioritize causal variants using fine-mapping posterior probabilities
- Link variants to genes using locus-to-gene (L2G) predictions
- Annotate variants with functional consequences
- Suggest validation strategies based on fine-mapping results
Key Concepts
Credible Sets
A credible set is a minimal set of variants that contains the causal variant with high confidence (typically 95% or 99%). Each variant in the set has a posterior probability of being causal, computed using methods like:
- SuSiE (Sum of Single Effects)
- FINEMAP (Bayesian fine-mapping)
- PAINTOR (Probabilistic Annotation INtegraTOR)
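As a rough illustration of how a credible set follows from posterior probabilities, the sketch below ranks variants and accumulates probability until the requested coverage is reached. It assumes a single causal variant per locus (so the probabilities sum to about 1); the function name `credible_set` and the variant IDs are hypothetical and are not part of this skill's API.

```python
from typing import Dict, List


def credible_set(pips: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Return the smallest set of variants whose posteriors sum to >= coverage."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    # Rank variants from most to least likely causal.
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for variant_id, pip in ranked:
        selected.append(variant_id)
        cumulative += pip
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for five variants at one locus.
example_pips = {"rs_a": 0.62, "rs_b": 0.25, "rs_c": 0.09, "rs_d": 0.03, "rs_e": 0.01}
print(credible_set(example_pips, coverage=0.95))
```

Methods such as SuSiE build one credible set per independent signal, so real loci can carry several sets rather than the single set produced here.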
Posterior Probability
The probability that a specific variant is causal, given the GWAS data and LD structure. Higher posterior probability = more likely to be causal.
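One common way to obtain such probabilities from summary statistics is Wakefield's approximate Bayes factor, normalized across the locus. The sketch below is a simplified illustration, not the method used by this skill or by Open Targets: it assumes exactly one causal variant, equal prior probability for every variant, and an assumed prior effect-size standard deviation (`prior_sd`). The function name is hypothetical.

```python
import math
from typing import List


def abf_posteriors(
    z_scores: List[float], std_errors: List[float], prior_sd: float = 0.2
) -> List[float]:
    """Posterior probabilities assuming one causal variant and equal priors."""
    log_abfs = []
    for z, se in zip(z_scores, std_errors):
        v = se ** 2        # sampling variance of the effect estimate
        w = prior_sd ** 2  # assumed prior variance of the true effect
        shrink = w / (v + w)
        # log of sqrt(v / (v + w)) * exp(z^2 * w / (2 * (v + w)))
        log_abfs.append(0.5 * math.log(1 - shrink) + 0.5 * shrink * z ** 2)
    # Normalize in log space so very large z-scores do not overflow.
    max_log = max(log_abfs)
    weights = [math.exp(value - max_log) for value in log_abfs]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical z-scores and standard errors for three variants in LD.
posteriors = abf_posteriors([6.0, 5.5, 2.0], [0.05, 0.05, 0.05])
print([round(p, 3) for p in posteriors])
```

Because this approach ignores LD between variants beyond what is reflected in the z-scores, methods such as SuSiE or FINEMAP are preferred when a locus may contain more than one signal.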
Locus-to-Gene (L2G) Predictions
L2G scores integrate multiple data types to predict which gene is affected by a variant:
- Distance to gene (closer = higher score)
- eQTL evidence (expression changes)
- Chromatin interactions (Hi-C, promoter capture)
- Functional annotations (coding variants, regulatory regions)
L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.
Use Cases
1. Prioritize Variants at a Known Locus
Question: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"
```python
from python_implementation import prioritize_causal_variants

# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())

# Output shows:
# - Credible sets containing TCF7L2 variants
# - Posterior probabilities (via fine-mapping methods)
# - Top L2G genes (which genes are likely affected)
# - Associated traits
```
2. Fine-Map a Specific Variant
Question: "What do we know about rs429358 (APOE4) from fine-mapping?"
```python
from python_implementation import prioritize_causal_variants

# Fine-map a specific variant
result = prioritize_causal_variants("rs429358")

# Check which credible sets contain this variant
for cs in result.credible_sets:
    print(f"Trait: {cs.trait}")
    print(f"Fine-mapping method: {cs.finemapping_method}")
    print(f"Top gene: {cs.l2g_genes[0] if cs.l2g_genes else 'N/A'}")
    print(f"Confidence: {cs.confidence}")
```
3. Explore All Loci from a GWAS Study
Question: "What are all the causal loci from the recent T2D meta-analysis?"
```python
from python_implementation import get_credible_sets_for_study

# Get all fine-mapped loci from a study
credible_sets = get_credible_sets_for_study("GCST90029024")  # T2D GWAS
print(f"Found {len(credible_sets)} independent loci")

# Examine each locus
for cs in credible_sets:
    print(f"\nRegion: {cs.region}")
    # Guard against a missing lead variant or an empty rsID list
    has_rsid = cs.lead_variant and cs.lead_variant.rs_ids
    print(f"Lead variant: {cs.lead_variant.rs_ids[0] if has_rsid else 'N/A'}")
    if cs.l2g_genes:
        top_gene = cs.l2g_genes[0]
        print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")
```
4. Find GWAS Studies for a Disease
Question: "What GWAS studies exist for Alzheimer's disease?"
```python
from python_implementation import search_gwas_studies_for_disease

# Search by disease name
studies = search_gwas_studies_for_disease("Alzheimer's disease")
for study in studies[:5]:
    print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
    print(f"  Author: {study.get('publicationFirstAuthor', 'N/A')}")
    print(f"  Has summary stats: {study.get('hasSumstats', False)}")

# Or use precise disease ontology IDs
studies = search_gwas_studies_for_disease(
    "Alzheimer's disease",
    disease_id="EFO_0000249"  # EFO ID for Alzheimer's
)
```
5. Get Validation Suggestions
Question: "How should we validate the top causal variant?"
```python
from python_implementation import prioritize_causal_variants

result = prioritize_causal_variants("APOE", "alzheimer")

# Get experimental validation suggestions
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)

# Output includes:
# - CRISPR knock-in experiments
# - Reporter assays
# - eQTL analysis
# - Colocalization studies
```
Workflow Example: Complete Fine-Mapping Analysis
```python
from python_implementation import (
    prioritize_causal_variants,
    search_gwas_studies_for_disease,
    get_credible_sets_for_study
)

# Step 1: Find relevant GWAS studies
print("Step 1: Finding T2D GWAS studies...")
studies = search_gwas_studies_for_disease("type 2 diabetes", disease_id="MONDO_0005148")
largest_study = max(studies, key=lambda s: s.get('nSamples', 0) or 0)
print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")

# Step 2: Get all fine-mapped loci from the study
print("\nStep 2: Getting fine-mapped loci...")
credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
print(f"Found {len(credible_sets)} credible sets")

# Step 3: Find loci near genes of interest
print("\nStep 3: Finding TCF7L2 loci...")
tcf7l2_loci = [
    cs for cs in credible_sets
    if any(gene.gene_symbol == "TCF7L2" for gene in cs.l2g_genes)
]
print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")

# Step 4: Prioritize variants at TCF7L2
print("\nStep 4: Prioritizing TCF7L2 variants...")
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")

# Step 5: Print summary and validation plan
print("\n" + "="*60)
print("FINE-MAPPING SUMMARY")
print("="*60)
print(result.get_summary())
print("\n" + "="*60)
print("VALIDATION STRATEGY")
print("="*60)
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)
```
Data Classes
FineMappingResult
Main result object containing:
- query_variant: Variant annotation
- query_gene: Gene symbol (if queried by gene)
- credible_sets: List of fine-mapped loci
- associated_traits: All associated traits
- top_causal_genes: L2G genes ranked by score
Methods:
- get_summary(): Human-readable summary
- get_validation_suggestions(): Experimental validation strategies
CredibleSet
Represents a fine-mapped locus:
- study_locus_id: Unique identifier
- region: Genomic region (e.g., "10:112861809-113404438")
- lead_variant: Top variant by posterior probability
- finemapping_method: Statistical method used (SuSiE, FINEMAP, etc.)
- l2g_genes: Locus-to-gene predictions
- confidence: Credible set confidence (95%, 99%)
L2GGene
Locus-to-gene prediction:
- gene_symbol: Gene name (e.g., "TCF7L2")
- gene_id: Ensembl gene ID
- l2g_score: Probability score (0-1)
VariantAnnotation
Functional annotation for a variant:
- variant_id: Open Targets format (chr_pos_ref_alt)
- rs_ids: dbSNP identifiers
- chromosome, position: Genomic coordinates
- most_severe_consequence: Functional impact
- allele_frequencies: Population-specific MAFs
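To make the shape of these objects concrete, here is a minimal sketch of three of the classes as Python dataclasses. Only the field names come from the lists above; the field types, defaults, and example values are assumptions for illustration, and the actual definitions in `python_implementation` may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VariantAnnotation:
    variant_id: str                                  # chr_pos_ref_alt
    rs_ids: List[str] = field(default_factory=list)
    chromosome: Optional[str] = None
    position: Optional[int] = None
    most_severe_consequence: Optional[str] = None
    allele_frequencies: Dict[str, float] = field(default_factory=dict)


@dataclass
class L2GGene:
    gene_symbol: str
    gene_id: str      # Ensembl gene ID
    l2g_score: float  # 0-1


@dataclass
class CredibleSet:
    study_locus_id: str
    region: Optional[str] = None
    lead_variant: Optional[VariantAnnotation] = None
    finemapping_method: Optional[str] = None
    l2g_genes: List[L2GGene] = field(default_factory=list)
    confidence: Optional[str] = None


# Illustrative values only; the locus ID and score are made up.
example = CredibleSet(
    study_locus_id="example-locus-1",
    region="10:112861809-113404438",
    lead_variant=VariantAnnotation(variant_id="10_112998590_C_T", rs_ids=["rs7903146"]),
    finemapping_method="SuSiE",
    l2g_genes=[L2GGene("TCF7L2", "ENSG00000148737", 0.9)],
    confidence="95%",
)
print(example.l2g_genes[0].gene_symbol, example.lead_variant.rs_ids[0])
```

Using `default_factory` for the list and dict fields avoids sharing one mutable default across instances, which is why an empty `CredibleSet` still gets its own `l2g_genes` list.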
Tools Used
Open Targets Genetics (GraphQL)
- OpenTargets_get_variant_info: Variant details and allele frequencies
- OpenTargets_get_variant_credible_sets: Credible sets containing a variant
- OpenTargets_get_credible_set_detail: Detailed credible set information
- OpenTargets_get_study_credible_sets: All loci from a GWAS study
- OpenTargets_search_gwas_studies_by_disease: Find studies by disease
GWAS Catalog (REST API)
- gwas_search_snps: Find SNPs by gene or rsID
- gwas_get_snp_by_id: Detailed SNP information
- gwas_get_associations_for_snp: All trait associations for a variant
- gwas_search_studies: Find studies by disease/trait
Understanding Fine-Mapping Output
Interpreting Posterior Probabilities
- > 0.5: Very likely causal (strong candidate)
- 0.1 - 0.5: Plausible causal variant
- 0.01 - 0.1: Possible but uncertain
- < 0.01: Unlikely to be causal
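These bands are rules of thumb rather than hard cutoffs. A small helper like the one below (a hypothetical name, not part of this skill) can apply them consistently when triaging many variants; it assumes boundary values fall into the lower band, so exactly 0.5 counts as "plausible".

```python
def pip_tier(pip: float) -> str:
    """Map a posterior probability to the interpretation bands listed above."""
    if not 0.0 <= pip <= 1.0:
        raise ValueError("posterior probability must be between 0 and 1")
    if pip > 0.5:
        return "very likely causal"
    if pip >= 0.1:
        return "plausible"
    if pip >= 0.01:
        return "possible but uncertain"
    return "unlikely"


for value in (0.8, 0.3, 0.05, 0.001):
    print(value, "->", pip_tier(value))
```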
Interpreting L2G Scores
- > 0.7: High confidence gene-variant link
- 0.5 - 0.7: Moderate confidence
- 0.3 - 0.5: Weak but possible link
- < 0.3: Low confidence
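In practice a locus often has several candidate genes with different scores. The sketch below picks the best-scoring gene only if it clears a chosen threshold. It works on plain `(gene, score)` pairs rather than this skill's objects; the function name and the example scores are hypothetical.

```python
from typing import List, Optional, Tuple


def top_confident_gene(
    l2g_scores: List[Tuple[str, float]], min_score: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring gene at or above min_score, or None."""
    candidates = [(gene, score) for gene, score in l2g_scores if score >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


# Hypothetical L2G scores for one locus.
locus_scores = [("VTI1A", 0.12), ("TCF7L2", 0.86)]
print(top_confident_gene(locus_scores))
print(top_confident_gene(locus_scores, min_score=0.9))
```

Returning `None` instead of the best low-scoring gene keeps weakly supported links from being reported as the causal gene.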
Fine-Mapping Methods Compared
| Method | Approach | Strengths | Use Case |
|---|---|---|---|
| SuSiE | Sum of Single Effects | Handles multiple causal variants | Multi-signal loci |
| FINEMAP | Bayesian shotgun stochastic search | Fast, scalable | Large studies |
| PAINTOR | Functional annotations | Integrates epigenomics | Regulatory variants |
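| CAVIAR | Exhaustive search over causal configurations | Allows multiple causal variants; the eCAVIAR extension tests colocalization | Small loci; eQTL overlap (via eCAVIAR) |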
Common Questions
Q: Why don't all variants have credible sets?
A: Fine-mapping requires:
- GWAS summary statistics (not just top hits)
- LD reference panel
- Sufficient signal strength (p < 5e-8)
- Computational resources
Q: Can a variant be in multiple credible sets?
A: Yes! A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.
Q: What if the top L2G gene is far from the variant?
A: This may indicate a long-range regulatory effect, for example through a distal enhancer. Check:
- eQTL evidence in relevant tissues
- Chromatin interaction data (Hi-C)
- Regulatory element annotations (Roadmap, ENCODE)
Q: How do I choose between variants in a credible set?
A: Prioritize by:
- Posterior probability (higher = better)
- Functional consequence (coding > regulatory > intergenic)
- eQTL evidence
- Evolutionary conservation
- Experimental feasibility
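The first two criteria can be combined into a simple sort, sketched below. This is one possible ordering, not a prescribed one: it assumes posterior probabilities are compared after rounding to two decimals, so that near-ties are broken by consequence class (coding, then regulatory, then intergenic). The `Candidate` type and variant IDs are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    variant_id: str
    pip: float
    consequence: str  # "coding", "regulatory", or "intergenic"


# Lower rank means higher priority; unknown classes sort last.
CONSEQUENCE_RANK = {"coding": 0, "regulatory": 1, "intergenic": 2}


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by posterior probability (descending), then by consequence class."""
    return sorted(
        candidates,
        key=lambda c: (
            -round(c.pip, 2),
            CONSEQUENCE_RANK.get(c.consequence, len(CONSEQUENCE_RANK)),
        ),
    )


# Hypothetical variants from one credible set.
variants = [
    Candidate("rs_1", 0.40, "intergenic"),
    Candidate("rs_2", 0.40, "coding"),
    Candidate("rs_3", 0.15, "regulatory"),
]
print([c.variant_id for c in rank_candidates(variants)])
```

The remaining criteria (eQTL evidence, conservation, feasibility) usually need locus-specific judgment and are left out of this sketch.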
Limitations
- LD-dependent: Fine-mapping accuracy depends on LD structure matching the study population
- Requires summary stats: Not all studies provide full summary statistics
- Computationally intensive: Fine-mapping large studies takes significant resources
- Prior assumptions: Bayesian methods depend on priors (number of causal variants, effect sizes)
- Missing data: Not all GWAS loci have been fine-mapped in Open Targets
Best Practices
- Start with study-level queries when exploring a new disease
- Check multiple studies for replication of signals
- Combine with functional data (eQTLs, chromatin, CRISPR screens)
- Consider ancestry - LD differs across populations
- Validate experimentally - fine-mapping provides candidates, not proof
References
- Wang et al. (2020) "A simple new approach to variable selection in regression, with application to genetic fine mapping." JRSS-B (SuSiE)
- Benner et al. (2016) "FINEMAP: efficient variable selection using summary data from genome-wide association studies." Bioinformatics
- Ghoussaini et al. (2021) "Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics." NAR
- Mountjoy et al. (2021) "An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci." Nat Genet
Related Skills
- tooluniverse-gwas-explorer: Broader GWAS analysis
- tooluniverse-eqtl-colocalization: Link variants to gene expression
- tooluniverse-gene-prioritization: Systematic gene ranking