tooluniverse-gwas-study-explorer
GWAS Study Deep Dive & Meta-Analysis
Compare GWAS studies, perform meta-analyses, and assess replication across cohorts
Overview
The GWAS Study Deep Dive & Meta-Analysis skill enables comprehensive comparison of genome-wide association studies (GWAS) for the same trait, meta-analysis of genetic loci across studies, and systematic assessment of replication and study quality. It integrates data from the NHGRI-EBI GWAS Catalog and Open Targets Genetics to provide a complete picture of the genetic architecture of complex traits.
Key Capabilities
- Study Comparison: Compare all GWAS studies for a trait, assessing sample sizes, ancestries, and platforms
- Meta-Analysis: Aggregate effect sizes across studies and calculate heterogeneity statistics
- Replication Assessment: Identify replicated vs novel findings across discovery and replication cohorts
- Quality Evaluation: Assess statistical power, ancestry diversity, and data availability
Use Cases
1. Comprehensive Trait Analysis
Scenario: "I want to understand all available GWAS data for type 2 diabetes"
Workflow:
- Search for all T2D studies in GWAS Catalog
- Filter by sample size and ancestry
- Extract top associations from each study
- Identify consistently replicated loci
- Assess ancestry-specific effects
Outcome: Complete landscape of T2D genetics with replicated findings and population-specific signals
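The filtering and "consistently replicated" steps of this workflow can be sketched in a few lines of Python. This is only an illustration: the record layout (`id`, `n`, `ancestry`, `top_snps`), the helper names, and the toy values are assumptions made for the example, not the GWAS Catalog response format or real study results.

```python
from collections import Counter

# Toy records with made-up values; the field names are assumptions for this
# sketch and do not mirror the GWAS Catalog response format.
studies = [
    {"id": "S1", "n": 120_000, "ancestry": "European", "top_snps": {"rs7903146", "rs_toy_1"}},
    {"id": "S2", "n": 45_000, "ancestry": "East Asian", "top_snps": {"rs7903146", "rs_toy_2"}},
    {"id": "S3", "n": 3_000, "ancestry": "European", "top_snps": {"rs_toy_1"}},
]

def filter_studies(records, min_n=10_000, ancestries=None):
    """Keep studies with at least min_n samples and, optionally, a listed ancestry."""
    return [
        r for r in records
        if r["n"] >= min_n and (ancestries is None or r["ancestry"] in ancestries)
    ]

def recurrent_snps(records, min_studies=2):
    """Return SNPs reported among the top hits of at least min_studies studies."""
    counts = Counter(snp for r in records for snp in r["top_snps"])
    return sorted(snp for snp, c in counts.items() if c >= min_studies)

kept = filter_studies(studies)
print([r["id"] for r in kept])
print(recurrent_snps(kept))
```

Counting by exact rsID is the simplest possible rule; a real comparison would usually group variants by locus or LD block before counting.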
2. Locus-Specific Meta-Analysis
Scenario: "Is the TCF7L2 association with T2D consistent across all studies?"
Workflow:
- Retrieve all TCF7L2 (rs7903146) associations for T2D
- Calculate combined effect size and p-value
- Assess heterogeneity (I² statistic)
- Generate forest plot data
- Interpret heterogeneity level
Outcome: Quantitative assessment of effect size consistency with heterogeneity interpretation
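As a rough sketch of what "forest plot data" might look like, the snippet below turns per-study log odds ratios and standard errors into odds ratios with 95% confidence intervals. The function name `forest_rows`, the tuple layout, and the numbers are hypothetical; the skill's actual output format is not specified here.

```python
import math

def forest_rows(studies, z_crit=1.96):
    """Build one forest-plot row per study from (label, log-OR, SE) tuples."""
    rows = []
    for label, beta, se in studies:
        rows.append({
            "study": label,
            "or": math.exp(beta),                    # point estimate on the OR scale
            "ci_low": math.exp(beta - z_crit * se),  # lower 95% bound
            "ci_high": math.exp(beta + z_crit * se), # upper 95% bound
        })
    return rows

# Made-up log odds ratios and standard errors, for illustration only.
toy = [("Study A", math.log(1.40), 0.05), ("Study B", math.log(1.30), 0.10)]
for row in forest_rows(toy):
    print(f'{row["study"]}: OR {row["or"]:.2f} ({row["ci_low"]:.2f}-{row["ci_high"]:.2f})')
```

The intervals are computed on the log scale and then exponentiated, which is why they are asymmetric around the odds ratio.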
3. Replication Analysis
Scenario: "Which findings from the discovery cohort replicated in the independent sample?"
Workflow:
- Get top hits from discovery study
- Check for presence and significance in replication study
- Assess direction consistency
- Calculate replication rate
- Identify novel vs failed replication
Outcome: Systematic replication report with success rates and failed findings
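One way to express the replication check is shown below, assuming each cohort is summarized as a mapping from SNP to (beta, p-value). The function name, the status labels, and the toy numbers are illustrative assumptions; the p < 0.05 default is the nominal threshold commonly used for replication.

```python
def assess_replication(discovery, replication, alpha=0.05):
    """Classify each discovery hit and return (status per SNP, replication rate)."""
    status = {}
    for snp, (d_beta, _d_p) in discovery.items():
        if snp not in replication:
            status[snp] = "not_tested"       # absent from the replication cohort
            continue
        r_beta, r_p = replication[snp]
        same_direction = d_beta * r_beta > 0
        if same_direction and r_p < alpha:
            status[snp] = "replicated"
        else:
            status[snp] = "not_replicated"
    tested = [s for s in status.values() if s != "not_tested"]
    rate = tested.count("replicated") / len(tested) if tested else None
    return status, rate

# Made-up effect sizes and p-values for three hypothetical SNPs.
discovery = {"rsA": (0.20, 1e-10), "rsB": (-0.10, 3e-9), "rsC": (0.15, 2e-8)}
replication = {"rsA": (0.18, 0.001), "rsB": (0.05, 0.01)}
status, rate = assess_replication(discovery, replication)
print(status, rate)
```

Here `rsB` is significant in the replication cohort but in the opposite direction, so it counts as not replicated; `rsC` is excluded from the rate because it was never tested.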
4. Multi-Ancestry Comparison
Scenario: "Are T2D loci consistent across European and East Asian populations?"
Workflow:
- Filter studies by ancestry
- Compare top associations between populations
- Identify shared vs population-specific loci
- Assess allele frequency differences
- Evaluate transferability of genetic risk scores
Outcome: Ancestry-specific genetic architecture with transferability assessment
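The shared versus population-specific comparison reduces to set operations once top loci are keyed by a common identifier. The sketch below assumes exactly that; the helper names, the 0.10 frequency-gap cutoff, and all values are hypothetical.

```python
def compare_loci(loci_a, loci_b):
    """Split two collections of locus IDs into shared and population-specific sets."""
    a, b = set(loci_a), set(loci_b)
    return {"shared": sorted(a & b), "only_a": sorted(a - b), "only_b": sorted(b - a)}

def allele_frequency_gaps(freq_a, freq_b, min_gap=0.10):
    """Return variants in both maps whose effect-allele frequency differs by >= min_gap."""
    gaps = {}
    for variant in freq_a.keys() & freq_b.keys():
        gap = abs(freq_a[variant] - freq_b[variant])
        if gap >= min_gap:
            gaps[variant] = round(gap, 4)
    return gaps

# Made-up loci and allele frequencies for two hypothetical populations.
european = {"rs7903146", "rs_toy_1"}
east_asian = {"rs7903146", "rs_toy_2"}
print(compare_loci(european, east_asian))
print(allele_frequency_gaps({"rs7903146": 0.30, "rs_toy_1": 0.40},
                            {"rs7903146": 0.05, "rs_toy_1": 0.38}))
```

A locus that appears in only one population's top hits is not necessarily population-specific; lower allele frequency or a smaller sample can leave a shared effect undetected.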
Statistical Methods
Meta-Analysis Approach
This skill implements standard GWAS meta-analysis methods:
Fixed-Effects Model:
- Used when heterogeneity is low (I² < 25%)
- Weights studies by inverse variance
- Assumes true effect size is the same across studies
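A minimal sketch of inverse-variance fixed-effects pooling follows, assuming each study contributes a beta (or log odds ratio) and its standard error. The function name and inputs are illustrative; this is the textbook estimator, not necessarily the skill's own implementation.

```python
import math

def fixed_effects(betas, ses):
    """Inverse-variance weighted pooled estimate: returns (beta, se, z, two-sided p)."""
    weights = [1.0 / se ** 2 for se in ses]       # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                   # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return beta, se, z, p

# Made-up per-study estimates with equal precision.
beta, se, z, p = fixed_effects([0.2, 0.4], [0.1, 0.1])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Because weights scale with 1/SE², a single very large study can dominate the pooled estimate.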
Random-Effects Model (recommended when I² > 50%):
- Accounts for between-study variation
- More conservative than fixed-effects
- Better for diverse ancestries or methodologies
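The text does not name a specific random-effects estimator. The sketch below assumes the widely used DerSimonian-Laird method, in which a between-study variance tau² is estimated from Cochran's Q and added to each study's variance before weighting. The function name and inputs are hypothetical.

```python
import math

def random_effects_dl(betas, ses):
    """DerSimonian-Laird random-effects estimate: returns (beta, se, tau2)."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))  # Cochran's Q
    df = len(betas) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]  # inflate each variance by tau2
    sw_star = sum(w_star)
    beta = sum(wi * b for wi, b in zip(w_star, betas)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    return beta, se, tau2

# Made-up estimates that disagree strongly relative to their standard errors.
beta, se, tau2 = random_effects_dl([0.1, 0.5], [0.1, 0.1])
print(f"beta={beta:.3f} se={se:.3f} tau2={tau2:.3f}")
```

With these toy inputs the pooled standard error is 0.2, compared with about 0.07 under fixed effects, which illustrates why the random-effects model is the more conservative of the two.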
Heterogeneity Assessment:
The I² statistic estimates the percentage of total variation in effect estimates across studies that is due to between-study heterogeneity rather than sampling error:
I² = [(Q - df) / Q] × 100% (set to 0 when Q < df)
where Q = Cochran's Q statistic
df = degrees of freedom (n_studies - 1)
Interpretation Guidelines:
- I² < 25%: Low heterogeneity → fixed-effects appropriate
- I² = 25-50%: Moderate heterogeneity → investigate sources
- I² = 50-75%: Substantial heterogeneity → random-effects preferred
- I² > 75%: Considerable heterogeneity → meta-analysis may not be appropriate
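The formula and thresholds above translate directly into code. In this sketch the function names are hypothetical, and the handling of the exact boundary values (25, 50, 75) is an assumption, since the guideline ranges overlap at their endpoints.

```python
def cochran_q(betas, ses):
    """Cochran's Q: weighted squared deviations from the fixed-effects estimate."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))

def i_squared(q, n_studies):
    """I² as a percentage, truncated at 0 when Q does not exceed df."""
    df = n_studies - 1
    if q <= 0 or q <= df:
        return 0.0
    return (q - df) / q * 100.0

def interpret(i2):
    """Map I² to the guideline categories (boundary assignment is a choice)."""
    if i2 < 25:
        return "low"
    if i2 < 50:
        return "moderate"
    if i2 <= 75:
        return "substantial"
    return "considerable"

# Made-up estimates: Q is about 8 with df = 1, so I² is about 87.5%.
q = cochran_q([0.1, 0.5], [0.1, 0.1])
i2 = i_squared(q, n_studies=2)
print(f"Q={q:.2f} I2={i2:.1f}% -> {interpret(i2)}")
```

With only a handful of studies, I² is estimated imprecisely, so the category is best read as a rough guide.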
Sources of Heterogeneity
Common reasons for high I²:
- Ancestry differences: Different allele frequencies and LD structure
- Phenotype heterogeneity: Trait definition varies across studies
- Platform differences: Imputation quality and coverage
- Winner's curse: Discovery studies overestimate effect sizes
- Cohort characteristics: Age, sex, environmental factors
Recommendations:
- Perform subgroup analysis by ancestry
- Use meta-regression to investigate sources
- Consider excluding outlier studies
- Apply genomic control correction
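For the genomic control recommendation, a common formulation divides each 1-df chi-square test statistic by the inflation factor λ, defined as the median observed statistic over the expected median under the null (about 0.4549). The sketch below assumes that formulation; the function name and the convention of never correcting when λ < 1 are assumptions of this example.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics); no deflation is applied if lambda < 1."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    divisor = max(lam, 1.0)
    return lam, [x / divisor for x in chi2_stats]

# Made-up test statistics whose median is inflated by 10%.
stats = [0.1, CHI2_1DF_MEDIAN * 1.1, 5.0]
lam, corrected = genomic_control(stats)
print(f"lambda={lam:.2f}", [round(x, 3) for x in corrected])
```

In practice λ is computed from genome-wide statistics within each study before pooling; three values are used here only to keep the example readable.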
Study Quality Assessment
Quality Metrics
The skill evaluates studies based on:
1. Sample Size:
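- Power to detect associations (as a rough guide, 80% power at OR = 1.2 requires n > 10,000; the exact requirement depends on allele frequency and the significance threshold)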
- Precision of effect size estimates
- Ability to detect modest effects
2. Ancestry Diversity:
- Single-ancestry vs multi-ancestry
- Population stratification control
- Transferability of findings
3. Data Availability:
- Summary statistics available for meta-analysis
- Individual-level data vs summary-level
- Imputation quality scores
4. Genotyping Quality:
- Platform density and coverage
- Imputation reference panel
- Quality control measures
5. Statistical Rigor:
- Genome-wide significance threshold (p < 5×10⁻⁸)
- Multiple testing correction
- Replication in independent cohort
Quality Tiers
Tier 1 (High Quality):
- n ≥ 50,000
- Summary statistics available
- Multi-ancestry or large single-ancestry
- Imputed to high-quality reference
- Independent replication
Tier 2 (Moderate Quality):
- n ≥ 10,000
- Standard GWAS platform
- Adequate power for common variants
- Some data availability
Tier 3 (Limited):
- n < 10,000
- Limited power
- May miss modest effects
- Use with caution
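The tiers can be read as a simple decision rule. The sketch below is one possible reading, assuming Tier 1 requires the sample-size threshold together with available summary statistics and independent replication. The text lists these criteria without stating whether all are mandatory, so the function name and the exact combination are assumptions.

```python
def assign_tier(n, has_summary_stats=False, has_replication=False):
    """Assign a quality tier (1 = high, 2 = moderate, 3 = limited)."""
    if n >= 50_000 and has_summary_stats and has_replication:
        return 1
    if n >= 10_000:
        return 2
    return 3

# Hypothetical studies described only by the attributes used in this sketch.
print(assign_tier(120_000, has_summary_stats=True, has_replication=True))
print(assign_tier(120_000, has_summary_stats=False))
print(assign_tier(4_000, has_summary_stats=True, has_replication=True))
```

Criteria such as imputation reference quality or ancestry composition are left out because they do not reduce to a single numeric cutoff.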
Best Practices
Before Meta-Analysis
- Check phenotype consistency: Ensure studies measure the same trait
- Verify ancestry overlap: High heterogeneity expected if ancestries differ
- Harmonize alleles: Align effect alleles across studies
- Quality control: Exclude low-quality studies or associations
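Allele harmonization is the step most easily gotten wrong, so a small sketch may help. It assumes biallelic SNPs with single-letter alleles and a chosen reference orientation; the function name and the policy of dropping strand-ambiguous (A/T, C/G) variants are assumptions of this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def harmonize(beta, effect, other, ref_effect, ref_other):
    """Return beta aligned to ref_effect, or None if the alleles cannot be matched."""
    flipped = (COMPLEMENT[effect], COMPLEMENT[other])
    if {effect, other} == set(flipped):
        return None  # palindromic SNP: strand cannot be resolved from alleles alone
    for e, o in ((effect, other), flipped):  # try reported strand, then reverse strand
        if (e, o) == (ref_effect, ref_other):
            return beta                      # same orientation
        if (e, o) == (ref_other, ref_effect):
            return -beta                     # effect and other allele are swapped
    return None                              # alleles do not correspond

print(harmonize(0.3, "T", "C", "T", "C"))  # already aligned
print(harmonize(0.3, "C", "T", "T", "C"))  # swapped alleles, sign flips
print(harmonize(0.3, "A", "G", "T", "C"))  # opposite strand, same orientation
print(harmonize(0.3, "A", "T", "A", "T"))  # palindromic, dropped
```

Palindromic variants can sometimes be rescued using allele frequencies, but that requires frequency data from both studies and is not shown here.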
Interpreting Results
- Genome-wide significance: p < 5×10⁻⁸ (Bonferroni for ~1M independent tests)
- Replication threshold: p < 0.05 in independent cohort
- Direction consistency: Effect should be same direction across studies
- Heterogeneity: I² > 50% suggests caution in interpretation
Common Pitfalls
❌ Don't:
- Meta-analyze without checking heterogeneity
- Ignore ancestry differences
- Over-interpret nominal p-values
- Assume replication failure means false positive
✅ Do:
- Always report I² statistic
- Perform sensitivity analyses
- Consider ancestry-stratified analysis
- Account for winner's curse in discovery studies
Limitations & Caveats
Data Limitations
- Incomplete Overlap: Studies may analyze different SNPs
- Cohort Overlap: Some cohorts participate in multiple studies (inflates significance)
- Publication Bias: Significant findings more likely to be published
- Winner's Curse: Discovery studies overestimate effect sizes
- Imputation Quality: Varies across studies and populations
Statistical Limitations
- Heterogeneity: High I² may preclude meaningful meta-analysis
- Sample Size Differences: Large studies dominate fixed-effects models
- Allele Frequency Differences: The same variant may show different estimated effects, and be detected with different power, across ancestries
- Linkage Disequilibrium: Fine-mapping needed to identify causal variants
- Gene-Environment Interactions: Not captured in standard meta-analysis
Interpretation Guidelines
When I² > 75%:
- Meta-analysis results should be interpreted with extreme caution
- Investigate sources of heterogeneity systematically
- Consider ancestry-specific or subgroup analyses
- Descriptive comparison may be more appropriate than meta-analysis
When Studies Conflict:
- Check for methodological differences
- Verify phenotype definitions match
- Investigate population stratification
- Consider conditional analysis
Scientific References
Key Publications
- GWAS Best Practices:
  - Visscher et al. (2017). "10 Years of GWAS Discovery" American Journal of Human Genetics 101(1): 5-22
  - PMID: 28686856
  - DOI: 10.1016/j.ajhg.2017.06.005
- Meta-Analysis Methods:
  - Evangelou & Ioannidis (2013). "Meta-analysis methods for genome-wide association studies and beyond" Nature Reviews Genetics 14: 379-389
  - PMID: 23657481
- Heterogeneity Interpretation:
  - Higgins et al. (2003). "Measuring inconsistency in meta-analyses" BMJ 327: 557-560
  - PMID: 12958120
- Multi-Ancestry GWAS:
  - Peterson et al. (2019). "Genome-wide Association Studies in Ancestrally Diverse Populations" Nature Reviews Genetics 20: 409-422
  - PMID: 30926972
- Replication Standards:
  - Chanock et al. (2007). "Replicating genotype-phenotype associations" Nature 447: 655-660
  - PMID: 17554299
Tools Used
GWAS Catalog API
- gwas_search_studies: Find studies by trait
- gwas_get_study_by_id: Get detailed study metadata
- gwas_get_associations_for_study: Retrieve study associations
- gwas_get_associations_for_snp: Get SNP associations across studies
- gwas_search_associations: Search associations by trait
Open Targets Genetics GraphQL API
- OpenTargets_search_gwas_studies_by_disease: Disease-based study search
- OpenTargets_get_gwas_study: Detailed study information with LD populations
- OpenTargets_get_variant_credible_sets: Fine-mapped loci for variant
- OpenTargets_get_study_credible_sets: All credible sets for study
- OpenTargets_get_variant_info: Variant annotation and allele frequencies
Glossary
Association: Statistical relationship between a genetic variant and a trait
Credible Set: Set of variants likely to contain the causal variant (from fine-mapping)
Effect Size: Magnitude of genetic association (beta coefficient or odds ratio)
Fine-Mapping: Statistical method to identify causal variants within a locus
Genome-Wide Significance: p < 5×10⁻⁸, accounting for ~1M independent tests
Heterogeneity (I²): Percentage of variance due to between-study differences
L2G (Locus-to-Gene): Score predicting which gene is affected by a GWAS locus
LD (Linkage Disequilibrium): Non-random association of alleles at different loci
Meta-Analysis: Statistical combination of results from multiple studies
Replication: Independent confirmation of an association in a new cohort
Summary Statistics: Per-SNP statistics (p-value, beta, SE) from GWAS
Winner's Curse: Overestimation of effect size in discovery studies
Next Steps
After running this skill, consider:
- Fine-Mapping: Use credible sets from Open Targets to identify causal variants
- Functional Follow-Up: Investigate biological mechanisms of replicated loci
- Genetic Risk Scores: Calculate polygenic risk scores using validated loci
- Drug Target Identification: Use L2G scores to prioritize therapeutic targets
- Cross-Trait Analysis: Look for pleiotropy with related traits
Version History
- v1.0 (2026-02-13): Initial release with study comparison, meta-analysis, and replication assessment
Created by: ToolUniverse GWAS Analysis Team
Last Updated: 2026-02-13
License: Open source (MIT)