tooluniverse-gwas-trait-to-gene

GWAS Trait-to-Gene Discovery

Discover genes associated with diseases and traits using genome-wide association studies (GWAS)

Overview

This skill enables systematic discovery of genes linked to diseases/traits by analyzing GWAS data from two major resources:
  • GWAS Catalog (EBI/NHGRI): Curated catalog of published GWAS with >500,000 associations
  • Open Targets Genetics: Fine-mapped GWAS signals with locus-to-gene (L2G) predictions

Use Cases

Clinical Research
  • "What genes are associated with type 2 diabetes?"
  • "Find genetic risk factors for coronary artery disease"
  • "Which genes contribute to Alzheimer's disease susceptibility?"
Drug Target Discovery
  • Identify genes with strong genetic evidence for disease causation
  • Prioritize targets based on L2G scores and replication across studies
  • Find genes with genome-wide significant associations (p < 5e-8)
Functional Genomics
  • Map disease-associated variants to candidate genes
  • Analyze genetic architecture of complex traits
  • Understand polygenic disease mechanisms

Workflow

1. Trait Search → Search GWAS Catalog by disease/trait name
2. SNP Aggregation → Collect genome-wide significant SNPs (p < 5e-8)
3. Gene Mapping → Extract mapped genes from associations
4. Evidence Ranking → Score by p-value, replication, fine-mapping
5. Annotation (Optional) → Add L2G predictions from Open Targets
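
Steps 2 to 4 can be sketched in plain Python. This is a minimal illustration rather than the skill's actual implementation: the record fields (`rs_id`, `p_value`, `mapped_genes`, `study`), the `aggregate_genes` helper, and the toy records are all hypothetical, and real GWAS Catalog responses will be shaped differently.

```python
from collections import defaultdict

GENOME_WIDE_P = 5e-8  # significance threshold from step 2


def aggregate_genes(associations, p_threshold=GENOME_WIDE_P):
    """Filter significant associations, group them by mapped gene, and rank.

    `associations` is an iterable of dicts with the hypothetical keys
    rs_id, p_value, mapped_genes, and study.
    """
    genes = defaultdict(
        lambda: {"min_p_value": 1.0, "snps": set(), "studies": set()}
    )
    for assoc in associations:
        if assoc["p_value"] >= p_threshold:
            continue  # step 2: keep genome-wide significant SNPs only
        for symbol in assoc["mapped_genes"]:  # step 3: gene mapping
            entry = genes[symbol]
            entry["min_p_value"] = min(entry["min_p_value"], assoc["p_value"])
            entry["snps"].add(assoc["rs_id"])
            entry["studies"].add(assoc["study"])

    ranked = [
        {
            "symbol": symbol,
            "min_p_value": entry["min_p_value"],
            "evidence_count": len(entry["studies"]),
            "snps": sorted(entry["snps"]),
            "studies": sorted(entry["studies"]),
        }
        for symbol, entry in genes.items()
    ]
    # step 4: strongest p-value first, then the most replicated gene
    ranked.sort(key=lambda g: (g["min_p_value"], -g["evidence_count"]))
    return ranked


# Toy records with placeholder identifiers, not real GWAS data.
toy_associations = [
    {"rs_id": "rs0000001", "p_value": 1e-20, "mapped_genes": ["GENE_A"], "study": "STUDY_1"},
    {"rs_id": "rs0000002", "p_value": 3e-9, "mapped_genes": ["GENE_A", "GENE_B"], "study": "STUDY_2"},
    {"rs_id": "rs0000003", "p_value": 1e-5, "mapped_genes": ["GENE_C"], "study": "STUDY_1"},
]
ranked_genes = aggregate_genes(toy_associations)
```

Here `GENE_C` is dropped because its only association misses the threshold, and `GENE_A` ranks first with two supporting studies.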

Key Concepts

Genome-wide Significance
  • Standard threshold: p < 5×10⁻⁸
  • Accounts for multiple testing burden across ~1M common variants
  • Higher confidence: p < 5×10⁻¹⁰ or replicated across studies
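
The standard threshold follows from a Bonferroni correction. The arithmetic below assumes a family-wise error rate of 0.05, which the text above does not state explicitly, divided by roughly one million independent common-variant tests.

```python
import math

ALPHA = 0.05                     # assumed family-wise error rate
N_INDEPENDENT_TESTS = 1_000_000  # ~1M common variants, as stated above

# Bonferroni: divide the error rate by the number of tests.
genome_wide_threshold = ALPHA / N_INDEPENDENT_TESTS

print(f"{genome_wide_threshold:.0e}")  # prints 5e-08
```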
Gene Mapping Methods
  • Positional: Nearest gene to lead SNP
  • Fine-mapping: Statistical refinement to credible variants
  • Locus-to-Gene (L2G): Integrative score combining multiple evidence types
Evidence Confidence Levels
  • High: L2G score > 0.5 OR multiple studies with p < 5e-10
  • Medium: 2+ studies with p < 5e-8
  • Low: Single study or marginal significance
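
The confidence rules above can be written as a small function. This is a sketch under two assumptions: "multiple studies" is read as two or more, and the helper name `classify_confidence` and its inputs (one best p-value per study) are hypothetical rather than part of the skill.

```python
from typing import Optional, Sequence


def classify_confidence(
    l2g_score: Optional[float], study_p_values: Sequence[float]
) -> str:
    """Map an L2G score and per-study p-values to High, Medium, or Low."""
    strict_hits = sum(1 for p in study_p_values if p < 5e-10)
    significant_hits = sum(1 for p in study_p_values if p < 5e-8)

    # High: L2G score > 0.5 OR multiple studies with p < 5e-10
    if (l2g_score is not None and l2g_score > 0.5) or strict_hits >= 2:
        return "High"
    # Medium: 2+ studies with p < 5e-8
    if significant_hits >= 2:
        return "Medium"
    # Low: single study or marginal significance
    return "Low"
```

For example, `classify_confidence(0.3, [1e-9, 2e-9])` returns `"Medium"`: the L2G score is below 0.5 and neither p-value passes 5e-10, but both studies are genome-wide significant.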

Required ToolUniverse Tools

GWAS Catalog (11 tools)

  • gwas_get_associations_for_trait
    - Get all associations for a trait (sorted by p-value)
  • gwas_search_snps
    - Search SNPs by gene mapping
  • gwas_get_snp_by_id
    - Get SNP details (MAF, consequence, location)
  • gwas_get_study_by_id
    - Get study metadata
  • gwas_search_associations
    - Search associations with filters
  • gwas_search_studies
    - Search studies by trait/cohort
  • gwas_get_associations_for_snp
    - Get all associations for a SNP
  • gwas_get_variants_for_trait
    - Get variants for a trait
  • gwas_get_studies_for_trait
    - Get studies for a trait
  • gwas_get_snps_for_gene
    - Get SNPs mapped to a gene
  • gwas_get_associations_for_study
    - Get associations from a study

Open Targets Genetics (6 tools)

  • OpenTargets_search_gwas_studies_by_disease
    - Search studies by disease ontology
  • OpenTargets_get_study_credible_sets
    - Get fine-mapped loci for a study
  • OpenTargets_get_variant_credible_sets
    - Get credible sets for a variant
  • OpenTargets_get_variant_info
    - Get variant annotation (frequencies, consequences)
  • OpenTargets_get_gwas_study
    - Get study metadata
  • OpenTargets_get_credible_set_detail
    - Get detailed credible set information

Parameters

Required
  • trait
    - Disease/trait name (e.g., "type 2 diabetes", "coronary artery disease")
Optional
  • p_value_threshold
    - Significance threshold (default: 5e-8)
  • min_evidence_count
    - Minimum number of studies (default: 1)
  • max_results
    - Maximum genes to return (default: 100)
  • use_fine_mapping
    - Include L2G predictions (default: true)
  • disease_ontology_id
    - Disease ontology ID for Open Targets (e.g., "MONDO_0005148")
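
One way to carry these parameters around with their defaults is a small dataclass. The class name `DiscoveryParams`, the validation rules, and the `None` default for `disease_ontology_id` are assumptions made for illustration; only the field names and the other defaults come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscoveryParams:
    """Hypothetical container for the skill's parameters."""

    trait: str                                  # required
    p_value_threshold: float = 5e-8
    min_evidence_count: int = 1
    max_results: int = 100
    use_fine_mapping: bool = True
    disease_ontology_id: Optional[str] = None   # assumed default

    def __post_init__(self) -> None:
        # Basic sanity checks; these rules are illustrative assumptions.
        if not self.trait.strip():
            raise ValueError("trait must be a non-empty string")
        if not 0 < self.p_value_threshold <= 1:
            raise ValueError("p_value_threshold must be in (0, 1]")
        if self.min_evidence_count < 1:
            raise ValueError("min_evidence_count must be at least 1")
        if self.max_results < 1:
            raise ValueError("max_results must be at least 1")


params = DiscoveryParams("type 2 diabetes", disease_ontology_id="MONDO_0005148")
```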

Output Schema

python
{
  "genes": [
    {
      "symbol": str,              # Gene symbol (e.g., "TCF7L2")
      "min_p_value": float,       # Most significant p-value
      "evidence_count": int,      # Number of independent studies
      "snps": [str],              # Associated SNP rs IDs
      "studies": [str],           # GWAS study accessions
      "l2g_score": float | None,  # Locus-to-gene score (0-1); None when unavailable
      "credible_sets": int,       # Number of credible sets
      "confidence_level": str     # "High", "Medium", or "Low"
    }
  ],
  "summary": {
    "trait": str,
    "total_associations": int,
    "significant_genes": int,
    "data_sources": ["GWAS Catalog", "Open Targets"]
  }
}
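
A result in this shape can be post-processed with ordinary dictionary access. The sketch below assumes the schema above; the helper name `top_targets` and the sample result with placeholder gene symbols are hypothetical. It skips genes without an L2G score, which matters when fine-mapping is unavailable.

```python
def top_targets(result, min_l2g=0.5, limit=10):
    """Return symbols of genes whose L2G score exceeds min_l2g, best p-value first."""
    hits = [
        gene
        for gene in result["genes"]
        if gene["l2g_score"] is not None and gene["l2g_score"] > min_l2g
    ]
    hits.sort(key=lambda gene: gene["min_p_value"])
    return [gene["symbol"] for gene in hits[:limit]]


# Placeholder result following the schema; values are not real GWAS data.
sample_result = {
    "genes": [
        {"symbol": "GENE_A", "min_p_value": 1e-12, "evidence_count": 3,
         "snps": ["rs0000001"], "studies": ["STUDY_1"], "l2g_score": 0.7,
         "credible_sets": 2, "confidence_level": "High"},
        {"symbol": "GENE_B", "min_p_value": 1e-30, "evidence_count": 1,
         "snps": ["rs0000002"], "studies": ["STUDY_2"], "l2g_score": None,
         "credible_sets": 0, "confidence_level": "Low"},
        {"symbol": "GENE_C", "min_p_value": 1e-20, "evidence_count": 4,
         "snps": ["rs0000003"], "studies": ["STUDY_3"], "l2g_score": 0.9,
         "credible_sets": 1, "confidence_level": "High"},
    ],
    "summary": {"trait": "example trait", "total_associations": 3,
                "significant_genes": 3,
                "data_sources": ["GWAS Catalog", "Open Targets"]},
}
targets = top_targets(sample_result)
```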

Example Results

Type 2 Diabetes
TCF7L2:  p=1.2e-98, 15 studies, L2G=0.82 → High confidence
KCNJ11:  p=3.4e-67, 12 studies, L2G=0.76 → High confidence
PPARG:   p=2.1e-45, 8 studies,  L2G=0.71 → High confidence
FTO:     p=5.6e-42, 10 studies, L2G=0.68 → High confidence
IRS1:    p=8.9e-38, 6 studies,  L2G=0.54 → High confidence
Alzheimer's Disease
APOE:    p=1.0e-450, 25 studies, L2G=0.95 → High confidence
BIN1:    p=2.3e-89,  18 studies, L2G=0.88 → High confidence
CLU:     p=4.5e-67,  16 studies, L2G=0.82 → High confidence
ABCA7:   p=6.7e-54,  14 studies, L2G=0.79 → High confidence
CR1:     p=8.9e-52,  13 studies, L2G=0.75 → High confidence

Best Practices

1. Use Disease Ontology IDs for Precision

Instead of:

python
discover_gwas_genes("diabetes")  # Ambiguous

Use:

python
discover_gwas_genes(
    "type 2 diabetes",
    disease_ontology_id="MONDO_0005148"  # Specific
)

2. Filter by Evidence Strength

For drug targets, require strong evidence:

python
discover_gwas_genes(
    "coronary artery disease",
    p_value_threshold=5e-10,  # Stricter than the standard 5e-8 threshold
    min_evidence_count=3,     # Multiple independent studies
    use_fine_mapping=True     # Include L2G predictions
)

3. Interpret Results Carefully
  • Association ≠ Causation: GWAS identifies correlated variants, not necessarily causal genes
  • Linkage Disequilibrium: The lead SNP may tag the true causal variant in a nearby gene
  • Fine-mapping: L2G scores provide better causal gene evidence than positional mapping
  • Functional Evidence: Validate with orthogonal data (eQTLs, knockout models, etc.)

Limitations

  1. Gene Mapping Uncertainty
    • Positional mapping assigns SNPs to nearest gene (may be incorrect)
    • Fine-mapping available for only a subset of studies
    • Intergenic variants difficult to map
  2. Population Bias
    • Most GWAS in European populations
    • Effect sizes may differ across ancestries
    • Rare variants often under-represented
  3. Sample Size Dependence
    • Larger studies detect more associations
    • Older small studies may have false negatives
    • p-values alone don't indicate effect size
  4. Validation Bug
    • Some ToolUniverse tools have oneOf validation issues
    • Use the validate=False parameter if needed
    • This is automatically handled in the Python implementation

Related Skills

  • Variant-to-Disease Association: Look up specific SNPs (e.g., rs7903146 → T2D)
  • Gene-to-Disease Links: Find diseases associated with known genes
  • Drug Target Prioritization: Rank targets by genetic evidence
  • Population Genetics Analysis: Compare allele frequencies across populations

Data Sources

GWAS Catalog
  • Curator: EBI and NHGRI
  • URL: https://www.ebi.ac.uk/gwas/
  • Coverage: Thousands of publications, 500,000+ associations
  • Update Frequency: Weekly
Open Targets Genetics
  • Curator: Open Targets consortium
  • URL: https://genetics.opentargets.org/
  • Coverage: Fine-mapped GWAS, L2G predictions, QTL colocalization
  • Update Frequency: Quarterly

Citation

If you use this skill in research, please cite:
Buniello A, et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide
association studies. Nucleic Acids Research, 47(D1):D1005-D1012.

Mountjoy E, et al. (2021) An open approach to systematically prioritize causal
variants and genes at all published human GWAS trait-associated loci.
Nature Genetics, 53:1527-1533.

Support

For issues with:
  • Skill functionality: Open issue at tooluniverse/skills
  • GWAS data: Contact GWAS Catalog or Open Targets support
  • Tool errors: Check ToolUniverse tool status