tooluniverse-gwas-trait-to-gene
GWAS Trait-to-Gene Discovery
Discover genes associated with diseases and traits using genome-wide association studies (GWAS)
Overview
This skill enables systematic discovery of genes linked to diseases/traits by analyzing GWAS data from two major resources:
- GWAS Catalog (EBI/NHGRI): Curated catalog of published GWAS with >500,000 associations
- Open Targets Genetics: Fine-mapped GWAS signals with locus-to-gene (L2G) predictions
Use Cases
Clinical Research
- "What genes are associated with type 2 diabetes?"
- "Find genetic risk factors for coronary artery disease"
- "Which genes contribute to Alzheimer's disease susceptibility?"
Drug Target Discovery
- Identify genes with strong genetic evidence for disease causation
- Prioritize targets based on L2G scores and replication across studies
- Find genes with genome-wide significant associations (p < 5e-8)
Functional Genomics
- Map disease-associated variants to candidate genes
- Analyze genetic architecture of complex traits
- Understand polygenic disease mechanisms
Workflow
1. Trait Search → Search GWAS Catalog by disease/trait name
↓
2. SNP Aggregation → Collect genome-wide significant SNPs (p < 5e-8)
↓
3. Gene Mapping → Extract mapped genes from associations
↓
4. Evidence Ranking → Score by p-value, replication, fine-mapping
↓
5. Annotation (Optional) → Add L2G predictions from Open Targets
Key Concepts
Genome-wide Significance
- Standard threshold: p < 5×10⁻⁸
- Accounts for the multiple-testing burden across ~1M independent common variants (a Bonferroni-style correction)
- Higher confidence: p < 5×10⁻¹⁰ or replicated across studies
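The threshold above can be checked with a line of arithmetic. This is a minimal sketch, assuming the conventional family-wise error rate of 0.05 (not stated in this document) divided over roughly one million independent tests; the helper name is hypothetical.

```python
import math

ALPHA = 0.05                   # assumed family-wise error rate (conventional choice)
INDEPENDENT_TESTS = 1_000_000  # ~1M independent common variants

# Bonferroni correction: divide the error rate by the number of tests.
threshold = ALPHA / INDEPENDENT_TESTS


def is_genome_wide_significant(p_value: float, cutoff: float = 5e-8) -> bool:
    """Return True when the p-value is strictly below the cutoff."""
    return p_value < cutoff
```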
Gene Mapping Methods
- Positional: Nearest gene to lead SNP
- Fine-mapping: Statistical refinement to credible variants
- Locus-to-Gene (L2G): Integrative score combining multiple evidence types
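Positional mapping is the simplest of these methods and can be sketched in a few lines. The `Gene` record, the gene symbols, and the coordinates below are hypothetical placeholders, not real annotations; a variant inside a gene body is given distance zero.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance from a position to the gene body (0 if the position is inside)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def nearest_gene(chrom: str, pos: int, genes: Sequence[Gene]) -> Optional[Gene]:
    """Pick the closest gene on the same chromosome, or None if there is none."""
    candidates = [g for g in genes if g.chrom == chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda g: distance_to_gene(pos, g))


# Placeholder genes with made-up coordinates.
demo_genes = [
    Gene("GENE_A", "10", 1_000, 5_000),
    Gene("GENE_B", "10", 20_000, 30_000),
    Gene("GENE_C", "11", 1_000, 2_000),
]
```

An intergenic variant at position 14,000 on chromosome 10 would be assigned to GENE_B here purely by distance, which is exactly why positional mapping can pick the wrong gene.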
Evidence Confidence Levels
- High: L2G score > 0.5 OR multiple studies with p < 5e-10
- Medium: 2+ studies with p < 5e-8
- Low: Single study or marginal significance
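The three levels above can be written as a small classifier. This is a sketch under stated assumptions: "multiple" is read as at least two studies, each entry in `study_p_values` is the best p-value from one independent study, and the function name is hypothetical.

```python
from typing import Optional, Sequence

GENOME_WIDE = 5e-8
STRICT = 5e-10
L2G_HIGH = 0.5


def classify_confidence(study_p_values: Sequence[float],
                        l2g_score: Optional[float] = None) -> str:
    """Map per-study p-values and an optional L2G score to a confidence level."""
    strict_hits = sum(p < STRICT for p in study_p_values)
    genome_wide_hits = sum(p < GENOME_WIDE for p in study_p_values)

    # High: L2G score > 0.5 OR multiple studies with p < 5e-10
    if (l2g_score is not None and l2g_score > L2G_HIGH) or strict_hits >= 2:
        return "High"
    # Medium: 2+ studies with p < 5e-8
    if genome_wide_hits >= 2:
        return "Medium"
    # Low: single study or marginal significance
    return "Low"
```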
Required ToolUniverse Tools
GWAS Catalog (11 tools)
- gwas_get_associations_for_trait: Get all associations for a trait (sorted by p-value)
- gwas_search_snps: Search SNPs by gene mapping
- gwas_get_snp_by_id: Get SNP details (MAF, consequence, location)
- gwas_get_study_by_id: Get study metadata
- gwas_search_associations: Search associations with filters
- gwas_search_studies: Search studies by trait/cohort
- gwas_get_associations_for_snp: Get all associations for a SNP
- gwas_get_variants_for_trait: Get variants for a trait
- gwas_get_studies_for_trait: Get studies for a trait
- gwas_get_snps_for_gene: Get SNPs mapped to a gene
- gwas_get_associations_for_study: Get associations from a study
Open Targets Genetics (6 tools)
- OpenTargets_search_gwas_studies_by_disease: Search studies by disease ontology
- OpenTargets_get_study_credible_sets: Get fine-mapped loci for a study
- OpenTargets_get_variant_credible_sets: Get credible sets for a variant
- OpenTargets_get_variant_info: Get variant annotation (frequencies, consequences)
- OpenTargets_get_gwas_study: Get study metadata
- OpenTargets_get_credible_set_detail: Get detailed credible set information
Parameters
Required
- trait: Disease/trait name (e.g., "type 2 diabetes", "coronary artery disease")
Optional
- p_value_threshold: Significance threshold (default: 5e-8)
- min_evidence_count: Minimum number of studies (default: 1)
- max_results: Maximum genes to return (default: 100)
- use_fine_mapping: Include L2G predictions (default: true)
- disease_ontology_id: Disease ontology ID for Open Targets (e.g., "MONDO_0005148")
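To show how the optional parameters might shape workflow steps 2 to 4, here is a minimal offline sketch. It is not the skill's actual implementation: `rank_genes` and the record shape (`gene`, `rsid`, `study`, `p_value`) are assumptions for illustration, and the demo records are placeholders rather than real GWAS results.

```python
from collections import defaultdict


def rank_genes(associations, p_value_threshold=5e-8,
               min_evidence_count=1, max_results=100):
    """Aggregate per-SNP association records into ranked per-gene summaries."""
    hits_by_gene = defaultdict(list)
    for assoc in associations:
        # Step 2: keep only associations below the significance threshold.
        if assoc["p_value"] < p_value_threshold:
            # Step 3: group by the gene already mapped to the association.
            hits_by_gene[assoc["gene"]].append(assoc)

    genes = []
    for symbol, hits in hits_by_gene.items():
        studies = sorted({h["study"] for h in hits})
        if len(studies) < min_evidence_count:
            continue
        genes.append({
            "symbol": symbol,
            "min_p_value": min(h["p_value"] for h in hits),
            "evidence_count": len(studies),
            "snps": sorted({h["rsid"] for h in hits}),
            "studies": studies,
        })

    # Step 4: rank by strongest p-value, breaking ties by study count.
    genes.sort(key=lambda g: (g["min_p_value"], -g["evidence_count"]))
    return genes[:max_results]


# Placeholder records (hypothetical genes, SNP IDs, and study accessions).
demo_associations = [
    {"gene": "GENE_A", "rsid": "rs_demo_1", "study": "STUDY_1", "p_value": 1e-20},
    {"gene": "GENE_A", "rsid": "rs_demo_2", "study": "STUDY_2", "p_value": 1e-12},
    {"gene": "GENE_B", "rsid": "rs_demo_3", "study": "STUDY_1", "p_value": 1e-30},
    {"gene": "GENE_C", "rsid": "rs_demo_4", "study": "STUDY_3", "p_value": 1e-6},
]
ranked = rank_genes(demo_associations)
```

With the defaults, GENE_C is dropped for missing the threshold; raising `min_evidence_count` to 2 would leave only GENE_A.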
Output Schema
python
{
"genes": [
{
"symbol": str, # Gene symbol (e.g., "TCF7L2")
"min_p_value": float, # Most significant p-value
"evidence_count": int, # Number of independent studies
"snps": [str], # Associated SNP rs IDs
"studies": [str], # GWAS study accessions
"l2g_score": float | None, # Locus-to-gene score (0-1); None when no L2G prediction is available
"credible_sets": int, # Number of credible sets
"confidence_level": str # "High", "Medium", or "Low"
}
],
"summary": {
"trait": str,
"total_associations": int,
"significant_genes": int,
"data_sources": ["GWAS Catalog", "Open Targets"]
}
}
Example Results
Type 2 Diabetes
TCF7L2: p=1.2e-98, 15 studies, L2G=0.82 → High confidence
KCNJ11: p=3.4e-67, 12 studies, L2G=0.76 → High confidence
PPARG: p=2.1e-45, 8 studies, L2G=0.71 → High confidence
FTO: p=5.6e-42, 10 studies, L2G=0.68 → High confidence
IRS1: p=8.9e-38, 6 studies, L2G=0.54 → High confidence
Alzheimer's Disease
APOE: p=1.0e-450, 25 studies, L2G=0.95 → High confidence
BIN1: p=2.3e-89, 18 studies, L2G=0.88 → High confidence
CLU: p=4.5e-67, 16 studies, L2G=0.82 → High confidence
ABCA7: p=6.7e-54, 14 studies, L2G=0.79 → High confidence
CR1: p=8.9e-52, 13 studies, L2G=0.75 → High confidence
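One practical note on values like the APOE example (p=1.0e-450): such a p-value is smaller than a double-precision float can hold, so a `float` field would silently store zero. The sketch below shows the underflow and two workarounds. It assumes the p-value is available as a mantissa/exponent pair or as a string, which this document does not specify; `neg_log10_p` is a hypothetical helper.

```python
import math
from decimal import Decimal

# A double cannot represent 1.0e-450; parsing it silently underflows to zero.
underflowed = float("1.0e-450")


def neg_log10_p(mantissa: float, exponent: int) -> float:
    """-log10(mantissa * 10**exponent), computed without forming the product."""
    return -(math.log10(mantissa) + exponent)


apoe = neg_log10_p(1.0, -450)  # APOE example value
bin1 = neg_log10_p(2.3, -89)   # BIN1 example value

# Decimal preserves the ordering when the raw strings must be compared.
apoe_is_smaller = Decimal("1.0e-450") < Decimal("2.3e-89")
```

Ranking on the -log10 scale (larger means more significant) avoids ties at zero among the strongest signals.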
Best Practices
1. Use Disease Ontology IDs for Precision
Instead of:
discover_gwas_genes("diabetes") # Ambiguous
Use:
discover_gwas_genes(
"type 2 diabetes",
disease_ontology_id="MONDO_0005148" # Specific
)
2. Filter by Evidence Strength
For drug targets, require strong evidence:
discover_gwas_genes(
"coronary artery disease",
p_value_threshold=5e-10, # Stricter than GWAS threshold
min_evidence_count=3, # Multiple independent studies
use_fine_mapping=True # Include L2G predictions
)
3. Interpret Results Carefully
- Association ≠ Causation: GWAS identifies correlated variants, not necessarily causal genes
- Linkage Disequilibrium: The lead SNP may tag the true causal variant in a nearby gene
- Fine-mapping: L2G scores provide better causal gene evidence than positional mapping
- Functional Evidence: Validate with orthogonal data (eQTLs, knockout models, etc.)
Limitations
Gene Mapping Uncertainty
- Positional mapping assigns SNPs to the nearest gene (may be incorrect)
- Fine-mapping is available for only a subset of studies
- Intergenic variants are difficult to map
Population Bias
- Most GWAS have been conducted in European populations
- Effect sizes may differ across ancestries
- Rare variants are often under-represented
Sample Size Dependence
- Larger studies detect more associations
- Older small studies may have false negatives
- p-values alone don't indicate effect size
Validation Bug
- Some ToolUniverse tools have oneOf validation issues
- Use the validate=False parameter if needed
- This is automatically handled in the Python implementation
Related Skills
- Variant-to-Disease Association: Look up specific SNPs (e.g., rs7903146 → T2D)
- Gene-to-Disease Links: Find diseases associated with known genes
- Drug Target Prioritization: Rank targets by genetic evidence
- Population Genetics Analysis: Compare allele frequencies across populations
Data Sources
GWAS Catalog
- Curator: EBI and NHGRI
- URL: https://www.ebi.ac.uk/gwas/
- Coverage: Thousands of curated publications, 500,000+ associations
- Update Frequency: Weekly
Open Targets Genetics
- Curator: Open Targets consortium
- URL: https://genetics.opentargets.org/
- Coverage: Fine-mapped GWAS, L2G predictions, QTL colocalization
- Update Frequency: Quarterly
Citation
If you use this skill in research, please cite:
Buniello A, et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide
association studies. Nucleic Acids Research, 47(D1):D1005-D1012.
Mountjoy E, et al. (2021) An open approach to systematically prioritize causal
variants and genes at all published human GWAS trait-associated loci.
Nature Genetics, 53:1527-1533.
Support
For issues with:
- Skill functionality: Open issue at tooluniverse/skills
- GWAS data: Contact GWAS Catalog or Open Targets support
- Tool errors: Check ToolUniverse tool status