gwas-database

GWAS Catalog Database

Overview

The GWAS Catalog is a comprehensive repository of published genome-wide association studies maintained by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EBI). The catalog contains curated SNP-trait associations from thousands of GWAS publications, including genetic variants, associated traits and diseases, p-values, effect sizes, and full summary statistics for many studies.

When to Use This Skill

This skill should be used when queries involve:
  • Genetic variant associations: Finding SNPs associated with diseases or traits
  • SNP lookups: Retrieving information about specific genetic variants (rs IDs)
  • Trait/disease searches: Discovering genetic associations for phenotypes
  • Gene associations: Finding variants in or near specific genes
  • GWAS summary statistics: Accessing complete genome-wide association data
  • Study metadata: Retrieving publication and cohort information
  • Population genetics: Exploring ancestry-specific associations
  • Polygenic risk scores: Identifying variants for risk prediction models
  • Functional genomics: Understanding variant effects and genomic context
  • Systematic reviews: Comprehensive literature synthesis of genetic associations

Core Capabilities

1. Understanding GWAS Catalog Data Structure

The GWAS Catalog is organized around four core entities:
  • Studies: GWAS publications with metadata (PMID, author, cohort details)
  • Associations: SNP-trait associations with statistical evidence (the Catalog curates associations with p < 1×10⁻⁵; genome-wide significance is conventionally p ≤ 5×10⁻⁸)
  • Variants: Genetic markers (SNPs) with genomic coordinates and alleles
  • Traits: Phenotypes and diseases (mapped to EFO ontology terms)
Key Identifiers:
  • Study accessions: GCST IDs (e.g., GCST001234)
  • Variant IDs: rs numbers (e.g., rs7903146) or variant_id format
  • Trait IDs: EFO terms (e.g., EFO_0001360 for type 2 diabetes)
  • Gene symbols: HGNC approved names (e.g., TCF7L2)
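
The identifier formats listed above can be told apart with simple patterns before choosing which endpoint to query. The sketch below is only an illustration: the regular expressions and the `classify_identifier` helper are hypothetical and are inferred from the example IDs in this section, not from an official validation rule. Gene symbols have no fixed pattern, so they fall through to `unknown`.

```python
import re

# Patterns inferred from the example identifiers in this section
# (assumptions, not an official specification).
_PATTERNS = {
    "study": re.compile(r"^GCST\d+$"),
    "variant": re.compile(r"^rs\d+$"),
    "trait": re.compile(r"^EFO_\d+$"),
}


def classify_identifier(value: str) -> str:
    """Return 'study', 'variant', 'trait', or 'unknown' for a query string."""
    text = value.strip()
    for kind, pattern in _PATTERNS.items():
        if pattern.match(text):
            return kind
    return "unknown"
```

For example, `classify_identifier("rs7903146")` yields `"variant"`, while a gene symbol such as `TCF7L2` is reported as `"unknown"` and would need a separate gene lookup.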

2. Web Interface Searches

The web interface at https://www.ebi.ac.uk/gwas/ supports multiple search modes:
By Variant (rs ID):
rs7903146
Returns all trait associations for this SNP.
By Disease/Trait:
type 2 diabetes
Parkinson disease
body mass index
Returns all associated genetic variants.
By Gene:
APOE
TCF7L2
Returns variants in or near the gene region.
By Chromosomal Region:
10:114000000-115000000
Returns variants in the specified genomic interval.
By Publication:
PMID:20581827
Author: McCarthy MI
GCST001234
Returns study details and all reported associations.

3. REST API Access

The GWAS Catalog provides two REST APIs for programmatic access:
Base URLs:
  • GWAS Catalog API:
    https://www.ebi.ac.uk/gwas/rest/api
  • Summary Statistics API:
    https://www.ebi.ac.uk/gwas/summary-statistics/api
API Documentation:
Core Endpoints:
  1. Studies endpoint - /studies/{accessionID}
     ```python
     import requests

     # Get a specific study
     url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
     response = requests.get(url, headers={"Content-Type": "application/json"})
     study = response.json()
     ```
  2. Associations endpoint - /associations
     ```python
     # Find associations for a variant
     variant = "rs7903146"
     url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{variant}/associations"
     params = {"projection": "associationBySnp"}
     response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
     associations = response.json()
     ```
  3. Variants endpoint - /singleNucleotidePolymorphisms/{rsID}
     ```python
     # Get variant details
     url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs7903146"
     response = requests.get(url, headers={"Content-Type": "application/json"})
     variant_info = response.json()
     ```
  4. Traits endpoint - /efoTraits/{efoID}
     ```python
     # Get trait information
     url = "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360"
     response = requests.get(url, headers={"Content-Type": "application/json"})
     trait_info = response.json()
     ```
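
Because the four core endpoints differ only in their path, the URLs can be assembled in one place. The following is a minimal sketch under stated assumptions: `build_url` and the `_ENDPOINTS` table are hypothetical helpers, and the path templates are copied from the endpoint list above rather than from the official API reference.

```python
from urllib.parse import quote

BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api"

# Path templates copied from the endpoint list in this section.
_ENDPOINTS = {
    "study": "/studies/{id}",
    "variant": "/singleNucleotidePolymorphisms/{id}",
    "variant_associations": "/singleNucleotidePolymorphisms/{id}/associations",
    "trait": "/efoTraits/{id}",
}


def build_url(kind: str, identifier: str) -> str:
    """Build a Catalog API URL for one resource.

    Raises KeyError if `kind` is not one of the known endpoint names.
    """
    template = _ENDPOINTS[kind]
    # Percent-encode the identifier so unexpected characters cannot alter the path.
    return BASE_URL + template.format(id=quote(identifier, safe=""))
```

The result can be passed straight to `requests.get`, for instance `requests.get(build_url("trait", "EFO_0001360"))`.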

4. Query Examples and Patterns

**Example 1: Find all associations for a disease**
```python
import requests

trait = "EFO_0001360"  # Type 2 diabetes
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Query associations for this trait
url = f"{base_url}/efoTraits/{trait}/associations"
response = requests.get(url, headers={"Content-Type": "application/json"})
associations = response.json()

# Process results
for assoc in associations.get('_embedded', {}).get('associations', []):
    variant = assoc.get('rsId')
    pvalue = assoc.get('pvalue')
    risk_allele = assoc.get('strongestAllele')
    print(f"{variant}: p={pvalue}, risk allele={risk_allele}")
```

**Example 2: Get variant information and all trait associations**
```python
import requests

variant = "rs7903146"
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Get variant details
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant_data = response.json()

# Get all associations for this variant
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
associations = response.json()

# Extract trait names and p-values
for assoc in associations.get('_embedded', {}).get('associations', []):
    trait = assoc.get('efoTrait')
    pvalue = assoc.get('pvalue')
    print(f"Trait: {trait}, p-value: {pvalue}")
```

**Example 3: Access summary statistics**
```python
import requests

# Query summary statistics API (base URL as listed under "REST API Access")
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"

# Find associations by trait with p-value threshold
trait = "EFO_0001360"  # Type 2 diabetes
p_upper = "0.000000001"  # p < 1e-9
url = f"{base_url}/traits/{trait}/associations"
params = {
    "p_upper": p_upper,
    "size": 100  # Number of results
}
response = requests.get(url, params=params)
results = response.json()

# Process genome-wide significant hits
for hit in results.get('_embedded', {}).get('associations', []):
    variant_id = hit.get('variant_id')
    chromosome = hit.get('chromosome')
    position = hit.get('base_pair_location')
    pvalue = hit.get('p_value')
    print(f"{chromosome}:{position} ({variant_id}): p={pvalue}")
```

**Example 4: Query by chromosomal region**
```python
import requests

# Find variants in a specific genomic region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000

base_url = "https://www.ebi.ac.uk/gwas/rest/api"
url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
params = {
    "chrom": chromosome,
    "bpStart": start_pos,
    "bpEnd": end_pos
}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
variants_in_region = response.json()
```

5. Working with Summary Statistics

The GWAS Catalog hosts full summary statistics for many studies, providing access to all tested variants (not just genome-wide significant hits).
Access Methods:
  1. FTP download: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/
  2. REST API: Query-based access to summary statistics
  3. Web interface: Browse and download via the website
Summary Statistics API Features:
  • Filter by chromosome, position, p-value
  • Query specific variants across studies
  • Retrieve effect sizes and allele frequencies
  • Access harmonized and standardized data
Example: Download summary statistics for a study
```python
import requests
import gzip

# Get available summary statistics
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/studies/GCST001234"
response = requests.get(url)
study_info = response.json()

# Download link is provided in the response
# Alternatively, use FTP:
# ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/
```

6. Data Integration and Cross-referencing

The GWAS Catalog provides links to external resources:
Genomic Databases:
  • Ensembl: Gene annotations and variant consequences
  • dbSNP: Variant identifiers and population frequencies
  • gnomAD: Population allele frequencies
Functional Resources:
  • Open Targets: Target-disease associations
  • PGS Catalog: Polygenic risk scores
  • UCSC Genome Browser: Genomic context
Phenotype Resources:
  • EFO (Experimental Factor Ontology): Standardized trait terms
  • OMIM: Disease gene relationships
  • Disease Ontology: Disease hierarchies
Following Links in API Responses:
```python
import requests

# API responses include _links for related resources
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001234")
study = response.json()

# Follow link to associations
associations_url = study['_links']['associations']['href']
associations_response = requests.get(associations_url)
```

Query Workflows

Workflow 1: Exploring Genetic Associations for a Disease

  1. Identify the trait using EFO terms or free text:
    • Search web interface for disease name
    • Note the EFO ID (e.g., EFO_0001360 for type 2 diabetes)
  2. Query associations via API:
    python
    url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{efo_id}/associations"
  3. Filter by significance and population:
    • Check p-values (genome-wide significant: p ≤ 5×10⁻⁸)
    • Review ancestry information in study metadata
    • Filter by sample size or discovery/replication status
  4. Extract variant details:
    • rs IDs for each association
    • Effect alleles and directions
    • Effect sizes (odds ratios, beta coefficients)
    • Population allele frequencies
  5. Cross-reference with other databases:
    • Look up variant consequences in Ensembl
    • Check population frequencies in gnomAD
    • Explore gene function and pathways
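
Steps 3 and 4 of this workflow can be sketched as a small post-processing function. It assumes the association records are dictionaries carrying the `pvalue`, `rsId`, and `strongestAllele` fields used elsewhere in this document. `significant_hits` is a hypothetical helper, and the records in the usage note are made up for illustration.

```python
def significant_hits(associations, threshold=5e-8):
    """Keep associations at or below `threshold`, sorted by ascending p-value.

    `associations` is an iterable of dicts such as those found under
    `_embedded.associations` in an API response (field names assumed).
    """
    hits = []
    for assoc in associations:
        raw = assoc.get("pvalue")
        if raw is None:
            continue  # records without a p-value cannot be ranked
        p = float(raw)
        if p <= threshold:
            hits.append({
                "rsId": assoc.get("rsId"),
                "pvalue": p,
                "risk_allele": assoc.get("strongestAllele"),
            })
    return sorted(hits, key=lambda hit: hit["pvalue"])
```

Given made-up records with p-values of 1e-12, 1e-6, and a missing value, only the first would be returned at the default threshold.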

Workflow 2: Investigating a Specific Genetic Variant

  1. Query the variant:
    python
    url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"
  2. Retrieve all trait associations:
    python
    url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}/associations"
  3. Analyze pleiotropy:
    • Identify all traits associated with this variant
    • Review effect directions across traits
    • Look for shared biological pathways
  4. Check genomic context:
    • Determine nearby genes
    • Identify if variant is in coding/regulatory regions
    • Review linkage disequilibrium with other variants
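
The pleiotropy step amounts to grouping association records by variant and counting distinct traits. The sketch below assumes each record exposes `rsId` and `efoTrait` as described under "Response Formats and Data Fields". Both function names are hypothetical, and in practice `efoTrait` may need to be normalised to a string before grouping.

```python
from collections import defaultdict


def traits_by_variant(associations):
    """Map each rs ID to the sorted list of distinct traits reported for it."""
    grouped = defaultdict(set)
    for assoc in associations:
        rs_id = assoc.get("rsId")
        trait = assoc.get("efoTrait")
        if rs_id and trait:
            grouped[rs_id].add(trait)
    return {rs_id: sorted(traits) for rs_id, traits in grouped.items()}


def pleiotropic_variants(associations, min_traits=2):
    """Return only the variants associated with at least `min_traits` traits."""
    return {
        rs_id: traits
        for rs_id, traits in traits_by_variant(associations).items()
        if len(traits) >= min_traits
    }
```

A shared variant across several traits is only a starting point. Effect directions and linkage disequilibrium still need to be reviewed as the remaining steps describe.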

Workflow 3: Gene-Centric Association Analysis

  1. Search by gene symbol in web interface or:
    python
    url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/search/findByGene"
    params = {"geneName": gene_symbol}
  2. Retrieve variants in gene region:
    • Get chromosomal coordinates for gene
    • Query variants in region
    • Include promoter and regulatory regions (extend boundaries)
  3. Analyze association patterns:
    • Identify traits associated with variants in this gene
    • Look for consistent associations across studies
    • Review effect sizes and directions
  4. Functional interpretation:
    • Determine variant consequences (missense, regulatory, etc.)
    • Check expression QTL (eQTL) data
    • Review pathway and network context
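
Step 2 suggests extending the gene boundaries to capture promoter and regulatory regions. A minimal sketch of that padding is shown below. It returns parameters named `chrom`, `bpStart`, and `bpEnd` to match the region query in Example 4. The `extend_region` helper and the 50 kb default flank are assumptions chosen for illustration, not a recommended window.

```python
def extend_region(chrom, start, end, flank=50_000):
    """Pad a gene interval by `flank` bases on each side.

    Returns a dict usable as query parameters for a region search.
    The lower bound is clamped at position 1.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return {
        "chrom": str(chrom),
        "bpStart": max(1, start - flank),
        "bpEnd": end + flank,
    }
```

Using the interval from the chromosomal-region example, `extend_region("10", 114000000, 115000000)` widens the query by 50 kb on both sides.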

Workflow 4: Systematic Review of Genetic Evidence

  1. Define research question:
    • Specific trait or disease of interest
    • Population considerations
    • Study design requirements
  2. Comprehensive variant extraction:
    • Query all associations for trait
    • Set significance threshold
    • Note discovery and replication studies
  3. Quality assessment:
    • Review study sample sizes
    • Check for population diversity
    • Assess heterogeneity across studies
    • Identify potential biases
  4. Data synthesis:
    • Aggregate associations across studies
    • Perform meta-analysis if applicable
    • Create summary tables
    • Generate Manhattan or forest plots
  5. Export and documentation:
    • Download full association data
    • Export summary statistics if needed
    • Document search strategy and date
    • Create reproducible analysis scripts
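
Where a meta-analysis is appropriate in step 4, one common choice is an inverse-variance weighted fixed-effect model. The sketch below illustrates that technique only. It is not something the GWAS Catalog provides, and `fixed_effect_meta` is a hypothetical helper. It assumes the effects are on a common scale (for odds ratios, use log odds ratios), refer to the same effect allele, and come from studies without overlapping samples.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect pooling.

    Each study is weighted by 1 / SE^2. Returns (pooled_beta, pooled_se).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("betas and standard_errors must be non-empty and equal in length")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se
```

With two equally precise studies the pooled estimate is simply the mean of the two effects, and the pooled standard error shrinks by a factor of √2.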

Workflow 5: Accessing and Analyzing Summary Statistics

  1. Identify studies with summary statistics:
    • Browse summary statistics portal
    • Check FTP directory listings
    • Query API for available studies
  2. Download summary statistics:
    bash
    # Via FTP
    wget ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/harmonised/GCSTXXXXXX-harmonised.tsv.gz
  3. Query via API for specific variants:
    python
    url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chrom}/associations"
    params = {"start": start_pos, "end": end_pos}
  4. Process and analyze:
    • Filter by p-value thresholds
    • Extract effect sizes and confidence intervals
    • Perform downstream analyses (fine-mapping, colocalization, etc.)
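
Step 4 can be sketched for a downloaded tab-separated file as follows. The column names `variant_id` and `p_value` match the fields shown in Example 3. `beta` and `standard_error` are assumed column names and should be checked against the header of the actual file. The 95% confidence interval uses the normal approximation beta ± 1.96 × SE, and the sample rows are made up.

```python
import csv
import io


def filter_sumstats(lines, p_threshold=5e-8):
    """Return rows at or below `p_threshold` with a 95% confidence interval.

    `lines` is any iterable of text lines from a tab-separated file with a
    header row, e.g. a handle from gzip.open(path, "rt").
    """
    hits = []
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            p = float(row["p_value"])
            beta = float(row["beta"])
            se = float(row["standard_error"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if p <= p_threshold:
            hits.append({
                "variant_id": row.get("variant_id"),
                "p_value": p,
                "beta": beta,
                "ci_lower": beta - 1.96 * se,
                "ci_upper": beta + 1.96 * se,
            })
    return hits


# Made-up rows for illustration only.
sample = io.StringIO(
    "variant_id\tp_value\tbeta\tstandard_error\n"
    "rs1\t1e-9\t0.2\t0.05\n"
    "rs2\t0.5\t0.01\t0.05\n"
    "rs3\tNA\t0.1\t0.05\n"
)
demo_hits = filter_sumstats(sample)
```

Streaming the file row by row keeps memory use flat, which matters because full summary statistics files can contain millions of variants.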

Response Formats and Data Fields

Key Fields in Association Records:
  • rsId: Variant identifier (rs number)
  • strongestAllele: Risk allele for the association
  • pvalue: Association p-value
  • pvalueText: P-value as text (may include inequality)
  • orPerCopyNum: Odds ratio or beta coefficient
  • betaNum: Effect size (for quantitative traits)
  • betaUnit: Unit of measurement for beta
  • range: Confidence interval
  • efoTrait: Associated trait name
  • mappedLabel: EFO-mapped trait term
Study Metadata Fields:
  • accessionId: GCST study identifier
  • pubmedId: PubMed ID
  • author: First author
  • publicationDate: Publication date
  • ancestryInitial: Discovery population ancestry
  • ancestryReplication: Replication population ancestry
  • sampleSize: Total sample size
Pagination: Results are paginated (default 20 items per page). Navigate using:
  • size parameter: Number of results per page
  • page parameter: Page number (0-indexed)
  • _links in response: URLs for next/previous pages

Best Practices

Query Strategy

  • Start with web interface to identify relevant EFO terms and study accessions
  • Use API for bulk data extraction and automated analyses
  • Implement pagination handling for large result sets
  • Cache API responses to minimize redundant requests
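
The caching advice above can be sketched as a thin in-memory wrapper. `ResponseCache` is a hypothetical class, not part of any GWAS Catalog client. It keys entries on the URL plus sorted query parameters and delegates the actual request to an injected `fetch(url, params)` callable. The cache lives only for the current process, so long-running analyses may prefer an on-disk store.

```python
class ResponseCache:
    """Memoise decoded API responses by URL and query parameters."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: (url, params) -> decoded response
        self._store = {}

    def get(self, url, params=None):
        # Sort parameters so that argument order does not create duplicate keys.
        key = (url, tuple(sorted((params or {}).items())))
        if key not in self._store:
            self._store[key] = self._fetch(url, params)
        return self._store[key]
```

A `fetch` built on `requests` might look like `lambda url, params: requests.get(url, params=params, timeout=30).json()`.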

Data Interpretation

  • Always check p-value thresholds (genome-wide: 5×10⁻⁸)
  • Review ancestry information for population applicability
  • Consider sample size when assessing evidence strength
  • Check for replication across independent studies
  • Be aware of winner's curse in effect size estimates

Rate Limiting and Ethics

  • Respect API usage guidelines (no excessive requests)
  • Use summary statistics downloads for genome-wide analyses
  • Implement appropriate delays between API calls
  • Cache results locally when performing iterative analyses
  • Cite the GWAS Catalog in publications
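
One way to enforce a delay between API calls is a small throttle that sleeps only for the time still owed since the previous request. `Throttle` is a hypothetical helper, and the interval to use is an assumption: this document does not state an official rate limit, so pick a conservative value. The clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds elapse between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each `requests.get` spaces requests out without adding delay when the previous call was already slow.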

Data Quality Considerations

  • GWAS Catalog curates published associations (may contain inconsistencies)
  • Effect sizes reported as published (may need harmonization)
  • Some studies report conditional or joint associations
  • Check for study overlap when combining results
  • Be aware of ascertainment and selection biases

Python Integration Example

Complete workflow for querying and analyzing GWAS data:
```python
import requests
import pandas as pd
from time import sleep

def query_gwas_catalog(trait_id, p_threshold=5e-8):
    """
    Query GWAS Catalog for trait associations

    Args:
        trait_id: EFO trait identifier (e.g., 'EFO_0001360')
        p_threshold: P-value threshold for filtering

    Returns:
        pandas DataFrame with association results
    """
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"

    headers = {"Content-Type": "application/json"}
    results = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers=headers)

        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])

        if not associations:
            break

        for assoc in associations:
            pvalue = assoc.get('pvalue')
            if pvalue and float(pvalue) <= p_threshold:
                results.append({
                    'variant': assoc.get('rsId'),
                    'pvalue': pvalue,
                    'risk_allele': assoc.get('strongestAllele'),
                    'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                    'trait': assoc.get('efoTrait'),
                    'pubmed_id': assoc.get('pubmedId')
                })

        page += 1
        sleep(0.1)  # Rate limiting

    return pd.DataFrame(results)

# Example usage
df = query_gwas_catalog('EFO_0001360')  # Type 2 diabetes
print(df.head())
print(f"\nTotal associations: {len(df)}")
print(f"Unique variants: {df['variant'].nunique()}")
```

Resources

references/api_reference.md

Comprehensive API documentation including:
  • Detailed endpoint specifications for both APIs
  • Complete list of query parameters and filters
  • Response format specifications and field descriptions
  • Advanced query examples and patterns
  • Error handling and troubleshooting
  • Integration with external databases
Consult this reference when:
  • Constructing complex API queries
  • Understanding response structures
  • Implementing pagination or batch operations
  • Troubleshooting API errors
  • Exploring advanced filtering options

Training Materials

The GWAS Catalog team provides workshop materials:

Important Notes

Data Updates

  • The GWAS Catalog is updated regularly with new publications
  • Re-run queries periodically for comprehensive coverage
  • Summary statistics are added as studies release data
  • EFO mappings may be updated over time

Citation Requirements

When using GWAS Catalog data, cite:
  • Sollis E, et al. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Research. PMID: 37953337
  • Include access date and version when available
  • Cite original studies when discussing specific findings

Limitations

  • Not all GWAS publications are included (curation criteria apply)
  • Full summary statistics available for subset of studies
  • Effect sizes may require harmonization across studies
  • Population diversity is growing but historically limited
  • Some associations represent conditional or joint effects

Data Access

  • Web interface: Free, no registration required
  • REST APIs: Free, no API key needed
  • FTP downloads: Open access
  • Rate limiting applies to API (be respectful)

Additional Resources
