gwas-database

GWAS Catalog Database

Overview

The GWAS Catalog is a comprehensive repository of published genome-wide association studies maintained by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EBI). The catalog contains curated SNP-trait associations from thousands of GWAS publications, including genetic variants, associated traits and diseases, p-values, effect sizes, and full summary statistics for many studies.

When to Use This Skill

This skill should be used when queries involve:

Genetic variant associations: Finding SNPs associated with diseases or traits
SNP lookups: Retrieving information about specific genetic variants (rs IDs)
Trait/disease searches: Discovering genetic associations for phenotypes
Gene associations: Finding variants in or near specific genes
GWAS summary statistics: Accessing complete genome-wide association data
Study metadata: Retrieving publication and cohort information
Population genetics: Exploring ancestry-specific associations
Polygenic risk scores: Identifying variants for risk prediction models
Functional genomics: Understanding variant effects and genomic context
Systematic reviews: Comprehensive literature synthesis of genetic associations

Core Capabilities

核心功能

1. Understanding GWAS Catalog Data Structure

The GWAS Catalog is organized around four core entities:

Studies: GWAS publications with metadata (PMID, author, cohort details)
Associations: SNP-trait associations with statistical evidence (the Catalog curates associations with p < 1×10⁻⁵; genome-wide significance is conventionally p ≤ 5×10⁻⁸)
Variants: Genetic markers (SNPs) with genomic coordinates and alleles
Traits: Phenotypes and diseases (mapped to EFO ontology terms)

Key Identifiers:

Study accessions: `GCST` IDs (e.g., GCST001234)
Variant IDs: `rs` numbers (e.g., rs7903146) or `variant_id` format
Trait IDs: EFO terms (e.g., EFO_0001360 for type 2 diabetes)
Gene symbols: HGNC approved names (e.g., TCF7L2)
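
The identifier formats above are regular enough to tell apart with simple patterns. The following is a minimal sketch, not part of any GWAS Catalog client: `classify_identifier` is a hypothetical helper, and the patterns assume that study accessions and rs IDs are a fixed prefix followed by digits and that trait IDs use the `EFO_` prefix. Anything else is treated as free text, such as a gene symbol or a trait name.

```python
import re

# Patterns inferred from the example identifiers above (an assumption);
# terms from other ontologies are not covered.
_PATTERNS = {
    "study": re.compile(r"^GCST\d+$"),
    "variant": re.compile(r"^rs\d+$"),
    "trait": re.compile(r"^EFO_\d+$"),
}


def classify_identifier(value: str) -> str:
    """Return 'study', 'variant', 'trait', or 'text' for a query string."""
    cleaned = value.strip()
    for kind, pattern in _PATTERNS.items():
        if pattern.match(cleaned):
            return kind
    return "text"
```

A classifier like this can decide which endpoint to call before any request is sent.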

2. Web Interface Searches

The web interface at https://www.ebi.ac.uk/gwas/ supports multiple search modes:

By Variant (rs ID):

rs7903146

Returns all trait associations for this SNP.

By Disease/Trait:

type 2 diabetes
Parkinson disease
body mass index

Returns all associated genetic variants.

By Gene:

APOE
TCF7L2

Returns variants in or near the gene region.

By Chromosomal Region:

10:114000000-115000000

Returns variants in the specified genomic interval.

By Publication:

PMID:20581827
Author: McCarthy MI
GCST001234

Returns study details and all reported associations.

3. REST API Access

The GWAS Catalog provides two REST APIs for programmatic access:

Base URLs:

GWAS Catalog API: `https://www.ebi.ac.uk/gwas/rest/api`
Summary Statistics API: `https://www.ebi.ac.uk/gwas/summary-statistics/api`

API Documentation:

Main API docs: https://www.ebi.ac.uk/gwas/rest/docs/api
Summary stats docs: https://www.ebi.ac.uk/gwas/summary-statistics/docs/

Core Endpoints:

Studies endpoint -

/studies/{accessionID}

python

import requests

# Get a specific study
url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
response = requests.get(url, headers={"Content-Type": "application/json"})
study = response.json()

Associations endpoint -

/associations

python

# Find associations for a variant
variant = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
associations = response.json()

Variants endpoint -

/singleNucleotidePolymorphisms/{rsID}

python

# Get variant details
url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs7903146"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant_info = response.json()

Traits endpoint -

/efoTraits/{efoID}

python

# Get trait information
url = "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360"
response = requests.get(url, headers={"Content-Type": "application/json"})
trait_info = response.json()
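
The four endpoints above share one URL shape, so the string building can be kept in one place. This is a sketch under stated assumptions: `build_url` and the `RESOURCES` mapping are hypothetical names, the path segments are copied from the endpoints listed above, and the `/associations` suffix is shown in this document only for variants and traits (using it for studies is an assumption).

```python
from urllib.parse import quote

BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api"

# Path segments copied from the endpoints listed above.
RESOURCES = {
    "study": "studies",
    "variant": "singleNucleotidePolymorphisms",
    "trait": "efoTraits",
}


def build_url(kind: str, identifier: str, associations: bool = False) -> str:
    """Build a resource URL, optionally for its associations sub-resource."""
    if kind not in RESOURCES:
        raise ValueError(f"unknown resource kind: {kind!r}")
    # Percent-encode the identifier so stray characters cannot alter the path.
    url = f"{BASE_URL}/{RESOURCES[kind]}/{quote(identifier, safe='')}"
    return f"{url}/associations" if associations else url
```

The returned string can be passed straight to `requests.get`, as in the snippets above.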

4. Query Examples and Patterns

**Example 1: Find all associations for a disease**
```python
import requests

trait = "EFO_0001360"  # Type 2 diabetes
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Query associations for this trait
url = f"{base_url}/efoTraits/{trait}/associations"
response = requests.get(url, headers={"Content-Type": "application/json"})
associations = response.json()

# Process results
for assoc in associations.get('_embedded', {}).get('associations', []):
    variant = assoc.get('rsId')
    pvalue = assoc.get('pvalue')
    risk_allele = assoc.get('strongestAllele')
    print(f"{variant}: p={pvalue}, risk allele={risk_allele}")
```

**Example 2: Get variant information and all trait associations**
```python
import requests

variant = "rs7903146"
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Get variant details
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant_data = response.json()

# Get all associations for this variant
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
associations = response.json()

# Extract trait names and p-values
for assoc in associations.get('_embedded', {}).get('associations', []):
    trait = assoc.get('efoTrait')
    pvalue = assoc.get('pvalue')
    print(f"Trait: {trait}, p-value: {pvalue}")
```

**Example 3: Access summary statistics**
```python
import requests

# Query summary statistics API
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"

# Find associations by trait with p-value threshold
trait = "EFO_0001360"  # Type 2 diabetes
p_upper = "0.000000001"  # p < 1e-9
url = f"{base_url}/traits/{trait}/associations"
params = {
    "p_upper": p_upper,
    "size": 100  # Number of results
}
response = requests.get(url, params=params)
results = response.json()

# Process genome-wide significant hits
for hit in results.get('_embedded', {}).get('associations', []):
    variant_id = hit.get('variant_id')
    chromosome = hit.get('chromosome')
    position = hit.get('base_pair_location')
    pvalue = hit.get('p_value')
    print(f"{chromosome}:{position} ({variant_id}): p={pvalue}")
```

**Example 4: Query by chromosomal region**
```python
import requests

# Find variants in a specific genomic region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000

base_url = "https://www.ebi.ac.uk/gwas/rest/api"
url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
params = {
    "chrom": chromosome,
    "bpStart": start_pos,
    "bpEnd": end_pos
}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
variants_in_region = response.json()
```

5. Working with Summary Statistics

The GWAS Catalog hosts full summary statistics for many studies, providing access to all tested variants (not just genome-wide significant hits).

Access Methods:

FTP download: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/
REST API: Query-based access to summary statistics
Web interface: Browse and download via the website

Summary Statistics API Features:

Filter by chromosome, position, p-value
Query specific variants across studies
Retrieve effect sizes and allele frequencies
Access harmonized and standardized data
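
The same chromosome, position, and p-value filters can also be applied locally to records that have already been retrieved. The sketch below assumes each record is a dict carrying the `chromosome`, `base_pair_location`, and `p_value` keys used in Example 3; `filter_hits` is a hypothetical helper, not an API call.

```python
from typing import Iterable, Iterator, Optional


def filter_hits(
    records: Iterable[dict],
    chromosome: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
    p_max: Optional[float] = None,
) -> Iterator[dict]:
    """Yield the records that satisfy every filter that was supplied."""
    for rec in records:
        if chromosome is not None and str(rec.get("chromosome")) != str(chromosome):
            continue
        pos = rec.get("base_pair_location")
        if start is not None and (pos is None or int(pos) < start):
            continue
        if end is not None and (pos is None or int(pos) > end):
            continue
        p = rec.get("p_value")
        if p_max is not None and (p is None or float(p) > p_max):
            continue
        yield rec
```

Records that lack a position or p-value are dropped only when the corresponding filter is active, so an unfiltered call returns everything.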

Example: Download summary statistics for a study

python

import requests
import gzip

# Get available summary statistics
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/studies/GCST001234"
response = requests.get(url)
study_info = response.json()

# Download link is provided in the response
# Alternatively, use FTP:
# ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/

6. Data Integration and Cross-referencing

The GWAS Catalog provides links to external resources:

Genomic Databases:

Ensembl: Gene annotations and variant consequences
dbSNP: Variant identifiers and population frequencies
gnomAD: Population allele frequencies

Functional Resources:

Open Targets: Target-disease associations
PGS Catalog: Polygenic risk scores
UCSC Genome Browser: Genomic context

Phenotype Resources:

EFO (Experimental Factor Ontology): Standardized trait terms
OMIM: Disease gene relationships
Disease Ontology: Disease hierarchies

Following Links in API Responses:

python

import requests

# API responses include _links for related resources
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001234")
study = response.json()

# Follow link to associations
associations_url = study['_links']['associations']['href']
associations_response = requests.get(associations_url)

Query Workflows

Workflow 1: Exploring Genetic Associations for a Disease

Identify the trait using EFO terms or free text:
- Search web interface for disease name
- Note the EFO ID (e.g., EFO_0001360 for type 2 diabetes)

Query associations via API:

python

url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{efo_id}/associations"

Filter by significance and population:
- Check p-values (genome-wide significant: p ≤ 5×10⁻⁸)
- Review ancestry information in study metadata
- Filter by sample size or discovery/replication status
Extract variant details:
- rs IDs for each association
- Effect alleles and directions
- Effect sizes (odds ratios, beta coefficients)
- Population allele frequencies
Cross-reference with other databases:
- Look up variant consequences in Ensembl
- Check population frequencies in gnomAD
- Explore gene function and pathways
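
The filtering and extraction steps above can be sketched as a single pass over an already-fetched response. This is an illustration only: `significant_variants` is a hypothetical helper, and the keys it reads (`rsId`, `pvalue`, `strongestAllele`, `orPerCopyNum`, `betaNum`) follow the field names used in this document, so a real payload may nest them differently.

```python
GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold


def significant_variants(payload: dict, threshold: float = GENOME_WIDE) -> list:
    """Return one summary dict per association with p <= threshold, sorted by p-value."""
    rows = []
    for assoc in payload.get("_embedded", {}).get("associations", []):
        pvalue = assoc.get("pvalue")
        if pvalue is None or float(pvalue) > threshold:
            continue  # skip missing or non-significant p-values
        rows.append({
            "rs_id": assoc.get("rsId"),
            "p_value": float(pvalue),
            "risk_allele": assoc.get("strongestAllele"),
            "odds_ratio": assoc.get("orPerCopyNum"),
            "beta": assoc.get("betaNum"),
        })
    return sorted(rows, key=lambda row: row["p_value"])
```

Sorting by p-value puts the strongest signals first, which is usually the order wanted for cross-referencing in Ensembl or gnomAD.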

Workflow 2: Investigating a Specific Genetic Variant

Query the variant:

python

url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"

Retrieve all trait associations:

python

url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}/associations"

Analyze pleiotropy:
- Identify all traits associated with this variant
- Review effect directions across traits
- Look for shared biological pathways
Check genomic context:
- Determine nearby genes
- Identify if variant is in coding/regulatory regions
- Review linkage disequilibrium with other variants
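
For the pleiotropy step, a first pass is simply to count distinct traits per variant. The sketch below assumes a flat list of association dicts with the `rsId` and `efoTrait` keys named in this document; both function names are hypothetical, and the two-trait cutoff is an arbitrary starting point rather than a formal definition of pleiotropy.

```python
from collections import defaultdict


def traits_by_variant(associations: list) -> dict:
    """Map each rs ID to the sorted list of distinct traits it is associated with."""
    grouped = defaultdict(set)
    for assoc in associations:
        rs_id, trait = assoc.get("rsId"), assoc.get("efoTrait")
        if rs_id and trait:  # ignore records missing either field
            grouped[rs_id].add(trait)
    return {rs_id: sorted(traits) for rs_id, traits in grouped.items()}


def pleiotropic_variants(associations: list, min_traits: int = 2) -> dict:
    """Keep only variants linked to at least `min_traits` distinct traits."""
    return {
        rs_id: traits
        for rs_id, traits in traits_by_variant(associations).items()
        if len(traits) >= min_traits
    }
```

Trait labels that differ only in wording count as different traits here, so mapping to EFO terms first gives cleaner counts.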

Workflow 3: Gene-Centric Association Analysis

Search by gene symbol in web interface or:

python

url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/search/findByGene"
params = {"geneName": gene_symbol}

Retrieve variants in gene region:
- Get chromosomal coordinates for gene
- Query variants in region
- Include promoter and regulatory regions (extend boundaries)
Analyze association patterns:
- Identify traits associated with variants in this gene
- Look for consistent associations across studies
- Review effect sizes and directions
Functional interpretation:
- Determine variant consequences (missense, regulatory, etc.)
- Check expression QTL (eQTL) data
- Review pathway and network context
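
Extending the gene boundaries to cover promoter and regulatory sequence comes down to widening the interval before the region query. In this sketch `flanked_region` is a hypothetical helper, the 50 kb default flank is an arbitrary assumption, and the returned keys reuse the `chrom`, `bpStart`, and `bpEnd` parameter names from Example 4.

```python
def flanked_region(chrom: str, gene_start: int, gene_end: int, flank: int = 50_000) -> dict:
    """Return query parameters for a gene interval widened by `flank` bp on each side."""
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return {
        "chrom": str(chrom),
        "bpStart": max(1, gene_start - flank),  # coordinates are 1-based, so clamp at 1
        "bpEnd": gene_end + flank,
    }
```

The upper bound is not clamped because chromosome lengths are not known to this helper; a region that runs past the chromosome end only needs to return fewer variants.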

Workflow 4: Systematic Review of Genetic Evidence

Define research question:
- Specific trait or disease of interest
- Population considerations
- Study design requirements
Comprehensive variant extraction:
- Query all associations for trait
- Set significance threshold
- Note discovery and replication studies
Quality assessment:
- Review study sample sizes
- Check for population diversity
- Assess heterogeneity across studies
- Identify potential biases
Data synthesis:
- Aggregate associations across studies
- Perform meta-analysis if applicable
- Create summary tables
- Generate Manhattan or forest plots
Export and documentation:
- Download full association data
- Export summary statistics if needed
- Document search strategy and date
- Create reproducible analysis scripts
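
Where a meta-analysis is applicable, the simplest model is a fixed-effect, inverse-variance weighted average: each study's effect is weighted by 1/SE², and the pooled standard error is the square root of the reciprocal of the summed weights. The sketch below assumes the effects are already on the same scale and refer to the same effect allele, and that the studies do not share samples; `fixed_effect_meta` is a hypothetical helper.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Combine per-study effects with inverse-variance weights.

    Returns a tuple (pooled_beta, pooled_se, z_score).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need equally sized, non-empty inputs")
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se, pooled_beta / pooled_se
```

This model does not account for between-study heterogeneity; when the heterogeneity assessment in the quality step flags a problem, a random-effects model is the usual alternative.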

Workflow 5: Accessing and Analyzing Summary Statistics

Identify studies with summary statistics:
- Browse summary statistics portal
- Check FTP directory listings
- Query API for available studies

Download summary statistics:

bash

# Via FTP
wget ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/harmonised/GCSTXXXXXX-harmonised.tsv.gz

Query via API for specific variants:

python

url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chrom}/associations"
params = {"bp_lower": start_pos, "bp_upper": end_pos}  # base-pair bounds; "start" is a paging offset in this API

Process and analyze:
- Filter by p-value thresholds
- Extract effect sizes and confidence intervals
- Perform downstream analyses (fine-mapping, colocalization, etc.)
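
Downloaded files hold every tested variant, so it helps to stream them rather than load them whole. The sketch below reads a gzipped, tab-separated file row by row and keeps rows under a p-value threshold, then derives an approximate 95% confidence interval from an effect and its standard error. The `p_value` column name is an assumption taken from the API field used in Example 3, so check the header of the actual file; both function names are hypothetical.

```python
import csv
import gzip


def iter_significant_rows(path, p_max=5e-8, p_column="p_value"):
    """Stream a gzipped TSV and yield rows whose p-value is at most `p_max`."""
    with gzip.open(path, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            raw = row.get(p_column)
            if raw in (None, "", "NA"):
                continue  # skip rows without a usable p-value
            if float(raw) <= p_max:
                yield row


def confidence_interval(beta, standard_error, z=1.96):
    """Return (lower, upper) for an effect estimate using a normal approximation."""
    return beta - z * standard_error, beta + z * standard_error
```

Because `iter_significant_rows` is a generator, memory use stays flat regardless of file size, and the filtered rows can be handed to fine-mapping or colocalization tools afterwards.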

Response Formats and Data Fields

Key Fields in Association Records:

`rsId`: Variant identifier (rs number)
`strongestAllele`: Risk allele for the association
`pvalue`: Association p-value
`pvalueText`: P-value as text (may include inequality)
`orPerCopyNum`: Odds ratio or beta coefficient
`betaNum`: Effect size (for quantitative traits)
`betaUnit`: Unit of measurement for beta
`range`: Confidence interval
`efoTrait`: Associated trait name
`mappedLabel`: EFO-mapped trait term

Study Metadata Fields:

`accessionId`: GCST study identifier
`pubmedId`: PubMed ID
`author`: First author
`publicationDate`: Publication date
`ancestryInitial`: Discovery population ancestry
`ancestryReplication`: Replication population ancestry
`sampleSize`: Total sample size

Pagination: Results are paginated (default 20 items per page). Navigate using:

`size` parameter: Number of results per page
`page` parameter: Page number (0-indexed)
`_links` in response: URLs for next/previous pages

Best Practices

Query Strategy

Start with web interface to identify relevant EFO terms and study accessions
Use API for bulk data extraction and automated analyses
Implement pagination handling for large result sets
Cache API responses to minimize redundant requests
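
Pagination handling can be written once and reused for every endpoint. The sketch below follows the `next` link of each page until none remains. It assumes pages expose that link as `_links.next.href`, which is the usual HAL convention but is not confirmed by this document. The `fetch` argument is any callable that maps a URL to parsed JSON, which keeps the loop independent of the HTTP library.

```python
from typing import Callable, Iterator


def iter_pages(first_url: str, fetch: Callable[[str], dict], max_pages: int = 1000) -> Iterator[dict]:
    """Yield each page of a paginated response, following `_links.next.href`."""
    url, seen = first_url, set()
    for _ in range(max_pages):
        if url is None or url in seen:  # stop at the last page or on a link cycle
            return
        seen.add(url)
        page = fetch(url)
        yield page
        url = page.get("_links", {}).get("next", {}).get("href")


def iter_associations(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Flatten the `_embedded.associations` list of every page."""
    for page in iter_pages(first_url, fetch):
        yield from page.get("_embedded", {}).get("associations", [])
```

With `requests`, a suitable `fetch` would be `lambda u: requests.get(u).json()`; the `max_pages` cap is a safety limit rather than an API requirement.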

Data Interpretation

Always check p-value thresholds (genome-wide: 5×10⁻⁸)
Review ancestry information for population applicability
Consider sample size when assessing evidence strength
Check for replication across independent studies
Be aware of winner's curse in effect size estimates

Rate Limiting and Ethics

Respect API usage guidelines (no excessive requests)
Use summary statistics downloads for genome-wide analyses
Implement appropriate delays between API calls
Cache results locally when performing iterative analyses
Cite the GWAS Catalog in publications
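
Delays and local caching can be combined in one small wrapper. In this sketch `PoliteFetcher` is a hypothetical class, not part of any GWAS Catalog tooling, and the 0.1 second default interval is an assumption, since this document does not state the API's actual limits. The clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class PoliteFetcher:
    """Wrap a fetch function with an in-memory cache and a minimum delay between calls."""

    def __init__(self, fetch, min_interval=0.1, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, url):
        if url in self._cache:  # cached responses cost no request and no delay
            return self._cache[url]
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        self._cache[url] = self._fetch(url)
        return self._cache[url]
```

The cache lives only for the lifetime of the object; iterative analyses that span sessions would need to persist responses to disk instead.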

Data Quality Considerations

GWAS Catalog curates published associations (may contain inconsistencies)
Effect sizes reported as published (may need harmonization)
Some studies report conditional or joint associations
Check for study overlap when combining results
Be aware of ascertainment and selection biases
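
One common harmonization step for effect sizes is to put odds ratios on the log scale and express every effect relative to the same allele. The sketch below assumes a biallelic variant with alleles reported on the same strand, so switching the effect allele only changes the sign of the log odds ratio; both function names are hypothetical.

```python
import math


def log_odds(odds_ratio: float) -> float:
    """Convert an odds ratio to a log odds ratio (natural log)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio)


def align_effect(log_effect: float, effect_allele: str, reference_effect_allele: str) -> float:
    """Express a log-scale effect relative to `reference_effect_allele`.

    If the study reported the other allele, the sign is flipped.
    """
    if effect_allele.upper() == reference_effect_allele.upper():
        return log_effect
    return -log_effect
```

Strand-ambiguous variants (A/T and C/G) cannot be resolved by allele comparison alone, so they usually need allele frequencies or exclusion.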

Python Integration Example

Complete workflow for querying and analyzing GWAS data:

python

import requests
import pandas as pd
from time import sleep

def query_gwas_catalog(trait_id, p_threshold=5e-8):
    """
    Query GWAS Catalog for trait associations

    Args:
        trait_id: EFO trait identifier (e.g., 'EFO_0001360')
        p_threshold: P-value threshold for filtering

    Returns:
        pandas DataFrame with association results
    """
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"

    headers = {"Content-Type": "application/json"}
    results = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers=headers)

        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])

        if not associations:
            break

        for assoc in associations:
            pvalue = assoc.get('pvalue')
            if pvalue and float(pvalue) <= p_threshold:
                results.append({
                    'variant': assoc.get('rsId'),
                    'pvalue': pvalue,
                    'risk_allele': assoc.get('strongestAllele'),
                    'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                    'trait': assoc.get('efoTrait'),
                    'pubmed_id': assoc.get('pubmedId')
                })

        page += 1
        sleep(0.1)  # Rate limiting

    return pd.DataFrame(results)

# Example usage
df = query_gwas_catalog('EFO_0001360')  # Type 2 diabetes
print(df.head())
print(f"\nTotal associations: {len(df)}")
print(f"Unique variants: {df['variant'].nunique()}")

Resources

references/api_reference.md

Comprehensive API documentation including:

Detailed endpoint specifications for both APIs
Complete list of query parameters and filters
Response format specifications and field descriptions
Advanced query examples and patterns
Error handling and troubleshooting
Integration with external databases

Consult this reference when:

Constructing complex API queries
Understanding response structures
Implementing pagination or batch operations
Troubleshooting API errors
Exploring advanced filtering options

Training Materials

The GWAS Catalog team provides workshop materials:

GitHub repository: https://github.com/EBISPOT/GWAS_Catalog-workshop
Jupyter notebooks with example queries
Google Colab integration for cloud execution

Important Notes

Data Updates

The GWAS Catalog is updated regularly with new publications
Re-run queries periodically for comprehensive coverage
Summary statistics are added as studies release data
EFO mappings may be updated over time

Citation Requirements

When using GWAS Catalog data, cite:

Sollis E, et al. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Research. PMID: 37953337
Include access date and version when available
Cite original studies when discussing specific findings

Limitations

Not all GWAS publications are included (curation criteria apply)
Full summary statistics available for subset of studies
Effect sizes may require harmonization across studies
Population diversity is growing but historically limited
Some associations represent conditional or joint effects

Data Access

Web interface: Free, no registration required
REST APIs: Free, no API key needed
FTP downloads: Open access
Rate limiting applies to API (be respectful)

Additional Resources

GWAS Catalog website: https://www.ebi.ac.uk/gwas/
Documentation: https://www.ebi.ac.uk/gwas/docs
API documentation: https://www.ebi.ac.uk/gwas/rest/docs/api
Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
FTP site: http://ftp.ebi.ac.uk/pub/databases/gwas/
Training materials: https://github.com/EBISPOT/GWAS_Catalog-workshop
PGS Catalog (polygenic scores): https://www.pgscatalog.org/
Help and support: gwas-info@ebi.ac.uk
