gwas-database
GWAS Catalog Database
Overview
The GWAS Catalog is a comprehensive repository of published genome-wide association studies maintained by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EBI). The catalog contains curated SNP-trait associations from thousands of GWAS publications, including genetic variants, associated traits and diseases, p-values, effect sizes, and full summary statistics for many studies.
When to Use This Skill
This skill should be used when queries involve:
- Genetic variant associations: Finding SNPs associated with diseases or traits
- SNP lookups: Retrieving information about specific genetic variants (rs IDs)
- Trait/disease searches: Discovering genetic associations for phenotypes
- Gene associations: Finding variants in or near specific genes
- GWAS summary statistics: Accessing complete genome-wide association data
- Study metadata: Retrieving publication and cohort information
- Population genetics: Exploring ancestry-specific associations
- Polygenic risk scores: Identifying variants for risk prediction models
- Functional genomics: Understanding variant effects and genomic context
- Systematic reviews: Comprehensive literature synthesis of genetic associations
Core Capabilities
1. Understanding GWAS Catalog Data Structure
The GWAS Catalog is organized around four core entities:
- Studies: GWAS publications with metadata (PMID, author, cohort details)
- Associations: SNP-trait associations with statistical evidence (the Catalog includes associations with p < 1×10⁻⁵; genome-wide significance is conventionally p ≤ 5×10⁻⁸)
- Variants: Genetic markers (SNPs) with genomic coordinates and alleles
- Traits: Phenotypes and diseases (mapped to EFO ontology terms)
Key Identifiers:
- Study accessions: `GCST` IDs (e.g., GCST001234)
- Variant IDs: `rs` numbers (e.g., rs7903146) or `variant_id` format
- Trait IDs: EFO terms (e.g., EFO_0001360 for type 2 diabetes)
- Gene symbols: HGNC approved names (e.g., TCF7L2)
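The identifier formats listed above can be told apart with simple pattern checks before choosing which endpoint to query. The sketch below is illustrative only: the regular expressions are inferred from the example IDs in this section and are an assumption, not an official grammar, and `classify_identifier` is a hypothetical helper name.

```python
import re

# Patterns inferred from the example identifiers above (assumptions, not an official grammar)
ID_PATTERNS = {
    "study": re.compile(r"^GCST\d+$"),    # e.g., GCST001234
    "variant": re.compile(r"^rs\d+$"),    # e.g., rs7903146
    "trait": re.compile(r"^EFO_\d+$"),    # e.g., EFO_0001360
}


def classify_identifier(identifier):
    """Return 'study', 'variant', 'trait', or 'unknown' for an identifier string."""
    text = identifier.strip()
    for kind, pattern in ID_PATTERNS.items():
        if pattern.match(text):
            return kind
    # Gene symbols such as TCF7L2 have no fixed prefix, so they land here
    return "unknown"


for example in ["GCST001234", "rs7903146", "EFO_0001360", "TCF7L2"]:
    print(example, "->", classify_identifier(example))
```

Anything classified as `unknown` is probably best treated as free text or a gene symbol and checked against the web interface first.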
2. Web Interface Searches
The web interface at https://www.ebi.ac.uk/gwas/ supports multiple search modes:
By Variant (rs ID):
```
rs7903146
```
Returns all trait associations for this SNP.

By Disease/Trait:
```
type 2 diabetes
Parkinson disease
body mass index
```
Returns all associated genetic variants.

By Gene:
```
APOE
TCF7L2
```
Returns variants in or near the gene region.

By Chromosomal Region:
```
10:114000000-115000000
```
Returns variants in the specified genomic interval.

By Publication:
```
PMID:20581827
Author: McCarthy MI
GCST001234
```
Returns study details and all reported associations.
3. REST API Access
The GWAS Catalog provides two REST APIs for programmatic access:
Base URLs:
- GWAS Catalog API: `https://www.ebi.ac.uk/gwas/rest/api`
- Summary Statistics API: `https://www.ebi.ac.uk/gwas/summary-statistics/api`
API Documentation:
- Main API docs: https://www.ebi.ac.uk/gwas/rest/docs/api
- Summary stats docs: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
Core Endpoints:
1. Studies endpoint - `/studies/{accessionID}`
   ```python
   import requests

   # Get a specific study
   url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
   response = requests.get(url, headers={"Content-Type": "application/json"})
   study = response.json()
   ```
2. Associations endpoint - `/associations`
   ```python
   # Find associations for a variant
   variant = "rs7903146"
   url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{variant}/associations"
   params = {"projection": "associationBySnp"}
   response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
   associations = response.json()
   ```
3. Variants endpoint - `/singleNucleotidePolymorphisms/{rsID}`
   ```python
   # Get variant details
   url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs7903146"
   response = requests.get(url, headers={"Content-Type": "application/json"})
   variant_info = response.json()
   ```
4. Traits endpoint - `/efoTraits/{efoID}`
   ```python
   # Get trait information
   url = "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360"
   response = requests.get(url, headers={"Content-Type": "application/json"})
   trait_info = response.json()
   ```
4. Query Examples and Patterns
Example 1: Find all associations for a disease
```python
import requests

trait = "EFO_0001360"  # Type 2 diabetes
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Query associations for this trait
url = f"{base_url}/efoTraits/{trait}/associations"
response = requests.get(url, headers={"Content-Type": "application/json"})
associations = response.json()

# Process results
for assoc in associations.get('_embedded', {}).get('associations', []):
    variant = assoc.get('rsId')
    pvalue = assoc.get('pvalue')
    risk_allele = assoc.get('strongestAllele')
    print(f"{variant}: p={pvalue}, risk allele={risk_allele}")
```
Example 2: Get variant information and all trait associations
```python
import requests

variant = "rs7903146"
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Get variant details
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant_data = response.json()

# Get all associations for this variant
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
associations = response.json()

# Extract trait names and p-values
for assoc in associations.get('_embedded', {}).get('associations', []):
    trait = assoc.get('efoTrait')
    pvalue = assoc.get('pvalue')
    print(f"Trait: {trait}, p-value: {pvalue}")
```
Example 3: Access summary statistics
```python
import requests

# Query summary statistics API
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"

# Find associations by trait with p-value threshold
trait = "EFO_0001360"  # Type 2 diabetes
p_upper = "0.000000001"  # p < 1e-9
url = f"{base_url}/traits/{trait}/associations"
params = {
    "p_upper": p_upper,
    "size": 100  # Number of results
}
response = requests.get(url, params=params)
results = response.json()

# Process genome-wide significant hits
for hit in results.get('_embedded', {}).get('associations', []):
    variant_id = hit.get('variant_id')
    chromosome = hit.get('chromosome')
    position = hit.get('base_pair_location')
    pvalue = hit.get('p_value')
    print(f"{chromosome}:{position} ({variant_id}): p={pvalue}")
```
Example 4: Query by chromosomal region
```python
import requests

# Find variants in a specific genomic region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000
base_url = "https://www.ebi.ac.uk/gwas/rest/api"
url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
params = {
    "chrom": chromosome,
    "bpStart": start_pos,
    "bpEnd": end_pos
}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
variants_in_region = response.json()
```
5. Working with Summary Statistics
The GWAS Catalog hosts full summary statistics for many studies, providing access to all tested variants (not just genome-wide significant hits).
Access Methods:
- FTP download: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/
- REST API: Query-based access to summary statistics
- Web interface: Browse and download via the website
Summary Statistics API Features:
- Filter by chromosome, position, p-value
- Query specific variants across studies
- Retrieve effect sizes and allele frequencies
- Access harmonized and standardized data
Example: Download summary statistics for a study
```python
import requests
import gzip

# Get available summary statistics
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/studies/GCST001234"
response = requests.get(url)
study_info = response.json()

# Download link is provided in the response
# Alternatively, use FTP:
# ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/
```
6. Data Integration and Cross-referencing
The GWAS Catalog provides links to external resources:
Genomic Databases:
- Ensembl: Gene annotations and variant consequences
- dbSNP: Variant identifiers and population frequencies
- gnomAD: Population allele frequencies
Functional Resources:
- Open Targets: Target-disease associations
- PGS Catalog: Polygenic risk scores
- UCSC Genome Browser: Genomic context
Phenotype Resources:
- EFO (Experimental Factor Ontology): Standardized trait terms
- OMIM: Disease gene relationships
- Disease Ontology: Disease hierarchies
Following Links in API Responses:
```python
import requests

# API responses include _links for related resources
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001234")
study = response.json()

# Follow link to associations
associations_url = study['_links']['associations']['href']
associations_response = requests.get(associations_url)
```
Query Workflows
Workflow 1: Exploring Genetic Associations for a Disease
1. Identify the trait using EFO terms or free text:
   - Search web interface for disease name
   - Note the EFO ID (e.g., EFO_0001360 for type 2 diabetes)
2. Query associations via API:
   ```python
   url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{efo_id}/associations"
   ```
3. Filter by significance and population:
   - Check p-values (genome-wide significant: p ≤ 5×10⁻⁸)
   - Review ancestry information in study metadata
   - Filter by sample size or discovery/replication status
4. Extract variant details:
   - rs IDs for each association
   - Effect alleles and directions
   - Effect sizes (odds ratios, beta coefficients)
   - Population allele frequencies
5. Cross-reference with other databases:
   - Look up variant consequences in Ensembl
   - Check population frequencies in gnomAD
   - Explore gene function and pathways
Workflow 2: Investigating a Specific Genetic Variant
1. Query the variant:
   ```python
   url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"
   ```
2. Retrieve all trait associations:
   ```python
   url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}/associations"
   ```
3. Analyze pleiotropy:
   - Identify all traits associated with this variant
   - Review effect directions across traits
   - Look for shared biological pathways
4. Check genomic context:
   - Determine nearby genes
   - Identify if variant is in coding/regulatory regions
   - Review linkage disequilibrium with other variants
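The pleiotropy step in this workflow can be sketched as a small grouping pass over already-fetched association records. This is a minimal illustration, assuming each record is a dict with a string `efoTrait` and a numeric or numeric-string `pvalue` as in the examples in this document; `summarize_pleiotropy` is a hypothetical helper and the sample records are made up for the demonstration.

```python
def summarize_pleiotropy(associations):
    """Group association records by trait and keep the smallest p-value per trait."""
    best = {}
    for assoc in associations:
        trait = assoc.get("efoTrait")
        pvalue = assoc.get("pvalue")
        if trait is None or pvalue is None:
            continue  # skip incomplete records
        pvalue = float(pvalue)
        if trait not in best or pvalue < best[trait]:
            best[trait] = pvalue
    # Sort traits from strongest to weakest evidence
    return sorted(best.items(), key=lambda item: item[1])


# Made-up records, shaped like the association dicts used elsewhere in this document
sample_records = [
    {"efoTrait": "type 2 diabetes", "pvalue": 5e-12},
    {"efoTrait": "body mass index", "pvalue": "3e-9"},
    {"efoTrait": "type 2 diabetes", "pvalue": 1e-30},
    {"efoTrait": None, "pvalue": 1e-8},
]
pleiotropy_summary = summarize_pleiotropy(sample_records)
for trait, pvalue in pleiotropy_summary:
    print(f"{trait}: best p={pvalue}")
```

A variant that appears under several unrelated traits in this summary is a candidate for closer inspection of effect directions.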
Workflow 3: Gene-Centric Association Analysis
1. Search by gene symbol in web interface or:
   ```python
   url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/search/findByGene"
   params = {"geneName": gene_symbol}
   ```
2. Retrieve variants in gene region:
   - Get chromosomal coordinates for gene
   - Query variants in region
   - Include promoter and regulatory regions (extend boundaries)
3. Analyze association patterns:
   - Identify traits associated with variants in this gene
   - Look for consistent associations across studies
   - Review effect sizes and directions
4. Functional interpretation:
   - Determine variant consequences (missense, regulatory, etc.)
   - Check expression QTL (eQTL) data
   - Review pathway and network context
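Extending the gene boundaries to cover promoter and regulatory sequence, as suggested in this workflow, amounts to padding the interval before building the region query. The sketch below is one way to do that; the 50 kb default flank is an arbitrary assumption, the helper names are hypothetical, and the `chrom`, `bpStart`, and `bpEnd` parameter names are taken from the region query example earlier in this document.

```python
def extend_region(start, end, flank=50_000):
    """Pad an interval on both sides, clamping the start at position 1."""
    if start > end:
        raise ValueError("start must not exceed end")
    return max(1, start - flank), end + flank


def region_params(chrom, start, end, flank=50_000):
    """Build query parameters for the padded region."""
    new_start, new_end = extend_region(start, end, flank)
    return {"chrom": chrom, "bpStart": new_start, "bpEnd": new_end}


# Interval reused from the region example in this document
padded = region_params("10", 114000000, 115000000)
print(padded)
```

The appropriate flank size depends on the locus and the analysis, so it is worth recording the value used alongside the results.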
Workflow 4: Systematic Review of Genetic Evidence
1. Define research question:
   - Specific trait or disease of interest
   - Population considerations
   - Study design requirements
2. Comprehensive variant extraction:
   - Query all associations for trait
   - Set significance threshold
   - Note discovery and replication studies
3. Quality assessment:
   - Review study sample sizes
   - Check for population diversity
   - Assess heterogeneity across studies
   - Identify potential biases
4. Data synthesis:
   - Aggregate associations across studies
   - Perform meta-analysis if applicable
   - Create summary tables
   - Generate Manhattan or forest plots
5. Export and documentation:
   - Download full association data
   - Export summary statistics if needed
   - Document search strategy and date
   - Create reproducible analysis scripts
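For the meta-analysis step in this workflow, one common choice is an inverse-variance-weighted fixed-effect model. The document does not prescribe a method, so the sketch below is only an illustration of that technique; it assumes the per-study estimates are on the same scale, refer to the same effect allele, and come from non-overlapping samples. The function name and input values are hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    estimates: iterable of (beta, standard_error) pairs.
    Returns (pooled_beta, pooled_se, z_score).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for beta, se in estimates:
        if se <= 0:
            raise ValueError("standard errors must be positive")
        weight = 1.0 / (se * se)  # weight each study by 1 / variance
        total_weight += weight
        weighted_sum += weight * beta
    if total_weight == 0.0:
        raise ValueError("at least one estimate is required")
    pooled_beta = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Two made-up studies with equal precision
pooled_beta, pooled_se, z_score = fixed_effect_meta([(0.2, 0.1), (0.4, 0.1)])
print(pooled_beta, pooled_se, z_score)
```

When heterogeneity across studies is substantial, a random-effects model may be more appropriate than this fixed-effect sketch.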
Workflow 5: Accessing and Analyzing Summary Statistics
1. Identify studies with summary statistics:
   - Browse summary statistics portal
   - Check FTP directory listings
   - Query API for available studies
2. Download summary statistics:
   ```bash
   # Via FTP
   wget ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/harmonised/GCSTXXXXXX-harmonised.tsv.gz
   ```
3. Query via API for specific variants:
   ```python
   url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chrom}/associations"
   params = {"start": start_pos, "end": end_pos}
   ```
4. Process and analyze:
   - Filter by p-value thresholds
   - Extract effect sizes and confidence intervals
   - Perform downstream analyses (fine-mapping, colocalization, etc.)
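The processing step of this workflow can be sketched with the standard library alone. The example below filters a tab-separated summary statistics file by p-value and derives a 95% confidence interval as beta ± 1.96 × SE. It assumes the file has `p_value`, `beta`, and `standard_error` columns; the first appears in the summary statistics example in this document, while the other two are assumptions that should be checked against the header of the actual file. The in-memory sample data is made up.

```python
import csv
import io


def significant_rows(tsv_file, p_threshold=5e-8):
    """Yield rows with p_value <= p_threshold, adding a 95% confidence interval."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        try:
            pvalue = float(row["p_value"])
            beta = float(row["beta"])
            se = float(row["standard_error"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values (e.g., "NA")
        if pvalue <= p_threshold:
            row["ci_lower"] = beta - 1.96 * se
            row["ci_upper"] = beta + 1.96 * se
            yield row


# Made-up rows standing in for a downloaded file
sample_tsv = (
    "variant_id\tp_value\tbeta\tstandard_error\n"
    "rs1\t1e-9\t0.10\t0.01\n"
    "rs2\t0.5\t0.02\t0.03\n"
    "rs3\tNA\t0.10\t0.10\n"
)
demo_rows = list(significant_rows(io.StringIO(sample_tsv)))
for row in demo_rows:
    print(row["variant_id"], row["ci_lower"], row["ci_upper"])
```

For a downloaded `.tsv.gz` file, the same function can be given a handle from `gzip.open(path, "rt")`, which streams the file rather than loading it into memory.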
Response Formats and Data Fields
Key Fields in Association Records:
- `rsId`: Variant identifier (rs number)
- `strongestAllele`: Risk allele for the association
- `pvalue`: Association p-value
- `pvalueText`: P-value as text (may include inequality)
- `orPerCopyNum`: Odds ratio or beta coefficient
- `betaNum`: Effect size (for quantitative traits)
- `betaUnit`: Unit of measurement for beta
- `range`: Confidence interval
- `efoTrait`: Associated trait name
- `mappedLabel`: EFO-mapped trait term
Study Metadata Fields:
- `accessionId`: GCST study identifier
- `pubmedId`: PubMed ID
- `author`: First author
- `publicationDate`: Publication date
- `ancestryInitial`: Discovery population ancestry
- `ancestryReplication`: Replication population ancestry
- `sampleSize`: Total sample size
Pagination:
Results are paginated (default 20 items per page). Navigate using:
- `size` parameter: Number of results per page
- `page` parameter: Page number (0-indexed)
- `_links` in response: URLs for next/previous pages
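Pagination can be handled by following the link to the next page until none is returned. The sketch below assumes the next-page URL sits at `_links.next.href`, which is the usual HAL convention but is not spelled out in this document, so it should be confirmed against a real response. `iter_pages` is a hypothetical helper that takes the fetching function as an argument, which lets it be exercised here with made-up pages instead of live requests.

```python
def iter_pages(first_url, fetch, max_pages=1000):
    """Yield each page of a paginated response by following _links.next.href.

    fetch: callable that takes a URL and returns the decoded JSON dict.
    max_pages: safety cap in case the links never terminate.
    """
    url = first_url
    pages_seen = 0
    while url and pages_seen < max_pages:
        page = fetch(url)
        yield page
        pages_seen += 1
        url = page.get("_links", {}).get("next", {}).get("href")


# Made-up pages keyed by URL, standing in for API responses
fake_pages = {
    "page0": {"_embedded": {"associations": ["a", "b"]}, "_links": {"next": {"href": "page1"}}},
    "page1": {"_embedded": {"associations": ["c"]}, "_links": {}},
}
collected = []
for page in iter_pages("page0", fake_pages.__getitem__):
    collected.extend(page.get("_embedded", {}).get("associations", []))
print(collected)
```

With the `requests` library, `fetch` could be something like `lambda u: requests.get(u, timeout=30).json()`.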
Best Practices
Query Strategy
- Start with web interface to identify relevant EFO terms and study accessions
- Use API for bulk data extraction and automated analyses
- Implement pagination handling for large result sets
- Cache API responses to minimize redundant requests
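Caching responses, as recommended above, can be as simple as an in-memory dictionary keyed by URL and query parameters. The following is a minimal sketch with a hypothetical `CachedFetcher` class; it takes the real fetching function as an argument, and the demonstration uses a stand-in function and a placeholder URL rather than the live API.

```python
import json


class CachedFetcher:
    """Memoize JSON responses keyed by URL plus sorted query parameters."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(url, params) -> decoded JSON
        self._cache = {}
        self.misses = 0

    def get(self, url, params=None):
        # Sorting the parameters makes the key independent of dict ordering
        key = json.dumps([url, sorted((params or {}).items())])
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._fetch(url, params)
        return self._cache[key]


# Stand-in for a real HTTP call; records every request it receives
calls = []


def fake_fetch(url, params):
    calls.append((url, params))
    return {"url": url}


cached = CachedFetcher(fake_fetch)
cached.get("https://example.org/api", {"page": 0, "size": 100})
cached.get("https://example.org/api", {"size": 100, "page": 0})  # same query, different key order
cached.get("https://example.org/api", {"page": 1, "size": 100})
print(len(calls), cached.misses)
```

Because the Catalog is updated regularly, a long-lived cache would also need some form of expiry, which this sketch leaves out.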
Data Interpretation
- Always check p-value thresholds (genome-wide: 5×10⁻⁸)
- Review ancestry information for population applicability
- Consider sample size when assessing evidence strength
- Check for replication across independent studies
- Be aware of winner's curse in effect size estimates
Rate Limiting and Ethics
- Respect API usage guidelines (no excessive requests)
- Use summary statistics downloads for genome-wide analyses
- Implement appropriate delays between API calls
- Cache results locally when performing iterative analyses
- Cite the GWAS Catalog in publications
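Delays between API calls can be enforced with a small throttle object rather than scattered `sleep` calls. The sketch below is illustrative: the `Throttle` class is hypothetical, and since this document does not state an official rate limit, the interval is an assumption to be tuned. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=0.1, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Simulated clock so the demonstration runs instantly
fake_time = [0.0]
slept = []


def fake_clock():
    return fake_time[0]


def fake_sleep(seconds):
    slept.append(seconds)
    fake_time[0] += seconds


throttle = Throttle(0.5, clock=fake_clock, sleep=fake_sleep)
throttle.wait()          # first call never sleeps
fake_time[0] += 0.2
throttle.wait()          # only 0.2 s elapsed, so it sleeps for the remainder
fake_time[0] += 1.0
throttle.wait()          # enough time has passed, no sleep
print(slept)
```

In real use, `throttle.wait()` would be called immediately before each request, with the default clock and sleep functions.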
Data Quality Considerations
- GWAS Catalog curates published associations (may contain inconsistencies)
- Effect sizes reported as published (may need harmonization)
- Some studies report conditional or joint associations
- Check for study overlap when combining results
- Be aware of ascertainment and selection biases
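Harmonizing effect sizes usually involves putting odds ratios on the log scale and making sure every estimate refers to the same effect allele. The sketch below shows both steps under simplifying assumptions: alleles are already on the same strand, and palindromic (A/T, C/G) variants are not treated specially. The function names and example values are hypothetical.

```python
import math


def to_log_odds(odds_ratio):
    """Convert an odds ratio to a log-odds effect size (beta)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio)


def align_effect(beta, effect_allele, other_allele, reference_effect_allele):
    """Express beta relative to reference_effect_allele, flipping the sign if needed."""
    if effect_allele == reference_effect_allele:
        return beta
    if other_allele == reference_effect_allele:
        return -beta  # the study reported the effect for the opposite allele
    raise ValueError("neither allele matches the reference effect allele")


# Made-up example: a study reports OR = 2.0 for allele C, but the reference effect allele is T
beta_reported = to_log_odds(2.0)
beta_aligned = align_effect(beta_reported, effect_allele="C", other_allele="T",
                            reference_effect_allele="T")
print(beta_reported, beta_aligned)
```

Strand flips and ambiguous variants need extra handling that is deliberately outside the scope of this sketch.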
Python Integration Example
Complete workflow for querying and analyzing GWAS data:
```python
import requests
import pandas as pd
from time import sleep


def query_gwas_catalog(trait_id, p_threshold=5e-8):
    """
    Query GWAS Catalog for trait associations

    Args:
        trait_id: EFO trait identifier (e.g., 'EFO_0001360')
        p_threshold: P-value threshold for filtering

    Returns:
        pandas DataFrame with association results
    """
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"
    headers = {"Content-Type": "application/json"}
    results = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers=headers)
        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])
        if not associations:
            break

        for assoc in associations:
            pvalue = assoc.get('pvalue')
            if pvalue and float(pvalue) <= p_threshold:
                results.append({
                    'variant': assoc.get('rsId'),
                    'pvalue': pvalue,
                    'risk_allele': assoc.get('strongestAllele'),
                    'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                    'trait': assoc.get('efoTrait'),
                    'pubmed_id': assoc.get('pubmedId')
                })

        page += 1
        sleep(0.1)  # Rate limiting

    return pd.DataFrame(results)


# Example usage
df = query_gwas_catalog('EFO_0001360')  # Type 2 diabetes
print(df.head())
print(f"\nTotal associations: {len(df)}")
print(f"Unique variants: {df['variant'].nunique()}")
```
Resources
references/api_reference.md
Comprehensive API documentation including:
- Detailed endpoint specifications for both APIs
- Complete list of query parameters and filters
- Response format specifications and field descriptions
- Advanced query examples and patterns
- Error handling and troubleshooting
- Integration with external databases
Consult this reference when:
- Constructing complex API queries
- Understanding response structures
- Implementing pagination or batch operations
- Troubleshooting API errors
- Exploring advanced filtering options
Training Materials
The GWAS Catalog team provides workshop materials:
- GitHub repository: https://github.com/EBISPOT/GWAS_Catalog-workshop
- Jupyter notebooks with example queries
- Google Colab integration for cloud execution
Important Notes
Data Updates
- The GWAS Catalog is updated regularly with new publications
- Re-run queries periodically for comprehensive coverage
- Summary statistics are added as studies release data
- EFO mappings may be updated over time
Citation Requirements
When using GWAS Catalog data, cite:
- Sollis E, et al. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Research. PMID: 37953337
- Include access date and version when available
- Cite original studies when discussing specific findings
Limitations
- Not all GWAS publications are included (curation criteria apply)
- Full summary statistics available for subset of studies
- Effect sizes may require harmonization across studies
- Population diversity is growing but historically limited
- Some associations represent conditional or joint effects
Data Access
- Web interface: Free, no registration required
- REST APIs: Free, no API key needed
- FTP downloads: Open access
- Rate limiting applies to API (be respectful)
Additional Resources
- GWAS Catalog website: https://www.ebi.ac.uk/gwas/
- Documentation: https://www.ebi.ac.uk/gwas/docs
- API documentation: https://www.ebi.ac.uk/gwas/rest/docs/api
- Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
- FTP site: http://ftp.ebi.ac.uk/pub/databases/gwas/
- Training materials: https://github.com/EBISPOT/GWAS_Catalog-workshop
- PGS Catalog (polygenic scores): https://www.pgscatalog.org/
- Help and support: gwas-info@ebi.ac.uk