Ensembl Database
Overview
Access and query the Ensembl genome database, a comprehensive resource for vertebrate genomic data maintained by EMBL-EBI. The database provides gene annotations, sequences, variants, regulatory information, and comparative genomics data for over 250 species. Current release is 115 (September 2025).
When to Use This Skill
This skill should be used when:
- Querying gene information by symbol or Ensembl ID
- Retrieving DNA, transcript, or protein sequences
- Analyzing genetic variants using the Variant Effect Predictor (VEP)
- Finding orthologs and paralogs across species
- Accessing regulatory features and genomic annotations
- Converting coordinates between genome assemblies (e.g., GRCh37 to GRCh38)
- Performing comparative genomics analyses
- Integrating Ensembl data into genomic research pipelines
Core Capabilities
1. Gene Information Retrieval
Query gene data by symbol, Ensembl ID, or external database identifiers.
Common operations:
- Look up gene information by symbol (e.g., "BRCA2", "TP53")
- Retrieve transcript and protein information
- Get gene coordinates and chromosomal locations
- Access cross-references to external databases (UniProt, RefSeq, etc.)
Using the ensembl_rest package:
```python
from ensembl_rest import EnsemblClient

client = EnsemblClient()

# Look up gene by symbol
gene_data = client.symbol_lookup(
    species='human',
    symbol='BRCA2'
)

# Get detailed gene information
gene_info = client.lookup_id(
    id='ENSG00000139618',  # BRCA2 Ensembl ID
    expand=True
)
```
**Direct REST API (no package):**
```python
import requests

server = "https://rest.ensembl.org"

# Symbol lookup
response = requests.get(
    f"{server}/lookup/symbol/homo_sapiens/BRCA2",
    headers={"Content-Type": "application/json"}
)
gene_data = response.json()
```
2. Sequence Retrieval
Fetch genomic, transcript, or protein sequences in various formats (JSON, FASTA, plain text).
Operations:
- Get DNA sequences for genes or genomic regions
- Retrieve transcript sequences (cDNA)
- Access protein sequences
- Extract sequences with flanking regions or modifications
Example:
```python
# Using ensembl_rest package
sequence = client.sequence_id(
    id='ENSG00000139618',  # Gene ID
    content_type='application/json'
)

# Get sequence for a genomic region
region_seq = client.sequence_region(
    species='human',
    region='7:140424943-140624564'  # chromosome:start-end
)
```
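Without the package, the same data can be requested from the REST `sequence/id` endpoint. The sketch below only builds the request URL and parses a FASTA response; the helper names `sequence_url` and `parse_fasta` are illustrative and not part of any Ensembl library, and the network call is left to the caller.

```python
from urllib.parse import quote, urlencode

SERVER = "https://rest.ensembl.org"


def sequence_url(stable_id, seq_type="genomic"):
    """Build a GET URL for the sequence/id endpoint.

    seq_type is sent as the `type` query parameter
    (genomic, cdna, cds or protein).
    """
    query = urlencode({"type": seq_type})
    return f"{SERVER}/sequence/id/{quote(stable_id)}?{query}"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

To receive FASTA rather than JSON, send the URL with the header `Content-Type: text/x-fasta` and pass the response text to `parse_fasta`.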
3. Variant Analysis
Query genetic variation data and predict variant consequences using the Variant Effect Predictor (VEP).
Capabilities:
- Look up variants by rsID or genomic coordinates
- Predict functional consequences of variants
- Access population frequency data
- Retrieve phenotype associations
VEP example:
```python
# Predict variant consequences
vep_result = client.vep_hgvs(
    species='human',
    hgvs_notation='ENST00000380152.7:c.803C>T'
)

# Query variant by rsID
variant = client.variation_id(
    species='human',
    id='rs699'
)
```
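A VEP JSON response is a list with one entry per input variant. The sketch below condenses such a response into a short summary. It assumes the field names `most_severe_consequence`, `transcript_consequences`, `gene_symbol`, `transcript_id` and `consequence_terms`, which should be checked against an actual response; `summarize_vep` is an illustrative helper, not a library function.

```python
def summarize_vep(vep_response):
    """Condense a VEP JSON response (a list of dicts) into per-variant summaries."""
    summary = []
    for entry in vep_response:
        # Intergenic variants may carry no transcript consequences at all
        transcripts = entry.get("transcript_consequences", [])
        summary.append({
            "input": entry.get("input"),
            "most_severe": entry.get("most_severe_consequence"),
            "genes": sorted(
                {t["gene_symbol"] for t in transcripts if "gene_symbol" in t}
            ),
            "transcripts": [
                (t.get("transcript_id"), t.get("consequence_terms", []))
                for t in transcripts
            ],
        })
    return summary
```

Passing `vep_result` from the example above to `summarize_vep` yields one dictionary per variant, which is a convenient shape for filtering by consequence or exporting to a table.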
4. Comparative Genomics
Perform cross-species comparisons to identify orthologs, paralogs, and evolutionary relationships.
Operations:
- Find orthologs (same gene in different species)
- Identify paralogs (related genes in same species)
- Access gene trees showing evolutionary relationships
- Retrieve gene family information
Example:
```python
# Find orthologs for a human gene
orthologs = client.homology_ensemblgene(
    id='ENSG00000139618',  # Human BRCA2
    target_species='mouse'
)

# Get gene tree
gene_tree = client.genetree_member_symbol(
    species='human',
    symbol='BRCA2'
)
```
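Homology responses are nested, so a small helper makes the ortholog list easier to work with. This is a sketch that assumes the response has the shape `{"data": [{"homologies": [{"type": ..., "target": {"id": ..., "species": ...}}]}]}`; the function name `extract_orthologs` is hypothetical, and the structure should be verified against a real response.

```python
def extract_orthologs(homology_response, one_to_one_only=False):
    """Return (species, gene_id, homology_type) tuples for ortholog entries."""
    pairs = []
    for record in homology_response.get("data", []):
        for hom in record.get("homologies", []):
            hom_type = hom.get("type", "")
            # Skip paralogs and any other non-ortholog relationships
            if not hom_type.startswith("ortholog"):
                continue
            if one_to_one_only and hom_type != "ortholog_one2one":
                continue
            target = hom.get("target", {})
            pairs.append((target.get("species"), target.get("id"), hom_type))
    return pairs
```

Restricting to one-to-one orthologs is a common choice when a single unambiguous counterpart per species is required, for example before aligning sequences.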
5. Genomic Region Analysis
Find all genomic features (genes, transcripts, regulatory elements) in a specific region.
Use cases:
- Identify all genes in a chromosomal region
- Find regulatory features (promoters, enhancers)
- Locate variants within a region
- Retrieve structural features
Example:
```python
# Find all features in a region
features = client.overlap_region(
    species='human',
    region='7:140424943-140624564',
    feature='gene'
)
```
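When calling the REST API directly, several feature types can be requested at once by repeating the `feature` query parameter. The sketch below builds such an endpoint path and groups the returned features by type. Both helper names are illustrative, and the `feature_type` and `id` keys are assumed to be present in each returned feature.

```python
from collections import defaultdict
from urllib.parse import urlencode


def overlap_endpoint(species, chrom, start, end, features=("gene",)):
    """Build the overlap/region path with one `feature` parameter per type."""
    if start > end:
        raise ValueError("start must not exceed end")
    query = urlencode([("feature", f) for f in features])
    return f"/overlap/region/{species}/{chrom}:{start}-{end}?{query}"


def group_by_feature_type(overlap_response):
    """Group feature IDs from an overlap response (a list of dicts) by type."""
    grouped = defaultdict(list)
    for feat in overlap_response:
        grouped[feat.get("feature_type", "unknown")].append(feat.get("id"))
    return dict(grouped)
```

The path returned by `overlap_endpoint` is relative and is meant to be appended to the server URL, such as `https://rest.ensembl.org`.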
6. Assembly Mapping
Convert coordinates between different genome assemblies (e.g., GRCh37 to GRCh38).
Important: Use `https://grch37.rest.ensembl.org` for GRCh37/hg19 queries and `https://rest.ensembl.org` for current assemblies.
Example:
```python
from ensembl_rest import AssemblyMapper

# Map coordinates from GRCh37 to GRCh38
mapper = AssemblyMapper(
    species='human',
    asm_from='GRCh37',
    asm_to='GRCh38'
)
mapped = mapper.map(chrom='7', start=140453136, end=140453136)
```
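The same conversion is available from the REST `map` endpoint, whose path has the form `/map/:species/:asm_one/:region/:asm_two`. The sketch below builds that path and pulls the mapped coordinates out of a response. It assumes the response has a `mappings` list whose items contain a `mapped` object with `seq_region_name`, `start` and `end`; the helper names are hypothetical.

```python
def map_endpoint(chrom, start, end, asm_from="GRCh37", asm_to="GRCh38",
                 species="human"):
    """Build the map endpoint path for a forward-strand region."""
    # Region syntax is chrom:start..end:strand, with 1 for the forward strand
    return f"/map/{species}/{asm_from}/{chrom}:{start}..{end}:1/{asm_to}"


def mapped_regions(map_response):
    """Return (chrom, start, end) tuples for every mapped segment."""
    return [
        (m["mapped"]["seq_region_name"], m["mapped"]["start"], m["mapped"]["end"])
        for m in map_response.get("mappings", [])
    ]
```

A region can map to several segments or to none, so callers should handle both an empty list and a list with more than one entry.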
API Best Practices
Rate Limiting
The Ensembl REST API has rate limits. Follow these practices:
- Respect rate limits: Maximum 15 requests per second for anonymous users
- Handle 429 responses: When rate-limited, check the `Retry-After` header and wait
- Use batch endpoints: When querying multiple items, use batch endpoints where available
- Cache results: Store frequently accessed data to reduce API calls
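The first and third practices can be sketched with two small helpers: a throttle that spaces requests to stay under the per-second limit, and a chunker that splits a long ID list into payloads for a batch endpoint such as POST `lookup/id`, which takes a JSON body with an `ids` key. The class and function names are illustrative, and the default batch size of 1000 is an assumption to check against the endpoint documentation.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for i in range(0, len(items), size):
        yield items[i:i + size]


def batch_payloads(ids, batch_size=1000):
    """Split IDs into JSON-ready bodies for a batch POST request."""
    return [{"ids": list(chunk)} for chunk in chunked(list(ids), batch_size)]


class Throttle:
    """Space out calls so that at most `rate` of them happen per second."""

    def __init__(self, rate=15, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request keeps a loop under the limit without any server round trip. The `clock` and `sleep` parameters exist only so the timing logic can be exercised without real delays.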
Error Handling
Always implement proper error handling:
```python
import requests
import time

def query_ensembl(endpoint, params=None, max_retries=3):
    """GET an Ensembl REST endpoint and return the parsed JSON response."""
    server = "https://rest.ensembl.org"
    headers = {"Content-Type": "application/json"}
    for attempt in range(max_retries):
        response = requests.get(
            f"{server}{endpoint}",
            headers=headers,
            params=params
        )
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Rate limited - wait and retry
            # Retry-After may be fractional (e.g. "40.0"), so parse it as a float
            retry_after = float(response.headers.get('Retry-After', 1))
            time.sleep(retry_after)
        else:
            # Any other error status raises requests.HTTPError
            response.raise_for_status()
    raise Exception(f"Failed after {max_retries} attempts")
```
Installation
Python Package (Recommended)
```bash
uv pip install ensembl_rest
```
The `ensembl_rest` package provides a Pythonic interface to all Ensembl REST API endpoints.
Direct REST API
No Ensembl-specific package is needed; use a standard HTTP library such as `requests`:
```bash
uv pip install requests
```
Resources
references/
- `api_endpoints.md`: Comprehensive documentation of all 17 API endpoint categories with examples and parameters
scripts/
- `ensembl_query.py`: Reusable Python script for common Ensembl queries with built-in rate limiting and error handling
Common Workflows
Workflow 1: Gene Annotation Pipeline
- Look up gene by symbol to get Ensembl ID
- Retrieve transcript information
- Get protein sequences for all transcripts
- Find orthologs in other species
- Export results
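The steps of Workflow 1 can be chained against the REST API as sketched below. The function takes a `fetch(endpoint, params)` callable that returns parsed JSON, so the HTTP layer stays separate. The response keys used here (`Transcript`, `Translation`, `seq`, `data`, `homologies`, `target`) and the `/homology/id/:species/:id` path are assumptions about the current API that should be verified; `annotate_gene` is an illustrative name.

```python
import json


def annotate_gene(symbol, fetch, species="homo_sapiens",
                  target_species="mus_musculus"):
    """Run the lookup, transcript, protein and ortholog steps for one gene."""
    # Steps 1-2: look up the gene with its transcripts expanded
    gene = fetch(f"/lookup/symbol/{species}/{symbol}", {"expand": 1})
    gene_id = gene["id"]
    transcripts = gene.get("Transcript", [])

    # Step 3: fetch a protein sequence for every transcript with a translation
    proteins = {}
    for tr in transcripts:
        translation = tr.get("Translation")
        if translation:  # non-coding transcripts have none
            seq = fetch(f"/sequence/id/{translation['id']}", {"type": "protein"})
            proteins[tr["id"]] = seq["seq"]

    # Step 4: find orthologs in the target species
    homology = fetch(
        f"/homology/id/{species}/{gene_id}",
        {"target_species": target_species, "type": "orthologues"},
    )
    orthologs = [
        hom["target"]["id"]
        for rec in homology.get("data", [])
        for hom in rec.get("homologies", [])
        if "target" in hom
    ]

    return {
        "gene_id": gene_id,
        "transcripts": [tr["id"] for tr in transcripts],
        "proteins": proteins,
        "orthologs": orthologs,
    }


def export_results(result):
    """Step 5: serialize the annotation to a JSON string."""
    return json.dumps(result, indent=2, sort_keys=True)
```

The `query_ensembl` function from the Error Handling section has a compatible signature, so `annotate_gene("BRCA2", query_ensembl)` would run the workflow with retries on rate limiting.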
Workflow 2: Variant Analysis
- Query variant by rsID or coordinates
- Use VEP to predict functional consequences
- Check population frequencies
- Retrieve phenotype associations
- Generate report
Workflow 3: Comparative Analysis
- Start with gene of interest in reference species
- Find orthologs in target species
- Retrieve sequences for all orthologs
- Compare gene structures and features
- Analyze evolutionary conservation
Species and Assembly Information
To query available species and assemblies:
```python
# List all available species
species_list = client.info_species()

# Get assembly information for a species
assembly_info = client.info_assembly(species='human')
```
Common species identifiers:
- Human: `homo_sapiens` or `human`
- Mouse: `mus_musculus` or `mouse`
- Zebrafish: `danio_rerio` or `zebrafish`
- Fruit fly: `drosophila_melanogaster`
Additional Resources
- Official Documentation: https://rest.ensembl.org/documentation
- Python Package Docs: https://ensemblrest.readthedocs.io
- EBI Training: https://www.ebi.ac.uk/training/online/courses/ensembl-rest-api/
- Ensembl Browser: https://useast.ensembl.org
- GitHub Examples: https://github.com/Ensembl/ensembl-rest/wiki