kegg-database

KEGG Database

Overview

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource for biological pathway analysis and molecular interaction networks.
Important: KEGG API is made available only for academic use by academic users.

When to Use This Skill

This skill should be used when querying pathways, genes, compounds, enzymes, diseases, and drugs across multiple organisms using KEGG's REST API.

Quick Start

The skill provides:
  1. Python helper functions (`scripts/kegg_api.py`) for all KEGG REST API operations
  2. Comprehensive reference documentation (`references/kegg_reference.md`) with detailed API specifications
When users request KEGG data, determine which operation is needed and use the appropriate function from `scripts/kegg_api.py`.
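
The helpers are described as wrappers over KEGG's REST API, whose requests follow the pattern `https://rest.kegg.jp/<operation>/<argument>/...`. As a rough sketch of the URL such a wrapper might assemble (the name `build_kegg_url` is hypothetical, and the real `scripts/kegg_api.py` may be organised quite differently):

```python
from urllib.parse import quote

KEGG_BASE_URL = "https://rest.kegg.jp"  # public KEGG REST endpoint


def build_kegg_url(operation, *args):
    """Join an operation and its arguments into a KEGG REST URL.

    Hypothetical helper for illustration; not taken from scripts/kegg_api.py.
    """
    # Keep ':' and '+' literal because KEGG uses them in IDs and ID lists.
    parts = [quote(str(arg), safe=":+") for arg in args]
    return "/".join([KEGG_BASE_URL, operation, *parts])


print(build_kegg_url("list", "pathway", "hsa"))
print(build_kegg_url("get", "hsa:10458+hsa:10459", "aaseq"))
```

Building the URL in one place keeps the per-operation functions small; the network call and error handling are deliberately left out of this sketch.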

Core Operations

1. Database Information (`kegg_info`)

Retrieve metadata and statistics about KEGG databases.
When to use: Understanding database structure, checking available data, getting release information.
Usage:
```python
from scripts.kegg_api import kegg_info

# Get pathway database info
info = kegg_info('pathway')

# Get organism-specific info
hsa_info = kegg_info('hsa')  # Human genome
```

**Common databases**: `kegg`, `pathway`, `module`, `brite`, `genes`, `genome`, `compound`, `glycan`, `reaction`, `enzyme`, `disease`, `drug`

2. Listing Entries (`kegg_list`)

List entry identifiers and names from KEGG databases.
When to use: Getting all pathways for an organism, listing genes, retrieving compound catalogs.
Usage:
```python
from scripts.kegg_api import kegg_list

# List all reference pathways
pathways = kegg_list('pathway')

# List human-specific pathways
hsa_pathways = kegg_list('pathway', 'hsa')

# List specific genes (max 10)
genes = kegg_list('hsa:10458+hsa:10459')
```

**Common organism codes**: `hsa` (human), `mmu` (mouse), `dme` (fruit fly), `sce` (yeast), `eco` (E. coli)
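
The workflows in this document split the returned text on newlines and tabs, which suggests the helpers return KEGG's raw tab-separated text as a string. Under that assumption, a small parser such as the following (the name `parse_kegg_table` is hypothetical) can turn a listing into identifier/description pairs:

```python
def parse_kegg_table(text):
    """Split KEGG-style tab-separated text into (identifier, description) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        identifier, _, description = line.partition("\t")
        rows.append((identifier, description))
    return rows


# Illustrative sample in the two-column layout; not live query output.
sample = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
)
pathway_names = dict(parse_kegg_table(sample))
print(pathway_names["hsa00010"])
```

Using `partition` rather than `split` keeps any further tabs inside the description intact.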

3. Searching (`kegg_find`)

Search KEGG databases by keywords or molecular properties.
When to use: Finding genes by name/description, searching compounds by formula or mass, discovering entries by keywords.
Usage:
```python
from scripts.kegg_api import kegg_find

# Keyword search
results = kegg_find('genes', 'p53')
shiga_toxin = kegg_find('genes', 'shiga toxin')

# Chemical formula search (exact match)
compounds = kegg_find('compound', 'C7H10N4O2', 'formula')

# Exact mass range search
drugs = kegg_find('drug', '300-310', 'exact_mass')
```

**Search options**: `formula` (exact match), `exact_mass` (range), `mol_weight` (range)

4. Retrieving Entries (`kegg_get`)

Get complete database entries or specific data formats.
When to use: Retrieving pathway details, getting gene/protein sequences, downloading pathway maps, accessing compound structures.
Usage:
```python
from scripts.kegg_api import kegg_get

# Get pathway entry
pathway = kegg_get('hsa00010')  # Glycolysis pathway

# Get multiple entries (max 10)
genes = kegg_get(['hsa:10458', 'hsa:10459'])

# Get protein sequence (FASTA)
sequence = kegg_get('hsa:10458', 'aaseq')

# Get nucleotide sequence
nt_seq = kegg_get('hsa:10458', 'ntseq')

# Get compound structure
mol_file = kegg_get('cpd:C00002', 'mol')  # ATP in MOL format

# Get pathway as JSON (single entry only)
pathway_json = kegg_get('hsa05130', 'json')

# Get pathway image (single entry only)
pathway_img = kegg_get('hsa05130', 'image')
```

**Output formats**: `aaseq` (protein FASTA), `ntseq` (nucleotide FASTA), `mol` (MOL format), `kcf` (KCF format), `image` (PNG), `kgml` (XML), `json` (pathway JSON)

**Important**: Image, KGML, and JSON formats allow only one entry at a time.
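
Because the entry limits depend on the requested format, it can help to check a request before sending it. The following is a minimal sketch of such a check; `check_get_request` is a hypothetical name, and whether `scripts/kegg_api.py` performs a similar validation is not stated here:

```python
SINGLE_ENTRY_FORMATS = {"image", "kgml", "json"}
MAX_ENTRIES = 10


def check_get_request(entries, option=None):
    """Return the entries as a list, or raise ValueError if a documented limit is broken."""
    if isinstance(entries, str):
        # A '+'-joined string such as 'hsa:10458+hsa:10459' counts as several entries.
        entries = entries.split("+")
    entries = list(entries)
    if not entries:
        raise ValueError("at least one entry is required")
    if option in SINGLE_ENTRY_FORMATS and len(entries) != 1:
        raise ValueError(f"format '{option}' accepts exactly one entry")
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} entries per request")
    return entries


print(check_get_request("hsa05130", "image"))
print(check_get_request(["hsa:10458", "hsa:10459"], "aaseq"))
```

Failing early with a clear message avoids spending a request on a call that the API would reject anyway.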

5. ID Conversion (`kegg_conv`)

Convert identifiers between KEGG and external databases.
When to use: Integrating KEGG data with other databases, mapping gene IDs, converting compound identifiers.
Usage:
```python
from scripts.kegg_api import kegg_conv

# Convert all human genes to NCBI Gene IDs
conversions = kegg_conv('ncbi-geneid', 'hsa')

# Convert specific gene
gene_id = kegg_conv('ncbi-geneid', 'hsa:10458')

# Convert to UniProt
uniprot_id = kegg_conv('uniprot', 'hsa:10458')

# Convert compounds to PubChem
pubchem_ids = kegg_conv('pubchem', 'compound')

# Reverse conversion (NCBI Gene ID to KEGG)
kegg_id = kegg_conv('hsa', 'ncbi-geneid')
```

**Supported conversions**: `ncbi-geneid`, `ncbi-proteinid`, `uniprot`, `pubchem`, `chebi`
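
A conversion result pairs each source ID with a target ID, and one KEGG gene can map to several external IDs. Assuming the result is tab-separated text with the source ID in the first column (as the parsing in Workflow 4 implies), a mapping could be built along these lines; `parse_conversions` and `strip_prefix` are hypothetical names:

```python
from collections import defaultdict


def parse_conversions(text):
    """Map each source ID to the list of target IDs found in kegg_conv-style text."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        source_id, target_id = line.split("\t")
        mapping[source_id].append(target_id)
    return dict(mapping)


def strip_prefix(identifier):
    """Drop the database prefix, e.g. 'up:P04637' becomes 'P04637'."""
    return identifier.split(":", 1)[-1]


# Illustrative rows in the assumed layout; not verified query output.
sample = "hsa:7157\tup:P04637\nhsa:7157\tup:K7PPA8\nhsa:10458\tup:Q9UQB8\n"
uniprot_by_gene = parse_conversions(sample)
print([strip_prefix(i) for i in uniprot_by_gene["hsa:7157"]])
```

Keeping a list per source ID avoids silently dropping one-to-many mappings, which a plain `dict(line.split('\t') ...)` would do.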

6. Cross-Referencing (`kegg_link`)

Find related entries within and between KEGG databases.
When to use: Finding pathways containing genes, getting genes in a pathway, mapping genes to KO groups, finding compounds in pathways.
Usage:
```python
from scripts.kegg_api import kegg_link

# Find pathways linked to human genes
pathways = kegg_link('pathway', 'hsa')

# Get genes in a specific pathway
genes = kegg_link('genes', 'hsa00010')  # Glycolysis genes

# Find pathways containing a specific gene
gene_pathways = kegg_link('pathway', 'hsa:10458')

# Find compounds in a pathway
compounds = kegg_link('compound', 'hsa00010')

# Map genes to KO (orthology) groups
ko_groups = kegg_link('ko', 'hsa:10458')
```

**Common links**: genes ↔ pathway, pathway ↔ compound, pathway ↔ enzyme, genes ↔ ko (orthology)

7. Drug-Drug Interactions (`kegg_ddi`)

Check for drug-drug interactions.
When to use: Analyzing drug combinations, checking for contraindications, pharmacological research.
Usage:
```python
from scripts.kegg_api import kegg_ddi

# Check single drug
interactions = kegg_ddi('D00001')

# Check multiple drugs (max 10)
interactions = kegg_ddi(['D00001', 'D00002', 'D00003'])
```

Common Analysis Workflows

Workflow 1: Gene to Pathway Mapping

Use case: Finding pathways associated with genes of interest (e.g., for pathway enrichment analysis).
```python
from scripts.kegg_api import kegg_find, kegg_link, kegg_get

# Step 1: Find gene ID by name
gene_results = kegg_find('genes', 'p53')

# Step 2: Link gene to pathways
pathways = kegg_link('pathway', 'hsa:7157')  # TP53 gene

# Step 3: Get detailed pathway information
for pathway_line in pathways.split('\n'):
    if pathway_line:
        pathway_id = pathway_line.split('\t')[1].replace('path:', '')
        pathway_info = kegg_get(pathway_id)
        # Process pathway information
```
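
Step 1 can return matches from many organisms, while Step 2 needs a gene ID from one of them. One way to bridge the two steps, assuming each result line starts with an `organism:gene` ID followed by a tab, is to filter by organism prefix (`gene_ids_for_organism` is a hypothetical helper):

```python
def gene_ids_for_organism(find_text, organism):
    """Return the gene IDs in kegg_find-style text that belong to one organism."""
    prefix = organism + ":"
    gene_ids = []
    for line in find_text.splitlines():
        gene_id = line.split("\t", 1)[0]
        if gene_id.startswith(prefix):
            gene_ids.append(gene_id)
    return gene_ids


# Illustrative result lines in the assumed layout; not live query output.
sample = (
    "hsa:7157\tTP53; tumor protein p53\n"
    "mmu:22059\tTrp53; transformation related protein 53\n"
)
print(gene_ids_for_organism(sample, "hsa"))
```

A keyword search can still return several genes for the same organism, so the resulting list usually needs a manual or name-based check before it is passed on to `kegg_link`.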

Workflow 2: Pathway Enrichment Context

Use case: Getting all genes in organism pathways for enrichment analysis.
```python
from scripts.kegg_api import kegg_list, kegg_link

# Step 1: List all human pathways
pathways = kegg_list('pathway', 'hsa')

# Step 2: For each pathway, get associated genes
for pathway_line in pathways.split('\n'):
    if pathway_line:
        pathway_id = pathway_line.split('\t')[0]
        genes = kegg_link('genes', pathway_id)
        # Process genes for enrichment analysis
```
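
The loop above issues one request per pathway. As an alternative sketch, the single call `kegg_link('pathway', 'hsa')` from the Cross-Referencing section returns every gene-pathway pair at once, and the gene sets can then be grouped locally. The parser below assumes the gene ID is in the first column and a `path:`-prefixed pathway ID in the second, matching the parsing used in Workflow 1; `group_genes_by_pathway` is a hypothetical name:

```python
from collections import defaultdict


def group_genes_by_pathway(link_text):
    """Build {pathway_id: set of gene IDs} from gene-to-pathway link text."""
    gene_sets = defaultdict(set)
    for line in link_text.splitlines():
        if not line.strip():
            continue
        gene_id, pathway_id = line.split("\t")
        gene_sets[pathway_id.replace("path:", "")].add(gene_id)
    return dict(gene_sets)


# Illustrative pairs in the assumed layout; not verified KEGG content.
sample = (
    "hsa:7157\tpath:hsa04115\n"
    "hsa:7157\tpath:hsa05200\n"
    "hsa:1026\tpath:hsa04115\n"
)
gene_sets = group_genes_by_pathway(sample)
print(sorted(gene_sets["hsa04115"]))
```

Grouping locally trades one larger download for hundreds of small requests, which also sits more comfortably with the advice to avoid rapid-fire calls.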

Workflow 3: Compound to Pathway Analysis

Use case: Finding metabolic pathways containing compounds of interest.
```python
from scripts.kegg_api import kegg_find, kegg_link, kegg_get

# Step 1: Search for compound
compound_results = kegg_find('compound', 'glucose')

# Step 2: Link compound to reactions
reactions = kegg_link('reaction', 'cpd:C00031')  # Glucose

# Step 3: Link reactions to pathways
pathways = kegg_link('pathway', 'rn:R00299')  # Specific reaction

# Step 4: Get pathway details
pathway_info = kegg_get('map00010')  # Glycolysis
```
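
The steps above use hard-coded IDs for each hop. A sketch of chaining Steps 2 and 3 programmatically is shown below; it takes any `kegg_link`-like callable as a parameter so that it can run offline with a stand-in. The helper names, the stand-in `fake_link`, and its canned text are all illustrative assumptions:

```python
def ids_in_column(link_text, column=1):
    """Collect the IDs found in one column of kegg_link-style text."""
    ids = []
    for line in link_text.splitlines():
        fields = line.split("\t")
        if len(fields) > column:
            ids.append(fields[column])
    return ids


def pathways_for_compound(compound_id, link):
    """Follow compound -> reactions -> pathways using a kegg_link-like callable."""
    pathways = set()
    for reaction_id in ids_in_column(link("reaction", compound_id)):
        pathways.update(ids_in_column(link("pathway", reaction_id)))
    return sorted(pathways)


def fake_link(target_db, source_id):
    """Offline stand-in for kegg_link; the canned text is illustrative only."""
    canned = {
        ("reaction", "cpd:C00031"): "cpd:C00031\trn:R00299\n",
        ("pathway", "rn:R00299"): "rn:R00299\tpath:map00010\nrn:R00299\tpath:rn00010\n",
    }
    return canned.get((target_db, source_id), "")


print(pathways_for_compound("cpd:C00031", fake_link))
```

With the real helper, this makes one request per reaction, so a short pause between calls may be advisable for compounds that take part in many reactions.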

Workflow 4: Cross-Database Integration

Use case: Integrating KEGG data with UniProt, NCBI, or PubChem databases.
```python
from scripts.kegg_api import kegg_conv, kegg_get

# Step 1: Convert KEGG gene IDs to external database IDs
uniprot_map = kegg_conv('uniprot', 'hsa')
ncbi_map = kegg_conv('ncbi-geneid', 'hsa')

# Step 2: Parse conversion results
for line in uniprot_map.split('\n'):
    if line:
        kegg_id, uniprot_id = line.split('\t')
        # Use external IDs for integration

# Step 3: Get sequences using KEGG
sequence = kegg_get('hsa:10458', 'aaseq')
```

Workflow 5: Organism-Specific Pathway Analysis

Use case: Comparing pathways across different organisms.
```python
from scripts.kegg_api import kegg_list, kegg_get

# Step 1: List pathways for multiple organisms
human_pathways = kegg_list('pathway', 'hsa')
mouse_pathways = kegg_list('pathway', 'mmu')
yeast_pathways = kegg_list('pathway', 'sce')

# Step 2: Get reference pathway for comparison
ref_pathway = kegg_get('map00010')  # Reference glycolysis

# Step 3: Get organism-specific versions
hsa_glycolysis = kegg_get('hsa00010')
mmu_glycolysis = kegg_get('mmu00010')
```
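
Since organism-specific pathway IDs share a five-digit number with their reference pathway, the listings from Step 1 can be compared by stripping the organism prefix. The sketch below assumes each listing line starts with the pathway ID followed by a tab; `pathway_numbers` is a hypothetical helper:

```python
def pathway_numbers(list_text, organism):
    """Return the five-digit pathway numbers found in a pathway listing for one organism."""
    numbers = set()
    for line in list_text.splitlines():
        pathway_id = line.split("\t", 1)[0].replace("path:", "")
        if pathway_id.startswith(organism):
            numbers.add(pathway_id[len(organism):])
    return numbers


# Illustrative listings in the assumed layout; not live query output.
human = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "hsa04115\tp53 signaling pathway - Homo sapiens (human)\n"
)
yeast = "sce00010\tGlycolysis / Gluconeogenesis - Saccharomyces cerevisiae (budding yeast)\n"

shared = pathway_numbers(human, "hsa") & pathway_numbers(yeast, "sce")
human_only = pathway_numbers(human, "hsa") - pathway_numbers(yeast, "sce")
print(sorted(shared), sorted(human_only))
```

Prefixing a shared number with `map` gives the reference pathway ID, for example `map00010`.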

Pathway Categories

KEGG organizes pathways into seven major categories. When interpreting pathway IDs or recommending pathways to users, refer to these categories:
  1. Metabolism (e.g., `map00010` - Glycolysis, `map00190` - Oxidative phosphorylation)
  2. Genetic Information Processing (e.g., `map03010` - Ribosome, `map03040` - Spliceosome)
  3. Environmental Information Processing (e.g., `map04010` - MAPK signaling, `map02010` - ABC transporters)
  4. Cellular Processes (e.g., `map04140` - Autophagy, `map04210` - Apoptosis)
  5. Organismal Systems (e.g., `map04610` - Complement cascade, `map04910` - Insulin signaling)
  6. Human Diseases (e.g., `map05200` - Pathways in cancer, `map05010` - Alzheimer disease)
  7. Drug Development (chronological and target-based classifications)
See `references/kegg_reference.md` for detailed pathway lists and classifications.

Important Identifiers and Formats

Pathway IDs

  • `map#####` - Reference pathway (generic, not organism-specific)
  • `hsa#####` - Human pathway
  • `mmu#####` - Mouse pathway

Gene IDs

  • Format: `organism:gene_number` (e.g., `hsa:10458`)

Compound IDs

  • Format: `cpd:C#####` (e.g., `cpd:C00002` for ATP)

Drug IDs

  • Format: `dr:D#####` (e.g., `dr:D00001`)

Enzyme IDs

  • Format: `ec:EC_number` (e.g., `ec:1.1.1.1`)

KO (KEGG Orthology) IDs

  • Format: `ko:K#####` (e.g., `ko:K00001`)
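
The formats above are regular enough to recognise with simple patterns. The following sketch covers only the formats listed in this section and assumes organism codes are three or four lowercase letters; `classify_kegg_id` is a hypothetical helper, not part of `scripts/kegg_api.py`:

```python
import re

# Checked in order: database-prefixed IDs first, so 'cpd:...' is not mistaken for a gene.
ID_PATTERNS = [
    ("compound", re.compile(r"^cpd:C\d{5}$")),
    ("drug", re.compile(r"^dr:D\d{5}$")),
    ("enzyme", re.compile(r"^ec:\d+(\.\d+){3}$")),
    ("ko", re.compile(r"^ko:K\d{5}$")),
    ("pathway", re.compile(r"^[a-z]{3,4}\d{5}$")),  # map##### or organism code + number
    ("gene", re.compile(r"^[a-z]{3,4}:\S+$")),      # organism:gene_number
]


def classify_kegg_id(identifier):
    """Return the kind of KEGG ID, or 'unknown' if no listed format matches."""
    for kind, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return kind
    return "unknown"


for example in ["map00010", "hsa:10458", "cpd:C00002", "ec:1.1.1.1"]:
    print(example, classify_kegg_id(example))
```

A check like this can route a user-supplied ID to the appropriate operation or reject obvious typos before a request is made.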

API Limitations

Respect these constraints when using the KEGG API:
  1. Entry limits: Maximum 10 entries per operation (except image/kgml/json: 1 entry only)
  2. Academic use: API is for academic use only; commercial use requires licensing
  3. HTTP status codes: Check for 200 (success), 400 (bad request), 404 (not found)
  4. Rate limiting: No explicit limit, but avoid rapid-fire requests
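
The entry limit and the request-rate advice can be handled together by splitting long ID lists into batches of ten and pausing between calls. The sketch below accepts any callable that takes a list of IDs (for example `kegg_get`); the helper names are hypothetical and the 0.5 second default pause is an arbitrary courtesy delay, not a documented KEGG figure:

```python
import time


def batched(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(ids, fetch, pause=0.5):
    """Call `fetch` once per batch of up to 10 IDs, pausing between requests."""
    results = []
    for index, batch in enumerate(batched(list(ids))):
        if index:
            time.sleep(pause)  # no pause before the first request
        results.append(fetch(batch))
    return results


# Offline demonstration with a stand-in fetch function that just counts IDs.
gene_ids = [f"hsa:{number}" for number in range(1, 24)]
print(fetch_in_batches(gene_ids, fetch=len, pause=0))
```

The results come back as one item per batch, so text responses from the real helper would still need to be joined or parsed afterwards.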

Detailed Reference

For comprehensive API documentation, database specifications, organism codes, and advanced usage, refer to `references/kegg_reference.md`. This includes:
  • Complete list of KEGG databases
  • Detailed API operation syntax
  • All organism codes
  • HTTP status codes and error handling
  • Integration with Biopython and R/Bioconductor
  • Best practices for API usage

Troubleshooting

  • 404 Not Found: Entry or database doesn't exist; verify IDs and organism codes
  • 400 Bad Request: Syntax error in API call; check parameter formatting
  • Empty results: Search term may not match entries; try broader keywords
  • Image/KGML errors: These formats only work with single entries; remove batch processing
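
These cases can be turned into a small lookup that maps a response to a hint. The sketch below is illustrative only: `describe_status` is a hypothetical helper, and it assumes that an empty search result arrives with status 200, which this document does not state:

```python
STATUS_HINTS = {
    200: "success",
    400: "bad request: check the operation syntax and parameter formatting",
    404: "not found: verify entry IDs, database names, and organism codes",
}


def describe_status(status_code, body=""):
    """Return a troubleshooting hint for an HTTP status code and response body."""
    if status_code == 200 and not body.strip():
        # Assumption: an empty result is returned with status 200.
        return "success but empty result: try broader search keywords"
    return STATUS_HINTS.get(status_code, f"unexpected HTTP status {status_code}")


print(describe_status(404))
print(describe_status(200, body=""))
```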

Additional Tools

For interactive pathway visualization and annotation: