tooluniverse-systems-biology

Systems Biology & Pathway Analysis

Comprehensive pathway and systems biology analysis that integrates multiple curated databases to provide a multi-dimensional view of biological systems, pathway enrichment, and protein-pathway relationships.

When to Use This Skill

Triggers:
  • "Analyze pathways for this gene list"
  • "What pathways is [protein] involved in?"
  • "Find pathways related to [keyword/process]"
  • "Perform pathway enrichment analysis"
  • "Map proteins to biological pathways"
  • "Find computational models for [process]"
  • "Systems biology analysis of [genes/proteins]"
Use Cases:
  1. Gene Set Analysis: Identify enriched pathways from RNA-seq, proteomics, or screen results
  2. Protein Function: Discover pathways and processes a protein participates in
  3. Pathway Discovery: Find pathways related to diseases, processes, or phenotypes
  4. Systems Integration: Connect genes → pathways → processes → diseases
  5. Model Discovery: Find computational systems biology models (SBML)
  6. Cross-Database Validation: Compare pathway annotations across multiple sources

Core Databases Integrated

| Database | Coverage | Strengths |
| --- | --- | --- |
| Reactome | Human-curated reactions & pathways | Detailed mechanistic pathways with reactions |
| KEGG | Reference pathways across organisms | Metabolic maps, disease pathways, drug targets |
| WikiPathways | Community-curated pathways | Emerging processes, collaborative updates |
| Pathway Commons | Integrated meta-database | Aggregates multiple sources (Reactome, KEGG, etc.) |
| BioModels | Computational SBML models | Mathematical/dynamic systems biology models |
| Enrichr | Statistical enrichment | Pathway over-representation analysis |

Workflow Overview

Input → Phase 1: Enrichment → Phase 2: Protein Mapping → Phase 3: Keyword Search → Phase 4: Top Pathways → Report
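
Which phases run depends on which inputs are supplied: Phase 1 needs a gene list, Phase 2 a UniProt ID, Phase 3 a keyword, and Phase 4 is always included. A minimal sketch of that selection logic follows; the function name `plan_phases` and the phase labels are hypothetical names chosen for illustration, not part of the skill's implementation.

```python
from typing import List, Optional, Sequence


def plan_phases(
    gene_list: Optional[Sequence[str]] = None,
    protein_id: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[str]:
    """Return the phases to run, in workflow order, for the given inputs."""
    phases: List[str] = []
    if gene_list:  # Phase 1 needs a non-empty gene list
        phases.append("phase1_enrichment")
    if protein_id:  # Phase 2 needs a UniProt accession
        phases.append("phase2_protein_mapping")
    if keyword:  # Phase 3 needs a keyword or process name
        phases.append("phase3_keyword_search")
    phases.append("phase4_top_pathways")  # Phase 4 is always included
    return phases


if __name__ == "__main__":
    print(plan_phases(gene_list=["TP53", "BRCA1"], keyword="apoptosis"))
```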

Phase 1: Pathway Enrichment Analysis

When: Gene list provided (from experiments, screens, differentially expressed genes)
Objective: Identify biological pathways that are statistically over-represented in the gene list

Tools Used

enrichr_gene_enrichment_analysis:
  • Input:
    • gene_list: Array of gene symbols (e.g., ["TP53", "BRCA1", "EGFR"])
    • library: Pathway database (e.g., "KEGG_2021_Human", "Reactome_2022")
  • Output: Array of enriched pathways with p-values, adjusted p-values, genes
  • Use: Statistical over-representation analysis

Workflow

  1. Submit gene list to Enrichr
  2. Query KEGG pathway library for human
  3. Get enriched pathways sorted by significance
  4. Extract:
    • Pathway names and IDs
    • P-values (raw and adjusted)
    • Genes from input list in each pathway
    • Enrichment scores

Decision Logic

  • Significance threshold: Adjusted p-value < 0.05 (default)
  • Minimum genes: At least 2 genes from input list in pathway
  • Report top pathways: Show 10-20 most significant
  • Empty results: If no enrichment → note "no significant pathways" (don't fail)
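
The decision rules above can be sketched as a small filter. This is an illustration under assumptions: the record keys `pathway`, `adjusted_p_value`, and `genes` are hypothetical field names, and the actual Enrichr tool output may be shaped differently.

```python
from typing import Dict, List


def select_significant(
    results: List[Dict],
    alpha: float = 0.05,
    min_genes: int = 2,
    top_n: int = 20,
) -> List[Dict]:
    """Keep pathways passing both thresholds, most significant first."""
    kept = [
        r
        for r in results
        if r["adjusted_p_value"] < alpha and len(r["genes"]) >= min_genes
    ]
    kept.sort(key=lambda r: r["adjusted_p_value"])
    return kept[:top_n]


def describe(results: List[Dict]) -> str:
    """Summarize the selection without failing on empty input."""
    selected = select_significant(results)
    if not selected:
        return "no significant pathways"
    return ", ".join(r["pathway"] for r in selected)


if __name__ == "__main__":
    demo = [
        {"pathway": "A", "adjusted_p_value": 0.001, "genes": ["TP53", "BRCA1"]},
        {"pathway": "B", "adjusted_p_value": 0.2, "genes": ["TP53", "EGFR"]},
    ]
    print(describe(demo))
```

An empty selection yields a note rather than an exception, which matches the "don't fail" rule.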

Phase 2: Protein-Pathway Mapping

When: Protein UniProt ID provided
Objective: Map protein to all known pathways it participates in

Tools Used

Reactome_map_uniprot_to_pathways:
  • Input:
    • id: UniProt accession (e.g., "P53350")
  • Output: Array of Reactome pathways containing this protein
  • Note: Parameter is id (not uniprot_id)
Reactome_get_pathway_reactions:
  • Input:
    • stId: Reactome pathway stable ID (e.g., "R-HSA-73817")
  • Output: Array of reactions and subpathways
  • Use: Get mechanistic details of pathways

Workflow

  1. Map UniProt ID to Reactome pathways
  2. Get all pathways this protein appears in
  3. For top pathway (or user-specified):
    • Retrieve detailed reactions and subpathways
    • Extract event names, types (Reaction vs Pathway)
    • Note disease associations if present

Decision Logic

  • Multiple pathways: Report all pathways, prioritize by hierarchical level
  • Top pathway details: Get detailed reactions for 1-3 most relevant
  • Versioned IDs: Reactome uses unversioned IDs - strip version if present
  • Empty results: Check if protein ID valid; suggest alternative databases if Reactome empty
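
The version-stripping rule can be sketched as below. It assumes that a version, when present, appears as a dotted numeric suffix (for example "R-HSA-73817.5") and that stable IDs follow the "R-<three-letter species code>-<number>" shape of the example above; the helper name `strip_reactome_version` is hypothetical.

```python
import re

# Assumed shape: "R-", a three-letter species code, "-", digits,
# then an optional ".<version>" suffix.
_STABLE_ID = re.compile(r"^(R-[A-Z]{3}-\d+)(?:\.\d+)?$")


def strip_reactome_version(stable_id: str) -> str:
    """Return the unversioned stable ID, or raise if the ID looks malformed."""
    match = _STABLE_ID.match(stable_id.strip())
    if match is None:
        raise ValueError(f"not a recognizable Reactome stable ID: {stable_id!r}")
    return match.group(1)


if __name__ == "__main__":
    print(strip_reactome_version("R-HSA-73817.5"))
```

Raising on malformed input keeps a bad identifier from being silently passed to Reactome_get_pathway_reactions.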

Phase 3: Keyword-Based Pathway Search

When: User provides keyword or biological process name
Objective: Search multiple pathway databases to find relevant pathways

Tools Used

KEGG Search

kegg_search_pathway:
  • Input: keyword (e.g., "diabetes", "apoptosis")
  • Output: Array of pathway IDs and descriptions
  • Coverage: Reference pathways, metabolism, diseases
kegg_get_pathway_info:
  • Input: pathway_id (e.g., "hsa04930")
  • Output: Pathway details, genes, compounds
  • Use: Get detailed information for specific pathway

WikiPathways Search

WikiPathways_search:
  • Input:
    • query: Keyword or gene symbol
    • organism: Species filter (e.g., "Homo sapiens")
  • Output: Array of pathway matches with IDs, names, URLs
  • Coverage: Community-curated, includes emerging pathways

Pathway Commons Search

pc_search_pathways:
  • Input:
    • action: "search_pathways"
    • keyword: Search term
    • datasource: Optional filter (e.g., "reactome", "kegg")
    • limit: Max results (default: 10)
  • Output: Total hits and array of pathways with source attribution
  • Coverage: Meta-database aggregating multiple sources

BioModels Search

biomodels_search:
  • Input:
    • query: Keyword for computational models
    • limit: Max results
  • Output: Array of SBML models with IDs, names, publications
  • Coverage: Mathematical/computational systems biology models

Workflow

  1. Search KEGG pathways by keyword
  2. Search WikiPathways with organism filter
  3. Search Pathway Commons (aggregates multiple sources)
  4. Search BioModels for computational models
  5. Compile results from all sources
  6. Note overlaps and source-specific pathways

Decision Logic

  • Parallel queries: Search all databases simultaneously (independent)
  • Empty from one source: Continue with other sources (common for specialized keywords)
  • Result consolidation: Group by pathway concept, note which databases contain each
  • Model availability: BioModels may be empty for many processes - this is normal
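
One way to realize the parallel, failure-tolerant search and the consolidation step is sketched below. The searcher callables stand in for the real tool calls, the `name` key on each hit is an assumed field, and grouping by lower-cased name is only a crude proxy for "pathway concept".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Searcher = Callable[[str], List[dict]]


def search_all(keyword: str, searchers: Dict[str, Searcher]) -> Dict[str, List[dict]]:
    """Query every source concurrently; a failing or empty source yields []."""
    results: Dict[str, List[dict]] = {}
    with ThreadPoolExecutor(max_workers=max(len(searchers), 1)) as pool:
        futures = {name: pool.submit(fn, keyword) for name, fn in searchers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result() or []
            except Exception:
                results[name] = []  # continue with the other sources
    return results


def consolidate(results: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Map each normalized pathway name to the sorted list of sources containing it."""
    grouped: Dict[str, set] = {}
    for source, hits in results.items():
        for hit in hits:
            grouped.setdefault(hit["name"].strip().lower(), set()).add(source)
    return {name: sorted(sources) for name, sources in grouped.items()}


if __name__ == "__main__":
    stubs = {
        "kegg": lambda kw: [{"name": "Apoptosis"}],
        "wikipathways": lambda kw: [{"name": "apoptosis"}],
        "biomodels": lambda kw: [],
    }
    print(consolidate(search_all("apoptosis", stubs)))
```

Keeping an explicit empty list per source makes it easy to report "no results" for that database instead of omitting it.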

Phase 4: Top-Level Pathway Catalog

When: Always included to provide context
Objective: Show major biological systems/pathways for organism

Tools Used

Reactome_list_top_pathways:
  • Input: species (e.g., "Homo sapiens")
  • Output: Array of top-level pathway categories
  • Use: Provides hierarchical pathway organization

Workflow

  1. Retrieve top-level pathways for specified organism
  2. Display pathway categories (metabolism, signaling, disease, etc.)
  3. Serve as reference for pathway hierarchy

Decision Logic

  • Always show: Provides context even if other phases empty
  • Organism-specific: Filter by species of interest
  • Hierarchical view: These are parent pathways with many subpathways

Output Structure

Report Format

Progressive Markdown Report:
  • Create report file first
  • Add sections progressively
  • Each section self-contained (handles empty gracefully)
Required Sections:
  1. Header: Analysis parameters (genes, protein, keyword, organism)
  2. Phase 1 Results: Pathway enrichment (if gene list)
  3. Phase 2 Results: Protein-pathway mapping (if protein ID)
  4. Phase 3 Results: Keyword search across databases (if keyword)
  5. Phase 4 Results: Top-level pathway catalog (always)
Per-Database Subsections:
  • Database name and result count
  • Table of pathways with key metadata
  • Note if database returns no results
  • Links or IDs for follow-up
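
A per-database subsection that handles empty results gracefully could be rendered roughly as follows. The helper names `render_section` and `append_section`, the heading level, and the wording of the empty-result note are illustrative assumptions rather than the skill's actual report code.

```python
from typing import List, Sequence


def render_section(title: str, headers: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Render one self-contained Markdown subsection with a result count."""
    lines = [f"### {title} (result count: {len(rows)})", ""]
    if not rows:
        # Empty results are stated explicitly, never silently omitted.
        lines.append("No results returned by this database.")
    else:
        lines.append("| " + " | ".join(headers) + " |")
        lines.append("| " + " | ".join("---" for _ in headers) + " |")
        for row in rows:
            lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    lines.append("")
    return "\n".join(lines)


def append_section(report_path: str, section: str) -> None:
    """Append a rendered section to an existing report file."""
    with open(report_path, "a", encoding="utf-8") as handle:
        handle.write(section + "\n")


if __name__ == "__main__":
    print(render_section("KEGG", ["Pathway ID", "Name"], []))
```

Because each section is rendered independently, the report can be built progressively: create the file with the header, then append one section per phase as results arrive.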

Data Tables

Enrichment Results:

| Pathway | P-value | Adjusted P-value | Genes |
| --- | --- | --- | --- |
| ... | ... | ... | ... |

Protein Pathways:

| Pathway Name | Pathway ID | Species |
| --- | --- | --- |
| ... | ... | ... |

Keyword Search:

| Pathway/Model ID | Name | Source/Database |
| --- | --- | --- |
| ... | ... | ... |

Tool Parameter Reference

Critical Parameter Notes (from testing):

| Tool | Parameter | CORRECT Name | Common Mistake |
| --- | --- | --- | --- |
| Reactome_map_uniprot_to_pathways | id | id | uniprot_id |
| kegg_search_pathway | keyword | keyword | - |
| WikiPathways_search | query | query | - |
| pc_search_pathways | action + keyword | ✅ Both required | action optional |
| enrichr_gene_enrichment_analysis | gene_list | gene_list | - |

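
The parameter notes above lend themselves to a pre-flight check before a tool is called. The sketch below encodes only what the table states; the function name `check_arguments` and the message wording are hypothetical, and parameters not listed in the table (such as library or organism) are deliberately left unchecked.

```python
from typing import Dict, List, Set

# Required parameter names, taken from the table above.
REQUIRED_PARAMS: Dict[str, Set[str]] = {
    "Reactome_map_uniprot_to_pathways": {"id"},
    "kegg_search_pathway": {"keyword"},
    "WikiPathways_search": {"query"},
    "pc_search_pathways": {"action", "keyword"},
    "enrichr_gene_enrichment_analysis": {"gene_list"},
}

# Wrong name -> correct name, per the "Common Mistake" column.
KNOWN_MISTAKES: Dict[str, Dict[str, str]] = {
    "Reactome_map_uniprot_to_pathways": {"uniprot_id": "id"},
}


def check_arguments(tool: str, arguments: dict) -> List[str]:
    """Return a list of problems; an empty list means the arguments look fine."""
    problems: List[str] = []
    for wrong, right in KNOWN_MISTAKES.get(tool, {}).items():
        if wrong in arguments:
            problems.append(f"use '{right}' instead of '{wrong}'")
    for name in sorted(REQUIRED_PARAMS.get(tool, set())):
        if name not in arguments:
            problems.append(f"missing required parameter '{name}'")
    return problems


if __name__ == "__main__":
    print(check_arguments("pc_search_pathways", {"keyword": "apoptosis"}))
```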
Response Format Notes:
  • Reactome: Returns list directly (not wrapped in {status, data})
  • Pathway Commons: Returns dict directly with total_hits and pathways
  • Others: Standard {status: "success", data: [...]} format
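
Since the three response shapes differ, a small normalizer can give downstream code a single list of records to work with. The following is a sketch under assumptions: the source labels "reactome" and "pathway_commons" are hypothetical routing keys, and any response that does not match its expected shape is treated as empty.

```python
from typing import Any, List


def extract_records(source: str, response: Any) -> List[Any]:
    """Return the list of records from a tool response, or [] if unusable."""
    if source == "reactome":
        # Reactome tools return the list directly.
        return list(response) if isinstance(response, list) else []
    if source == "pathway_commons":
        # Pathway Commons returns a dict with total_hits and pathways.
        if isinstance(response, dict):
            return list(response.get("pathways", []))
        return []
    # Other tools wrap results as {status: "success", data: [...]}.
    if isinstance(response, dict) and response.get("status") == "success":
        return list(response.get("data", []))
    return []


if __name__ == "__main__":
    print(extract_records("kegg", {"status": "success", "data": [{"id": "hsa04930"}]}))
```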

Fallback Strategies

Enrichment Analysis

  • Primary: Enrichr with KEGG library
  • Fallback: Try alternative libraries (Reactome, GO Biological Process)
  • If all fail: Note "enrichment analysis unavailable" and continue
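
The enrichment fallback chain might look like the sketch below. The `run_enrichment` callable stands in for the real Enrichr tool call, and the GO library name "GO_Biological_Process_2023" is an assumption, since the text above names the libraries only as "Reactome" and "GO Biological Process". A library that responds with an empty result is treated as a valid answer, not a failure.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Primary library first, then fallbacks; the third name is an assumption.
DEFAULT_LIBRARIES = ("KEGG_2021_Human", "Reactome_2022", "GO_Biological_Process_2023")


def enrich_with_fallback(
    gene_list: Sequence[str],
    run_enrichment: Callable[[Sequence[str], str], List[dict]],
    libraries: Sequence[str] = DEFAULT_LIBRARIES,
) -> Tuple[Optional[str], List[dict], Optional[str]]:
    """Return (library used, results, note); never raises on tool failure."""
    for library in libraries:
        try:
            results = run_enrichment(gene_list, library)
        except Exception:
            continue  # this library failed; try the next one
        note = None if results else "no significant pathways"
        return library, results, note
    return None, [], "enrichment analysis unavailable"


if __name__ == "__main__":
    def always_down(genes, library):
        raise RuntimeError("service unavailable")

    print(enrich_with_fallback(["TP53", "BRCA1"], always_down))
```

Returning a note alongside the results lets the report record which library was used and why a section is empty, so the workflow can continue to the remaining phases.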

Protein Mapping

  • Primary: Reactome protein-pathway mapping
  • Fallback: Use keyword search with protein name
  • If empty: Check if protein ID valid; suggest checking gene symbol

Keyword Search

  • Primary: Search all databases (KEGG, WikiPathways, Pathway Commons, BioModels)
  • Fallback: If all empty, broaden keyword (e.g., "diabetes" → "glucose")
  • If still empty: Note "no pathways found for [keyword]"

Common Use Patterns

Pattern 1: Differential Expression Analysis

Input: Gene list from RNA-seq (upregulated genes)
Workflow: Phase 1 (Enrichment) → Phase 4 (Context)
Output: Enriched pathways explaining expression changes

Pattern 2: Protein Function Investigation

Input: UniProt ID of protein of interest
Workflow: Phase 2 (Protein mapping) → Phase 3 (Keyword with protein name)
Output: All pathways involving protein + related pathways

Pattern 3: Disease Pathway Exploration

Input: Disease name or process keyword
Workflow: Phase 3 (Keyword search) → Phase 4 (Context)
Output: Pathways from multiple databases related to disease

Pattern 4: Comprehensive Multi-Input

Input: Gene list + protein ID + keyword
Workflow: All phases
Output: Complete systems view with enrichment, specific mappings, and context

Quality Checks

Data Completeness

  • At least one analysis phase completed successfully
  • Each database result includes source attribution
  • Empty results explicitly noted (not silently omitted)
  • P-values reported with appropriate precision
  • Pathway IDs provided for follow-up analysis

Biological Validity

  • Enrichment results state the significance threshold applied
  • Protein mappings consistent with known function
  • Keyword results relevant to query
  • Cross-database results show expected overlaps

Report Quality

  • All sections present even if "no data"
  • Tables formatted consistently
  • Source databases clearly attributed
  • Follow-up recommendations if data sparse

Limitations & Known Issues

Database-Specific

  • Reactome: Strong human coverage; limited for non-model organisms
  • KEGG: Requires keyword match; may miss synonyms
  • WikiPathways: Variable curation quality; check pathway version dates
  • Pathway Commons: Aggregation can have duplicates; check source
  • BioModels: Sparse for many processes; often returns no results
  • Enrichr: Requires gene symbols (not IDs); case-sensitive

Technical

  • Response formats: Different databases use different response structures (handled in implementation)
  • Rate limits: Some databases have rate limits for heavy usage
  • Version differences: Pathway databases updated at different rates

Analysis

  • Enrichment bias: Pathway enrichment depends on pathway size and annotation completeness
  • Organism specificity: Not all databases cover all organisms equally
  • Pathway definitions: Same biological process may be modeled differently across databases

Summary

The Systems Biology & Pathway Analysis skill provides comprehensive pathway analysis by integrating:
  1. ✅ Statistical pathway enrichment (Enrichr)
  2. ✅ Protein-pathway mapping (Reactome)
  3. ✅ Multi-database keyword search (KEGG, WikiPathways, Pathway Commons, BioModels)
  4. ✅ Hierarchical pathway context (Reactome top-level)
Outputs: Markdown report with pathway tables, enrichment statistics, and cross-database comparisons
Best for: Gene set analysis, protein function investigation, pathway discovery, systems-level biology