tooluniverse-systems-biology
Systems Biology & Pathway Analysis
Comprehensive pathway and systems biology analysis that integrates multiple curated databases to provide a multi-dimensional view of biological systems, pathway enrichment, and protein-pathway relationships.
When to Use This Skill
Triggers:
- "Analyze pathways for this gene list"
- "What pathways is [protein] involved in?"
- "Find pathways related to [keyword/process]"
- "Perform pathway enrichment analysis"
- "Map proteins to biological pathways"
- "Find computational models for [process]"
- "Systems biology analysis of [genes/proteins]"
Use Cases:
- Gene Set Analysis: Identify enriched pathways from RNA-seq, proteomics, or screen results
- Protein Function: Discover pathways and processes a protein participates in
- Pathway Discovery: Find pathways related to diseases, processes, or phenotypes
- Systems Integration: Connect genes → pathways → processes → diseases
- Model Discovery: Find computational systems biology models (SBML)
- Cross-Database Validation: Compare pathway annotations across multiple sources
Core Databases Integrated
| Database | Coverage | Strengths |
|---|---|---|
| Reactome | Human-curated reactions & pathways | Detailed mechanistic pathways with reactions |
| KEGG | Reference pathways across organisms | Metabolic maps, disease pathways, drug targets |
| WikiPathways | Community-curated pathways | Emerging processes, collaborative updates |
| Pathway Commons | Integrated meta-database | Aggregates multiple sources (Reactome, KEGG, etc.) |
| BioModels | Computational SBML models | Mathematical/dynamic systems biology models |
| Enrichr | Statistical enrichment | Pathway over-representation analysis |
Workflow Overview
Input → Phase 1: Enrichment → Phase 2: Protein Mapping → Phase 3: Keyword Search → Phase 4: Top Pathways → Report
Phase 1: Pathway Enrichment Analysis
When: Gene list provided (from experiments, screens, differentially expressed genes)
Objective: Identify biological pathways that are statistically over-represented in the gene list
Tools Used
enrichr_gene_enrichment_analysis:
- Input:
  - `gene_list`: Array of gene symbols (e.g., ["TP53", "BRCA1", "EGFR"])
  - `library`: Pathway database (e.g., "KEGG_2021_Human", "Reactome_2022")
- Output: Array of enriched pathways with p-values, adjusted p-values, genes
- Use: Statistical over-representation analysis
Workflow
- Submit gene list to Enrichr
- Query KEGG pathway library for human
- Get enriched pathways sorted by significance
- Extract:
  - Pathway names and IDs
  - P-values (raw and adjusted)
  - Genes from input list in each pathway
  - Enrichment scores
Decision Logic
- Significance threshold: Adjusted p-value < 0.05 (default)
- Minimum genes: At least 2 genes from input list in pathway
- Report top pathways: Show 10-20 most significant
- Empty results: If no enrichment → note "no significant pathways" (don't fail)
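The decision rules above can be sketched as a small post-filter over the enrichment output. This is a minimal illustration, not the skill's actual implementation: the record keys `pathway`, `adjusted_p_value`, and `genes` are assumed names for the fields the tool returns, and both function names are hypothetical.

```python
from typing import Any


def filter_enriched(results: list[dict[str, Any]],
                    alpha: float = 0.05,
                    min_genes: int = 2,
                    top_n: int = 20) -> list[dict[str, Any]]:
    """Keep pathways with adjusted p < alpha and >= min_genes input genes.

    Field names ("adjusted_p_value", "genes") are assumptions about the
    result records, not a documented schema.
    """
    kept = [
        r for r in results
        if r["adjusted_p_value"] < alpha and len(r["genes"]) >= min_genes
    ]
    # Most significant first, then cap at the number of pathways to report.
    kept.sort(key=lambda r: r["adjusted_p_value"])
    return kept[:top_n]


def summarize_enrichment(results: list[dict[str, Any]]) -> str:
    """Return one line per reported pathway, or an explicit empty note."""
    hits = filter_enriched(results)
    if not hits:
        # Empty enrichment is reported, not treated as a failure.
        return "no significant pathways"
    return "\n".join(
        f'{r["pathway"]}: adj. p = {r["adjusted_p_value"]:.2e}' for r in hits
    )
```

Sorting after filtering keeps the cut-off independent of whatever order the upstream tool happens to return.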
Phase 2: Protein-Pathway Mapping
When: Protein UniProt ID provided
Objective: Map protein to all known pathways it participates in
Tools Used
Reactome_map_uniprot_to_pathways:
- Input:
  - `id`: UniProt accession (e.g., "P53350")
- Output: Array of Reactome pathways containing this protein
- Note: Parameter is `id` (not `uniprot_id`)
Reactome_get_pathway_reactions:
- Input:
  - `stId`: Reactome pathway stable ID (e.g., "R-HSA-73817")
- Output: Array of reactions and subpathways
- Use: Get mechanistic details of pathways
Workflow
- Map UniProt ID to Reactome pathways
- Get all pathways this protein appears in
- For the top pathway (or a user-specified one):
  - Retrieve detailed reactions and subpathways
  - Extract event names, types (Reaction vs Pathway)
  - Note disease associations if present
Decision Logic
- Multiple pathways: Report all pathways, prioritize by hierarchical level
- Top pathway details: Get detailed reactions for 1-3 most relevant
- Versioned IDs: Reactome uses unversioned IDs - strip version if present
- Empty results: Check if protein ID valid; suggest alternative databases if Reactome empty
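As an illustration of the versioned-ID rule, the helper below strips a trailing version suffix before a pathway ID is passed on. It assumes versioned IDs take the form of a stable ID followed by a dot and digits (for example "R-HSA-73817.3"); the function name is hypothetical.

```python
import re

# Assumption: a version, when present, is a trailing ".<digits>" suffix.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_reactome_version(stable_id: str) -> str:
    """Return the stable ID without any trailing version suffix."""
    return _VERSION_SUFFIX.sub("", stable_id.strip())
```

IDs that carry no version pass through unchanged, so the helper is safe to apply to every ID before calling Reactome_get_pathway_reactions.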
Phase 3: Keyword-Based Pathway Search
When: User provides keyword or biological process name
Objective: Search multiple pathway databases to find relevant pathways
Tools Used
KEGG Search
kegg_search_pathway:
- Input: `keyword` (e.g., "diabetes", "apoptosis")
- Output: Array of pathway IDs and descriptions
- Coverage: Reference pathways, metabolism, diseases
kegg_get_pathway_info:
- Input: `pathway_id` (e.g., "hsa04930")
- Output: Pathway details, genes, compounds
- Use: Get detailed information for specific pathway
WikiPathways Search
WikiPathways_search:
- Input:
  - `query`: Keyword or gene symbol
  - `organism`: Species filter (e.g., "Homo sapiens")
- Output: Array of pathway matches with IDs, names, URLs
- Coverage: Community-curated, includes emerging pathways
Pathway Commons Search
pc_search_pathways:
- Input:
  - `action`: "search_pathways"
  - `keyword`: Search term
  - `datasource`: Optional filter (e.g., "reactome", "kegg")
  - `limit`: Max results (default: 10)
- Output: Total hits and array of pathways with source attribution
- Coverage: Meta-database aggregating multiple sources
BioModels Search
biomodels_search:
- Input:
  - `query`: Keyword for computational models
  - `limit`: Max results
- Output: Array of SBML models with IDs, names, publications
- Coverage: Mathematical/computational systems biology models
Workflow
- Search KEGG pathways by keyword
- Search WikiPathways with organism filter
- Search Pathway Commons (aggregates multiple sources)
- Search BioModels for computational models
- Compile results from all sources
- Note overlaps and source-specific pathways
Decision Logic
- Parallel queries: Search all databases simultaneously (independent)
- Empty from one source: Continue with other sources (common for specialized keywords)
- Result consolidation: Group by pathway concept, note which databases contain each
- Model availability: BioModels may be empty for many processes - this is normal
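One way to read the parallel-query and consolidation rules is sketched below. The searchers are passed in as plain callables, so nothing here depends on a specific client library; `search_all` and `group_by_name` are hypothetical names, each hit is assumed to carry a `name` key, and grouping by lower-cased name is only a crude stand-in for grouping by pathway concept.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Searcher = Callable[[str], list[dict]]


def search_all(keyword: str,
               searchers: dict[str, Searcher]) -> dict[str, list[dict]]:
    """Query every source concurrently; a failing or empty source yields []."""

    def safe_call(fn: Searcher) -> list[dict]:
        try:
            return fn(keyword) or []
        except Exception:
            # One unavailable database must not abort the other searches.
            return []

    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        futures = {name: pool.submit(safe_call, fn)
                   for name, fn in searchers.items()}
        return {name: future.result() for name, future in futures.items()}


def group_by_name(results: dict[str, list[dict]]) -> dict[str, set[str]]:
    """Map each normalized pathway name to the set of sources containing it."""
    grouped: dict[str, set[str]] = {}
    for source, hits in results.items():
        for hit in hits:
            key = hit["name"].strip().lower()  # assumed "name" field
            grouped.setdefault(key, set()).add(source)
    return grouped
```

Names that appear under more than one source mark the overlaps; names with a single source are the source-specific pathways.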
Phase 4: Top-Level Pathway Catalog
When: Always included to provide context
Objective: Show major biological systems/pathways for organism
Tools Used
Reactome_list_top_pathways:
- Input: `species` (e.g., "Homo sapiens")
- Output: Array of top-level pathway categories
- Use: Provides hierarchical pathway organization
Workflow
- Retrieve top-level pathways for specified organism
- Display pathway categories (metabolism, signaling, disease, etc.)
- Serve as reference for pathway hierarchy
Decision Logic
- Always show: Provides context even if other phases empty
- Organism-specific: Filter by species of interest
- Hierarchical view: These are parent pathways with many subpathways
Output Structure
Report Format
Progressive Markdown Report:
- Create report file first
- Add sections progressively
- Each section self-contained (handles empty gracefully)
Required Sections:
- Header: Analysis parameters (genes, protein, keyword, organism)
- Phase 1 Results: Pathway enrichment (if gene list)
- Phase 2 Results: Protein-pathway mapping (if protein ID)
- Phase 3 Results: Keyword search across databases (if keyword)
- Phase 4 Results: Top-level pathway catalog (always)
Per-Database Subsections:
- Database name and result count
- Table of pathways with key metadata
- Note if database returns no results
- Links or IDs for follow-up
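The progressive report described above might look like the following sketch: the file is created first with the analysis parameters, and each section is appended on its own, with empty results stated explicitly. The function names, the report title, and the "No results returned." wording are illustrative assumptions rather than the skill's actual output.

```python
from pathlib import Path


def start_report(path: Path, params: dict[str, str]) -> None:
    """Create the report file with a header listing the analysis parameters."""
    lines = ["# Systems Biology & Pathway Analysis Report", ""]
    lines += [f"- {key}: {value}" for key, value in params.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def append_section(path: Path, title: str,
                   columns: list[str], rows: list[list[object]]) -> None:
    """Append one self-contained section; an empty result is noted, not skipped."""
    out = ["", f"## {title}", "", f"Results: {len(rows)}", ""]
    if rows:
        out.append("| " + " | ".join(columns) + " |")
        out.append("|" + "---|" * len(columns))
        out += ["| " + " | ".join(str(cell) for cell in row) + " |"
                for row in rows]
    else:
        out.append("No results returned.")
    # Append mode: sections written earlier are never rewritten.
    with path.open("a", encoding="utf-8") as handle:
        handle.write("\n".join(out) + "\n")
```

Because every call appends a complete section, a failure in a later phase still leaves a readable partial report on disk.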
Data Tables
Enrichment Results:
| Pathway | P-value | Adjusted P-value | Genes |
|---|---|---|---|
| ... | ... | ... | ... |
Protein Pathways:
| Pathway Name | Pathway ID | Species |
|---|---|---|
| ... | ... | ... |
Keyword Search:
| Pathway/Model ID | Name | Source/Database |
|---|---|---|
| ... | ... | ... |
Tool Parameter Reference
Critical Parameter Notes (from testing):
| Tool | Parameter | CORRECT Name | Common Mistake |
|---|---|---|---|
| Reactome_map_uniprot_to_pathways | `id` | ✅ `id` | ❌ `uniprot_id` |
| kegg_search_pathway | `keyword` | ✅ `keyword` | - |
| WikiPathways_search | `query` | ✅ `query` | - |
| pc_search_pathways | `action` + `keyword` | ✅ Both required | ❌ |
| enrichr_gene_enrichment_analysis | `gene_list` | ✅ `gene_list` | - |
Response Format Notes:
- Reactome: Returns list directly (not wrapped in `{status, data}`)
- Pathway Commons: Returns dict directly with `total_hits` and `pathways`
- Others: Standard `{status: "success", data: [...]}` format
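The three response shapes noted above can be folded into one list with a small adapter. This is a sketch under the shapes as described in the notes; the function name is hypothetical, and treating any unrecognized or non-success response as empty is a design assumption.

```python
from typing import Any


def normalize_response(response: Any) -> list[Any]:
    """Coerce the response shapes described above into a plain list."""
    if isinstance(response, list):
        # Reactome: the list is returned directly.
        return response
    if isinstance(response, dict):
        if "pathways" in response:
            # Pathway Commons: dict with total_hits and pathways.
            return list(response["pathways"])
        if response.get("status") == "success":
            # Standard wrapper: {status: "success", data: [...]}.
            return list(response.get("data") or [])
    # Errors and unknown shapes are treated as "no results".
    return []
```

Normalizing at the boundary lets the report code handle every database the same way, including the empty case.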
Fallback Strategies
Enrichment Analysis
- Primary: Enrichr with KEGG library
- Fallback: Try alternative libraries (Reactome, GO Biological Process)
- If all fail: Note "enrichment analysis unavailable" and continue
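The enrichment fallback can be sketched as a loop over libraries in priority order. Here `run_enrichment` is a hypothetical callable standing in for a call to enrichr_gene_enrichment_analysis, and the GO library name in the default list is an assumption that should be checked against the libraries Enrichr currently offers.

```python
from typing import Callable, Optional, Sequence

# The first two names appear earlier in this document; the GO name is assumed.
DEFAULT_LIBRARIES = (
    "KEGG_2021_Human",
    "Reactome_2022",
    "GO_Biological_Process_2023",
)


def enrich_with_fallback(
    gene_list: Sequence[str],
    run_enrichment: Callable[[Sequence[str], str], list],
    libraries: Sequence[str] = DEFAULT_LIBRARIES,
) -> tuple[Optional[str], list]:
    """Return (library, results) for the first library that yields results."""
    for library in libraries:
        try:
            results = run_enrichment(gene_list, library)
        except Exception:
            # A failing library falls through to the next one.
            continue
        if results:
            return library, results
    # Caller notes "enrichment analysis unavailable" and continues.
    return None, []
```

Returning the library name alongside the results keeps the report's source attribution accurate when a fallback library was used.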
Protein Mapping
- Primary: Reactome protein-pathway mapping
- Fallback: Use keyword search with protein name
- If empty: Check if protein ID valid; suggest checking gene symbol
Keyword Search
- Primary: Search all databases (KEGG, WikiPathways, Pathway Commons, BioModels)
- Fallback: If all empty, broaden keyword (e.g., "diabetes" → "glucose")
- If still empty: Note "no pathways found for [keyword]"
Common Use Patterns
Pattern 1: Differential Expression Analysis
Input: Gene list from RNA-seq (upregulated genes)
Workflow: Phase 1 (Enrichment) → Phase 4 (Context)
Output: Enriched pathways explaining expression changes
Pattern 2: Protein Function Investigation
Input: UniProt ID of protein of interest
Workflow: Phase 2 (Protein mapping) → Phase 3 (Keyword with protein name)
Output: All pathways involving protein + related pathways
Pattern 3: Disease Pathway Exploration
Input: Disease name or process keyword
Workflow: Phase 3 (Keyword search) → Phase 4 (Context)
Output: Pathways from multiple databases related to disease
Pattern 4: Comprehensive Multi-Input
Input: Gene list + protein ID + keyword
Workflow: All phases
Output: Complete systems view with enrichment, specific mappings, and context
Quality Checks
Data Completeness
- At least one analysis phase completed successfully
- Each database result includes source attribution
- Empty results explicitly noted (not silently omitted)
- P-values reported with appropriate precision
- Pathway IDs provided for follow-up analysis
Biological Validity
- Enrichment results state the significance threshold applied to the p-values
- Protein mappings consistent with known function
- Keyword results relevant to query
- Cross-database results show expected overlaps
Report Quality
- All sections present even if "no data"
- Tables formatted consistently
- Source databases clearly attributed
- Follow-up recommendations if data sparse
Limitations & Known Issues
Database-Specific
- Reactome: Strong human coverage; limited for non-model organisms
- KEGG: Requires keyword match; may miss synonyms
- WikiPathways: Variable curation quality; check pathway version dates
- Pathway Commons: Aggregation can have duplicates; check source
- BioModels: Sparse for many processes; often returns no results
- Enrichr: Requires gene symbols (not IDs); case-sensitive
Technical
- Response formats: Different databases use different response structures (handled in implementation)
- Rate limits: Some databases have rate limits for heavy usage
- Version differences: Pathway databases updated at different rates
Analysis
- Enrichment bias: Pathway enrichment depends on pathway size and annotation completeness
- Organism specificity: Not all databases cover all organisms equally
- Pathway definitions: Same biological process may be modeled differently across databases
Summary
The Systems Biology & Pathway Analysis Skill provides comprehensive pathway analysis by integrating:
- ✅ Statistical pathway enrichment (Enrichr)
- ✅ Protein-pathway mapping (Reactome)
- ✅ Multi-database keyword search (KEGG, WikiPathways, Pathway Commons, BioModels)
- ✅ Hierarchical pathway context (Reactome top-level)
Outputs: Markdown report with pathway tables, enrichment statistics, and cross-database comparisons
Best for: Gene set analysis, protein function investigation, pathway discovery, systems-level biology