tooluniverse-gene-enrichment

Gene Enrichment and Pathway Analysis

Perform comprehensive gene enrichment analysis including Gene Ontology (GO), KEGG, Reactome, WikiPathways, and MSigDB enrichment using both Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA). Integrates local computation via gseapy with ToolUniverse pathway databases for cross-validated, publication-ready results.

IMPORTANT: Always use English terms in tool calls (gene names, pathway names, organism names), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.

When to Use This Skill

Apply when users:

- Ask about gene enrichment analysis (GO, KEGG, Reactome, etc.)
- Have a gene list from differential expression, clustering, or any experiment
- Want to know which biological processes, molecular functions, or cellular components are enriched
- Need KEGG or Reactome pathway enrichment analysis
- Ask about GSEA (Gene Set Enrichment Analysis) with ranked gene lists
- Want over-representation analysis (ORA) with Fisher's exact test
- Need multiple testing correction (Benjamini-Hochberg, Bonferroni)
- Ask about enrichGO, gseapy, or clusterProfiler-style analyses

NOT for (use other skills instead):

- Network pharmacology / drug repurposing → Use `tooluniverse-network-pharmacology`
- Disease characterization → Use `tooluniverse-multiomic-disease-characterization`
- Single gene function lookup → Use `tooluniverse-disease-research`
- Spatial omics analysis → Use `tooluniverse-spatial-omics-analysis`
- Protein-protein interaction analysis only → Use `tooluniverse-protein-interactions`

Input Parameters

| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| gene_list | Yes | List of gene symbols, Ensembl IDs, or Entrez IDs | `["TP53", "BRCA1", "EGFR"]` |
| organism | No | Organism (default: human). Supported: human, mouse, rat, fly, worm, yeast, zebrafish | `human` |
| analysis_type | No | `ORA` (default) or `GSEA` | `ORA` |
| enrichment_databases | No | Which databases to query. Default: all applicable | `["GO_BP", "GO_MF", "GO_CC", "KEGG", "Reactome"]` |
| gene_id_type | No | Input ID type: `symbol`, `ensembl`, `entrez`, `uniprot` (auto-detected if omitted) | `symbol` |
| p_value_cutoff | No | Significance threshold (default: 0.05) | `0.05` |
| correction_method | No | Multiple testing: `BH` (Benjamini-Hochberg, default), `bonferroni`, `fdr` | `BH` |
| background_genes | No | Custom background gene set (default: genome-wide) | `["GENE1", "GENE2", ...]` |
| ranked_gene_list | No | For GSEA: gene-to-score mapping (e.g., log2FC) | `{"TP53": 2.5, "BRCA1": -1.3, ...}` |

Core Principles

- Report-first approach - Create the report file FIRST, then populate it progressively
- ID disambiguation FIRST - Detect and convert gene IDs before ANY enrichment
- Multi-source validation - Run enrichment with at least 2 independent tools and cross-validate
- Exact p-values - Report raw p-values AND adjusted p-values with the correction method
- Multiple testing correction - ALWAYS apply Benjamini-Hochberg unless the user specifies otherwise
- Gene set size filtering - Filter by min/max gene set size to avoid trivial or overly broad terms
- Evidence grading - Grade enrichment sources T1-T4
- Negative results documented - "No significant enrichment" is a valid finding
- Source references - Every enrichment result must cite the tool/database/library used
- Completeness checklist - Mandatory section at the end showing analysis coverage

Decision Tree: ORA vs GSEA

```
Q: Do you have a ranked gene list (with scores/fold-changes)?
  YES → Use GSEA (gseapy.prerank)
        - Input: Gene-to-score mapping (e.g., log2FC)
        - Statistics: Running enrichment score, permutation test
        - Cutoff: FDR q-val < 0.25 (standard for GSEA)
        - Output: NES (Normalized Enrichment Score), lead genes
        See: references/gsea_workflow.md

  NO  → Use ORA (gseapy.enrichr)
        - Input: Gene list only
        - Statistics: Fisher's exact test, hypergeometric
        - Cutoff: Adjusted P-value < 0.05 (or user specified)
        - Output: P-value, adjusted P-value, overlap, odds ratio
        See: references/ora_workflow.md
```
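
To make the ORA statistics concrete, the following is a minimal standard-library sketch of a one-sided hypergeometric test (the same tail probability as a one-sided Fisher's exact test for over-representation), followed by Benjamini-Hochberg adjustment. The function names and the toy counts are illustrative assumptions; they are not part of gseapy or ToolUniverse, and values reported by Enrichr may differ depending on how the background is defined.

```python
from math import comb


def hypergeom_pvalue(overlap, set_size, list_size, background_size):
    """P(X >= overlap) when list_size genes are drawn from background_size genes,
    of which set_size belong to the gene set (one-sided ORA test)."""
    upper = min(set_size, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example (hypothetical counts): three gene sets tested against a
# 100-gene input list drawn from a 20,000-gene background.
raw = [
    hypergeom_pvalue(overlap=12, set_size=45, list_size=100, background_size=20000),
    hypergeom_pvalue(overlap=2, set_size=200, list_size=100, background_size=20000),
    hypergeom_pvalue(overlap=0, set_size=50, list_size=100, background_size=20000),
]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3g}, BH-adjusted p = {q:.3g}")
```

Reporting both `raw` and `adj` side by side matches the "Exact p-values" principle: the reader can see the uncorrected value and which correction was applied.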

Decision Tree: gseapy vs ToolUniverse Tools

```
Q: Which enrichment method should I use?

Primary Analysis (ALWAYS):
  ├─ gseapy.enrichr (ORA) OR gseapy.prerank (GSEA)
  │  - Most comprehensive (225+ Enrichr libraries)
  │  - GO (BP, MF, CC), KEGG, Reactome, WikiPathways, MSigDB
  │  - Human and mouse well covered; other organisms limited (see Supported Organisms)
  │  - Returns: P-value, Adjusted P-value, Overlap, Genes
  │  See: references/enrichr_guide.md

Cross-Validation (REQUIRED for publication):
  ├─ PANTHER_enrichment [T1 - curated]
  │  - Curated GO enrichment
  │  - Multiple organisms (taxonomy ID)
  │  - GO BP, MF, CC, PANTHER pathways, Reactome
  │
  ├─ STRING_functional_enrichment [T2 - validated]
  │  - Returns ALL categories in one call
  │  - Filter by category: Process, Function, Component, KEGG, Reactome
  │  - Network-based enrichment
  │
  └─ ReactomeAnalysis_pathway_enrichment [T1 - curated]
     - Reactome curated pathways
     - Cross-species projection
     - Detailed pathway hierarchy

Additional Context (Optional):
  ├─ GO_get_term_by_id, QuickGO_get_term_detail (GO term details)
  ├─ Reactome_get_pathway, Reactome_get_pathway_hierarchy (pathway context)
  ├─ WikiPathways_search, WikiPathways_get_pathway (community pathways)
  └─ STRING_ppi_enrichment (network topology analysis)
```

Quick Start Workflow

Step 1: Create Report File (IMMEDIATE)

```python
report_path = f"{analysis_name}_enrichment_report.md"
# Write header with placeholder sections
# Update progressively as analysis proceeds
```

Step 2: ID Conversion and Validation

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Detect ID type
gene_list = ["TP53", "BRCA1", "EGFR"]
# Auto-detect: ENSG* = Ensembl, numeric = Entrez, pattern = UniProt, else = Symbol

# Convert if needed (Ensembl/Entrez → Symbol)
result = tu.tools.MyGene_batch_query(
    gene_ids=gene_list,
    fields="symbol,entrezgene,ensembl.gene"
)
# Extract symbols from results into gene_symbols
# (the result structure is covered in references/id_conversion.md)

# Validate with STRING
mapped = tu.tools.STRING_map_identifiers(
    protein_ids=gene_symbols,
    species=9606  # human
)
# Use preferredName for canonical symbols
```

**See**: references/id_conversion.md for complete examples
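
The auto-detection rule above (Ensembl prefix, numeric Entrez, UniProt pattern, otherwise symbol) can be sketched with regular expressions. This is an illustrative sketch, not the skill's actual detection code: the helper names are hypothetical, the Ensembl pattern is widened from `ENSG*` to also accept species prefixes such as `ENSMUSG` (an assumption), and the UniProt pattern follows the accession format published by UniProt. A few gene symbols may happen to resemble UniProt accessions, so treat the output as a first guess.

```python
import re

# UniProt accession format (6 or 10 characters), per UniProt's documentation.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Ensembl gene IDs: ENS + optional species code + G + 11 digits, optional version.
ENSEMBL_RE = re.compile(r"^ENS[A-Z]*G\d{11}(?:\.\d+)?$")


def strip_ensembl_version(gene_id):
    """ENSG00000141510.16 -> ENSG00000141510; other IDs are returned unchanged."""
    if gene_id.startswith("ENS"):
        return re.sub(r"\.\d+$", "", gene_id)
    return gene_id


def detect_id_type(gene_id):
    """Classify one identifier as 'ensembl', 'entrez', 'uniprot', or 'symbol'."""
    gene_id = gene_id.strip()
    if ENSEMBL_RE.match(gene_id):
        return "ensembl"
    if gene_id.isdigit():
        return "entrez"
    if UNIPROT_RE.match(gene_id):
        return "uniprot"
    return "symbol"


for example in ["TP53", "ENSG00000141510.16", "7157", "P04637"]:
    print(example, "->", detect_id_type(example), strip_ensembl_version(example))
```

Stripping the version suffix before conversion avoids the "Gene not found" failures that versioned Ensembl IDs tend to cause.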


Step 3: Primary Enrichment with gseapy

**For ORA (gene list only)**:
```python
import gseapy

# GO Biological Process
go_bp = gseapy.enrichr(
    gene_list=gene_symbols,
    gene_sets='GO_Biological_Process_2021',
    organism='human',
    outdir=None,
    no_plot=True,
    background=background_genes  # None = genome-wide
)
# Keep terms that pass the adjusted p-value cutoff
go_bp_sig = go_bp.results[go_bp.results['Adjusted P-value'] < 0.05]
```

**For GSEA (ranked gene list)**:
```python
import gseapy
import pandas as pd

# Ranked by log2FC (gene_to_score maps gene symbol -> score)
ranked_series = pd.Series(gene_to_score).sort_values(ascending=False)

gsea_result = gseapy.prerank(
    rnk=ranked_series,
    gene_sets='GO_Biological_Process_2021',
    outdir=None,
    no_plot=True,
    seed=42,
    min_size=5,
    max_size=500,
    permutation_num=1000
)
# Keep gene sets that pass the standard GSEA FDR cutoff
gsea_sig = gsea_result.res2d[gsea_result.res2d['FDR q-val'] < 0.25]
```

**See**:
- references/ora_workflow.md for complete ORA examples
- references/gsea_workflow.md for complete GSEA examples
- references/enrichr_guide.md for all 225+ libraries
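
The gene set size filter from the Core Principles can be applied to ORA results by parsing the `Overlap` column, which has the form `hits/set_size` (for example `12/45`). The sketch below works on plain dictionaries so it runs without pandas; with a real gseapy result, `go_bp.results.to_dict("records")` should yield rows of this shape. The function names, thresholds, and the two "hypothetical" rows are assumptions for illustration, and this is not the contents of `scripts/filter_by_gene_set_size.py`.

```python
def parse_overlap(overlap):
    """Split an Overlap string such as '12/45' into (hits, gene_set_size)."""
    hits, set_size = overlap.split("/")
    return int(hits), int(set_size)


def filter_by_gene_set_size(rows, min_size=5, max_size=500, alpha=0.05):
    """Keep significant terms whose gene set size lies within [min_size, max_size]."""
    kept = []
    for row in rows:
        _, set_size = parse_overlap(row["Overlap"])
        if min_size <= set_size <= max_size and row["Adjusted P-value"] < alpha:
            kept.append(row)
    # Most significant terms first
    return sorted(kept, key=lambda r: r["Adjusted P-value"])


rows = [
    {"Term": "regulation of cell cycle (GO:0051726)", "Overlap": "12/45",
     "Adjusted P-value": 3.4e-06},
    {"Term": "hypothetical tiny term", "Overlap": "2/3",
     "Adjusted P-value": 0.001},
    {"Term": "hypothetical broad term", "Overlap": "60/2500",
     "Adjusted P-value": 0.01},
]
filtered = filter_by_gene_set_size(rows)
for row in filtered:
    print(row["Term"], row["Overlap"], row["Adjusted P-value"])
```

The defaults mirror the `min_size=5` and `max_size=500` values passed to `gseapy.prerank` in the GSEA example, so ORA and GSEA results are filtered on the same scale.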

Step 4: Cross-Validation with ToolUniverse

```python
# PANTHER [T1 - curated]
panther_bp = tu.tools.PANTHER_enrichment(
    gene_list=','.join(gene_symbols),  # comma-separated string
    organism=9606,
    annotation_dataset='GO:0008150'  # biological_process
)

# STRING [T2 - validated]
string_result = tu.tools.STRING_functional_enrichment(
    protein_ids=gene_symbols,
    species=9606
)
# Filter by category: Process, Function, Component, KEGG, Reactome

# Reactome [T1 - curated]
reactome_result = tu.tools.ReactomeAnalysis_pathway_enrichment(
    identifiers=' '.join(gene_symbols),  # space-separated
    page_size=50,
    include_disease=True
)
```

**See**: references/cross_validation.md for comparison strategies
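
One simple way to summarize cross-validation is to count, per term, how many sources report it below the significance cutoff. The sketch below assumes each tool's output has already been reduced to a `{term_id: FDR}` dictionary; that reduction step, the function name, and the second GO ID (used here only as a placeholder with made-up values) are assumptions, not the behavior of `scripts/compare_enrichment_sources.py`.

```python
def consensus_table(fdr_by_source, alpha=0.05, min_sources=2):
    """Return (term, n_supporting, n_sources) for terms that are significant
    in at least min_sources of the given sources."""
    all_terms = set().union(*(fdrs.keys() for fdrs in fdr_by_source.values()))
    n_sources = len(fdr_by_source)
    rows = []
    for term in sorted(all_terms):
        supporting = [
            source for source, fdrs in fdr_by_source.items()
            if fdrs.get(term, 1.0) < alpha  # a missing term counts as not significant
        ]
        if len(supporting) >= min_sources:
            rows.append((term, len(supporting), n_sources))
    return rows


fdr_by_source = {
    "gseapy": {"GO:0051726": 3.4e-06, "GO:0000001": 0.2},
    "PANTHER": {"GO:0051726": 2.1e-05},
    "STRING": {"GO:0051726": 1.8e-05, "GO:0000001": 0.01},
}
consensus = consensus_table(fdr_by_source)
for term, supporting, total in consensus:
    print(f"{term}: {supporting}/{total}")
```

Terms that fall below `min_sources` are not errors; they can still be reported, but flagged as single-source findings.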

Step 5: Report Compilation

```markdown
## Results

### GO Biological Process (Top 10)

| Term | P-value | Adj. P-value | Overlap | Genes | Evidence |
|------|---------|--------------|---------|-------|----------|
| regulation of cell cycle (GO:0051726) | 1.2e-08 | 3.4e-06 | 12/45 | TP53;BRCA1;... | [T2] gseapy |

### Cross-Validation

| GO Term | gseapy FDR | PANTHER FDR | STRING FDR | Consensus |
|---------|------------|-------------|------------|-----------|
| GO:0051726 | 3.4e-06 | 2.1e-05 | 1.8e-05 | 3/3 ✓ |

### Completeness Checklist

- ID Conversion (MyGene, STRING) - 95% mapped
- GO BP (gseapy, PANTHER, STRING) - 24 significant terms
- GO MF (gseapy, PANTHER, STRING) - 18 significant terms
- GO CC (gseapy, PANTHER, STRING) - 12 significant terms
- KEGG (gseapy, STRING) - 8 significant pathways
- Reactome (gseapy, ReactomeAPI) - 15 significant pathways
- Cross-validation - 12 consensus terms (2+ sources)
```


**See**: scripts/format_enrichment_output.py for automated formatting

---

Evidence Grading

| Tier | Symbol | Criteria | Examples |
|------|--------|----------|----------|
| T1 | [T1] | Curated/experimental enrichment | PANTHER, Reactome Analysis Service |
| T2 | [T2] | Computational enrichment, well-validated | gseapy ORA/GSEA, STRING functional enrichment |
| T3 | [T3] | Text-mining/predicted enrichment | Enrichr non-curated libraries |
| T4 | [T4] | Single-source annotation | Individual gene GO annotations from QuickGO |

Supported Organisms

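| Organism | Taxonomy ID | gseapy | PANTHER | STRING | Reactome |
|----------|-------------|--------|---------|--------|----------|
| Human | 9606 | Yes | Yes | Yes | Yes |
| Mouse | 10090 | Yes (`*_Mouse`) | Yes | Yes | Yes (projection) |
| Rat | 10116 | Limited | Yes | Yes | Yes (projection) |
| Fly | 7227 | Limited | Yes | Yes | Yes (projection) |
| Worm | 6239 | Limited | Yes | Yes | Yes (projection) |
| Yeast | 4932 | Limited | Yes | Yes | Yes |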

See: references/organism_support.md for organism-specific libraries

Common Patterns

Pattern 1: Standard DEG Enrichment (ORA)

Input: List of differentially expressed gene symbols
Flow: ID validation → gseapy ORA (GO + KEGG + Reactome) →
      PANTHER + STRING cross-validation → Report top enriched terms
Use: When you have unranked gene list from DESeq2/edgeR

Pattern 2: Ranked Gene List (GSEA)

Input: Gene-to-log2FC mapping from differential expression
Flow: Convert to ranked Series → gseapy GSEA (GO + KEGG + MSigDB) →
      Filter by FDR < 0.25 → Report NES and lead genes
Use: When you have fold-changes or other ranking metric
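
Before handing a gene-to-score mapping to GSEA, it usually helps to drop missing scores and fix the order of tied genes so the ranking is reproducible. The sketch below does this with the standard library only; `build_ranking` is a hypothetical helper name, and the tie-breaking rule (alphabetical by symbol) is an assumption rather than something gseapy requires.

```python
import math


def build_ranking(gene_to_score):
    """Return [(gene, score), ...] sorted by descending score.

    Genes with missing (None) or NaN scores are dropped; ties are broken
    alphabetically so repeated runs give the same order.
    """
    cleaned = {}
    for gene, score in gene_to_score.items():
        if score is None:
            continue
        value = float(score)
        if math.isnan(value):
            continue
        cleaned[gene] = value
    return sorted(cleaned.items(), key=lambda item: (-item[1], item[0]))


gene_to_score = {"TP53": 2.5, "BRCA1": -1.3, "EGFR": float("nan"), "MYC": 2.5}
ranking = build_ranking(gene_to_score)
print(ranking)
```

The cleaned result can then be turned into the ranked Series used above with `pd.Series(dict(ranking))`, since dictionaries preserve insertion order.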

Pattern 3: BixBench Enrichment Question

Input: Specific question about enrichment (e.g., "What is the adjusted p-val for neutrophil activation?")
Flow: Parse question for gene list and library → Run gseapy with exact library →
      Find specific term → Report exact p-value and adjusted p-value
Use: When answering targeted questions about specific terms

Pattern 4: Multi-Organism Enrichment

Input: Gene list from mouse experiment
Flow: Use organism='mouse' for gseapy → organism=10090 for PANTHER/STRING →
      projection=True for Reactome human pathway mapping
Use: When working with non-human organisms

See: references/common_patterns.md for more examples

Troubleshooting

"No significant enrichment found":

- Verify gene symbols are valid (STRING_map_identifiers)
- Try different library versions (2021 vs 2023 vs 2025)
- Try relaxing the significance cutoff, or use GSEA instead

"Gene not found" errors:

- Check the ID type and convert using MyGene_batch_query
- Remove version suffixes from Ensembl IDs (ENSG00000141510.16 → ENSG00000141510)

"STRING returns all categories":

- This is expected; filter by `d['category'] == 'Process'` after receiving results
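
As a sketch of that filtering step, assume the STRING result has been unpacked into a list of dictionaries, each with a `category` key as in the expression above. The other keys (`term`, `fdr`) and all values shown are hypothetical and only serve to make the example runnable.

```python
from collections import defaultdict


def group_by_category(records):
    """Group enrichment records by their 'category' field."""
    grouped = defaultdict(list)
    for d in records:
        grouped[d["category"]].append(d)
    return dict(grouped)


# Hypothetical records standing in for a STRING functional enrichment response
records = [
    {"category": "Process", "term": "GO:0051726", "fdr": 1.8e-05},
    {"category": "KEGG", "term": "hsa04110", "fdr": 2.0e-04},
    {"category": "Process", "term": "GO:0006974", "fdr": 3.0e-03},
]

# Single-category filter, as described above
process_terms = [d for d in records if d["category"] == "Process"]

# Or split everything once and look up categories as needed
by_category = group_by_category(records)
print(len(process_terms), sorted(by_category))
```

Grouping once is convenient when the report needs GO BP, KEGG, and Reactome sections from the same call.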

See: references/troubleshooting.md for complete guide

Tool Reference

Primary Enrichment Tools

| Tool | Input | Output | Use For |
|------|-------|--------|---------|
| `gseapy.enrichr()` | gene_list, gene_sets, organism | `.results` DataFrame | ORA with 225+ libraries |
| `gseapy.prerank()` | rnk (ranked Series), gene_sets | `.res2d` DataFrame | GSEA analysis |

Cross-Validation Tools

| Tool | Key Parameters | Evidence Grade |
|------|----------------|----------------|
| `PANTHER_enrichment` | gene_list (comma-sep), organism, annotation_dataset | [T1] |
| `STRING_functional_enrichment` | protein_ids, species | [T2] |
| `ReactomeAnalysis_pathway_enrichment` | identifiers (space-sep), page_size | [T1] |

ID Conversion Tools

| Tool | Input | Output |
|------|-------|--------|
| `MyGene_batch_query` | gene_ids, fields | Symbol, Entrez, Ensembl mappings |
| `STRING_map_identifiers` | protein_ids, species | Preferred names, STRING IDs |

See: references/tool_parameters.md for complete parameter documentation

Detailed Documentation

All detailed examples, code blocks, and advanced topics have been moved to `references/`:

- references/ora_workflow.md - Complete ORA examples with all databases
- references/gsea_workflow.md - Complete GSEA workflow with ranked lists
- references/enrichr_guide.md - All 225+ Enrichr libraries and usage
- references/cross_validation.md - Multi-source validation strategies
- references/id_conversion.md - Gene ID disambiguation and conversion
- references/tool_parameters.md - Complete tool parameter reference
- references/organism_support.md - Organism-specific configurations
- references/common_patterns.md - Detailed use case examples
- references/troubleshooting.md - Complete troubleshooting guide
- references/multiple_testing.md - Correction methods (BH, Bonferroni, BY)
- references/report_template.md - Standard report format

Helper scripts:

- scripts/format_enrichment_output.py - Format results for reports
- scripts/compare_enrichment_sources.py - Cross-validation analysis
- scripts/filter_by_gene_set_size.py - Filter terms by size
