tooluniverse-gene-enrichment

Gene Enrichment and Pathway Analysis

Perform comprehensive gene enrichment analysis including Gene Ontology (GO), KEGG, Reactome, WikiPathways, and MSigDB enrichment using both Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA). Integrates local computation via gseapy with ToolUniverse pathway databases for cross-validated, publication-ready results.
IMPORTANT: Always use English terms in tool calls (gene names, pathway names, organism names), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.

When to Use This Skill

Apply when users:
  • Ask about gene enrichment analysis (GO, KEGG, Reactome, etc.)
  • Have a gene list from differential expression, clustering, or any experiment
  • Want to know which biological processes, molecular functions, or cellular components are enriched
  • Need KEGG or Reactome pathway enrichment analysis
  • Ask about GSEA (Gene Set Enrichment Analysis) with ranked gene lists
  • Want over-representation analysis (ORA) with Fisher's exact test
  • Need multiple testing correction (Benjamini-Hochberg, Bonferroni)
  • Ask about enrichGO, gseapy, clusterProfiler-style analyses
NOT for (use other skills instead):
  • Network pharmacology / drug repurposing → Use tooluniverse-network-pharmacology
  • Disease characterization → Use tooluniverse-multiomic-disease-characterization
  • Single gene function lookup → Use tooluniverse-disease-research
  • Spatial omics analysis → Use tooluniverse-spatial-omics-analysis
  • Protein-protein interaction analysis only → Use tooluniverse-protein-interactions

Input Parameters

| Parameter | Required | Description | Example |
|---|---|---|---|
| `gene_list` | Yes | List of gene symbols, Ensembl IDs, or Entrez IDs | `["TP53", "BRCA1", "EGFR"]` |
| `organism` | No | Organism (default: human). Supported: human, mouse, rat, fly, worm, yeast, zebrafish | `human` |
| `analysis_type` | No | `ORA` (default) or `GSEA` | `ORA` |
| `enrichment_databases` | No | Which databases to query. Default: all applicable | `["GO_BP", "GO_MF", "GO_CC", "KEGG", "Reactome"]` |
| `gene_id_type` | No | Input ID type: `symbol`, `ensembl`, `entrez`, `uniprot` (auto-detected if omitted) | `symbol` |
| `p_value_cutoff` | No | Significance threshold (default: 0.05) | `0.05` |
| `correction_method` | No | Multiple testing: `BH` (Benjamini-Hochberg, default), `bonferroni`, `fdr` | `BH` |
| `background_genes` | No | Custom background gene set (default: genome-wide) | `["GENE1", "GENE2", ...]` |
| `ranked_gene_list` | No | For GSEA: gene-to-score mapping (e.g., log2FC) | `{"TP53": 2.5, "BRCA1": -1.3, ...}` |

Core Principles

  1. Report-first approach - Create report file FIRST, then populate progressively
  2. ID disambiguation FIRST - Detect and convert gene IDs before ANY enrichment
  3. Multi-source validation - Run enrichment on at least 2 independent tools, cross-validate
  4. Exact p-values - Report raw p-values AND adjusted p-values with correction method
  5. Multiple testing correction - ALWAYS apply Benjamini-Hochberg unless user specifies otherwise
  6. Gene set size filtering - Filter by min/max gene set size to avoid trivial/overly broad terms
  7. Evidence grading - Grade enrichment sources T1-T4
  8. Negative results documented - "No significant enrichment" is a valid finding
  9. Source references - Every enrichment result must cite the tool/database/library used
  10. Completeness checklist - Mandatory section at end showing analysis coverage
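
Principles 4 and 5 ask for raw and Benjamini-Hochberg adjusted p-values side by side. As a rough illustration of what the adjustment does, here is a minimal pure-Python sketch; the function name `benjamini_hochberg` and the sample p-values are made up for this example, and in practice gseapy already reports an adjusted p-value column for you.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw p-values (not from any real analysis)
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(significant)  # [True, True, False, False, False]
```

Note how the third and fourth terms pass a raw 0.05 cutoff but not the adjusted one, which is why both columns belong in the report.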

Decision Tree: ORA vs GSEA

Q: Do you have a ranked gene list (with scores/fold-changes)?
  YES → Use GSEA (gseapy.prerank)
        - Input: Gene-to-score mapping (e.g., log2FC)
        - Statistics: Running enrichment score, permutation test
        - Cutoff: FDR q-val < 0.25 (standard for GSEA)
        - Output: NES (Normalized Enrichment Score), lead genes
        See: references/gsea_workflow.md

  NO  → Use ORA (gseapy.enrichr)
        - Input: Gene list only
        - Statistics: Fisher's exact test, hypergeometric
        - Cutoff: Adjusted P-value < 0.05 (or user specified)
        - Output: P-value, adjusted P-value, overlap, odds ratio
        See: references/ora_workflow.md
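
To make the ORA branch concrete: the one-sided Fisher's exact test for over-representation is equivalent to the upper tail of a hypergeometric distribution. The sketch below computes that tail with the standard library only. The function name `ora_p_value` and the example counts are assumptions for illustration; gseapy computes this internally and may differ in details such as how the background is chosen.

```python
from math import comb


def ora_p_value(overlap, list_size, set_size, background_size):
    """P(X >= overlap) when drawing list_size genes from the background.

    overlap:         genes shared by the input list and the gene set
    list_size:       number of genes in the input list
    set_size:        number of genes annotated to the term
    background_size: number of genes in the background (universe)
    """
    total = comb(background_size, list_size)
    upper = min(list_size, set_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 12 of 200 input genes fall in a 45-gene term,
# with an assumed background of 20,000 genes
p = ora_p_value(overlap=12, list_size=200, set_size=45, background_size=20000)
print(f"{p:.2e}")
```

The raw p-value from this test still needs multiple testing correction across all terms tested before applying the 0.05 cutoff.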

Decision Tree: gseapy vs ToolUniverse Tools

Q: Which enrichment method should I use?

Primary Analysis (ALWAYS):
  ├─ gseapy.enrichr (ORA) OR gseapy.prerank (GSEA)
  │  - Most comprehensive (225+ Enrichr libraries)
  │  - GO (BP, MF, CC), KEGG, Reactome, WikiPathways, MSigDB
  │  - All organisms supported
  │  - Returns: P-value, Adjusted P-value, Overlap, Genes
  │  See: references/enrichr_guide.md

Cross-Validation (REQUIRED for publication):
  ├─ PANTHER_enrichment [T1 - curated]
  │  - Curated GO enrichment
  │  - Multiple organisms (taxonomy ID)
  │  - GO BP, MF, CC, PANTHER pathways, Reactome
  ├─ STRING_functional_enrichment [T2 - validated]
  │  - Returns ALL categories in one call
  │  - Filter by category: Process, Function, Component, KEGG, Reactome
  │  - Network-based enrichment
  └─ ReactomeAnalysis_pathway_enrichment [T1 - curated]
     - Reactome curated pathways
     - Cross-species projection
     - Detailed pathway hierarchy

Additional Context (Optional):
  ├─ GO_get_term_by_id, QuickGO_get_term_detail (GO term details)
  ├─ Reactome_get_pathway, Reactome_get_pathway_hierarchy (pathway context)
  ├─ WikiPathways_search, WikiPathways_get_pathway (community pathways)
  └─ STRING_ppi_enrichment (network topology analysis)

Quick Start Workflow

Step 1: Create Report File (IMMEDIATE)

```python
report_path = f"{analysis_name}_enrichment_report.md"
# Write header with placeholder sections
# Update progressively as analysis proceeds
```

Step 2: ID Conversion and Validation

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Detect ID type
gene_list = ["TP53", "BRCA1", "EGFR"]
# Auto-detect: ENSG* = Ensembl, numeric = Entrez, pattern = UniProt, else = Symbol

# Convert if needed (Ensembl/Entrez → Symbol)
result = tu.tools.MyGene_batch_query(
    gene_ids=gene_list,
    fields="symbol,entrezgene,ensembl.gene"
)
# Extract symbols from results.
# The example input is already symbols, so it is used directly here.
gene_symbols = gene_list

# Validate with STRING
mapped = tu.tools.STRING_map_identifiers(
    protein_ids=gene_symbols,
    species=9606  # human
)
# Use preferredName for canonical symbols
```

**See**: references/id_conversion.md for complete examples
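
The auto-detection rule in the comment above (ENSG* = Ensembl, numeric = Entrez, pattern = UniProt, else = Symbol) can be sketched with a few regular expressions. The helper names `detect_id_type` and `strip_ensembl_version` are hypothetical and not part of ToolUniverse. The check is only a heuristic: some gene symbols, such as B3GAT1, happen to match the UniProt accession pattern, so an explicit `gene_id_type` is safer when the input type is known.

```python
import re

# Ensembl gene IDs: ENSG... for human, ENSMUSG... for mouse, optional version suffix
ENSEMBL_RE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")
# UniProt accession format (6 or 10 characters)
UNIPROT_RE = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def detect_id_type(gene_id):
    """Guess the identifier type: ensembl, entrez, uniprot, or symbol."""
    gene_id = gene_id.strip()
    if ENSEMBL_RE.match(gene_id):
        return "ensembl"
    if gene_id.isdigit():
        return "entrez"
    if UNIPROT_RE.match(gene_id):
        return "uniprot"
    return "symbol"


def strip_ensembl_version(gene_id):
    """Drop the version suffix: ENSG00000141510.16 -> ENSG00000141510."""
    return gene_id.split(".")[0] if gene_id.startswith("ENS") else gene_id


for gid in ["TP53", "ENSG00000141510.16", "7157", "P04637"]:
    print(gid, detect_id_type(gid))
```

For mixed lists, one option is to detect the type per gene and convert only the non-symbol entries before enrichment.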

Step 3: Primary Enrichment with gseapy

**For ORA (gene list only)**:
```python
import gseapy

# GO Biological Process
go_bp = gseapy.enrichr(
    gene_list=gene_symbols,
    gene_sets='GO_Biological_Process_2021',
    organism='human',
    outdir=None,
    no_plot=True,
    background=background_genes  # None = genome-wide
)
go_bp_sig = go_bp.results[go_bp.results['Adjusted P-value'] < 0.05]
```

**For GSEA (ranked gene list)**:
```python
import gseapy
import pandas as pd

# Ranked by log2FC
ranked_series = pd.Series(gene_to_score).sort_values(ascending=False)
gsea_result = gseapy.prerank(
    rnk=ranked_series,
    gene_sets='GO_Biological_Process_2021',
    outdir=None,
    no_plot=True,
    seed=42,
    min_size=5,
    max_size=500,
    permutation_num=1000
)
gsea_sig = gsea_result.res2d[gsea_result.res2d['FDR q-val'] < 0.25]
```

**See**:
- references/ora_workflow.md for complete ORA examples
- references/gsea_workflow.md for complete GSEA examples
- references/enrichr_guide.md for all 225+ libraries
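
The `min_size` and `max_size` arguments above implement the gene set size filtering principle. As a rough sketch of the idea, assuming size is counted as the number of set members that are also present in the analyzed gene universe, the filter can be written in plain Python. The function `filter_gene_sets` and the toy gene sets are hypothetical and not part of gseapy.

```python
def filter_gene_sets(gene_sets, universe, min_size=5, max_size=500):
    """Keep gene sets whose overlap with the universe is within [min_size, max_size]."""
    universe = set(universe)
    kept = {}
    for name, genes in gene_sets.items():
        present = set(genes) & universe
        if min_size <= len(present) <= max_size:
            kept[name] = sorted(present)
    return kept


# Toy data: ten measured genes and three candidate gene sets
universe = [f"g{i}" for i in range(1, 11)]
gene_sets = {
    "too_small": ["g1", "g2"],
    "in_range": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "not_measured": [f"x{i}" for i in range(1, 10)],
}
kept = filter_gene_sets(gene_sets, universe, min_size=5, max_size=8)
print(sorted(kept))  # ['in_range']
```

Very small sets tend to give unstable statistics and very large ones are rarely informative, which is the motivation for filtering on both ends.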

Step 4: Cross-Validation with ToolUniverse

```python
# PANTHER [T1 - curated]
panther_bp = tu.tools.PANTHER_enrichment(
    gene_list=','.join(gene_symbols),  # comma-separated string
    organism=9606,
    annotation_dataset='GO:0008150'  # biological_process
)

# STRING [T2 - validated]
string_result = tu.tools.STRING_functional_enrichment(
    protein_ids=gene_symbols,
    species=9606
)
# Filter by category: Process, Function, Component, KEGG, Reactome

# Reactome [T1 - curated]
reactome_result = tu.tools.ReactomeAnalysis_pathway_enrichment(
    identifiers=' '.join(gene_symbols),  # space-separated
    page_size=50,
    include_disease=True
)
```

**See**: references/cross_validation.md for comparison strategies
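
One simple way to cross-validate is to count how many sources report each term as significant. The sketch below assumes each tool's output has already been reduced to a `{term_id: FDR}` dictionary. That reduction step depends on the response schemas, which are not shown here, so the function `consensus_terms` and the input layout are illustrative assumptions only. All FDR values are made up apart from the GO:0051726 row, which reuses the numbers from the sample report table in this document.

```python
def consensus_terms(results_by_source, cutoff=0.05, min_sources=2):
    """Return {term: [sources]} for terms significant in at least min_sources sources."""
    support = {}
    for source, term_fdr in results_by_source.items():
        for term, fdr in term_fdr.items():
            if fdr < cutoff:
                support.setdefault(term, []).append(source)
    return {
        term: sorted(sources)
        for term, sources in support.items()
        if len(sources) >= min_sources
    }


results_by_source = {
    "gseapy": {"GO:0051726": 3.4e-06, "GO:0006281": 0.03, "GO:0008283": 0.20},
    "PANTHER": {"GO:0051726": 2.1e-05, "GO:0006281": 0.08},
    "STRING": {"GO:0051726": 1.8e-05, "GO:0008283": 0.01},
}
consensus = consensus_terms(results_by_source)
for term, sources in consensus.items():
    print(f"{term}: {len(sources)}/{len(results_by_source)} sources")
```

Terms supported by a single source are not discarded by this approach; they simply do not count toward the consensus column of the report.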

Step 5: Report Compilation

```markdown
## Results

### GO Biological Process (Top 10)

| Term | P-value | Adj. P-value | Overlap | Genes | Evidence |
|---|---|---|---|---|---|
| regulation of cell cycle (GO:0051726) | 1.2e-08 | 3.4e-06 | 12/45 | TP53;BRCA1;... | [T2] gseapy |

### Cross-Validation

| GO Term | gseapy FDR | PANTHER FDR | STRING FDR | Consensus |
|---|---|---|---|---|
| GO:0051726 | 3.4e-06 | 2.1e-05 | 1.8e-05 | 3/3 ✓ |

## Completeness Checklist

- ID Conversion (MyGene, STRING) - 95% mapped
- GO BP (gseapy, PANTHER, STRING) - 24 significant terms
- GO MF (gseapy, PANTHER, STRING) - 18 significant terms
- GO CC (gseapy, PANTHER, STRING) - 12 significant terms
- KEGG (gseapy, STRING) - 8 significant pathways
- Reactome (gseapy, ReactomeAPI) - 15 significant pathways
- Cross-validation - 12 consensus terms (2+ sources)
```

**See**: scripts/format_enrichment_output.py for automated formatting
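
If the helper script is not at hand, a report table in the layout shown above can be produced with a few lines of Python. This is a hypothetical stand-in, not the contents of `scripts/format_enrichment_output.py`. The function `to_markdown_table` and the row dictionaries are assumptions for this example, and the sample row reuses the illustrative values from the template.

```python
def to_markdown_table(rows, columns):
    """Render a list of dicts as a markdown table, formatting floats as 1.2e-08."""

    def fmt(value):
        return f"{value:.1e}" if isinstance(value, float) else str(value)

    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(fmt(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider] + body)


columns = ["Term", "P-value", "Adj. P-value", "Overlap"]
rows = [
    {
        "Term": "regulation of cell cycle (GO:0051726)",
        "P-value": 1.2e-08,
        "Adj. P-value": 3.4e-06,
        "Overlap": "12/45",
    }
]
table = to_markdown_table(rows, columns)
print(table)
```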

---

Evidence Grading

| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Curated/experimental enrichment | PANTHER, Reactome Analysis Service |
| T2 | [T2] | Computational enrichment, well-validated | gseapy ORA/GSEA, STRING functional enrichment |
| T3 | [T3] | Text-mining/predicted enrichment | Enrichr non-curated libraries |
| T4 | [T4] | Single-source annotation | Individual gene GO annotations from QuickGO |

Supported Organisms

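| Organism | Taxonomy ID | gseapy | PANTHER | STRING | Reactome |
|---|---|---|---|---|---|
| Human | 9606 | Yes | Yes | Yes | Yes |
| Mouse | 10090 | Yes (`*_Mouse`) | Yes | Yes | Yes (projection) |
| Rat | 10116 | Limited | Yes | Yes | Yes (projection) |
| Fly | 7227 | Limited | Yes | Yes | Yes (projection) |
| Worm | 6239 | Limited | Yes | Yes | Yes (projection) |
| Yeast | 4932 | Limited | Yes | Yes | Yes |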
See: references/organism_support.md for organism-specific libraries
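
Because the cross-validation tools take a numeric taxonomy ID while gseapy takes an organism name, a small lookup keeps the two in sync. The sketch below encodes only the rows of the table above. The helper `taxonomy_id` is a hypothetical name, and organisms that the table does not list (such as zebrafish) are rejected rather than guessed.

```python
# Taxonomy IDs copied from the Supported Organisms table
TAXONOMY_IDS = {
    "human": 9606,
    "mouse": 10090,
    "rat": 10116,
    "fly": 7227,
    "worm": 6239,
    "yeast": 4932,
}


def taxonomy_id(organism):
    """Map an organism name (case-insensitive) to its NCBI taxonomy ID."""
    key = organism.strip().lower()
    if key not in TAXONOMY_IDS:
        raise ValueError(f"No taxonomy ID recorded for organism: {organism!r}")
    return TAXONOMY_IDS[key]


print(taxonomy_id("Mouse"))  # 10090
```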

Common Patterns

Pattern 1: Standard DEG Enrichment (ORA)

Input: List of differentially expressed gene symbols
Flow: ID validation → gseapy ORA (GO + KEGG + Reactome) →
      PANTHER + STRING cross-validation → Report top enriched terms
Use: When you have an unranked gene list from DESeq2/edgeR

Pattern 2: Ranked Gene List (GSEA)

Input: Gene-to-log2FC mapping from differential expression
Flow: Convert to ranked Series → gseapy GSEA (GO + KEGG + MSigDB) →
      Filter by FDR < 0.25 → Report NES and lead genes
Use: When you have fold-changes or another ranking metric

Pattern 3: BixBench Enrichment Question

Input: Specific question about enrichment (e.g., "What is the adjusted p-val for neutrophil activation?")
Flow: Parse question for gene list and library → Run gseapy with exact library →
      Find specific term → Report exact p-value and adjusted p-value
Use: When answering targeted questions about specific terms

Pattern 4: Multi-Organism Enrichment

Input: Gene list from mouse experiment
Flow: Use organism='mouse' for gseapy → organism=10090 for PANTHER and species=10090 for STRING →
      projection=True for Reactome human pathway mapping
Use: When working with non-human organisms
See: references/common_patterns.md for more examples

Troubleshooting

"No significant enrichment found":
  • Verify gene symbols are valid (STRING_map_identifiers)
  • Try different library versions (2021 vs 2023 vs 2025)
  • Try relaxing significance cutoff or use GSEA instead
"Gene not found" errors:
  • Check ID type and convert using MyGene_batch_query
  • Remove version suffixes from Ensembl IDs (ENSG00000141510.16 → ENSG00000141510)
"STRING returns all categories":
  • This is expected; filter by `d['category'] == 'Process'` after receiving results
See: references/troubleshooting.md for complete guide
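
The STRING category filter mentioned above might look like the following. Only the `category` key and the `'Process'` value come from this document. The other record fields (`term`, `fdr`) and the sample records are hypothetical placeholders, since the full response schema is not shown here.

```python
def filter_by_category(records, category):
    """Keep only enrichment records belonging to one category."""
    return [d for d in records if d.get("category") == category]


# Hypothetical records shaped like a list of dicts with a 'category' key
records = [
    {"category": "Process", "term": "GO:0051726", "fdr": 1.8e-05},
    {"category": "Function", "term": "GO:0003677", "fdr": 2.0e-03},
    {"category": "KEGG", "term": "hsa04110", "fdr": 4.0e-04},
    {"category": "Process", "term": "GO:0006281", "fdr": 3.0e-02},
]
process_terms = filter_by_category(records, "Process")
print([d["term"] for d in process_terms])  # ['GO:0051726', 'GO:0006281']
```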

Tool Reference

Primary Enrichment Tools

| Tool | Input | Output | Use For |
|---|---|---|---|
| `gseapy.enrichr()` | `gene_list`, `gene_sets`, `organism` | `.results` DataFrame | ORA with 225+ libraries |
| `gseapy.prerank()` | `rnk` (ranked Series), `gene_sets` | `.res2d` DataFrame | GSEA analysis |

Cross-Validation Tools

| Tool | Key Parameters | Evidence Grade |
|---|---|---|
| `PANTHER_enrichment` | `gene_list` (comma-sep), `organism`, `annotation_dataset` | [T1] |
| `STRING_functional_enrichment` | `protein_ids`, `species` | [T2] |
| `ReactomeAnalysis_pathway_enrichment` | `identifiers` (space-sep), `page_size` | [T1] |

ID Conversion Tools

| Tool | Input | Output |
|---|---|---|
| `MyGene_batch_query` | `gene_ids`, `fields` | Symbol, Entrez, Ensembl mappings |
| `STRING_map_identifiers` | `protein_ids`, `species` | Preferred names, STRING IDs |
See: references/tool_parameters.md for complete parameter documentation

Detailed Documentation

All detailed examples, code blocks, and advanced topics have been moved to `references/`:
  • references/ora_workflow.md - Complete ORA examples with all databases
  • references/gsea_workflow.md - Complete GSEA workflow with ranked lists
  • references/enrichr_guide.md - All 225+ Enrichr libraries and usage
  • references/cross_validation.md - Multi-source validation strategies
  • references/id_conversion.md - Gene ID disambiguation and conversion
  • references/tool_parameters.md - Complete tool parameter reference
  • references/organism_support.md - Organism-specific configurations
  • references/common_patterns.md - Detailed use case examples
  • references/troubleshooting.md - Complete troubleshooting guide
  • references/multiple_testing.md - Correction methods (BH, Bonferroni, BY)
  • references/report_template.md - Standard report format
Helper scripts:
  • scripts/format_enrichment_output.py - Format results for reports
  • scripts/compare_enrichment_sources.py - Cross-validation analysis
  • scripts/filter_by_gene_set_size.py - Filter terms by size

Resources

  • For network-level analysis: tooluniverse-network-pharmacology
  • For disease characterization: tooluniverse-multiomic-disease-characterization
  • For spatial omics: tooluniverse-spatial-omics-analysis
  • For protein interactions: tooluniverse-protein-interactions