tooluniverse-gene-enrichment
Gene Enrichment and Pathway Analysis
Perform comprehensive gene enrichment analysis including Gene Ontology (GO), KEGG, Reactome, WikiPathways, and MSigDB enrichment using both Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA). Integrates local computation via gseapy with ToolUniverse pathway databases for cross-validated, publication-ready results.
IMPORTANT: Always use English terms in tool calls (gene names, pathway names, organism names), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.
When to Use This Skill
Apply when users:
- Ask about gene enrichment analysis (GO, KEGG, Reactome, etc.)
- Have a gene list from differential expression, clustering, or any experiment
- Want to know which biological processes, molecular functions, or cellular components are enriched
- Need KEGG or Reactome pathway enrichment analysis
- Ask about GSEA (Gene Set Enrichment Analysis) with ranked gene lists
- Want over-representation analysis (ORA) with Fisher's exact test
- Need multiple testing correction (Benjamini-Hochberg, Bonferroni)
- Ask about enrichGO, gseapy, clusterProfiler-style analyses
NOT for (use other skills instead):
- Network pharmacology / drug repurposing → Use tooluniverse-network-pharmacology
- Disease characterization → Use tooluniverse-multiomic-disease-characterization
- Single gene function lookup → Use tooluniverse-disease-research
- Spatial omics analysis → Use tooluniverse-spatial-omics-analysis
- Protein-protein interaction analysis only → Use tooluniverse-protein-interactions
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| gene_list | Yes | List of gene symbols, Ensembl IDs, or Entrez IDs | |
| organism | No | Organism (default: human). Supported: human, mouse, rat, fly, worm, yeast, zebrafish | |
| analysis_type | No | | |
| enrichment_databases | No | Which databases to query. Default: all applicable | |
| gene_id_type | No | Input ID type: | |
| p_value_cutoff | No | Significance threshold (default: 0.05) | |
| correction_method | No | Multiple testing: | |
| background_genes | No | Custom background gene set (default: genome-wide) | |
| ranked_gene_list | No | For GSEA: gene-to-score mapping (e.g., log2FC) | |
Core Principles
- Report-first approach - Create report file FIRST, then populate progressively
- ID disambiguation FIRST - Detect and convert gene IDs before ANY enrichment
- Multi-source validation - Run enrichment on at least 2 independent tools, cross-validate
- Exact p-values - Report raw p-values AND adjusted p-values with correction method
- Multiple testing correction - ALWAYS apply Benjamini-Hochberg unless user specifies otherwise
- Gene set size filtering - Filter by min/max gene set size to avoid trivial/overly broad terms
- Evidence grading - Grade enrichment sources T1-T4
- Negative results documented - "No significant enrichment" is a valid finding
- Source references - Every enrichment result must cite the tool/database/library used
- Completeness checklist - Mandatory section at end showing analysis coverage
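The multiple-testing principle can be illustrated with a small, dependency-free sketch of the Benjamini-Hochberg step-up adjustment. The function name `benjamini_hochberg` is illustrative and not part of gseapy or ToolUniverse; in practice the enrichment tools return adjusted p-values themselves, so this is only meant to show what the "Adjusted P-value" column represents.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.20]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw={p:.3f}  adjusted={q:.5f}")
```

In this toy input, the raw p-values 0.039 and 0.041 both fall below 0.05 but are adjusted to 0.05125, which is why both raw and adjusted values should be reported together with the correction method.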
Decision Tree: ORA vs GSEA
Q: Do you have a ranked gene list (with scores/fold-changes)?
YES → Use GSEA (gseapy.prerank)
- Input: Gene-to-score mapping (e.g., log2FC)
- Statistics: Running enrichment score, permutation test
- Cutoff: FDR q-val < 0.25 (standard for GSEA)
- Output: NES (Normalized Enrichment Score), lead genes
See: references/gsea_workflow.md
NO → Use ORA (gseapy.enrichr)
- Input: Gene list only
- Statistics: Fisher's exact test, hypergeometric
- Cutoff: Adjusted P-value < 0.05 (or user specified)
- Output: P-value, adjusted P-value, overlap, odds ratio
See: references/ora_workflow.md
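For intuition about the ORA branch, the one-sided Fisher's exact test for enrichment is equivalent to the upper tail of a hypergeometric distribution. The sketch below computes that tail with the standard library only. The function name, argument names, and the example counts are illustrative assumptions; Enrichr and gseapy may treat the background differently, so their p-values need not match this calculation exactly.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(overlap >= k) when n genes are drawn from a background of N genes,
    K of which belong to the gene set."""
    max_overlap = min(n, K)
    total = comb(N, n)
    # comb() returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, max_overlap + 1))
    return tail / total


# Illustrative numbers only: 200 input genes, a 45-gene term,
# 12 genes in common, and a 20,000-gene background.
p = hypergeom_upper_tail(k=12, n=200, K=45, N=20000)
print(f"ORA p-value (hypergeometric upper tail): {p:.3e}")
```

The raw p-value from such a test is what later gets corrected for multiple testing across all tested terms.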
Decision Tree: gseapy vs ToolUniverse Tools
Q: Which enrichment method should I use?
Primary Analysis (ALWAYS):
├─ gseapy.enrichr (ORA) OR gseapy.prerank (GSEA)
│ - Most comprehensive (225+ Enrichr libraries)
│ - GO (BP, MF, CC), KEGG, Reactome, WikiPathways, MSigDB
│ - Organism coverage varies by library (fullest for human)
│ - Returns: P-value, Adjusted P-value, Overlap, Genes
│ See: references/enrichr_guide.md
Cross-Validation (REQUIRED for publication):
├─ PANTHER_enrichment [T1 - curated]
│ - Curated GO enrichment
│ - Multiple organisms (taxonomy ID)
│ - GO BP, MF, CC, PANTHER pathways, Reactome
│
├─ STRING_functional_enrichment [T2 - validated]
│ - Returns ALL categories in one call
│ - Filter by category: Process, Function, Component, KEGG, Reactome
│ - Network-based enrichment
│
└─ ReactomeAnalysis_pathway_enrichment [T1 - curated]
- Reactome curated pathways
- Cross-species projection
- Detailed pathway hierarchy
Additional Context (Optional):
├─ GO_get_term_by_id, QuickGO_get_term_detail (GO term details)
├─ Reactome_get_pathway, Reactome_get_pathway_hierarchy (pathway context)
├─ WikiPathways_search, WikiPathways_get_pathway (community pathways)
└─ STRING_ppi_enrichment (network topology analysis)
Quick Start Workflow
Step 1: Create Report File (IMMEDIATE)
```python
report_path = f"{analysis_name}_enrichment_report.md"
# Write header with placeholder sections
# Update progressively as analysis proceeds
```
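One way to realize the report-first step is sketched below. The section names in `PLACEHOLDER_SECTIONS` and the `create_report` helper are hypothetical; the actual layout is defined in references/report_template.md, which is not reproduced here.

```python
import tempfile
from pathlib import Path

# Hypothetical section names; adjust to match references/report_template.md
PLACEHOLDER_SECTIONS = [
    "Input Summary",
    "Results",
    "Cross-Validation",
    "Completeness Checklist",
]


def create_report(analysis_name: str, out_dir: str = ".") -> Path:
    """Write a report skeleton with placeholder sections and return its path."""
    report_path = Path(out_dir) / f"{analysis_name}_enrichment_report.md"
    lines = [f"# {analysis_name} enrichment report", ""]
    for section in PLACEHOLDER_SECTIONS:
        lines += [f"## {section}", "", "_pending_", ""]
    report_path.write_text("\n".join(lines), encoding="utf-8")
    return report_path


# Demo in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    path = create_report("demo", tmp)
    print(path.name)
    print(path.read_text(encoding="utf-8").splitlines()[0])
```

Each later step would then replace a `_pending_` marker with its results, so a partially completed analysis still leaves a readable report.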
Step 2: ID Conversion and Validation
```python
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()

# Detect ID type
gene_list = ["TP53", "BRCA1", "EGFR"]
# Auto-detect: ENSG* = Ensembl, numeric = Entrez, pattern = UniProt, else = Symbol

# Convert if needed (Ensembl/Entrez → Symbol)
result = tu.tools.MyGene_batch_query(
    gene_ids=gene_list,
    fields="symbol,entrezgene,ensembl.gene"
)
# Extract symbols from results
# (assumption: the example inputs are already symbols, so they are reused as-is)
gene_symbols = gene_list

# Validate with STRING
mapped = tu.tools.STRING_map_identifiers(
    protein_ids=gene_symbols,
    species=9606  # human
)
# Use preferredName for canonical symbols
```
See: references/id_conversion.md for complete examples
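The auto-detect rule from Step 2 (ENSG prefix, all digits, UniProt pattern, otherwise symbol) could be sketched as follows. The helper names are hypothetical. The UniProt regular expression is the accession pattern published in the UniProt help pages; note that the ENSG check only covers human Ensembl gene IDs, and that a few gene symbols could in principle match the UniProt pattern, so ambiguous lists should still be confirmed with MyGene_batch_query.

```python
import re
from collections import Counter
from typing import List

# UniProt accession format (6 or 10 characters)
UNIPROT_RE = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def detect_id_type(gene_id: str) -> str:
    """Classify one identifier as 'ensembl', 'entrez', 'uniprot', or 'symbol'."""
    gene_id = gene_id.strip()
    if gene_id.upper().startswith("ENSG"):
        return "ensembl"
    if gene_id.isdigit():
        return "entrez"
    if UNIPROT_RE.match(gene_id):
        return "uniprot"
    return "symbol"


def detect_list_type(gene_ids: List[str]) -> str:
    """Return the most common ID type in a list (majority vote)."""
    if not gene_ids:
        raise ValueError("gene_ids is empty")
    counts = Counter(detect_id_type(g) for g in gene_ids)
    return counts.most_common(1)[0][0]


for example in ["TP53", "ENSG00000141510", "7157", "P04637"]:
    print(example, "->", detect_id_type(example))
```

A majority vote is used for the list-level decision so that a single stray identifier does not change how the whole list is converted.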
Step 3: Primary Enrichment with gseapy
For ORA (gene list only):
```python
import gseapy

# GO Biological Process
go_bp = gseapy.enrichr(
    gene_list=gene_symbols,
    gene_sets='GO_Biological_Process_2021',
    organism='human',
    outdir=None,
    no_plot=True,
    background=background_genes  # None = genome-wide
)
go_bp_sig = go_bp.results[go_bp.results['Adjusted P-value'] < 0.05]
```
For GSEA (ranked gene list):
```python
import gseapy
import pandas as pd

# Ranked by log2FC (gene_to_score maps each gene symbol to its score)
ranked_series = pd.Series(gene_to_score).sort_values(ascending=False)
gsea_result = gseapy.prerank(
    rnk=ranked_series,
    gene_sets='GO_Biological_Process_2021',
    outdir=None,
    no_plot=True,
    seed=42,
    min_size=5,
    max_size=500,
    permutation_num=1000
)
gsea_sig = gsea_result.res2d[gsea_result.res2d['FDR q-val'] < 0.25]
```
See:
- references/ora_workflow.md for complete ORA examples
- references/gsea_workflow.md for complete GSEA examples
- references/enrichr_guide.md for all 225+ libraries
Step 4: Cross-Validation with ToolUniverse
```python
# PANTHER [T1 - curated]
panther_bp = tu.tools.PANTHER_enrichment(
    gene_list=','.join(gene_symbols),  # comma-separated string
    organism=9606,
    annotation_dataset='GO:0008150'  # biological_process
)

# STRING [T2 - validated]
string_result = tu.tools.STRING_functional_enrichment(
    protein_ids=gene_symbols,
    species=9606
)
# Filter by category: Process, Function, Component, KEGG, Reactome

# Reactome [T1 - curated]
reactome_result = tu.tools.ReactomeAnalysis_pathway_enrichment(
    identifiers=' '.join(gene_symbols),  # space-separated
    page_size=50,
    include_disease=True
)
```
See: references/cross_validation.md for comparison strategies
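Once each source has returned an FDR for a term, the cross-validation outcome can be summarized as "how many sources that tested the term call it significant". The sketch below is one possible way to do that. The `count_support` helper, the default cutoff of 0.05, the two-source consensus rule, and the example FDR values are assumptions for illustration; references/cross_validation.md and scripts/compare_enrichment_sources.py may use different rules.

```python
from typing import Dict, Optional, Tuple


def count_support(
    fdr_by_source: Dict[str, Optional[float]],
    cutoff: float = 0.05,
    min_sources: int = 2,
) -> Tuple[int, int, bool]:
    """Return (significant sources, sources that tested the term, consensus flag).

    A value of None means the source did not report the term at all.
    """
    tested = {s: q for s, q in fdr_by_source.items() if q is not None}
    hits = sum(1 for q in tested.values() if q < cutoff)
    return hits, len(tested), hits >= min_sources


# Illustrative FDR values for a single GO term
term_fdr = {"gseapy": 3.4e-06, "PANTHER": 2.1e-05, "STRING": 1.8e-05}
hits, tested, ok = count_support(term_fdr)
print(f"{hits}/{tested} sources significant, consensus={ok}")
```

Keeping "not reported" (None) separate from "reported but not significant" avoids penalizing a term simply because one database does not contain it.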
Step 5: Report Compilation
```markdown
## Results
### GO Biological Process (Top 10)
| Term | P-value | Adj. P-value | Overlap | Genes | Evidence |
|---|---|---|---|---|---|
| regulation of cell cycle (GO:0051726) | 1.2e-08 | 3.4e-06 | 12/45 | TP53;BRCA1;... | [T2] gseapy |
### Cross-Validation
| GO Term | gseapy FDR | PANTHER FDR | STRING FDR | Consensus |
|---|---|---|---|---|
| GO:0051726 | 3.4e-06 | 2.1e-05 | 1.8e-05 | 3/3 ✓ |
### Completeness Checklist
- ID Conversion (MyGene, STRING) - 95% mapped
- GO BP (gseapy, PANTHER, STRING) - 24 significant terms
- GO MF (gseapy, PANTHER, STRING) - 18 significant terms
- GO CC (gseapy, PANTHER, STRING) - 12 significant terms
- KEGG (gseapy, STRING) - 8 significant pathways
- Reactome (gseapy, ReactomeAPI) - 15 significant pathways
- Cross-validation - 12 consensus terms (2+ sources)
```
See: scripts/format_enrichment_output.py for automated formatting
Evidence Grading
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Curated/experimental enrichment | PANTHER, Reactome Analysis Service |
| T2 | [T2] | Computational enrichment, well-validated | gseapy ORA/GSEA, STRING functional enrichment |
| T3 | [T3] | Text-mining/predicted enrichment | Enrichr non-curated libraries |
| T4 | [T4] | Single-source annotation | Individual gene GO annotations from QuickGO |
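The tier table can be applied mechanically when labeling result rows. The lookup below covers only the T1 and T2 sources named in the table; the dictionary keys, the `tag_source` helper, and the `[ungraded]` fallback label are illustrative assumptions, and T3/T4 assignments are left out because they depend on which library or annotation was used.

```python
from typing import Dict

# Tiers taken from the Evidence Grading table (T1 and T2 examples only)
EVIDENCE_TIER: Dict[str, str] = {
    "PANTHER_enrichment": "T1",
    "ReactomeAnalysis_pathway_enrichment": "T1",
    "gseapy.enrichr": "T2",
    "gseapy.prerank": "T2",
    "STRING_functional_enrichment": "T2",
}


def tag_source(source: str) -> str:
    """Prefix a source name with its evidence tier, e.g. '[T1] PANTHER_enrichment'."""
    tier = EVIDENCE_TIER.get(source)
    # Hypothetical fallback label for sources that have not been graded yet
    return f"[{tier}] {source}" if tier else f"[ungraded] {source}"


for name in ["PANTHER_enrichment", "gseapy.enrichr", "some_other_tool"]:
    print(tag_source(name))
```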
Supported Organisms
| Organism | Taxonomy ID | gseapy | PANTHER | STRING | Reactome |
|---|---|---|---|---|---|
| Human | 9606 | Yes | Yes | Yes | Yes |
| Mouse | 10090 | Yes | Yes | Yes | Yes (projection) |
| Rat | 10116 | Limited | Yes | Yes | Yes (projection) |
| Fly | 7227 | Limited | Yes | Yes | Yes (projection) |
| Worm | 6239 | Limited | Yes | Yes | Yes (projection) |
| Yeast | 4932 | Limited | Yes | Yes | Yes |
See: references/organism_support.md for organism-specific libraries
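Because PANTHER and STRING are addressed by taxonomy ID while users usually give a common name, a small lookup built from the table above keeps the two in sync. The `organism_args` helper and the shape of its return value are hypothetical; only the taxonomy IDs and the `organism` / `species` parameter names come from this document.

```python
from typing import Any, Dict

# Taxonomy IDs from the Supported Organisms table
TAXONOMY_ID: Dict[str, int] = {
    "human": 9606,
    "mouse": 10090,
    "rat": 10116,
    "fly": 7227,
    "worm": 6239,
    "yeast": 4932,
}


def organism_args(organism: str) -> Dict[str, Any]:
    """Translate a common organism name into per-tool keyword arguments."""
    name = organism.strip().lower()
    if name not in TAXONOMY_ID:
        raise ValueError(f"Organism not in the table: {organism!r}")
    taxid = TAXONOMY_ID[name]
    return {
        "organism_name": name,             # gseapy library coverage varies by organism
        "panther": {"organism": taxid},    # PANTHER_enrichment(organism=...)
        "string": {"species": taxid},      # STRING_functional_enrichment(species=...)
    }


print(organism_args("Mouse"))
```

Raising on unknown names, rather than defaulting to human, makes a mistyped organism fail loudly instead of silently producing human-based enrichment.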
Common Patterns
Pattern 1: Standard DEG Enrichment (ORA)
Input: List of differentially expressed gene symbols
Flow: ID validation → gseapy ORA (GO + KEGG + Reactome) →
PANTHER + STRING cross-validation → Report top enriched terms
Use: When you have an unranked gene list from DESeq2/edgeR
Pattern 2: Ranked Gene List (GSEA)
Input: Gene-to-log2FC mapping from differential expression
Flow: Convert to ranked Series → gseapy GSEA (GO + KEGG + MSigDB) →
Filter by FDR < 0.25 → Report NES and lead genes
Use: When you have fold-changes or another ranking metric
Pattern 3: BixBench Enrichment Question
Input: Specific question about enrichment (e.g., "What is the adjusted p-val for neutrophil activation?")
Flow: Parse question for gene list and library → Run gseapy with exact library →
Find specific term → Report exact p-value and adjusted p-value
Use: When answering targeted questions about specific terms
Pattern 4: Multi-Organism Enrichment
Input: Gene list from mouse experiment
Flow: Use organism='mouse' for gseapy → organism=10090 for PANTHER/STRING →
projection=True for Reactome human pathway mapping
Use: When working with non-human organisms
See: references/common_patterns.md for more examples
Troubleshooting
"No significant enrichment found":
- Verify gene symbols are valid (STRING_map_identifiers)
- Try different library versions (2021 vs 2023 vs 2025)
- Try relaxing significance cutoff or use GSEA instead
"Gene not found" errors:
- Check ID type and convert using MyGene_batch_query
- Remove version suffixes from Ensembl IDs (ENSG00000141510.16 → ENSG00000141510)
"STRING returns all categories":
- This is expected; filter by d['category'] == 'Process' after receiving results
See: references/troubleshooting.md for complete guide
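Two of the fixes above are simple enough to sketch directly: stripping Ensembl version suffixes and filtering STRING results by category. Both helper names are hypothetical, and the code assumes the STRING result is a list of dicts carrying a `category` key, as the filter expression above suggests; the other fields in the mock rows are invented for the demo.

```python
import re
from typing import Any, Dict, List


def strip_ensembl_version(gene_id: str) -> str:
    """ENSG00000141510.16 -> ENSG00000141510; other identifiers pass through."""
    return re.sub(r"^(ENS[A-Z]*\d+)\.\d+$", r"\1", gene_id)


def filter_category(rows: List[Dict[str, Any]], category: str = "Process") -> List[Dict[str, Any]]:
    """Keep only rows whose 'category' field equals the requested category."""
    return [d for d in rows if d.get("category") == category]


# Mock rows; only the 'category' key is taken from this document
mock_rows = [
    {"category": "Process", "term": "GO:0051726"},
    {"category": "KEGG", "term": "hsa04115"},
]
print(strip_ensembl_version("ENSG00000141510.16"))
print(filter_category(mock_rows))
```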
Tool Reference
Primary Enrichment Tools
| Tool | Input | Output | Use For |
|---|---|---|---|
| gseapy.enrichr | gene_list, gene_sets, organism | .results DataFrame | ORA with 225+ libraries |
| gseapy.prerank | rnk (ranked Series), gene_sets | .res2d DataFrame | GSEA analysis |
Cross-Validation Tools
| Tool | Key Parameters | Evidence Grade |
|---|---|---|
| PANTHER_enrichment | gene_list (comma-sep), organism, annotation_dataset | [T1] |
| STRING_functional_enrichment | protein_ids, species | [T2] |
| ReactomeAnalysis_pathway_enrichment | identifiers (space-sep), page_size | [T1] |
ID Conversion Tools
| Tool | Input | Output |
|---|---|---|
| MyGene_batch_query | gene_ids, fields | Symbol, Entrez, Ensembl mappings |
| STRING_map_identifiers | protein_ids, species | Preferred names, STRING IDs |
See: references/tool_parameters.md for complete parameter documentation
Detailed Documentation
All detailed examples, code blocks, and advanced topics have been moved to references/:
- references/ora_workflow.md - Complete ORA examples with all databases
- references/gsea_workflow.md - Complete GSEA workflow with ranked lists
- references/enrichr_guide.md - All 225+ Enrichr libraries and usage
- references/cross_validation.md - Multi-source validation strategies
- references/id_conversion.md - Gene ID disambiguation and conversion
- references/tool_parameters.md - Complete tool parameter reference
- references/organism_support.md - Organism-specific configurations
- references/common_patterns.md - Detailed use case examples
- references/troubleshooting.md - Complete troubleshooting guide
- references/multiple_testing.md - Correction methods (BH, Bonferroni, BY)
- references/report_template.md - Standard report format
Helper scripts:
- scripts/format_enrichment_output.py - Format results for reports
- scripts/compare_enrichment_sources.py - Cross-validation analysis
- scripts/filter_by_gene_set_size.py - Filter terms by size
Resources
For network-level analysis: tooluniverse-network-pharmacology
For disease characterization: tooluniverse-multiomic-disease-characterization
For spatial omics: tooluniverse-spatial-omics-analysis
For protein interactions: tooluniverse-protein-interactions
gseapy documentation: https://gseapy.readthedocs.io/
PANTHER API: http://pantherdb.org/services/oai/pantherdb/
STRING API: https://string-db.org/cgi/help?sessionId=&subpage=api
Reactome Analysis: https://reactome.org/AnalysisService/