deeptools

deepTools: NGS Data Analysis Toolkit

Overview

deepTools is a comprehensive suite of Python command-line tools designed for processing and analyzing high-throughput sequencing data. Use deepTools to perform quality control, normalize data, compare samples, and generate publication-quality visualizations for ChIP-seq, RNA-seq, ATAC-seq, MNase-seq, and other NGS experiments.
Core capabilities:
  • Convert BAM alignments to normalized coverage tracks (bigWig/bedGraph)
  • Quality control assessment (fingerprint, correlation, coverage)
  • Sample comparison and correlation analysis
  • Heatmap and profile plot generation around genomic features
  • Enrichment analysis and peak region visualization

When to Use This Skill

Use this skill when a request involves any of the following:
  • File conversion: "Convert BAM to bigWig", "generate coverage tracks", "normalize ChIP-seq data"
  • Quality control: "check ChIP quality", "compare replicates", "assess sequencing depth", "QC analysis"
  • Visualization: "create heatmap around TSS", "plot ChIP signal", "visualize enrichment", "generate profile plot"
  • Sample comparison: "compare treatment vs control", "correlate samples", "PCA analysis"
  • Analysis workflows: "analyze ChIP-seq data", "RNA-seq coverage", "ATAC-seq analysis", "complete workflow"
  • Working with specific file types: BAM files, bigWig files, BED region files in genomics context

Quick Start

For users new to deepTools, start with file validation and common workflows:

1. Validate Input Files

Before running any analysis, validate BAM, bigWig, and BED files using the validation script:
```bash
python scripts/validate_files.py --bam sample1.bam sample2.bam --bed regions.bed
```
This checks file existence, BAM indices, and format correctness.
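To illustrate what checks of this kind involve, here is a minimal Python sketch. It is not the contents of `scripts/validate_files.py`; the function names `bam_index_exists` and `bed_line_problems` are hypothetical, and the BED check is deliberately simplified to the first three columns.

```python
from pathlib import Path


def bam_index_exists(bam_path):
    """Return True if a .bai index sits next to the BAM file.

    Assumption: the index is named either sample.bam.bai or sample.bai.
    """
    bam = Path(bam_path)
    return Path(str(bam) + ".bai").exists() or bam.with_suffix(".bai").exists()


def bed_line_problems(line):
    """Return a list of problems found in one BED data line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return ["fewer than 3 tab-separated columns"]
    problems = []
    start, end = fields[1], fields[2]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end are not non-negative integers")
    elif int(start) >= int(end):
        # Simplification: zero-length features are treated as errors here.
        problems.append("start is not smaller than end")
    return problems


print(bed_line_problems("chr1\t100\t200"))
print(bed_line_problems("chr1\t200\t100"))
```

A real validator would also open the BAM and bigWig files with a dedicated library; this sketch only covers the checks that need no extra dependencies.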

2. Generate Workflow Template

For standard analyses, use the workflow generator to create customized scripts:
```bash
# List available workflows
python scripts/workflow_generator.py --list

# Generate ChIP-seq QC workflow
python scripts/workflow_generator.py chipseq_qc -o qc_workflow.sh \
    --input-bam Input.bam --chip-bams "ChIP1.bam ChIP2.bam" \
    --genome-size 2913022398

# Make executable and run
chmod +x qc_workflow.sh
./qc_workflow.sh
```

3. Most Common Operations

See `assets/quick_reference.md` for frequently used commands and parameters.

Installation

```bash
uv pip install deeptools
```

Core Workflows

deepTools workflows typically follow this pattern: QC → Normalization → Comparison/Visualization

ChIP-seq Quality Control Workflow

When users request ChIP-seq QC or quality assessment:
  1. Generate a workflow script using `scripts/workflow_generator.py chipseq_qc`
  2. Key QC steps:
    • Sample correlation (multiBamSummary + plotCorrelation)
    • PCA analysis (plotPCA)
    • Coverage assessment (plotCoverage)
    • Fragment size validation (bamPEFragmentSize)
    • ChIP enrichment strength (plotFingerprint)
Interpreting results:
  • Correlation: Replicates should cluster together with high correlation (>0.9)
  • Fingerprint: A strong ChIP curve stays low and then rises steeply toward the right edge; a curve close to the diagonal indicates poor enrichment
  • Coverage: Assess if sequencing depth is adequate for analysis
Full workflow details in `references/workflows.md` → "ChIP-seq Quality Control Workflow"
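The replicate-correlation rule of thumb can be checked by hand on a handful of bin counts. The sketch below is an illustration only: `pearson` and `replicates_agree` are hypothetical helper names, the counts are made up, and plotCorrelation computes the real value over all bins.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def replicates_agree(counts_a, counts_b, threshold=0.9):
    """Apply the >0.9 guideline from the interpretation notes."""
    return pearson(counts_a, counts_b) > threshold


# Made-up per-bin read counts for two replicates.
rep1 = [10, 52, 8, 95, 30]
rep2 = [12, 49, 9, 101, 28]
print(round(pearson(rep1, rep2), 3), replicates_agree(rep1, rep2))
```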

ChIP-seq Complete Analysis Workflow

For full ChIP-seq analysis from BAM to visualizations:
  1. Generate coverage tracks with normalization (bamCoverage)
  2. Create comparison tracks (bamCompare for log2 ratio)
  3. Compute signal matrices around features (computeMatrix)
  4. Generate visualizations (plotHeatmap, plotProfile)
  5. Enrichment analysis at peaks (plotEnrichment)
Use `scripts/workflow_generator.py chipseq_analysis` to generate a template.
Complete command sequences in `references/workflows.md` → "ChIP-seq Analysis Workflow"

RNA-seq Coverage Workflow

For strand-specific RNA-seq coverage tracks:
Use bamCoverage with `--filterRNAstrand` to separate forward and reverse strands.
Important: NEVER use `--extendReads` for RNA-seq (it would extend reads across splice junctions).
Normalization: use CPM for fixed-size bins and RPKM for gene-level analysis.
Template available: `scripts/workflow_generator.py rnaseq_coverage`
Details in `references/workflows.md` → "RNA-seq Coverage Workflow"
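As a sketch of how the two strand-specific tracks could be produced, the snippet below only builds the bamCoverage command lines and prints them; nothing is executed. The helper name `stranded_coverage_commands`, the file names, and the bin size of 50 are assumptions for illustration.

```python
import shlex


def stranded_coverage_commands(bam, prefix, bin_size=50):
    """Build one bamCoverage command per strand (commands are not run here)."""
    commands = []
    for strand in ("forward", "reverse"):
        commands.append([
            "bamCoverage",
            "--bam", bam,
            "--outFileName", f"{prefix}.{strand}.bw",
            "--filterRNAstrand", strand,
            "--normalizeUsing", "CPM",
            "--binSize", str(bin_size),
        ])  # deliberately no --extendReads for RNA-seq
    return commands


for cmd in stranded_coverage_commands("rnaseq.bam", "rnaseq"):
    print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps file names with spaces safe if the commands are later passed to `subprocess.run`.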

ATAC-seq Analysis Workflow

ATAC-seq requires Tn5 offset correction:
  1. Shift reads using alignmentSieve with `--ATACshift`
  2. Generate coverage with bamCoverage
  3. Analyze fragment sizes (expect nucleosome ladder pattern)
  4. Visualize at peaks if available
Template: `scripts/workflow_generator.py atacseq`
Full workflow in `references/workflows.md` → "ATAC-seq Workflow"
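For intuition about the Tn5 offset, the sketch below applies the commonly cited convention of moving plus-strand reads by +4 bp and minus-strand reads by -5 bp. This is an assumption about the convention, not a description of alignmentSieve's internals; the exact arithmetic behind `--ATACshift` should be checked in the deepTools documentation. The function name `shift_read` is hypothetical.

```python
def shift_read(start, end, strand):
    """Shift a read interval by the commonly cited Tn5 offsets.

    Assumed convention: +4 bp on the plus strand, -5 bp on the minus strand.
    """
    if strand == "+":
        offset = 4
    elif strand == "-":
        offset = -5
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return start + offset, end + offset


print(shift_read(100, 150, "+"))
print(shift_read(100, 150, "-"))
```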

Tool Categories and Common Tasks

BAM/bigWig Processing

Convert BAM to normalized coverage:
```bash
bamCoverage --bam input.bam --outFileName output.bw \
    --normalizeUsing RPGC --effectiveGenomeSize 2913022398 \
    --binSize 10 --numberOfProcessors 8
```
Compare two samples (log2 ratio):
```bash
bamCompare -b1 treatment.bam -b2 control.bam -o ratio.bw \
    --operation log2 --scaleFactorsMethod readCount
```
Key tools: bamCoverage, bamCompare, multiBamSummary, multiBigwigSummary, correctGCBias, alignmentSieve
Complete reference: `references/tools_reference.md` → "BAM and bigWig File Processing Tools"

Quality Control

Check ChIP enrichment:
```bash
plotFingerprint -b input.bam chip.bam -o fingerprint.png \
    --extendReads 200 --ignoreDuplicates
```
Sample correlation:
```bash
multiBamSummary bins --bamfiles *.bam -o counts.npz
plotCorrelation -in counts.npz --corMethod pearson \
    --whatToShow heatmap -o correlation.png
```
Key tools: plotFingerprint, plotCoverage, plotCorrelation, plotPCA, bamPEFragmentSize
Complete reference: `references/tools_reference.md` → "Quality Control Tools"
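To see why a fingerprint separates enriched from unenriched samples, it helps to compute the curve on toy data. The sketch below sorts bin counts and accumulates their share of all reads, which is the general idea behind the plot; `fingerprint_curve` is a hypothetical name and the counts are invented, so treat it as a simplified model rather than plotFingerprint's implementation.

```python
def fingerprint_curve(bin_counts):
    """Cumulative fraction of reads over bins sorted from lowest to highest count."""
    ordered = sorted(bin_counts)
    total = sum(ordered)
    curve, running = [], 0
    for count in ordered:
        running += count
        curve.append(running / total)
    return curve


input_like = [10, 10, 10, 10]  # reads spread evenly: curve follows the diagonal
chip_like = [1, 1, 1, 97]      # reads concentrated in one bin: late, steep rise

print(fingerprint_curve(input_like))
print(fingerprint_curve(chip_like))
```

With evenly spread reads each bin adds the same share, so the curve is a straight line; when a few bins hold most reads, the curve stays near zero until the last bins.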

Visualization

Create heatmap around TSS:
```bash
# Compute matrix
computeMatrix reference-point -S signal.bw -R genes.bed \
    -b 3000 -a 3000 --referencePoint TSS -o matrix.gz

# Generate heatmap
plotHeatmap -m matrix.gz -o heatmap.png \
    --colorMap RdBu --kmeans 3
```
Create profile plot:
```bash
plotProfile -m matrix.gz -o profile.png \
    --plotType lines --colors blue red
```
Key tools: computeMatrix, plotHeatmap, plotProfile, plotEnrichment
Complete reference: `references/tools_reference.md` → "Visualization Tools"

Normalization Methods

Choosing the correct normalization is critical for valid comparisons. Consult `references/normalization_methods.md` for comprehensive guidance.
Quick selection guide:
  • ChIP-seq coverage: Use RPGC or CPM
  • ChIP-seq comparison: Use bamCompare with `--operation log2` and `--scaleFactorsMethod readCount`
  • RNA-seq bins: Use CPM
  • RNA-seq genes: Use RPKM (accounts for gene length)
  • ATAC-seq: Use RPGC or CPM
Normalization methods:
  • RPGC: Reads per genomic content, scaled to 1× genome coverage (requires `--effectiveGenomeSize`)
  • CPM: Counts per million mapped reads
  • RPKM: Reads per kilobase per million mapped reads (accounts for region length)
  • BPM: Bins per million mapped reads (analogous to TPM in RNA-seq)
  • None: No normalization; raw counts (not recommended for comparisons)
Full explanation: `references/normalization_methods.md`
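The arithmetic behind these methods is small enough to write out. The sketch below follows the per-bin definitions as commonly stated for deepTools; the function names are hypothetical, and the real tools additionally apply read filtering and extension before counting, so the numbers are for intuition only.

```python
def cpm(bin_reads, total_mapped_reads):
    """Counts per million mapped reads."""
    return bin_reads / (total_mapped_reads / 1e6)


def rpkm(bin_reads, total_mapped_reads, bin_length_bp):
    """Reads per kilobase of bin per million mapped reads."""
    return bin_reads / ((total_mapped_reads / 1e6) * (bin_length_bp / 1e3))


def bpm(bin_reads, all_bin_reads):
    """Bins per million: the bin's share of the summed per-bin counts, times 1e6."""
    return bin_reads / (sum(all_bin_reads) / 1e6)


def average_coverage(total_mapped_reads, fragment_length_bp, effective_genome_size):
    """Genome-wide average coverage; RPGC divides by this to reach 1x."""
    return total_mapped_reads * fragment_length_bp / effective_genome_size


def rpgc(bin_coverage, total_mapped_reads, fragment_length_bp, effective_genome_size):
    """Per-bin coverage rescaled so that the genome-wide average is 1x."""
    return bin_coverage / average_coverage(
        total_mapped_reads, fragment_length_bp, effective_genome_size
    )


# 50 reads in a 50 bp bin, 20 million mapped reads in total.
print(cpm(50, 20_000_000))
print(rpkm(50, 20_000_000, 50))
```

The RPGC functions show why `--effectiveGenomeSize` is mandatory for that method: without it the average coverage cannot be computed.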

Effective Genome Sizes

RPGC normalization requires effective genome size. Common values:

| Organism | Assembly | Size | Usage |
|---|---|---|---|
| Human | GRCh38/hg38 | 2,913,022,398 | `--effectiveGenomeSize 2913022398` |
| Mouse | GRCm38/mm10 | 2,652,783,500 | `--effectiveGenomeSize 2652783500` |
| Zebrafish | GRCz11 | 1,368,780,147 | `--effectiveGenomeSize 1368780147` |
| Drosophila | dm6 | 142,573,017 | `--effectiveGenomeSize 142573017` |
| C. elegans | ce10/ce11 | 100,286,401 | `--effectiveGenomeSize 100286401` |

Complete table with read-length-specific values: `references/effective_genome_sizes.md`
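When commands are assembled in a script, the values listed above can be kept in a small lookup so the flag is never typed by hand. This is a convenience sketch: the dictionary and the helper `effective_genome_size_args` are hypothetical, and the alias keys simply mirror the paired assembly names in the table.

```python
# Values copied from the common-values table; read-length-specific values
# live in references/effective_genome_sizes.md and are not reproduced here.
EFFECTIVE_GENOME_SIZES = {
    "GRCh38": 2913022398,
    "GRCm38": 2652783500,
    "GRCz11": 1368780147,
    "dm6": 142573017,
    "ce11": 100286401,
}

# Assumed aliases, following the paired names in the table.
ALIASES = {"hg38": "GRCh38", "mm10": "GRCm38", "ce10": "ce11"}


def effective_genome_size_args(assembly):
    """Return the --effectiveGenomeSize argument pair for a known assembly."""
    name = ALIASES.get(assembly, assembly)
    return ["--effectiveGenomeSize", str(EFFECTIVE_GENOME_SIZES[name])]


print(effective_genome_size_args("hg38"))
```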

Common Parameters Across Tools

Many deepTools commands share these options:
Performance:
  • `--numberOfProcessors`, `-p`: Enable parallel processing (always use available cores)
  • `--region`: Process specific regions for testing (e.g., `chr1:1-1000000`)
Read Filtering:
  • `--ignoreDuplicates`: Remove PCR duplicates (recommended for most analyses)
  • `--minMappingQuality`: Filter by alignment quality (e.g., `--minMappingQuality 10`)
  • `--minFragmentLength` / `--maxFragmentLength`: Fragment length bounds
  • `--samFlagInclude` / `--samFlagExclude`: SAM flag filtering
Read Processing:
  • `--extendReads`: Extend to fragment length (ChIP-seq: YES, RNA-seq: NO)
  • `--centerReads`: Center at fragment midpoint for sharper signals
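The values passed to `--samFlagInclude` and `--samFlagExclude` are bitwise combinations of the flags defined in the SAM specification. The sketch below shows how such a value can be composed and how include/exclude filtering behaves; the helper names `combine_flags` and `read_passes` are hypothetical, and the filter logic is a model of the usual semantics (all included bits set, no excluded bit set).

```python
# Flag bits as defined in the SAM specification.
SAM_FLAGS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "mate_unmapped": 0x8,
    "reverse": 0x10,
    "mate_reverse": 0x20,
    "first_in_pair": 0x40,
    "second_in_pair": 0x80,
    "secondary": 0x100,
    "qc_fail": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}


def combine_flags(*names):
    """OR the named flag bits into one integer."""
    value = 0
    for name in names:
        value |= SAM_FLAGS[name]
    return value


def read_passes(flag, include=0, exclude=0):
    """True if every included bit is set and no excluded bit is set."""
    return (flag & include) == include and (flag & exclude) == 0


exclude_value = combine_flags("unmapped", "secondary", "duplicate")
print(exclude_value)
print(read_passes(99, include=SAM_FLAGS["first_in_pair"], exclude=exclude_value))
```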

Best Practices

File Validation

Always validate files first using `scripts/validate_files.py` to check:
  • File existence and readability
  • BAM indices present (.bai files)
  • BED format correctness
  • File sizes reasonable

Analysis Strategy

  1. Start with QC: Run correlation, coverage, and fingerprint analysis before proceeding
  2. Test on small regions: Use `--region chr1:1-10000000` for parameter testing
  3. Document commands: Save full command lines for reproducibility
  4. Use consistent normalization: Apply same method across samples in comparisons
  5. Verify genome assembly: Ensure BAM and BED files use matching genome builds

ChIP-seq Specific

  • Always extend reads for ChIP-seq: `--extendReads 200` (for single-end data, set the value to the expected fragment length; paired-end data can use `--extendReads` without a value)
  • Remove duplicates: Use `--ignoreDuplicates` in most cases
  • Check enrichment first: Run plotFingerprint before detailed analysis
  • GC correction: Only apply if significant bias detected; never use `--ignoreDuplicates` after GC correction

RNA-seq Specific

  • Never extend reads for RNA-seq (would span splice junctions)
  • Strand-specific: Use `--filterRNAstrand forward` or `--filterRNAstrand reverse` for stranded libraries
  • Normalization: CPM for bins, RPKM for genes

ATAC-seq Specific

  • Apply Tn5 correction: Use alignmentSieve with `--ATACshift`
  • Fragment filtering: Set appropriate min/max fragment lengths
  • Check nucleosome pattern: Fragment size plot should show ladder pattern

Performance Optimization

  1. Use multiple processors: `--numberOfProcessors 8` (or available cores)
  2. Increase bin size for faster processing and smaller files
  3. Process chromosomes separately for memory-limited systems
  4. Pre-filter BAM files using alignmentSieve to create reusable filtered files
  5. Use bigWig over bedGraph: Compressed and faster to process

Troubleshooting

Common Issues

BAM index missing:
```bash
samtools index input.bam
```
Out of memory: Process chromosomes individually using `--region`:
```bash
bamCoverage --bam input.bam -o chr1.bw --region chr1
```
Slow processing: Increase `--numberOfProcessors` and/or increase `--binSize`
bigWig files too large: Increase bin size: `--binSize 50` or larger
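The per-chromosome workaround for memory limits extends naturally to a loop. The sketch below only generates the command lines, one per chromosome, using the options shown in this section; the helper name `per_chromosome_commands`, the chromosome list, and the default bin size and processor count are assumptions.

```python
import shlex


def per_chromosome_commands(bam, chromosomes, bin_size=50, processors=4):
    """Build one bamCoverage command per chromosome (commands are not run here)."""
    commands = []
    for chrom in chromosomes:
        commands.append([
            "bamCoverage",
            "--bam", bam,
            "-o", f"{chrom}.bw",
            "--region", chrom,
            "--binSize", str(bin_size),
            "--numberOfProcessors", str(processors),
        ])
    return commands


for cmd in per_chromosome_commands("input.bam", ["chr1", "chr2", "chrX"]):
    print(shlex.join(cmd))
```

Each run then holds only one chromosome's data at a time, at the cost of producing several bigWig files instead of one.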

Validation Errors

Run the validation script to identify issues:
```bash
python scripts/validate_files.py --bam *.bam --bed regions.bed
```
Common errors and their solutions are explained in the script output.

Reference Documentation

This skill includes comprehensive reference documentation:

references/tools_reference.md

Complete documentation of all deepTools commands organized by category:
  • BAM and bigWig processing tools (9 tools)
  • Quality control tools (6 tools)
  • Visualization tools (3 tools)
  • Miscellaneous tools (2 tools)
Each tool includes:
  • Purpose and overview
  • Key parameters with explanations
  • Usage examples
  • Important notes and best practices
Use this reference when: Users ask about specific tools, parameters, or detailed usage.

references/workflows.md

Complete workflow examples for common analyses:
  • ChIP-seq quality control workflow
  • ChIP-seq complete analysis workflow
  • RNA-seq coverage workflow
  • ATAC-seq analysis workflow
  • Multi-sample comparison workflow
  • Peak region analysis workflow
  • Troubleshooting and performance tips
Use this reference when: Users need complete analysis pipelines or workflow examples.

references/normalization_methods.md

Comprehensive guide to normalization methods:
  • Detailed explanation of each method (RPGC, CPM, RPKM, BPM, etc.)
  • When to use each method
  • Formulas and interpretation
  • Selection guide by experiment type
  • Common pitfalls and solutions
  • Quick reference table
Use this reference when: Users ask about normalization, comparing samples, or which method to use.

references/effective_genome_sizes.md

Effective genome size values and usage:
  • Common organism values (human, mouse, fly, worm, zebrafish)
  • Read-length-specific values
  • Calculation methods
  • When and how to use in commands
  • Custom genome calculation instructions
Use this reference when: Users need genome size for RPGC normalization or GC bias correction.

Helper Scripts

scripts/validate_files.py

Validates BAM, bigWig, and BED files for deepTools analysis. Checks file existence, indices, and format.
Usage:
```bash
python scripts/validate_files.py --bam sample1.bam sample2.bam \
    --bed peaks.bed --bigwig signal.bw
```
When to use: Before starting any analysis, or when troubleshooting errors.

scripts/workflow_generator.py

Generates customizable bash script templates for common deepTools workflows.
Available workflows:
  • `chipseq_qc`: ChIP-seq quality control
  • `chipseq_analysis`: Complete ChIP-seq analysis
  • `rnaseq_coverage`: Strand-specific RNA-seq coverage
  • `atacseq`: ATAC-seq with Tn5 correction
Usage:
```bash
# List workflows
python scripts/workflow_generator.py --list

# Generate workflow
python scripts/workflow_generator.py chipseq_qc -o qc.sh \
    --input-bam Input.bam --chip-bams "ChIP1.bam ChIP2.bam" \
    --genome-size 2913022398 --threads 8

# Run generated workflow
chmod +x qc.sh
./qc.sh
```
When to use: Users request standard workflows or need template scripts to customize.

Assets

assets/quick_reference.md

Quick reference card with most common commands, effective genome sizes, and typical workflow pattern.
When to use: Users need quick command examples without detailed documentation.

Handling User Requests

For New Users

  1. Start with installation verification
  2. Validate input files using `scripts/validate_files.py`
  3. Recommend appropriate workflow based on experiment type
  4. Generate workflow template using `scripts/workflow_generator.py`
  5. Guide through customization and execution

For Experienced Users

  1. Provide specific tool commands for requested operations
  2. Reference appropriate sections in `references/tools_reference.md`
  3. Suggest optimizations and best practices
  4. Offer troubleshooting for issues

For Specific Tasks

"Convert BAM to bigWig":
  • Use bamCoverage with appropriate normalization
  • Recommend RPGC or CPM based on use case
  • Provide effective genome size for organism
  • Suggest relevant parameters (extendReads, ignoreDuplicates, binSize)
"Check ChIP quality":
  • Run full QC workflow or use plotFingerprint specifically
  • Explain interpretation of results
  • Suggest follow-up actions based on results
"Create heatmap":
  • Guide through two-step process: computeMatrix → plotHeatmap
  • Help choose appropriate matrix mode (reference-point vs scale-regions)
  • Suggest visualization parameters and clustering options
"Compare samples":
  • Recommend bamCompare for two-sample comparison
  • Suggest multiBamSummary + plotCorrelation for multiple samples
  • Guide normalization method selection
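For the heatmap task, the choice between the two computeMatrix modes mainly changes a couple of arguments. The sketch below builds both steps as argument lists without running them; the helper names, the default flank of 3000 bp, and the body length of 5000 bp are illustrative assumptions rather than recommended settings.

```python
def compute_matrix_command(mode, signal_bw, regions_bed, out_matrix,
                           flank=3000, body_length=5000):
    """Build a computeMatrix command for the given mode (not executed)."""
    if mode == "reference-point":
        # Signal is aligned to a single anchor such as the TSS.
        extra = ["--referencePoint", "TSS"]
    elif mode == "scale-regions":
        # Every region is stretched or shrunk to a common body length.
        extra = ["--regionBodyLength", str(body_length)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "computeMatrix", mode,
        "-S", signal_bw,
        "-R", regions_bed,
        "-b", str(flank), "-a", str(flank),
        *extra,
        "-o", out_matrix,
    ]


def plot_heatmap_command(matrix, out_png):
    """Build the second step, which consumes the matrix from step one."""
    return ["plotHeatmap", "-m", matrix, "-o", out_png]


print(" ".join(compute_matrix_command("reference-point", "signal.bw", "genes.bed", "matrix.gz")))
print(" ".join(plot_heatmap_command("matrix.gz", "heatmap.png")))
```

Reference-point mode suits questions about signal at a fixed landmark, while scale-regions suits questions about signal along whole gene bodies or peaks of varying length.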

Referencing Documentation

When users need detailed information:
  • Tool details: Direct to specific sections in `references/tools_reference.md`
  • Workflows: Use `references/workflows.md` for complete analysis pipelines
  • Normalization: Consult `references/normalization_methods.md` for method selection
  • Genome sizes: Reference `references/effective_genome_sizes.md`
Search references using grep patterns:
```bash
# Find tool documentation
grep -A 20 "^### toolname" references/tools_reference.md

# Find workflow
grep -A 50 "^## Workflow Name" references/workflows.md

# Find normalization method
grep -A 15 "^### Method Name" references/normalization_methods.md
```

Example Interactions

User: "I need to analyze my ChIP-seq data"
Response approach:
  1. Ask about files available (BAM files, peaks, genes)
  2. Validate files using validation script
  3. Generate chipseq_analysis workflow template
  4. Customize for their specific files and organism
  5. Explain each step as script runs
User: "Which normalization should I use?"
Response approach:
  1. Ask about experiment type (ChIP-seq, RNA-seq, etc.)
  2. Ask about comparison goal (within-sample or between-sample)
  3. Consult the selection guide in `references/normalization_methods.md`
  4. Recommend appropriate method with justification
  5. Provide command example with parameters
User: "Create a heatmap around TSS"
Response approach:
  1. Verify bigWig and gene BED files available
  2. Use computeMatrix with reference-point mode at TSS
  3. Generate plotHeatmap with appropriate visualization parameters
  4. Suggest clustering if dataset is large
  5. Offer profile plot as complement

Key Reminders

  • File validation first: Always validate input files before analysis
  • Normalization matters: Choose appropriate method for comparison type
  • Extend reads carefully: YES for ChIP-seq, NO for RNA-seq
  • Use all cores: Set `--numberOfProcessors` to available cores
  • Test on regions: Use `--region` for parameter testing
  • Check QC first: Run quality control before detailed analysis
  • Document everything: Save commands for reproducibility
  • Reference documentation: Use comprehensive references for detailed guidance