deeptools
deepTools: NGS Data Analysis Toolkit
Overview
deepTools is a comprehensive suite of Python command-line tools designed for processing and analyzing high-throughput sequencing data. Use deepTools to perform quality control, normalize data, compare samples, and generate publication-quality visualizations for ChIP-seq, RNA-seq, ATAC-seq, MNase-seq, and other NGS experiments.
Core capabilities:
- Convert BAM alignments to normalized coverage tracks (bigWig/bedGraph)
- Quality control assessment (fingerprint, correlation, coverage)
- Sample comparison and correlation analysis
- Heatmap and profile plot generation around genomic features
- Enrichment analysis and peak region visualization
When to Use This Skill
This skill should be used when:
- File conversion: "Convert BAM to bigWig", "generate coverage tracks", "normalize ChIP-seq data"
- Quality control: "check ChIP quality", "compare replicates", "assess sequencing depth", "QC analysis"
- Visualization: "create heatmap around TSS", "plot ChIP signal", "visualize enrichment", "generate profile plot"
- Sample comparison: "compare treatment vs control", "correlate samples", "PCA analysis"
- Analysis workflows: "analyze ChIP-seq data", "RNA-seq coverage", "ATAC-seq analysis", "complete workflow"
- Working with specific file types: BAM files, bigWig files, BED region files in genomics context
Quick Start
For users new to deepTools, start with file validation and common workflows:
1. Validate Input Files
Before running any analysis, validate BAM, bigWig, and BED files using the validation script:
```bash
python scripts/validate_files.py --bam sample1.bam sample2.bam --bed regions.bed
```
This checks file existence, BAM indices, and format correctness.
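The internals of `scripts/validate_files.py` are not shown in this document. As a rough illustration of the kinds of checks described above (BAM index presence, BED format), a minimal Python sketch might look like the following. The function names `bam_has_index` and `is_valid_bed_line` are hypothetical and are not taken from the script.

```python
import os


def bam_has_index(bam_path):
    """Return True if a .bai index sits next to the BAM file.

    Accepts both naming conventions: sample.bam.bai and sample.bai.
    """
    candidates = [bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai"]
    return any(os.path.isfile(p) for p in candidates)


def is_valid_bed_line(line):
    """Check the three mandatory BED columns: chrom, start, end."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return False
    chrom, start, end = fields[:3]
    if not chrom or not (start.isdigit() and end.isdigit()):
        return False
    # This sketch rejects empty or inverted intervals.
    return int(start) < int(end)
```

A real validator would likely also skip header lines (`track`, `browser`, `#`) and report which line failed; this sketch only shows the core test.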
2. Generate Workflow Template
For standard analyses, use the workflow generator to create customized scripts:
```bash
# List available workflows
python scripts/workflow_generator.py --list

# Generate ChIP-seq QC workflow
python scripts/workflow_generator.py chipseq_qc -o qc_workflow.sh \
    --input-bam Input.bam --chip-bams "ChIP1.bam ChIP2.bam" \
    --genome-size 2913022398

# Make executable and run
chmod +x qc_workflow.sh
./qc_workflow.sh
```
3. Most Common Operations
See `assets/quick_reference.md` for frequently used commands and parameters.
Installation
```bash
uv pip install deeptools
```
Core Workflows
deepTools workflows typically follow this pattern: QC → Normalization → Comparison/Visualization
ChIP-seq Quality Control Workflow
When users request ChIP-seq QC or quality assessment:
- Generate a workflow script using `scripts/workflow_generator.py chipseq_qc`
- Key QC steps:
  - Sample correlation (multiBamSummary + plotCorrelation)
  - PCA analysis (plotPCA)
  - Coverage assessment (plotCoverage)
  - Fragment size validation (bamPEFragmentSize)
  - ChIP enrichment strength (plotFingerprint)
Interpreting results:
- Correlation: Replicates should cluster together with high correlation (>0.9)
- Fingerprint: A strong ChIP shows a steep rise; a curve that stays close to the diagonal indicates poor enrichment
- Coverage: Assess whether sequencing depth is adequate for the analysis
Full workflow details in `references/workflows.md` → "ChIP-seq Quality Control Workflow"
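To make the replicate-correlation guideline concrete, here is a minimal sketch of a Pearson correlation computed on per-bin read counts, followed by the >0.9 check. The counts are invented for illustration, and this is not how plotCorrelation is implemented; it only shows the quantity being compared against the threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-bin read counts for two replicates.
rep1 = [12, 40, 7, 95, 3, 51]
rep2 = [10, 44, 9, 90, 5, 48]
r = pearson(rep1, rep2)
replicates_agree = r > 0.9  # threshold taken from the guideline above
```

Pearson correlation is sensitive to a few very high bins, which is one reason a rank-based method is sometimes preferred for noisy data.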
ChIP-seq Complete Analysis Workflow
For full ChIP-seq analysis from BAM to visualizations:
- Generate coverage tracks with normalization (bamCoverage)
- Create comparison tracks (bamCompare for log2 ratio)
- Compute signal matrices around features (computeMatrix)
- Generate visualizations (plotHeatmap, plotProfile)
- Enrichment analysis at peaks (plotEnrichment)
Use `scripts/workflow_generator.py chipseq_analysis` to generate a template.
Complete command sequences in `references/workflows.md` → "ChIP-seq Analysis Workflow"
RNA-seq Coverage Workflow
For strand-specific RNA-seq coverage tracks:
Use bamCoverage with `--filterRNAstrand` to separate forward and reverse strands.
Important: NEVER use `--extendReads` for RNA-seq (it would extend reads over splice junctions).
Use normalization: CPM for fixed bins, RPKM for gene-level analysis.
Template available: `scripts/workflow_generator.py rnaseq_coverage`
Details in `references/workflows.md` → "RNA-seq Coverage Workflow"
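To see why read extension is unsafe for spliced alignments, it can help to look at how a CIGAR string with an `N` operation maps onto the reference. The sketch below is an illustration only, not a description of how bamCoverage works internally; `aligned_blocks` is a hypothetical helper.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(start, cigar):
    """Return the reference intervals covered by a read.

    `start` is the 0-based leftmost reference position and `cigar` is a
    SAM CIGAR string. N operations (introns) split the read into blocks.
    """
    blocks = []
    pos = start
    block_start = None
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=XD":  # consumes the reference inside a block
            if block_start is None:
                block_start = pos
            pos += length
        elif op == "N":  # skipped region: close the current block
            if block_start is not None:
                blocks.append((block_start, pos))
                block_start = None
            pos += length
        # I, S, H and P do not consume the reference.
    if block_start is not None:
        blocks.append((block_start, pos))
    return blocks


# A 100-bp read spliced across a 5,000-bp intron.
blocks = aligned_blocks(1000, "60M5000N40M")
# blocks == [(1000, 1060), (6060, 6100)]
naive_span = (blocks[0][0], blocks[-1][1])  # would cover the intron too
```

Treating the read as one continuous fragment from 1000 to 6100 would assign signal to 5,000 intronic bases that were never sequenced, which is the effect the warning above is about.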
ATAC-seq Analysis Workflow
ATAC-seq requires Tn5 offset correction:
- Shift reads using alignmentSieve with `--ATACshift`
- Generate coverage with bamCoverage
- Analyze fragment sizes (expect nucleosome ladder pattern)
- Visualize at peaks if available
Template: `scripts/workflow_generator.py atacseq`
Full workflow in `references/workflows.md` → "ATAC-seq Workflow"
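The Tn5 offset correction is commonly described as moving forward-strand reads by +4 bp and reverse-strand reads by -5 bp, so that read ends mark the centre of the transposase binding site. The sketch below applies that convention to a single read; it assumes 1-based inclusive coordinates, and the exact arithmetic used by alignmentSieve (especially for paired-end fragments) should be checked against its documentation. `tn5_cut_site` is a hypothetical name.

```python
def tn5_cut_site(start, end, strand, plus_shift=4, minus_shift=5):
    """Return the shifted 5' position of a read.

    `start` and `end` are 1-based inclusive coordinates. Forward-strand
    reads are moved +4 bp from their start; reverse-strand reads are
    moved -5 bp from their end, following the common convention.
    """
    if strand == "+":
        return start + plus_shift
    if strand == "-":
        return end - minus_shift
    raise ValueError("strand must be '+' or '-'")
```

For example, a forward read spanning 100 to 150 gives 104, and a reverse read over the same span gives 145.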
Tool Categories and Common Tasks
BAM/bigWig Processing
Convert BAM to normalized coverage:
```bash
bamCoverage --bam input.bam --outFileName output.bw \
    --normalizeUsing RPGC --effectiveGenomeSize 2913022398 \
    --binSize 10 --numberOfProcessors 8
```
Compare two samples (log2 ratio):
```bash
bamCompare -b1 treatment.bam -b2 control.bam -o ratio.bw \
    --operation log2 --scaleFactorsMethod readCount
```
Key tools: bamCoverage, bamCompare, multiBamSummary, multiBigwigSummary, correctGCBias, alignmentSieve
Complete reference: `references/tools_reference.md` → "BAM and bigWig File Processing Tools"
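As a rough picture of what a log2 ratio track with read-count scaling represents, the sketch below scales the deeper library down to the shallower one and then takes the log2 ratio for a single bin. The pseudocount of 1 and the order of operations (scale first, then add the pseudocount) are assumptions about bamCompare's behaviour and should be checked against its documentation; the read numbers are invented.

```python
import math


def read_count_scale_factors(n_reads_1, n_reads_2):
    """Scale the deeper sample down to the depth of the shallower one."""
    smallest = min(n_reads_1, n_reads_2)
    return smallest / n_reads_1, smallest / n_reads_2


def log2_ratio(count_1, count_2, scale_1, scale_2, pseudocount=1.0):
    """log2 of scaled treatment over scaled control for one bin."""
    return math.log2((count_1 * scale_1 + pseudocount) /
                     (count_2 * scale_2 + pseudocount))


# Hypothetical library sizes: treatment 20M reads, control 10M reads.
s_treat, s_ctrl = read_count_scale_factors(20_000_000, 10_000_000)
# One bin with 62 treatment reads and 7 control reads.
value = log2_ratio(62, 7, s_treat, s_ctrl)  # (31 + 1) / (7 + 1) = 4, so 2.0
```

The pseudocount keeps the ratio defined for bins with zero control reads, at the cost of pulling low-count bins toward zero.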
Quality Control
Check ChIP enrichment:
```bash
plotFingerprint -b input.bam chip.bam -o fingerprint.png \
    --extendReads 200 --ignoreDuplicates
```
Sample correlation:
```bash
multiBamSummary bins --bamfiles *.bam -o counts.npz
plotCorrelation -in counts.npz --corMethod pearson \
    --whatToPlot heatmap -o correlation.png
```
Key tools: plotFingerprint, plotCoverage, plotCorrelation, plotPCA, bamPEFragmentSize
Complete reference: `references/tools_reference.md` → "Quality Control Tools"
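The idea behind a fingerprint plot can be sketched in a few lines: sort bins by read count, accumulate, and compare the resulting curve with the diagonal. This is a simplified illustration with invented counts, not plotFingerprint's implementation; `mean_gap_to_diagonal` is a hypothetical summary, not a deepTools metric.

```python
def fingerprint_curve(bin_counts):
    """Cumulative fraction of reads over bins sorted by increasing count."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    curve = []
    running = 0
    for count in sorted(bin_counts):
        running += count
        curve.append(running / total)
    return curve


def mean_gap_to_diagonal(curve):
    """Mean distance between the diagonal and the curve (0 if uniform)."""
    n = len(curve)
    return sum((i + 1) / n - y for i, y in enumerate(curve)) / n


uniform = fingerprint_curve([10, 10, 10, 10])  # input-like: follows the diagonal
enriched = fingerprint_curve([1, 1, 1, 37])    # ChIP-like: late, steep rise
```

With evenly spread reads the curve sits on the diagonal, while a sample whose reads concentrate in a few bins stays low and then rises sharply at the end.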
Visualization
Create heatmap around TSS:
```bash
# Compute matrix
computeMatrix reference-point -S signal.bw -R genes.bed \
    -b 3000 -a 3000 --referencePoint TSS -o matrix.gz

# Generate heatmap
plotHeatmap -m matrix.gz -o heatmap.png \
    --colorMap RdBu --kmeans 3
```
Create profile plot:
```bash
plotProfile -m matrix.gz -o profile.png \
    --plotType lines --colors blue red
```
Key tools: computeMatrix, plotHeatmap, plotProfile, plotEnrichment
Complete reference: `references/tools_reference.md` → "Visualization Tools"
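Conceptually, reference-point mode turns each gene into one matrix row of fixed-width bins centred on its TSS. The sketch below shows that geometry for `-b 3000 -a 3000`. The bin size of 10 bp is an assumption about the computeMatrix default, the function names are hypothetical, and windows that run past a chromosome start are not handled.

```python
def tss_position(start, end, strand):
    """TSS of a BED interval: start on '+', end on '-' (0-based, half-open)."""
    return start if strand == "+" else end


def reference_point_bins(tss, strand, before=3000, after=3000, bin_size=10):
    """Return bin intervals around a TSS, ordered upstream to downstream."""
    if (before + after) % bin_size:
        raise ValueError("window must be a multiple of bin_size")
    if strand == "+":
        left, right = tss - before, tss + after
    else:  # on the minus strand, upstream lies at higher coordinates
        left, right = tss - after, tss + before
    bins = [(s, s + bin_size) for s in range(left, right, bin_size)]
    return bins if strand == "+" else bins[::-1]
```

With these settings each row has (3000 + 3000) / 10 = 600 columns, and minus-strand genes are flipped so that upstream is always on the left of the heatmap.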
Normalization Methods
Choosing the correct normalization is critical for valid comparisons. Consult `references/normalization_methods.md` for comprehensive guidance.
Quick selection guide:
- ChIP-seq coverage: Use RPGC or CPM
- ChIP-seq comparison: Use bamCompare with `--operation log2` and `--scaleFactorsMethod readCount`
- RNA-seq bins: Use CPM
- RNA-seq genes: Use RPKM (accounts for gene length)
- ATAC-seq: Use RPGC or CPM
Normalization methods:
- RPGC: 1× genome coverage (requires `--effectiveGenomeSize`)
- CPM: Counts per million mapped reads
- RPKM: Reads per kilobase per million mapped reads (accounts for region length)
- BPM: Bins per million mapped reads
- None: Raw counts (not recommended for comparisons)
Full explanation: `references/normalization_methods.md`
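The normalization methods listed above can be written as short formulas. The sketch below follows the definitions as they are usually stated for deepTools; details such as which reads count as "mapped" after filtering are assumptions, and the function names are hypothetical.

```python
def cpm(bin_reads, total_mapped_reads):
    """Counts per million mapped reads."""
    return bin_reads / (total_mapped_reads / 1e6)


def rpkm(bin_reads, total_mapped_reads, bin_length_bp):
    """Reads per kilobase per million mapped reads."""
    return bin_reads / ((total_mapped_reads / 1e6) * (bin_length_bp / 1e3))


def bpm(bin_reads, all_bin_reads):
    """Bins per million: this bin's share of the sum over all bins."""
    return bin_reads / (sum(all_bin_reads) / 1e6)


def rpgc_scale_factor(total_mapped_reads, fragment_length, effective_genome_size):
    """Factor that brings the average coverage to 1x."""
    current_coverage = total_mapped_reads * fragment_length / effective_genome_size
    return 1.0 / current_coverage


# A 500-bp bin with 50 reads in a library of 10 million mapped reads.
example_cpm = cpm(50, 10_000_000)         # 5.0
example_rpkm = rpkm(50, 10_000_000, 500)  # 10.0
```

CPM and RPKM differ only by the bin-length term, which is why CPM is sufficient for fixed-width bins while RPKM matters when regions have different lengths.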
Effective Genome Sizes
RPGC normalization requires the effective genome size. Common values:
| Organism | Assembly | Size | Usage |
|---|---|---|---|
| Human | GRCh38/hg38 | 2,913,022,398 | `--effectiveGenomeSize 2913022398` |
| Mouse | GRCm38/mm10 | 2,652,783,500 | `--effectiveGenomeSize 2652783500` |
| Zebrafish | GRCz11 | 1,368,780,147 | `--effectiveGenomeSize 1368780147` |
| Drosophila | dm6 | 142,573,017 | `--effectiveGenomeSize 142573017` |
| C. elegans | ce10/ce11 | 100,286,401 | `--effectiveGenomeSize 100286401` |
Complete table with read-length-specific values: `references/effective_genome_sizes.md`
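When scripting many samples, one option is to keep the sizes from the table in a lookup and build the bamCoverage arguments from it. The sketch below does that; the dictionary keys and the function name `bamcoverage_rpgc_args` are choices made for this example, and only flags that appear elsewhere in this document are used.

```python
EFFECTIVE_GENOME_SIZE = {
    "GRCh38": 2_913_022_398,
    "GRCm38": 2_652_783_500,
    "GRCz11": 1_368_780_147,
    "dm6": 142_573_017,
    "ce11": 100_286_401,
}


def bamcoverage_rpgc_args(bam, bigwig, assembly):
    """Build an argument list for an RPGC-normalized bamCoverage run."""
    try:
        size = EFFECTIVE_GENOME_SIZE[assembly]
    except KeyError:
        raise ValueError(f"no effective genome size recorded for {assembly!r}") from None
    return [
        "bamCoverage", "--bam", bam, "--outFileName", bigwig,
        "--normalizeUsing", "RPGC", "--effectiveGenomeSize", str(size),
    ]
```

The returned list can be passed to `subprocess.run(args, check=True)` on a machine where deepTools is installed. Failing loudly on an unknown assembly avoids silently normalizing with the wrong genome size.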
Common Parameters Across Tools
Many deepTools commands share these options:
Performance:
- `--numberOfProcessors`, `-p`: Enable parallel processing (always use available cores)
- `--region`: Process specific regions for testing (e.g., `chr1:1-1000000`)
Read Filtering:
- `--ignoreDuplicates`: Remove PCR duplicates (recommended for most analyses)
- `--minMappingQuality`: Filter by alignment quality (e.g., `--minMappingQuality 10`)
- `--minFragmentLength` / `--maxFragmentLength`: Fragment length bounds
- `--samFlagInclude` / `--samFlagExclude`: SAM flag filtering
Read Processing:
- `--extendReads`: Extend to fragment length (ChIP-seq: YES, RNA-seq: NO)
- `--centerReads`: Center at fragment midpoint for sharper signals
Best Practices
File Validation
Always validate files first using `scripts/validate_files.py` to check:
- File existence and readability
- BAM indices present (.bai files)
- BED format correctness
- File sizes reasonable
Analysis Strategy
- Start with QC: Run correlation, coverage, and fingerprint analysis before proceeding
- Test on small regions: Use `--region chr1:1-10000000` for parameter testing
- Document commands: Save full command lines for reproducibility
- Use consistent normalization: Apply same method across samples in comparisons
- Verify genome assembly: Ensure BAM and BED files use matching genome builds
ChIP-seq Specific
- Always extend reads for ChIP-seq: `--extendReads 200`
- Remove duplicates: Use `--ignoreDuplicates` in most cases
- Check enrichment first: Run plotFingerprint before detailed analysis
- GC correction: Only apply if significant bias is detected; never use `--ignoreDuplicates` after GC correction
RNA-seq Specific
- Never extend reads for RNA-seq (would span splice junctions)
- Strand-specific: Use `--filterRNAstrand forward/reverse` for stranded libraries
- Normalization: CPM for bins, RPKM for genes
ATAC-seq Specific
- Apply Tn5 correction: Use alignmentSieve with `--ATACshift`
- Fragment filtering: Set appropriate min/max fragment lengths
- Check nucleosome pattern: Fragment size plot should show ladder pattern
Performance Optimization
- Use multiple processors: `--numberOfProcessors 8` (or available cores)
- Increase bin size for faster processing and smaller files
- Process chromosomes separately for memory-limited systems
- Pre-filter BAM files using alignmentSieve to create reusable filtered files
- Use bigWig over bedGraph: Compressed and faster to process
Troubleshooting
Common Issues
BAM index missing:
```bash
samtools index input.bam
```
Out of memory:
Process chromosomes individually using `--region`:
```bash
bamCoverage --bam input.bam -o chr1.bw --region chr1
```
Slow processing:
Increase `--numberOfProcessors` and/or increase `--binSize`
bigWig files too large:
Increase bin size: `--binSize 50` or larger
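For the out-of-memory case, the per-chromosome command can be generated in a loop rather than typed by hand. The sketch below builds one argument list per chromosome; the chromosome names are supplied by the caller (they could come from the BAM header), and `per_chromosome_commands` is a hypothetical helper.

```python
def per_chromosome_commands(bam, chromosomes, bin_size=50):
    """Return one bamCoverage argument list per chromosome."""
    commands = []
    for chrom in chromosomes:
        commands.append([
            "bamCoverage", "--bam", bam,
            "-o", f"{chrom}.bw",
            "--region", chrom,
            "--binSize", str(bin_size),
        ])
    return commands


commands = per_chromosome_commands("input.bam", ["chr1", "chr2"])
```

Each list can be run in turn with `subprocess.run(cmd, check=True)`, so only one chromosome is processed at a time.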
Validation Errors
Run the validation script to identify issues:
```bash
python scripts/validate_files.py --bam *.bam --bed regions.bed
```
Common errors and solutions are explained in the script output.
Reference Documentation
This skill includes comprehensive reference documentation:
references/tools_reference.md
Complete documentation of all deepTools commands organized by category:
- BAM and bigWig processing tools (9 tools)
- Quality control tools (6 tools)
- Visualization tools (3 tools)
- Miscellaneous tools (2 tools)
Each tool includes:
- Purpose and overview
- Key parameters with explanations
- Usage examples
- Important notes and best practices
Use this reference when: Users ask about specific tools, parameters, or detailed usage.
references/workflows.md
Complete workflow examples for common analyses:
- ChIP-seq quality control workflow
- ChIP-seq complete analysis workflow
- RNA-seq coverage workflow
- ATAC-seq analysis workflow
- Multi-sample comparison workflow
- Peak region analysis workflow
- Troubleshooting and performance tips
Use this reference when: Users need complete analysis pipelines or workflow examples.
references/normalization_methods.md
Comprehensive guide to normalization methods:
- Detailed explanation of each method (RPGC, CPM, RPKM, BPM, etc.)
- When to use each method
- Formulas and interpretation
- Selection guide by experiment type
- Common pitfalls and solutions
- Quick reference table
Use this reference when: Users ask about normalization, comparing samples, or which method to use.
references/effective_genome_sizes.md
Effective genome size values and usage:
- Common organism values (human, mouse, fly, worm, zebrafish)
- Read-length-specific values
- Calculation methods
- When and how to use in commands
- Custom genome calculation instructions
Use this reference when: Users need genome size for RPGC normalization or GC bias correction.
Helper Scripts
scripts/validate_files.py
Validates BAM, bigWig, and BED files for deepTools analysis. Checks file existence, indices, and format.
Usage:
```bash
python scripts/validate_files.py --bam sample1.bam sample2.bam \
    --bed peaks.bed --bigwig signal.bw
```
When to use: Before starting any analysis, or when troubleshooting errors.
scripts/workflow_generator.py
Generates customizable bash script templates for common deepTools workflows.
Available workflows:
- `chipseq_qc`: ChIP-seq quality control
- `chipseq_analysis`: Complete ChIP-seq analysis
- `rnaseq_coverage`: Strand-specific RNA-seq coverage
- `atacseq`: ATAC-seq with Tn5 correction
Usage:
```bash
# List workflows
python scripts/workflow_generator.py --list

# Generate workflow
python scripts/workflow_generator.py chipseq_qc -o qc.sh \
    --input-bam Input.bam --chip-bams "ChIP1.bam ChIP2.bam" \
    --genome-size 2913022398 --threads 8

# Run generated workflow
chmod +x qc.sh
./qc.sh
```
When to use: Users request standard workflows or need template scripts to customize.
Assets
assets/quick_reference.md
Quick reference card with the most common commands, effective genome sizes, and the typical workflow pattern.
When to use: Users need quick command examples without detailed documentation.
Handling User Requests
For New Users
- Start with installation verification
- Validate input files using `scripts/validate_files.py`
- Recommend appropriate workflow based on experiment type
- Generate workflow template using `scripts/workflow_generator.py`
- Guide through customization and execution
For Experienced Users
- Provide specific tool commands for requested operations
- Reference appropriate sections in `references/tools_reference.md`
- Suggest optimizations and best practices
- Offer troubleshooting for issues
For Specific Tasks
"Convert BAM to bigWig":
- Use bamCoverage with appropriate normalization
- Recommend RPGC or CPM based on use case
- Provide effective genome size for organism
- Suggest relevant parameters (extendReads, ignoreDuplicates, binSize)
"Check ChIP quality":
- Run full QC workflow or use plotFingerprint specifically
- Explain interpretation of results
- Suggest follow-up actions based on results
"Create heatmap":
- Guide through two-step process: computeMatrix → plotHeatmap
- Help choose appropriate matrix mode (reference-point vs scale-regions)
- Suggest visualization parameters and clustering options
- Recommend generating a profile plot as a complement
"Compare samples":
- Recommend bamCompare for two-sample comparison
- Suggest multiBamSummary + plotCorrelation for multiple samples
- Guide normalization method selection
Referencing Documentation
When users need detailed information:
- Tool details: Direct to specific sections in `references/tools_reference.md`
- Workflows: Use `references/workflows.md` for complete analysis pipelines
- Normalization: Consult `references/normalization_methods.md` for method selection
- Genome sizes: Reference `references/effective_genome_sizes.md`
Search references using grep patterns:
```bash
# Find tool documentation
grep -A 20 "^### toolname" references/tools_reference.md

# Find workflow
grep -A 50 "^## Workflow Name" references/workflows.md

# Find normalization method
grep -A 15 "^### Method Name" references/normalization_methods.md
```
Example Interactions
User: "I need to analyze my ChIP-seq data"
Response approach:
- Ask about files available (BAM files, peaks, genes)
- Validate files using validation script
- Generate chipseq_analysis workflow template
- Customize for their specific files and organism
- Explain each step as script runs
User: "Which normalization should I use?"
Response approach:
- Ask about experiment type (ChIP-seq, RNA-seq, etc.)
- Ask about comparison goal (within-sample or between-sample)
- Consult the selection guide in `references/normalization_methods.md`
- Recommend appropriate method with justification
- Provide command example with parameters
User: "Create a heatmap around TSS"
Response approach:
- Verify bigWig and gene BED files available
- Use computeMatrix with reference-point mode at TSS
- Generate plotHeatmap with appropriate visualization parameters
- Suggest clustering if dataset is large
- Offer profile plot as complement
Key Reminders
- File validation first: Always validate input files before analysis
- Normalization matters: Choose appropriate method for comparison type
- Extend reads carefully: YES for ChIP-seq, NO for RNA-seq
- Use all cores: Set `--numberOfProcessors` to available cores
- Test on regions: Use `--region` for parameter testing
- Check QC first: Run quality control before detailed analysis
- Document everything: Save commands for reproducibility
- Reference documentation: Use comprehensive references for detailed guidance