bio-igv

IGV Integration


Automated IGV (Integrative Genomics Viewer) snapshot generation for genomic regions with multiple BAM files. Designed for WGS/WES analysis visualization and quality control.

Quick Start

Install

Install IGV:

```bash
# macOS
brew install --cask igv

# Linux and Windows
# Download from https://software.broadinstitute.org/software/igv/download
```

Install Python dependencies:

```bash
uv pip install typer
```

Basic Usage

```bash
# Single region with multiple BAM files
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample1.bam sample2.bam sample3.bam \
  --region chr1:1000-2000 \
  --output-dir ./igv_snapshots

# Multiple regions from BED file
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample1.bam sample2.bam \
  --bed regions.bed \
  --output-dir ./igv_snapshots
```

Scripts

generate_igv_snapshots.py - IGV Snapshot Generation

Generate PNG screenshots of genomic regions with multiple BAM tracks using IGV batch mode.

Required Arguments

  • --genome TEXT - Genome assembly ID (e.g., hg38, hg19, mm39)
  • --bam PATH - BAM file path(s). Multiple files can be specified.
  • --region TEXT or --bed PATH - Either a single region (e.g., chr1:1000-2000) or a BED file with multiple regions
  • --output-dir PATH - Output directory for snapshots

Optional Arguments

IGV Configuration:
  • --igv-path TEXT - Path to IGV executable (default: auto-detect)
  • --max-panel-height INT - Maximum panel height in pixels (default: 500)
  • --java-heap TEXT - Java heap size (default: 2g)
Output:
  • --save-batch-script - Save IGV batch script for debugging (default: False)

Output Format

PNG Screenshots:
  • Named by region: chr1_1000-2000.png
  • Or by BED name field: region_name.png
Optional Batch Script (with --save-batch-script):
  • igv_batch_script.txt - IGV commands used for snapshot generation
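
The naming rule above can be sketched in Python as follows. This is an illustration only, not the script's actual code: `snapshot_filename` is a hypothetical helper, and it assumes the colon in the region string is replaced with an underscore, as the `chr1_1000-2000.png` example suggests.

```python
from typing import Optional


def snapshot_filename(region: str, bed_name: Optional[str] = None) -> str:
    """Return a PNG file name for one region (hypothetical helper)."""
    # Prefer the BED name field when the region came from a named BED record.
    if bed_name:
        return f"{bed_name}.png"
    # Otherwise derive the name from the region: chr1:1000-2000 -> chr1_1000-2000
    return region.replace(":", "_") + ".png"
```

Replacing the colon keeps the file name portable, since colons are not valid in Windows file names.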

Usage Examples

```bash
# Visualize single region with 3 BAM files
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam control.bam treatment1.bam treatment2.bam \
  --region chr1:1000-2000 \
  --output-dir ./snapshots

# Visualize multiple regions from BED file
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam alignment.bam \
  --bed regions_of_interest.bed \
  --output-dir ./snapshots

# Custom IGV settings with larger panel height
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample.bam \
  --region chr1:1000-2000 \
  --output-dir ./snapshots \
  --max-panel-height 800 \
  --java-heap 4g

# Save batch script for debugging
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample.bam \
  --region chr1:1000-2000 \
  --output-dir ./snapshots \
  --save-batch-script
```

Workflow Examples

Example 1: Visualize BLAT Results

Visualize BLAT alignment results from a BAM file:

```bash
# Step 1: Extract BLAT hit regions to BED file
# (Assuming you have BLAT results in PSL or JSON format)
echo -e "chr1\t1000\t2000\tinsert1" > blat_hits.bed
echo -e "chr2\t5000\t6000\tinsert2" >> blat_hits.bed

# Step 2: Generate IGV snapshots
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam alignment.bam \
  --bed blat_hits.bed \
  --output-dir ./blat_visualizations
```
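
Step 1 writes the BED lines by hand. When the BLAT results are in PSL format, the conversion could be sketched in Python as below. This is a minimal sketch and not part of the toolkit: `psl_line_to_bed` is a hypothetical helper, it assumes a tab-separated 21-column PSL data line with the header lines already removed, and it uses the query name as the BED name field.

```python
def psl_line_to_bed(line: str) -> str:
    """Convert one PSL data line to a 4-column BED line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 21:
        raise ValueError(f"Expected 21 PSL columns, got {len(fields)}")
    q_name = fields[9]    # query sequence name
    t_name = fields[13]   # target (chromosome) name
    t_start = fields[15]  # alignment start on target, 0-based
    t_end = fields[16]    # alignment end on target, exclusive
    # PSL target coordinates are 0-based half-open, the same convention as BED,
    # so they can be copied without adjustment.
    return "\t".join([t_name, t_start, t_end, q_name])
```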

Example 2: Compare Multiple Samples

Visualize the same regions across multiple BAM files:

```bash
# Generate snapshots with all samples loaded
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample1.bam sample2.bam sample3.bam \
  --bed candidate_regions.bed \
  --output-dir ./multi_sample_comparison
```

Example 3: Validate Variant Calls

Visualize variant sites alongside the BAM alignments:

```bash
# Step 1: Extract variant positions to BED
# (From VCF file using vcf-toolkit or bcftools)
bcftools query -f '%CHROM\t%POS0\t%END\t%ID\n' variants.vcf > variant_sites.bed

# Step 2: Visualize with flanking regions
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam alignment.bam \
  --bed variant_sites.bed \
  --output-dir ./variant_validation \
  --max-panel-height 600
```
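
The Step 2 command passes no flank option, and none is listed among the arguments in this document, so any flanking sequence would have to be added to the BED intervals beforehand. One way to do that is sketched below in Python. The helper names and the flank size are hypothetical, and the sketch assumes tab-separated BED lines.

```python
from typing import Optional, Tuple


def pad_interval(start: int, end: int, flank: int,
                 chrom_length: Optional[int] = None) -> Tuple[int, int]:
    """Widen a 0-based half-open interval by `flank` bases on each side."""
    padded_start = max(0, start - flank)  # never go below position 0
    padded_end = end + flank
    if chrom_length is not None:
        padded_end = min(padded_end, chrom_length)  # clamp to chromosome end
    return padded_start, padded_end


def pad_bed_line(line: str, flank: int) -> str:
    """Return a BED line with its start/end columns padded; other columns kept."""
    fields = line.rstrip("\n").split("\t")
    start, end = pad_interval(int(fields[1]), int(fields[2]), flank)
    fields[1], fields[2] = str(start), str(end)
    return "\t".join(fields)
```

A single-base variant at BED interval 999-1000 padded with a 50 bp flank becomes 949-1050, which leaves room for the surrounding reads in the snapshot.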

IGV Batch Script Format

The script generates IGV batch commands in this format:

```
new
genome hg38
load /path/to/sample1.bam
load /path/to/sample2.bam
snapshotDirectory /path/to/output
maxPanelHeight 500
goto chr1:1000-2000
snapshot chr1_1000-2000.png
goto chr2:5000-6000
snapshot chr2_5000-6000.png
exit
```
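
A batch script in this layout could be assembled with a few lines of Python. The sketch below only illustrates the format shown above and is not the code in generate_igv_snapshots.py. `build_batch_script` and its parameters are hypothetical names, and it assumes each snapshot is named after its region.

```python
from typing import Iterable


def build_batch_script(genome: str, bam_paths: Iterable[str],
                       regions: Iterable[str], output_dir: str,
                       max_panel_height: int = 500) -> str:
    """Return IGV batch commands as one string (hypothetical helper)."""
    lines = ["new", f"genome {genome}"]
    # Load every BAM once so all tracks appear in each snapshot.
    lines += [f"load {bam}" for bam in bam_paths]
    lines.append(f"snapshotDirectory {output_dir}")
    lines.append(f"maxPanelHeight {max_panel_height}")
    for region in regions:
        lines.append(f"goto {region}")
        lines.append(f"snapshot {region.replace(':', '_')}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"
```

Usage, under the same assumptions:

```python
script = build_batch_script(
    "hg38",
    ["/path/to/sample1.bam", "/path/to/sample2.bam"],
    ["chr1:1000-2000", "chr2:5000-6000"],
    "/path/to/output",
)
print(script)
```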

Error Handling

IGV Not Found

```bash
$ python scripts/generate_igv_snapshots.py --genome hg38 --bam sample.bam --region chr1:1000-2000 --output-dir ./out

Error: IGV not found. Please install IGV or specify path with --igv-path.
Install: brew install --cask igv (macOS) or download from https://software.broadinstitute.org/software/igv/download
```

Solution: Install IGV or specify its path:

```bash
# Install IGV
brew install --cask igv

# Or specify custom path
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample.bam \
  --region chr1:1000-2000 \
  --output-dir ./out \
  --igv-path /path/to/igv.sh
```
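
How the script auto-detects IGV is not described here. One plausible approach is a lookup on the PATH, sketched below with the standard library. `find_igv` is a hypothetical helper, and the candidate launcher names `igv` and `igv.sh` are assumptions that may not match every installation.

```python
import shutil
from typing import Optional, Sequence


def find_igv(explicit_path: Optional[str] = None,
             candidates: Sequence[str] = ("igv", "igv.sh")) -> Optional[str]:
    """Return a usable IGV launcher path, or None if nothing is found."""
    if explicit_path:
        # shutil.which also accepts a full path and checks that it is executable.
        return shutil.which(explicit_path)
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    return None
```

Returning None rather than raising lets the caller decide how to report the failure, for example with the error message shown above.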

Invalid Region Format

```bash
$ python scripts/generate_igv_snapshots.py --genome hg38 --bam sample.bam --region 1000-2000 --output-dir ./out

Error: Invalid region format: 1000-2000. Expected format: chr1:1000-2000
```

Solution: Use the correct region format chr:start-end:

```bash
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample.bam \
  --region chr1:1000-2000 \
  --output-dir ./out
```
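
A check of the chr:start-end format could look like the Python sketch below. It is an illustration rather than the script's own validation: `parse_region` is a hypothetical helper, the error text simply mirrors the message shown above, and digit separators such as `1,000` are not handled.

```python
import re
from typing import Tuple

# Contig name (no whitespace or colon), then start-end as plain integers.
_REGION_RE = re.compile(r"^([^\s:]+):(\d+)-(\d+)$")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split 'chr1:1000-2000' into ('chr1', 1000, 2000) or raise ValueError."""
    match = _REGION_RE.match(region)
    if match is None:
        raise ValueError(
            f"Invalid region format: {region}. Expected format: chr1:1000-2000"
        )
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"Region start {start} is greater than end {end}")
    return chrom, start, end
```

The pattern does not require a `chr` prefix, so names such as `1:1000-2000` still pass for BAM files aligned to references without it.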

No Snapshots Generated

```bash
$ python scripts/generate_igv_snapshots.py --genome hg38 --bam sample.bam --region chr1:1000-2000 --output-dir ./out

Snapshots generated successfully in: ./out
Total regions processed: 1

Warning: No snapshots found. Check IGV output for errors.
```

Possible causes:
  1. BAM file not indexed (create the index with samtools index)
  2. Region not present in BAM file
  3. IGV failed to start (use --save-batch-script to debug)

Solution:

```bash
# Index the BAM file if it is not indexed yet
samtools index sample.bam

# Save batch script to debug IGV execution
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample.bam \
  --region chr1:1000-2000 \
  --output-dir ./out \
  --save-batch-script

# Manually run batch script to see errors
igv.sh -b ./out/igv_batch_script.txt
```
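
The first cause can be ruled out before IGV is launched. The Python sketch below checks for an index file next to each BAM. `missing_bam_indexes` is a hypothetical helper, and it only looks for the two common `.bai` names (`sample.bam.bai` and `sample.bai`), not `.csi` indexes.

```python
from pathlib import Path
from typing import Iterable, List


def missing_bam_indexes(bam_paths: Iterable) -> List[str]:
    """Return the BAM paths that have no .bai index next to them."""
    missing = []
    for bam in map(Path, bam_paths):
        candidates = [
            Path(str(bam) + ".bai"),  # sample.bam.bai (samtools index default)
            bam.with_suffix(".bai"),  # sample.bai
        ]
        if not any(candidate.exists() for candidate in candidates):
            missing.append(str(bam))
    return missing
```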

Best Practices

1. Always Index BAM Files

Create the BAM index before generating snapshots:

```bash
# ✅ Good: Index BAM files first
samtools index sample1.bam
samtools index sample2.bam

# Then generate snapshots
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample1.bam sample2.bam \
  --region chr1:1000-2000 \
  --output-dir ./snapshots
```

2. Use BED Files for Multiple Regions

For batch processing, use a BED file instead of multiple script calls:

```bash
# ✅ Good: Single call with BED file
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample.bam \
  --bed regions.bed \
  --output-dir ./snapshots

# ❌ Bad: Multiple script calls
python scripts/generate_igv_snapshots.py --genome hg38 --bam sample.bam --region chr1:1000-2000 --output-dir ./out
python scripts/generate_igv_snapshots.py --genome hg38 --bam sample.bam --region chr2:5000-6000 --output-dir ./out
# ... (inefficient, IGV starts/stops repeatedly)
```

3. Adjust Panel Height for Readability

Increase the panel height when visualizing multiple BAM tracks:

```bash
# ✅ Good: Larger panel for 5 BAM files
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam s1.bam s2.bam s3.bam s4.bam s5.bam \
  --region chr1:1000-2000 \
  --output-dir ./snapshots \
  --max-panel-height 800
```

4. Use Appropriate Genome Assembly

Match the genome assembly to the reference the BAM file was aligned to:

```bash
# Check BAM header for reference genome
samtools view -H alignment.bam | grep @SQ

# Use matching genome assembly
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam alignment.bam \
  --region chr1:1000-2000 \
  --output-dir ./snapshots
```
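
The @SQ lines report contig lengths rather than an assembly name, so the assembly usually has to be inferred from them. The Python sketch below does that from the chromosome 1 length. It is a rough heuristic, not a feature of the script: `guess_assembly` is a hypothetical helper, and its lookup table covers only hg38 and hg19 (GRCh37-style references with contig name `1` are reported as hg19).

```python
from typing import Iterable, Optional

# Chromosome 1 length for each assembly (small illustrative table).
KNOWN_CHR1_LENGTHS = {
    248956422: "hg38",
    249250621: "hg19",
}


def guess_assembly(header_lines: Iterable[str]) -> Optional[str]:
    """Guess the assembly from SAM header lines, or return None if unknown."""
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        tags = dict(field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:])
        if tags.get("SN") in ("chr1", "1") and "LN" in tags:
            return KNOWN_CHR1_LENGTHS.get(int(tags["LN"]))
    return None
```

The header lines could come from the output of `samtools view -H alignment.bam`, split on newlines.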

When to Use igv-integration vs Manual IGV

| Task | igv-integration | Manual IGV |
|---|---|---|
| Visualize many regions (>10) | ✅ generate_igv_snapshots.py | ❌ Too tedious |
| Generate publication figures | ✅ Reproducible batch mode | ✅ Fine-tuned manual control |
| Compare multiple samples | ✅ Load all BAMs at once | ✅ Interactive comparison |
| Validate variant calls | ✅ Automate with BED file | ✅ Interactive inspection |
| Explore data interactively | ❌ Use manual IGV | ✅ Full GUI control |

Recommended Workflow:
  1. Use igv-integration for batch snapshot generation
  2. Use manual IGV for interactive exploration and fine-tuning
  3. Combine both for comprehensive analysis

Related Skills


  • bam-toolkit - BAM file analysis and read extraction
  • vcf-toolkit - VCF variant analysis
  • blat-api-searching - BLAT genome mapping to find regions for visualization
  • sequence-io - FASTA/FASTQ sequence operations

Troubleshooting

Java Heap Size Errors

If IGV fails with memory errors, increase the Java heap size:

```bash
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam large_file.bam \
  --region chr1:1000-2000 \
  --output-dir ./out \
  --java-heap 4g
```

Chromosome Name Mismatch

Ensure chromosome names match between the BAM file and the region specification:

```bash
# Check chromosome names in BAM
samtools idxstats sample.bam | cut -f1

# Use correct chromosome name
# If BAM uses "1" instead of "chr1":
python scripts/generate_igv_snapshots.py \
  --genome hg38 \
  --bam sample.bam \
  --region 1:1000-2000 \
  --output-dir ./out
```
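
The manual check above could be automated along these lines. The Python sketch is an illustration under stated assumptions: `match_chrom_name` is a hypothetical helper, the set of names is taken from the first column of `samtools idxstats`, and only the `chr` prefix difference is handled (not cases such as `chrM` versus `MT`).

```python
from typing import Collection


def match_chrom_name(chrom: str, bam_chroms: Collection[str]) -> str:
    """Return the spelling of `chrom` used in the BAM, or raise ValueError."""
    if chrom in bam_chroms:
        return chrom
    # Try the other naming style: strip or add the "chr" prefix.
    alternative = chrom[3:] if chrom.startswith("chr") else "chr" + chrom
    if alternative in bam_chroms:
        return alternative
    raise ValueError(f"Chromosome {chrom!r} not found in BAM file")
```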

BED File Format Errors

Ensure the BED file has at least 3 columns (chrom, start, end):

```bash
# ✅ Good: Valid BED format
chr1 1000 2000
chr2 5000 6000 region_name

# ❌ Bad: Missing columns
chr1:1000-2000
```
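
The column rule can be checked while reading the file. The Python sketch below is one possible reader, not the parser used by the script: `read_bed` is a hypothetical helper, it splits on any whitespace (so a name containing spaces would be truncated), and it skips comment, `track`, and `browser` lines.

```python
from typing import Iterable, Iterator, Optional, Tuple

BedRecord = Tuple[str, int, int, Optional[str]]


def read_bed(lines: Iterable[str]) -> Iterator[BedRecord]:
    """Yield (chrom, start, end, name) tuples; name is None for 3-column lines."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(
                f"Line {line_number}: expected at least 3 columns "
                f"(chrom, start, end), got {len(fields)}"
            )
        name = fields[3] if len(fields) > 3 else None
        yield fields[0], int(fields[1]), int(fields[2]), name
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `list(read_bed(open("regions.bed")))`.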

References
