scvi-tools

scvi-tools Deep Learning Skill

This skill provides guidance for deep learning-based single-cell analysis with scvi-tools, a widely used framework for probabilistic models in single-cell genomics.

How to Use This Skill

  1. Identify the appropriate workflow from the model/workflow tables below
  2. Read the corresponding reference file for detailed steps and code
  3. Use scripts in scripts/ to avoid rewriting common code
  4. For installation or GPU issues, consult references/environment_setup.md
  5. For debugging, consult references/troubleshooting.md

When to Use This Skill

  • When scvi-tools, scVI, scANVI, or related models are mentioned
  • When deep learning-based batch correction or integration is needed
  • When working with multi-modal data (CITE-seq, multiome)
  • When reference mapping or label transfer is required
  • When analyzing ATAC-seq or spatial transcriptomics data
  • When learning latent representations of single-cell data

Model Selection Guide

| Data Type | Model | Primary Use Case |
| --- | --- | --- |
| scRNA-seq | scVI | Unsupervised integration, DE, imputation |
| scRNA-seq + labels | scANVI | Label transfer, semi-supervised integration |
| CITE-seq (RNA+protein) | totalVI | Multi-modal integration, protein denoising |
| scATAC-seq | PeakVI | Chromatin accessibility analysis |
| Multiome (RNA+ATAC) | MultiVI | Joint modality analysis |
| Spatial + scRNA reference | DestVI | Cell type deconvolution |
| RNA velocity | veloVI | Transcriptional dynamics |
| Cross-technology | sysVI | System-level batch correction |

Workflow Reference Files

| Workflow | Reference File | Description |
| --- | --- | --- |
| Environment Setup | references/environment_setup.md | Installation, GPU, version info |
| Data Preparation | references/data_preparation.md | Formatting data for any model |
| scRNA Integration | references/scrna_integration.md | scVI/scANVI batch correction |
| ATAC-seq Analysis | references/atac_peakvi.md | PeakVI for accessibility |
| CITE-seq Analysis | references/citeseq_totalvi.md | totalVI for protein+RNA |
| Multiome Analysis | references/multiome_multivi.md | MultiVI for RNA+ATAC |
| Spatial Deconvolution | references/spatial_deconvolution.md | DestVI spatial analysis |
| Label Transfer | references/label_transfer.md | scANVI reference mapping |
| scArches Mapping | references/scarches_mapping.md | Query-to-reference mapping |
| Batch Correction | references/batch_correction_sysvi.md | Advanced batch methods |
| RNA Velocity | references/rna_velocity_velovi.md | veloVI dynamics |
| Troubleshooting | references/troubleshooting.md | Common issues and solutions |

CLI Scripts

Modular scripts for common workflows. Chain together or modify as needed.

Pipeline Scripts

| Script | Purpose | Usage |
| --- | --- | --- |
| prepare_data.py | QC, filter, HVG selection | python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch |
| train_model.py | Train any scvi-tools model | python scripts/train_model.py prepared.h5ad results/ --model scvi |
| cluster_embed.py | Neighbors, UMAP, Leiden | python scripts/cluster_embed.py adata.h5ad results/ |
| differential_expression.py | DE analysis | python scripts/differential_expression.py model/ adata.h5ad de.csv --groupby leiden |
| transfer_labels.py | Label transfer with scANVI | python scripts/transfer_labels.py ref_model/ query.h5ad results/ |
| integrate_datasets.py | Multi-dataset integration | python scripts/integrate_datasets.py results/ data1.h5ad data2.h5ad |
| validate_adata.py | Check data compatibility | python scripts/validate_adata.py data.h5ad --batch-key batch |

Example Workflow

    # 1. Validate input data
    python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest

    # 2. Prepare data (QC, HVG selection)
    python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000

    # 3. Train model
    python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch

    # 4. Cluster and visualize
    python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8

    # 5. Differential expression
    python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
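
The five steps above could also be driven from Python, which makes it easier to stop at the first failing step. The following is only a sketch: build_pipeline() and run_pipeline() are hypothetical helper names that are not part of scripts/, and the file names simply mirror the example workflow. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess
import sys
from pathlib import PurePosixPath


def build_pipeline(raw="raw.h5ad", prepared="prepared.h5ad", outdir="results",
                   batch_key="batch", n_hvgs=2000, resolution=0.8):
    """Return the example workflow as a list of argument lists (hypothetical helper)."""
    out = PurePosixPath(outdir)
    py = sys.executable  # run the scripts with the current interpreter
    return [
        # 1. Validate input data
        [py, "scripts/validate_adata.py", raw, "--batch-key", batch_key, "--suggest"],
        # 2. Prepare data (QC, HVG selection)
        [py, "scripts/prepare_data.py", raw, prepared,
         "--batch-key", batch_key, "--n-hvgs", str(n_hvgs)],
        # 3. Train model
        [py, "scripts/train_model.py", prepared, f"{out}/",
         "--model", "scvi", "--batch-key", batch_key],
        # 4. Cluster and visualize
        [py, "scripts/cluster_embed.py", str(out / "adata_trained.h5ad"), f"{out}/",
         "--resolution", str(resolution)],
        # 5. Differential expression
        [py, "scripts/differential_expression.py", str(out / "model"),
         str(out / "adata_clustered.h5ad"), str(out / "de.csv"), "--groupby", "leiden"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first failing step


if __name__ == "__main__":
    run_pipeline(build_pipeline(), dry_run=True)
```

Passing dry_run=False runs the scripts in order; check=True means a failed validation step prevents the later steps from running on bad input.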

Python Utilities

The scripts/model_utils.py module provides importable functions for custom workflows:

| Function | Purpose |
| --- | --- |
| prepare_adata() | Data preparation (QC, HVG, layer setup) |
| train_scvi() | Train scVI or scANVI |
| evaluate_integration() | Compute integration metrics |
| get_marker_genes() | Extract DE markers |
| save_results() | Save model, data, plots |
| auto_select_model() | Suggest best model |
| quick_clustering() | Neighbors + UMAP + Leiden |
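
To give a rough idea of what a "Neighbors + UMAP + Leiden" helper involves, the sketch below chains the corresponding scanpy calls. It is not the implementation in scripts/model_utils.py, whose signature is not shown here; quick_clustering_sketch() is a hypothetical name, and the obsm key "X_scVI" is an assumption about where the scVI latent representation was stored.

```python
def quick_clustering_sketch(adata, use_rep="X_scVI", resolution=0.8):
    """Neighbors + UMAP + Leiden on a latent representation (hypothetical helper)."""
    import scanpy as sc  # imported lazily so this module loads without scanpy

    # Build the kNN graph on the latent space rather than on PCA of expression
    sc.pp.neighbors(adata, use_rep=use_rep)
    # 2D embedding for visualization, stored in adata.obsm["X_umap"]
    sc.tl.umap(adata)
    # Graph clustering; labels are written to adata.obs["leiden"]
    sc.tl.leiden(adata, resolution=resolution, key_added="leiden")
    return adata
```

Higher resolution values generally yield more, smaller clusters; 0.8 matches the value used in the example workflow.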

Critical Requirements

  1. Raw counts required: scvi-tools models require integer count data
        import scvi
        adata.layers["counts"] = adata.X.copy()  # Before normalization
        scvi.model.SCVI.setup_anndata(adata, layer="counts")
  2. HVG selection: Use 2000-4000 highly variable genes
        import scanpy as sc
        sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
        adata = adata[:, adata.var['highly_variable']].copy()
  3. Batch information: Specify batch_key for integration
        scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
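
The three requirements above can be combined into one short routine. This is a minimal sketch under stated assumptions: looks_like_raw_counts() and setup_and_train_scvi() are hypothetical names (not functions from scripts/), the batch column is assumed to be adata.obs["batch"], and the latent space is stored under the assumed key "X_scVI". The scanpy and scvi-tools calls are the ones shown in the list, followed by the standard train() and get_latent_representation() methods.

```python
def looks_like_raw_counts(values):
    """Return True if every value is a non-negative whole number.

    Pass a flat iterable, e.g. adata.X.data for a sparse matrix or
    adata.X.ravel() for a dense one (a random subset is enough for large data).
    """
    return all(float(v) >= 0 and float(v).is_integer() for v in values)


def setup_and_train_scvi(adata, batch_key="batch", n_top_genes=2000):
    """Apply the three requirements, then train scVI (hypothetical helper)."""
    import scanpy as sc  # lazy imports: only needed when the function is called
    import scvi

    # 1. Keep raw counts in a layer before any normalization touches adata.X
    adata.layers["counts"] = adata.X.copy()

    # 2. Select highly variable genes on the counts layer
    #    (flavor="seurat_v3" needs the scikit-misc package installed)
    sc.pp.highly_variable_genes(
        adata, n_top_genes=n_top_genes, batch_key=batch_key,
        layer="counts", flavor="seurat_v3",
    )
    adata = adata[:, adata.var["highly_variable"]].copy()

    # 3. Register the counts layer and batch column, then train
    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()

    # Store the batch-corrected latent space for downstream clustering
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return model, adata
```

Running the counts check before step 1 catches the common mistake of passing normalized or log-transformed values, which the count-based likelihoods are not designed for.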

Quick Decision Tree

Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)

Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)

Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)

Have pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)

Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)

Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
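
The tree above can be written as a small lookup function. The sketch below is illustrative only: suggest_model() is a hypothetical name and is not the auto_select_model() utility from scripts/model_utils.py. The data_type keywords and the order in which the branches are checked are assumptions, since the tree itself does not rank its branches.

```python
# Branches that depend only on the data type: (model, reference file)
BY_DATA_TYPE = {
    "citeseq": ("totalVI", "references/citeseq_totalvi.md"),
    "multiome": ("MultiVI", "references/multiome_multivi.md"),
    "atac": ("PeakVI", "references/atac_peakvi.md"),
    "spatial": ("DestVI", "references/spatial_deconvolution.md"),
    "velocity": ("veloVI", "references/rna_velocity_velovi.md"),
}


def suggest_model(data_type, *, has_labels=False, has_reference_model=False,
                  strong_cross_technology_effects=False):
    """Return (model, reference file) following the decision tree (hypothetical helper)."""
    # Assumption: mapping onto an existing reference model takes priority
    if has_reference_model:
        return ("scArches", "references/scarches_mapping.md")
    if data_type in BY_DATA_TYPE:
        return BY_DATA_TYPE[data_type]
    if data_type == "scrna":
        if strong_cross_technology_effects:
            return ("sysVI", "references/batch_correction_sysvi.md")
        if has_labels:
            return ("scANVI", "references/label_transfer.md")
        return ("scVI", "references/scrna_integration.md")
    raise ValueError(f"Unknown data type: {data_type!r}")


print(suggest_model("scrna", has_labels=True))
```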

Key Resources
