scvi-tools
scvi-tools Deep Learning Skill
This skill provides guidance for deep learning-based single-cell analysis using scvi-tools, a widely used framework for probabilistic models in single-cell genomics.
How to Use This Skill
- Identify the appropriate workflow from the model/workflow tables below
- Read the corresponding reference file for detailed steps and code
- Use scripts in `scripts/` to avoid rewriting common code
- For installation or GPU issues, consult `references/environment_setup.md`
- For debugging, consult `references/troubleshooting.md`
When to Use This Skill
- When scvi-tools, scVI, scANVI, or related models are mentioned
- When deep learning-based batch correction or integration is needed
- When working with multi-modal data (CITE-seq, multiome)
- When reference mapping or label transfer is required
- When analyzing ATAC-seq or spatial transcriptomics data
- When learning latent representations of single-cell data
Model Selection Guide
| Data Type | Model | Primary Use Case |
|---|---|---|
| scRNA-seq | scVI | Unsupervised integration, DE, imputation |
| scRNA-seq + labels | scANVI | Label transfer, semi-supervised integration |
| CITE-seq (RNA+protein) | totalVI | Multi-modal integration, protein denoising |
| scATAC-seq | PeakVI | Chromatin accessibility analysis |
| Multiome (RNA+ATAC) | MultiVI | Joint modality analysis |
| Spatial + scRNA reference | DestVI | Cell type deconvolution |
| RNA velocity | veloVI | Transcriptional dynamics |
| Cross-technology | sysVI | System-level batch correction |
Workflow Reference Files
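| Workflow | Reference File | Description |
|---|---|---|
| Environment Setup | `references/environment_setup.md` | Installation, GPU, version info |
| Data Preparation | | Formatting data for any model |
| scRNA Integration | `references/scrna_integration.md` | scVI/scANVI batch correction |
| ATAC-seq Analysis | `references/atac_peakvi.md` | PeakVI for accessibility |
| CITE-seq Analysis | `references/citeseq_totalvi.md` | totalVI for protein+RNA |
| Multiome Analysis | `references/multiome_multivi.md` | MultiVI for RNA+ATAC |
| Spatial Deconvolution | `references/spatial_deconvolution.md` | DestVI spatial analysis |
| Label Transfer | `references/label_transfer.md` | scANVI reference mapping |
| scArches Mapping | `references/scarches_mapping.md` | Query-to-reference mapping |
| Batch Correction | `references/batch_correction_sysvi.md` | Advanced batch methods |
| RNA Velocity | `references/rna_velocity_velovi.md` | veloVI dynamics |
| Troubleshooting | `references/troubleshooting.md` | Common issues and solutions |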
CLI Scripts
Modular scripts for common workflows. Chain together or modify as needed.
Pipeline Scripts
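| Script | Purpose | Usage |
|---|---|---|
| `scripts/prepare_data.py` | QC, filter, HVG selection | |
| `scripts/train_model.py` | Train any scvi-tools model | |
| `scripts/cluster_embed.py` | Neighbors, UMAP, Leiden | |
| `scripts/differential_expression.py` | DE analysis | |
| | Label transfer with scANVI | |
| | Multi-dataset integration | |
| `scripts/validate_adata.py` | Check data compatibility | |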
Example Workflow
```bash
# 1. Validate input data
python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest

# 2. Prepare data (QC, HVG selection)
python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000

# 3. Train model
python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch

# 4. Cluster and visualize
python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8

# 5. Differential expression
python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
```
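The five steps above can also be chained from Python. The following is a minimal sketch, not part of the skill's scripts: `build_commands` and `run_pipeline` are hypothetical helper names, and the file names and flags are copied from the example workflow. By default it only prints the commands (a dry run) so the sequence can be checked before anything is executed.

```python
import shlex
import subprocess


def build_commands(raw="raw.h5ad", prepared="prepared.h5ad", out="results", batch_key="batch"):
    """Return the example workflow as a list of argument lists (hypothetical helper)."""
    return [
        ["python", "scripts/validate_adata.py", raw, "--batch-key", batch_key, "--suggest"],
        ["python", "scripts/prepare_data.py", raw, prepared,
         "--batch-key", batch_key, "--n-hvgs", "2000"],
        ["python", "scripts/train_model.py", prepared, f"{out}/",
         "--model", "scvi", "--batch-key", batch_key],
        ["python", "scripts/cluster_embed.py", f"{out}/adata_trained.h5ad", f"{out}/",
         "--resolution", "0.8"],
        ["python", "scripts/differential_expression.py", f"{out}/model",
         f"{out}/adata_clustered.h5ad", f"{out}/de.csv", "--groupby", "leiden"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, check=True)


commands = build_commands()
run_pipeline(commands)  # dry run: prints the five commands without running them
```

Passing `dry_run=False` runs the steps in order and raises `subprocess.CalledProcessError` as soon as one of them exits with a non-zero status.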
Python Utilities
`scripts/model_utils.py` provides importable functions for custom workflows:

| Function | Purpose |
|---|---|
| | Data preparation (QC, HVG, layer setup) |
| | Train scVI or scANVI |
| | Compute integration metrics |
| | Extract DE markers |
| | Save model, data, plots |
| | Suggest best model |
| | Neighbors + UMAP + Leiden |
Critical Requirements
- Raw counts required: scvi-tools models require integer count data

  ```python
  adata.layers["counts"] = adata.X.copy()  # Before normalization
  scvi.model.SCVI.setup_anndata(adata, layer="counts")
  ```

- HVG selection: Use 2000-4000 highly variable genes

  ```python
  sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
  adata = adata[:, adata.var['highly_variable']].copy()
  ```

- Batch information: Specify batch_key for integration

  ```python
  scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
  ```
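Because the raw-count requirement is easy to violate by accident (for example by copying `adata.X` after normalization), a quick sanity check before model setup can help. The sketch below is an illustration only: `looks_like_raw_counts` is a hypothetical helper, not a scvi-tools or skill function, and it works on any iterable of numbers using only the standard library.

```python
def looks_like_raw_counts(values):
    """Return True if every value is a non-negative whole number.

    Hypothetical helper: `values` is any iterable of numbers, e.g. a
    flattened sample of the matrix intended for the "counts" layer.
    """
    return all(v >= 0 and float(v).is_integer() for v in values)


raw = [0, 3, 1, 0, 12]            # integer UMI-style counts
log_normalized = [0.0, 1.386, 0.693]  # values after a log transform

print(looks_like_raw_counts(raw))             # True
print(looks_like_raw_counts(log_normalized))  # False
```

For a large matrix it is usually enough to test a sample of the values; for a SciPy sparse matrix the stored non-zero values are available as `.data`. A vectorized array check would be faster than this pure-Python loop.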
Quick Decision Tree
Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)
Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)
Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)
Have pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)
Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)
Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
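The decision tree above can be written as a small lookup function. This is a sketch under stated assumptions: `suggest_model`, its argument names, and the modality strings are hypothetical, and the order in which the scRNA-seq branches are checked (reference model, then cross-technology effects, then labels) is an assumption, since the tree does not rank them. The model names and reference paths are taken from the tree.

```python
# Modalities that map to a single model in the decision tree
MODALITY_TO_MODEL = {
    "cite-seq": ("totalVI", "references/citeseq_totalvi.md"),
    "multiome": ("MultiVI", "references/multiome_multivi.md"),
    "atac": ("PeakVI", "references/atac_peakvi.md"),
    "spatial": ("DestVI", "references/spatial_deconvolution.md"),
    "velocity": ("veloVI", "references/rna_velocity_velovi.md"),
}


def suggest_model(modality, has_labels=False, has_reference_model=False,
                  cross_technology=False):
    """Return (model name, reference file) for a data description (hypothetical helper)."""
    if modality in MODALITY_TO_MODEL:
        return MODALITY_TO_MODEL[modality]
    if modality != "rna":
        raise ValueError(f"Unknown modality: {modality!r}")
    # scRNA-seq branches; this precedence is an assumption, not stated in the tree
    if has_reference_model:
        return ("scArches", "references/scarches_mapping.md")
    if cross_technology:
        return ("sysVI", "references/batch_correction_sysvi.md")
    if has_labels:
        return ("scANVI", "references/label_transfer.md")
    return ("scVI", "references/scrna_integration.md")


print(suggest_model("rna", has_labels=True))
print(suggest_model("cite-seq"))
```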