scVelo: RNA Velocity Analysis
Overview
scVelo is a widely used Python package for RNA velocity analysis of single-cell RNA-seq data. It infers cell state transitions by modeling the kinetics of mRNA splicing: the ratio of unspliced (pre-mRNA) to spliced (mature mRNA) abundance indicates whether a gene is being upregulated or downregulated in each cell. This makes it possible to reconstruct developmental trajectories and identify cell fate decisions from a single snapshot, without time-course data.
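The splicing kinetics mentioned above are commonly written as a pair of rate equations. The notation here (transcription rate $\alpha$, splicing rate $\beta$, degradation rate $\gamma$, unspliced abundance $u$, spliced abundance $s$) follows the usual RNA velocity convention and is meant as a reference sketch, not as scVelo's exact parameterization:

```latex
\begin{aligned}
\frac{du}{dt} &= \alpha - \beta\,u(t), \\
\frac{ds}{dt} &= \beta\,u(t) - \gamma\,s(t).
\end{aligned}
```

Under this model the velocity of a gene is $ds/dt$. It is positive when a cell holds more unspliced mRNA than the steady state $\beta u = \gamma s$ would predict (upregulation) and negative when it holds less (downregulation).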
Installation: `pip install scvelo`

Key resources:
- Documentation: https://scvelo.readthedocs.io/
- GitHub: https://github.com/theislab/scvelo
- Paper: Bergen et al. (2020) Nature Biotechnology. PMID: 32747759
When to Use This Skill
Use scVelo when:
- Trajectory inference from snapshot data: Determine the direction in which cells are differentiating
- Cell fate prediction: Identify progenitor cells and their downstream fates
- Driver gene identification: Find genes whose dynamics best explain observed trajectories
- Developmental biology: Model hematopoiesis, neurogenesis, epithelial-to-mesenchymal transitions
- Latent time estimation: Order cells along a pseudotime derived from splicing dynamics
- Complement to Scanpy: Add directional information to UMAP embeddings
Prerequisites
scVelo requires count matrices for both unspliced and spliced RNA. These are generated by:
- STARsolo or kallisto|bustools with `lamanno` mode
- velocyto CLI: `velocyto run10x` / `velocyto run`
- alevin-fry / simpleaf with spliced/unspliced output

Data is stored in an `AnnData` object with `layers["spliced"]` and `layers["unspliced"]`.
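Before running any velocity step it can help to confirm that both layers are present. The sketch below is a minimal, hypothetical helper (`missing_velocity_layers` is not part of scVelo); it works on any iterable of layer names, so in practice one would pass `adata.layers.keys()`.

```python
REQUIRED_LAYERS = ("spliced", "unspliced")


def missing_velocity_layers(layer_names):
    """Return the required layer names that are absent from `layer_names`."""
    present = set(layer_names)
    return [name for name in REQUIRED_LAYERS if name not in present]


# Plain lists stand in for `adata.layers.keys()` in this example.
print(missing_velocity_layers(["spliced", "unspliced", "ambiguous"]))  # []
print(missing_velocity_layers(["spliced"]))  # ['unspliced']
```

An empty list means the object has the layers that scVelo expects; anything else points back to the quantification step.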
Standard RNA Velocity Workflow
1. Setup and Data Loading

```python
import scvelo as scv
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt

# Configure settings
scv.settings.verbosity = 3  # Show computation steps
scv.settings.presenter_view = True
scv.settings.set_figure_params('scvelo')

# Load data (AnnData with spliced/unspliced layers)
# Option A: Load from loom (velocyto output)
adata = scv.read("cellranger_output.loom", cache=True)

# Option B: Merge velocyto loom with Scanpy-processed AnnData
adata_processed = sc.read_h5ad("processed.h5ad")  # Has UMAP, clusters
adata_velocity = scv.read("velocyto.loom")
adata = scv.utils.merge(adata_processed, adata_velocity)

# Verify layers
print(adata)
# obs × var: N × G
# layers: 'spliced', 'unspliced' (required)
# obsm['X_umap'] (required for visualization)
```
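Merging a velocyto loom with a Scanpy object depends on the two objects sharing cell names. The sketch below illustrates one way to compare them, assuming velocyto-style names of the form `sample:BARCODEx` and Cell Ranger-style names of the form `BARCODE-1`. Both helpers are hypothetical and not part of scVelo, and the naming patterns should be checked against the actual `obs_names`.

```python
def normalize_barcode(name):
    """Reduce a cell name to its bare barcode (hypothetical helper)."""
    barcode = name.split(":")[-1]   # drop a "sample:" prefix if present
    if barcode.endswith("-1"):      # assumed Cell Ranger style suffix
        barcode = barcode[:-2]
    elif barcode.endswith("x"):     # assumed velocyto style suffix
        barcode = barcode[:-1]
    return barcode


def shared_barcodes(processed_names, velocity_names):
    """Barcodes found in both objects, in the order of `processed_names`."""
    velocity_set = {normalize_barcode(n) for n in velocity_names}
    return [b for b in map(normalize_barcode, processed_names) if b in velocity_set]


processed = ["AAACCTGAG-1", "TTTGGTTCA-1"]
velocity = ["sampleA:AAACCTGAGx", "sampleA:CCCCAAAGTx"]
print(shared_barcodes(processed, velocity))  # ['AAACCTGAG']
```

If the merged object contains far fewer cells than expected, a cell-name mismatch of this kind is one possible cause worth ruling out.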
2. Preprocessing

```python
# Filter and normalize (follows Scanpy conventions)
scv.pp.filter_and_normalize(
    adata,
    min_shared_counts=20,  # Minimum counts in spliced+unspliced
    n_top_genes=2000       # Top highly variable genes
)

# Compute first and second order moments (means and uncentered variances)
# The neighbor graph (kNN connectivities) must be computed first
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=30)
scv.pp.moments(
    adata,
    n_pcs=30,
    n_neighbors=30
)
```
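To make the moments step more concrete: a first-order moment is, in essence, an average of each cell's abundance over its nearest neighbors, which smooths out sparsity before velocities are estimated. The sketch below uses uniform weights and plain Python lists, which is a simplifying assumption; scVelo works on the connectivity graph and stores its results in `layers['Ms']` and `layers['Mu']`.

```python
def knn_first_moments(values, neighbors):
    """Average each cell's value with the values of its neighbors.

    values:    per-cell abundances for one gene
    neighbors: neighbors[i] holds the indices of cell i's neighbors
    """
    moments = []
    for i, neigh in enumerate(neighbors):
        pool = [values[i]] + [values[j] for j in neigh]  # include the cell itself
        moments.append(sum(pool) / len(pool))
    return moments


spliced = [0.0, 4.0, 2.0, 6.0]
neighbors = [[1, 2], [0, 2], [1, 3], [2, 1]]
print(knn_first_moments(spliced, neighbors))  # [2.0, 2.0, 4.0, 4.0]
```

The zero count in the first cell is replaced by a neighborhood average, which is the effect that makes per-gene phase portraits readable.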
3. Velocity Estimation: Stochastic Model
The stochastic model is fast and suitable for exploratory analysis:

```python
# Stochastic velocity (faster, less accurate)
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity_graph(adata)

# Visualize
scv.pl.velocity_embedding_stream(
    adata,
    basis='umap',
    color='leiden',
    title="RNA Velocity (Stochastic)"
)
```
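The idea behind steady-state velocity estimation can be sketched for a single gene: fit a ratio `gamma` so that `u ≈ gamma * s`, then read velocity as the residual `u - gamma * s`. This is the deterministic simplification, shown under the assumption of a plain least-squares fit through the origin over all cells; the stochastic model additionally uses second-order moments, and scVelo's own fitting procedure differs in its details.

```python
def fit_steady_state_ratio(unspliced, spliced):
    """Least-squares slope through the origin for u ≈ gamma * s."""
    numerator = sum(u * s for u, s in zip(unspliced, spliced))
    denominator = sum(s * s for s in spliced)
    if denominator == 0:
        raise ValueError("spliced abundances are all zero")
    return numerator / denominator


def steady_state_velocity(unspliced, spliced):
    """Per-cell velocity as the residual u - gamma * s."""
    gamma = fit_steady_state_ratio(unspliced, spliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


u = [1.0, 2.0, 3.0]
s = [2.0, 4.0, 2.0]
for cell, v in enumerate(steady_state_velocity(u, s)):
    state = "induction" if v > 0 else "repression"
    print(f"cell {cell}: velocity {v:+.2f} ({state})")
```

Cells above the fitted line carry excess unspliced mRNA and get positive velocity; cells below it get negative velocity.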
4. Velocity Estimation: Dynamical Model (Recommended)
The dynamical model fits the full splicing kinetics and is more accurate:

```python
# Recover dynamics (computationally intensive; ~10-30 min for 10K cells)
scv.tl.recover_dynamics(adata, n_jobs=4)

# Compute velocity from dynamical model
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
```
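"Full splicing kinetics" refers to the time course of unspliced and spliced abundance under constant rates. As an illustration, the sketch below evaluates the closed-form solution of `du/dt = alpha - beta*u`, `ds/dt = beta*u - gamma*s` for one gene. The function name and parameter values are illustrative assumptions; scVelo fits these quantities per gene and does not expose this function.

```python
import math


def splicing_solution(t, alpha, beta, gamma, u0=0.0, s0=0.0):
    """Unspliced and spliced abundance at time t for constant rates.

    Solves du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s
    from the initial state (u0, s0). Assumes beta != gamma.
    """
    if beta == gamma:
        raise ValueError("this closed form requires beta != gamma")
    exp_u = math.exp(-beta * t)
    exp_s = math.exp(-gamma * t)
    u = u0 * exp_u + alpha / beta * (1.0 - exp_u)
    s = (
        s0 * exp_s
        + alpha / gamma * (1.0 - exp_s)
        + (alpha - beta * u0) / (gamma - beta) * (exp_s - exp_u)
    )
    return u, s


# Induction from an empty state approaches (alpha/beta, alpha/gamma).
for t in (0.0, 1.0, 5.0, 50.0):
    u, s = splicing_solution(t, alpha=2.0, beta=1.0, gamma=0.5)
    print(f"t={t:>4}: u={u:.3f}, s={s:.3f}")
```

Plotting `u` against `s` over time traces the curved phase portrait that the dynamical model fits, instead of the single straight line used by the steady-state approach.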
5. Latent Time
The dynamical model enables computation of a shared latent time (pseudotime):

```python
# Compute latent time
scv.tl.latent_time(adata)

# Visualize latent time on UMAP
scv.pl.scatter(
    adata,
    color='latent_time',
    color_map='gnuplot',
    size=80,
    title='Latent time'
)

# Select the genes with the highest fit likelihood,
# then plot them with cells ordered by latent time
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300]
scv.pl.heatmap(
    adata,
    var_names=top_genes,
    sortby='latent_time',
    col_color='leiden',
    n_convolve=100
)
```
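The heatmap orders cells by latent time and smooths each gene along that ordering (the `n_convolve` argument). The sketch below shows the idea for one gene with a centered moving average; the shrinking window at the edges is an assumption made for this example and may not match scVelo's boundary handling.

```python
def order_and_smooth(expression, latent_time, window=3):
    """Order one gene's values by latent time, then apply a centered moving average."""
    order = sorted(range(len(latent_time)), key=lambda i: latent_time[i])
    ordered = [expression[i] for i in order]
    half = window // 2
    smoothed = []
    for k in range(len(ordered)):
        lo, hi = max(0, k - half), min(len(ordered), k + half + 1)
        smoothed.append(sum(ordered[lo:hi]) / (hi - lo))
    return order, smoothed


expression = [5.0, 1.0, 3.0, 9.0]
latent_time = [0.6, 0.1, 0.3, 0.9]
order, smoothed = order_and_smooth(expression, latent_time)
print(order)  # [1, 2, 0, 3]
print([round(x, 2) for x in smoothed])
```

A larger window gives a smoother heatmap row at the cost of blurring sharp expression changes along the trajectory.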
6. Driver Gene Analysis

```python
# Rank genes by cluster-specific differential velocity
scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
print(df.head(10))

# Speed and coherence
scv.tl.velocity_confidence(adata)
scv.pl.scatter(
    adata,
    c=['velocity_length', 'velocity_confidence'],
    cmap='coolwarm',
    perc=[5, 95]
)

# Phase portraits for specific genes (replace with genes present in adata.var_names)
scv.pl.velocity(adata, ['Cpe', 'Gnao1', 'Ins2'],
                ncols=3, figsize=(16, 4))
```
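Speed and coherence can be read as two simple summaries of each cell's velocity vector: its length, and how well its direction agrees with neighboring cells. The sketch below illustrates both with plain cosine similarity over hypothetical 2D vectors. That is a simplification: scVelo's confidence score is built on correlations between neighboring velocity vectors and differs in its details.

```python
import math


def vector_length(v):
    """Euclidean length of a velocity vector (the cell's speed)."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is zero."""
    norm = vector_length(a) * vector_length(b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def coherence(velocities, neighbors):
    """Mean cosine similarity between each cell's velocity and its neighbors'."""
    scores = []
    for i, neigh in enumerate(neighbors):
        sims = [cosine(velocities[i], velocities[j]) for j in neigh]
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    return scores


velocities = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
neighbors = [[1], [0], [0, 1], [0, 1]]
print([vector_length(v) for v in velocities])  # [1.0, 2.0, 1.0, 1.0]
print(coherence(velocities, neighbors))        # [1.0, 1.0, 0.0, -1.0]
```

Cells whose arrows point against their neighborhood score low, which is the pattern to look for when a region of the embedding shows erratic arrows.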
7. Velocity Arrows and Pseudotime

```python
# Arrow plot on UMAP
scv.pl.velocity_embedding(
    adata,
    arrow_length=3,
    arrow_size=2,
    color='leiden',
    basis='umap'
)

# Stream plot (cleaner visualization)
scv.pl.velocity_embedding_stream(
    adata,
    basis='umap',
    color='leiden',
    smooth=0.8,
    min_mass=4
)

# Velocity pseudotime (alternative to latent time)
scv.tl.velocity_pseudotime(adata)
scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')
```
8. PAGA Trajectory Graph

```python
# PAGA graph with velocity-informed transitions
scv.tl.paga(adata, groups='leiden')
df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T
# The styled table renders when this is the last expression in a notebook cell
df.style.background_gradient(cmap='Blues').format('{:.2g}')

# Plot PAGA with velocity
scv.pl.paga(
    adata,
    basis='umap',
    size=50,
    alpha=0.1,
    min_edge_width=2,
    node_size_scale=1.5
)
```
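A cluster-level transition table summarizes how much cell-level transition mass flows from one cluster to another. The sketch below shows one naive way to aggregate it, by averaging outgoing probability per source cell. This is an assumption-laden illustration of the idea and not the statistic that `scv.tl.paga` computes.

```python
from collections import defaultdict


def cluster_transitions(transition_probs, labels):
    """Average outgoing transition mass from each source cluster to each target cluster.

    transition_probs: dict {(i, j): probability of cell i moving to cell j}
    labels:           cluster label of each cell
    """
    cells_per_cluster = defaultdict(int)
    for label in labels:
        cells_per_cluster[label] += 1

    mass = defaultdict(float)
    for (i, j), prob in transition_probs.items():
        mass[(labels[i], labels[j])] += prob

    return {pair: total / cells_per_cluster[pair[0]] for pair, total in mass.items()}


labels = ["progenitor", "progenitor", "mature"]
probs = {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 1.0, (2, 2): 1.0}
result = cluster_transitions(probs, labels)
print(result[("progenitor", "mature")])  # 0.75
```

An asymmetric table (high from progenitor to mature, low in reverse) is what gives the PAGA graph its directed edges.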
Complete Workflow Script

```python
import scvelo as scv
import scanpy as sc


def run_rna_velocity(adata, n_top_genes=2000, mode='dynamical', n_jobs=4):
    """
    Complete RNA velocity workflow.

    Args:
        adata: AnnData with 'spliced' and 'unspliced' layers, UMAP in obsm,
            and a 'leiden' cluster column in obs
        n_top_genes: Number of top HVGs for velocity
        mode: 'stochastic' (fast) or 'dynamical' (accurate)
        n_jobs: Parallel jobs for dynamical model

    Returns:
        Processed AnnData with velocity information
    """
    scv.settings.verbosity = 2

    # 1. Preprocessing
    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=n_top_genes)

    if 'neighbors' not in adata.uns:
        sc.pp.neighbors(adata, n_neighbors=30)

    scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

    # 2. Velocity estimation
    if mode == 'dynamical':
        scv.tl.recover_dynamics(adata, n_jobs=n_jobs)

    scv.tl.velocity(adata, mode=mode)
    scv.tl.velocity_graph(adata)

    # 3. Downstream analyses
    if mode == 'dynamical':
        scv.tl.latent_time(adata)

    scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
    scv.tl.velocity_confidence(adata)
    scv.tl.velocity_pseudotime(adata)

    return adata
```
Key Output Fields in AnnData
After running the workflow, the following fields are added:
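| Location | Key | Description |
|---|---|---|
| `adata.layers` | `velocity` | RNA velocity per gene per cell |
| `adata.layers` | `fit_t` | Fitted latent time per gene per cell |
| `adata.obsm` | `velocity_umap` | 2D velocity vectors on UMAP |
| `adata.obs` | `velocity_pseudotime` | Pseudotime from velocity |
| `adata.obs` | `latent_time` | Latent time from dynamical model |
| `adata.obs` | `velocity_length` | Speed of each cell |
| `adata.obs` | `velocity_confidence` | Confidence score per cell |
| `adata.var` | `fit_likelihood` | Gene-level model fit quality |
| `adata.var` | `fit_alpha` | Transcription rate |
| `adata.var` | `fit_beta` | Splicing rate |
| `adata.var` | `fit_gamma` | Degradation rate |
| `adata.uns` | `velocity_graph` | Cell-cell velocity graph, the basis for transition probabilities |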
Velocity Models Comparison
| Model | Speed | Accuracy | When to Use |
|---|---|---|---|
| `stochastic` | Fast | Moderate | Exploratory; large datasets |
| `deterministic` | Medium | Moderate | Simple linear kinetics |
| `dynamical` | Slow | High | Publication-quality; identifies driver genes |
Best Practices
- Start with stochastic mode for exploration; switch to dynamical for final analysis
- Need good coverage of unspliced reads: Short reads (< 100 bp) may miss intron coverage
- Minimum 2,000 cells: RNA velocity is noisy with fewer cells
- Velocity should be coherent: Arrows should follow known biology; randomness indicates issues
- k-NN bandwidth matters: Too few neighbors → noisy velocity; too many → oversmoothed
- Sanity check: Root cells (progenitors) should have high unspliced/spliced ratios for marker genes
- Dynamical model requires distinct kinetic states: Works best for clear differentiation processes
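One quick way to act on the coverage and sanity-check points above is to look at the share of unspliced counts. The sketch below computes that fraction from per-cell totals; the helper name and the example numbers are hypothetical. scVelo also provides `scv.pl.proportions(adata)` for a plotted version of the same summary.

```python
def unspliced_fraction(spliced_counts, unspliced_counts):
    """Fraction of all counted molecules that are unspliced."""
    spliced_total = sum(spliced_counts)
    unspliced_total = sum(unspliced_counts)
    total = spliced_total + unspliced_total
    if total == 0:
        raise ValueError("no counts in either layer")
    return unspliced_total / total


# Per-cell totals for three hypothetical cells.
fraction = unspliced_fraction([80, 70, 90], [20, 30, 10])
print(f"{fraction:.2f}")  # 0.20
```

As a rough guide, the scVelo tutorials describe unspliced proportions of about 10-25% as typical. A value far below that suggests weak intron coverage, and a value well above one half may be worth checking for swapped layers.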
Troubleshooting
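| Problem | Solution |
|---|---|
| Missing unspliced layer | Re-run velocyto, or use STARsolo with `--soloFeatures Gene Velocyto` |
| Very few velocity genes | Lower `min_shared_counts` in `scv.pp.filter_and_normalize` |
| Random-looking arrows | Try a different `n_neighbors` value or velocity mode |
| Memory error with dynamical | Set a lower `n_jobs` in `scv.tl.recover_dynamics`, or reduce `n_top_genes` |
| Negative velocity everywhere | Check that spliced/unspliced layers are not swapped |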
Additional Resources
- scVelo documentation: https://scvelo.readthedocs.io/
- Tutorial notebooks: https://scvelo.readthedocs.io/tutorials/
- GitHub: https://github.com/theislab/scvelo
- Paper: Bergen V et al. (2020) Nature Biotechnology. PMID: 32747759
- velocyto (preprocessing): http://velocyto.org/
- CellRank (fate prediction, extends scVelo): https://cellrank.readthedocs.io/
- dynamo (metabolic labeling alternative): https://dynamo-release.readthedocs.io/