scvelo

scVelo — RNA Velocity Analysis

Overview

scVelo is the leading Python package for RNA velocity analysis in single-cell RNA-seq data. It infers cell state transitions by modeling the kinetics of mRNA splicing — using the ratio of unspliced (pre-mRNA) to spliced (mature mRNA) abundances to determine whether a gene is being upregulated or downregulated in each cell. This allows reconstruction of developmental trajectories and identification of cell fate decisions without requiring time-course data.
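To make the splicing-ratio idea concrete, here is a minimal sketch in plain Python. It assumes the steady-state formulation, in which velocity is the unspliced abundance minus a rate ratio `gamma` times the spliced abundance. The function name and the example numbers are illustrative and are not part of scVelo's API.

```python
def steady_state_velocity(unspliced: float, spliced: float, gamma: float) -> float:
    """Velocity under the steady-state assumption: v = u - gamma * s.

    gamma is the unspliced/spliced ratio expected at steady state.
    (Illustrative helper; not a scVelo function.)
    """
    return unspliced - gamma * spliced


# Hypothetical (unspliced, spliced) abundances for one gene in three cells
gamma = 0.5
cells = {"induced": (4.0, 2.0), "steady": (1.0, 2.0), "repressed": (0.2, 2.0)}

for name, (u, s) in cells.items():
    v = steady_state_velocity(u, s, gamma)
    direction = "up" if v > 0 else "down" if v < 0 else "steady"
    print(f"{name}: velocity={v:+.2f} ({direction})")
```

A cell with more unspliced mRNA than the steady-state ratio predicts gets a positive velocity (the gene is being switched on); a cell with less gets a negative one.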

Installation:

pip install scvelo

Key resources:

Documentation: https://scvelo.readthedocs.io/
GitHub: https://github.com/theislab/scvelo
Paper: Bergen et al. (2020) Nature Biotechnology. PMID: 32747759

When to Use This Skill

Use scVelo when:

Trajectory inference from snapshot data: Determine which direction cells are differentiating
Cell fate prediction: Identify progenitor cells and their downstream fates
Driver gene identification: Find genes whose dynamics best explain observed trajectories
Developmental biology: Model hematopoiesis, neurogenesis, epithelial-to-mesenchymal transitions
Latent time estimation: Order cells along a pseudotime derived from splicing dynamics
Complement to Scanpy: Add directional information to UMAP embeddings

Prerequisites

scVelo requires count matrices for both unspliced and spliced RNA. These are generated by:

STARsolo or kallisto|bustools with `lamanno` mode
velocyto CLI: `velocyto run10x` / `velocyto run`
alevin-fry / simpleaf with spliced/unspliced output

Data is stored in an `AnnData` object with `layers["spliced"]` and `layers["unspliced"]`.
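Before running any velocity step it can help to confirm that both layers are present. The sketch below is a hypothetical helper (not part of scVelo) that only assumes the object exposes a mapping-like `.layers` attribute, as `AnnData` does; a `SimpleNamespace` stands in for a real dataset so the example runs without extra dependencies.

```python
from types import SimpleNamespace

REQUIRED_LAYERS = ("spliced", "unspliced")


def missing_velocity_layers(adata) -> list:
    """Return the required layer names that are absent from adata.layers."""
    return [name for name in REQUIRED_LAYERS if name not in adata.layers]


# Stand-in object for demonstration; a real AnnData exposes .layers the same way.
demo = SimpleNamespace(layers={"spliced": [[3, 0], [1, 2]]})
print(missing_velocity_layers(demo))  # the unspliced layer is reported as missing
```

An empty list means both layers are available; anything else points back to the quantification step.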

Standard RNA Velocity Workflow

1. Setup and Data Loading

python

import scvelo as scv
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt

# Configure settings
scv.settings.verbosity = 3  # Show computation steps
scv.settings.presenter_view = True
scv.settings.set_figure_params('scvelo')

# Load data (AnnData with spliced/unspliced layers)
# Option A: Load from loom (velocyto output)
adata = scv.read("cellranger_output.loom", cache=True)

# Option B: Merge velocyto loom with Scanpy-processed AnnData
adata_processed = sc.read_h5ad("processed.h5ad")  # Has UMAP, clusters
adata_velocity = scv.read("velocyto.loom")
adata = scv.utils.merge(adata_processed, adata_velocity)

# Verify layers
print(adata)
# Expected contents:
#   obs × var: N × G
#   layers: 'spliced', 'unspliced' (required)
#   obsm['X_umap'] (required for visualization)

2. Preprocessing

python

# Filter and normalize (follows Scanpy conventions)
scv.pp.filter_and_normalize(
    adata,
    min_shared_counts=20,  # Minimum counts in spliced+unspliced
    n_top_genes=2000       # Top highly variable genes
)

# Compute first and second order moments (means and variances)
# knn_connectivities must be computed first
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=30)
scv.pp.moments(
    adata,
    n_pcs=30,
    n_neighbors=30
)
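The first-order moments are, in essence, each cell's abundance averaged over its nearest neighbors. The sketch below assumes a precomputed neighbor list (the hypothetical `neighbors` argument, with each cell listed in its own neighborhood) and uses plain Python lists in place of the sparse connectivity matrix scVelo works with. It illustrates the idea only and is not the library's implementation.

```python
def first_order_moments(values, neighbors):
    """Average each cell's value over its neighborhood.

    values:    per-cell abundance of one gene (list of floats)
    neighbors: neighbors[i] lists the cell indices in cell i's neighborhood,
               including i itself (assumed, hypothetical input format)
    """
    return [sum(values[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]


# Four cells on a line; each neighborhood holds the cell and its adjacent cells.
spliced = [0.0, 4.0, 0.0, 8.0]
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
print(first_order_moments(spliced, neighbors))
```

The sparse, zero-heavy input becomes a smoother profile, which is why the neighbor count matters: more neighbors means more smoothing.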

3. Velocity Estimation — Stochastic Model

The stochastic model is fast and suitable for exploratory analysis:

python

# Stochastic velocity (faster, less accurate)
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity_graph(adata)

# Visualize
scv.pl.velocity_embedding_stream(
    adata,
    basis='umap',
    color='leiden',
    title="RNA Velocity (Stochastic)"
)
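For intuition, velocity in the steady-state family of models can be sketched as a regression residual: fit a slope `gamma` of unspliced on spliced abundance for a gene, then take `u - gamma * s` per cell. The version below is a simplification that fits an ordinary least-squares line through the origin over all cells. scVelo's own fit differs in detail (and the stochastic mode additionally uses second-order moments), so the function names here are illustrative only.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin: gamma = sum(u*s) / sum(s*s)."""
    denom = sum(s * s for s in spliced)
    if denom == 0:
        raise ValueError("spliced abundances are all zero; gamma is undefined")
    return sum(u * s for u, s in zip(unspliced, spliced)) / denom


def residual_velocity(unspliced, spliced):
    """Per-cell velocity as the residual from the fitted steady-state line."""
    gamma = fit_gamma(unspliced, spliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Hypothetical smoothed abundances of one gene across four cells
u = [1.0, 2.0, 3.0, 1.0]
s = [2.0, 2.0, 4.0, 4.0]
print(fit_gamma(u, s))
print(residual_velocity(u, s))
```

Cells above the fitted line get positive velocities and cells below it get negative ones.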

4. Velocity Estimation — Dynamical Model (Recommended)

The dynamical model fits the full splicing kinetics and is more accurate:

python

# Recover dynamics (computationally intensive; ~10-30 min for 10K cells)
scv.tl.recover_dynamics(adata, n_jobs=4)

# Compute velocity from dynamical model
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
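The "full splicing kinetics" can be written as two rate equations, du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s, with transcription rate alpha, splicing rate beta, and degradation rate gamma. The sketch below evaluates their closed-form solution for constant rates. The function name, default initial conditions, and example rates are assumptions for illustration; this is not scVelo's fitting code, which infers the rates and per-cell times from data.

```python
import math


def splicing_kinetics(t, alpha, beta, gamma, u0=0.0, s0=0.0):
    """Closed-form u(t), s(t) for du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s.

    Assumes constant rates and beta != gamma.
    """
    if beta == gamma:
        raise ValueError("this closed form requires beta != gamma")
    exp_b = math.exp(-beta * t)
    exp_g = math.exp(-gamma * t)
    u = u0 * exp_b + alpha / beta * (1.0 - exp_b)
    s = (
        s0 * exp_g
        + alpha / gamma * (1.0 - exp_g)
        + (alpha - beta * u0) / (gamma - beta) * (exp_g - exp_b)
    )
    return u, s


# Induction from an empty state with hypothetical rates
alpha, beta, gamma = 2.0, 1.0, 0.5
for t in (0.0, 1.0, 5.0, 50.0):
    u, s = splicing_kinetics(t, alpha, beta, gamma)
    print(f"t={t:5.1f}  u={u:.3f}  s={s:.3f}")
```

During induction the unspliced abundance rises first and the spliced abundance follows, approaching alpha/beta and alpha/gamma respectively. That lag is what produces the curved trajectories seen in phase portraits.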

5. Latent Time

The dynamical model enables computation of a shared latent time (pseudotime):

python

# Compute latent time
scv.tl.latent_time(adata)

# Visualize latent time on UMAP
scv.pl.scatter(
    adata,
    color='latent_time',
    color_map='gnuplot',
    size=80,
    title='Latent time'
)

# Select the top genes by fit likelihood and plot them with cells ordered by latent time
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300]
scv.pl.heatmap(
    adata,
    var_names=top_genes,
    sortby='latent_time',
    col_color='leiden',
    n_convolve=100
)

6. Driver Gene Analysis

python

import pandas as pd

# Rank genes by cluster-specific differential velocity
scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
df = pd.DataFrame(adata.uns['rank_velocity_genes']['names'])
print(df.head(10))

# Speed and coherence
scv.tl.velocity_confidence(adata)
scv.pl.scatter(
    adata,
    c=['velocity_length', 'velocity_confidence'],
    cmap='coolwarm',
    perc=[5, 95]
)

# Phase portraits for specific genes (replace with genes present in your data)
scv.pl.velocity(adata, ['Cpe', 'Gnao1', 'Ins2'], ncols=3, figsize=(16, 4))
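As a rough picture of what "speed" and "coherence" summarize, the sketch below treats speed as the Euclidean norm of a cell's velocity vector and coherence as the mean Pearson correlation between that vector and the velocity vectors of neighboring cells. The function names and toy vectors are hypothetical, and scVelo's exact computation may differ in normalization and in which genes it includes.

```python
import math


def velocity_length(v):
    """Speed of a cell: Euclidean norm of its velocity vector."""
    return math.sqrt(sum(x * x for x in v))


def pearson(a, b):
    """Pearson correlation between two equal-length vectors (0.0 if degenerate)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    norm = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    return sum(x * y for x, y in zip(da, db)) / norm if norm > 0 else 0.0


def velocity_coherence(v, neighbor_velocities):
    """Mean correlation of a cell's velocity with its neighbors' velocities."""
    return sum(pearson(v, n) for n in neighbor_velocities) / len(neighbor_velocities)


cell = [3.0, 4.0, 0.0]
print(velocity_length(cell))
print(velocity_coherence(cell, [[6.0, 8.0, 0.0], [3.0, 4.0, 0.0]]))  # neighbors agree
print(velocity_coherence(cell, [[-3.0, -4.0, 0.0]]))                 # neighbor opposes
```

Low coherence in a region suggests that neighboring cells disagree about direction, which is a reason to treat arrows there with caution.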

7. Velocity Arrows and Pseudotime

python

# Arrow plot on UMAP
scv.pl.velocity_embedding(
    adata,
    arrow_length=3,
    arrow_size=2,
    color='leiden',
    basis='umap'
)

# Stream plot (cleaner visualization)
scv.pl.velocity_embedding_stream(
    adata,
    basis='umap',
    color='leiden',
    smooth=0.8,
    min_mass=4
)

# Velocity pseudotime (alternative to latent time)
scv.tl.velocity_pseudotime(adata)
scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')
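The arrows on the embedding come from the velocity graph, which scores how well each cell's velocity points toward each of its neighbors. A minimal sketch of that scoring, assuming cosine similarity between the velocity vector and the displacement to a neighbor, is shown below. The helper names and the two-gene toy data are hypothetical, and the real graph is built over many genes with additional transformations.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def transition_scores(cell, velocity, neighbor_states):
    """Cosine between a cell's velocity and the displacement to each neighbor."""
    scores = []
    for nbr in neighbor_states:
        displacement = [n - c for n, c in zip(nbr, cell)]
        scores.append(cosine(velocity, displacement))
    return scores


# One cell at the origin of a hypothetical two-gene space, moving along gene 1
cell = [0.0, 0.0]
velocity = [1.0, 0.0]
neighbor_states = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]
print(transition_scores(cell, velocity, neighbor_states))
```

The neighbor lying in the direction of travel scores highest, the orthogonal one scores zero, and the one behind the cell scores negative.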

8. PAGA Trajectory Graph

python

# PAGA graph with velocity-informed transitions
scv.tl.paga(adata, groups='leiden')
df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T
df.style.background_gradient(cmap='Blues').format('{:.2g}')

# Plot PAGA with velocity
scv.pl.paga(
    adata,
    basis='umap',
    size=50,
    alpha=0.1,
    min_edge_width=2,
    node_size_scale=1.5
)

Complete Workflow Script

python

import scvelo as scv
import scanpy as sc

def run_rna_velocity(adata, n_top_genes=2000, mode='dynamical', n_jobs=4):
    """
    Complete RNA velocity workflow.

    Args:
        adata: AnnData with 'spliced' and 'unspliced' layers, UMAP in obsm,
               and 'leiden' cluster labels in obs (used for gene ranking)
        n_top_genes: Number of top HVGs for velocity
        mode: 'stochastic' (fast) or 'dynamical' (accurate)
        n_jobs: Parallel jobs for dynamical model

    Returns:
        Processed AnnData with velocity information
    """
    scv.settings.verbosity = 2

    # 1. Preprocessing
    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=n_top_genes)

    if 'neighbors' not in adata.uns:
        sc.pp.neighbors(adata, n_neighbors=30)

    scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

    # 2. Velocity estimation
    if mode == 'dynamical':
        scv.tl.recover_dynamics(adata, n_jobs=n_jobs)

    scv.tl.velocity(adata, mode=mode)
    scv.tl.velocity_graph(adata)

    # 3. Downstream analyses
    if mode == 'dynamical':
        scv.tl.latent_time(adata)
        scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)

    scv.tl.velocity_confidence(adata)
    scv.tl.velocity_pseudotime(adata)

    return adata

Key Output Fields in AnnData

After running the workflow, the following fields are added (`latent_time` and the `fit_*` entries only in dynamical mode):

Location	Key	Description
`adata.layers`	`velocity`	RNA velocity per gene per cell
`adata.layers`	`fit_t`	Fitted latent time per gene per cell
`adata.obsm`	`velocity_umap`	2D velocity vectors on UMAP
`adata.obs`	`velocity_pseudotime`	Pseudotime from velocity
`adata.obs`	`latent_time`	Latent time from dynamical model
`adata.obs`	`velocity_length`	Speed of each cell
`adata.obs`	`velocity_confidence`	Confidence score per cell
`adata.var`	`fit_likelihood`	Gene-level model fit quality
`adata.var`	`fit_alpha`	Transcription rate
`adata.var`	`fit_beta`	Splicing rate
`adata.var`	`fit_gamma`	Degradation rate
`adata.uns`	`velocity_graph`	Cell-cell velocity graph (similarity scores from which transition probabilities are derived)

Velocity Models Comparison

Model	Speed	Accuracy	When to Use
`stochastic`	Fast	Moderate	Exploratory; large datasets
`deterministic`	Medium	Moderate	Simple linear kinetics
`dynamical`	Slow	High	Publication-quality; identifies driver genes

Best Practices

Start with stochastic mode for exploration; switch to dynamical for final analysis
Ensure good coverage of unspliced reads: Short reads (< 100 bp) may miss intronic sequence
Minimum 2,000 cells: RNA velocity is noisy with fewer cells
Velocity should be coherent: Arrows should follow known biology; randomness indicates issues
k-NN bandwidth matters: Too few neighbors → noisy velocity; too many → oversmoothed
Sanity check: Root cells (progenitors) should have high unspliced/spliced ratios for marker genes
Dynamical model requires distinct kinetic states: Works best for clear differentiation processes

Troubleshooting

Problem	Solution
Missing unspliced layer	Re-run velocyto or use STARsolo with `--soloFeatures Gene Velocyto`
Very few velocity genes	Lower `min_shared_counts`; check sequencing depth
Random-looking arrows	Try different `n_neighbors` or velocity model
Memory error with dynamical	Set `n_jobs=1`; reduce `n_top_genes`
Negative velocity everywhere	Check that spliced/unspliced layers are not swapped
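A quick numeric check can help with the first and last rows of the table. The sketch below computes the fraction of unspliced counts from two layer totals and flags the layers as possibly swapped when unspliced counts form the majority. The 0.5 cutoff is an assumed heuristic chosen for illustration, not a scVelo default, and nuclei-based protocols can legitimately exceed it; both function names are hypothetical.

```python
def unspliced_fraction(spliced_total: float, unspliced_total: float) -> float:
    """Share of all counted molecules that are unspliced."""
    total = spliced_total + unspliced_total
    if total <= 0:
        raise ValueError("no counts in either layer")
    return unspliced_total / total


def layers_may_be_swapped(spliced_total, unspliced_total, cutoff=0.5):
    """Heuristic flag: unspliced counts outnumber spliced counts.

    cutoff is an assumed, illustrative threshold.
    """
    return unspliced_fraction(spliced_total, unspliced_total) > cutoff


# Hypothetical layer totals, e.g. obtained from adata.layers["spliced"].sum()
print(unspliced_fraction(8_000_000, 2_000_000))
print(layers_may_be_swapped(8_000_000, 2_000_000))
print(layers_may_be_swapped(2_000_000, 8_000_000))
```

A flagged dataset is worth a second look at how the loom or count files were mapped to layer names before any model is refit.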

Additional Resources

scVelo documentation: https://scvelo.readthedocs.io/
Tutorial notebooks: https://scvelo.readthedocs.io/tutorials/
GitHub: https://github.com/theislab/scvelo
Paper: Bergen V et al. (2020) Nature Biotechnology. PMID: 32747759
velocyto (preprocessing): http://velocyto.org/
CellRank (fate prediction, extends scVelo): https://cellrank.readthedocs.io/
dynamo (metabolic labeling alternative): https://dynamo-release.readthedocs.io/
