scvelo

scVelo — RNA Velocity Analysis

Overview

scVelo is a widely used Python package for RNA velocity analysis of single-cell RNA-seq data. It infers cell state transitions by modeling mRNA splicing kinetics: the ratio of unspliced (pre-mRNA) to spliced (mature mRNA) abundance indicates whether a gene is being upregulated or downregulated in each cell. This makes it possible to reconstruct developmental trajectories and identify cell fate decisions without time-course data.
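
To make the unspliced-to-spliced logic concrete, the sketch below uses the steady-state approximation: a ratio gamma is estimated for one gene by a least-squares line through the origin, and each cell's velocity is the residual u - gamma * s. This is a simplified illustration in plain Python, not scVelo's implementation (scVelo's own fitting differs in details such as smoothing and which cells enter the regression). The function names and toy counts are assumptions made for this example.

```python
def steady_state_gamma(unspliced, spliced):
    """Least-squares slope through the origin: gamma = sum(u*s) / sum(s*s)."""
    numerator = sum(u * s for u, s in zip(unspliced, spliced))
    denominator = sum(s * s for s in spliced)
    if denominator == 0:
        raise ValueError("spliced counts are all zero; gamma is undefined")
    return numerator / denominator


def steady_state_velocity(unspliced, spliced):
    """Per-cell velocity for one gene: v = u - gamma * s."""
    gamma = steady_state_gamma(unspliced, spliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Toy counts for one gene across four cells (hypothetical values).
unspliced = [4.0, 2.0, 1.0, 0.5]
spliced = [2.0, 2.0, 2.0, 2.0]

velocities = steady_state_velocity(unspliced, spliced)
for u, s, v in zip(unspliced, spliced, velocities):
    state = "upregulated" if v > 0 else "downregulated" if v < 0 else "steady"
    print(f"u={u}, s={s}, velocity={v:+.3f} -> {state}")
```

Cells with more unspliced RNA than the fitted ratio predicts get a positive velocity (the gene is being induced); cells with less get a negative one.
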
Installation:
pip install scvelo
Key resources:

When to Use This Skill

Use scVelo when:
  • Trajectory inference from snapshot data: Determine which direction cells are differentiating
  • Cell fate prediction: Identify progenitor cells and their downstream fates
  • Driver gene identification: Find genes whose dynamics best explain observed trajectories
  • Developmental biology: Model hematopoiesis, neurogenesis, epithelial-to-mesenchymal transitions
  • Latent time estimation: Order cells along a pseudotime derived from splicing dynamics
  • Complement to Scanpy: Add directional information to UMAP embeddings

Prerequisites

scVelo requires count matrices for both unspliced and spliced RNA. These can be generated with:
  1. STARsolo, or kallisto|bustools in lamanno mode
  2. velocyto CLI: velocyto run10x / velocyto run
  3. alevin-fry / simpleaf with spliced/unspliced output
Data is stored in an AnnData object with layers["spliced"] and layers["unspliced"].
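
A quick check for the two required layers can catch a wrong input file early. The sketch below is a minimal helper written for this guide (the name missing_velocity_layers is hypothetical, not part of scVelo); it only assumes that the object passed in supports the `in` operator on layer names, as `adata.layers` does.

```python
REQUIRED_LAYERS = ("spliced", "unspliced")


def missing_velocity_layers(layers):
    """Return the required layer names absent from a mapping of layers."""
    return [name for name in REQUIRED_LAYERS if name not in layers]


# With a real dataset you would pass `adata.layers`; a plain dict stands in here.
example_layers = {"spliced": [[1, 0], [3, 2]]}
missing = missing_velocity_layers(example_layers)
if missing:
    print(f"Missing layers: {missing}")
```

An empty result means both layers are present and the workflow can proceed.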

Standard RNA Velocity Workflow

标准RNA Velocity分析流程

1. Setup and Data Loading

python
import scvelo as scv
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt

# Configure settings
scv.settings.verbosity = 3  # Show computation steps
scv.settings.presenter_view = True
scv.settings.set_figure_params('scvelo')

# Load data (AnnData with spliced/unspliced layers)
# Option A: Load from loom (velocyto output)
adata = scv.read("cellranger_output.loom", cache=True)

# Option B: Merge velocyto loom with Scanpy-processed AnnData
adata_processed = sc.read_h5ad("processed.h5ad")  # Has UMAP, clusters
adata_velocity = scv.read("velocyto.loom")
adata = scv.utils.merge(adata_processed, adata_velocity)

# Verify layers
print(adata)
# obs × var: N × G
# layers: 'spliced', 'unspliced' (required)
# obsm['X_umap'] (required for visualization)

2. Preprocessing

python
# Filter and normalize (follows Scanpy conventions)
scv.pp.filter_and_normalize(
    adata,
    min_shared_counts=20,  # Minimum counts in spliced+unspliced
    n_top_genes=2000       # Top highly variable genes
)

# Compute first and second order moments (means and variances)
# The neighbor graph (knn connectivities) must be computed first
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=30)
scv.pp.moments(
    adata,
    n_pcs=30,
    n_neighbors=30
)
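
The first-order moments are, in essence, neighbor-averaged expression values. The sketch below illustrates that idea for a single gene with a hand-written neighbor list; it is a plain-Python approximation for intuition, not scVelo's code, and the function name, toy counts, and graph are assumptions. Including the cell itself in the average is also an assumption of this sketch.

```python
def first_order_moments(values, neighbors):
    """Average each cell's value with the values of its listed neighbors.

    values: one expression value per cell (a single gene, for brevity)
    neighbors: neighbors[i] lists the indices of cell i's nearest neighbors
    """
    smoothed = []
    for cell, neighbor_ids in enumerate(neighbors):
        pool = [values[cell]] + [values[j] for j in neighbor_ids]
        smoothed.append(sum(pool) / len(pool))
    return smoothed


# Hypothetical spliced counts for four cells and a hand-written 2-NN graph.
spliced = [0.0, 3.0, 6.0, 3.0]
neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(first_order_moments(spliced, neighbors))
```

Averaging over neighbors pulls the extreme values (0 and 6) toward their surroundings, which is why the number of neighbors controls how smooth the resulting velocities are.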

3. Velocity Estimation — Stochastic Model

The stochastic model is fast and suitable for exploratory analysis:
python
# Stochastic velocity (faster, less accurate than the dynamical model)
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity_graph(adata)

# Visualize
scv.pl.velocity_embedding_stream(
    adata,
    basis='umap',
    color='leiden',
    title="RNA Velocity (Stochastic)"
)
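
The velocity graph scores potential cell-to-cell transitions by how well the step from one cell to a neighbor aligns with the cell's velocity vector, measured as a cosine similarity. The sketch below shows that scoring for one cell in a two-gene space. It is a simplified illustration with hypothetical names and values, and it leaves out how scVelo turns these scores into transition probabilities.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def transition_scores(cell_state, velocity, neighbor_states):
    """Score each neighbor by how well the step toward it matches the velocity."""
    scores = []
    for neighbor in neighbor_states:
        displacement = [n - c for n, c in zip(neighbor, cell_state)]
        scores.append(cosine_similarity(velocity, displacement))
    return scores


# Hypothetical two-gene example: the velocity points along the first gene axis.
cell = [1.0, 1.0]
velocity = [1.0, 0.0]
neighbors = [[2.0, 1.0], [0.0, 1.0], [1.0, 2.0]]
print(transition_scores(cell, velocity, neighbors))
```

A score near 1 marks a neighbor lying in the direction the cell is heading, a score near -1 a neighbor it is moving away from, and a score near 0 an unrelated direction.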

4. Velocity Estimation — Dynamical Model (Recommended)

The dynamical model fits the full splicing kinetics and is more accurate:
python
# Recover dynamics (computationally intensive; ~10-30 min for 10K cells)
scv.tl.recover_dynamics(adata, n_jobs=4)

# Compute velocity from dynamical model
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
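
The splicing kinetics behind the dynamical model are commonly written as two first-order rate equations: du/dt = alpha - beta * u for unspliced RNA and ds/dt = beta * u - gamma * s for spliced RNA. The sketch below integrates them with a simple Euler scheme to show how a gene that switches on approaches its steady state. The rates, step size, and function name are assumptions chosen for illustration; scVelo's fitting procedure is considerably more involved.

```python
def simulate_splicing(alpha, beta, gamma, t_end=50.0, dt=0.01):
    """Euler-integrate du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s."""
    u, s = 0.0, 0.0  # start from an "off" gene with no transcripts
    for _ in range(int(t_end / dt)):
        du = alpha - beta * u
        ds = beta * u - gamma * s
        u += du * dt
        s += ds * dt
    return u, s


# Hypothetical rates: transcription (alpha), splicing (beta), degradation (gamma).
alpha, beta, gamma = 2.0, 1.0, 0.5
u, s = simulate_splicing(alpha, beta, gamma)
print(f"u={u:.3f} (steady state {alpha / beta:.3f})")
print(f"s={s:.3f} (steady state {alpha / gamma:.3f})")
```

During induction, unspliced RNA rises before spliced RNA catches up, which is the lag that phase portraits visualize.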

5. Latent Time

The dynamical model enables computation of a shared latent time (pseudotime):
python
# Compute latent time
scv.tl.latent_time(adata)

# Visualize latent time on UMAP
scv.pl.scatter(
    adata,
    color='latent_time',
    color_map='gnuplot',
    size=80,
    title='Latent time'
)

# Select the top-likelihood genes and plot their expression along latent time
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300]
scv.pl.heatmap(
    adata,
    var_names=top_genes,
    sortby='latent_time',
    col_color='leiden',
    n_convolve=100
)

6. Driver Gene Analysis

python
# Rank genes whose velocity characterizes each cluster
scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
print(df.head(10))

# Speed and coherence
scv.tl.velocity_confidence(adata)
scv.pl.scatter(
    adata,
    c=['velocity_length', 'velocity_confidence'],
    cmap='coolwarm',
    perc=[5, 95]
)

# Phase portraits for specific genes (replace with genes present in your data)
scv.pl.velocity(adata, ['Cpe', 'Gnao1', 'Ins2'], ncols=3, figsize=(16, 4))
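
For intuition about the two per-cell scores plotted above: velocity_length reflects the magnitude of a cell's velocity vector, and velocity_confidence reflects how well that vector agrees with the velocities of neighboring cells. The sketch below approximates both in plain Python. The mean-cosine formula for coherence is an assumption of this sketch and may differ in detail from scVelo's calculation; all names and values are hypothetical.

```python
import math


def velocity_length(velocity):
    """Speed of a cell: Euclidean norm of its velocity vector."""
    return math.sqrt(sum(v * v for v in velocity))


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = velocity_length(a) * velocity_length(b)
    return dot / norms if norms else 0.0


def velocity_coherence(velocity, neighbor_velocities):
    """Mean cosine similarity between a cell's velocity and its neighbors'."""
    if not neighbor_velocities:
        return 0.0
    scores = [cosine(velocity, other) for other in neighbor_velocities]
    return sum(scores) / len(scores)


# Hypothetical velocity vectors in a two-gene space.
cell_velocity = [3.0, 4.0]
aligned = [[6.0, 8.0], [3.0, 4.0]]
scattered = [[3.0, 4.0], [-3.0, -4.0]]
print(velocity_length(cell_velocity))
print(velocity_coherence(cell_velocity, aligned))
print(velocity_coherence(cell_velocity, scattered))
```

Neighbors pointing the same way give a coherence near 1, while neighbors pointing in opposite directions cancel out toward 0.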

7. Velocity Arrows and Pseudotime

python
# Arrow plot on UMAP
scv.pl.velocity_embedding(
    adata,
    arrow_length=3,
    arrow_size=2,
    color='leiden',
    basis='umap'
)

# Stream plot (cleaner visualization)
scv.pl.velocity_embedding_stream(
    adata,
    basis='umap',
    color='leiden',
    smooth=0.8,
    min_mass=4
)

# Velocity pseudotime (alternative to latent time)
scv.tl.velocity_pseudotime(adata)
scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')

8. PAGA Trajectory Graph

python
# PAGA graph with velocity-informed transitions
scv.tl.paga(adata, groups='leiden')
df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T
df.style.background_gradient(cmap='Blues').format('{:.2g}')

# Plot PAGA with velocity
scv.pl.paga(
    adata,
    basis='umap',
    size=50,
    alpha=0.1,
    min_edge_width=2,
    node_size_scale=1.5
)

Complete Workflow Script

python
import scvelo as scv
import scanpy as sc

def run_rna_velocity(adata, n_top_genes=2000, mode='dynamical', n_jobs=4):
    """
    Complete RNA velocity workflow.

    Args:
        adata: AnnData with 'spliced' and 'unspliced' layers, UMAP in obsm,
            and a 'leiden' column in obs (used by rank_velocity_genes)
        n_top_genes: Number of top HVGs for velocity
        mode: 'stochastic' (fast) or 'dynamical' (accurate)
        n_jobs: Parallel jobs for dynamical model

    Returns:
        Processed AnnData with velocity information
    """
    scv.settings.verbosity = 2

    # 1. Preprocessing
    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=n_top_genes)

    if 'neighbors' not in adata.uns:
        sc.pp.neighbors(adata, n_neighbors=30)

    scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

    # 2. Velocity estimation
    if mode == 'dynamical':
        scv.tl.recover_dynamics(adata, n_jobs=n_jobs)

    scv.tl.velocity(adata, mode=mode)
    scv.tl.velocity_graph(adata)

    # 3. Downstream analyses
    if mode == 'dynamical':
        scv.tl.latent_time(adata)
        scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)

    scv.tl.velocity_confidence(adata)
    scv.tl.velocity_pseudotime(adata)

    return adata

Key Output Fields in AnnData

After running the workflow, the following fields are added:
| Location | Key | Description |
|---|---|---|
| adata.layers | velocity | RNA velocity per gene per cell |
| adata.layers | fit_t | Fitted latent time per gene per cell |
| adata.obsm | velocity_umap | 2D velocity vectors on UMAP |
| adata.obs | velocity_pseudotime | Pseudotime from velocity |
| adata.obs | latent_time | Latent time from dynamical model |
| adata.obs | velocity_length | Speed of each cell |
| adata.obs | velocity_confidence | Confidence score per cell |
| adata.var | fit_likelihood | Gene-level model fit quality |
| adata.var | fit_alpha | Transcription rate |
| adata.var | fit_beta | Splicing rate |
| adata.var | fit_gamma | Degradation rate |
| adata.uns | velocity_graph | Sparse cell-to-cell graph of velocity cosine similarities, from which transition probabilities are derived |
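
To confirm that a run produced the fields listed in the table, a small helper can report what is missing. The sketch below is hypothetical (missing_fields is not an scVelo function) and assumes only that each AnnData attribute supports the `in` operator on key or column names. A stand-in object is used so the example runs without a dataset.

```python
from types import SimpleNamespace

# Keys taken from the output-field table, grouped by AnnData attribute.
EXPECTED_FIELDS = {
    "layers": ["velocity", "fit_t"],
    "obsm": ["velocity_umap"],
    "obs": ["velocity_pseudotime", "latent_time",
            "velocity_length", "velocity_confidence"],
    "var": ["fit_likelihood", "fit_alpha", "fit_beta", "fit_gamma"],
    "uns": ["velocity_graph"],
}


def missing_fields(adata, expected=EXPECTED_FIELDS):
    """Map each AnnData attribute to the expected keys it does not contain."""
    report = {}
    for attribute, keys in expected.items():
        container = getattr(adata, attribute)
        absent = [key for key in keys if key not in container]
        if absent:
            report[attribute] = absent
    return report


# Stand-in object with dict attributes; a real AnnData would be passed instead.
fake_adata = SimpleNamespace(
    layers={"velocity": None},
    obsm={"velocity_umap": None},
    obs={"velocity_pseudotime": None, "velocity_length": None,
         "velocity_confidence": None},
    var={},
    uns={"velocity_graph": None},
)
print(missing_fields(fake_adata))
```

The stand-in mimics a stochastic-mode run, where the fit_* fields and latent_time are expected to be absent because they come from the dynamical model.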

Velocity Models Comparison

| Model | Speed | Accuracy | When to Use |
|---|---|---|---|
| stochastic | Fast | Moderate | Exploratory; large datasets |
| deterministic | Medium | Moderate | Simple linear kinetics |
| dynamical | Slow | High | Publication-quality; identifies driver genes |

Best Practices

  • Start with stochastic mode for exploration; switch to dynamical for final analysis
  • Need good coverage of unspliced reads: Short reads (< 100 bp) may miss intron coverage
  • Minimum 2,000 cells: RNA velocity is noisy with fewer cells
  • Velocity should be coherent: Arrows should follow known biology; randomness indicates issues
  • k-NN bandwidth matters: Too few neighbors → noisy velocity; too many → oversmoothed
  • Sanity check: Root cells (progenitors) should have high unspliced/spliced ratios for marker genes
  • Dynamical model requires distinct kinetic states: Works best for clear differentiation processes

Troubleshooting

| Problem | Solution |
|---|---|
| Missing unspliced layer | Re-run velocyto or use STARsolo with --soloFeatures Gene Velocyto |
| Very few velocity genes | Lower min_shared_counts; check sequencing depth |
| Random-looking arrows | Try different n_neighbors or velocity model |
| Memory error with dynamical | Set n_jobs=1; reduce n_top_genes |
| Negative velocity everywhere | Check that spliced/unspliced layers are not swapped |
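
One way to screen for swapped layers is to compare total counts: in many whole-cell datasets spliced counts clearly outnumber unspliced counts. The sketch below encodes that heuristic. The 0.5 threshold and the function names are assumptions of this example, and some protocols (single-nucleus data, for instance) can legitimately show a majority of unspliced counts, so treat a flag as a prompt to inspect the data, not as proof.

```python
def unspliced_fraction(spliced_total, unspliced_total):
    """Share of all counts that are unspliced."""
    total = spliced_total + unspliced_total
    if total == 0:
        raise ValueError("no counts in either layer")
    return unspliced_total / total


def layers_look_swapped(spliced_total, unspliced_total):
    """Heuristic: flag the data when unspliced counts outnumber spliced counts."""
    return unspliced_fraction(spliced_total, unspliced_total) > 0.5


# Hypothetical totals; on a real dataset these could come from
# adata.layers["spliced"].sum() and adata.layers["unspliced"].sum().
print(unspliced_fraction(800_000, 200_000))
print(layers_look_swapped(800_000, 200_000))
print(layers_look_swapped(200_000, 800_000))
```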

Additional Resources
