Bulk RNA-seq deconvolution with Bulk2Single

Overview

Use this skill when a user wants to reconstruct single-cell profiles from bulk RNA-seq together with a matched reference scRNA-seq atlas. It follows `t_bulk2single.ipynb`, which demonstrates how to harmonise the dentate gyrus bulk replicates (the `dg_d_*` samples), train the beta-VAE generator, and benchmark the generated cells against the dentate gyrus scRNA-seq reference.

Instructions

  1. Load libraries and data
    • Import `omicverse as ov`, `scanpy as sc`, `scvelo as scv`, `anndata`, and `matplotlib.pyplot as plt`, then call `ov.plot_set()` to match omicverse styling.
    • Read the bulk counts table with `ov.read(...)` / `ov.utils.read(...)` and harmonise gene identifiers via `ov.bulk.Matrix_ID_mapping(<df>, 'genesets/pair_GRCm39.tsv')`.
    • Load the reference scRNA-seq AnnData (e.g., `scv.datasets.dentategyrus()`) and confirm the cluster labels (stored in `adata.obs['clusters']`).
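The loading step above could be wrapped roughly as follows. This is a sketch under assumptions: the `bulk_path` argument and the `index_col=0` keyword are not given in the text (the arguments to `ov.read(...)` are elided there), and `missing_obs_columns` is a hypothetical helper, not part of omicverse.

```python
def missing_obs_columns(obs_columns, required=("clusters",)):
    """Return the required annotation columns that are absent from adata.obs."""
    present = set(obs_columns)
    return [name for name in required if name not in present]


def load_inputs(bulk_path, mapping_path="genesets/pair_GRCm39.tsv"):
    """Load the bulk table and the reference atlas (needs omicverse and scvelo)."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import omicverse as ov
    import scvelo as scv

    ov.plot_set()
    # Assumption: gene identifiers sit in the first column of the counts table.
    bulk_df = ov.read(bulk_path, index_col=0)
    bulk_df = ov.bulk.Matrix_ID_mapping(bulk_df, mapping_path)

    adata = scv.datasets.dentategyrus()
    missing = missing_obs_columns(adata.obs.columns)
    if missing:
        raise KeyError(f"reference atlas lacks obs columns: {missing}")
    return bulk_df, adata
```

Failing early on a missing label column is a design choice here; the same check works for any other `celltype_key`.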
  2. Initialise the Bulk2Single model
    • Instantiate `ov.bulk2single.Bulk2Single(bulk_data=bulk_df, single_data=adata, celltype_key='clusters', bulk_group=['dg_d_1', 'dg_d_2', 'dg_d_3'], top_marker_num=200, ratio_num=1, gpu=0)`.
    • Explain GPU selection (`gpu=-1` forces CPU) and how `bulk_group` names align with column IDs in the bulk matrix.
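One way to make the `bulk_group` alignment explicit is to check the names against the bulk matrix columns before building the model. The sketch below reuses the constructor arguments listed above; `missing_bulk_groups` and `build_model` are hypothetical helper names.

```python
def missing_bulk_groups(bulk_columns, bulk_group):
    """Return the bulk_group names that are not column IDs of the bulk matrix."""
    available = set(bulk_columns)
    return [name for name in bulk_group if name not in available]


def build_model(bulk_df, adata, bulk_group=("dg_d_1", "dg_d_2", "dg_d_3"), gpu=0):
    """Instantiate Bulk2Single after validating the sample names (needs omicverse)."""
    missing = missing_bulk_groups(bulk_df.columns, bulk_group)
    if missing:
        raise ValueError(f"bulk_group names not in bulk matrix columns: {missing}")

    import omicverse as ov  # lazy import: only needed once the check has passed

    return ov.bulk2single.Bulk2Single(
        bulk_data=bulk_df,
        single_data=adata,
        celltype_key="clusters",
        bulk_group=list(bulk_group),
        top_marker_num=200,
        ratio_num=1,
        gpu=gpu,  # gpu=-1 forces CPU, as noted in the step above
    )
```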
  3. Estimate cell fractions
    • Call `model.predicted_fraction()` to run the integrated TAPE estimator, then plot stacked bar charts per sample to validate proportions.
    • Encourage saving the fraction table for downstream reporting (`df.to_csv(...)`).
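A possible shape for the fraction step is sketched below. It assumes `model.predicted_fraction()` returns a pandas DataFrame with one row per sample and one column per cell type, and that the fractions in each row are expected to sum to roughly 1. The output path and the helper names are placeholders.

```python
def samples_not_summing_to_one(fractions_by_sample, tol=1e-3):
    """Return sample names whose cell-type fractions do not sum to about 1."""
    return [
        sample
        for sample, fractions in fractions_by_sample.items()
        if abs(sum(fractions.values()) - 1.0) > tol
    ]


def estimate_and_save_fractions(model, out_csv):
    """Estimate fractions, plot one stacked bar per sample, and save the table."""
    import matplotlib.pyplot as plt

    fractions = model.predicted_fraction()  # assumed layout: samples x cell types
    suspicious = samples_not_summing_to_one(fractions.to_dict(orient="index"))
    if suspicious:
        print(f"Fractions do not sum to 1 for: {suspicious}")

    ax = fractions.plot(kind="bar", stacked=True, figsize=(8, 4))
    ax.set_xlabel("Sample")
    ax.set_ylabel("Cell fraction")
    ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
    plt.tight_layout()

    fractions.to_csv(out_csv)
    return fractions
```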
  4. Preprocess for beta-VAE
    • Execute `model.bulk_preprocess_lazy()`, `model.single_preprocess_lazy()`, and `model.prepare_input()` to produce matched feature spaces.
    • Clarify that the lazy preprocessing expects raw counts; skip if the user has already log-normalised data and instead provide aligned matrices manually.
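Because the lazy preprocessing expects raw counts, a quick heuristic check before calling it may help. The sketch below treats a matrix as raw counts when the sampled values are non-negative whole numbers. That rule is an assumption for illustration, not something omicverse enforces, and `looks_like_raw_counts` is a hypothetical helper.

```python
def looks_like_raw_counts(values, tol=1e-8):
    """True if every value is a non-negative whole number (typical of raw counts)."""
    return all(v >= 0 and abs(v - round(v)) <= tol for v in values)


def preprocess(model, bulk_df, sample_size=10000):
    """Run the three preprocessing calls in the order given above."""
    # Inspect a flat slice of the bulk values; log-normalised data has fractions.
    sample = bulk_df.to_numpy().ravel()[:sample_size]
    if not looks_like_raw_counts(sample):
        raise ValueError(
            "Bulk matrix does not look like raw counts; "
            "skip the lazy preprocessing and supply aligned matrices instead."
        )
    model.bulk_preprocess_lazy()
    model.single_preprocess_lazy()
    model.prepare_input()
    return model
```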
  5. Train or load the beta-VAE
    • Train with `model.train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_vae', generate_save_dir='...', generate_save_name='dg')`.
    • Mention early stopping via `patience` and how to resume by reloading weights with `model.load('.../dg_vae.pth')`.
    • Use `model.plot_loss()` to monitor convergence.
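The train-or-load decision could be sketched as below. It assumes that `train()` writes the weights as `<vae_save_name>.pth` inside `vae_save_dir`, which is inferred from the `dg_vae` / `dg_vae.pth` pairing above rather than confirmed. The directory arguments are left to the caller because the text elides them.

```python
import os


def weights_file(vae_save_dir, vae_save_name):
    """Expected weights path, assuming train() saves '<name>.pth' in the save dir."""
    return os.path.join(vae_save_dir, vae_save_name + ".pth")


def train_or_load(model, vae_save_dir, generate_save_dir,
                  vae_save_name="dg_vae", generate_save_name="dg"):
    """Reuse saved weights when present; otherwise train with the settings above."""
    path = weights_file(vae_save_dir, vae_save_name)
    if os.path.exists(path):
        return model.load(path)

    vae_net = model.train(
        batch_size=512,
        learning_rate=1e-4,
        hidden_size=256,
        epoch_num=3500,
        vae_save_dir=vae_save_dir,
        vae_save_name=vae_save_name,
        generate_save_dir=generate_save_dir,
        generate_save_name=generate_save_name,
    )
    model.plot_loss()  # inspect the loss curve for convergence
    return vae_net
```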
  6. Generate and filter synthetic cells
    • Produce an AnnData using `model.generate()` and reduce noise through `model.filtered(generate_adata, leiden_size=25)`.
    • Store the filtered AnnData (`.write_h5ad`) for reuse, noting it contains PCA embeddings in `obsm['X_pca']`.
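To see what the filter removes, cells can be counted per cluster before and after filtering. The sketch below assumes the generated AnnData carries its labels in `obs['clusters']` (the `celltype_key` used above). `cells_per_label` and the output path are hypothetical.

```python
from collections import Counter


def cells_per_label(labels):
    """Count cells per cluster label, ordered by label name."""
    return dict(sorted(Counter(labels).items()))


def generate_filter_save(model, out_path, leiden_size=25):
    """Generate synthetic cells, filter noisy ones, and write the result to disk."""
    generate_adata = model.generate()
    before = cells_per_label(generate_adata.obs["clusters"])

    generate_adata = model.filtered(generate_adata, leiden_size=leiden_size)
    after = cells_per_label(generate_adata.obs["clusters"])

    # Report how many cells of each cluster survived the filter.
    for label, n_before in before.items():
        print(f"{label}: {n_before} -> {after.get(label, 0)} cells")

    generate_adata.write_h5ad(out_path)
    return generate_adata
```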
  7. Benchmark against the reference atlas
    • Plot cell-type compositions with `ov.bulk2single.bulk2single_plot_cellprop(...)` for both generated and reference data.
    • Assess correlation using `ov.bulk2single.bulk2single_plot_correlation(single_data, generate_adata, celltype_key='clusters')`.
    • Embed with `generate_adata.obsm['X_mde'] = ov.utils.mde(generate_adata.obsm['X_pca'])` and visualise via `ov.utils.embedding(..., color=['clusters'], palette=ov.utils.pyomic_palette())`.
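The benchmark calls above can be grouped into one function, with a small numeric summary alongside the plots. Several pieces here are assumptions because the text elides them: the arguments passed to `bulk2single_plot_cellprop` and the `basis='X_mde'` keyword for `ov.utils.embedding`. `proportion_shift` is a hypothetical helper and assumes both label sets are non-empty.

```python
from collections import Counter


def proportion_shift(reference_labels, generated_labels):
    """Per-cluster difference in cell proportion (generated minus reference)."""
    ref, gen = Counter(reference_labels), Counter(generated_labels)
    n_ref, n_gen = sum(ref.values()), sum(gen.values())
    return {
        label: gen[label] / n_gen - ref[label] / n_ref
        for label in sorted(set(ref) | set(gen))
    }


def benchmark(single_data, generate_adata, celltype_key="clusters"):
    """Plot composition, correlation and embedding; return the proportion shifts."""
    import omicverse as ov  # lazy import: plotting needs the full package

    # Assumed call shape for the elided arguments in the step above.
    ov.bulk2single.bulk2single_plot_cellprop(generate_adata, celltype_key=celltype_key)
    ov.bulk2single.bulk2single_plot_cellprop(single_data, celltype_key=celltype_key)
    ov.bulk2single.bulk2single_plot_correlation(
        single_data, generate_adata, celltype_key=celltype_key
    )

    generate_adata.obsm["X_mde"] = ov.utils.mde(generate_adata.obsm["X_pca"])
    ov.utils.embedding(
        generate_adata,
        basis="X_mde",  # assumption: plot the MDE coordinates stored just above
        color=[celltype_key],
        palette=ov.utils.pyomic_palette(),
    )
    return proportion_shift(single_data.obs[celltype_key], generate_adata.obs[celltype_key])
```

Positive values mean a cluster is over-represented among the generated cells relative to the reference.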
  8. Troubleshooting tips
    • If marker selection fails, increase `top_marker_num` or provide a curated marker list.
    • Alignment errors typically stem from mismatched `bulk_group` names; double-check column IDs in the bulk matrix.
    • Training on CPU can take several hours; advise switching `gpu` to an available CUDA device for speed.
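For the CPU versus GPU tip, the `gpu` argument can be chosen programmatically. The sketch below assumes the model runs on PyTorch, which the `.pth` weights file suggests but the text does not state. `choose_gpu` and `detect_gpu` are hypothetical helpers.

```python
def choose_gpu(cuda_available, device_index=0):
    """Return the gpu argument: a CUDA device index if available, else -1 for CPU."""
    return device_index if cuda_available else -1


def detect_gpu(device_index=0):
    """Probe PyTorch for CUDA; fall back to CPU if torch is missing or has no GPU."""
    try:
        import torch
    except ImportError:
        return choose_gpu(False)
    return choose_gpu(torch.cuda.is_available(), device_index)
```

The result can be passed straight to the constructor, for example `gpu=detect_gpu()`.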

Examples

  • "Estimate cell fractions for PDAC bulk replicates and generate synthetic scRNA-seq using Bulk2Single."
  • "Load a pre-trained Bulk2Single model, regenerate cells, and compare cluster proportions to the dentate gyrus atlas."
  • "Plot correlation heatmaps between generated cells and reference clusters after filtering noisy synthetic cells."

References

  • Tutorial notebook: `t_bulk2single.ipynb`
  • Example data and weights: `omicverse_guide/docs/Tutorials-bulk2single/data/`
  • Quick copy/paste commands: `reference.md`