Bulk RNA-seq deconvolution with Bulk2Single
Overview
Use this skill when a user wants to reconstruct single-cell profiles from bulk RNA-seq together with a matched reference scRNA-seq atlas. It follows `t_bulk2single.ipynb`, which demonstrates how to harmonise PDAC bulk replicates, train the beta-VAE generator, and benchmark the output cells against dentate gyrus scRNA-seq.
Instructions
- Load libraries and data
  - Import `omicverse as ov`, `scanpy as sc`, `scvelo as scv`, `anndata`, and `matplotlib.pyplot as plt`, then call `ov.plot_set()` to match omicverse styling.
  - Read the bulk counts table with `ov.read(...)` / `ov.utils.read(...)` and harmonise gene identifiers via `ov.bulk.Matrix_ID_mapping(<df>, 'genesets/pair_GRCm39.tsv')`.
  - Load the reference scRNA-seq AnnData (e.g., `scv.datasets.dentategyrus()`) and confirm that the cluster labels are stored in `adata.obs['clusters']`.
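To make the ID-harmonisation step concrete, here is a pandas-only sketch of re-indexing a genes × samples count table from Ensembl-style IDs to gene symbols. It is not `ov.bulk.Matrix_ID_mapping` itself: the helper `map_gene_ids` and the pair-table column names `ensembl_id` / `symbol` are hypothetical, and summing counts for IDs that share a symbol is one convention that the omicverse function may not follow.

```python
import pandas as pd


def map_gene_ids(bulk: pd.DataFrame, pairs: pd.DataFrame,
                 src_col: str = "ensembl_id",
                 dst_col: str = "symbol") -> pd.DataFrame:
    """Re-index a genes x samples count table from source IDs to symbols.

    Hypothetical helper: `src_col` / `dst_col` are assumed column names of a
    two-column pair table, not the confirmed layout of pair_GRCm39.tsv.
    """
    # Lookup Series: source ID -> gene symbol (first occurrence wins)
    lookup = pairs.drop_duplicates(subset=src_col).set_index(src_col)[dst_col]
    # Drop rows whose ID has no symbol in the pair table
    mapped = bulk.loc[bulk.index.isin(lookup.index)].copy()
    mapped.index = mapped.index.map(lookup)
    # Several IDs can share one symbol; sum their raw counts
    return mapped.groupby(level=0).sum()
```

After loading the pair table into a DataFrame with those two columns, `map_gene_ids(bulk_df, pairs)` returns a symbol-indexed table whose columns (e.g. `dg_d_1`) are unchanged.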
- Initialise the Bulk2Single model
  - Instantiate `ov.bulk2single.Bulk2Single(bulk_data=bulk_df, single_data=adata, celltype_key='clusters', bulk_group=['dg_d_1', 'dg_d_2', 'dg_d_3'], top_marker_num=200, ratio_num=1, gpu=0)`.
  - Explain GPU selection (`gpu=-1` forces CPU) and that the `bulk_group` names must match column IDs in the bulk matrix.
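A minimal sketch for choosing the `gpu` argument, assuming Bulk2Single runs on PyTorch: the hypothetical helper `pick_gpu_index` returns device 0 when `torch.cuda.is_available()` is true and falls back to `-1` (CPU) otherwise, including when PyTorch cannot be imported.

```python
def pick_gpu_index() -> int:
    """Return 0 if PyTorch reports a CUDA device, otherwise -1 (CPU)."""
    try:
        import torch  # optional here; a missing install means CPU
    except ImportError:
        return -1
    # First CUDA device if one is visible, else force CPU
    return 0 if torch.cuda.is_available() else -1
```

The result can then be passed as `gpu=pick_gpu_index()` when instantiating `ov.bulk2single.Bulk2Single`.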
- Estimate cell fractions
  - Call `model.predicted_fraction()` to run the integrated TAPE estimator, then plot stacked bar charts per sample to validate the proportions.
  - Encourage saving the fraction table for downstream reporting (`df.to_csv(...)`).
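The sketch below warns on un-normalised rows, saves the table, and draws the per-sample stacked bar chart. It assumes `model.predicted_fraction()` returns a pandas DataFrame with samples as rows and cell types as columns (transpose it first if the layout differs); the helper name `save_and_plot_fractions` and the 1e-3 tolerance are illustrative choices, and the row-sum check only warns because normalisation of the estimator output is an assumption.

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # headless backend: the figure is only written to disk
import matplotlib.pyplot as plt
import pandas as pd


def save_and_plot_fractions(fractions: pd.DataFrame, csv_path: str,
                            png_path: str, tol: float = 1e-3) -> pd.DataFrame:
    """Warn on un-normalised rows, save to CSV and draw a stacked bar chart.

    Assumes rows are samples (e.g. dg_d_1) and columns are cell types.
    """
    row_sums = fractions.sum(axis=1)
    off = row_sums[(row_sums - 1.0).abs() > tol]
    if not off.empty:
        warnings.warn(f"fractions do not sum to 1 for: {list(off.index)}")
    # Keep a copy of the table for downstream reporting
    fractions.to_csv(csv_path)
    # One bar per sample, stacked by cell type
    ax = fractions.plot(kind="bar", stacked=True, figsize=(4, 3))
    ax.set_ylabel("Estimated fraction")
    ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left", fontsize=6)
    ax.figure.savefig(png_path, bbox_inches="tight")
    plt.close(ax.figure)
    return fractions
```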
- Preprocess for beta-VAE
  - Execute `model.bulk_preprocess_lazy()`, `model.single_preprocess_lazy()`, and `model.prepare_input()` to produce matched feature spaces.
  - Clarify that the lazy preprocessing expects raw counts; if the user has already log-normalised the data, skip these steps and provide aligned matrices manually instead.
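When skipping the lazy preprocessing, two checks help: whether a matrix still looks like raw counts, and whether bulk and single-cell data share the same genes. Both helpers below are hypothetical sketches; they assume dense inputs (call `.toarray()` on a sparse `adata.X` first), a genes × samples bulk table, and a cells × genes single-cell table.

```python
import numpy as np
import pandas as pd


def looks_like_raw_counts(X) -> bool:
    """Heuristic: raw counts are non-negative and integer-valued."""
    X = np.asarray(X, dtype=float)
    return bool((X >= 0).all() and np.allclose(X, np.round(X)))


def align_genes(bulk: pd.DataFrame, single: pd.DataFrame):
    """Subset genes x samples `bulk` and cells x genes `single` to shared genes."""
    shared = bulk.index.intersection(single.columns)
    if len(shared) == 0:
        raise ValueError("no shared gene identifiers; check the ID mapping")
    # Same gene order in both outputs
    return bulk.loc[shared], single.loc[:, shared]
```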
- Train or load the beta-VAE
  - Train with `model.train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_vae', generate_save_dir='...', generate_save_name='dg')`.
  - Mention early stopping via `patience`, and how to resume by reloading weights with `model.load('.../dg_vae.pth')`.
  - Use `model.plot_loss()` to monitor convergence.
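The `patience` rule can be illustrated with a generic early-stopping check: training halts once the loss has failed to improve on its best value for `patience` consecutive epochs. This is the textbook criterion applied to a plain list of losses; how Bulk2Single applies it internally is an assumption here, and `early_stopping_epoch` is a hypothetical name.

```python
def early_stopping_epoch(losses, patience: int, min_delta: float = 0.0):
    """Return the epoch index at which training would stop, or None.

    Stops once `patience` consecutive epochs fail to beat the best loss
    seen so far by more than `min_delta`.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            # Improvement: remember it and reset the counter
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None  # never triggered within the given losses
```

For example, losses `[5, 4, 3, 3.5, 3.2, 3.1]` with `patience=3` stop at epoch 5, because epochs 3 to 5 never beat the best loss of 3.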
- Generate and filter synthetic cells
  - Produce an AnnData using `model.generate()` and reduce noise through `model.filtered(generate_adata, leiden_size=25)`.
  - Store the filtered AnnData (`.write_h5ad`) for reuse, noting that it contains PCA embeddings in `obsm['X_pca']`.
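One plausible reading of `leiden_size=25` is a minimum Leiden-cluster size: generated cells in smaller clusters are treated as noise and dropped. The sketch below applies that rule to a plain list of cluster labels; it is an assumption about what `model.filtered` does, not its implementation, and `drop_small_clusters` is a hypothetical name.

```python
from collections import Counter


def drop_small_clusters(labels, min_size: int):
    """Return indices of cells whose cluster has at least `min_size` members."""
    counts = Counter(labels)  # cluster label -> number of cells
    return [i for i, lab in enumerate(labels) if counts[lab] >= min_size]
```

With an AnnData, `keep = drop_small_clusters(adata.obs['leiden'].tolist(), 25)` followed by `adata = adata[keep].copy()` would apply it, assuming Leiden labels are already stored in `obs['leiden']`.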
- Benchmark against the reference atlas
  - Plot cell-type compositions for both the generated and reference data with `ov.bulk2single.bulk2single_plot_cellprop(...)`.
  - Assess correlation using `ov.bulk2single.bulk2single_plot_correlation(single_data, generate_adata, celltype_key='clusters')`.
  - Embed with `generate_adata.obsm['X_mde'] = ov.utils.mde(generate_adata.obsm['X_pca'])` and visualise via `ov.utils.embedding(..., color=['clusters'], palette=ov.utils.pyomic_palette())`.
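To show what a correlation heatmap between generated and reference clusters summarises, the sketch below averages expression per cluster and computes Pearson correlations between generated and reference centroids with `np.corrcoef`. It is an illustration, not the implementation behind `bulk2single_plot_correlation`; the function name is hypothetical and both inputs are assumed to be dense cells × genes arrays with the same gene order.

```python
import numpy as np


def cluster_centroid_correlation(X_gen, labels_gen, X_ref, labels_ref):
    """Pearson correlation between per-cluster mean expression profiles.

    Rows of `corr` follow the generated clusters, columns the reference
    clusters; both matrices must share the same gene order.
    """
    X_gen = np.asarray(X_gen, dtype=float)
    X_ref = np.asarray(X_ref, dtype=float)
    labels_gen = np.asarray(labels_gen)
    labels_ref = np.asarray(labels_ref)
    gen_names = sorted(set(labels_gen.tolist()))
    ref_names = sorted(set(labels_ref.tolist()))
    # One centroid (mean profile) per cluster
    gen_means = np.vstack([X_gen[labels_gen == c].mean(axis=0) for c in gen_names])
    ref_means = np.vstack([X_ref[labels_ref == c].mean(axis=0) for c in ref_names])
    # corrcoef stacks both sets of rows; keep the generated x reference block
    full = np.corrcoef(gen_means, ref_means)
    corr = full[: len(gen_names), len(gen_names):]
    return gen_names, ref_names, corr
```

A well-trained generator should give the highest correlation where a generated cluster meets its reference counterpart.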
- Troubleshooting tips
  - If marker selection fails, increase `top_marker_num` or provide a curated marker list.
  - Alignment errors typically stem from mismatched `bulk_group` names; double-check the column IDs in the bulk matrix.
  - Training on CPU can take several hours; advise switching `gpu` to an available CUDA device for speed.
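Mismatched `bulk_group` names can be caught before instantiating the model. The hypothetical helper below reports every requested name missing from the bulk matrix columns and suggests near matches with `difflib.get_close_matches` (using its default `n=3` and `cutoff=0.6`).

```python
import difflib


def check_bulk_group(columns, bulk_group):
    """Map each missing bulk_group name to close column-name suggestions."""
    columns = list(columns)
    problems = {}
    for name in bulk_group:
        if name not in columns:
            # Closest existing column IDs, best match first
            problems[name] = difflib.get_close_matches(name, columns, n=3, cutoff=0.6)
    return problems
```

An empty dictionary, e.g. from `check_bulk_group(bulk_df.columns, ['dg_d_1', 'dg_d_2', 'dg_d_3'])`, means every name is present.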
Examples
- "Estimate cell fractions for PDAC bulk replicates and generate synthetic scRNA-seq using Bulk2Single."
- "Load a pre-trained Bulk2Single model, regenerate cells, and compare cluster proportions to the dentate gyrus atlas."
- "Plot correlation heatmaps between generated cells and reference clusters after filtering noisy synthetic cells."
References
- Tutorial notebook: `t_bulk2single.ipynb`
- Example data and weights: `omicverse_guide/docs/Tutorials-bulk2single/data/`
- Quick copy/paste commands: `reference.md`