Single-cell annotation skills with omicverse

Overview

Use this skill to reproduce and adapt the single-cell annotation playbook captured in the omicverse tutorials: SCSA (`t_cellanno.ipynb`), MetaTiME (`t_metatime.ipynb`), CellVote (`t_cellvote.md`, `t_cellvote_pbmc3k.ipynb`), CellMatch (`t_cellmatch.ipynb`), GPTAnno (`t_gptanno.ipynb`), and label transfer (`t_anno_trans.ipynb`). Each section below highlights required inputs, training/inference steps, and how to read the outputs.

Instructions

SCSA automated cluster annotation
- Data requirements: PBMC3k raw counts from 10x Genomics (`pbmc3k_filtered_gene_bc_matrices.tar.gz`) or the processed `sample/rna.h5ad`. Download instructions are embedded in the notebook; unpack to `data/filtered_gene_bc_matrices/hg19/`. Ensure an SCSA SQLite database is available (e.g. `pySCSA_2024_v1_plus.db` from the Figshare/Drive links listed in the tutorial) and point `model_path` to its location.
- Preprocessing & model fit: Load with `sc.read_10x_mtx`, run QC (`ov.pp.qc`), normalization and HVG selection (`ov.pp.preprocess`), scaling (`ov.pp.scale`), PCA (`ov.pp.pca`), neighbors, and Leiden clustering, then compute ranked markers (`sc.tl.rank_genes_groups`). Instantiate `scsa = ov.single.pySCSA(...)`, choosing `target='cellmarker'` or `'panglaodb'`, the tissue scope, and thresholds (`foldchange`, `pvalue`).
- Inference & interpretation: Call `scsa.cell_anno(clustertype='leiden', cluster='all')`, then `scsa.cell_auto_anno(adata, clustertype='leiden', key='scsa_celltype_cellmarker')` to append predictions to `adata.obs`. Compare to manual marker-based labels via `ov.utils.embedding` or `sc.pl.dotplot`, inspect marker dictionaries (`ov.single.get_celltype_marker`), and query supported tissues with `scsa.get_model_tissue()`. Use the Ro/e (ratio of observed to expected) helper `ov.utils.roe` together with `ov.utils.plot_cellproportion` to validate abundance trends.
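The abundance check mentioned above rests on a simple ratio: observed cell counts divided by the counts expected if cell type and sample group were independent. The sketch below computes that ratio in plain Python on hypothetical toy data; it illustrates the formula only and is not the implementation behind `ov.utils.roe`. The function name `ratio_observed_expected` and the sample/cell-type values are assumptions made for illustration.

```python
from collections import Counter


def ratio_observed_expected(groups, labels):
    """Return {(group, label): observed / expected} for two parallel lists.

    Expected counts use the contingency-table formula
    row_total * column_total / grand_total.
    """
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    observed = Counter(zip(groups, labels))
    group_totals = Counter(groups)
    label_totals = Counter(labels)
    n_cells = len(groups)
    result = {}
    for group in group_totals:
        for label in label_totals:
            expected = group_totals[group] * label_totals[label] / n_cells
            result[(group, label)] = observed[(group, label)] / expected
    return result


# Hypothetical toy data: one entry per cell.
sample = ["ctrl"] * 4 + ["treated"] * 4
celltype = ["T cell", "T cell", "T cell", "B cell",
            "T cell", "B cell", "B cell", "B cell"]

roe = ratio_observed_expected(sample, celltype)
for (group, label), value in sorted(roe.items()):
    print(f"{group:8s} {label:7s} Ro/e = {value:.2f}")
```

Values above 1 indicate a cell type that is enriched in a group relative to expectation, values below 1 indicate depletion.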
MetaTiME tumour microenvironment states
- Data requirements: Batched TME AnnData with an scVI latent embedding. The tutorial uses `TiME_adata_scvi.h5ad` from Figshare (`https://figshare.com/ndownloader/files/41440050`). If starting from counts, run scVI (`scvi.model.SCVI`) first to populate `adata.obsm['X_scVI']`.
- Preprocessing & model fit: Optionally subset to non-malignant cells via `adata.obs['isTME']`. Rebuild neighbors on the latent representation (`sc.pp.neighbors(adata, use_rep="X_scVI")`) and embed with pymde (`adata.obsm['X_mde'] = ov.utils.mde(...)`). Initialise `TiME_object = ov.single.MetaTiME(adata, mode='table')` and, if finer granularity is desired, over-cluster with `TiME_object.overcluster(resolution=8, clustercol='overcluster')`.
- Inference & interpretation: Run `TiME_object.predictTiME(save_obs_name='MetaTiME')` to assign minor states and `Major_MetaTiME`. Visualise using `TiME_object.plot` or `sc.pl.embedding`. Interpret the outputs by comparing cluster-level distributions and confirming that the `MetaTiME` and `Major_MetaTiME` columns align with expected niches.
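One way to compare distributions at the group level is to tabulate, for each group of cells (a Leiden cluster or a patient, for example), the fraction of cells carrying each major state, and then flag groups without a clearly dominant state. The sketch below does this in plain Python. The group IDs, the state names, and the `min_purity` threshold are hypothetical, and the lists stand in for columns of `adata.obs`.

```python
from collections import Counter, defaultdict


def label_fractions_per_group(groups, labels):
    """Return {group: {label: fraction of the group's cells}}."""
    counts = defaultdict(Counter)
    for group, label in zip(groups, labels):
        counts[group][label] += 1
    fractions = {}
    for group, label_counts in counts.items():
        total = sum(label_counts.values())
        fractions[group] = {lab: n / total for lab, n in label_counts.items()}
    return fractions


def mixed_groups(fractions, min_purity=0.8):
    """Groups whose dominant label covers less than min_purity of their cells."""
    return sorted(g for g, f in fractions.items() if max(f.values()) < min_purity)


# Hypothetical stand-ins for adata.obs['leiden'] and adata.obs['Major_MetaTiME'].
leiden = ["0", "0", "0", "0", "1", "1", "1", "1", "1"]
major_state = ["T", "T", "T", "T", "Myeloid", "Myeloid", "Myeloid", "B", "B"]

fractions = label_fractions_per_group(leiden, major_state)
for group in sorted(fractions):
    print(group, fractions[group])
print("groups to inspect:", mixed_groups(fractions))
```

Groups reported by `mixed_groups` are candidates for a closer look at their marker genes before accepting the assigned state.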
CellVote consensus labelling
- Data requirements: A clustered AnnData (e.g. PBMC3k, located via the `CELLVOTE_PBMC3K` environment variable or at `data/pbmc3k.h5ad`) plus at least two precomputed annotation columns (simulated in the tutorial as `scsa_annotation`, `gpt_celltype`, `gbi_celltype`). Prepare per-cluster marker genes via `sc.tl.rank_genes_groups`.
- Preprocessing & model fit: After standard preprocessing (normalize, log1p, HVGs, PCA, neighbors, Leiden), build a marker dictionary with `marker_dict = top_markers_from_rgg(adata, 'leiden', topn=10)` or via `ov.single.get_celltype_marker`. Instantiate `cv = ov.single.CellVote(adata)`.
- Inference & interpretation: Call `cv.vote(clusters_key='leiden', cluster_markers=marker_dict, celltype_keys=[...], species='human', organization='PBMC', provider='openai', model='gpt-4o-mini')`. Offline examples monkey-patch arbitration to avoid API calls; online voting requires valid credentials. Final consensus labels live in `adata.obs['CellVote_celltype']`. Compare each cluster’s majority vote with the input sources (`adata.obs[['leiden', 'scsa_annotation', ...]]`) to justify decisions.
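To check a consensus label against its inputs, it helps to compute a plain majority vote per cluster and see where the sources disagree. The sketch below is a simple stand-in for that comparison and not CellVote's arbitration logic: each source first contributes its most common label within a cluster, then the sources vote, and ties are returned as `None` so they can be reviewed by hand. The helper names and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def source_call(clusters, labels):
    """Most common label of one annotation source within each cluster."""
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        per_cluster[cluster][label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in per_cluster.items()}


def majority_vote(clusters, sources):
    """Return {cluster: winning label, or None when the top vote is tied}."""
    calls = {name: source_call(clusters, labels) for name, labels in sources.items()}
    consensus = {}
    for cluster in sorted(set(clusters)):
        votes = Counter(calls[name][cluster] for name in sources)
        ranked = votes.most_common()
        top_label, top_votes = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_votes
        consensus[cluster] = None if tied else top_label
    return consensus


# Hypothetical per-cell columns, standing in for adata.obs.
leiden = ["0", "0", "1", "1", "2"]
sources = {
    "scsa_annotation": ["T cell", "T cell", "B cell", "B cell", "NK cell"],
    "gpt_celltype": ["T cell", "T cell", "Monocyte", "Monocyte", "T cell"],
    "gbi_celltype": ["T cell", "T cell", "B cell", "B cell", "Monocyte"],
}

consensus = majority_vote(leiden, sources)
print(consensus)
```

Clusters that come back as `None`, or whose vote differs from `adata.obs['CellVote_celltype']`, are the ones worth justifying explicitly with marker evidence.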
CellMatch ontology mapping
- Data requirements: Annotated AnnData such as `pertpy.dt.haber_2017_regions()` with `adata.obs['cell_label']`. Download the Cell Ontology JSON (`cl.json`) via `ov.single.download_cl(...)` or manual links, and optionally the Cell Taxonomy resource (`Cell_Taxonomy_resource.txt`). Ensure access to a SentenceTransformer model (`sentence-transformers/all-MiniLM-L6-v2`, `BAAI/bge-base-en-v1.5`, etc.), downloading it to `local_model_dir` if offline.
- Preprocessing & model fit: Create the mapper with `ov.single.CellOntologyMapper(cl_obo_file='new_ontology/cl.json', model_name='sentence-transformers/all-MiniLM-L6-v2', local_model_dir='./my_models')`. Run `mapper.map_adata(...)` to assign ontology-derived labels/IDs, optionally enabling taxonomy matching (`use_taxonomy=True` after calling `load_cell_taxonomy_resource`).
- Inference & interpretation: Explore mapping summaries (`mapper.print_mapping_summary_taxonomy`) and inspect embeddings coloured by `cell_ontology`, `cell_ontology_cl_id`, or `enhanced_cell_ontology`. Use helper queries such as `mapper.find_similar_cells('T helper cell')`, `mapper.get_cell_info(...)`, and category browsing to validate ontology coverage.
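Embedding-based mapping boils down to nearest-neighbour search: each free-text label is embedded, compared with every ontology term by cosine similarity, and assigned the best match if it clears a threshold. The sketch below shows that idea with a toy character-trigram vector in place of a SentenceTransformer model, so it runs without downloads. It is not how `CellOntologyMapper` is implemented; the term list, the labels, and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Toy embedding: character-trigram counts (stand-in for a sentence model)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_match(label, ontology_terms, threshold=0.3):
    """Return (term, score); term is None when no term reaches the threshold."""
    query = trigram_vector(label)
    score, term = max((cosine(query, trigram_vector(t)), t) for t in ontology_terms)
    return (term, score) if score >= threshold else (None, score)


# Hypothetical ontology terms and dataset labels.
ontology_terms = ["enterocyte", "goblet cell", "tuft cell", "paneth cell", "stem cell"]
for label in ["Enterocyte.Progenitor", "Goblet", "Tuft", "Unknown"]:
    term, score = best_match(label, ontology_terms)
    print(f"{label!r} -> {term!r} (similarity {score:.2f})")
```

Labels that return `None` are the ones to revisit, for instance with taxonomy hints or a manual lookup.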
GPTAnno LLM-powered annotation
- Data requirements: The same PBMC3k dataset (raw matrix or `.h5ad`) and cluster assignments. You also need access to an LLM endpoint: configure `AGI_API_KEY` for OpenAI-compatible providers (`provider='openai'`, `'qwen'`, `'kimi'`, etc.), or supply a local model path for `ov.single.gptcelltype_local`.
- Preprocessing & model fit: Follow the QC, normalization, HVG, scaling, PCA, neighbor, Leiden, and marker discovery steps described above (reusing outputs from the SCSA workflow). Build the marker dictionary automatically with `ov.single.get_celltype_marker(adata, clustertype='leiden', rank=True, key='rank_genes_groups', foldchange=2, topgenenumber=5)`.
- Inference & interpretation: Invoke `ov.single.gptcelltype(...)`, specifying tissue/species context and the desired provider/model. Post-process responses to keep clean labels (`result[key].split(': ')[-1]...`) and write them to `adata.obs['gpt_celltype']`. Compare embeddings (`ov.pl.embedding(..., color=['leiden','gpt_celltype'])`) to verify cluster identities. If operating offline, call `ov.single.gptcelltype_local` with a downloaded instruction-tuned checkpoint.
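The post-processing step can be sketched as follows, assuming the responses arrive as a dictionary keyed by cluster ID with values shaped like `'0: T cells'`. That response shape, the `clean_label` helper, and the extra trimming it performs are assumptions for illustration; the tutorial's own cleanup expression is elided in the text above and may differ.

```python
def clean_label(raw):
    """Keep the text after the last ': ', then trim spaces and a trailing period."""
    label = raw.split(": ")[-1]
    return label.strip().rstrip(".").strip()


# Hypothetical responses keyed by cluster ID; real model output may differ.
result = {
    "0": "0: T cells",
    "1": "1: B cells.",
    "2": "Cluster 2: CD14+ Monocytes ",
}
cluster_to_label = {key: clean_label(value) for key, value in result.items()}

# Per-cell cluster assignments (stand-in for adata.obs['leiden']).
leiden = ["0", "0", "1", "2", "1"]

# Equivalent in spirit to mapping the cluster column through the dictionary;
# clusters without a response fall back to an explicit placeholder.
gpt_celltype = [cluster_to_label.get(cluster, "Unknown") for cluster in leiden]
print(cluster_to_label)
print(gpt_celltype)
```

Using an explicit placeholder for missing clusters makes gaps visible in the embedding plot instead of leaving empty values.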
Weighted KNN annotation transfer
- Data requirements: Cross-modal GLUE outputs with aligned embeddings, e.g. `data/analysis_lymph/rna-emb.h5ad` (annotated RNA) and `data/analysis_lymph/atac-emb.h5ad` (query ATAC), where both contain `obsm['X_glue']`.
- Preprocessing & model fit: Load both modalities, optionally concatenate for QC plots, and compute a shared low-dimensional embedding with `ov.utils.mde`. Train a neighbour model using `ov.utils.weighted_knn_trainer(train_adata=rna, train_adata_emb='X_glue', n_neighbors=15)` and keep the returned model (referred to as `knn_transformer` in the next step).
- Inference & interpretation: Transfer labels via `labels, uncert = ov.utils.weighted_knn_transfer(query_adata=atac, query_adata_emb='X_glue', label_keys='major_celltype', knn_model=knn_transformer, ref_adata_obs=rna.obs)`. Store predictions in `atac.obs['transf_celltype']` and uncertainties in `atac.obs['transf_celltype_unc']`; copy the predictions to `major_celltype` if you want consistent naming. Visualise with `ov.utils.embedding` and inspect the uncertainty to flag ambiguous cells.
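The idea behind weighted KNN transfer can be shown on toy 2-D embeddings: each query cell takes the label with the largest summed neighbour weight, and the uncertainty is one minus that label's share of the total weight. The sketch below uses inverse-distance weights, which is an assumption; the weighting inside `ov.utils.weighted_knn_transfer` may differ. The function name `toy_weighted_knn` and all coordinates are hypothetical.

```python
import math
from collections import defaultdict


def toy_weighted_knn(ref_emb, ref_labels, query_emb, n_neighbors=3):
    """Return (labels, uncertainties), one entry per query point."""
    eps = 1e-9  # avoids division by zero when a query sits on a reference point
    labels, uncertainties = [], []
    for query in query_emb:
        nearest = sorted(
            (math.dist(query, ref), label) for ref, label in zip(ref_emb, ref_labels)
        )[:n_neighbors]
        weights = defaultdict(float)
        for distance, label in nearest:
            weights[label] += 1.0 / (distance + eps)
        total = sum(weights.values())
        best = max(weights, key=weights.get)
        labels.append(best)
        uncertainties.append(1.0 - weights[best] / total)
    return labels, uncertainties


# Hypothetical aligned embeddings (stand-ins for obsm['X_glue']).
rna_emb = [(0, 0), (0, 1), (1, 0), (4, 4), (4, 5), (5, 4)]
rna_labels = ["T", "T", "T", "B", "B", "B"]
atac_emb = [(0.2, 0.2), (2.2, 2.0)]

labels, uncert = toy_weighted_knn(rna_emb, rna_labels, atac_emb)
for label, value in zip(labels, uncert):
    flag = "ambiguous" if value > 0.2 else "confident"
    print(f"{label}: uncertainty {value:.2f} ({flag})")
```

The second query point sits between the two populations, so its neighbours disagree and its uncertainty is clearly higher than that of the first point.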

Critical API Reference - EXACT Function Signatures

pySCSA - IMPORTANT: Parameter is `clustertype`, NOT `cluster`

CORRECT usage:
```python
import omicverse as ov

# Step 1: Initialize pySCSA
scsa = ov.single.pySCSA(
    adata,
    foldchange=1.5,
    pvalue=0.01,
    species='Human',
    tissue='All',
    target='cellmarker',  # or 'panglaodb'
)

# Step 2: Run annotation - NOTE: use clustertype='leiden', NOT cluster='leiden'!
anno_result = scsa.cell_anno(clustertype='leiden', cluster='all')

# Step 3: Add cell type labels to adata.obs
scsa.cell_auto_anno(adata, clustertype='leiden', key='scsa_celltype')

# Results are stored in adata.obs['scsa_celltype']
```

**WRONG - DO NOT USE:**
```python
# WRONG! 'cluster' is NOT a valid parameter for cell_auto_anno!
scsa.cell_auto_anno(adata, cluster='leiden')  # ERROR!
```

COSG Marker Genes - Results stored in adata.uns, NOT adata.obs

CORRECT usage:
```python
# Step 1: Run COSG marker gene identification
ov.single.cosg(adata, groupby='leiden', n_genes_user=50)

# Step 2: Access results from adata.uns (NOT adata.obs!)
marker_names = adata.uns['rank_genes_groups']['names']  # DataFrame with cluster columns
marker_scores = adata.uns['rank_genes_groups']['scores']

# Step 3: Get top markers for specific cluster
cluster_0_markers = adata.uns['rank_genes_groups']['names']['0'][:10].tolist()

# Step 4: To create celltype column, manually map clusters to cell types
cluster_to_celltype = {
    '0': 'T cells',
    '1': 'B cells',
    '2': 'Monocytes',
}
adata.obs['cosg_celltype'] = adata.obs['leiden'].map(cluster_to_celltype)
```

**WRONG - DO NOT USE:**
```python
# WRONG! COSG does NOT create adata.obs columns directly!
adata.obs['cosg_celltype']  # This key does NOT exist after running COSG!
adata.uns['cosg_celltype']  # This key also does NOT exist!
```
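Steps 3 and 4 generalise to all clusters at once. The sketch below uses a plain dictionary as a stand-in for the per-cluster `names` table, collects the top markers for every cluster, and checks for clusters that the manual mapping has not covered yet, since mapping through an incomplete dictionary leaves those cells without a label. The helper names, gene lists, and cluster IDs are hypothetical.

```python
def top_markers(names_by_cluster, topn=10):
    """Return {cluster: first topn marker names}; names are assumed ranked best-first."""
    return {cluster: list(names)[:topn] for cluster, names in names_by_cluster.items()}


def unmapped_clusters(cluster_ids, cluster_to_celltype):
    """Cluster IDs present in the data but missing from the manual mapping."""
    return sorted(set(cluster_ids) - set(cluster_to_celltype))


# Hypothetical ranked markers per cluster (stand-in for the 'names' table).
names_by_cluster = {
    "0": ["CD3D", "IL7R", "CD3E"],
    "1": ["MS4A1", "CD79A", "CD79B"],
    "2": ["LYZ", "CD14", "S100A8"],
    "3": ["GNLY", "NKG7", "GZMB"],
}
marker_dict = top_markers(names_by_cluster, topn=2)

cluster_to_celltype = {"0": "T cells", "1": "B cells", "2": "Monocytes"}
leiden = ["0", "1", "3", "2", "0"]  # stand-in for adata.obs['leiden']

missing = unmapped_clusters(leiden, cluster_to_celltype)
cosg_celltype = [cluster_to_celltype.get(cluster, "Unassigned") for cluster in leiden]
print(marker_dict)
print("clusters still to annotate:", missing)
print(cosg_celltype)
```

Checking `missing` before writing the column avoids silently unlabeled cells when new clusters appear after re-clustering.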

Common Pitfalls to Avoid

pySCSA parameter confusion:
- `clustertype` = which obs column contains cluster labels (e.g., 'leiden')
- `cluster` = which specific clusters to annotate ('all' or specific cluster IDs)
- These are DIFFERENT parameters!
COSG result access:
- COSG is a marker gene finder, NOT a cell type annotator
- Results are per-cluster gene rankings stored in `adata.uns['rank_genes_groups']`
- To assign cell types, you must manually map clusters to cell types based on markers
Result storage patterns in OmicVerse:
- Cell type annotations → `adata.obs['<key>']`
- Marker gene results → `adata.uns['<key>']` (includes 'names', 'scores', 'logfoldchanges')
- Differential expression → `adata.uns['rank_genes_groups']`

Examples

"Run SCSA with both CellMarker and PanglaoDB references on PBMC3k, then benchmark against manual marker assignments before feeding the results into CellVote."
"Annotate tumour microenvironment states in the MetaTiME Figshare dataset, highlight Major_MetaTiME classes, and export the label distribution per patient."
"Download Cell Ontology resources, map `haber_2017_regions` clusters to ontology terms, and enrich ambiguous clusters using Cell Taxonomy hints."
"Propagate RNA-derived `major_celltype` labels onto GLUE-integrated ATAC cells and report clusters with high transfer uncertainty."

References

Tutorials and notebooks: `t_cellanno.ipynb`, `t_metatime.ipynb`, `t_cellvote.md`, `t_cellvote_pbmc3k.ipynb`, `t_cellmatch.ipynb`, `t_gptanno.ipynb`, `t_anno_trans.ipynb`.
Sample data & assets: PBMC3k matrix from 10x Genomics, MetaTiME `TiME_adata_scvi.h5ad` (Figshare), SCSA database downloads, GLUE embeddings under `data/analysis_lymph/`, Cell Ontology `cl.json`, and the Cell Taxonomy resource.
Quick copy commands: `reference.md`.
