tooluniverse-spatial-transcriptomics
Spatial Transcriptomics Analysis
Comprehensive analysis of spatially-resolved transcriptomics data to understand gene expression patterns in tissue architecture context. Combines expression profiling with spatial coordinates to reveal tissue organization, cell-cell interactions, and spatially variable genes.
When to Use This Skill
Triggers:
- User has spatial transcriptomics data (Visium, MERFISH, seqFISH, etc.)
- Questions about tissue architecture or spatial organization
- Spatial gene expression pattern analysis
- Cell-cell proximity or neighborhood analysis requests
- Tumor microenvironment spatial structure questions
- Integration of spatial with single-cell data
- Spatial domain identification
- Tissue morphology correlation with expression
Example Questions This Skill Solves:
- "Analyze this 10x Visium dataset to identify spatial domains"
- "Which genes show spatially variable expression in this tissue?"
- "Map the tumor microenvironment spatial organization"
- "Find genes enriched at tissue boundaries"
- "Identify cell-cell interactions based on spatial proximity"
- "Integrate spatial transcriptomics with scRNA-seq annotations"
- "Characterize spatial gradients in gene expression"
- "Map ligand-receptor pairs in tissue context"
Core Capabilities
| Capability | Description |
|---|---|
| Data Import | 10x Visium, MERFISH, seqFISH, Slide-seq, STARmap, Xenium formats |
| Quality Control | Spot/cell QC, spatial alignment verification, tissue coverage |
| Normalization | Spatial-aware normalization accounting for tissue heterogeneity |
| Spatial Clustering | Identify spatial domains with similar expression profiles |
| Spatial Variable Genes | Find genes with non-random spatial patterns |
| Neighborhood Analysis | Cell-cell proximity, spatial neighborhoods, niche identification |
| Spatial Patterns | Gradients, boundaries, hotspots, expression waves |
| Integration | Merge with scRNA-seq for cell type mapping |
| Ligand-Receptor Spatial | Map cell communication in tissue context |
| Visualization | Spatial plots, heatmaps on tissue, 3D reconstruction |
Workflow Overview
Input: Spatial Transcriptomics Data + Tissue Image
|
v
Phase 1: Data Import & QC
|-- Load spatial coordinates + expression matrix
|-- Load tissue histology image
|-- Quality control per spot/cell
|-- Filter low-quality spots
|-- Align spatial coordinates to tissue
|
v
Phase 2: Preprocessing
|-- Normalization (spatial-aware methods)
|-- Highly variable gene selection
|-- Dimensionality reduction (PCA)
|-- Spatial lag smoothing (optional)
|
v
Phase 3: Spatial Clustering
|-- Identify spatial domains/regions
|-- Graph-based clustering with spatial constraints
|-- Annotate domains with marker genes
|-- Visualize domains on tissue
|
v
Phase 4: Spatial Variable Genes
|-- Test for spatial autocorrelation (Moran's I, Geary's C)
|-- Identify genes with spatial patterns
|-- Classify pattern types (gradient, hotspot, boundary)
|-- Rank by spatial significance
|
v
Phase 5: Neighborhood Analysis
|-- Define spatial neighborhoods (k-NN, radius)
|-- Calculate neighborhood composition
|-- Identify interaction zones
|-- Niche characterization
|
v
Phase 6: Integration with scRNA-seq
|-- Cell type deconvolution per spot
|-- Map cell types to spatial locations
|-- Predict cell type spatial distributions
|-- Validate with marker genes
|
v
Phase 7: Spatial Cell Communication
|-- Identify proximal cell type pairs
|-- Query ligand-receptor database (OmniPath)
|-- Score spatial interactions
|-- Map communication hotspots
|
v
Phase 8: Generate Spatial Report
|-- Tissue overview with domains
|-- Spatially variable genes
|-- Cell type spatial maps
|-- Interaction networks in tissue context
|-- 3D visualization (if applicable)
Phase Details
Phase 1: Data Import & Quality Control
Objective: Load spatial data and assess quality.
Supported platforms:
10x Visium (most common):
- Spots: 55 μm diameter, 100 μm center to center, typically ~1-10 cells per spot depending on tissue
- Resolution: ~5,000 spots per standard 6.5 mm capture area
- Data: Expression matrix + spatial coordinates + H&E image
MERFISH/seqFISH (imaging-based):
- Single-cell resolution
- Targeted gene panels (100-10,000 genes)
- Absolute coordinates per cell
Slide-seq/Slide-seqV2:
- 10μm bead resolution
- Genome-wide profiling
Xenium (10x single-cell spatial):
- Single-cell resolution
- Large gene panels (300+ genes)
- Subcellular resolution
Data loading (Visium):
python
def load_visium_data(data_dir):
    """
    Load 10x Visium spatial transcriptomics data.

    Expected structure (sc.read_visium reads the .h5 count file by default):
    data_dir/
    ├── filtered_feature_bc_matrix.h5
    └── spatial/
        ├── tissue_positions_list.csv
        ├── scalefactors_json.json
        └── tissue_hires_image.png

    Returns: AnnData object with spatial coordinates
    """
    import scanpy as sc
    # Load expression matrix, spot coordinates, and tissue images
    adata = sc.read_visium(data_dir)
    # Gene symbols can repeat; make them unique before indexing by name
    adata.var_names_make_unique()
    # Spatial coordinates are in adata.obsm['spatial']
    # Images and scale factors are in adata.uns['spatial'][library_id]
    return adata

Quality Control:
- Spot-level QC:
python
def spatial_qc(adata):
    """
    Quality control for spatial transcriptomics data.
    """
    import scanpy as sc
    # Flag mitochondrial genes ('MT-' for human; mouse symbols use 'mt-')
    adata.var['mt'] = adata.var_names.str.startswith('MT-')
    # Calculate QC metrics once, including percent mitochondrial counts
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)
    # Visualize QC metrics spatially
    sc.pl.spatial(adata, color='n_genes_by_counts', title='Genes per Spot')
    sc.pl.spatial(adata, color='total_counts', title='UMI Counts per Spot')
    # Filter criteria (starting points; tune per tissue and platform)
    # - Min 200 genes per spot
    # - Min 500 UMI counts per spot
    # - Mitochondrial content < 20%
    sc.pp.filter_cells(adata, min_genes=200)
    sc.pp.filter_cells(adata, min_counts=500)
    adata = adata[adata.obs['pct_counts_mt'] < 20].copy()
    return adata

- Spatial alignment verification:
python
def verify_spatial_alignment(adata):
    """
    Verify spatial coordinates align with tissue image.
    """
    import matplotlib.pyplot as plt
    # Images and scale factors are stored per library under adata.uns['spatial']
    library_id = next(iter(adata.uns['spatial']))
    spatial = adata.uns['spatial'][library_id]
    img = spatial['images']['hires']
    # Coordinates are in full-resolution pixels; rescale to the hires image
    scale = spatial['scalefactors']['tissue_hires_scalef']
    coords = adata.obsm['spatial'] * scale
    # Plot spots on tissue image
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.imshow(img)
    ax.scatter(coords[:, 0], coords[:, 1], c='red', s=1, alpha=0.5)
    ax.set_title('Spatial Alignment Verification')
    ax.axis('off')
    return fig
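The alignment check depends on one step that is easy to miss: spot coordinates are usually stored in full-resolution pixel units, while the plotted image is a downsampled copy, so the overlay only lines up after multiplying by the matching scale factor. A minimal NumPy sketch of that step follows. The function name `scale_spots_to_image` and its arguments are hypothetical, and it assumes coordinates are ordered (x, y).

```python
import numpy as np


def scale_spots_to_image(coords_fullres, scale_factor, image_shape):
    """Scale full-resolution spot coordinates into image pixel space.

    Hypothetical helper for illustration. Assumes coords are (x, y) pairs
    and image_shape is (height, width, ...), as for a NumPy image array.
    Returns the scaled coordinates and the fraction of spots that fall
    inside the image bounds.
    """
    coords = np.asarray(coords_fullres, dtype=float) * scale_factor
    height, width = image_shape[:2]
    inside = (
        (coords[:, 0] >= 0) & (coords[:, 0] < width)
        & (coords[:, 1] >= 0) & (coords[:, 1] < height)
    )
    return coords, float(inside.mean())


# Toy example: three spots, a 400 x 400 image, scale factor 0.1
coords_fullres = [[1000, 2000], [3000, 500], [50000, 100]]
scaled, frac_inside = scale_spots_to_image(coords_fullres, 0.1, (400, 400, 3))
```

A fraction well below 1.0 usually points to a wrong scale factor or swapped axes rather than to spots that are genuinely off the tissue.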
Phase 2: Preprocessing & Normalization
Objective: Normalize data accounting for spatial heterogeneity.
Normalization:
python
def normalize_spatial(adata):
    """
    Normalize spatial transcriptomics data.
    """
    import scanpy as sc
    # Filter genes (min 3 spots)
    sc.pp.filter_genes(adata, min_cells=3)
    # Keep the raw counts before normalizing (count-based models need them)
    adata.layers['counts'] = adata.X.copy()
    # Normalize each spot to 10,000 total counts
    sc.pp.normalize_total(adata, target_sum=1e4)
    # Log-transform
    sc.pp.log1p(adata)
    # Freeze the log-normalized matrix (all genes) in adata.raw
    adata.raw = adata
    return adata

Highly variable genes:
python
def select_hvg_spatial(adata):
    """
    Select highly variable genes for spatial analysis.
    """
    import scanpy as sc
    # Standard HVG selection (expects log-normalized data)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)
    # Optionally: weight by spatial autocorrelation
    # Genes with spatial patterns are more informative
    return adata

Spatial smoothing (optional):
python
def spatial_smooth(adata, n_neighbors=6):
    """
    Smooth expression by averaging over spatial neighbors.
    Useful for noisy data, but can blur boundaries.
    """
    from sklearn.neighbors import NearestNeighbors
    # Find the n_neighbors nearest spots; querying the fitted
    # coordinates means each spot counts as its own first neighbor
    coords = adata.obsm['spatial']
    nn = NearestNeighbors(n_neighbors=n_neighbors, metric='euclidean')
    nn.fit(coords)
    # Sparse spot-by-spot averaging matrix: each row holds 1/k per neighbor
    weights = nn.kneighbors_graph(coords, mode='connectivity') / n_neighbors
    # One matrix product replaces the per-spot loop (works for sparse or dense X)
    adata.layers['smoothed'] = weights @ adata.X
    return adata
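To make the averaging step concrete, the sketch below reproduces neighbor smoothing on a toy matrix with NumPy only. It is an illustration under stated assumptions, not the skill's implementation: `knn_smooth` is a hypothetical name, neighbors are found by brute-force pairwise distances (fine for a few thousand spots, too memory-hungry beyond that), and each spot is included in its own neighborhood.

```python
import numpy as np


def knn_smooth(X, coords, k=6):
    """Average each row of X over its k nearest spots (self included).

    Hypothetical helper for illustration; uses O(n^2) memory.
    """
    X = np.asarray(X, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between spots
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest spots; the spot itself is at distance 0
    idx = np.argsort(d2, axis=1)[:, :k]
    # Row-normalized weight matrix: 1/k for each neighbor, 0 elsewhere
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], idx] = 1.0 / k
    return W @ X


# Four spots on a line with one gene; k=2 averages each spot with its nearest neighbor
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
X = np.array([[0.0], [3.0], [6.0], [30.0]])
X_smooth = knn_smooth(X, coords, k=2)
```

Because every row of the weight matrix sums to 1, a gene with constant expression is left unchanged, which is a quick sanity check for any smoothing scheme.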
Phase 3: Spatial Clustering
Objective: Identify spatial domains (regions with distinct expression).
Graph-based clustering with spatial constraints:
python
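def spatial_clustering(adata, n_neighbors=6, spatial_weight=0.3):
    """
    Cluster spots into spatial domains.
    Uses both expression similarity AND spatial proximity.
    """
    import scanpy as sc
    import squidpy as sq
    # PCA for dimensionality reduction
    sc.pp.pca(adata, n_comps=50)
    # Expression neighbor graph in PCA space -> adata.obsp['connectivities']
    sc.pp.neighbors(adata, use_rep='X_pca')
    # Spatial neighbor graph -> adata.obsp['spatial_connectivities']
    sq.gr.spatial_neighbors(adata, coord_type='generic', n_neighs=n_neighbors)
    # Blend both graphs so clustering sees expression and location.
    # A weighted sum is one simple choice; spatial_weight is a tunable assumption.
    joint = ((1 - spatial_weight) * adata.obsp['connectivities']
             + spatial_weight * adata.obsp['spatial_connectivities'])
    sc.tl.leiden(adata, adjacency=joint, resolution=1.0, key_added='spatial_domain')
    # Visualize domains on tissue
    sc.pl.spatial(adata, color='spatial_domain', title='Spatial Domains')
    return adata

Domain marker genes: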
python
def find_domain_markers(adata):
    """
    Identify marker genes for each spatial domain.
    """
    import scanpy as sc
    # Differential expression per domain (each domain vs. all other spots)
    sc.tl.rank_genes_groups(adata, groupby='spatial_domain', method='wilcoxon')
    # Long-format table of ranked genes for all domains
    markers = sc.get.rank_genes_groups_df(adata, group=None)
    return markers
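The marker table returned above is long (one row per gene per domain), so annotation usually starts from a short list per domain. The pandas sketch below shows one way to do that. It assumes the columns `group`, `names`, `scores`, `logfoldchanges`, and `pvals_adj` that `sc.get.rank_genes_groups_df` returns with `group=None`; the function name `top_markers` and the thresholds are illustrative choices, not values from this document.

```python
import pandas as pd


def top_markers(markers, n=5, max_padj=0.05, min_lfc=1.0):
    """Return the n highest-scoring significant markers per domain.

    Hypothetical helper; thresholds are illustrative defaults.
    """
    keep = markers[
        (markers['pvals_adj'] < max_padj)
        & (markers['logfoldchanges'] > min_lfc)
    ]
    # Sort so that the best-scoring genes come first within each domain
    keep = keep.sort_values(['group', 'scores'], ascending=[True, False])
    return keep.groupby('group', sort=False).head(n).reset_index(drop=True)


# Toy table with two domains
toy = pd.DataFrame({
    'group': ['0', '0', '0', '1', '1'],
    'names': ['A', 'B', 'C', 'D', 'E'],
    'scores': [5.0, 9.0, 1.0, 4.0, 7.0],
    'logfoldchanges': [2.0, 3.0, 0.2, 1.5, 2.5],
    'pvals_adj': [0.01, 0.001, 0.5, 0.2, 0.01],
})
best = top_markers(toy, n=1)
```

Filtering on both adjusted p-value and fold change keeps genes that are statistically supported and large enough in effect to be useful as labels.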
Phase 4: Spatially Variable Genes
Objective: Find genes with non-random spatial patterns.
Moran's I (spatial autocorrelation):
python
def identify_spatial_genes(adata):
    """
    Test for spatial autocorrelation using Moran's I.
    Moran's I > 0: Positive spatial autocorrelation (clustering)
    Moran's I ~ 0: Random spatial distribution
    Moran's I < 0: Negative autocorrelation (checkerboard)
    """
    import squidpy as sq
    # Moran's I needs a spatial neighbor graph; build one if missing
    if 'spatial_connectivities' not in adata.obsp:
        sq.gr.spatial_neighbors(adata)
    # Calculate Moran's I for all genes
    sq.gr.spatial_autocorr(
        adata,
        mode='moran',
        n_perms=100,
        n_jobs=-1
    )
    # Results in adata.uns['moranI']
    spatial_genes = adata.uns['moranI'].sort_values('I', ascending=False)
    # Filter significant spatial genes (FDR < 0.05)
    sig_spatial = spatial_genes[spatial_genes['pval_norm_fdr_bh'] < 0.05]
    return sig_spatial

Spatial pattern classification:
python
def classify_spatial_patterns(adata, spatial_genes):
    """
    Classify types of spatial patterns.
    Pattern types:
    - Gradient: Smooth directional change
    - Hotspot: Localized high expression
    - Boundary: Expression at domain edges
    - Periodic: Regular spacing
    """
    import numpy as np
    patterns = {}
    coords = adata.obsm['spatial']
    for gene in spatial_genes.index[:100]:  # Top 100 spatial genes
        # Get expression as a flat dense vector (X may be sparse or dense)
        x = adata[:, gene].X
        expr = np.asarray(x.toarray() if hasattr(x, 'toarray') else x).ravel()
        # detect_pattern_type must be supplied by the user: (expr, coords) -> label
        patterns[gene] = detect_pattern_type(expr, coords)
    return patterns
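The classifier `detect_pattern_type` is called above but not defined in this document. The sketch below is one hypothetical way to fill that gap with two simple heuristics: a least-squares plane fit to flag gradients, and the share of total expression held by the top 10% of spots to flag hotspots. The thresholds are assumptions, expression is assumed non-negative, and boundary and periodic patterns are not handled (they fall into `'other'`).

```python
import numpy as np


def detect_pattern_type(expr, coords, r2_gradient=0.5, top_share_hotspot=0.5):
    """Heuristic pattern label: 'gradient', 'hotspot', 'other', or 'none'.

    Hypothetical sketch; thresholds are illustrative, not calibrated.
    """
    expr = np.asarray(expr, dtype=float)
    coords = np.asarray(coords, dtype=float)
    total = expr.sum()
    if total <= 0 or np.allclose(expr, expr[0]):
        return 'none'  # no signal or constant expression
    # Gradient test: fit expr ~ a*x + b*y + c and measure explained variance
    design = np.column_stack([coords, np.ones(len(expr))])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    residual = expr - design @ beta
    r2 = 1.0 - residual.var() / expr.var()
    if r2 >= r2_gradient:
        return 'gradient'
    # Hotspot test: how much of the total sits in the top 10% of spots
    n_top = max(1, int(np.ceil(0.1 * len(expr))))
    top_share = np.sort(expr)[-n_top:].sum() / total
    if top_share >= top_share_hotspot:
        return 'hotspot'
    return 'other'


# Synthetic 20 x 20 grid: a linear ramp and a narrow Gaussian bump
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
grid = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
ramp = grid[:, 0]
bump = np.exp(-((grid[:, 0] - 10) ** 2 + (grid[:, 1] - 10) ** 2) / (2 * 1.5 ** 2))
labels = {'ramp': detect_pattern_type(ramp, grid), 'bump': detect_pattern_type(bump, grid)}
```

A plane fit only captures linear gradients; radial or curved gradients would need a different model, so treat these labels as a first-pass triage.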
Phase 5: Neighborhood Analysis
Objective: Analyze cell-cell proximity and spatial niches.
Define spatial neighborhoods:
python
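def analyze_neighborhoods(adata, radius=150):
    """
    Analyze spatial neighborhood composition.
    For each spot, characterize its microenvironment.
    """
    import squidpy as sq
    # Build the spatial graph: spots within `radius` of each other are neighbors.
    # radius is in the units of adata.obsm['spatial'] (pixels for Visium), not μm
    sq.gr.spatial_neighbors(adata, coord_type='generic', radius=radius)
    # Calculate neighborhood enrichment: tests whether pairs of domains
    # neighbor each other more often than expected under permutation
    sq.gr.nhood_enrichment(adata, cluster_key='spatial_domain')
    # Visualize neighborhood enrichment
    sq.pl.nhood_enrichment(adata, cluster_key='spatial_domain')
    # Results (z-scores) are in adata.uns['spatial_domain_nhood_enrichment']
    return adata

Interaction zones: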
python
def identify_interaction_zones(adata, domain_a, domain_b):
    """
    Find boundary regions between two spatial domains.
    These are hotspots for cell-cell interactions.
    """
    from sklearn.neighbors import NearestNeighbors
    # Boolean masks (as NumPy arrays) for spots in each domain
    in_a = (adata.obs['spatial_domain'] == domain_a).to_numpy()
    in_b = (adata.obs['spatial_domain'] == domain_b).to_numpy()
    # Nearest spatial neighbors of every spot
    coords = adata.obsm['spatial']
    nn = NearestNeighbors(n_neighbors=6)
    nn.fit(coords)
    _, indices = nn.kneighbors(coords)
    # Spots from A that have at least one neighbor in B.
    # Each spot is listed among its own neighbors, which is harmless
    # here because a spot in A is not in B.
    near_b = in_b[indices].any(axis=1)
    # Mark interaction zone
    adata.obs['interaction_zone'] = in_a & near_b
    return adata
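The workflow for this phase also lists neighborhood composition, that is, which domains surround each spot. A minimal NumPy sketch of that calculation is given below. `neighborhood_composition` is a hypothetical name, neighbors are the k nearest spots by brute-force distance with the spot itself excluded, and the toy labels are invented for the example.

```python
import numpy as np


def neighborhood_composition(labels, coords, k=6):
    """Fraction of each label among the k nearest neighbors of every spot.

    Hypothetical helper for illustration; uses O(n^2) memory.
    Returns (categories, composition) where composition has shape
    (n_spots, n_categories) and each row sums to 1.
    """
    labels = np.asarray(labels)
    coords = np.asarray(coords, dtype=float)
    categories, codes = np.unique(labels, return_inverse=True)
    # Pairwise squared distances; exclude each spot from its own neighborhood
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    idx = np.argsort(d2, axis=1)[:, :k]
    neighbor_codes = codes[idx]  # shape (n_spots, k)
    composition = np.zeros((len(labels), len(categories)))
    for c in range(len(categories)):
        composition[:, c] = (neighbor_codes == c).mean(axis=1)
    return categories, composition


# Five spots on a line: two in domain A, three in domain B
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.5, 0.0], [10.0, 0.0]])
categories, composition = neighborhood_composition(['A', 'A', 'B', 'B', 'B'], coords, k=2)
```

Rows with a mixed composition mark spots near a domain border, while rows dominated by one label sit in the interior; clustering these rows is one common route to niche identification.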
Phase 6: Integration with Single-Cell RNA-seq
Objective: Map cell types from scRNA-seq to spatial locations.
Cell type deconvolution:
python
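def deconvolve_cell_types(adata_spatial, adata_sc):
    """
    Predict cell type composition per spatial spot.
    Uses scRNA-seq reference to deconvolve Visium spots.
    Methods: Cell2location, Tangram, SPOTlight
    """
    import cell2location
    # Prepare single-cell reference: a genes x cell types DataFrame of
    # signature expression. extract_signatures is a placeholder that this
    # document does not define (cell2location derives signatures with its
    # RegressionModel).
    cell_type_signatures = extract_signatures(adata_sc)
    # cell2location models raw counts; keep only genes present in both objects
    shared = adata_spatial.var_names.intersection(cell_type_signatures.index)
    adata_spatial = adata_spatial[:, shared].copy()
    cell_type_signatures = cell_type_signatures.loc[shared]
    # Run cell2location to estimate cell type abundances per spot
    # (calls follow the cell2location tutorial; check your installed version)
    cell2location.models.Cell2location.setup_anndata(adata_spatial)
    mod = cell2location.models.Cell2location(
        adata_spatial,
        cell_state_df=cell_type_signatures
    )
    mod.train(max_epochs=30000)
    # Export posterior abundances, then convert them to per-spot fractions
    adata_spatial = mod.export_posterior(adata_spatial)
    abundance = adata_spatial.obsm['q05_cell_abundance_w_sf']
    adata_spatial.obsm['cell_type_fractions'] = abundance.div(abundance.sum(axis=1), axis=0)
    return adata_spatial

Spatial cell type mapping: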
python
def map_cell_types_spatial(adata):
    """
    Visualize cell type spatial distributions.
    """
    import scanpy as sc
    # For each cell type, plot abundance on tissue
    fractions = adata.obsm['cell_type_fractions']
    for ct in fractions.columns:
        # sc.pl.spatial colors by an obs column or gene name,
        # so copy the fraction into adata.obs first
        key = f'frac_{ct}'
        adata.obs[key] = fractions[ct].to_numpy()
        sc.pl.spatial(
            adata,
            color=key,
            title=f'{ct} Spatial Distribution'
        )
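Deconvolution methods generally report per-spot abundances, and the fractions plotted above come from normalizing each spot's abundances to sum to 1. The NumPy sketch below shows that normalization and a dominant-type call. The helper names, the toy numbers, and the cell type labels are hypothetical; spots with zero total abundance are given all-zero fractions rather than dividing by zero.

```python
import numpy as np


def abundances_to_fractions(abundance, eps=1e-12):
    """Row-normalize a spots x cell types abundance matrix (hypothetical helper)."""
    abundance = np.asarray(abundance, dtype=float)
    totals = abundance.sum(axis=1, keepdims=True)
    # Guard against empty spots: 0 / eps stays 0 instead of producing NaN
    return abundance / np.maximum(totals, eps)


def dominant_cell_type(fractions, cell_types):
    """Label each spot with its most abundant cell type (hypothetical helper)."""
    return np.asarray(cell_types)[np.argmax(fractions, axis=1)]


# Toy abundances for three spots and three cell types
cell_types = ['Tumor', 'T cell', 'CAF']
abundance = [[2.0, 6.0, 2.0], [0.0, 0.0, 0.0], [5.0, 5.0, 10.0]]
fractions = abundances_to_fractions(abundance)
dominant = dominant_cell_type(fractions, cell_types)
```

The dominant label is a convenient summary for plotting, but for multi-cell spots the full fraction vector carries the information needed for the microenvironment composition tables in the report.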
Phase 7: Spatial Cell Communication
Objective: Map ligand-receptor interactions in tissue context.
Spatial proximity-based communication:
python
def spatial_cell_communication(adata):
    """
    Identify cell-cell communication based on spatial proximity.
    Requires:
    - Cell type annotations (from deconvolution) as a categorical
      adata.obs['cell_type']
    - Ligand-receptor database (OmniPath)
    """
    import squidpy as sq
    from tooluniverse import ToolUniverse
    tu = ToolUniverse()
    tu.load_tools()  # tools must be loaded before they can be run
    # Get ligand-receptor pairs from OmniPath
    lr_pairs = tu.run_one_function({
        "name": "OmniPath_get_ligand_receptor_interactions",
        "arguments": {"partners": ""}  # Get all pairs
    })
    # sq.gr.ligrec expects `interactions` as a DataFrame with 'source' and
    # 'target' columns; reshape lr_pairs if the tool returns another structure.
    # ligrec scores L-R pairs between clusters with a permutation test and does
    # not use spatial distances itself, so interpret results for the cell type
    # pairs found to be proximal in Phase 5.
    sq.gr.ligrec(
        adata,
        n_perms=100,
        cluster_key='cell_type',
        interactions=lr_pairs,
        copy=False
    )
    # Visualize significant interactions
    sq.pl.ligrec(adata, cluster_key='cell_type')
    return adata

Communication hotspot mapping:
python
def map_communication_hotspots(adata, ligand, receptor):
    """
    Map spatial locations of specific L-R interactions.
    """
    import numpy as np
    import scanpy as sc

    def dense_vector(gene):
        # X may be sparse or dense; return a flat NumPy vector either way
        x = adata[:, gene].X
        return np.asarray(x.toarray() if hasattr(x, 'toarray') else x).ravel()

    # Get ligand and receptor expression
    ligand_expr = dense_vector(ligand)
    receptor_expr = dense_vector(receptor)
    # Interaction score = ligand × receptor (co-expression within the same spot)
    interaction_score = ligand_expr * receptor_expr
    # Add to adata
    key = f'{ligand}_{receptor}_score'
    adata.obs[key] = interaction_score
    # Visualize on tissue
    sc.pl.spatial(adata, color=key,
                  title=f'{ligand}-{receptor} Interaction Hotspots')
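The ligand × receptor product scores only spots where both genes are expressed together, so signaling between adjacent spots is missed. One possible extension, sketched below with NumPy, multiplies the ligand in each spot by the mean receptor level across that spot and its k nearest neighbors. The function name `lr_neighbor_score`, the choice of k, and the toy values are assumptions for illustration, and the sketch assumes no two spots share identical coordinates.

```python
import numpy as np


def lr_neighbor_score(ligand_expr, receptor_expr, coords, k=6):
    """Ligand in a spot times mean receptor in its neighborhood.

    Hypothetical helper; the neighborhood is the spot itself plus its
    k nearest spots, found by brute-force distances (O(n^2) memory).
    """
    ligand_expr = np.asarray(ligand_expr, dtype=float)
    receptor_expr = np.asarray(receptor_expr, dtype=float)
    coords = np.asarray(coords, dtype=float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    # k + 1 closest spots: the spot itself (distance 0) and k neighbors
    idx = np.argsort(d2, axis=1)[:, :k + 1]
    receptor_nearby = receptor_expr[idx].mean(axis=1)
    return ligand_expr * receptor_nearby


# Spot 0 expresses the ligand, its neighbor spot 1 expresses the receptor
coords = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
ligand = [2.0, 0.0, 0.0]
receptor = [0.0, 4.0, 4.0]
same_spot = np.asarray(ligand) * np.asarray(receptor)
neighbor_aware = lr_neighbor_score(ligand, receptor, coords, k=1)
```

In this toy case the same-spot product is zero everywhere, while the neighbor-aware score is positive for the ligand-expressing spot next to a receptor-expressing one, which is the situation paracrine signaling analysis is after.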
Phase 8: Spatial Report Generation
Generate comprehensive spatial report:
Spatial Transcriptomics Analysis Report
Dataset Summary
- Platform: 10x Visium
- Tissue: Breast cancer tumor section
- Spots: 3,562 (after QC filtering)
- Genes: 18,432 detected
- Resolution: 55 μm spot diameter (typically ~1-10 cells/spot)
Quality Control
- Mean genes per spot: 3,245
- Mean UMI counts: 12,543
- Mitochondrial content: 8.2% average
- Tissue coverage: 85% of capture area
Spatial Domains Identified
- 7 distinct spatial domains detected via graph-based clustering
- Domain 1: Tumor core (32% of tissue)
- Domain 2: Invasive margin (18%)
- Domain 3: Stromal region (25%)
- Domain 4: Immune infiltrate (12%)
- Domain 5: Necrotic region (8%)
- Domain 6: Normal epithelium (3%)
- Domain 7: Adipose tissue (2%)
Top Marker Genes per Domain
Domain 1 (Tumor Core)
- EPCAM, KRT19, MKI67, CCNB1, TOP2A (proliferative tumor)
Domain 2 (Invasive Margin)
- VIM, FN1, MMP2, SNAI2 (EMT signature)
Domain 4 (Immune Infiltrate)
- CD3D, CD8A, CD4, PTPRC (T cell enriched)
- CD68, CD14 (macrophage enriched)
Spatially Variable Genes
- 456 genes with significant spatial patterns (Moran's I, FDR < 0.05)
Top Spatial Genes
- MKI67 (I=0.82) - Hotspot pattern in tumor core
- CD8A (I=0.78) - Gradient from margin to stroma
- VIM (I=0.75) - Boundary enrichment at invasive margin
- COL1A1 (I=0.71) - Stromal-specific expression
- EPCAM (I=0.69) - Tumor region pattern
Cell Type Deconvolution
Integration with scRNA-seq reference (Bassez et al. 2021)
Cell Type Spatial Distributions
- Tumor cells: Concentrated in core, sparse at margin
- T cells: Enriched at invasive margin and infiltrate zones
- CAFs: Stromal region and invasive margin
- Macrophages: Scattered, enriched near necrosis
- B cells: Lymphoid aggregates (2% of tissue)
Tumor Microenvironment Composition
- Tumor core: 85% tumor cells, 10% CAFs, 5% immune
- Invasive margin: 45% tumor, 30% CAFs, 25% immune (T cell rich)
- Immune infiltrate: 70% T cells, 20% macrophages, 10% B cells
Spatial Cell Communication
Top L-R Interactions (Spatially Proximal)
- Tumor → T cell: CD274 (PD-L1) → PDCD1 (PD-1)
  - Hotspot: Invasive margin
  - Interpretation: Immune checkpoint evasion
- CAF → Tumor: TGFB1 → TGFBR2
  - Hotspot: Stromal-tumor interface
  - Interpretation: TGF-β-driven EMT
- Macrophage → Tumor: TNF → TNFRSF1A
  - Scattered across tumor
  - Interpretation: Inflammatory signaling
Interaction Zones
- Tumor-Immune Interface: 245 spots (7% of tissue)
  - High expression: CXCL10, CXCL9 (chemokines)
  - T cell recruitment and activation
- Stromal-Tumor Interface: 387 spots (11% of tissue)
  - High expression: MMP2, MMP9 (matrix remodeling)
  - Invasion-promoting niche
Spatial Gradients
Spatial Gradients
- Hypoxia gradient: HIF1A, VEGFA increase toward tumor core
- Proliferation gradient: MKI67, TOP2A decrease from core to margin
- Immune gradient: CD8A, GZMB peak at invasive margin
Biological Interpretation
Spatial analysis reveals distinct tumor microenvironment organization:
- Tumor core: Highly proliferative, hypoxic, immune-excluded
- Invasive margin: Active EMT, high immune infiltration, checkpoint expression
- Stromal barrier: CAF-rich, matrix remodeling, immunosuppressive signals
The invasive margin shows hallmarks of immune-tumor interaction with
PD-L1/PD-1 checkpoint engagement, suggesting potential for checkpoint
blockade therapy. CAF-mediated TGF-β signaling may drive EMT and therapy
resistance at the tumor-stroma interface.
Clinical Relevance
- Checkpoint inhibitor response: High immune infiltration at the margin suggests potential responsiveness
- Resistance mechanisms: CAF barrier and TGF-β signaling
- Biomarkers: Spatial arrangement of immune cells may be more predictive than bulk tumor metrics
Integration with ToolUniverse Skills
| Skill | Used For | Phase |
|---|---|---|
| | scRNA-seq reference for deconvolution | Phase 6 |
| | L-R database for communication | Phase 7 |
| | Pathway enrichment for spatial domains | Phase 3 |
| | Integrate with other omics | Phase 8 |
Example Use Cases
Use Case 1: Tumor Microenvironment Mapping
Question: "Map the spatial organization of tumor, immune, and stromal cells"
Workflow:
- Load Visium data, QC and normalize
- Spatial clustering → 7 domains identified
- Cell type deconvolution using scRNA-seq reference
- Map cell type distributions spatially
- Identify interaction zones (tumor-immune, tumor-stroma)
- Analyze L-R interactions in each zone
- Report: Comprehensive TME spatial architecture
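The interaction-zone step of this workflow can be sketched once deconvolution has produced per-spot cell type fractions. This is a minimal illustration rather than the skill's actual implementation: the function name `find_interaction_zones`, the input layout, the `min_fraction` threshold of 0.2, and the toy fractions are all assumptions made for the example.

```python
def find_interaction_zones(spot_fractions, type_a, type_b, min_fraction=0.2):
    """Return IDs of spots where both cell types reach min_fraction.

    spot_fractions: dict mapping spot ID -> dict of cell type -> fraction
    (a hypothetical layout for deconvolution output).
    """
    zone = []
    for spot_id, fractions in spot_fractions.items():
        has_a = fractions.get(type_a, 0.0) >= min_fraction
        has_b = fractions.get(type_b, 0.0) >= min_fraction
        if has_a and has_b:
            zone.append(spot_id)
    return zone


# Toy fractions for four spots (illustrative values, not real data).
toy_fractions = {
    "spot_1": {"tumor": 0.85, "CAF": 0.10, "immune": 0.05},
    "spot_2": {"tumor": 0.45, "CAF": 0.30, "immune": 0.25},
    "spot_3": {"tumor": 0.05, "CAF": 0.05, "immune": 0.90},
    "spot_4": {"tumor": 0.50, "CAF": 0.40, "immune": 0.10},
}

tumor_immune = find_interaction_zones(toy_fractions, "tumor", "immune")
tumor_stroma = find_interaction_zones(toy_fractions, "tumor", "CAF")
print(tumor_immune)  # spots shared by tumor and immune cells
print(tumor_stroma)  # spots shared by tumor cells and CAFs
```

A co-occurrence threshold is only one way to define an interface. A neighborhood-based definition (spots of one domain adjacent to spots of another) may suit data where each spot is dominated by a single cell type.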
Use Case 2: Developmental Gradient Analysis
Question: "Identify spatial gene expression gradients in developing tissue"
Workflow:
- Load spatial data (e.g., mouse embryo)
- Identify spatially variable genes
- Classify gradient patterns (anterior-posterior, dorsal-ventral)
- Map morphogen expression (WNT, BMP, FGF)
- Correlate with cell fate markers
- Report: Developmental spatial patterns
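One simple way to approach the gradient classification step is to correlate each gene's expression with the spot coordinates along each axis. The sketch below assumes linear gradients only and uses hypothetical names (`pearson`, `classify_gradient`) and a hypothetical cutoff `min_abs_r`. Whether the x or y axis corresponds to anterior-posterior or dorsal-ventral depends on how the section was oriented, so that mapping is left to the analyst.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant input: no linear trend can be measured
    return cov / math.sqrt(var_x * var_y)


def classify_gradient(coords, expression, min_abs_r=0.5):
    """Label a gene by the axis its expression tracks most strongly.

    coords: list of (x, y) spot positions; expression: one value per spot.
    Returns (label, correlation along the chosen axis).
    """
    r_x = pearson([c[0] for c in coords], expression)
    r_y = pearson([c[1] for c in coords], expression)
    axis, r = ("x", r_x) if abs(r_x) >= abs(r_y) else ("y", r_y)
    if abs(r) < min_abs_r:
        return "no clear linear gradient", r
    return f"{axis}-axis gradient", r


# Toy 3x3 grid where expression rises with x only (illustrative data).
grid = [(x, y) for x in range(3) for y in range(3)]
toy_expression = [2.0 * x + 1.0 for x, _ in grid]
label, r = classify_gradient(grid, toy_expression)
print(label, round(r, 3))
```

Patterns that peak in the middle of the tissue, or radial patterns, have near-zero linear correlation and would be missed by this approach, so it complements rather than replaces a spatially variable gene test.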
Use Case 3: Brain Region Identification
Question: "Automatically segment brain tissue into anatomical regions"
Workflow:
- Load Visium mouse brain data
- Spatial clustering with high resolution
- Match domains to known brain regions (cortex, hippocampus, etc.)
- Identify region-specific marker genes
- Validate with Allen Brain Atlas
- Report: Automated brain region annotation
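Matching domains to known regions could be done, for instance, by scoring the overlap between each domain's marker genes and a reference marker list. The sketch below uses Jaccard overlap. The function names, the `min_score` cutoff, and the short marker lists are illustrative assumptions; a real analysis would take reference markers from a curated source such as the Allen Brain Atlas.

```python
def jaccard(a, b):
    """Jaccard index of two gene collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotate_domains(domain_markers, reference_markers, min_score=0.2):
    """Assign each domain the reference region with the best marker overlap.

    Returns a dict: domain -> (region or "unassigned", best score).
    """
    annotations = {}
    for domain, markers in domain_markers.items():
        scores = {
            region: jaccard(markers, ref)
            for region, ref in reference_markers.items()
        }
        best_region = max(scores, key=scores.get)
        best_score = scores[best_region]
        if best_score < min_score:
            best_region = "unassigned"
        annotations[domain] = (best_region, best_score)
    return annotations


# Placeholder marker lists for illustration only; verify before real use.
reference = {
    "cortex": {"Satb2", "Cux2", "Tbr1"},
    "hippocampus": {"Prox1", "Hpca", "Zbtb20"},
}
domains = {
    "domain_0": ["Satb2", "Cux2", "Mbp"],
    "domain_1": ["Prox1", "Hpca"],
    "domain_2": ["Mbp", "Plp1"],
}

annotations = annotate_domains(domains, reference)
for domain, (region, score) in annotations.items():
    print(domain, region, round(score, 2))
```

Domains that fall below the cutoff stay "unassigned" instead of being forced into the nearest region, which keeps weak matches visible for manual review.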
Quantified Minimums
| Component | Requirement |
|---|---|
| Spots/cells | At least 500 spatial locations |
| QC | Filter low-quality spots, verify alignment |
| Spatial clustering | At least one method (graph-based or spatial) |
| Spatial genes | Moran's I or similar spatial test |
| Visualization | Spatial plots on tissue images |
| Report | Domains, spatial genes, visualizations |
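To make the Moran's I requirement concrete, the statistic can be written as I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The sketch below is a minimal pure-Python version that assumes binary weights (1 for spots within `radius` of each other, 0 otherwise). The function name, the radius default, and the toy grids are assumptions for illustration. For real datasets a library routine such as Squidpy's `squidpy.gr.spatial_autocorr` is likely the better choice, since this loop is quadratic in the number of spots and reports no significance.

```python
import math


def morans_i(coords, values, radius=1.0):
    """Moran's I with binary weights: w_ij = 1 if spots i != j lie within radius."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant expression: autocorrelation is undefined
    num = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                weight_sum += 1.0
    if weight_sum == 0:
        return 0.0  # no neighboring pairs at this radius
    return (n / weight_sum) * (num / denom)


# Toy 4x4 grid of spots with unit spacing (illustrative data).
grid = [(x, y) for x in range(4) for y in range(4)]
clustered = [1.0 if x < 2 else 0.0 for x, _ in grid]      # left half "on"
checkerboard = [float((x + y) % 2) for x, y in grid]      # alternating spots

i_clustered = morans_i(grid, clustered)
i_checker = morans_i(grid, checkerboard)
print(round(i_clustered, 3), round(i_checker, 3))
```

Values near +1 indicate that neighboring spots have similar expression, values near -1 indicate alternation between neighbors, and values near 0 indicate no spatial structure at the chosen radius.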
Limitations
- Resolution: Visium spots contain multiple cells (not single-cell)
- Gene coverage: Imaging methods have limited gene panels
- 3D structure: Most platforms are 2D sections
- Tissue quality: Requires well-preserved tissue for imaging
- Computational: Large datasets require significant memory
- Reference dependency: Deconvolution quality depends on scRNA-seq reference
References
Methods:
- Squidpy: https://doi.org/10.1038/s41592-021-01358-2
- Cell2location: https://doi.org/10.1038/s41587-021-01139-4
- SpatialDE: https://doi.org/10.1038/nmeth.4636
Platforms: