tooluniverse-spatial-transcriptomics

Spatial Transcriptomics Analysis

Comprehensive analysis of spatially resolved transcriptomics data to understand gene expression patterns in the context of tissue architecture. Combines expression profiling with spatial coordinates to reveal tissue organization, cell-cell interactions, and spatially variable genes.

When to Use This Skill

Triggers:
  • User has spatial transcriptomics data (Visium, MERFISH, seqFISH, etc.)
  • Questions about tissue architecture or spatial organization
  • Spatial gene expression pattern analysis
  • Cell-cell proximity or neighborhood analysis requests
  • Tumor microenvironment spatial structure questions
  • Integration of spatial with single-cell data
  • Spatial domain identification
  • Tissue morphology correlation with expression
Example Questions This Skill Solves:
  1. "Analyze this 10x Visium dataset to identify spatial domains"
  2. "Which genes show spatially variable expression in this tissue?"
  3. "Map the tumor microenvironment spatial organization"
  4. "Find genes enriched at tissue boundaries"
  5. "Identify cell-cell interactions based on spatial proximity"
  6. "Integrate spatial transcriptomics with scRNA-seq annotations"
  7. "Characterize spatial gradients in gene expression"
  8. "Map ligand-receptor pairs in tissue context"

Core Capabilities

| Capability | Description |
|---|---|
| Data Import | 10x Visium, MERFISH, seqFISH, Slide-seq, STARmap, Xenium formats |
| Quality Control | Spot/cell QC, spatial alignment verification, tissue coverage |
| Normalization | Spatial-aware normalization accounting for tissue heterogeneity |
| Spatial Clustering | Identify spatial domains with similar expression profiles |
| Spatial Variable Genes | Find genes with non-random spatial patterns |
| Neighborhood Analysis | Cell-cell proximity, spatial neighborhoods, niche identification |
| Spatial Patterns | Gradients, boundaries, hotspots, expression waves |
| Integration | Merge with scRNA-seq for cell type mapping |
| Ligand-Receptor Spatial | Map cell communication in tissue context |
| Visualization | Spatial plots, heatmaps on tissue, 3D reconstruction |

Workflow Overview

Input: Spatial Transcriptomics Data + Tissue Image
    |
    v
Phase 1: Data Import & QC
    |-- Load spatial coordinates + expression matrix
    |-- Load tissue histology image
    |-- Quality control per spot/cell
    |-- Filter low-quality spots
    |-- Align spatial coordinates to tissue
    |
    v
Phase 2: Preprocessing
    |-- Normalization (spatial-aware methods)
    |-- Highly variable gene selection
    |-- Dimensionality reduction (PCA)
    |-- Spatial lag smoothing (optional)
    |
    v
Phase 3: Spatial Clustering
    |-- Identify spatial domains/regions
    |-- Graph-based clustering with spatial constraints
    |-- Annotate domains with marker genes
    |-- Visualize domains on tissue
    |
    v
Phase 4: Spatial Variable Genes
    |-- Test for spatial autocorrelation (Moran's I, Geary's C)
    |-- Identify genes with spatial patterns
    |-- Classify pattern types (gradient, hotspot, boundary)
    |-- Rank by spatial significance
    |
    v
Phase 5: Neighborhood Analysis
    |-- Define spatial neighborhoods (k-NN, radius)
    |-- Calculate neighborhood composition
    |-- Identify interaction zones
    |-- Niche characterization
    |
    v
Phase 6: Integration with scRNA-seq
    |-- Cell type deconvolution per spot
    |-- Map cell types to spatial locations
    |-- Predict cell type spatial distributions
    |-- Validate with marker genes
    |
    v
Phase 7: Spatial Cell Communication
    |-- Identify proximal cell type pairs
    |-- Query ligand-receptor database (OmniPath)
    |-- Score spatial interactions
    |-- Map communication hotspots
    |
    v
Phase 8: Generate Spatial Report
    |-- Tissue overview with domains
    |-- Spatially variable genes
    |-- Cell type spatial maps
    |-- Interaction networks in tissue context
    |-- 3D visualization (if applicable)

Phase Details

Phase 1: Data Import & Quality Control

Objective: Load spatial data and assess quality.
Supported platforms:
10x Visium (most common):
  • Spots: 55μm diameter, typically ~1-10 cells per spot depending on tissue
  • Resolution: ~5,000 spots per standard 6.5 mm capture area
  • Data: Expression matrix + spatial coordinates + H&E image
MERFISH/seqFISH (imaging-based):
  • Single-cell resolution
  • Targeted gene panels (100-10,000 genes)
  • Absolute coordinates per cell
Slide-seq/Slide-seqV2:
  • 10μm bead resolution
  • Genome-wide profiling
Xenium (10x single-cell spatial):
  • Single-cell resolution
  • Large gene panels (300+ genes)
  • Subcellular resolution
Data loading (Visium):
python
def load_visium_data(data_dir):
    """
    Load 10x Visium spatial transcriptomics data (Space Ranger output).

    Expected structure (sc.read_visium defaults):
    data_dir/
      ├── filtered_feature_bc_matrix.h5
      ├── spatial/
      │   ├── tissue_positions_list.csv   (tissue_positions.csv in newer Space Ranger)
      │   ├── scalefactors_json.json
      │   └── tissue_hires_image.png

    Returns: AnnData object with spatial coordinates
    """
    import scanpy as sc

    # Load expression matrix, spot coordinates, and tissue images
    adata = sc.read_visium(data_dir)
    adata.var_names_make_unique()  # gene symbols can repeat

    # Spatial coordinates are in adata.obsm['spatial']
    # Images and scale factors are in adata.uns['spatial'][library_id]

    return adata
Quality Control:
  1. Spot-level QC:
python
def spatial_qc(adata, min_genes=200, min_counts=500, max_pct_mt=20):
    """
    Quality control for spatial transcriptomics data.

    Default thresholds are starting points; tune them per tissue and platform.
    """
    import scanpy as sc

    # Flag mitochondrial genes ('MT-' for human, 'mt-' for mouse)
    adata.var['mt'] = adata.var_names.str.upper().str.startswith('MT-')

    # Calculate QC metrics once, including percent mitochondrial counts
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)

    # Visualize QC metrics spatially
    sc.pl.spatial(adata, color='n_genes_by_counts', title='Genes per Spot')
    sc.pl.spatial(adata, color='total_counts', title='UMI Counts per Spot')

    # Filter spots: min genes, min UMI counts, max mitochondrial content
    sc.pp.filter_cells(adata, min_genes=min_genes)
    sc.pp.filter_cells(adata, min_counts=min_counts)
    adata = adata[adata.obs['pct_counts_mt'] < max_pct_mt].copy()

    return adata
  2. Spatial alignment verification:
python
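def verify_spatial_alignment(adata, library_id=None):
    """
    Verify spatial coordinates align with tissue image.
    """
    import matplotlib.pyplot as plt

    # sc.read_visium stores images and scale factors per library
    if library_id is None:
        library_id = next(iter(adata.uns['spatial']))
    spatial = adata.uns['spatial'][library_id]

    fig, ax = plt.subplots(figsize=(10, 10))

    # Tissue image (high-resolution version)
    ax.imshow(spatial['images']['hires'])

    # Spot coordinates are in full-resolution pixels; rescale to the hires image
    scale = spatial['scalefactors']['tissue_hires_scalef']
    coords = adata.obsm['spatial'] * scale
    ax.scatter(coords[:, 0], coords[:, 1], c='red', s=1, alpha=0.5)

    ax.set_title('Spatial Alignment Verification')
    ax.axis('off')

    return fig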

Phase 2: Preprocessing & Normalization

Objective: Normalize data accounting for spatial heterogeneity.
Normalization:
python
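def normalize_spatial(adata):
    """
    Normalize spatial transcriptomics data.
    """
    import scanpy as sc

    # Filter genes (detected in at least 3 spots)
    sc.pp.filter_genes(adata, min_cells=3)

    # Keep raw counts; count-based tools (e.g. cell2location) need them
    adata.layers['counts'] = adata.X.copy()

    # Scale each spot to 10,000 total counts
    sc.pp.normalize_total(adata, target_sum=1e4)

    # Log-transform
    sc.pp.log1p(adata)

    # Freeze the log-normalized matrix (all genes) for later DE and plotting
    adata.raw = adata

    return adata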
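
To make the arithmetic behind this step concrete, here is a minimal plain-Python sketch of what total-count scaling followed by log1p does to a single spot. The function name `normalize_log1p` and the toy counts are illustrative assumptions, not part of scanpy; in practice `sc.pp.normalize_total` and `sc.pp.log1p` do this on the whole matrix.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale one spot's counts to `target_sum` total, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]


# Two toy spots with the same composition but a 10x difference in depth
spot_a = [10, 30, 60]       # 100 UMIs in total
spot_b = [100, 300, 600]    # 1,000 UMIs in total

norm_a = normalize_log1p(spot_a)
norm_b = normalize_log1p(spot_b)

# After normalization the two spots have the same profile
print([round(v, 3) for v in norm_a])
print([round(v, 3) for v in norm_b])
```

The point of the example is that sequencing depth is removed while relative composition is kept, which is why raw counts should be saved separately before this step.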
Highly variable genes:
python
def select_hvg_spatial(adata):
    """
    Select highly variable genes for spatial analysis.
    """
    import scanpy as sc

    # Standard HVG selection
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)

    # Optionally: weight by spatial autocorrelation
    # Genes with spatial patterns are more informative

    return adata
Spatial smoothing (optional):
python
def spatial_smooth(adata, n_neighbors=6):
    """
    Smooth expression by averaging over spatial neighbors.

    Each spot is replaced by the mean of itself and its nearest spots
    (n_neighbors in total). Useful for noisy data, but can blur boundaries.
    """
    import numpy as np
    from scipy import sparse
    from sklearn.neighbors import kneighbors_graph

    # Binary k-NN graph on spot coordinates; each spot counts as its own neighbor
    coords = adata.obsm['spatial']
    W = kneighbors_graph(coords, n_neighbors=n_neighbors,
                         mode='connectivity', include_self=True)

    # Row-normalize so that W @ X is a neighborhood average
    row_sums = np.asarray(W.sum(axis=1)).ravel()
    W = sparse.diags(1.0 / row_sums) @ W

    # Works for sparse and dense adata.X without a Python loop
    adata.layers['smoothed'] = W @ adata.X

    return adata

Phase 3: Spatial Clustering

Objective: Identify spatial domains (regions with distinct expression).
Graph-based clustering with spatial constraints:
python
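def spatial_clustering(adata, n_neighbors=6, alpha=0.2):
    """
    Cluster spots into spatial domains.

    Uses both expression similarity AND spatial proximity by running Leiden
    on a weighted sum of the expression graph and the spatial graph.
    alpha sets the weight of the spatial graph (0 = expression only).
    """
    import scanpy as sc
    import squidpy as sq

    # PCA and expression-based neighbor graph
    sc.pp.pca(adata, n_comps=50)
    sc.pp.neighbors(adata, n_pcs=50)

    # Spatial neighbor graph (stored in adata.obsp['spatial_connectivities'])
    sq.gr.spatial_neighbors(adata, coord_type='generic', n_neighs=n_neighbors)

    # Combine both graphs and cluster on the joint adjacency
    joint = ((1 - alpha) * adata.obsp['connectivities']
             + alpha * adata.obsp['spatial_connectivities'])
    sc.tl.leiden(adata, adjacency=joint, resolution=1.0, key_added='spatial_domain')

    # Visualize domains on tissue
    sc.pl.spatial(adata, color='spatial_domain', title='Spatial Domains')

    return adata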
Domain marker genes:
python
def find_domain_markers(adata):
    """
    Identify marker genes for each spatial domain.
    """
    import scanpy as sc

    # Differential expression per domain
    sc.tl.rank_genes_groups(adata, groupby='spatial_domain', method='wilcoxon')

    # Get top markers per domain
    markers = sc.get.rank_genes_groups_df(adata, group=None)

    return markers

Phase 4: Spatially Variable Genes

Objective: Find genes with non-random spatial patterns.
Moran's I (spatial autocorrelation):
python
def identify_spatial_genes(adata):
    """
    Test for spatial autocorrelation using Moran's I.

    Moran's I > 0: Positive spatial autocorrelation (clustering)
    Moran's I ~ 0: Random spatial distribution
    Moran's I < 0: Negative autocorrelation (checkerboard)
    """
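    import squidpy as sq

    # Moran's I needs a spatial neighbor graph
    if 'spatial_connectivities' not in adata.obsp:
        sq.gr.spatial_neighbors(adata, coord_type='generic', n_neighs=6)

    # Calculate Moran's I. With genes=None squidpy tests the genes flagged in
    # adata.var['highly_variable'] if that column exists, otherwise all genes.
    sq.gr.spatial_autocorr(
        adata,
        mode='moran',
        n_perms=100,
        n_jobs=-1
    )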

    # Results in adata.uns['moranI']
    spatial_genes = adata.uns['moranI'].sort_values('I', ascending=False)

    # Filter significant spatial genes (FDR < 0.05)
    sig_spatial = spatial_genes[spatial_genes['pval_norm_fdr_bh'] < 0.05]

    return sig_spatial
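
The interpretation of Moran's I given in the docstring can be checked by hand on a toy grid. The sketch below is a plain-Python illustration with binary neighbor weights, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2; the helper names `morans_i` and `grid_neighbors` are assumptions made for this example and are not squidpy functions.

```python
def morans_i(values, neighbors):
    """Moran's I for one gene with binary weights (w_ij = 1 for neighbors).

    values    : list of expression values, one per spot
    neighbors : dict mapping spot index -> list of neighboring spot indices
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("expression is constant; Moran's I is undefined")
    w_sum = sum(len(nbrs) for nbrs in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / w_sum) * (cross / denom)


def grid_neighbors(n_rows, n_cols):
    """Rook (4-connected) neighbors on a regular grid, indexed row-major."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            idx = r * n_cols + c
            nbrs[idx] = [
                rr * n_cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return nbrs


nbrs = grid_neighbors(4, 4)
clustered = [1.0 if c < 2 else 0.0 for r in range(4) for c in range(4)]  # left half on
checker = [float((r + c) % 2) for r in range(4) for c in range(4)]       # alternating

i_clustered = morans_i(clustered, nbrs)
i_checker = morans_i(checker, nbrs)
print(f"clustered: {i_clustered:.2f}, checkerboard: {i_checker:.2f}")
# prints "clustered: 0.67, checkerboard: -1.00"
```

The clustered pattern gives a clearly positive value and the checkerboard gives the minimum of -1, matching the three cases listed in the docstring.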
Spatial pattern classification:
python
def classify_spatial_patterns(adata, spatial_genes):
    """
    Classify types of spatial patterns.

    Pattern types:
    - Gradient: Smooth directional change
    - Hotspot: Localized high expression
    - Boundary: Expression at domain edges
    - Periodic: Regular spacing
    """
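    import numpy as np
    from scipy import sparse

    patterns = {}
    coords = adata.obsm['spatial']

    for gene in spatial_genes.index[:100]:  # Top 100 spatial genes
        # Get expression as a flat array (X may be sparse or dense)
        x = adata[:, gene].X
        expr = (x.toarray() if sparse.issparse(x) else np.asarray(x)).ravel()

        # Detect pattern type.
        # detect_pattern_type(expr, coords) -> str is a helper you must provide.
        pattern_type = detect_pattern_type(expr, coords)
        patterns[gene] = pattern_type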

    return patterns
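
`detect_pattern_type` is called above but not defined in this document. One possible shape for it is sketched below as a plain-Python heuristic. Everything in it is an assumption for illustration: the thresholds `gradient_r` and `hotspot_frac`, the two rules, and the fallback label `'other'`. It handles only the gradient and hotspot cases; boundary and periodic patterns need extra context (for example domain labels) and are not covered.

```python
import math


def _pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def detect_pattern_type(expr, coords, gradient_r=0.7, hotspot_frac=0.2):
    """Hypothetical heuristic, not a published classifier.

    'gradient': expression correlates strongly with the x or y coordinate
    'hotspot' : at most `hotspot_frac` of spots carry half of the total signal
    'other'   : anything else
    """
    expr = [float(v) for v in expr]
    xs = [float(c[0]) for c in coords]
    ys = [float(c[1]) for c in coords]

    if max(abs(_pearson(expr, xs)), abs(_pearson(expr, ys))) >= gradient_r:
        return "gradient"

    total = sum(expr)
    if total > 0:
        running, n_top = 0.0, 0
        for v in sorted(expr, reverse=True):
            running += v
            n_top += 1
            if running >= 0.5 * total:
                break
        if n_top / len(expr) <= hotspot_frac:
            return "hotspot"

    return "other"


# Toy 5x5 grid of spots
coords = [(x, y) for y in range(5) for x in range(5)]
gradient_expr = [x for x, y in coords]                               # rises left to right
hotspot_expr = [10.0 if (x, y) == (2, 2) else 0.1 for x, y in coords]
flat_expr = [1.0 for _ in coords]

print(detect_pattern_type(gradient_expr, coords))   # gradient
print(detect_pattern_type(hotspot_expr, coords))    # hotspot
print(detect_pattern_type(flat_expr, coords))       # other
```

Because it only indexes `coords[i][0]` and `coords[i][1]`, the function accepts the NumPy arrays produced in `classify_spatial_patterns` as well as plain lists.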

Phase 5: Neighborhood Analysis

Objective: Analyze cell-cell proximity and spatial niches.
Define spatial neighborhoods:
python
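def analyze_neighborhoods(adata, radius=150):
    """
    Analyze spatial neighborhood composition.

    For each spot, characterize its microenvironment.
    radius is in the units of adata.obsm['spatial'] (pixels for Visium).
    """
    import squidpy as sq

    # Build a radius-based spatial graph
    sq.gr.spatial_neighbors(adata, coord_type='generic', radius=radius)

    # Neighborhood enrichment: permutation test of whether pairs of
    # domains are neighbors more often than expected by chance
    sq.gr.nhood_enrichment(adata, cluster_key='spatial_domain')

    # Visualize neighborhood enrichment
    sq.pl.nhood_enrichment(adata, cluster_key='spatial_domain')

    # Z-scores are stored in adata.uns['spatial_domain_nhood_enrichment']

    return adata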
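
The workflow step "Calculate neighborhood composition" can be illustrated without any library. The following plain-Python sketch computes, for each spot, the fraction of each label among the spots within a given radius. The function name `neighborhood_composition` and the toy coordinates are assumptions for this example; the pairwise loop is quadratic in the number of spots, so a spatial index or the squidpy graph would be preferable on real data.

```python
import math
from collections import Counter


def neighborhood_composition(coords, labels, radius):
    """For each spot, the fraction of each label among spots within `radius`.

    coords : list of (x, y) tuples, in the same units as `radius`
    labels : list of domain or cell-type labels, one per spot
    The spot itself is excluded from its own neighborhood.
    """
    compositions = []
    for i, (xi, yi) in enumerate(coords):
        counts = Counter(
            labels[j]
            for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        )
        total = sum(counts.values())
        compositions.append(
            {lab: n / total for lab, n in counts.items()} if total else {}
        )
    return compositions


# Toy example: a row of five spots, 100 units apart
coords = [(0, 0), (100, 0), (200, 0), (300, 0), (400, 0)]
labels = ["tumor", "tumor", "immune", "stroma", "stroma"]

comp = neighborhood_composition(coords, labels, radius=150)
print(comp[2])   # {'tumor': 0.5, 'stroma': 0.5}
```

Here the single immune spot sits between a tumor spot and a stroma spot, so its neighborhood is half tumor and half stroma, which is the kind of mixed composition that marks an interface.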
Interaction zones:
python
def identify_interaction_zones(adata, domain_a, domain_b, n_neighbors=6):
    """
    Find boundary regions between two spatial domains.

    These are candidate hotspots for cell-cell interactions.
    """
    from sklearn.neighbors import NearestNeighbors

    # Boolean masks as NumPy arrays so they can be indexed by position
    in_a = (adata.obs['spatial_domain'] == domain_a).to_numpy()
    in_b = (adata.obs['spatial_domain'] == domain_b).to_numpy()

    # Nearest spatial neighbors; ask for one extra because each spot
    # is returned as its own nearest neighbor
    coords = adata.obsm['spatial']
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1)
    nn.fit(coords)
    _, indices = nn.kneighbors(coords)

    # Spots from A with at least one neighbor in B
    has_b_neighbor = in_b[indices[:, 1:]].any(axis=1)

    # Mark interaction zone
    adata.obs['interaction_zone'] = in_a & has_b_neighbor

    return adata

Phase 6: Integration with Single-Cell RNA-seq

Objective: Map cell types from scRNA-seq to spatial locations.
Cell type deconvolution:
python
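def deconvolve_cell_types(adata_spatial, adata_sc):
    """
    Predict cell type composition per spatial spot.

    Uses scRNA-seq reference to deconvolve Visium spots.
    Methods: Cell2location, Tangram, SPOTlight
    adata_spatial.X must hold raw counts.
    """
    import cell2location

    # Prepare single-cell reference: genes x cell types signature DataFrame.
    # extract_signatures is a user-supplied helper (not part of cell2location);
    # the cell2location tutorial derives signatures with its RegressionModel.
    cell_type_signatures = extract_signatures(adata_sc)

    # Restrict both objects to shared genes
    shared = adata_spatial.var_names.intersection(cell_type_signatures.index)
    adata_spatial = adata_spatial[:, shared].copy()
    cell_type_signatures = cell_type_signatures.loc[shared]

    # Run cell2location (API as in its tutorial; verify against your version)
    cell2location.models.Cell2location.setup_anndata(adata=adata_spatial)
    mod = cell2location.models.Cell2location(
        adata_spatial,
        cell_state_df=cell_type_signatures
    )
    mod.train(max_epochs=30000)

    # Export posterior abundances and convert them to per-spot fractions
    adata_spatial = mod.export_posterior(adata_spatial)
    abundance = adata_spatial.obsm['q05_cell_abundance_w_sf'].copy()
    abundance.columns = adata_spatial.uns['mod']['factor_names']
    adata_spatial.obsm['cell_type_fractions'] = abundance.div(
        abundance.sum(axis=1), axis=0
    )

    return adata_spatial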
Spatial cell type mapping:
python
def map_cell_types_spatial(adata):
    """
    Visualize cell type spatial distributions.
    """
    import scanpy as sc

    # For each cell type, plot abundance on tissue
    fractions = adata.obsm['cell_type_fractions']

    for ct in fractions.columns:
        # sc.pl.spatial colors by obs/var keys, so copy the column into obs
        key = f'frac_{ct}'
        adata.obs[key] = fractions[ct].to_numpy()
        sc.pl.spatial(adata, color=key, title=f'{ct} Spatial Distribution')

Phase 7: Spatial Cell Communication

Objective: Map ligand-receptor interactions in tissue context.
Spatial proximity-based communication:
python
def spatial_cell_communication(adata):
    """
    Identify cell-cell communication based on spatial proximity.

    Requires:
    - Cell type annotations (from deconvolution)
    - Ligand-receptor database (OmniPath)
    """
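    import squidpy as sq
    from tooluniverse import ToolUniverse

    tu = ToolUniverse()
    tu.load_tools()  # tools must be loaded before they can be called

    # Get ligand-receptor pairs from OmniPath
    lr_pairs = tu.run_one_function({
        "name": "OmniPath_get_ligand_receptor_interactions",
        "arguments": {"partners": ""}  # Get all pairs
    })
    # sq.gr.ligrec expects `interactions` as a DataFrame with 'source' and
    # 'target' gene-symbol columns; reshape the tool output to that form first.

    # Permutation test of L-R co-expression between cell type pairs.
    # The test itself ignores coordinates, so interpret hits only for
    # cell type pairs found to be spatially proximal in Phase 5.
    sq.gr.ligrec(
        adata,
        n_perms=100,
        cluster_key='cell_type',
        interactions=lr_pairs,
        copy=False
    )

    # Visualize significant interactions
    sq.pl.ligrec(adata, cluster_key='cell_type')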

    return adata
Communication hotspot mapping:
python
def map_communication_hotspots(adata, ligand, receptor):
    """
    Map spatial locations of specific L-R interactions.
    """
    import numpy as np
    import scanpy as sc
    from scipy import sparse

    def gene_vector(gene):
        # Expression of one gene as a flat array (X may be sparse or dense)
        x = adata[:, gene].X
        return (x.toarray() if sparse.issparse(x) else np.asarray(x)).ravel()

    # Interaction score = ligand × receptor within the same spot
    interaction_score = gene_vector(ligand) * gene_vector(receptor)

    # Add to adata
    key = f'{ligand}_{receptor}_score'
    adata.obs[key] = interaction_score

    # Visualize on tissue
    sc.pl.spatial(adata, color=key,
                  title=f'{ligand}-{receptor} Interaction Hotspots')

    return adata
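
The ligand × receptor product above scores only spots where both genes are expressed together. For single-cell platforms, or when sender and receiver sit in adjacent spots, a neighbor-aware variant may be more informative. The plain-Python sketch below is one such variant and is an assumption of this example rather than part of the original method: it multiplies the ligand in a spot by the mean receptor expression of its spatial neighbors. The name `neighbor_lr_score` and the toy data are hypothetical.

```python
def neighbor_lr_score(ligand_expr, receptor_expr, neighbors):
    """Ligand in a spot times mean receptor expression in its neighbors.

    neighbors : dict mapping spot index -> list of neighboring spot indices
    """
    scores = []
    for i, lig in enumerate(ligand_expr):
        nbrs = neighbors.get(i, [])
        mean_rec = sum(receptor_expr[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        scores.append(lig * mean_rec)
    return scores


# Three spots in a line: 0 - 1 - 2
neighbors = {0: [1], 1: [0, 2], 2: [1]}
ligand = [2.0, 0.0, 0.0]      # ligand only in spot 0
receptor = [0.0, 3.0, 0.0]    # receptor only in spot 1

same_spot = [l * r for l, r in zip(ligand, receptor)]
paracrine = neighbor_lr_score(ligand, receptor, neighbors)

print(same_spot)   # [0.0, 0.0, 0.0]
print(paracrine)   # [6.0, 0.0, 0.0]
```

In this toy case the same-spot product misses the interaction entirely, while the neighbor-aware score flags spot 0 as a sender next to a receptor-expressing spot.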

Phase 8: Spatial Report Generation

Generate comprehensive spatial report:

Spatial Transcriptomics Analysis Report

Dataset Summary

  • Platform: 10x Visium
  • Tissue: Breast cancer tumor section
  • Spots: 3,562 (after QC filtering)
  • Genes: 18,432 detected
  • Resolution: 55μm spot diameter (~1-10 cells/spot)

Quality Control

  • Mean genes per spot: 3,245
  • Mean UMI counts: 12,543
  • Mitochondrial content: 8.2% average
  • Tissue coverage: 85% of capture area

Spatial Domains Identified

  • 7 distinct spatial domains detected via graph-based clustering
    • Domain 1: Tumor core (32% of tissue)
    • Domain 2: Invasive margin (18%)
    • Domain 3: Stromal region (25%)
    • Domain 4: Immune infiltrate (12%)
    • Domain 5: Necrotic region (8%)
    • Domain 6: Normal epithelium (3%)
    • Domain 7: Adipose tissue (2%)

Top Marker Genes per Domain

Domain 1 (Tumor Core)

  • EPCAM, KRT19, MKI67, CCNB1, TOP2A (proliferative tumor)

Domain 2 (Invasive Margin)

  • VIM, FN1, MMP2, SNAI2 (EMT signature)

Domain 4 (Immune Infiltrate)

  • CD3D, CD8A, CD4, PTPRC (T cell enriched)
  • CD68, CD14 (macrophage enriched)

Spatially Variable Genes

  • 456 genes with significant spatial patterns (Moran's I, FDR < 0.05)

Top 5 Spatial Genes

  1. MKI67 (I=0.82) - Hotspot pattern in tumor core
  2. CD8A (I=0.78) - Gradient from margin to stroma
  3. VIM (I=0.75) - Boundary enrichment at invasive margin
  4. COL1A1 (I=0.71) - Stromal-specific expression
  5. EPCAM (I=0.69) - Tumor region pattern
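
For readers unfamiliar with the statistic, the sketch below computes global Moran's I with binary neighbor weights (spots within `radius` of each other count as neighbors). It is an illustration under stated assumptions, not the procedure behind the values above: the square toy grid, the `radius` parameter, and the function name `morans_i` are all hypothetical, and real Visium spots sit on a hexagonal lattice.

```python
def morans_i(values, coords, radius=1.0):
    """Global Moran's I with binary distance-threshold weights.

    I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    """
    n = len(values)
    mean_x = sum(values) / n
    dev = [v - mean_x for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant expression: no spatial pattern to measure
    r2 = radius * radius
    num, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if dx * dx + dy * dy <= r2:  # w_ij = 1 for neighbors, else 0
                num += dev[i] * dev[j]
                w_sum += 1.0
    if w_sum == 0:
        return 0.0  # no neighbors at this radius
    return (n / w_sum) * (num / denom)

# Toy 4 x 4 grid: a contiguous block of expression vs. a checkerboard.
coords = [(x, y) for y in range(4) for x in range(4)]
block = [1.0 if x < 2 else 0.0 for (x, y) in coords]
checker = [float((x + y) % 2) for (x, y) in coords]

i_block = morans_i(block, coords)      # clustered pattern, positive I
i_checker = morans_i(checker, coords)  # alternating pattern, negative I
print(round(i_block, 3), round(i_checker, 3))
```

Values near +1 indicate that neighboring spots share similar expression, values near 0 indicate no spatial structure, and negative values indicate alternation. Significance (the FDR threshold quoted above) would additionally require a permutation or analytical test, which this sketch omits.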

Cell Type Deconvolution

Integration with scRNA-seq reference (Bassez et al. 2021)

Cell Type Spatial Distributions

  • Tumor cells: Concentrated in core, sparse at margin
  • T cells: Enriched at invasive margin and infiltrate zones
  • CAFs: Stromal region and invasive margin
  • Macrophages: Scattered, enriched near necrosis
  • B cells: Lymphoid aggregates (2% of tissue)

Tumor Microenvironment Composition

  • Tumor core: 85% tumor cells, 10% CAFs, 5% immune
  • Invasive margin: 45% tumor, 30% CAFs, 25% immune (T cell rich)
  • Immune infiltrate: 70% T cells, 20% macrophages, 10% B cells

Spatial Cell Communication

Top L-R Interactions (Spatially Proximal)

  1. Tumor → T cell: CD274 (PD-L1) → PDCD1 (PD-1)
    • Hotspot: Invasive margin
    • Interpretation: Immune checkpoint evasion
  2. CAF → Tumor: TGFB1 → TGFBR2
    • Hotspot: Stromal-tumor interface
    • Interpretation: TGF-β-driven EMT
  3. Macrophage → Tumor: TNF → TNFRSF1A
    • Scattered across tumor
    • Interpretation: Inflammatory signaling

Interaction Zones

  • Tumor-Immune Interface: 245 spots (7% of tissue)
    • High expression: CXCL10, CXCL9 (chemokines)
    • T cell recruitment and activation
  • Stromal-Tumor Interface: 387 spots (11% of tissue)
    • High expression: MMP2, MMP9 (matrix remodeling)
    • Invasion-promoting niche
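
One plausible way to define an interaction zone, sketched below, is to collect every spot from one domain that has at least one spot from the other domain within a fixed distance. The function `interface_spots`, the `radius` parameter, and the toy labels are assumptions for illustration; the report does not specify how its interface spots were selected.

```python
def interface_spots(coords, labels, label_a, label_b, radius=1.0):
    """Return indices of spots labelled a or b that touch the other label.

    Hypothetical helper: 'touch' means Euclidean distance <= radius.
    """
    r2 = radius * radius
    hits = []
    for i, (xi, yi) in enumerate(coords):
        if labels[i] not in (label_a, label_b):
            continue
        other = label_b if labels[i] == label_a else label_a
        for j, (xj, yj) in enumerate(coords):
            if labels[j] == other and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                hits.append(i)
                break  # one neighbor of the other domain is enough
    return hits

# Toy strip of six spots: tumor, tumor, tumor, immune, immune, stroma.
coords = [(x, 0) for x in range(6)]
labels = ["tumor", "tumor", "tumor", "immune", "immune", "stroma"]

zone = interface_spots(coords, labels, "tumor", "immune")
fraction = len(zone) / len(coords)
print(zone, f"{100 * fraction:.0f}% of spots")
```

The zone size reported as a percentage of tissue then follows directly from `len(zone) / len(coords)`; marker genes for the zone can be found by comparing these spots against the rest.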

Spatial Gradients

  • Hypoxia gradient: HIF1A, VEGFA increase toward tumor core
  • Proliferation gradient: MKI67, TOP2A decrease from core to margin
  • Immune gradient: CD8A, GZMB peak at invasive margin

Biological Interpretation

Spatial analysis reveals distinct tumor microenvironment organization:
  1. Tumor core: Highly proliferative, hypoxic, immune-excluded
  2. Invasive margin: Active EMT, high immune infiltration, checkpoint expression
  3. Stromal barrier: CAF-rich, matrix remodeling, immunosuppressive signals
The invasive margin shows hallmarks of immune-tumor interaction with PD-L1/PD-1 checkpoint engagement, suggesting potential for checkpoint blockade therapy. CAF-mediated TGF-β signaling may drive EMT and therapy resistance at the tumor-stroma interface.

Clinical Relevance

  • Checkpoint inhibitor response: High immune infiltration at the invasive margin suggests potential sensitivity to checkpoint blockade
  • Resistance mechanisms: CAF barrier and TGF-β signaling
  • Biomarkers: Spatial arrangement of immune cells may be more informative than bulk tumor metrics

---

Integration with ToolUniverse Skills

| Skill | Used For | Phase |
|---|---|---|
| tooluniverse-single-cell | scRNA-seq reference for deconvolution | Phase 6 |
| tooluniverse-single-cell (Phase 10) | L-R database for communication | Phase 7 |
| tooluniverse-gene-enrichment | Pathway enrichment for spatial domains | Phase 3 |
| tooluniverse-multi-omics-integration | Integrate with other omics | Phase 8 |

Example Use Cases

Use Case 1: Tumor Microenvironment Mapping

Question: "Map the spatial organization of tumor, immune, and stromal cells"
Workflow:
  1. Load Visium data, QC and normalize
  2. Spatial clustering → 7 domains identified
  3. Cell type deconvolution using scRNA-seq reference
  4. Map cell type distributions spatially
  5. Identify interaction zones (tumor-immune, tumor-stroma)
  6. Analyze L-R interactions in each zone
  7. Report: Comprehensive TME spatial architecture

Use Case 2: Developmental Gradient Analysis

Question: "Identify spatial gene expression gradients in developing tissue"
Workflow:
  1. Load spatial data (e.g., mouse embryo)
  2. Identify spatially variable genes
  3. Classify gradient patterns (anterior-posterior, dorsal-ventral)
  4. Map morphogen expression (WNT, BMP, FGF)
  5. Correlate with cell fate markers
  6. Report: Developmental spatial patterns

Use Case 3: Brain Region Identification

Question: "Automatically segment brain tissue into anatomical regions"
Workflow:
  1. Load Visium mouse brain data
  2. Spatial clustering with high resolution
  3. Match domains to known brain regions (cortex, hippocampus, etc.)
  4. Identify region-specific marker genes
  5. Validate with Allen Brain Atlas
  6. Report: Automated brain region annotation

Quantified Minimums

| Component | Requirement |
|---|---|
| Spots/cells | At least 500 spatial locations |
| QC | Filter low-quality spots, verify alignment |
| Spatial clustering | At least one method (graph-based or spatial) |
| Spatial genes | Moran's I or similar spatial test |
| Visualization | Spatial plots on tissue images |
| Report | Domains, spatial genes, visualizations |
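
The minimums above can be expressed as a simple pre-report check. The sketch below is one hypothetical encoding: the function `unmet_minimums`, its parameters, and the section keys are illustrative names, not part of the skill's interface.

```python
# Report sections required by the table (keys are illustrative names).
REQUIRED_REPORT_SECTIONS = {"domains", "spatial_genes", "visualizations"}

def unmet_minimums(n_locations, qc_filtered, alignment_verified,
                   clustering_methods, spatial_tests, has_tissue_plots,
                   report_sections):
    """Return a list of unmet requirements; an empty list means all pass."""
    problems = []
    if n_locations < 500:
        problems.append("Spots/cells: fewer than 500 spatial locations")
    if not (qc_filtered and alignment_verified):
        problems.append("QC: filtering or alignment check missing")
    if len(clustering_methods) < 1:
        problems.append("Spatial clustering: no method run")
    if len(spatial_tests) < 1:
        problems.append("Spatial genes: no spatial test run")
    if not has_tissue_plots:
        problems.append("Visualization: no spatial plots on tissue images")
    missing = REQUIRED_REPORT_SECTIONS - set(report_sections)
    if missing:
        problems.append("Report: missing " + ", ".join(sorted(missing)))
    return problems

ok = unmet_minimums(3500, True, True, ["graph-based"], ["morans_i"], True,
                    ["domains", "spatial_genes", "visualizations"])
small = unmet_minimums(300, True, False, [], ["morans_i"], True, ["domains"])
print(ok)
print(small)
```

Returning the full list of failures, rather than stopping at the first, lets an analysis report every gap in a single pass.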

Limitations

  • Resolution: Visium spots contain multiple cells (not single-cell)
  • Gene coverage: Imaging-based methods (e.g., MERFISH, seqFISH) measure limited, targeted gene panels
  • 3D structure: Most platforms profile 2D tissue sections
  • Tissue quality: Requires well-preserved tissue for imaging
  • Computational: Large datasets require significant memory
  • Reference dependency: Deconvolution quality depends on scRNA-seq reference

References
