tooluniverse-spatial-transcriptomics

Spatial Transcriptomics Analysis

Comprehensive analysis of spatially-resolved transcriptomics data to understand gene expression patterns in tissue architecture context. Combines expression profiling with spatial coordinates to reveal tissue organization, cell-cell interactions, and spatially variable genes.

When to Use This Skill

Triggers:

User has spatial transcriptomics data (Visium, MERFISH, seqFISH, etc.)
Questions about tissue architecture or spatial organization
Spatial gene expression pattern analysis
Cell-cell proximity or neighborhood analysis requests
Tumor microenvironment spatial structure questions
Integration of spatial with single-cell data
Spatial domain identification
Tissue morphology correlation with expression

Example Questions This Skill Solves:

"Analyze this 10x Visium dataset to identify spatial domains"
"Which genes show spatially variable expression in this tissue?"
"Map the tumor microenvironment spatial organization"
"Find genes enriched at tissue boundaries"
"Identify cell-cell interactions based on spatial proximity"
"Integrate spatial transcriptomics with scRNA-seq annotations"
"Characterize spatial gradients in gene expression"
"Map ligand-receptor pairs in tissue context"

Core Capabilities

Capability	Description
Data Import	10x Visium, MERFISH, seqFISH, Slide-seq, STARmap, Xenium formats
Quality Control	Spot/cell QC, spatial alignment verification, tissue coverage
Normalization	Spatial-aware normalization accounting for tissue heterogeneity
Spatial Clustering	Identify spatial domains with similar expression profiles
Spatial Variable Genes	Find genes with non-random spatial patterns
Neighborhood Analysis	Cell-cell proximity, spatial neighborhoods, niche identification
Spatial Patterns	Gradients, boundaries, hotspots, expression waves
Integration	Merge with scRNA-seq for cell type mapping
Ligand-Receptor Spatial	Map cell communication in tissue context
Visualization	Spatial plots, heatmaps on tissue, 3D reconstruction

Workflow Overview

Input: Spatial Transcriptomics Data + Tissue Image
    |
    v
Phase 1: Data Import & QC
    |-- Load spatial coordinates + expression matrix
    |-- Load tissue histology image
    |-- Quality control per spot/cell
    |-- Filter low-quality spots
    |-- Align spatial coordinates to tissue
    |
    v
Phase 2: Preprocessing
    |-- Normalization (spatial-aware methods)
    |-- Highly variable gene selection
    |-- Dimensionality reduction (PCA)
    |-- Spatial lag smoothing (optional)
    |
    v
Phase 3: Spatial Clustering
    |-- Identify spatial domains/regions
    |-- Graph-based clustering with spatial constraints
    |-- Annotate domains with marker genes
    |-- Visualize domains on tissue
    |
    v
Phase 4: Spatial Variable Genes
    |-- Test for spatial autocorrelation (Moran's I, Geary's C)
    |-- Identify genes with spatial patterns
    |-- Classify pattern types (gradient, hotspot, boundary)
    |-- Rank by spatial significance
    |
    v
Phase 5: Neighborhood Analysis
    |-- Define spatial neighborhoods (k-NN, radius)
    |-- Calculate neighborhood composition
    |-- Identify interaction zones
    |-- Niche characterization
    |
    v
Phase 6: Integration with scRNA-seq
    |-- Cell type deconvolution per spot
    |-- Map cell types to spatial locations
    |-- Predict cell type spatial distributions
    |-- Validate with marker genes
    |
    v
Phase 7: Spatial Cell Communication
    |-- Identify proximal cell type pairs
    |-- Query ligand-receptor database (OmniPath)
    |-- Score spatial interactions
    |-- Map communication hotspots
    |
    v
Phase 8: Generate Spatial Report
    |-- Tissue overview with domains
    |-- Spatially variable genes
    |-- Cell type spatial maps
    |-- Interaction networks in tissue context
    |-- 3D visualization (if applicable)

Phase Details

Phase 1: Data Import & Quality Control

Objective: Load spatial data and assess quality.

Supported platforms:

10x Visium (most common):

Spots: 55 μm diameter (100 μm center-to-center), typically about 1-10 cells per spot (more in densely packed tissue)
Resolution: ~5,000 spots (4,992) per standard 6.5 mm capture area
Data: Expression matrix + spatial coordinates + H&E image

MERFISH/seqFISH (imaging-based):

Single-cell resolution
Targeted gene panels (100-10,000 genes)
Absolute coordinates per cell

Slide-seq/Slide-seqV2:

10μm bead resolution
Genome-wide profiling

Xenium (10x single-cell spatial):

Single-cell resolution
Large gene panels (300+ genes)
Subcellular resolution

Data loading (Visium):

python

def load_visium_data(data_dir):
    """
    Load 10x Visium spatial transcriptomics data.

    Expected structure (Space Ranger "outs" directory):
    data_dir/
      ├── filtered_feature_bc_matrix.h5
      ├── spatial/
      │   ├── tissue_positions_list.csv   (tissue_positions.csv in Space Ranger >= 2.0)
      │   ├── scalefactors_json.json
      │   └── tissue_hires_image.png

    Returns: AnnData object with spatial coordinates
    """
    import scanpy as sc

    # read_visium reads the HDF5 count matrix (not the MTX directory) plus spatial/
    adata = sc.read_visium(data_dir, count_file='filtered_feature_bc_matrix.h5')
    adata.var_names_make_unique()

    # Spatial coordinates are in adata.obsm['spatial']
    # Images and scale factors are in adata.uns['spatial'][library_id]

    return adata

Quality Control:

Spot-level QC:

python

def spatial_qc(adata):
    """
    Quality control for spatial transcriptomics data.
    """
    import scanpy as sc

    # Flag mitochondrial genes (human 'MT-' prefix; mouse genes use 'mt-')
    adata.var['mt'] = adata.var_names.str.startswith('MT-')

    # Calculate QC metrics once, including percent mitochondrial counts
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)

    # Visualize QC metrics spatially
    sc.pl.spatial(adata, color='n_genes_by_counts', title='Genes per Spot')
    sc.pl.spatial(adata, color='total_counts', title='UMI Counts per Spot')

    # Filter criteria (starting points; tune per tissue and platform)
    # - Min 200 genes per spot
    # - Min 500 UMI counts per spot
    # - Mitochondrial content < 20%
    sc.pp.filter_cells(adata, min_genes=200)
    sc.pp.filter_cells(adata, min_counts=500)
    adata = adata[adata.obs['pct_counts_mt'] < 20].copy()

    return adata

Spatial alignment verification:

python

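def verify_spatial_alignment(adata):
    """
    Verify spatial coordinates align with tissue image.
    """
    import matplotlib.pyplot as plt

    # scanpy nests images and scale factors under the library ID
    library_id = next(iter(adata.uns['spatial']))
    spatial = adata.uns['spatial'][library_id]
    img = spatial['images']['hires']
    scale = spatial['scalefactors']['tissue_hires_scalef']

    # Plot spots on tissue image
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.imshow(img)

    # obsm['spatial'] is in full-resolution pixels; rescale to the hires image
    coords = adata.obsm['spatial'] * scale
    ax.scatter(coords[:, 0], coords[:, 1], c='red', s=1, alpha=0.5)

    ax.set_title('Spatial Alignment Verification')
    ax.axis('off')

    return fig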

Phase 2: Preprocessing & Normalization

Objective: Normalize data accounting for spatial heterogeneity.

Normalization:

python

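def normalize_spatial(adata):
    """
    Normalize spatial transcriptomics data.
    """
    import scanpy as sc

    # Filter genes (min 3 spots)
    sc.pp.filter_genes(adata, min_cells=3)

    # Keep the raw counts before they are overwritten by normalization
    adata.layers['counts'] = adata.X.copy()

    # Normalize each spot to 10,000 total counts
    sc.pp.normalize_total(adata, target_sum=1e4)

    # Log-transform
    sc.pp.log1p(adata)

    # Freeze the log-normalized matrix (all genes) for later plotting and DE
    adata.raw = adata

    return adata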

Highly variable genes:

python

def select_hvg_spatial(adata):
    """
    Select highly variable genes for spatial analysis.
    """
    import scanpy as sc

    # Standard HVG selection
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)

    # Optionally: weight by spatial autocorrelation
    # Genes with spatial patterns are more informative

    return adata

Spatial smoothing (optional):

python

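def spatial_smooth(adata, n_neighbors=6):
    """
    Smooth expression by averaging each spot with its nearest spatial neighbors.

    Useful for noisy data, but can blur boundaries.
    """
    import numpy as np
    from scipy import sparse
    from sklearn.neighbors import NearestNeighbors

    # Find spatial neighbors (+1 because each spot is its own nearest neighbor)
    coords = adata.obsm['spatial']
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1, metric='euclidean')
    nn.fit(coords)
    _, indices = nn.kneighbors(coords)

    # Row-normalized averaging matrix A, so that X_smooth = A @ X
    # (works for sparse or dense adata.X and avoids a Python loop over spots)
    n_spots, k = indices.shape
    rows = np.repeat(np.arange(n_spots), k)
    weights = np.full(rows.size, 1.0 / k)
    averaging = sparse.csr_matrix((weights, (rows, indices.ravel())),
                                  shape=(n_spots, n_spots))

    adata.layers['smoothed'] = averaging @ adata.X

    return adata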

Phase 3: Spatial Clustering

Objective: Identify spatial domains (regions with distinct expression).

Graph-based clustering with spatial constraints:

python

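def spatial_clustering(adata, n_neighbors=6, spatial_weight=0.3):
    """
    Cluster spots into spatial domains.

    Uses both expression similarity AND spatial proximity.
    """
    import scanpy as sc
    import squidpy as sq

    # PCA for dimensionality reduction
    sc.pp.pca(adata, n_comps=50)

    # Expression k-NN graph in PCA space -> adata.obsp['connectivities']
    sc.pp.neighbors(adata, use_rep='X_pca')

    # Spatial neighbor graph -> adata.obsp['spatial_connectivities']
    # (for Visium's hexagonal grid, coord_type='grid' with n_neighs=6 also works)
    sq.gr.spatial_neighbors(adata, coord_type='generic', n_neighs=n_neighbors)

    # Blend the two graphs so Leiden sees expression and spatial proximity.
    # spatial_weight is a tunable assumption, not a standard value.
    joint_graph = ((1 - spatial_weight) * adata.obsp['connectivities']
                   + spatial_weight * adata.obsp['spatial_connectivities'])
    sc.tl.leiden(adata, adjacency=joint_graph, resolution=1.0,
                 key_added='spatial_domain')

    # Visualize domains on tissue
    sc.pl.spatial(adata, color='spatial_domain', title='Spatial Domains')

    return adata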

Domain marker genes:

python

def find_domain_markers(adata):
    """
    Identify marker genes for each spatial domain.
    """
    import scanpy as sc

    # Differential expression per domain
    sc.tl.rank_genes_groups(adata, groupby='spatial_domain', method='wilcoxon')

    # Get top markers per domain
    markers = sc.get.rank_genes_groups_df(adata, group=None)

    return markers

Phase 4: Spatially Variable Genes

Objective: Find genes with non-random spatial patterns.

Moran's I (spatial autocorrelation):

python

def identify_spatial_genes(adata):
    """
    Test for spatial autocorrelation using Moran's I.

    Moran's I > 0: Positive spatial autocorrelation (clustering)
    Moran's I ~ 0: Random spatial distribution
    Moran's I < 0: Negative autocorrelation (checkerboard)
    """
    import squidpy as sq

    # Moran's I needs a spatial graph; build one if it is missing
    if 'spatial_connectivities' not in adata.obsp:
        sq.gr.spatial_neighbors(adata)

    # Calculate Moran's I (for highly variable genes if flagged, otherwise all genes)
    sq.gr.spatial_autocorr(
        adata,
        mode='moran',
        n_perms=100,
        n_jobs=-1
    )

    # Results in adata.uns['moranI']
    spatial_genes = adata.uns['moranI'].sort_values('I', ascending=False)

    # Filter significant spatial genes (FDR < 0.05)
    sig_spatial = spatial_genes[spatial_genes['pval_norm_fdr_bh'] < 0.05]

    return sig_spatial
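
To make the statistic behind `mode='moran'` concrete, here is a minimal NumPy sketch of Moran's I with binary k-nearest-neighbor weights. The function name `morans_i`, the choice of `k`, and the brute-force distance matrix are illustrative assumptions; squidpy's implementation differs in weighting and in how it computes p-values.

```python
import numpy as np

def morans_i(values, coords, k=6):
    """Moran's I with binary k-nearest-neighbour weights (illustrative sketch)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = values.shape[0]

    # Pairwise Euclidean distances; a spot is never its own neighbour
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)

    # Binary weights: w[i, j] = 1 if j is among the k nearest neighbours of i
    nearest = np.argsort(dists, axis=1)[:, :k]
    weights = np.zeros((n, n))
    weights[np.arange(n)[:, None], nearest] = 1.0

    # I = (n / sum(w)) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    centered = values - values.mean()
    numerator = n * (weights * np.outer(centered, centered)).sum()
    denominator = weights.sum() * (centered ** 2).sum()
    return numerator / denominator
```

On a regular grid, a smooth left-to-right gradient gives a strongly positive value and a checkerboard gives a negative one, matching the interpretation in the docstring above. The dense distance matrix limits this sketch to a few thousand spots.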

Spatial pattern classification:

python

def classify_spatial_patterns(adata, spatial_genes):
    """
    Classify types of spatial patterns.

    Pattern types:
    - Gradient: Smooth directional change
    - Hotspot: Localized high expression
    - Boundary: Expression at domain edges
    - Periodic: Regular spacing
    """
    import numpy as np
    from scipy import sparse

    patterns = {}
    coords = adata.obsm['spatial']

    for gene in spatial_genes.index[:100]:  # Top 100 spatial genes
        # Get expression as a flat array, whether adata.X is sparse or dense
        x = adata[:, gene].X
        expr = (x.toarray() if sparse.issparse(x) else np.asarray(x)).ravel()

        # detect_pattern_type is a user-supplied helper
        # (it is not part of scanpy or squidpy)
        patterns[gene] = detect_pattern_type(expr, coords)

    return patterns
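
The listing above calls `detect_pattern_type`, which the source does not define. The sketch below is one hypothetical way to fill it in, covering only the gradient and hotspot cases with simple heuristics; the cutoffs `r2_cutoff` and `compact_cutoff` are arbitrary assumptions, and boundary or periodic patterns would need domain labels or spectral analysis that this sketch does not attempt.

```python
import numpy as np

def detect_pattern_type(expr, coords, r2_cutoff=0.5, compact_cutoff=0.5):
    """Hypothetical heuristic: return 'gradient', 'hotspot', or 'other'."""
    expr = np.asarray(expr, dtype=float)
    coords = np.asarray(coords, dtype=float)

    total_var = ((expr - expr.mean()) ** 2).sum()
    if total_var == 0:
        return "other"  # constant expression has no spatial pattern

    # Gradient: expression is well explained by a linear function of (x, y)
    design = np.column_stack([coords, np.ones(len(coords))])
    fitted = design @ np.linalg.lstsq(design, expr, rcond=None)[0]
    r2 = 1.0 - ((expr - fitted) ** 2).sum() / total_var
    if r2 >= r2_cutoff:
        return "gradient"

    # Hotspot: the top 10% of spots occupy a compact area relative to the tissue
    top = expr >= np.quantile(expr, 0.9)
    spread_top = coords[top].std(axis=0).mean()
    spread_all = coords.std(axis=0).mean()
    if spread_top <= compact_cutoff * spread_all:
        return "hotspot"

    return "other"
```

The signature matches the call `detect_pattern_type(expr, coords)` used above, so it can be dropped in as a starting point and refined against known marker genes.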

Phase 5: Neighborhood Analysis

Objective: Analyze cell-cell proximity and spatial niches.

Define spatial neighborhoods:

python

def analyze_neighborhoods(adata, radius=150):
    """
    Analyze spatial neighborhood composition.

    For each spot, characterize its microenvironment.
    """
    import squidpy as sq

    # Connect spots within `radius` (same units as adata.obsm['spatial'],
    # i.e. full-resolution pixels for Visium)
    sq.gr.spatial_neighbors(adata, coord_type='generic', radius=radius)

    # Calculate neighborhood enrichment
    # Tests whether domains are enriched in each other's proximity
    sq.gr.nhood_enrichment(adata, cluster_key='spatial_domain')

    # Visualize neighborhood enrichment
    sq.pl.nhood_enrichment(adata, cluster_key='spatial_domain')

    # Results: which domains are spatially proximal?

    return adata
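
The "calculate neighborhood composition" step can also be computed directly, which is useful when a per-spot composition vector is needed rather than a domain-by-domain enrichment score. This is a minimal NumPy sketch; the function name `neighborhood_composition` and the brute-force distance matrix are assumptions for illustration and are not part of squidpy.

```python
import numpy as np

def neighborhood_composition(coords, labels, radius):
    """Fraction of each label among the spots within `radius` of every spot."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    categories = np.unique(labels)

    # Boolean neighbour mask from pairwise distances (a spot is not its own neighbour)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    neighbors = dists <= radius
    np.fill_diagonal(neighbors, False)

    # One-hot encode labels, then count each label among every spot's neighbours
    one_hot = (labels[:, None] == categories[None, :]).astype(float)
    counts = neighbors.astype(float) @ one_hot

    # Convert counts to fractions; spots without neighbours keep all-zero rows
    totals = counts.sum(axis=1, keepdims=True)
    fractions = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return categories, fractions
```

With an AnnData object this would be called as `neighborhood_composition(adata.obsm['spatial'], adata.obs['spatial_domain'].to_numpy(), radius=150)`. Rows of `fractions` can then be clustered to define niches.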

Interaction zones:

python

def identify_interaction_zones(adata, domain_a, domain_b):
    """
    Find boundary regions between two spatial domains.

    These are candidate sites for cell-cell interactions.
    """
    from sklearn.neighbors import NearestNeighbors

    # Boolean masks as NumPy arrays, so positional indexing is safe
    # (Leiden domain labels are strings, e.g. '0', '1')
    in_a = (adata.obs['spatial_domain'] == domain_a).to_numpy()
    in_b = (adata.obs['spatial_domain'] == domain_b).to_numpy()

    # Six nearest spatial neighbors per spot (+1 because each spot finds itself)
    coords = adata.obsm['spatial']
    nn = NearestNeighbors(n_neighbors=7)
    nn.fit(coords)
    _, indices = nn.kneighbors(coords)

    # Spots from A that have at least one neighbor in B
    has_b_neighbor = in_b[indices[:, 1:]].any(axis=1)

    # Mark interaction zone
    adata.obs['interaction_zone'] = in_a & has_b_neighbor

    return adata

Phase 6: Integration with Single-Cell RNA-seq

Objective: Map cell types from scRNA-seq to spatial locations.

Cell type deconvolution:

python

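def deconvolve_cell_types(adata_spatial, adata_sc):
    """
    Predict cell type composition per spatial spot.

    Uses scRNA-seq reference to deconvolve Visium spots.
    Methods: Cell2location, Tangram, SPOTlight
    """
    import cell2location

    # extract_signatures is a placeholder, not a cell2location function. It stands
    # for estimating a genes x cell-types signature DataFrame from the scRNA-seq
    # reference, e.g. with cell2location.models.RegressionModel.
    cell_type_signatures = extract_signatures(adata_sc)

    # Keep shared genes only; cell2location expects raw counts in adata_spatial.X
    shared = adata_spatial.var_names.intersection(cell_type_signatures.index)
    adata_spatial = adata_spatial[:, shared].copy()
    cell_type_signatures = cell_type_signatures.loc[shared]

    # Run cell2location: estimates cell type abundances per spot
    cell2location.models.Cell2location.setup_anndata(adata=adata_spatial)
    mod = cell2location.models.Cell2location(
        adata_spatial,
        cell_state_df=cell_type_signatures,
        N_cells_per_location=8,  # tissue-dependent prior; illustrative value
        detection_alpha=20,
    )
    mod.train(max_epochs=30000, batch_size=None, train_size=1)

    # Export posterior abundances, then convert them to per-spot fractions
    adata_spatial = mod.export_posterior(
        adata_spatial, sample_kwargs={'num_samples': 1000}
    )
    abundance = adata_spatial.obsm['q05_cell_abundance_w_sf']
    adata_spatial.obsm['cell_type_fractions'] = abundance.div(
        abundance.sum(axis=1), axis=0
    )

    return adata_spatial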

Spatial cell type mapping:

python

def map_cell_types_spatial(adata):
    """
    Visualize cell type spatial distributions.
    """
    import scanpy as sc

    # For each cell type, plot abundance on tissue
    fractions = adata.obsm['cell_type_fractions']

    for ct in fractions.columns:
        # sc.pl.spatial colors by obs columns or gene names,
        # so copy the per-spot values into adata.obs first
        adata.obs[ct] = fractions[ct].to_numpy()
        sc.pl.spatial(
            adata,
            color=ct,
            title=f'{ct} Spatial Distribution'
        )

Phase 7: Spatial Cell Communication

Objective: Map ligand-receptor interactions in tissue context.

Spatial proximity-based communication:

python

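def spatial_cell_communication(adata):
    """
    Identify cell-cell communication based on spatial proximity.

    Requires:
    - Cell type annotations in adata.obs['cell_type'] (categorical, one label
      per spot, e.g. the dominant cell type from deconvolution)
    - Ligand-receptor database (OmniPath)
    """
    import pandas as pd
    import squidpy as sq
    from tooluniverse import ToolUniverse

    tu = ToolUniverse()
    tu.load_tools()  # register the available tools before calling one

    # Get ligand-receptor pairs from OmniPath
    lr_result = tu.run_one_function({
        "name": "OmniPath_get_ligand_receptor_interactions",
        "arguments": {"partners": ""}  # Get all pairs
    })

    # squidpy expects a DataFrame with 'source' (ligand) and 'target' (receptor)
    # gene-symbol columns. The conversion below ASSUMES the tool returns a list of
    # records with those fields; check the actual response and adapt.
    lr_pairs = pd.DataFrame(lr_result)[['source', 'target']]

    # Permutation test of ligand-receptor expression for each cell type pair
    sq.gr.ligrec(
        adata,
        n_perms=100,
        cluster_key='cell_type',
        interactions=lr_pairs,
        copy=False
    )

    # Visualize significant interactions
    sq.pl.ligrec(adata, cluster_key='cell_type')

    return adata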

Communication hotspot mapping:

python

def map_communication_hotspots(adata, ligand, receptor):
    """
    Map spatial locations of specific L-R interactions.
    """
    import numpy as np
    import scanpy as sc
    from scipy import sparse

    def gene_vector(gene):
        # One gene's expression as a flat array, for sparse or dense adata.X
        x = adata[:, gene].X
        return (x.toarray() if sparse.issparse(x) else np.asarray(x)).ravel()

    # Interaction score = ligand × receptor within the same spot
    interaction_score = gene_vector(ligand) * gene_vector(receptor)

    # Add to adata
    score_key = f'{ligand}_{receptor}_score'
    adata.obs[score_key] = interaction_score

    # Visualize on tissue
    sc.pl.spatial(adata, color=score_key,
                  title=f'{ligand}-{receptor} Interaction Hotspots')

    return adata
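
The score above multiplies ligand and receptor expression within a single spot, so it cannot see signaling between adjacent spots. As a hedged variant, the sketch below pairs each spot's ligand expression with the mean receptor expression of its k nearest spatial neighbors. The function name `neighbor_lr_score` and the default `k=6` are assumptions for illustration; this is not a squidpy function or an established scoring method.

```python
import numpy as np

def neighbor_lr_score(ligand_expr, receptor_expr, coords, k=6):
    """Ligand in each spot times mean receptor in its k nearest spots (sketch)."""
    ligand_expr = np.asarray(ligand_expr, dtype=float)
    receptor_expr = np.asarray(receptor_expr, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # Pairwise distances; exclude each spot from its own neighbourhood
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)

    # Indices of the k nearest neighbours of every spot
    nearest = np.argsort(dists, axis=1)[:, :k]

    # Mean receptor expression among neighbours, paired with local ligand expression
    neighbor_receptor = receptor_expr[nearest].mean(axis=1)
    return ligand_expr * neighbor_receptor
```

A spot that expresses the ligand next to a spot that expresses the receptor scores above zero here, whereas the same-spot product would score it as zero. The resulting vector can be stored in `adata.obs` and plotted with `sc.pl.spatial` in the same way as the same-spot score.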

Phase 8: Spatial Report Generation

Generate comprehensive spatial report:

Spatial Transcriptomics Analysis Report

Dataset Summary

Platform: 10x Visium
Tissue: Breast cancer tumor section
Spots: 3,562 (after QC filtering)
Genes: 18,432 detected
Resolution: 55 μm spot diameter (typically ~1-10 cells/spot)

Quality Control

Mean genes per spot: 3,245
Mean UMI counts: 12,543
Mitochondrial content: 8.2% average
Tissue coverage: 85% of capture area

Spatial Domains Identified

7 distinct spatial domains detected via graph-based clustering
- Domain 1: Tumor core (32% of tissue)
- Domain 2: Invasive margin (18%)
- Domain 3: Stromal region (25%)
- Domain 4: Immune infiltrate (12%)
- Domain 5: Necrotic region (8%)
- Domain 6: Normal epithelium (3%)
- Domain 7: Adipose tissue (2%)

Top Marker Genes per Domain

Domain 1 (Tumor Core)

EPCAM, KRT19, MKI67, CCNB1, TOP2A (proliferative tumor)

Domain 2 (Invasive Margin)

VIM, FN1, MMP2, SNAI2 (EMT signature)

Domain 4 (Immune Infiltrate)

CD3D, CD8A, CD4, PTPRC (T cell enriched)
CD68, CD14 (macrophage enriched)

Spatially Variable Genes

456 genes with significant spatial patterns (Moran's I, FDR < 0.05)

Top Spatial Genes

MKI67 (I=0.82) - Hotspot pattern in tumor core
CD8A (I=0.78) - Gradient from margin to stroma
VIM (I=0.75) - Boundary enrichment at invasive margin
COL1A1 (I=0.71) - Stromal-specific expression
EPCAM (I=0.69) - Tumor region pattern

Cell Type Deconvolution

Integration with scRNA-seq reference (Bassez et al. 2021)

Cell Type Spatial Distributions

Tumor cells: Concentrated in core, sparse at margin
T cells: Enriched at invasive margin and infiltrate zones
CAFs: Stromal region and invasive margin
Macrophages: Scattered, enriched near necrosis
B cells: Lymphoid aggregates (2% of tissue)

Tumor Microenvironment Composition

Tumor core: 85% tumor cells, 10% CAFs, 5% immune
Invasive margin: 45% tumor, 30% CAFs, 25% immune (T cell rich)
Immune infiltrate: 70% T cells, 20% macrophages, 10% B cells
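
A composition table like this can be obtained by averaging the per-spot deconvolution estimates within each domain. A minimal sketch, assuming each spot has a domain label and a dict of estimated cell type fractions; the function name and the toy fractions are hypothetical.

```python
from collections import defaultdict

def composition_by_domain(domains, proportions):
    """Average per-spot cell type proportions within each domain, in percent.

    domains: domain label per spot.
    proportions: per-spot dicts of cell type -> estimated fraction (summing to ~1).
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_spots = defaultdict(int)
    for domain, props in zip(domains, proportions):
        n_spots[domain] += 1
        for cell_type, frac in props.items():
            sums[domain][cell_type] += frac
    return {
        d: {ct: 100.0 * total / n_spots[d] for ct, total in cts.items()}
        for d, cts in sums.items()
    }

# Three toy spots (illustrative fractions only).
domains = ["core", "core", "margin"]
proportions = [
    {"tumor": 0.9, "CAF": 0.1},
    {"tumor": 0.8, "CAF": 0.1, "immune": 0.1},
    {"tumor": 0.5, "CAF": 0.25, "immune": 0.25},
]
result = composition_by_domain(domains, proportions)
for domain, comp in result.items():
    print(domain, {ct: round(pct) for ct, pct in comp.items()})
```

A cell type missing from a spot's dict is treated as a fraction of zero for that spot.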

Spatial Cell Communication

Top L-R Interactions (Spatially Proximal)

Tumor → T cell: CD274 (PD-L1) → PDCD1 (PD-1)
- Hotspot: Invasive margin
- Interpretation: Immune checkpoint evasion
CAF → Tumor: TGFB1 → TGFBR2
- Hotspot: Stromal-tumor interface
- Interpretation: TGF-β-driven EMT
Macrophage → Tumor: TNF → TNFRSF1A
- Scattered across tumor
- Interpretation: Inflammatory signaling
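
One simple way to score a ligand-receptor pair for spatial proximity is to average ligand expression in a spot times receptor expression in each adjacent spot. The sketch below does this over a precomputed neighbour map. It is a bare co-expression score under that assumption, not the method used for the report, and dedicated tools typically add permutation testing on top.

```python
def lr_neighbor_score(ligand, receptor, neighbors):
    """Mean ligand(sender) * receptor(receiver) over directed neighbour pairs.

    ligand, receptor: per-spot expression lists.
    neighbors: dict mapping spot index -> list of adjacent spot indices.
    """
    products = [
        ligand[i] * receptor[j]
        for i, adjacent in neighbors.items()
        for j in adjacent
    ]
    return sum(products) / len(products) if products else 0.0

# Four spots in a row: 0-1-2-3 (toy expression values).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cd274 = [2.0, 3.0, 0.0, 0.0]  # ligand, high on the "tumor" side
pdcd1 = [0.0, 1.0, 2.0, 0.0]  # receptor, high on the "T cell" side
score = lr_neighbor_score(cd274, pdcd1, neighbors)
print(round(score, 3))
```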

Interaction Zones

Tumor-Immune Interface: 245 spots (7% of tissue)
- High expression: CXCL10, CXCL9 (chemokines)
- T cell recruitment and activation
Stromal-Tumor Interface: 387 spots (11% of tissue)
- High expression: MMP2, MMP9 (matrix remodeling)
- Invasion-promoting niche
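
Interface zones can be defined as spots of one domain that directly touch a spot of another. The sketch below assumes a square grid keyed by (row, col) and 4-connected adjacency; the helper name and the toy layout are hypothetical.

```python
def interface_spots(domain_at, domain_a, domain_b):
    """Coordinates of spots in domain_a or domain_b that touch the other domain.

    domain_at: dict mapping (row, col) -> domain label on a square grid.
    """
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    hits = set()
    for (r, c), label in domain_at.items():
        if label not in (domain_a, domain_b):
            continue
        other = domain_b if label == domain_a else domain_a
        if any(domain_at.get((r + dr, c + dc)) == other for dr, dc in offsets):
            hits.add((r, c))
    return hits

# 3 x 4 toy grid: two tumor columns, then an immune column, then stroma.
row = ["tumor", "tumor", "immune", "stroma"]
grid = {(r, c): row[c] for r in range(3) for c in range(4)}
zone = interface_spots(grid, "tumor", "immune")
pct = 100.0 * len(zone) / len(grid)
print(f"Tumor-immune interface: {len(zone)} spots ({pct:.0f}% of tissue)")
```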

Spatial Gradients

Hypoxia gradient: HIF1A, VEGFA increase toward tumor core
Proliferation gradient: MKI67, TOP2A decrease from core to margin
Immune gradient: CD8A, GZMB peak at invasive margin
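
A gradient relative to a landmark can be checked by correlating expression with distance from that landmark: a negative coefficient means expression falls with distance. The sketch below uses Pearson correlation against Euclidean distance from an assumed core position; the coordinates and expression values are toy data.

```python
from math import hypot, sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def distance_gradient(coords, expression, center):
    """Correlation between expression and Euclidean distance from `center`."""
    dist = [hypot(x - center[0], y - center[1]) for x, y in coords]
    return pearson(dist, expression)

# Five spots along a line leaving the core (toy values).
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
mki67 = [5.0, 4.0, 3.0, 2.0, 1.0]  # falls away from the core
r = distance_gradient(coords, mki67, center=(0, 0))
print(round(r, 3))
```

A rank-based coefficient such as Spearman's may be preferable when the trend is monotonic but not linear.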

Biological Interpretation

Spatial analysis reveals distinct tumor microenvironment organization:

Tumor core: Highly proliferative, hypoxic, immune-excluded
Invasive margin: Active EMT, high immune infiltration, checkpoint expression
Stromal barrier: CAF-rich, matrix remodeling, immunosuppressive signals

The invasive margin shows hallmarks of immune-tumor interaction, including PD-L1/PD-1 checkpoint engagement, which suggests potential for checkpoint blockade therapy. CAF-mediated TGF-β signaling may drive EMT and therapy resistance at the tumor-stroma interface.

Clinical Relevance

Checkpoint inhibitor response: High immune infiltration at the invasive margin suggests potential benefit from checkpoint blockade
Resistance mechanisms: CAF barrier and TGF-β signaling
Biomarkers: Spatial arrangement of immune cells may be more predictive than bulk tumor metrics

---

Integration with ToolUniverse Skills

Skill	Used For	Phase
`tooluniverse-single-cell`	scRNA-seq reference for deconvolution	Phase 6
`tooluniverse-single-cell` (Phase 10)	L-R database for communication	Phase 7
`tooluniverse-gene-enrichment`	Pathway enrichment for spatial domains	Phase 3
`tooluniverse-multi-omics-integration`	Integrate with other omics	Phase 8

Example Use Cases

Use Case 1: Tumor Microenvironment Mapping

Question: "Map the spatial organization of tumor, immune, and stromal cells"

Workflow:

Load Visium data, QC and normalize
Spatial clustering → 7 domains identified
Cell type deconvolution using scRNA-seq reference
Map cell type distributions spatially
Identify interaction zones (tumor-immune, tumor-stroma)
Analyze L-R interactions in each zone
Report: Comprehensive TME spatial architecture

Use Case 2: Developmental Gradient Analysis

Question: "Identify spatial gene expression gradients in developing tissue"

Workflow:

Load spatial data (e.g., mouse embryo)
Identify spatially variable genes
Classify gradient patterns (anterior-posterior, dorsal-ventral)
Map morphogen expression (WNT, BMP, FGF)
Correlate with cell fate markers
Report: Developmental spatial patterns

Use Case 3: Brain Region Identification

Question: "Automatically segment brain tissue into anatomical regions"

Workflow:

Load Visium mouse brain data
Spatial clustering with high resolution
Match domains to known brain regions (cortex, hippocampus, etc.)
Identify region-specific marker genes
Validate with Allen Brain Atlas
Report: Automated brain region annotation
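
The matching step in this workflow can be approximated by comparing each domain's marker genes against reference marker sets and taking the best overlap. The sketch below uses the Jaccard index for that comparison. The marker sets are small illustrative examples and would need to be replaced with curated atlas markers, and `annotate_domains` is a hypothetical helper.

```python
def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def annotate_domains(domain_markers, region_markers):
    """Assign each domain the reference region with the highest marker overlap."""
    annotation = {}
    for domain, markers in domain_markers.items():
        best = max(region_markers, key=lambda reg: jaccard(markers, region_markers[reg]))
        annotation[domain] = (best, jaccard(markers, region_markers[best]))
    return annotation

# Illustrative marker sets only; not a curated reference.
region_markers = {
    "cortex": {"Cux2", "Rorb", "Satb2"},
    "hippocampus": {"Prox1", "Zbtb20", "Wfs1"},
}
domain_markers = {
    "domain_1": {"Cux2", "Satb2", "Mef2c"},
    "domain_2": {"Prox1", "Zbtb20"},
}
annotation = annotate_domains(domain_markers, region_markers)
for domain, (region, score) in annotation.items():
    print(f"{domain}: {region} (Jaccard {score:.2f})")
```

Low best scores are worth flagging for manual review rather than accepting the top match.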

Quantified Minimums

Component	Requirement
Spots/cells	At least 500 spatial locations
QC	Filter low-quality spots, verify alignment
Spatial clustering	At least one method (graph-based or spatial)
Spatial genes	Moran's I or similar spatial test
Visualization	Spatial plots on tissue images
Report	Domains, spatial genes, visualizations
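
These minimums can be checked programmatically before a report is written. A minimal sketch follows; the function and its parameters are hypothetical, and only the thresholds come from the table above.

```python
def check_minimums(n_locations, qc_done, clustering_methods, spatial_test, has_spatial_plots):
    """Return unmet requirements; an empty list means every minimum is satisfied."""
    problems = []
    if n_locations < 500:
        problems.append(f"only {n_locations} spatial locations (need at least 500)")
    if not qc_done:
        problems.append("QC not performed")
    if not clustering_methods:
        problems.append("no spatial clustering method applied")
    if not spatial_test:
        problems.append("no spatial variability test (e.g. Moran's I)")
    if not has_spatial_plots:
        problems.append("no spatial plots on tissue images")
    return problems

# Toy inputs (illustrative only).
ok = check_minimums(3500, True, ["graph-based"], "morans_i", True)
too_small = check_minimums(120, True, ["graph-based"], "morans_i", True)
print(ok)
print(too_small)
```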

Limitations

Resolution: Visium spots contain multiple cells (not single-cell)
Gene coverage: Imaging methods have limited gene panels
3D structure: Most platforms profile 2D sections, so 3D tissue architecture is not captured
Tissue quality: Requires well-preserved tissue for imaging
Computational: Large datasets require significant memory
Reference dependency: Deconvolution quality depends on scRNA-seq reference

References

Methods:

Squidpy: https://doi.org/10.1038/s41592-021-01358-2
Cell2location: https://doi.org/10.1038/s41587-021-01139-4
SpatialDE: https://doi.org/10.1038/nmeth.4636

Platforms:

10x Visium: https://www.10xgenomics.com/products/spatial-gene-expression
MERFISH: https://doi.org/10.1126/science.aaa6090
Slide-seq: https://doi.org/10.1126/science.aaw1219
