pathml

PathML


Overview

PathML is a comprehensive Python toolkit for computational pathology workflows, designed to facilitate machine learning and image analysis for whole-slide pathology images. The framework provides modular, composable tools for loading diverse slide formats, preprocessing images, constructing spatial graphs, training deep learning models, and analyzing multiparametric imaging data from technologies like CODEX and multiplex immunofluorescence.

When to Use This Skill

Apply this skill for:
  • Loading and processing whole-slide images (WSI) in various proprietary formats
  • Preprocessing H&E stained tissue images with stain normalization
  • Nucleus detection, segmentation, and classification workflows
  • Building cell and tissue graphs for spatial analysis
  • Training or deploying machine learning models (HoVer-Net, HACTNet) on pathology data
  • Analyzing multiparametric imaging (CODEX, Vectra, MERFISH) for spatial proteomics
  • Quantifying marker expression from multiplex immunofluorescence
  • Managing large-scale pathology datasets with HDF5 storage
  • Tile-based analysis and stitching operations

Core Capabilities

PathML provides six major capability areas documented in detail within reference files:

1. Image Loading & Formats

Load whole-slide images from 160+ proprietary formats including Aperio SVS, Hamamatsu NDPI, Leica SCN, Zeiss ZVI, DICOM, and OME-TIFF. PathML automatically handles vendor-specific formats and provides unified interfaces for accessing image pyramids, metadata, and regions of interest.
See: references/image_loading.md for supported formats, loading strategies, and working with different slide types.
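
The "image pyramid" mentioned above can be pictured with a small sketch. It is not PathML's API: the function names and the fixed downsample factor of 2 are assumptions made for illustration, and real slides often use vendor-specific factors between levels.

```python
def pyramid_level_sizes(width, height, num_levels, downsample=2):
    """Return (width, height) per pyramid level; level 0 is full resolution."""
    sizes = []
    for level in range(num_levels):
        scale = downsample ** level
        # Each level shrinks by the downsample factor; never go below 1 pixel.
        sizes.append((max(1, width // scale), max(1, height // scale)))
    return sizes


def to_level0(x, y, level, downsample=2):
    """Map a pixel coordinate at a given level back to level-0 coordinates."""
    scale = downsample ** level
    return x * scale, y * scale


sizes = pyramid_level_sizes(width=40000, height=30000, num_levels=4)
print(sizes)                        # sizes from full resolution down to level 3
print(to_level0(125, 80, level=3))  # region corner found on a low-resolution level
```

Locating a region of interest on a coarse level and then mapping it back to level 0 is a common way to avoid reading the full-resolution image until it is needed.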

2. Preprocessing Pipelines

Build modular preprocessing pipelines by composing transforms for image manipulation, quality control, stain normalization, tissue detection, and mask operations. PathML's Pipeline architecture enables reproducible, scalable preprocessing across large datasets.
Key transforms:
  • StainNormalizationHE - Macenko/Vahadane stain normalization
  • TissueDetectionHE, NucleusDetectionHE - Tissue/nucleus segmentation
  • MedianBlur, GaussianBlur - Noise reduction
  • LabelArtifactTileHE - Quality control for artifacts
See: references/preprocessing.md for complete transform catalog, pipeline construction, and preprocessing workflows.
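
To illustrate what "composing transforms" means, here is a toy sketch in plain Python. It does not use PathML: `MedianBlur3`, `ThresholdMask`, and `ToyPipeline` are hypothetical stand-ins, and a tile is assumed to be a dict holding a grayscale image and a dict of masks.

```python
from statistics import median


class MedianBlur3:
    """Toy 3x3 median filter on a 2D list of grayscale values (edges unchanged)."""

    def apply(self, tile):
        img = tile["image"]
        h, w = len(img), len(img[0])
        out = [row[:] for row in img]
        for r in range(1, h - 1):
            for c in range(1, w - 1):
                window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
                out[r][c] = median(window)
        tile["image"] = out
        return tile


class ThresholdMask:
    """Toy tissue detector: pixels darker than `threshold` count as tissue."""

    def __init__(self, mask_name, threshold):
        self.mask_name = mask_name
        self.threshold = threshold

    def apply(self, tile):
        tile["masks"][self.mask_name] = [
            [1 if px < self.threshold else 0 for px in row] for row in tile["image"]
        ]
        return tile


class ToyPipeline:
    """Apply transforms in order; each one receives the previous one's output."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def apply(self, tile):
        for transform in self.transforms:
            tile = transform.apply(tile)
        return tile


def make_tile():
    # Bright background with two dark noise pixels in the middle row.
    return {
        "image": [
            [250, 250, 250, 250],
            [250, 40, 10, 250],
            [250, 250, 250, 250],
        ],
        "masks": {},
    }


blurred_then_masked = ToyPipeline(
    [MedianBlur3(), ThresholdMask("tissue", threshold=128)]
).apply(make_tile())
masked_only = ToyPipeline([ThresholdMask("tissue", threshold=128)]).apply(make_tile())

print(blurred_then_masked["masks"]["tissue"])  # noise removed before thresholding
print(masked_only["masks"]["tissue"])          # noise pixels end up in the mask
```

The two runs differ only in the list of transforms, which is the point of a composable design: the order and choice of steps is data, so the same recipe can be replayed on every tile.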

3. Graph Construction

Construct spatial graphs representing cellular and tissue-level relationships. Extract features from segmented objects to create graph-based representations suitable for graph neural networks and spatial analysis.
See: references/graphs.md for graph construction methods, feature extraction, and spatial analysis workflows.
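
One common way to turn segmented nuclei into a cell graph is to connect each centroid to its k nearest neighbours. The sketch below shows that idea in plain Python. It is an assumption about the general technique, not PathML's graph builder, and `knn_graph` is a hypothetical helper.

```python
import math


def knn_graph(centroids, k):
    """Return sorted undirected edges (i, j), i < j, linking each point to its k nearest neighbours."""
    edges = set()
    for i, p in enumerate(centroids):
        # Distance from point i to every other point, closest first.
        distances = sorted(
            (math.dist(p, q), j) for j, q in enumerate(centroids) if j != i
        )
        for _, j in distances[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Two spatial clusters of nucleus centroids, in (x, y) pixel coordinates.
centroids = [(0, 0), (1, 0), (0, 2), (10, 10), (11, 10)]
edges = knn_graph(centroids, k=1)
print(edges)
```

This brute-force version compares every pair of points, which is fine for a sketch. For slides with many thousands of nuclei, a spatial index such as a k-d tree would be the usual choice.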

4. Machine Learning

Train and deploy deep learning models for nucleus detection, segmentation, and classification. PathML integrates PyTorch with pre-built models (HoVer-Net, HACTNet), custom DataLoaders, and ONNX support for inference.
Key models:
  • HoVer-Net - Simultaneous nucleus segmentation and classification
  • HACTNet - Hierarchical cell-to-tissue graph network for tissue-level classification
See: references/machine_learning.md for model training, evaluation, inference workflows, and working with public datasets.
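
Evaluation of a segmentation model usually comes down to comparing predicted masks against ground truth. As a minimal sketch, the Dice coefficient below is one widely used overlap metric. It is written in plain Python for clarity, and it is an assumption that this is the metric you want; the metrics shipped with PathML may be defined or implemented differently.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as 2D lists of 0/1."""
    intersection = 0
    pred_sum = 0
    target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p * t
            pred_sum += p
            target_sum += t
    if pred_sum + target_sum == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * intersection / (pred_sum + target_sum)


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
score = dice_score(pred, target)
print(round(score, 3))
```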

5. Multiparametric Imaging

Analyze spatial proteomics and gene expression data from CODEX, Vectra, MERFISH, and other multiplex imaging platforms. PathML provides specialized slide classes and transforms for processing multiparametric data, cell segmentation with Mesmer, and quantification workflows.
See: references/multiparametric.md for CODEX/Vectra workflows, cell segmentation, marker quantification, and integration with AnnData.
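
Marker quantification typically means summarising each channel over the pixels of each segmented cell. The following sketch computes a mean intensity per cell and marker in plain Python. The function name, the input layout, and the choice of the mean as the summary statistic are assumptions for illustration, not PathML's implementation.

```python
from collections import defaultdict


def mean_marker_intensity(label_mask, channels):
    """Mean intensity per cell and marker.

    label_mask: 2D list of ints, 0 = background, n > 0 = cell id.
    channels: dict mapping marker name -> 2D list of intensities (same shape).
    Returns {cell_id: {marker: mean_intensity}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for r, row in enumerate(label_mask):
        for c, cell_id in enumerate(row):
            if cell_id == 0:
                continue  # skip background pixels
            counts[cell_id] += 1
            for marker, image in channels.items():
                sums[cell_id][marker] += image[r][c]
    return {
        cell_id: {marker: total / counts[cell_id] for marker, total in per_marker.items()}
        for cell_id, per_marker in sums.items()
    }


label_mask = [
    [1, 1, 0],
    [0, 2, 2],
]
channels = {
    "CD3": [[10, 30, 5], [5, 2, 4]],
    "CD20": [[1, 3, 0], [0, 80, 100]],
}
table = mean_marker_intensity(label_mask, channels)
print(table)
```

The resulting cell-by-marker table has the same shape as the expression matrix used in single-cell analysis, which is why exporting it to AnnData is a natural next step.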

6. Data Management

Efficiently store and manage large pathology datasets using HDF5 format. PathML handles tiles, masks, metadata, and extracted features in unified storage structures optimized for machine learning workflows.
See: references/data_management.md for HDF5 integration, tile management, dataset organization, and batch processing strategies.
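
Tile management often ends with stitching per-tile results back into a slide-level array. The sketch below assumes each tile is stored together with the (row, col) coordinate of its top-left corner. That layout and the `stitch_tiles` helper are illustrative assumptions and say nothing about how PathML organises its HDF5 files.

```python
def stitch_tiles(tiles, height, width, fill=0):
    """Paste tiles back onto a (height x width) canvas.

    tiles: iterable of ((row, col), tile), where (row, col) is the tile's
    top-left corner and tile is a 2D list. Parts outside the canvas are cropped.
    """
    canvas = [[fill] * width for _ in range(height)]
    for (top, left), tile in tiles:
        for dr, tile_row in enumerate(tile):
            for dc, value in enumerate(tile_row):
                r, c = top + dr, left + dc
                if 0 <= r < height and 0 <= c < width:
                    canvas[r][c] = value
    return canvas


tiles = [
    ((0, 0), [[1, 1], [1, 1]]),
    ((0, 2), [[2, 2], [2, 2]]),
    ((2, 0), [[3, 3], [3, 3]]),  # bottom row of this tile falls off the canvas
]
full = stitch_tiles(tiles, height=3, width=4)
for row in full:
    print(row)
```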

Quick Start

Installation

```bash
# Install PathML
uv pip install pathml

# With optional dependencies for all features
uv pip install pathml[all]
```
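
After installing, it can help to confirm which version, if any, is visible to the current interpreter. This is a small standard-library sketch; `installed_version` is a hypothetical helper, and it only reports package metadata without importing PathML itself.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pathml")
print("pathml not installed" if version is None else f"pathml {version}")
```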

Basic Workflow Example

```python
from pathml.core import SlideData
from pathml.preprocessing import Pipeline, StainNormalizationHE, TissueDetectionHE

# Load a whole-slide image. SlideData is constructed directly from the path;
# stain="HE" marks the slide as H&E. Check the signature against your installed version.
wsi = SlideData("path/to/slide.svs", stain="HE")

# Create preprocessing pipeline. mask_name is the key under which the
# tissue mask is stored on each tile.
pipeline = Pipeline([
    TissueDetectionHE(mask_name="tissue"),
    StainNormalizationHE(target="normalize", stain_estimation_method="macenko"),
])

# Run pipeline. The slide runs the pipeline over its tiles;
# distributed=False keeps execution in the current process.
wsi.run(pipeline, distributed=False)

# Access processed tiles
for tile in wsi.tiles:
    processed_image = tile.image
    tissue_mask = tile.masks["tissue"]
```
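
The workflow above processes the slide tile by tile. As a rough picture of what that implies, the sketch below enumerates top-left tile coordinates for a given tile size and stride. It is a plain-Python illustration with a hypothetical `tile_origins` helper, and it simply drops partial tiles at the edges; whether a library pads or drops them is a setting this sketch does not model.

```python
def tile_origins(slide_height, slide_width, tile_size, stride=None):
    """Yield (row, col) top-left corners of tiles lying fully inside the slide."""
    stride = stride or tile_size  # default: non-overlapping tiles
    for top in range(0, slide_height - tile_size + 1, stride):
        for left in range(0, slide_width - tile_size + 1, stride):
            yield (top, left)


origins = list(tile_origins(slide_height=600, slide_width=900, tile_size=256))
overlapping = list(tile_origins(slide_height=600, slide_width=900, tile_size=256, stride=128))
print(len(origins), origins)
print(len(overlapping))
```

A stride smaller than the tile size produces overlapping tiles, which increases the tile count quickly and is worth budgeting for on large slides.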

Common Workflows

H&E Image Analysis:
  1. Load WSI with appropriate slide class
  2. Apply tissue detection and stain normalization
  3. Perform nucleus detection or train segmentation models
  4. Extract features and build spatial graphs
  5. Conduct downstream analysis
Multiparametric Imaging (CODEX):
  1. Load CODEX slide with CODEXSlide
  2. Collapse multi-run channel data
  3. Segment cells using Mesmer model
  4. Quantify marker expression
  5. Export to AnnData for single-cell analysis
Training ML Models:
  1. Prepare dataset with public pathology data
  2. Create PyTorch DataLoader with PathML datasets
  3. Train HoVer-Net or custom models
  4. Evaluate on held-out test sets
  5. Deploy with ONNX for inference
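
For the "held-out test sets" step in the training workflow, one detail that is easy to miss is that tiles from the same slide should not appear in both training and test data. The sketch below shows a slide-level split in plain Python. The record layout, the `split_by_slide` helper, and the 25% test fraction are assumptions for illustration.

```python
import random


def split_by_slide(tile_records, test_fraction=0.25, seed=0):
    """Split tiles into train/test so that no slide contributes to both sets.

    tile_records: list of (slide_id, tile_id) pairs.
    """
    slide_ids = sorted({slide_id for slide_id, _ in tile_records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(slide_ids)
    n_test = max(1, round(len(slide_ids) * test_fraction))
    test_slides = set(slide_ids[:n_test])
    train = [rec for rec in tile_records if rec[0] not in test_slides]
    test = [rec for rec in tile_records if rec[0] in test_slides]
    return train, test


# Four slides with three tiles each.
records = [(f"slide_{s}", t) for s in range(4) for t in range(3)]
train, test = split_by_slide(records)
print(len(train), len(test))
```

Splitting at the tile level instead would let near-identical neighbouring tiles leak across the boundary and inflate test scores.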

References to Detailed Documentation

When working on specific tasks, refer to the appropriate reference file for comprehensive information:
  • Loading images: references/image_loading.md
  • Preprocessing workflows: references/preprocessing.md
  • Spatial analysis: references/graphs.md
  • Model training: references/machine_learning.md
  • CODEX/multiplex IF: references/multiparametric.md
  • Data storage: references/data_management.md

Resources

This skill includes comprehensive reference documentation organized by capability area. Each reference file contains detailed API information, workflow examples, best practices, and troubleshooting guidance for specific PathML functionality.

references/

Documentation files providing in-depth coverage of PathML capabilities:
  • image_loading.md - Whole-slide image formats, loading strategies, slide classes
  • preprocessing.md - Complete transform catalog, pipeline construction, preprocessing workflows
  • graphs.md - Graph construction methods, feature extraction, spatial analysis
  • machine_learning.md - Model architectures, training workflows, evaluation, inference
  • multiparametric.md - CODEX, Vectra, multiplex IF analysis, cell segmentation, quantification
  • data_management.md - HDF5 storage, tile management, batch processing, dataset organization
Load these references as needed when working on specific computational pathology tasks.