histolab

Histolab

Overview

Histolab is a Python library for processing whole slide images (WSI) in digital pathology. It automates tissue detection, extracts informative tiles from gigapixel images, and prepares datasets for deep learning pipelines. The library handles multiple WSI formats (such as SVS, TIFF, and NDPI), segments tissue with composable image filters, and provides three tile extraction strategies: random, grid, and score-based.

Installation

```bash
uv pip install histolab
```

Quick Start

Basic workflow for extracting tiles from a whole slide image:
```python
from histolab.slide import Slide
from histolab.tiler import RandomTiler

# Load slide
slide = Slide("slide.svs", processed_path="output/")

# Configure tiler
tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,
    level=0,
    seed=42
)

# Preview tile locations
tiler.locate_tiles(slide, n_tiles=20)

# Extract tiles
tiler.extract(slide)
```

Core Capabilities

核心功能

1. Slide Management

Load, inspect, and work with whole slide images in various formats.
Common operations:
  • Loading WSI files (SVS, TIFF, NDPI, etc.)
  • Accessing slide metadata (dimensions, magnification, properties)
  • Generating thumbnails for visualization
  • Working with pyramidal image structures
  • Extracting regions at specific coordinates
Key classes: `Slide`
Reference: `references/slide_management.md` contains comprehensive documentation on:
  • Slide initialization and configuration
  • Built-in sample datasets (prostate, ovarian, breast, heart, kidney tissues)
  • Accessing slide properties and metadata
  • Thumbnail generation and visualization
  • Working with pyramid levels
  • Multi-slide processing workflows
Example workflow:
```python
from histolab.slide import Slide
from histolab.data import prostate_tissue

# Load sample data
prostate_svs, prostate_path = prostate_tissue()

# Initialize slide
slide = Slide(prostate_path, processed_path="output/")

# Inspect properties
print(f"Dimensions: {slide.dimensions}")
print(f"Levels: {slide.levels}")
print(f"Magnification: {slide.properties.get('openslide.objective-power')}")

# Save thumbnail
slide.save_thumbnail()
```

2. Tissue Detection and Masks

Automatically identify tissue regions and filter out background and artifacts.
Common operations:
  • Creating binary tissue masks
  • Detecting largest tissue region
  • Excluding background and artifacts
  • Custom tissue segmentation
  • Removing pen annotations
Key classes: `TissueMask`, `BiggestTissueBoxMask`, `BinaryMask`
Reference: `references/tissue_masks.md` contains comprehensive documentation on:
  • TissueMask: Segments all tissue regions using automated filters
  • BiggestTissueBoxMask: Returns bounding box of largest tissue region (default)
  • BinaryMask: Base class for custom mask implementations
  • Visualizing masks with `locate_mask()`
  • Creating custom rectangular and annotation-exclusion masks
  • Mask integration with tile extraction
  • Best practices and troubleshooting
Example workflow:
```python
from histolab.masks import TissueMask, BiggestTissueBoxMask

# Create tissue mask for all tissue regions
tissue_mask = TissueMask()

# Visualize mask on slide
slide.locate_mask(tissue_mask)

# Get mask array
mask_array = tissue_mask(slide)

# Use largest tissue region (default for most extractors)
biggest_mask = BiggestTissueBoxMask()
```

**When to use each mask:**
- `TissueMask`: Multiple tissue sections, comprehensive analysis
- `BiggestTissueBoxMask`: Single main tissue section, exclude artifacts (default)
- Custom `BinaryMask`: Specific ROI, exclude annotations, custom segmentation

3. Tile Extraction

Extract smaller regions from a large WSI using different strategies.
Three extraction strategies:
RandomTiler: Extract a fixed number of randomly positioned tiles
  • Best for: Sampling diverse regions, exploratory analysis, training data
  • Key parameters: `n_tiles`, `seed` for reproducibility
GridTiler: Systematically extract tiles across tissue in a grid pattern
  • Best for: Complete coverage, spatial analysis, reconstruction
  • Key parameters: `pixel_overlap` for sliding windows
ScoreTiler: Extract top-ranked tiles based on scoring functions
  • Best for: Most informative regions, quality-driven selection
  • Key parameters: `scorer` (NucleiScorer, CellularityScorer, custom)
Common parameters:
  • `tile_size`: Tile dimensions (e.g., (512, 512))
  • `level`: Pyramid level for extraction (0 = highest resolution)
  • `check_tissue`: Filter tiles by tissue content
  • `tissue_percent`: Minimum tissue coverage (default 80%)
  • `extraction_mask`: Mask defining extraction region
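To make the `check_tissue` and `tissue_percent` parameters concrete, here is a minimal pure-Python sketch of how a tissue-coverage threshold can be applied to the binary mask of a single tile. The helper names `tissue_fraction` and `has_enough_tissue` and the sample mask are hypothetical and not part of histolab; the library's own check may differ in detail.

```python
def tissue_fraction(mask):
    """Return the fraction of truthy (tissue) pixels in a 2D binary mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    tissue = sum(1 for row in mask for pixel in row if pixel)
    return tissue / total


def has_enough_tissue(mask, tissue_percent=80.0):
    """Keep a tile only if its tissue coverage reaches the threshold (in percent)."""
    return tissue_fraction(mask) * 100.0 >= tissue_percent


# Hypothetical 4x4 tile mask: 1 = tissue, 0 = background (13 of 16 pixels are tissue).
tile_mask = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]

coverage = tissue_fraction(tile_mask)
keep_at_80 = has_enough_tissue(tile_mask, tissue_percent=80.0)
keep_at_90 = has_enough_tissue(tile_mask, tissue_percent=90.0)
print(coverage, keep_at_80, keep_at_90)
```

With this mask the tile passes an 80% threshold but is rejected at 90%, which is the trade-off to keep in mind when tuning `tissue_percent`.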
Reference: `references/tile_extraction.md` contains comprehensive documentation on:
  • Detailed explanation of each tiler strategy
  • Available scorers (NucleiScorer, CellularityScorer, custom)
  • Tile preview with `locate_tiles()`
  • Extraction workflows and reporting
  • Advanced patterns (multi-level, hierarchical extraction)
  • Performance optimization and troubleshooting
Example workflows:
```python
from histolab.tiler import RandomTiler, GridTiler, ScoreTiler
from histolab.scorer import NucleiScorer

# Random sampling (fast, diverse)
random_tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,
    level=0,
    seed=42,
    check_tissue=True,
    tissue_percent=80.0
)
random_tiler.extract(slide)

# Grid coverage (comprehensive)
grid_tiler = GridTiler(
    tile_size=(512, 512),
    level=0,
    pixel_overlap=0,
    check_tissue=True
)
grid_tiler.extract(slide)

# Score-based selection (most informative)
score_tiler = ScoreTiler(
    tile_size=(512, 512),
    n_tiles=50,
    scorer=NucleiScorer(),
    level=0
)
score_tiler.extract(slide, report_path="tiles_report.csv")
```

**Always preview before extracting:**
```python
# Preview tile locations on thumbnail
tiler.locate_tiles(slide, n_tiles=20)
```

4. Filters and Preprocessing

Apply image processing filters for tissue detection, quality control, and preprocessing.
Filter categories:
Image Filters: Color space conversions, thresholding, contrast enhancement
  • `RgbToGrayscale`, `RgbToHsv`, `RgbToHed`
  • `OtsuThreshold`, `AdaptiveThreshold`
  • `StretchContrast`, `HistogramEqualization`
Morphological Filters: Structural operations on binary images
  • `BinaryDilation`, `BinaryErosion`
  • `BinaryOpening`, `BinaryClosing`
  • `RemoveSmallObjects`, `RemoveSmallHoles`
Composition: Chain multiple filters together
  • `Compose`: Create filter pipelines
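As background for the thresholding filters, the following is a small standard-library sketch of Otsu's method, the technique that a filter named `OtsuThreshold` refers to. It is an illustration only and not histolab's implementation: the function name `otsu_threshold`, the flat list of gray values, and the choice to treat darker pixels as tissue (reasonable for brightfield slides with a light background) are all assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the dark class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Hypothetical grayscale values: a dark tissue cluster and a bright background cluster.
pixels = [30] * 6 + [40] * 4 + [200] * 5 + [220] * 5

threshold = otsu_threshold(pixels)
tissue_pixels = sum(1 for value in pixels if value <= threshold)
print(threshold, tissue_pixels)
```

The threshold lands between the two clusters, so all ten dark pixels are classified as tissue. Morphological filters such as dilation and small-object removal then clean up the resulting binary mask.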
Reference: `references/filters_preprocessing.md` contains comprehensive documentation on:
  • Detailed explanation of each filter type
  • Filter composition and chaining
  • Common preprocessing pipelines (tissue detection, pen removal, nuclei enhancement)
  • Applying filters to tiles
  • Custom mask filters
  • Quality control filters (blur detection, tissue coverage)
  • Best practices and troubleshooting
Example workflows:
```python
from histolab.filters.compositions import Compose
from histolab.filters.image_filters import RgbToGrayscale, OtsuThreshold
from histolab.filters.morphological_filters import (
    BinaryDilation, RemoveSmallHoles, RemoveSmallObjects
)

# Standard tissue detection pipeline
tissue_detection = Compose([
    RgbToGrayscale(),
    OtsuThreshold(),
    BinaryDilation(disk_size=5),
    RemoveSmallHoles(area_threshold=1000),
    RemoveSmallObjects(area_threshold=500)
])

# Use with custom mask
from histolab.masks import TissueMask
custom_mask = TissueMask(filters=tissue_detection)

# Apply filters to tile
# (`tile` is assumed to be an existing Tile instance, e.g. one obtained from a slide)
from histolab.tile import Tile
filtered_tile = tile.apply_filters(tissue_detection)
```

5. Visualization

Visualize slides, masks, tile locations, and extraction quality.
Common visualization tasks:
  • Displaying slide thumbnails
  • Visualizing tissue masks
  • Previewing tile locations
  • Assessing tile quality
  • Creating reports and figures
Reference: `references/visualization.md` contains comprehensive documentation on:
  • Slide thumbnail display and saving
  • Mask visualization with `locate_mask()`
  • Tile location preview with `locate_tiles()`
  • Displaying extracted tiles and mosaics
  • Quality assessment (score distributions, top vs bottom tiles)
  • Multi-slide visualization
  • Filter effect visualization
  • Exporting high-resolution figures and PDF reports
  • Interactive visualization in Jupyter notebooks
Example workflows:
```python
import matplotlib.pyplot as plt
from histolab.masks import TissueMask
from histolab.tiler import RandomTiler

# Display slide thumbnail
plt.figure(figsize=(10, 10))
plt.imshow(slide.thumbnail)
plt.title(f"Slide: {slide.name}")
plt.axis('off')
plt.show()

# Visualize tissue mask
tissue_mask = TissueMask()
slide.locate_mask(tissue_mask)

# Preview tile locations
tiler = RandomTiler(tile_size=(512, 512), n_tiles=50)
tiler.locate_tiles(slide, n_tiles=20)

# Display extracted tiles in grid
from pathlib import Path
from PIL import Image

tile_paths = list(Path("output/tiles/").glob("*.png"))[:16]
fig, axes = plt.subplots(4, 4, figsize=(12, 12))
axes = axes.ravel()

for idx, tile_path in enumerate(tile_paths):
    tile_img = Image.open(tile_path)
    axes[idx].imshow(tile_img)
    axes[idx].set_title(tile_path.stem, fontsize=8)
    axes[idx].axis('off')

plt.tight_layout()
plt.show()
```

Typical Workflows

Workflow 1: Exploratory Tile Extraction

Quick sampling of diverse tissue regions for initial analysis.
```python
from histolab.slide import Slide
from histolab.tiler import RandomTiler
import logging

# Enable logging for progress tracking
logging.basicConfig(level=logging.INFO)

# Load slide
slide = Slide("slide.svs", processed_path="output/random_tiles/")

# Inspect slide
print(f"Dimensions: {slide.dimensions}")
print(f"Levels: {slide.levels}")
slide.save_thumbnail()

# Configure random tiler
random_tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,
    level=0,
    seed=42,
    check_tissue=True,
    tissue_percent=80.0
)

# Preview locations
random_tiler.locate_tiles(slide, n_tiles=20)

# Extract tiles
random_tiler.extract(slide)
```

Workflow 2: Comprehensive Grid Extraction

Complete tissue coverage for whole-slide analysis.
```python
from histolab.slide import Slide
from histolab.tiler import GridTiler
from histolab.masks import TissueMask

# Load slide
slide = Slide("slide.svs", processed_path="output/grid_tiles/")

# Use TissueMask for all tissue sections
tissue_mask = TissueMask()
slide.locate_mask(tissue_mask)

# Configure grid tiler
grid_tiler = GridTiler(
    tile_size=(512, 512),
    level=1,  # Use level 1 for faster extraction
    pixel_overlap=0,
    check_tissue=True,
    tissue_percent=70.0
)

# Preview grid
grid_tiler.locate_tiles(slide)

# Extract all tiles
grid_tiler.extract(slide, extraction_mask=tissue_mask)
```

Workflow 3: Quality-Driven Tile Selection

Extract most informative tiles based on nuclei density.
```python
from histolab.slide import Slide
from histolab.tiler import ScoreTiler
from histolab.scorer import NucleiScorer
import pandas as pd
import matplotlib.pyplot as plt

# Load slide
slide = Slide("slide.svs", processed_path="output/scored_tiles/")

# Configure score tiler
score_tiler = ScoreTiler(
    tile_size=(512, 512),
    n_tiles=50,
    level=0,
    scorer=NucleiScorer(),
    check_tissue=True
)

# Preview top tiles
score_tiler.locate_tiles(slide, n_tiles=15)

# Extract with report
score_tiler.extract(slide, report_path="tiles_report.csv")

# Analyze scores
report_df = pd.read_csv("tiles_report.csv")
plt.hist(report_df['score'], bins=20, edgecolor='black')
plt.xlabel('Tile Score')
plt.ylabel('Frequency')
plt.title('Distribution of Tile Scores')
plt.show()
```
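If pandas is not available, a score report can also be summarized with the standard library alone. The sketch below assumes only that the report has a `score` column, as used in this workflow; the `filename` column, the sample rows, and the review cutoff are made up for illustration, and an in-memory string stands in for the CSV file.

```python
import csv
import io
import statistics

# Hypothetical report contents standing in for tiles_report.csv.
report_csv = io.StringIO(
    "filename,score\n"
    "tile_0.png,0.91\n"
    "tile_1.png,0.15\n"
    "tile_2.png,0.64\n"
    "tile_3.png,0.78\n"
)

rows = list(csv.DictReader(report_csv))
scores = [float(row["score"]) for row in rows]

summary = {
    "n_tiles": len(scores),
    "min": min(scores),
    "max": max(scores),
    "median": statistics.median(scores),
}

# Flag tiles whose score falls below a (hypothetical) review cutoff.
cutoff = 0.5
low_scoring = [row["filename"] for row in rows if float(row["score"]) < cutoff]

print(summary)
print(low_scoring)
```

To run this against a real report, replace the `io.StringIO` object with `open("tiles_report.csv", newline="")` and adjust the column names to match the file.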

Workflow 4: Multi-Slide Processing Pipeline

Process entire slide collection with consistent parameters.
```python
from pathlib import Path
from histolab.slide import Slide
from histolab.tiler import RandomTiler
import logging

logging.basicConfig(level=logging.INFO)

# Configure tiler once
tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=50,
    level=0,
    seed=42,
    check_tissue=True
)

# Process all slides
slide_dir = Path("slides/")
output_base = Path("output/")

for slide_path in slide_dir.glob("*.svs"):
    print(f"\nProcessing: {slide_path.name}")

    # Create slide-specific output directory
    output_dir = output_base / slide_path.stem
    output_dir.mkdir(parents=True, exist_ok=True)

    # Load and process slide
    slide = Slide(slide_path, processed_path=output_dir)

    # Save thumbnail for review
    slide.save_thumbnail()

    # Extract tiles
    tiler.extract(slide)

    print(f"Completed: {slide_path.name}")
```

Workflow 5: Custom Tissue Detection and Filtering

Handle slides with artifacts, annotations, or unusual staining.
```python
from histolab.slide import Slide
from histolab.masks import TissueMask
from histolab.tiler import RandomTiler
from histolab.filters.compositions import Compose
from histolab.filters.image_filters import RgbToGrayscale, OtsuThreshold
from histolab.filters.morphological_filters import (
    BinaryDilation, RemoveSmallObjects, RemoveSmallHoles
)

# Define custom filter pipeline for aggressive artifact removal
aggressive_filters = Compose([
    RgbToGrayscale(),
    OtsuThreshold(),
    BinaryDilation(disk_size=10),
    RemoveSmallHoles(area_threshold=5000),
    RemoveSmallObjects(area_threshold=3000)  # Remove larger artifacts
])

# Create custom mask
custom_mask = TissueMask(filters=aggressive_filters)

# Load slide and visualize mask
slide = Slide("slide.svs", processed_path="output/")
slide.locate_mask(custom_mask)

# Extract with custom mask
tiler = RandomTiler(tile_size=(512, 512), n_tiles=100)
tiler.extract(slide, extraction_mask=custom_mask)
```

Best Practices

Slide Loading and Inspection

  1. Always inspect slide properties before processing
  2. Save thumbnails for quick visual review
  3. Check pyramid levels and dimensions
  4. Verify tissue is present using thumbnails

Tissue Detection

  1. Preview masks with `locate_mask()` before extraction
  2. Use `TissueMask` for multiple sections, `BiggestTissueBoxMask` for single sections
  3. Customize filters for specific stains (H&E vs IHC)
  4. Handle pen annotations with custom masks
  5. Test masks on diverse slides

Tile Extraction

  1. Always preview with `locate_tiles()` before extracting
  2. Choose appropriate tiler:
    • RandomTiler: Sampling and exploration
    • GridTiler: Complete coverage
    • ScoreTiler: Quality-driven selection
  3. Set appropriate `tissue_percent` threshold (70-90% typical)
  4. Use seeds for reproducibility in RandomTiler
  5. Extract at appropriate pyramid level for analysis resolution
  6. Enable logging for large datasets
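The point about seeds can be illustrated without any slide at all. The sketch below draws random tile origins from a private, seeded generator; the function `sample_tile_origins` and the slide dimensions are hypothetical, and this is not how histolab samples internally, only a demonstration of why a fixed seed makes random sampling repeatable.

```python
import random


def sample_tile_origins(slide_size, tile_size, n_tiles, seed):
    """Draw n_tiles random top-left corners so that each tile fits inside the slide."""
    rng = random.Random(seed)  # private generator: leaves the global random state alone
    width, height = slide_size
    tile_w, tile_h = tile_size
    return [
        (rng.randint(0, width - tile_w), rng.randint(0, height - tile_h))
        for _ in range(n_tiles)
    ]


first = sample_tile_origins((10_000, 8_000), (512, 512), n_tiles=5, seed=42)
second = sample_tile_origins((10_000, 8_000), (512, 512), n_tiles=5, seed=42)
other = sample_tile_origins((10_000, 8_000), (512, 512), n_tiles=5, seed=7)

print(first == second)  # same seed, same coordinates
print(first == other)   # a different seed gives a different sample
```

Recording the seed alongside the extracted tiles makes it possible to regenerate the same sample later.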

Performance

  1. Extract at lower-resolution pyramid levels (level=1 or 2) for faster processing
  2. Use `BiggestTissueBoxMask` over `TissueMask` when appropriate
  3. Adjust `tissue_percent` to reduce invalid tile attempts
  4. Limit `n_tiles` for initial exploration
  5. Use `pixel_overlap=0` for non-overlapping grids
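A rough calculation shows why the pyramid level has such a large effect on run time. The sketch below counts how many non-overlapping 512x512 tiles fit at each level; the level-0 dimensions and the downsample factors (1, 4, 16) are hypothetical, so read the real values from the slide's metadata before relying on the numbers.

```python
def level_dimensions(base_size, downsample):
    """Approximate (width, height) of a pyramid level given its downsample factor."""
    width, height = base_size
    return (width // downsample, height // downsample)


def grid_tile_count(level_size, tile_size):
    """Number of whole, non-overlapping tiles that fit at this level."""
    return (level_size[0] // tile_size[0]) * (level_size[1] // tile_size[1])


base = (40_000, 30_000)            # hypothetical level-0 dimensions in pixels
downsamples = {0: 1, 1: 4, 2: 16}  # hypothetical per-level downsample factors

counts = {
    level: grid_tile_count(level_dimensions(base, factor), (512, 512))
    for level, factor in downsamples.items()
}
print(counts)
```

Under these assumptions the tile count drops from thousands at level 0 to a few hundred at level 1 and about a dozen at level 2, at the cost of each tile covering a larger physical area at lower resolution.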

Quality Control

  1. Validate tile quality (check for blur, artifacts, focus)
  2. Review score distributions for ScoreTiler
  3. Inspect top and bottom scoring tiles
  4. Monitor tissue coverage statistics
  5. Filter extracted tiles by additional quality metrics if needed
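One widely used focus measure for the blur check is the variance of the Laplacian: sharp tiles have strong edge responses and therefore a high variance, while blurred or empty tiles score close to zero. The sketch below is a standard-library illustration on tiny synthetic images; `laplacian_variance` is a hypothetical helper, not a histolab function, and any cutoff for rejecting tiles would have to be calibrated on real data.

```python
import statistics


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a 2D image."""
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            laplacian = (
                gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
            responses.append(laplacian)
    return statistics.pvariance(responses)


# Hypothetical 5x5 tiles: a checkerboard (many sharp edges) and a flat gray patch.
sharp = [[255 if (x + y) % 2 == 0 else 0 for x in range(5)] for y in range(5)]
flat = [[128] * 5 for _ in range(5)]

sharp_score = laplacian_variance(sharp)
flat_score = laplacian_variance(flat)
print(sharp_score > flat_score)
```

On real tiles the same measure would be computed on a grayscale version of each extracted image, and tiles scoring below the calibrated cutoff would be set aside for review.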

Common Use Cases

Training Deep Learning Models

  • Extract balanced datasets using RandomTiler across multiple slides
  • Use ScoreTiler with NucleiScorer to focus on cell-rich regions
  • Extract at consistent resolution (level 0 or level 1)
  • Generate CSV reports for tracking tile metadata

Whole Slide Analysis

  • Use GridTiler for complete tissue coverage
  • Extract at multiple pyramid levels for hierarchical analysis
  • Maintain spatial relationships with grid positions
  • Use `pixel_overlap` for sliding window approaches
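The relationship between tile size, overlap, and grid positions can be sketched in a few lines. The example assumes the usual sliding-window convention in which the stride equals the tile size minus the overlap; `grid_origins` is a hypothetical helper, and histolab's own grid layout may handle borders and masks differently.

```python
def grid_origins(region_size, tile_size, pixel_overlap=0):
    """Top-left corners of tiles on a regular grid; stride = tile size - overlap."""
    width, height = region_size
    tile_w, tile_h = tile_size
    stride_x = tile_w - pixel_overlap
    stride_y = tile_h - pixel_overlap
    if stride_x <= 0 or stride_y <= 0:
        raise ValueError("pixel_overlap must be smaller than the tile size")
    # Row-major order keeps neighbouring tiles adjacent in the output list.
    return [
        (x, y)
        for y in range(0, height - tile_h + 1, stride_y)
        for x in range(0, width - tile_w + 1, stride_x)
    ]


no_overlap = grid_origins((2048, 1024), (512, 512), pixel_overlap=0)
half_overlap = grid_origins((2048, 1024), (512, 512), pixel_overlap=256)
print(len(no_overlap), len(half_overlap))
```

Because each origin is an (x, y) position in the source region, tile-level predictions can be mapped back onto the slide to rebuild a heatmap or mosaic.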

Tissue Characterization

  • Sample diverse regions with RandomTiler
  • Quantify tissue coverage with masks
  • Extract stain-specific information with HED decomposition
  • Compare tissue patterns across slides

Quality Assessment

  • Identify optimal focus regions with ScoreTiler
  • Detect artifacts using custom masks and filters
  • Assess staining quality across slide collection
  • Flag problematic slides for manual review
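When assessing a whole collection, it helps to record failures instead of letting one bad slide stop the run. The sketch below shows one way to do that with a generic driver; `process_slides`, `fake_process`, and the file names are hypothetical stand-ins, and in practice `process_one` would wrap the slide loading and tile extraction steps.

```python
def process_slides(slide_paths, process_one):
    """Run process_one on every path and collect failures instead of stopping."""
    completed, flagged = [], []
    for path in slide_paths:
        try:
            process_one(path)
        except Exception as exc:  # broad on purpose: any failure means manual review
            flagged.append((path, str(exc)))
        else:
            completed.append(path)
    return completed, flagged


def fake_process(path):
    """Stand-in for real slide processing; fails on one hypothetical file."""
    if "blurry" in path:
        raise ValueError("no tissue detected")


completed, flagged = process_slides(
    ["slides/a.svs", "slides/blurry_b.svs", "slides/c.svs"], fake_process
)
print(completed)
print(flagged)
```

The `flagged` list doubles as a review queue: each entry pairs a slide path with the reason it was set aside.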

Dataset Curation

  • Use ScoreTiler to prioritize informative tiles
  • Filter tiles by tissue percentage
  • Generate reports with tile scores and metadata
  • Create stratified datasets across slides and tissue types
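Score-based prioritization boils down to ranking candidate tiles and keeping the best ones. The sketch below shows that selection step with the standard library; the `(coordinates, score)` pairs are made up, and `top_n_tiles` is a hypothetical helper rather than part of histolab.

```python
import heapq


def top_n_tiles(scored_tiles, n):
    """Return the n highest-scoring (coordinates, score) pairs, best first."""
    return heapq.nlargest(n, scored_tiles, key=lambda item: item[1])


# Hypothetical candidates: tile origin and a score in [0, 1].
scored = [
    ((0, 0), 0.12),
    ((512, 0), 0.87),
    ((1024, 0), 0.45),
    ((0, 512), 0.93),
    ((512, 512), 0.30),
]

best = top_n_tiles(scored, 2)
print(best)
```

`heapq.nlargest` avoids sorting the full candidate list, which matters when a slide yields many thousands of scored tiles.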

Troubleshooting

No tiles extracted

  • Lower the `tissue_percent` threshold
  • Verify that the slide contains tissue (check the thumbnail)
  • Ensure `extraction_mask` captures the tissue regions
  • Check that `tile_size` fits within the slide dimensions at the chosen level

Many background tiles

  • Enable `check_tissue=True`
  • Increase `tissue_percent` threshold
  • Use appropriate mask (TissueMask vs BiggestTissueBoxMask)
  • Customize mask filters to better detect tissue

Extraction very slow

  • Extract at a lower-resolution pyramid level (level=1 or 2)
  • Reduce `n_tiles` for RandomTiler/ScoreTiler
  • Use RandomTiler instead of GridTiler for sampling
  • Use BiggestTissueBoxMask instead of TissueMask

Tiles have artifacts

  • Implement custom annotation-exclusion masks
  • Adjust filter parameters for artifact removal
  • Increase small object removal threshold
  • Apply post-extraction quality filtering
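To see what raising the small-object threshold does, consider this standard-library sketch of small-object removal on a binary mask. It clears every 4-connected foreground component smaller than `min_size` pixels. The function below is a simplified illustration written for this example and is not the histolab `RemoveSmallObjects` filter, whose parameters and connectivity rules may differ.

```python
from collections import deque


def remove_small_objects(mask, min_size):
    """Clear 4-connected foreground components smaller than min_size pixels."""
    rows, cols = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned


# Hypothetical mask: one 2x2 tissue blob plus two isolated specks (e.g. dust).
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
cleaned = remove_small_objects(mask, min_size=3)
print(cleaned)
```

A larger `min_size` removes bigger artifacts but also risks discarding small genuine tissue fragments, so the value is worth checking against mask previews.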

Inconsistent results across slides

  • Use same seed for RandomTiler
  • Normalize staining with preprocessing filters
  • Adjust `tissue_percent` per staining quality
  • Implement slide-specific mask customization

Resources

This skill includes detailed reference documentation in the `references/` directory:

references/slide_management.md

Comprehensive guide to loading, inspecting, and working with whole slide images:
  • Slide initialization and configuration
  • Built-in sample datasets
  • Slide properties and metadata
  • Thumbnail generation and visualization
  • Working with pyramid levels
  • Multi-slide processing workflows
  • Best practices and common patterns

references/tissue_masks.md

Complete documentation on tissue detection and masking:
  • TissueMask, BiggestTissueBoxMask, BinaryMask classes
  • How tissue detection filters work
  • Customizing masks with filter chains
  • Visualizing masks
  • Creating custom rectangular and annotation-exclusion masks
  • Integration with tile extraction
  • Best practices and troubleshooting

references/tile_extraction.md

Detailed explanation of tile extraction strategies:
  • RandomTiler, GridTiler, ScoreTiler comparison
  • Available scorers (NucleiScorer, CellularityScorer, custom)
  • Common and strategy-specific parameters
  • Tile preview with locate_tiles()
  • Extraction workflows and CSV reporting
  • Advanced patterns (multi-level, hierarchical)
  • Performance optimization
  • Troubleshooting common issues

references/filters_preprocessing.md

Complete filter reference and preprocessing guide:
  • Image filters (color conversion, thresholding, contrast)
  • Morphological filters (dilation, erosion, opening, closing)
  • Filter composition and chaining
  • Common preprocessing pipelines
  • Applying filters to tiles
  • Custom mask filters
  • Quality control filters
  • Best practices and troubleshooting

references/visualization.md

Comprehensive visualization guide:
  • Slide thumbnail display and saving
  • Mask visualization techniques
  • Tile location preview
  • Displaying extracted tiles and creating mosaics
  • Quality assessment visualizations
  • Multi-slide comparison
  • Filter effect visualization
  • Exporting high-resolution figures and PDFs
  • Interactive visualization in Jupyter notebooks
Usage pattern: Reference files contain in-depth information to support workflows described in this main skill document. Load specific reference files as needed for detailed implementation guidance, troubleshooting, or advanced features.