Histolab
Overview
Histolab is a Python library for processing whole slide images (WSI) in digital pathology. It automates tissue detection, extracts informative tiles from gigapixel images, and prepares datasets for deep learning pipelines. The library reads multiple WSI formats (such as SVS, TIFF, and NDPI), segments tissue with composable filter pipelines, and offers random, grid-based, and score-based tile extraction.
Installation
```bash
uv pip install histolab
```
Quick Start
Basic workflow for extracting tiles from a whole slide image:
```python
from histolab.slide import Slide
from histolab.tiler import RandomTiler

# Load slide
slide = Slide("slide.svs", processed_path="output/")

# Configure tiler
tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,
    level=0,
    seed=42
)

# Preview tile locations
tiler.locate_tiles(slide)

# Extract tiles
tiler.extract(slide)
```
Core Capabilities
1. Slide Management
Load, inspect, and work with whole slide images in various formats.
Common operations:
- Loading WSI files (SVS, TIFF, NDPI, etc.)
- Accessing slide metadata (dimensions, magnification, properties)
- Generating thumbnails for visualization
- Working with pyramidal image structures
- Extracting regions at specific coordinates
Key classes: `Slide`
Reference: `references/slide_management.md` contains comprehensive documentation on:
- Slide initialization and configuration
- Built-in sample datasets (prostate, ovarian, breast, heart, kidney tissues)
- Accessing slide properties and metadata
- Thumbnail generation and visualization
- Working with pyramid levels
- Multi-slide processing workflows
Example workflow:
```python
from histolab.slide import Slide
from histolab.data import prostate_tissue

# Load sample data
prostate_svs, prostate_path = prostate_tissue()

# Initialize slide
slide = Slide(prostate_path, processed_path="output/")

# Inspect properties
print(f"Dimensions: {slide.dimensions}")
print(f"Levels: {slide.levels}")
print(f"Magnification: {slide.properties.get('openslide.objective-power')}")

# Save thumbnail
slide.save_thumbnail()
```
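To make the idea of pyramid levels concrete, the following is a minimal sketch of how the size of each level relates to the full-resolution image. It assumes every level downsamples the previous one by a constant factor (2 by default), which is a simplification: real slides report their own per-level factors. The helper `estimate_level_size` and the example dimensions are hypothetical and not part of histolab.

```python
def estimate_level_size(base_size, level, downsample=2):
    """Approximate (width, height) at a pyramid level.

    Assumes a constant downsample factor between consecutive levels,
    which is a simplification: real WSI files store their own factors.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    factor = downsample ** level
    width, height = base_size
    return (width // factor, height // factor)


# Hypothetical level-0 size, chosen only for illustration
base = (40000, 30000)
sizes = [estimate_level_size(base, lvl) for lvl in range(3)]
print(sizes)
```

Under this assumption, each step up the pyramid cuts the pixel count by the square of the downsample factor, which is why higher level numbers are much cheaper to read.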
2. Tissue Detection and Masks
Automatically identify tissue regions and filter background/artifacts.
Common operations:
- Creating binary tissue masks
- Detecting largest tissue region
- Excluding background and artifacts
- Custom tissue segmentation
- Removing pen annotations
Key classes: `TissueMask`, `BiggestTissueBoxMask`, `BinaryMask`
Reference: `references/tissue_masks.md` contains comprehensive documentation on:
- TissueMask: Segments all tissue regions using automated filters
- BiggestTissueBoxMask: Returns bounding box of largest tissue region (default)
- BinaryMask: Base class for custom mask implementations
- Visualizing masks with `locate_mask()`
- Creating custom rectangular and annotation-exclusion masks
- Mask integration with tile extraction
- Best practices and troubleshooting
Example workflow:
```python
from histolab.masks import TissueMask, BiggestTissueBoxMask

# Create tissue mask for all tissue regions
tissue_mask = TissueMask()

# Visualize mask on slide
slide.locate_mask(tissue_mask)

# Get mask array
mask_array = tissue_mask(slide)

# Use largest tissue region (default for most extractors)
biggest_mask = BiggestTissueBoxMask()
```
**When to use each mask:**
- `TissueMask`: Multiple tissue sections, comprehensive analysis
- `BiggestTissueBoxMask`: Single main tissue section, exclude artifacts (default)
- Custom `BinaryMask`: Specific ROI, exclude annotations, custom segmentation
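A binary tissue mask is ultimately a grid of true/false pixels, and tile filtering reduces to asking how much of a tile's rectangle is tissue. The sketch below illustrates that calculation in plain Python on a toy mask. The function `tissue_percent` here is a hypothetical helper for illustration only and is not histolab's implementation.

```python
def tissue_percent(mask, x, y, width, height):
    """Percentage of True pixels of `mask` inside a tile rectangle.

    `mask` is a list of rows of booleans (True = tissue); the tile's
    top-left corner is (x, y) in mask coordinates.
    """
    rows = mask[y:y + height]
    pixels = [value for row in rows for value in row[x:x + width]]
    if not pixels:
        return 0.0
    return 100.0 * sum(pixels) / len(pixels)


# Toy 4x4 mask: tissue occupies the left half
toy_mask = [[True, True, False, False] for _ in range(4)]

left = tissue_percent(toy_mask, 0, 0, 2, 4)    # fully inside tissue
middle = tissue_percent(toy_mask, 1, 0, 2, 4)  # straddles the tissue edge
print(left, middle)
print(middle >= 80.0)  # would this tile pass an 80% threshold?
```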
3. Tile Extraction
Extract smaller regions from large WSI using different strategies.
Three extraction strategies:
RandomTiler: Extract a fixed number of randomly positioned tiles
- Best for: Sampling diverse regions, exploratory analysis, training data
- Key parameters: `n_tiles`, `seed` for reproducibility
GridTiler: Systematically extract tiles across tissue in a grid pattern
- Best for: Complete coverage, spatial analysis, reconstruction
- Key parameters: `pixel_overlap` for sliding windows
ScoreTiler: Extract top-ranked tiles based on scoring functions
- Best for: Most informative regions, quality-driven selection
- Key parameters: `scorer` (NucleiScorer, CellularityScorer, custom)
Common parameters:
- `tile_size`: Tile dimensions (e.g., (512, 512))
- `level`: Pyramid level for extraction (0 = highest resolution)
- `check_tissue`: Filter tiles by tissue content
- `tissue_percent`: Minimum tissue coverage (default 80%)
- `extraction_mask`: Mask defining extraction region (passed to `extract()`)
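To illustrate how `tile_size` and `pixel_overlap` interact in a grid layout, here is a minimal sketch that computes tile origins over a rectangular region. It assumes a simple model in which consecutive tiles are offset by the tile size minus the overlap and partial tiles at the border are dropped. The helper `grid_origins` is hypothetical and may differ from how histolab's GridTiler handles edges or masks.

```python
def grid_origins(region_size, tile_size, pixel_overlap=0):
    """Top-left corners of tiles laid out on a regular grid.

    Consecutive tiles are offset by (tile_size - pixel_overlap), and only
    tiles that fit entirely inside the region are returned.
    """
    region_w, region_h = region_size
    tile_w, tile_h = tile_size
    step_x = tile_w - pixel_overlap
    step_y = tile_h - pixel_overlap
    if step_x <= 0 or step_y <= 0:
        raise ValueError("pixel_overlap must be smaller than the tile size")
    return [
        (x, y)
        for y in range(0, region_h - tile_h + 1, step_y)
        for x in range(0, region_w - tile_w + 1, step_x)
    ]


no_overlap = grid_origins((2048, 1024), (512, 512))
half_overlap = grid_origins((2048, 1024), (512, 512), pixel_overlap=256)
print(len(no_overlap), len(half_overlap))
```

In this toy region, a 50% overlap raises the tile count from 8 to 21, which is worth keeping in mind when estimating disk usage for sliding-window extraction.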
Reference: `references/tile_extraction.md` contains comprehensive documentation on:
- Detailed explanation of each tiler strategy
- Available scorers (NucleiScorer, CellularityScorer, custom)
- Tile preview with `locate_tiles()`
- Extraction workflows and reporting
- Advanced patterns (multi-level, hierarchical extraction)
- Performance optimization and troubleshooting
Example workflows:
```python
from histolab.tiler import RandomTiler, GridTiler, ScoreTiler
from histolab.scorer import NucleiScorer

# Random sampling (fast, diverse)
random_tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,
    level=0,
    seed=42,
    check_tissue=True,
    tissue_percent=80.0
)
random_tiler.extract(slide)

# Grid coverage (comprehensive)
grid_tiler = GridTiler(
    tile_size=(512, 512),
    level=0,
    pixel_overlap=0,
    check_tissue=True
)
grid_tiler.extract(slide)

# Score-based selection (most informative)
score_tiler = ScoreTiler(
    tile_size=(512, 512),
    n_tiles=50,
    scorer=NucleiScorer(),
    level=0
)
score_tiler.extract(slide, report_path="tiles_report.csv")
```
**Always preview before extracting:**
```python
# Preview tile locations on thumbnail
tiler.locate_tiles(slide)
```
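The score-based strategy boils down to ranking candidate tiles and keeping the best `n_tiles`. The following sketch shows that selection step in isolation using the standard library. The scores and coordinates are made up for illustration, and `top_tiles` is a hypothetical helper, not ScoreTiler's actual code.

```python
import heapq


def top_tiles(scored_tiles, n_tiles):
    """Return the n_tiles highest-scoring (score, coords) pairs, best first."""
    return heapq.nlargest(n_tiles, scored_tiles, key=lambda item: item[0])


# Made-up scores for four candidate tile positions
candidates = [
    (0.12, (0, 0)),
    (0.87, (512, 0)),
    (0.45, (0, 512)),
    (0.91, (512, 512)),
]
best = top_tiles(candidates, n_tiles=2)
print(best)
```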
4. Filters and Preprocessing
Apply image processing filters for tissue detection, quality control, and preprocessing.
Filter categories:
Image Filters: Color space conversions, thresholding, contrast enhancement
- `RgbToGrayscale`, `RgbToHsv`, `RgbToHed`
- `OtsuThreshold`, `AdaptiveThreshold`
- `StretchContrast`, `HistogramEqualization`
Morphological Filters: Structural operations on binary images
- `BinaryDilation`, `BinaryErosion`
- `BinaryOpening`, `BinaryClosing`
- `RemoveSmallObjects`, `RemoveSmallHoles`
Composition: Chain multiple filters together
- `Compose`: Create filter pipelines
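Filter composition can be pictured as feeding the output of one callable into the next. The sketch below shows that chaining pattern in plain Python on a handful of RGB pixels. The `Pipeline` class and the two toy filters are hypothetical stand-ins meant to convey the idea, not histolab's `Compose` or its filters.

```python
class Pipeline:
    """Apply a list of single-argument callables in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, data):
        for step in self.steps:
            data = step(data)
        return data


def to_grayscale(pixels):
    # Average the RGB channels of each (r, g, b) pixel
    return [sum(p) // 3 for p in pixels]


def threshold_below(cutoff):
    # Dark pixels count as tissue on a bright background
    return lambda gray: [value < cutoff for value in gray]


detect = Pipeline([to_grayscale, threshold_below(128)])
mask = detect([(250, 250, 250), (120, 40, 90), (200, 180, 220), (60, 20, 70)])
print(mask)
```

Because each step only needs to accept the previous step's output, stages can be reordered or swapped without touching the rest of the pipeline.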
Reference: `references/filters_preprocessing.md` contains comprehensive documentation on:
- Detailed explanation of each filter type
- Filter composition and chaining
- Common preprocessing pipelines (tissue detection, pen removal, nuclei enhancement)
- Applying filters to tiles
- Custom mask filters
- Quality control filters (blur detection, tissue coverage)
- Best practices and troubleshooting
Example workflows:
```python
from histolab.filters.compositions import Compose
from histolab.filters.image_filters import RgbToGrayscale, OtsuThreshold
from histolab.filters.morphological_filters import (
    BinaryDilation, RemoveSmallHoles, RemoveSmallObjects
)

# Standard tissue detection pipeline
tissue_detection = Compose([
    RgbToGrayscale(),
    OtsuThreshold(),
    BinaryDilation(disk_size=5),
    RemoveSmallHoles(area_threshold=1000),
    RemoveSmallObjects(area_threshold=500)
])

# Use with custom mask
from histolab.masks import TissueMask
custom_mask = TissueMask(filters=tissue_detection)

# Apply filters to tile (`tile` is an existing Tile instance)
from histolab.tile import Tile
filtered_tile = tile.apply_filters(tissue_detection)
```
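For readers unfamiliar with the thresholding step in the pipeline above, this is a minimal pure-Python sketch of Otsu's method: it picks the gray level that maximizes the variance between the two resulting classes. The function `otsu_threshold` and the sample pixel values are illustrative assumptions. A library implementation would operate on image arrays and may differ in tie-breaking and conventions.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t maximizing between-class variance.

    Pixels with value <= t form one class and the rest the other.
    `values` are integers in the range [0, levels).
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    total_sum = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the lower class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Made-up bimodal sample: dark "tissue" values and bright "background" values
pixels = [30] * 60 + [35] * 20 + [210] * 40 + [220] * 30
t = otsu_threshold(pixels)
tissue = [p <= t for p in pixels]
print(t, sum(tissue))
```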
5. Visualization
Visualize slides, masks, tile locations, and extraction quality.
Common visualization tasks:
- Displaying slide thumbnails
- Visualizing tissue masks
- Previewing tile locations
- Assessing tile quality
- Creating reports and figures
Reference: `references/visualization.md` contains comprehensive documentation on:
- Slide thumbnail display and saving
- Mask visualization with `locate_mask()`
- Tile location preview with `locate_tiles()`
- Displaying extracted tiles and mosaics
- Quality assessment (score distributions, top vs bottom tiles)
- Multi-slide visualization
- Filter effect visualization
- Exporting high-resolution figures and PDF reports
- Interactive visualization in Jupyter notebooks
Example workflows:
```python
import matplotlib.pyplot as plt
from histolab.masks import TissueMask
from histolab.tiler import RandomTiler

# Display slide thumbnail
plt.figure(figsize=(10, 10))
plt.imshow(slide.thumbnail)
plt.title(f"Slide: {slide.name}")
plt.axis('off')
plt.show()

# Visualize tissue mask
tissue_mask = TissueMask()
slide.locate_mask(tissue_mask)

# Preview tile locations
tiler = RandomTiler(tile_size=(512, 512), n_tiles=50)
tiler.locate_tiles(slide)

# Display extracted tiles in grid
from pathlib import Path
from PIL import Image

tile_paths = list(Path("output/tiles/").glob("*.png"))[:16]
fig, axes = plt.subplots(4, 4, figsize=(12, 12))
axes = axes.ravel()
for idx, tile_path in enumerate(tile_paths):
    tile_img = Image.open(tile_path)
    axes[idx].imshow(tile_img)
    axes[idx].set_title(tile_path.stem, fontsize=8)
    axes[idx].axis('off')
plt.tight_layout()
plt.show()
```
Typical Workflows
Workflow 1: Exploratory Tile Extraction
Quick sampling of diverse tissue regions for initial analysis.
```python
from histolab.slide import Slide
from histolab.tiler import RandomTiler
import logging

# Enable logging for progress tracking
logging.basicConfig(level=logging.INFO)

# Load slide
slide = Slide("slide.svs", processed_path="output/random_tiles/")

# Inspect slide
print(f"Dimensions: {slide.dimensions}")
print(f"Levels: {slide.levels}")
slide.save_thumbnail()

# Configure random tiler
random_tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,
    level=0,
    seed=42,
    check_tissue=True,
    tissue_percent=80.0
)

# Preview locations
random_tiler.locate_tiles(slide)

# Extract tiles
random_tiler.extract(slide)
```
Workflow 2: Comprehensive Grid Extraction
Complete tissue coverage for whole-slide analysis.
```python
from histolab.slide import Slide
from histolab.tiler import GridTiler
from histolab.masks import TissueMask

# Load slide
slide = Slide("slide.svs", processed_path="output/grid_tiles/")

# Use TissueMask for all tissue sections
tissue_mask = TissueMask()
slide.locate_mask(tissue_mask)

# Configure grid tiler
grid_tiler = GridTiler(
    tile_size=(512, 512),
    level=1,  # Use level 1 for faster extraction
    pixel_overlap=0,
    check_tissue=True,
    tissue_percent=70.0
)

# Preview grid on the same mask used for extraction
grid_tiler.locate_tiles(slide, extraction_mask=tissue_mask)

# Extract all tiles
grid_tiler.extract(slide, extraction_mask=tissue_mask)
```
Workflow 3: Quality-Driven Tile Selection
Extract the most informative tiles based on nuclei density.
```python
from histolab.slide import Slide
from histolab.tiler import ScoreTiler
from histolab.scorer import NucleiScorer
import pandas as pd
import matplotlib.pyplot as plt

# Load slide
slide = Slide("slide.svs", processed_path="output/scored_tiles/")

# Configure score tiler
score_tiler = ScoreTiler(
    tile_size=(512, 512),
    n_tiles=50,
    level=0,
    scorer=NucleiScorer(),
    check_tissue=True
)

# Preview top tiles
score_tiler.locate_tiles(slide)

# Extract with report
score_tiler.extract(slide, report_path="tiles_report.csv")

# Analyze scores
report_df = pd.read_csv("tiles_report.csv")
plt.hist(report_df['score'], bins=20, edgecolor='black')
plt.xlabel('Tile Score')
plt.ylabel('Frequency')
plt.title('Distribution of Tile Scores')
plt.show()
```
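When pandas or matplotlib are unavailable, the same report can be summarized with the standard library. The sketch below assumes the report is a CSV with `filename` and `score` columns (the `score` column is the one read in the workflow above). The sample rows are invented, and an in-memory buffer stands in for `tiles_report.csv`.

```python
import csv
import io
import statistics

# Made-up report contents standing in for tiles_report.csv
sample_report = io.StringIO(
    "filename,score\n"
    "tile_0.png,0.82\n"
    "tile_1.png,0.35\n"
    "tile_2.png,0.67\n"
    "tile_3.png,0.91\n"
)

rows = list(csv.DictReader(sample_report))
scores = [float(row["score"]) for row in rows]

summary = {
    "count": len(scores),
    "min": min(scores),
    "max": max(scores),
    "median": statistics.median(scores),
}
# The lowest-scoring tile is a good candidate for manual inspection
lowest = min(rows, key=lambda row: float(row["score"]))["filename"]
print(summary, lowest)
```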
Workflow 4: Multi-Slide Processing Pipeline
Process an entire slide collection with consistent parameters.
```python
from pathlib import Path
from histolab.slide import Slide
from histolab.tiler import RandomTiler
import logging

logging.basicConfig(level=logging.INFO)

# Configure tiler once
tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=50,
    level=0,
    seed=42,
    check_tissue=True
)

# Process all slides
slide_dir = Path("slides/")
output_base = Path("output/")
for slide_path in slide_dir.glob("*.svs"):
    print(f"\nProcessing: {slide_path.name}")
    # Create slide-specific output directory
    output_dir = output_base / slide_path.stem
    output_dir.mkdir(parents=True, exist_ok=True)
    # Load and process slide
    slide = Slide(slide_path, processed_path=output_dir)
    # Save thumbnail for review
    slide.save_thumbnail()
    # Extract tiles
    tiler.extract(slide)
    print(f"Completed: {slide_path.name}")
```
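In a long batch run, a single unreadable slide can abort the whole loop. One possible way to guard against that is sketched below: each slide is processed inside a `try` block and failures are collected for later review. The helper `process_all` and the stand-in `fake_process` are hypothetical. In practice the callback would wrap the Slide and tiler calls from the workflow above.

```python
from pathlib import Path
import tempfile


def process_all(slide_dir, output_base, process_slide, pattern="*.svs"):
    """Run process_slide(path, out_dir) on every match; collect failures."""
    done, failed = [], []
    for slide_path in sorted(Path(slide_dir).glob(pattern)):
        out_dir = Path(output_base) / slide_path.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        try:
            process_slide(slide_path, out_dir)
        except Exception as exc:  # keep going; report the problem later
            failed.append((slide_path.name, str(exc)))
        else:
            done.append(slide_path.name)
    return done, failed


def fake_process(slide_path, out_dir):
    # Stand-in for the Slide/tiler calls; rejects one file on purpose
    if "bad" in slide_path.stem:
        raise ValueError("unreadable slide")
    (out_dir / "thumbnail.txt").write_text("ok")


with tempfile.TemporaryDirectory() as tmp:
    slides = Path(tmp) / "slides"
    slides.mkdir()
    for name in ("a.svs", "bad.svs", "c.svs"):
        (slides / name).touch()
    done, failed = process_all(slides, Path(tmp) / "output", fake_process)

print(done, failed)
```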
Workflow 5: Custom Tissue Detection and Filtering
Handle slides with artifacts, annotations, or unusual staining.
```python
from histolab.slide import Slide
from histolab.masks import TissueMask
from histolab.tiler import RandomTiler
from histolab.filters.compositions import Compose
from histolab.filters.image_filters import RgbToGrayscale, OtsuThreshold
from histolab.filters.morphological_filters import (
    BinaryDilation, RemoveSmallObjects, RemoveSmallHoles
)

# Define custom filter pipeline for aggressive artifact removal
aggressive_filters = Compose([
    RgbToGrayscale(),
    OtsuThreshold(),
    BinaryDilation(disk_size=10),
    RemoveSmallHoles(area_threshold=5000),
    RemoveSmallObjects(area_threshold=3000)  # Remove larger artifacts
])

# Create custom mask
custom_mask = TissueMask(filters=aggressive_filters)

# Load slide and visualize mask
slide = Slide("slide.svs", processed_path="output/")
slide.locate_mask(custom_mask)

# Extract with custom mask
tiler = RandomTiler(tile_size=(512, 512), n_tiles=100)
tiler.extract(slide, extraction_mask=custom_mask)
```
Best Practices
Slide Loading and Inspection
- Always inspect slide properties before processing
- Save thumbnails for quick visual review
- Check pyramid levels and dimensions
- Verify tissue is present using thumbnails
Tissue Detection
- Preview masks with `locate_mask()` before extraction
- Use `TissueMask` for multiple sections, `BiggestTissueBoxMask` for single sections
- Customize filters for specific stains (H&E vs IHC)
- Handle pen annotations with custom masks
- Test masks on diverse slides
Tile Extraction
- Always preview with `locate_tiles()` before extracting
- Choose appropriate tiler:
  - RandomTiler: Sampling and exploration
  - GridTiler: Complete coverage
  - ScoreTiler: Quality-driven selection
- Set appropriate `tissue_percent` threshold (70-90% typical)
- Use seeds for reproducibility in RandomTiler
- Extract at appropriate pyramid level for analysis resolution
- Enable logging for large datasets
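The value of a fixed seed is easiest to see in a small experiment. The sketch below draws random tile origins with a private generator, so the same seed always yields the same positions. It is an illustrative stand-in under simple assumptions (uniform sampling inside a rectangular region). The helper `sample_origins` is hypothetical and not RandomTiler's actual sampling code.

```python
import random


def sample_origins(region_size, tile_size, n_tiles, seed):
    """Draw n_tiles random top-left corners that keep tiles inside the region."""
    rng = random.Random(seed)  # private generator; global state untouched
    max_x = region_size[0] - tile_size[0]
    max_y = region_size[1] - tile_size[1]
    return [
        (rng.randint(0, max_x), rng.randint(0, max_y))
        for _ in range(n_tiles)
    ]


first = sample_origins((4096, 4096), (512, 512), n_tiles=5, seed=42)
second = sample_origins((4096, 4096), (512, 512), n_tiles=5, seed=42)
other = sample_origins((4096, 4096), (512, 512), n_tiles=5, seed=7)
print(first == second, first == other)
```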
Performance
- Extract at lower-resolution pyramid levels (level 1 or 2) for faster processing
- Use `BiggestTissueBoxMask` over `TissueMask` when appropriate
- Adjust `tissue_percent` to reduce invalid tile attempts
- Limit `n_tiles` for initial exploration
- Use `pixel_overlap=0` for non-overlapping grids
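A rough back-of-the-envelope estimate shows why the pyramid level matters so much for speed. The sketch below counts how many non-overlapping tiles fit at each level, assuming a constant downsample factor of 2 between levels and a made-up slide size. Real slides may use other factors, and `tiles_per_level` is a hypothetical helper.

```python
def tiles_per_level(base_size, tile_size, level, downsample=2):
    """Count non-overlapping tiles that fit in the slide at a given level.

    Assumes a constant downsample factor between levels (a simplification).
    """
    factor = downsample ** level
    width, height = base_size[0] // factor, base_size[1] // factor
    tile_w, tile_h = tile_size
    return (width // tile_w) * (height // tile_h)


base = (40960, 30720)  # hypothetical level-0 size
counts = {lvl: tiles_per_level(base, (512, 512), lvl) for lvl in range(3)}
print(counts)
```

Under these assumptions, every level step divides the number of grid tiles by four, before any tissue filtering is applied.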
Quality Control
- Validate tile quality (check for blur, artifacts, focus)
- Review score distributions for ScoreTiler
- Inspect top and bottom scoring tiles
- Monitor tissue coverage statistics
- Filter extracted tiles by additional quality metrics if needed
Common Use Cases
Training Deep Learning Models
- Extract balanced datasets using RandomTiler across multiple slides
- Use ScoreTiler with NucleiScorer to focus on cell-rich regions
- Extract at consistent resolution (level 0 or level 1)
- Generate CSV reports for tracking tile metadata
Whole Slide Analysis
- Use GridTiler for complete tissue coverage
- Extract at multiple pyramid levels for hierarchical analysis
- Maintain spatial relationships with grid positions
- Use `pixel_overlap` for sliding window approaches
Tissue Characterization
- Sample diverse regions with RandomTiler
- Quantify tissue coverage with masks
- Extract stain-specific information with HED decomposition
- Compare tissue patterns across slides
Quality Assessment
- Identify optimal focus regions with ScoreTiler
- Detect artifacts using custom masks and filters
- Assess staining quality across slide collection
- Flag problematic slides for manual review
Dataset Curation
- Use ScoreTiler to prioritize informative tiles
- Filter tiles by tissue percentage
- Generate reports with tile scores and metadata
- Create stratified datasets across slides and tissue types
Troubleshooting
No tiles extracted
- Lower `tissue_percent` threshold
- Verify slide contains tissue (check thumbnail)
- Ensure `extraction_mask` captures tissue regions
- Check `tile_size` is appropriate for slide resolution
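When no tiles pass the tissue check, it can help to see how the pass count changes as the threshold is relaxed. The sketch below performs such a sweep over a list of per-tile tissue percentages. The percentages are invented for illustration, and `count_passing` is a hypothetical helper rather than a histolab function.

```python
def count_passing(tile_percents, thresholds):
    """Map each threshold to the number of tiles meeting it."""
    return {
        t: sum(1 for p in tile_percents if p >= t)
        for t in thresholds
    }


# Made-up tissue percentages for ten candidate tiles
percents = [5.0, 12.5, 40.0, 55.0, 61.0, 68.0, 72.5, 79.0, 83.0, 96.0]
sweep = count_passing(percents, thresholds=[90, 80, 70, 60, 50])
print(sweep)
```

A sharp jump between two neighboring thresholds suggests where a reasonable `tissue_percent` setting lies for that slide.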
Many background tiles
- Enable `check_tissue=True`
- Increase `tissue_percent` threshold
- Use appropriate mask (TissueMask vs BiggestTissueBoxMask)
- Customize mask filters to better detect tissue
Extraction very slow
- Extract at a lower-resolution pyramid level (level=1 or 2)
- Reduce `n_tiles` for RandomTiler/ScoreTiler
- Use RandomTiler instead of GridTiler for sampling
- Use BiggestTissueBoxMask instead of TissueMask
Tiles have artifacts
- Implement custom annotation-exclusion masks
- Adjust filter parameters for artifact removal
- Increase small object removal threshold
- Apply post-extraction quality filtering
Inconsistent results across slides
- Use same seed for RandomTiler
- Normalize staining with preprocessing filters
- Adjust `tissue_percent` per staining quality
- Implement slide-specific mask customization
Resources
This skill includes detailed reference documentation in the `references/` directory:
references/slide_management.md
Comprehensive guide to loading, inspecting, and working with whole slide images:
- Slide initialization and configuration
- Built-in sample datasets
- Slide properties and metadata
- Thumbnail generation and visualization
- Working with pyramid levels
- Multi-slide processing workflows
- Best practices and common patterns
references/tissue_masks.md
Complete documentation on tissue detection and masking:
- TissueMask, BiggestTissueBoxMask, BinaryMask classes
- How tissue detection filters work
- Customizing masks with filter chains
- Visualizing masks
- Creating custom rectangular and annotation-exclusion masks
- Integration with tile extraction
- Best practices and troubleshooting
references/tile_extraction.md
Detailed explanation of tile extraction strategies:
- RandomTiler, GridTiler, ScoreTiler comparison
- Available scorers (NucleiScorer, CellularityScorer, custom)
- Common and strategy-specific parameters
- Tile preview with locate_tiles()
- Extraction workflows and CSV reporting
- Advanced patterns (multi-level, hierarchical)
- Performance optimization
- Troubleshooting common issues
references/filters_preprocessing.md
Complete filter reference and preprocessing guide:
- Image filters (color conversion, thresholding, contrast)
- Morphological filters (dilation, erosion, opening, closing)
- Filter composition and chaining
- Common preprocessing pipelines
- Applying filters to tiles
- Custom mask filters
- Quality control filters
- Best practices and troubleshooting
references/visualization.md
Comprehensive visualization guide:
- Slide thumbnail display and saving
- Mask visualization techniques
- Tile location preview
- Displaying extracted tiles and creating mosaics
- Quality assessment visualizations
- Multi-slide comparison
- Filter effect visualization
- Exporting high-resolution figures and PDFs
- Interactive visualization in Jupyter notebooks
Usage pattern: Reference files contain in-depth information to support workflows described in this main skill document. Load specific reference files as needed for detailed implementation guidance, troubleshooting, or advanced features.