tooluniverse-image-analysis

Microscopy Image Analysis and Quantitative Imaging Data

Production-ready skill for analyzing microscopy-derived measurement data using pandas, numpy, scipy, statsmodels, and scikit-image. Designed for BixBench imaging questions covering colony morphometry, cell counting, fluorescence quantification, regression modeling, and statistical comparisons.
IMPORTANT: This skill handles complex multi-workflow analysis. Most implementation details have been moved to references/ for progressive disclosure. This document focuses on high-level decision-making and workflow orchestration.

When to Use This Skill

Apply when users:
  • Have microscopy measurement data (area, circularity, intensity, cell counts) in CSV/TSV
  • Ask about colony morphometry (bacterial swarming, biofilm, growth assays)
  • Need statistical comparisons of imaging measurements (t-test, ANOVA, Dunnett's, Mann-Whitney)
  • Ask about cell counting statistics (NeuN, DAPI, marker counts)
  • Need effect size calculations (Cohen's d) and power analysis
  • Want regression models (polynomial, spline) fitted to dose-response or ratio data
  • Ask about model comparison (R-squared, F-statistic, AIC/BIC)
  • Need Shapiro-Wilk normality testing on imaging data
  • Want confidence intervals for peak predictions from fitted models
  • Questions mention imaging software output (ImageJ, CellProfiler, QuPath)
  • Need fluorescence intensity quantification or colocalization analysis
  • Ask about image segmentation results (counts, areas, shapes)
BixBench Coverage: 21 questions across 4 projects (bix-18, bix-19, bix-41, bix-54)
NOT for (use other skills instead):
  • Phylogenetic analysis → Use tooluniverse-phylogenetics
  • RNA-seq differential expression → Use tooluniverse-rnaseq-deseq2
  • Single-cell scRNA-seq → Use tooluniverse-single-cell
  • Statistical regression only (no imaging context) → Use tooluniverse-statistical-modeling

Core Principles

  1. Data-first approach - Load and inspect all CSV/TSV measurement data before analysis
  2. Question-driven - Parse the exact statistic, comparison, or model requested
  3. Statistical rigor - Proper effect sizes, multiple comparison corrections, model selection
  4. Imaging-aware - Understand ImageJ/CellProfiler measurement columns (Area, Circularity, Round, Intensity)
  5. Workflow flexibility - Support both pre-quantified data (CSV) and raw image processing
  6. Precision - Match expected answer format (integer, range, decimal places)
  7. Reproducible - Use standard Python/scipy equivalents to R functions

Required Python Packages

```python
# Core (MUST be installed)
import pandas as pd
import numpy as np
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr

# Optional (for raw image processing)
import skimage
import cv2
import tifffile
```

**Installation**:
```bash
pip install pandas numpy scipy statsmodels patsy scikit-image opencv-python-headless tifffile
```

High-Level Workflow Decision Tree

START: User question about microscopy data
├─ Q1: What type of data is available?
│  │
│  ├─ PRE-QUANTIFIED DATA (CSV/TSV with measurements)
│  │  └─ Workflow: Load → Parse question → Statistical analysis
│  │     Pattern: Most common BixBench pattern (bix-18, bix-19, bix-41, bix-54)
│  │     See: Section "Quantitative Data Analysis" below
│  │
│  └─ RAW IMAGES (TIFF, PNG, multi-channel)
│     └─ Workflow: Load → Segment → Measure → Analyze
│        See: references/image_processing.md
├─ Q2: What type of analysis is needed?
│  │
│  ├─ STATISTICAL COMPARISON
│  │  ├─ Two groups → t-test or Mann-Whitney
│  │  ├─ Multiple groups → ANOVA or Dunnett's test
│  │  ├─ Two factors → Two-way ANOVA
│  │  └─ Effect size → Cohen's d, power analysis
│  │  See: references/statistical_analysis.md
│  │
│  ├─ REGRESSION MODELING
│  │  ├─ Dose-response → Polynomial (quadratic, cubic)
│  │  ├─ Ratio optimization → Natural spline
│  │  └─ Model comparison → R-squared, F-statistic, AIC/BIC
│  │  See: references/statistical_analysis.md
│  │
│  ├─ CELL COUNTING
│  │  ├─ Fluorescence (DAPI, NeuN) → Threshold + watershed
│  │  ├─ Brightfield → Adaptive threshold
│  │  └─ High-density → CellPose or StarDist (external)
│  │  See: references/cell_counting.md
│  │
│  ├─ COLONY SEGMENTATION
│  │  ├─ Swarming assays → Otsu threshold + morphology
│  │  ├─ Biofilms → Li threshold + fill holes
│  │  └─ Growth assays → Time-lapse tracking
│  │  See: references/segmentation.md
│  │
│  └─ FLUORESCENCE QUANTIFICATION
│     ├─ Intensity measurement → regionprops
│     ├─ Colocalization → Pearson/Manders
│     └─ Multi-channel → Channel-wise quantification
│     See: references/fluorescence_analysis.md
└─ Q3: When to use scikit-image vs OpenCV?
   ├─ scikit-image: Scientific analysis, measurements, regionprops
   ├─ OpenCV: Fast processing, real-time, large batches
   └─ Both: Often interchangeable for basic operations
   See: references/image_processing.md "Library Selection Guide"

Quantitative Data Analysis Workflow

Phase 0: Question Parsing and Data Discovery

CRITICAL FIRST STEP: Before writing ANY code, identify what data files are available and what the question is asking for.
```python
import os, glob
import pandas as pd

# Discover data files (the '**' component makes the search recursive)
data_dir = "."
csv_files = glob.glob(os.path.join(data_dir, '**', '*.csv'), recursive=True)
tsv_files = glob.glob(os.path.join(data_dir, '**', '*.tsv'), recursive=True)
img_files = glob.glob(os.path.join(data_dir, '**', '*.tif*'), recursive=True)

# Load and inspect first measurement file
if csv_files:
    df = pd.read_csv(csv_files[0])
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    print(df.head())
    print(df.describe())
```

**Common Column Names**:
- Area: Colony or cell area in pixels or calibrated units
- Circularity: 4*pi*area/perimeter^2, range [0,1], 1.0 = perfect circle
- Round: Roundness = 4*area/(pi*major_axis^2)
- Genotype/Strain: Biological grouping variable
- Ratio: Co-culture mixing ratio (e.g., "1:3", "5:1")
- NeuN/DAPI/GFP: Cell marker counts or intensities
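
The two shape descriptors above can be checked by hand. The following is a minimal sketch, assuming plain numeric inputs; the function names `circularity` and `roundness` are illustrative and are not taken from ImageJ or from this skill's scripts.

```python
import math

def circularity(area, perimeter):
    """4*pi*area / perimeter^2; equals 1.0 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2

def roundness(area, major_axis):
    """4*area / (pi * major_axis^2); major_axis is the full major-axis length."""
    return 4.0 * area / (math.pi * major_axis ** 2)

# Circle of radius 10: both descriptors should equal 1
r = 10.0
circle_circ = circularity(math.pi * r ** 2, 2 * math.pi * r)
circle_round = roundness(math.pi * r ** 2, 2 * r)

# Ellipse with semi-axes 2 and 1 (major-axis length 4):
# roundness reduces to minor/major = 0.5
ellipse_round = roundness(math.pi * 2.0 * 1.0, 4.0)

print(circle_circ, circle_round, ellipse_round)
```

For an ellipse, roundness is the inverse of the aspect ratio, which is why elongated colonies score low on Round even when their outline is smooth.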

Phase 1: Grouped Statistics

```python
def grouped_summary(df, group_cols, measure_col):
    """Calculate summary statistics by group."""
    summary = df.groupby(group_cols)[measure_col].agg(
        Mean='mean',
        SD='std',
        Median='median',
        Min='min',
        Max='max',
        N='count'
    ).reset_index()
    summary['SEM'] = summary['SD'] / np.sqrt(summary['N'])
    return summary

# Example: Colony morphometry by genotype
area_summary = grouped_summary(df, 'Genotype', 'Area')
circ_summary = grouped_summary(df, 'Genotype', 'Circularity')
```

For detailed statistical functions, see: **references/statistical_analysis.md**

Phase 2: Statistical Testing

Decision guide:
  • Normality test needed? → Shapiro-Wilk
  • Two groups comparison? → t-test or Mann-Whitney
  • Multiple groups vs control? → Dunnett's test
  • Multiple groups, all comparisons? → Tukey HSD
  • Two factors? → Two-way ANOVA
  • Effect size? → Cohen's d
  • Sample size planning? → Power analysis
See: references/statistical_analysis.md for complete implementations
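
As an illustration of the first two branches of this guide, here is a minimal sketch that runs Shapiro-Wilk on each group and then picks a t-test or Mann-Whitney. The helper name `compare_two_groups`, the 0.05 cutoff, and the sample values are assumptions made for this example; the reference guide may use different rules.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Pick a two-group test based on Shapiro-Wilk normality of each group."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Treat the data as normal only if neither group rejects normality
    normal = all(stats.shapiro(g).pvalue > alpha for g in (a, b))
    if normal:
        result = stats.ttest_ind(a, b)  # Student's t-test (equal variances)
        name = "t-test"
    else:
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        name = "Mann-Whitney U"
    return name, float(result.pvalue)

# Hypothetical measurements: the first group has one extreme outlier
skewed = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 50.0]
other = [2.0, 3.0, 2.5, 4.0, 3.5, 2.2, 3.1, 4.2, 2.8, 3.3]
test_name, p_value = compare_two_groups(skewed, other)
print(test_name, p_value)
```

If the question names a specific test, use that test regardless of what the normality check suggests.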

Phase 3: Regression Modeling

When to use each model:
  • Polynomial (quadratic/cubic): Smooth dose-response, clear peak
  • Natural spline: Flexible, non-parametric, handles complex patterns
  • Linear: Simple relationships, checking for trends
Model comparison metrics:
  • R-squared: Overall fit (higher = better)
  • Adjusted R-squared: Penalizes complexity
  • F-statistic p-value: Model significance
  • AIC/BIC: Compare non-nested models
See: references/statistical_analysis.md for complete implementations
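
The comparison metrics can be sketched with NumPy alone. This example assumes ordinary least-squares polynomial fits and uses the Gaussian AIC up to an additive constant, `n*ln(RSS/n) + 2k`, so its values are only comparable between models fitted to the same data and will not match the AIC reported by statsmodels. The helper `fit_polynomial` and the data are made up for illustration.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit with basic model-comparison metrics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    n, k = len(y), degree + 1           # k = number of fitted coefficients
    rss = float(np.sum(resid ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    aic = n * np.log(rss / n) + 2 * k   # Gaussian AIC up to a constant
    return {"coefs": coefs, "r2": r2, "adj_r2": adj_r2, "aic": float(aic)}

# Synthetic dose-response: a parabola peaking at x = 5 plus a small wiggle
x = np.arange(10.0)
y = -(x - 5.0) ** 2 + 0.5 * np.sin(3 * x)

fits = {degree: fit_polynomial(x, y, degree) for degree in (1, 2, 3)}
for degree, fit in fits.items():
    print(degree, round(fit["r2"], 4), round(fit["adj_r2"], 4), round(fit["aic"], 2))
```

Plain R-squared never decreases when a term is added, so prefer adjusted R-squared or AIC/BIC when deciding between the quadratic and the cubic.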

Raw Image Processing Workflow

When Processing Raw Images

Workflow: Load → Preprocess → Segment → Measure → Export
```python
# Quick start for cell counting
from scripts.segment_cells import count_cells_in_image

result = count_cells_in_image(
    image_path="cells.tif",
    channel=0,  # DAPI channel
    min_area=50
)
print(f"Found {result['count']} cells")
```

Segmentation Method Selection

Decision guide:
| Cell Type | Density | Best Method | Notes |
|---|---|---|---|
| Nuclei (DAPI) | Low-Medium | Otsu + watershed | Standard approach |
| Nuclei (DAPI) | High | CellPose/StarDist | Handles touching |
| Colonies | Well-separated | Otsu threshold | Fast, reliable |
| Colonies | Touching | Watershed | Edge detection |
| Cells (phase) | Any | Adaptive threshold | Handles uneven illumination |
| Fluorescence | Low signal | Li threshold | More sensitive |
See: references/segmentation.md and references/cell_counting.md for detailed protocols
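
For the simplest row of the table (well-separated objects, Otsu threshold), a minimal scikit-image sketch might look like the following. The synthetic image, the `min_area` value, and the variable names are assumptions for illustration; this is not the code in `scripts/segment_cells.py`, and it omits the watershed step needed for touching objects.

```python
import numpy as np
from skimage.draw import disk
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

# Synthetic image: noisy dark background with two bright, separated disks
rng = np.random.default_rng(0)
image = rng.normal(10.0, 2.0, size=(100, 100))
for center, radius in [((30, 30), 10), ((70, 65), 15)]:
    rr, cc = disk(center, radius, shape=image.shape)
    image[rr, cc] += 100.0

# Global Otsu threshold, then connected-component labelling
mask = image > threshold_otsu(image)
labels = label(mask)

# Keep objects above a minimum area (in pixels) and measure them
min_area = 50
regions = [r for r in regionprops(labels, intensity_image=image) if r.area >= min_area]
for r in regions:
    print(r.label, r.area, round(r.perimeter, 1))
print(f"Found {len(regions)} objects")
```

The measured areas should be close to pi*r^2 for each disk, which is a quick sanity check before running the same steps on real images.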

Library Selection: scikit-image vs OpenCV

Use scikit-image when:
  • Scientific measurements needed (area, perimeter, intensity)
  • regionprops for object properties
  • Publication-quality analysis
  • Easier syntax for scientists
Use OpenCV when:
  • Processing large image batches
  • Speed is critical
  • Real-time processing
  • Advanced computer vision features
Both work for:
  • Thresholding, filtering, morphological operations
  • Basic image transformations
  • Most segmentation tasks
See: references/image_processing.md "Library Selection Guide"

Common BixBench Patterns

Pattern 1: Colony Morphometry (bix-18)

Question type: "Mean circularity of genotype with largest area?"
Data: CSV with Genotype, Area, Circularity columns
Workflow:
  1. Load CSV → group by Genotype
  2. Calculate mean Area per genotype
  3. Identify genotype with max mean Area
  4. Report mean Circularity for that genotype
See: references/segmentation.md "Colony Morphometry Analysis"
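
The four steps above can be sketched in a few lines of pandas. The genotype labels and values below are invented for illustration; only the column names follow the pattern description.

```python
import pandas as pd

# Hypothetical measurements; a real analysis would load the CSV instead
df = pd.DataFrame({
    "Genotype": ["A", "A", "B", "B"],
    "Area": [10.0, 20.0, 30.0, 50.0],
    "Circularity": [0.5, 0.7, 0.9, 0.7],
})

# Steps 1-2: mean Area per genotype
mean_area = df.groupby("Genotype")["Area"].mean()

# Step 3: genotype with the largest mean Area
top_genotype = mean_area.idxmax()

# Step 4: mean Circularity for that genotype
mean_circ = df.loc[df["Genotype"] == top_genotype, "Circularity"].mean()
print(top_genotype, round(mean_circ, 3))
```

Note that "largest area" is read here as largest mean area; if the question means the single largest colony, use `df["Area"].idxmax()` to find the row instead.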

Pattern 2: Cell Counting Statistics (bix-19)

Question type: "Cohen's d for NeuN counts between conditions?"
Data: CSV with Condition, NeuN_count, Sex, Hemisphere columns
Workflow:
  1. Load CSV → filter by hemisphere/sex if needed
  2. Split by Condition (KD vs CTRL)
  3. Calculate Cohen's d with pooled SD
  4. Report effect size
See: references/statistical_analysis.md "Effect Size Calculations"
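
Step 3 can be written as a short helper. This is a minimal sketch of Cohen's d with a pooled standard deviation (sample variances with `ddof=1`); the function name and the example counts are illustrative, and the sign convention (first group minus second) is an assumption to check against the question.

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d = (mean1 - mean2) / pooled SD, using sample variances."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return float((g1.mean() - g2.mean()) / np.sqrt(pooled_var))

# Hypothetical counts: means 4 and 3, both sample variances 4, pooled SD 2
kd = [2, 4, 6]
ctrl = [1, 3, 5]
d = cohens_d(kd, ctrl)
print(f"{d:.3f}")
```

Swapping the argument order flips the sign of d, so confirm which condition the question treats as the reference.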

Pattern 3: Multi-Group Comparison (bix-41)

Question type: "Dunnett's test: How many ratios equivalent to control?"
Data: CSV with multiple co-culture ratios, Area, Circularity
Workflow:
  1. Create Strain_Ratio labels
  2. Run Dunnett's test for Area (vs control)
  3. Run Dunnett's test for Circularity (vs control)
  4. Count groups NOT significant in BOTH tests
See: references/statistical_analysis.md "Dunnett's Test"

Pattern 4: Regression Optimization (bix-54)

Question type: "Peak frequency from natural spline model?"
Data: CSV with co-culture frequencies and Area measurements
Workflow:
  1. Convert ratio strings to frequencies
  2. Fit natural spline model (df=4)
  3. Find peak via grid search
  4. Report peak frequency + confidence interval
See: references/statistical_analysis.md "Regression Modeling"
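
The workflow above can be sketched with the statsmodels formula interface and patsy's `cr()` basis. Several things here are assumptions: the frequency is taken as the first strain's share of the mix ("1:3" becomes 0.25), the data are synthetic, knots are left at patsy's defaults rather than matched to R's `ns()`, and the interval reported is the 95% confidence interval of the predicted mean at the peak, which may not be the interval a given question expects.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def ratio_to_frequency(ratio):
    """Convert 'a:b' to the first component's fraction a / (a + b)."""
    a, b = (float(part) for part in ratio.split(":"))
    return a / (a + b)

# Synthetic data: Area peaks near a frequency of 0.6, five replicates per ratio
rng = np.random.default_rng(0)
rows = []
for ratio in ["1:10", "1:5", "1:3", "1:1", "3:1", "5:1", "10:1"]:
    freq = ratio_to_frequency(ratio)
    for _ in range(5):
        area = 1000.0 - 2000.0 * (freq - 0.6) ** 2 + rng.normal(0.0, 20.0)
        rows.append({"Ratio": ratio, "freq": freq, "Area": area})
df = pd.DataFrame(rows)

# Natural cubic regression spline with 4 basis columns and no separate intercept
model = smf.ols("Area ~ cr(freq, df=4) - 1", data=df).fit()

# Grid search for the peak inside the observed frequency range
grid = pd.DataFrame({"freq": np.linspace(df["freq"].min(), df["freq"].max(), 501)})
pred = model.get_prediction(grid).summary_frame(alpha=0.05)
best = int(pred["mean"].to_numpy().argmax())
peak_freq = float(grid["freq"].iloc[best])
peak_area = float(pred["mean"].iloc[best])
peak_ci = (float(pred["mean_ci_lower"].iloc[best]),
           float(pred["mean_ci_upper"].iloc[best]))
print(round(peak_freq, 3), round(peak_area, 1), peak_ci)
```

Keeping the grid inside the observed range avoids relying on how the spline extrapolates beyond the boundary knots.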

Quick Reference Table

| Task | Primary Tool | Reference |
|---|---|---|
| Load measurement CSV | pandas.read_csv() | This file |
| Group statistics | df.groupby().agg() | This file |
| T-test | scipy.stats.ttest_ind() | statistical_analysis.md |
| ANOVA | statsmodels.ols + anova_lm() | statistical_analysis.md |
| Dunnett's test | scipy.stats.dunnett() | statistical_analysis.md |
| Cohen's d | Custom function (pooled SD) | statistical_analysis.md |
| Power analysis | statsmodels TTestIndPower | statistical_analysis.md |
| Polynomial regression | statsmodels.OLS + poly features | statistical_analysis.md |
| Natural spline | patsy.cr() + statsmodels.OLS | statistical_analysis.md |
| Cell segmentation | skimage.filters + watershed | cell_counting.md |
| Colony segmentation | skimage.filters.threshold_otsu | segmentation.md |
| Fluorescence quantification | skimage.measure.regionprops | fluorescence_analysis.md |
| Colocalization | Pearson/Manders | fluorescence_analysis.md |
| Image loading | tifffile, skimage.io | image_processing.md |
| Batch processing | scripts/batch_process.py | scripts/ |

Example Scripts

Ready-to-use scripts in the scripts/ directory:
  1. segment_cells.py - Cell/nuclei counting with watershed
  2. measure_fluorescence.py - Multi-channel intensity quantification
  3. batch_process.py - Process folders of images
  4. colony_morphometry.py - Measure colony area/circularity
  5. statistical_comparison.py - Group comparison statistics
Usage:
```bash
# Count cells in image
python scripts/segment_cells.py cells.tif --channel 0 --min-area 50

# Batch process folder
python scripts/batch_process.py input_folder/ output.csv --analysis cell_count
```

---

Detailed Reference Guides

For complete implementations and protocols:
  1. references/statistical_analysis.md - All statistical tests, regression models
  2. references/cell_counting.md - Cell/nuclei counting protocols
  3. references/segmentation.md - Colony and object segmentation
  4. references/fluorescence_analysis.md - Intensity quantification, colocalization
  5. references/image_processing.md - Image loading, preprocessing, library selection
  6. references/troubleshooting.md - Common issues and solutions

Important Notes

Matching R Statistical Functions

Some BixBench questions use R for analysis. Python equivalents:
  • R's Dunnett test (multcomp::glht) → scipy.stats.dunnett() (scipy ≥ 1.11)
  • R's natural spline (ns(x, df=4)) → patsy.cr(x, knots=...) with explicit quantile knots
  • R's t-test (t.test()) → scipy.stats.ttest_ind(); R defaults to Welch's test, so pass equal_var=False to match it
  • R's ANOVA (aov()) → statsmodels.formula.api.ols() + sm.stats.anova_lm()
See: references/statistical_analysis.md for exact parameter matching

Answer Formatting

BixBench expects specific formats:
  • "to the nearest thousand": int(round(val, -3))
  • Percentages: Usually integer or 1-2 decimal places
  • Cohen's d: 3 decimal places
  • Sample sizes: Always integer (ceiling)
  • Ratios: String format "5:1"
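
These formatting rules can be collected into small helpers. This is a minimal sketch; the function names are illustrative and not part of the skill's scripts.

```python
import math

def to_nearest_thousand(val):
    """Round to the nearest thousand and return an int."""
    # Python rounds exact halves to the even neighbour: round(2500, -3) == 2000
    return int(round(val, -3))

def required_sample_size(n):
    """Sample sizes are reported as the next whole number (ceiling)."""
    return math.ceil(n)

def format_cohens_d(d):
    """Cohen's d to 3 decimal places."""
    return f"{d:.3f}"

def format_ratio(a, b):
    """Ratio as a string such as '5:1'."""
    return f"{a}:{b}"

print(to_nearest_thousand(12499.6), required_sample_size(15.2),
      format_cohens_d(0.12345), format_ratio(5, 1))
```

The half-to-even behaviour only matters for values that sit exactly on a 500 boundary, but it is worth checking when an expected answer differs by exactly one thousand.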

Completeness Checklist

Before returning your answer, verify:
  • Loaded all data files and inspected column names
  • Identified the specific statistic or model requested
  • Used correct grouping variables and filter conditions
  • Applied correct rounding or format
  • For "how many" questions: counted correctly based on criteria
  • For statistical tests: used appropriate multiple comparison correction
  • For regression: properly prepared and transformed data
  • Double-checked direction of comparisons
  • Verified answer falls within expected range

Getting Help

  • Start with decision tree at top of this file
  • Check relevant reference guide for detailed protocol
  • Use example scripts as templates
  • See troubleshooting guide for common issues
  • All statistical implementations in statistical_analysis.md