instrument-data-to-allotrope

Instrument Data to Allotrope Converter

Convert instrument files into standardized Allotrope Simple Model (ASM) format for LIMS upload, data lakes, or handoff to data engineering teams.
Note: This is an Example Skill
This skill demonstrates how skills can support your data engineering tasks: automating schema transformations, parsing instrument outputs, and generating production-ready code.
To customize for your organization:
  • Modify the `references/` files to include your company's specific schemas or ontology mappings
  • Use an MCP server to connect to systems that define your schemas (e.g., your LIMS, data catalog, or schema registry)
  • Extend the `scripts/` to handle proprietary instrument formats or internal data standards
This pattern can be adapted for any data transformation workflow where you need to convert between formats or validate against organizational standards.

Workflow Overview

  1. Detect instrument type from file contents (auto-detect or user-specified)
  2. Parse file using allotropy library (native) or flexible fallback parser
  3. Generate outputs:
    • ASM JSON (full semantic structure)
    • Flattened CSV (2D tabular format)
    • Python parser code (for data engineer handoff)
  4. Deliver files with summary and usage instructions
When Uncertain: If you're unsure how to map a field to ASM (e.g., is this raw data or calculated? device setting or environmental condition?), ask the user for clarification. Refer to `references/field_classification_guide.md` for guidance, but when ambiguity remains, confirm with the user rather than guessing.

Quick Start

```python
# Install requirements first (run in a shell, not in Python):
#   pip install allotropy pandas openpyxl pdfplumber --break-system-packages

# Core conversion
from allotropy.parser_factory import Vendor
from allotropy.to_allotrope import allotrope_from_file

# Convert with allotropy
asm = allotrope_from_file("instrument_data.csv", Vendor.BECKMAN_VI_CELL_BLU)
```
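Once converted, the ASM can be written to disk. The sketch below assumes the value returned by `allotrope_from_file` is a JSON-serializable dict. `save_asm` is a hypothetical helper, not part of allotropy, and the `<stem>_asm.json` naming simply mirrors the file names used in the usage examples of this document.

```python
import json
from pathlib import Path


def save_asm(asm: dict, source_path: str, out_dir: str = ".") -> Path:
    """Write an ASM dict to '<source stem>_asm.json' (hypothetical naming)."""
    out_path = Path(out_dir) / f"{Path(source_path).stem}_asm.json"
    with out_path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII unit symbols readable in the file
        json.dump(asm, fh, indent=2, ensure_ascii=False)
    return out_path
```

For the Quick Start call above, `save_asm(asm, "instrument_data.csv")` would produce `instrument_data_asm.json` in the current directory.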

Output Format Selection

输出格式选择

ASM JSON (default) - Full semantic structure with ontology URIs
  • Best for: LIMS systems expecting ASM, data lakes, long-term archival
  • Validates against Allotrope schemas
Flattened CSV - 2D tabular representation
  • Best for: Quick analysis, Excel users, systems without JSON support
  • Each measurement becomes one row with metadata repeated
Both - Generate both formats for maximum flexibility
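To make "one row per measurement with metadata repeated" concrete, here is a minimal sketch. It works on a deliberately simplified, hypothetical ASM-like dict: the `device` and `measurements` keys are placeholders, not the real ASM layout, and this is not the logic of `scripts/flatten_asm.py`.

```python
import csv
import io


def flatten(doc: dict) -> list[dict]:
    """Return one flat row per measurement, repeating shared metadata."""
    shared = {f"device_{k}": v for k, v in doc.get("device", {}).items()}
    rows = []
    for measurement in doc.get("measurements", []):
        row = dict(shared)        # metadata is copied onto every row
        row.update(measurement)   # then the measurement-specific fields
        rows.append(row)
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text; columns are the ordered union of all keys."""
    columns = list(dict.fromkeys(key for row in rows for key in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The trade-off is the one described above: the CSV is easy to open in Excel, but the nesting and ontology context of the ASM JSON are lost.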

Calculated Data Handling

IMPORTANT: Separate raw measurements from calculated/derived values.
  • Raw data goes in `measurement-document` (direct instrument readings)
  • Calculated data goes in `calculated-data-aggregate-document` (derived values)
Calculated values MUST include traceability via `data-source-aggregate-document`:
```json
"calculated-data-aggregate-document": {
  "calculated-data-document": [{
    "calculated-data-identifier": "SAMPLE_B1_DIN_001",
    "calculated-data-name": "DNA integrity number",
    "calculated-result": {"value": 9.5, "unit": "(unitless)"},
    "data-source-aggregate-document": {
      "data-source-document": [{
        "data-source-identifier": "SAMPLE_B1_MEASUREMENT",
        "data-source-feature": "electrophoresis trace"
      }]
    }
  }]
}
```
Common calculated fields by instrument type:

| Instrument | Calculated Fields |
| --- | --- |
| Cell counter | Viability %, dilution-adjusted cell density values |
| Spectrophotometer | Concentration (from absorbance), 260/280 ratio |
| Plate reader | Concentrations from standard curve, %CV |
| Electrophoresis | DIN/RIN, region concentrations, average sizes |
| qPCR | Relative quantities, fold change |

See `references/field_classification_guide.md` for detailed guidance on raw vs. calculated classification.
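The traceability requirement can be checked mechanically. The following is a minimal sketch, assuming the key names exactly as they appear in the JSON example above. `missing_traceability` is a hypothetical helper and not the check implemented in `scripts/validate_asm.py`.

```python
def missing_traceability(asm_fragment: dict) -> list[str]:
    """Return identifiers of calculated values that name no data source."""
    aggregate = asm_fragment.get("calculated-data-aggregate-document", {})
    missing = []
    for calc in aggregate.get("calculated-data-document", []):
        sources = calc.get("data-source-aggregate-document", {}).get(
            "data-source-document", []
        )
        # Traceable only if at least one source exists and each has an identifier
        if not sources or not all(s.get("data-source-identifier") for s in sources):
            missing.append(calc.get("calculated-data-identifier", "<no identifier>"))
    return missing
```

An empty list means every calculated value points back to at least one identified source.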

Validation

Always validate ASM output before delivering to the user:
```bash
python scripts/validate_asm.py output.json
python scripts/validate_asm.py output.json --reference known_good.json  # Compare to reference
python scripts/validate_asm.py output.json --strict  # Treat warnings as errors
```
Validation Rules:
Soft Validation Approach: Unknown techniques, units, or sample roles generate warnings (not errors) to allow for forward compatibility. If Allotrope adds new values after December 2024, the validator won't block them; it will flag them for manual verification. Use `--strict` mode to treat warnings as errors if you need stricter validation.
What it checks:
  • Correct technique selection (e.g., multi-analyte profiling vs plate reader)
  • Field naming conventions (space-separated, not hyphenated)
  • Calculated data has traceability (`data-source-aggregate-document`)
  • Unique identifiers exist for measurements and calculated values
  • Required metadata present
  • Valid units and sample roles (with soft validation for unknown values)
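The soft-validation idea (unknown values warn, `--strict` promotes warnings to failures) can be sketched as follows. This is an illustration only: `KNOWN_UNITS`, `Report`, and `check_units` are hypothetical names, and the allow-list is made up, since the actual contents of `scripts/validate_asm.py` are not shown here.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list; the real validator's list is not shown in this document.
KNOWN_UNITS = {"(unitless)", "mAU", "nm", "%"}


@dataclass
class Report:
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    def ok(self, strict: bool = False) -> bool:
        """Pass when there are no errors; in strict mode warnings fail too."""
        return not self.errors and not (strict and self.warnings)


def check_units(units: list[str]) -> Report:
    """Missing units are errors; unrecognized units are only warnings."""
    report = Report()
    for unit in units:
        if not unit:
            report.errors.append("missing unit")
        elif unit not in KNOWN_UNITS:
            report.warnings.append(f"unknown unit {unit!r}: verify manually")
    return report
```

Keeping warnings and errors in separate lists lets the same check serve both modes; only the final pass/fail decision depends on `strict`.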

Supported Instruments

See `references/supported_instruments.md` for the complete list. Key instruments:

| Category | Instruments |
| --- | --- |
| Cell Counting | Vi-CELL BLU, Vi-CELL XR, NucleoCounter |
| Spectrophotometry | NanoDrop One/Eight/8000, Lunatic |
| Plate Readers | SoftMax Pro, EnVision, Gen5, CLARIOstar |
| ELISA | SoftMax Pro, BMG MARS, MSD Workbench |
| qPCR | QuantStudio, Bio-Rad CFX |
| Chromatography | Empower, Chromeleon |

Detection & Parsing Strategy

Tier 1: Native allotropy parsing (PREFERRED)

Always try allotropy first. Check available vendors directly:
```python
from allotropy.parser_factory import Vendor

# List all supported vendors
for v in Vendor:
    print(v.name)

# Common vendors:
# AGILENT_TAPESTATION_ANALYSIS (for TapeStation XML)
# BECKMAN_VI_CELL_BLU
# THERMO_FISHER_NANODROP_EIGHT
# MOLDEV_SOFTMAX_PRO
# APPBIO_QUANTSTUDIO
# ... many more
```

**When the user provides a file, check if allotropy supports it before falling back to manual parsing.** The `scripts/convert_to_asm.py` auto-detection only covers a subset of allotropy vendors.

Tier 2: Flexible fallback parsing


Only use this tier if allotropy doesn't support the instrument. This fallback:
  • Does NOT generate `calculated-data-aggregate-document`
  • Does NOT include full traceability
  • Produces simplified ASM structure
Use the flexible parser with:
  • Column name fuzzy matching
  • Unit extraction from headers
  • Metadata extraction from file structure
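Two of these techniques, fuzzy column matching and unit extraction from headers, can be sketched with the standard library alone. Everything named here is an assumption for illustration: `CANONICAL`, `split_unit`, `match_column`, and the 0.6 cutoff are hypothetical and do not describe the skill's actual fallback parser.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical field names; a real parser would derive these from its target schema.
CANONICAL = ["sample identifier", "well position", "absorbance", "concentration"]

# Matches a trailing unit in parentheses or brackets, e.g. "Concentration (ng/uL)"
UNIT_RE = re.compile(r"^(?P<name>.*?)\s*[\(\[](?P<unit>[^\)\]]+)[\)\]]\s*$")


def split_unit(header: str) -> tuple[str, Optional[str]]:
    """Split a header into (name, unit); unit is None when no unit is present."""
    match = UNIT_RE.match(header.strip())
    if match:
        return match.group("name"), match.group("unit")
    return header.strip(), None


def match_column(header: str, cutoff: float = 0.6) -> Optional[str]:
    """Map a raw header to the closest canonical name, or None if nothing is close."""
    name, _ = split_unit(header)
    normalized = re.sub(r"[_\-\s]+", " ", name).lower()
    hits = difflib.get_close_matches(normalized, CANONICAL, n=1, cutoff=cutoff)
    return hits[0] if hits else None
```

A header that matches nothing returns `None`, which is a natural point to ask the user rather than guess, in line with the "When Uncertain" guidance.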

Tier 3: PDF extraction

For PDF-only files, extract tables using pdfplumber, then apply Tier 2 parsing.
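A minimal sketch of the extraction step, using pdfplumber's `open`, `pages`, and `extract_tables`. It assumes the first row of every extracted table is a header row, which will not hold for tables that continue across pages. `table_to_records` and `extract_pdf_records` are hypothetical helper names.

```python
def table_to_records(table: list[list]) -> list[dict]:
    """Treat the first row as headers and return one dict per remaining row."""
    if not table:
        return []
    headers = [(h or "").strip() for h in table[0]]
    records = []
    for row in table[1:]:
        cells = [(c or "").strip() for c in row]  # pdfplumber uses None for empty cells
        if any(cells):  # skip rows that are entirely empty
            records.append(dict(zip(headers, cells)))
    return records


def extract_pdf_records(pdf_path: str) -> list[dict]:
    """Collect records from every table pdfplumber finds in the PDF."""
    import pdfplumber  # imported lazily so table_to_records works without it

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(table_to_records(table))
    return records
```

The resulting records are plain string dicts, so they can be passed to the same header matching and unit extraction used in Tier 2.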

Pre-Parsing Checklist

Before writing a custom parser, ALWAYS:
  1. Check if allotropy supports it - Use native parser if available
  2. Find a reference ASM file - Check `references/examples/` or ask user
  3. Review instrument-specific guide - Check `references/instrument_guides/`
  4. Validate against reference - Run `validate_asm.py --reference <file>`

Common Mistakes to Avoid

| Mistake | Correct Approach |
| --- | --- |
| Manifest as object | Use URL string |
| Lowercase detection types | Use "Absorbance" not "absorbance" |
| "emission wavelength setting" | Use "detector wavelength setting" for emission |
| All measurements in one document | Group by well/sample location |
| Missing procedure metadata | Extract ALL device settings per measurement |

Code Export for Data Engineers

Generate standalone Python scripts that scientists can hand off:
```bash
# Export parser code
python scripts/export_parser.py --input "data.csv" --vendor "VI_CELL_BLU" --output "parser_script.py"
```

The exported script:
- Has no external dependencies beyond pandas/allotropy
- Includes inline documentation
- Can run in Jupyter notebooks
- Is production-ready for data pipelines

File Structure


instrument-data-to-allotrope/
├── SKILL.md                          # This file
├── scripts/
│   ├── convert_to_asm.py            # Main conversion script
│   ├── flatten_asm.py               # ASM → 2D CSV conversion
│   ├── export_parser.py             # Generate standalone parser code
│   └── validate_asm.py              # Validate ASM output quality
└── references/
    ├── supported_instruments.md     # Full instrument list with Vendor enums
    ├── asm_schema_overview.md       # ASM structure reference
    ├── field_classification_guide.md # Where to put different field types
    └── flattening_guide.md          # How flattening works

Usage Examples

Example 1: Vi-CELL BLU file

User: "Convert this cell counting data to Allotrope format"
[uploads viCell_Results.xlsx]

Claude:
1. Detects Vi-CELL BLU (95% confidence)
2. Converts using allotropy native parser
3. Outputs:
   - viCell_Results_asm.json (full ASM)
   - viCell_Results_flat.csv (2D format)
   - viCell_parser.py (exportable code)

Example 2: Request for code handoff

User: "I need to give our data engineer code to parse NanoDrop files"

Claude:
1. Generates self-contained Python script
2. Includes sample input/output
3. Documents all assumptions
4. Provides Jupyter notebook version

Example 3: LIMS-ready flattened output

User: "Convert this ELISA data to a CSV I can upload to our LIMS"

Claude:
1. Parses plate reader data
2. Generates flattened CSV with columns:
   - sample_identifier, well_position, measurement_value, measurement_unit
   - instrument_serial_number, analysis_datetime, assay_type
3. Validates against common LIMS import requirements

Implementation Notes

Installing allotropy

```bash
pip install allotropy --break-system-packages
```

Handling parse failures

If allotropy native parsing fails:
  1. Log the error for debugging
  2. Fall back to flexible parser
  3. Report reduced metadata completeness to user
  4. Suggest exporting different format from instrument
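The steps above can be sketched as a small wrapper. This is an illustration under stated assumptions: `parse_with_fallback` is a hypothetical helper, and both parsers are passed in as plain callables so the sketch does not depend on any particular allotropy or fallback-parser API.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

Parser = Callable[[str], dict[str, Any]]


def parse_with_fallback(
    path: str, native: Parser, flexible: Parser
) -> tuple[dict[str, Any], list[str]]:
    """Try the native parser first; on failure, log and use the flexible one.

    Returns the parsed document plus notes to relay to the user.
    """
    try:
        return native(path), []
    except Exception as exc:  # broad on purpose: any native failure triggers fallback
        logger.warning("Native parsing failed for %s: %s", path, exc)
        notes = [
            "Native parsing failed; the flexible parser was used, so metadata may be less complete.",
            "Consider exporting a different format from the instrument.",
        ]
        return flexible(path), notes
```

With allotropy, `native` could be something like `lambda p: allotrope_from_file(p, Vendor.BECKMAN_VI_CELL_BLU)`, using the names from the Quick Start.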

ASM Schema Validation

Validate output against Allotrope schemas when available:
```python
import jsonschema

# Schema URLs in references/asm_schema_overview.md
# jsonschema.validate(instance=asm, schema=schema)  # once a schema from one of those URLs is loaded
```