instrument-data-to-allotrope
Instrument Data to Allotrope Converter
Convert instrument files into standardized Allotrope Simple Model (ASM) format for LIMS upload, data lakes, or handoff to data engineering teams.
Note: This is an example skill. It demonstrates how skills can support your data engineering tasks: automating schema transformations, parsing instrument outputs, and generating production-ready code.

To customize for your organization:
- Modify the files in `references/` to include your company's specific schemas or ontology mappings
- Use an MCP server to connect to systems that define your schemas (e.g., your LIMS, data catalog, or schema registry)
- Extend the scripts in `scripts/` to handle proprietary instrument formats or internal data standards

This pattern can be adapted for any data transformation workflow where you need to convert between formats or validate against organizational standards.
Workflow Overview
- Detect the instrument type from file contents (auto-detected or user-specified)
- Parse the file using the allotropy library (native) or a flexible fallback parser
- Generate outputs:
  - ASM JSON (full semantic structure)
  - Flattened CSV (2D tabular format)
  - Python parser code (for data engineer handoff)
- Deliver the files with a summary and usage instructions
When Uncertain: If you're unsure how to map a field to ASM (e.g., is this raw data or calculated? device setting or environmental condition?), ask the user for clarification. Refer to `references/field_classification_guide.md` for guidance, but when ambiguity remains, confirm with the user rather than guessing.
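Auto-detection can start as simple keyword scoring over the file's text. The sketch below is a minimal illustration under stated assumptions, not the logic in `scripts/convert_to_asm.py`: `SIGNATURES`, `detect_instrument`, and the keyword lists are hypothetical, and the dictionary keys only reuse allotropy `Vendor` member names as labels. Below the confidence threshold it returns no vendor, so the caller asks the user instead of guessing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical signatures: strings that often appear in each export.
# Illustrative only; build real signatures from your own sample files.
SIGNATURES: Dict[str, List[str]] = {
    "BECKMAN_VI_CELL_BLU": ["vi-cell", "viability", "total cells"],
    "THERMO_FISHER_NANODROP_EIGHT": ["nanodrop", "a260", "a280"],
    "MOLDEV_SOFTMAX_PRO": ["softmax pro", "plate:", "wavelengths"],
}


@dataclass
class Detection:
    vendor: Optional[str]  # None means "ask the user"
    confidence: float      # fraction of the best vendor's keywords found


def detect_instrument(text: str, min_confidence: float = 0.5) -> Detection:
    """Score each signature by keyword hits and keep the best match."""
    lowered = text.lower()
    best_vendor: Optional[str] = None
    best_score = 0.0
    for vendor, keywords in SIGNATURES.items():
        score = sum(kw in lowered for kw in keywords) / len(keywords)
        if score > best_score:
            best_vendor, best_score = vendor, score
    if best_score < min_confidence:
        return Detection(None, best_score)  # too ambiguous to guess
    return Detection(best_vendor, best_score)
```

For example, `detect_instrument("Vi-CELL BLU export\nViability (%),Total cells")` matches all three Vi-CELL keywords and returns that label with confidence 1.0.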
Quick Start
```bash
# Install requirements first
pip install allotropy pandas openpyxl pdfplumber --break-system-packages
```

```python
# Core conversion
from allotropy.parser_factory import Vendor
from allotropy.to_allotrope import allotrope_from_file

# Convert with allotropy
asm = allotrope_from_file("instrument_data.csv", Vendor.BECKMAN_VI_CELL_BLU)
```

Output Format Selection
ASM JSON (default) - Full semantic structure with ontology URIs
- Best for: LIMS systems expecting ASM, data lakes, long-term archival
- Validates against Allotrope schemas
Flattened CSV - 2D tabular representation
- Best for: Quick analysis, Excel users, systems without JSON support
- Each measurement becomes one row with metadata repeated
Both - Generate both formats for maximum flexibility
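To make the flattened layout concrete, here is a minimal, generic sketch (not `scripts/flatten_asm.py`, whose rules are described in `references/flattening_guide.md`). It assumes measurements live in lists under the ASM key `measurement document`; `flatten_asm`, `rows_to_csv`, and the dotted column names (e.g. `absorbance.value`) are hypothetical choices. Scalars from enclosing documents, and from sibling documents that contain no measurements, are repeated on every row.

```python
import csv
import io
from typing import Any, Dict, List, Optional

MEASUREMENT_KEY = "measurement document"


def _scalars(node: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into 'parent.child' column names; lists are skipped."""
    out: Dict[str, Any] = {}
    for key, value in node.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(_scalars(value, f"{name}."))
        elif not isinstance(value, list):
            out[name] = value
    return out


def _contains_measurements(node: Any) -> bool:
    if isinstance(node, dict):
        return MEASUREMENT_KEY in node or any(_contains_measurements(v) for v in node.values())
    if isinstance(node, list):
        return any(_contains_measurements(v) for v in node)
    return False


def flatten_asm(node: Any, context: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """One row per measurement document; metadata from enclosing levels is repeated."""
    context = dict(context or {})
    if isinstance(node, list):
        return [row for item in node for row in flatten_asm(item, context)]
    if not isinstance(node, dict):
        return []
    # Metadata at this level: plain scalars plus dicts with no measurements inside.
    for key, value in node.items():
        if isinstance(value, dict) and not _contains_measurements(value):
            context.update(_scalars(value, f"{key}."))
        elif not isinstance(value, (dict, list)):
            context[key] = value
    rows: List[Dict[str, Any]] = []
    for key, value in node.items():
        if key == MEASUREMENT_KEY and isinstance(value, list):
            rows.extend({**context, **_scalars(m)} for m in value)
        elif _contains_measurements(value):
            rows.extend(flatten_asm(value, context))
    return rows


def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Write rows as CSV text, using the union of all keys as the header."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Columns missing from a given row are written as empty cells, which keeps the CSV rectangular for Excel or LIMS import.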
Calculated Data Handling
IMPORTANT: Separate raw measurements from calculated/derived values.
- Raw data → `measurement document` (direct instrument readings)
- Calculated data → `calculated data aggregate document` (derived values)

Calculated values MUST include traceability via `data source aggregate document`:

```json
{
  "calculated data aggregate document": {
    "calculated data document": [{
      "calculated data identifier": "SAMPLE_B1_DIN_001",
      "calculated data name": "DNA integrity number",
      "calculated result": {"value": 9.5, "unit": "(unitless)"},
      "data source aggregate document": {
        "data source document": [{
          "data source identifier": "SAMPLE_B1_MEASUREMENT",
          "data source feature": "electrophoresis trace"
        }]
      }
    }]
  }
}
```

Common calculated fields by instrument type:
| Instrument | Calculated Fields |
|---|---|
| Cell counter | Viability %, dilution-adjusted cell density |
| Spectrophotometer | Concentration (from absorbance), 260/280 ratio |
| Plate reader | Concentrations from standard curve, %CV |
| Electrophoresis | DIN/RIN, region concentrations, average sizes |
| qPCR | Relative quantities, fold change |
See `references/field_classification_guide.md` for detailed guidance on raw vs. calculated classification.
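A small builder helps enforce the traceability rule. This is a hypothetical helper, not part of allotropy or the bundled scripts; it uses space-separated key names, following the field naming convention the validator checks, and refuses to build a calculated value that has no data source.

```python
from typing import Any, Dict, List


def calculated_data_document(
    identifier: str,
    name: str,
    value: float,
    unit: str,
    source_ids: List[str],
    feature: str,
) -> Dict[str, Any]:
    """Build one calculated data document that points back to its raw inputs."""
    if not source_ids:
        # A calculated value without a data source breaks traceability.
        raise ValueError(f"{identifier}: calculated values need at least one data source")
    return {
        "calculated data identifier": identifier,
        "calculated data name": name,
        "calculated result": {"value": value, "unit": unit},
        "data source aggregate document": {
            "data source document": [
                {"data source identifier": sid, "data source feature": feature}
                for sid in source_ids
            ]
        },
    }
```

For a cell counter's viability value, `source_ids` would list the identifiers of the measurement documents holding the counts it was derived from.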
Validation
Always validate ASM output before delivering it to the user:

```bash
python scripts/validate_asm.py output.json
python scripts/validate_asm.py output.json --reference known_good.json  # Compare to reference
python scripts/validate_asm.py output.json --strict  # Treat warnings as errors
```

Validation Rules:
- Based on the Allotrope ASM specification (December 2024)
- Last updated: 2026-01-07
- Source: https://gitlab.com/allotrope-public/asm
Soft Validation Approach:
Unknown techniques, units, or sample roles generate warnings (not errors) to allow for forward compatibility. If Allotrope adds new values after December 2024, the validator won't block them; it will flag them for manual verification. Use `--strict` mode to treat warnings as errors if you need stricter validation.
What it checks:
- Correct technique selection (e.g., multi-analyte profiling vs plate reader)
- Field naming conventions (space-separated, not hyphenated)
- Calculated data has traceability (`data source aggregate document`)
- Unique identifiers exist for measurements and calculated values
- Required metadata present
- Valid units and sample roles (with soft validation for unknown values)
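The soft-versus-strict behavior can be sketched as follows. This is an illustrative stand-in, not the implementation of `scripts/validate_asm.py`: `check_asm` and the `KNOWN_UNITS` allow-list are hypothetical, and flagging every hyphenated key is a simplification of the naming check. Unknown units become warnings, while hyphenated keys and calculated documents without a `data source aggregate document` are errors; `strict=True` promotes warnings to errors.

```python
from typing import Any, Iterator, List, Tuple

# Hypothetical allow-list for illustration; a real validator derives this from the ASM spec.
KNOWN_UNITS = {"(unitless)", "%", "nm", "mAU"}


def _walk(node: Any, path: str = "$") -> Iterator[Tuple[str, str, Any]]:
    """Yield (parent path, key, value) for every key in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path, key, value
            yield from _walk(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from _walk(item, f"{path}[{index}]")


def check_asm(asm: Any, strict: bool = False) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings); strict=True promotes warnings to errors."""
    errors: List[str] = []
    warnings: List[str] = []
    for path, key, value in _walk(asm):
        if "-" in key:
            errors.append(f"{path}: key {key!r} is hyphenated; ASM keys use spaces")
        if key == "unit" and isinstance(value, str) and value not in KNOWN_UNITS:
            # Soft validation: the unit may simply be newer than this allow-list.
            warnings.append(f"{path}: unknown unit {value!r}; verify manually")
        if key == "calculated data document" and isinstance(value, list):
            for index, doc in enumerate(value):
                if isinstance(doc, dict) and "data source aggregate document" not in doc:
                    errors.append(f"{path}.{key}[{index}]: no data source aggregate document")
    if strict:
        errors, warnings = errors + warnings, []
    return errors, warnings
```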
Supported Instruments
See `references/supported_instruments.md` for the complete list. Key instruments:

| Category | Instruments |
|---|---|
| Cell Counting | Vi-CELL BLU, Vi-CELL XR, NucleoCounter |
| Spectrophotometry | NanoDrop One/Eight/8000, Lunatic |
| Plate Readers | SoftMax Pro, EnVision, Gen5, CLARIOstar |
| ELISA | SoftMax Pro, BMG MARS, MSD Workbench |
| qPCR | QuantStudio, Bio-Rad CFX |
| Chromatography | Empower, Chromeleon |
Detection & Parsing Strategy
Tier 1: Native allotropy parsing (PREFERRED)
Always try allotropy first. Check available vendors directly:
```python
from allotropy.parser_factory import Vendor

# List all supported vendors
for v in Vendor:
    print(v.name)

# Common vendors:
# AGILENT_TAPESTATION_ANALYSIS (for TapeStation XML)
# BECKMAN_VI_CELL_BLU
# THERMO_FISHER_NANODROP_EIGHT
# MOLDEV_SOFTMAX_PRO
# APPBIO_QUANTSTUDIO
# ... many more
```
**When the user provides a file, check if allotropy supports it before falling back to manual parsing.** The `scripts/convert_to_asm.py` auto-detection only covers a subset of allotropy vendors.
Tier 2: Flexible fallback parsing
Only use if allotropy doesn't support the instrument. This fallback:
- Does NOT generate `calculated data aggregate document`
- Does NOT include full traceability
- Produces a simplified ASM structure

The flexible parser relies on:
- Column name fuzzy matching
- Unit extraction from headers
- Metadata extraction from file structure
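Fuzzy column matching and unit extraction can both be done with the standard library. In this sketch, `CANONICAL_FIELDS`, `split_header`, `map_columns`, and the 0.6 similarity cutoff are assumptions for illustration; `difflib.get_close_matches` does the fuzzy match, and a regular expression pulls a unit out of a trailing "(unit)" or "[unit]" in the header.

```python
import difflib
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical target fields for a cell-counting export.
CANONICAL_FIELDS = ["sample identifier", "viable cell density", "total cell count", "viability"]

# A header ending in "(unit)" or "[unit]", e.g. "Viability (%)" or "Conc [ng/uL]".
UNIT_IN_HEADER = re.compile(r"^(?P<name>.*?)\s*[(\[](?P<unit>[^)\]]+)[)\]]\s*$")


def split_header(header: str) -> Tuple[str, Optional[str]]:
    """Split a column header into (name, unit); unit is None if absent."""
    match = UNIT_IN_HEADER.match(header.strip())
    if match:
        return match.group("name").strip(), match.group("unit").strip()
    return header.strip(), None


def map_columns(headers: List[str], cutoff: float = 0.6) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each raw header to (closest canonical field or None, unit or None)."""
    mapping: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for header in headers:
        name, unit = split_header(header)
        # Normalize underscores, hyphens, and runs of whitespace to single spaces.
        normalized = re.sub(r"[_\-\s]+", " ", name).strip().lower()
        matches = difflib.get_close_matches(normalized, CANONICAL_FIELDS, n=1, cutoff=cutoff)
        mapping[header] = (matches[0] if matches else None, unit)
    return mapping
```

Headers that map to `None` are the ones to raise with the user rather than assign silently.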
Tier 3: PDF extraction
For PDF-only files, extract tables using pdfplumber, then apply Tier 2 parsing.
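A minimal sketch of the PDF step, assuming the first row of each extracted table is its header (tables split across pages may break this). `pdfplumber.open`, `pdf.pages`, and `page.extract_tables()` are pdfplumber's API; `table_to_records` and `extract_pdf_records` are hypothetical helper names. The resulting records can then go through the Tier 2 column mapping.

```python
from typing import Dict, List, Optional


def table_to_records(table: List[List[Optional[str]]]) -> List[Dict[str, str]]:
    """Treat the first row as the header; pdfplumber uses None for empty cells."""
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):  # skip blank spacer rows
            records.append(dict(zip(header, cells)))
    return records


def extract_pdf_records(path: str) -> List[Dict[str, str]]:
    """Extract every table on every page of a PDF into header-keyed records."""
    import pdfplumber  # imported lazily so table_to_records has no dependencies

    records: List[Dict[str, str]] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(table_to_records(table))
    return records
```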
Pre-Parsing Checklist
Before writing a custom parser, ALWAYS:
- Check if allotropy supports it: use the native parser if available
- Find a reference ASM file: check `references/examples/` or ask the user
- Review the instrument-specific guide: check `references/instrument_guides/`
- Validate against the reference: run `validate_asm.py --reference <file>`
Common Mistakes to Avoid
| Mistake | Correct Approach |
|---|---|
| Manifest as object | Use a URL string for `$asm.manifest` |
| Lowercase detection types | Use "Absorbance" not "absorbance" |
| "emission wavelength setting" | Use "detector wavelength setting" for emission |
| All measurements in one document | Group by well/sample location |
| Missing procedure metadata | Extract ALL device settings per measurement |
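For the grouping and device-settings rows above, here is a sketch that turns flat plate-reader readings into one document per well, with each measurement carrying its own wavelength setting. The input shape (`well`, `wavelength_nm`, `value`, `unit`) and the output keys `location identifier` and `measurement document` are a simplified, hypothetical layout; check the real nesting against a reference ASM file before relying on it.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_well(readings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group flat readings into one document per well, preserving first-seen order."""
    by_well: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for reading in readings:
        by_well[reading["well"]].append(reading)
    documents = []
    for well, items in by_well.items():
        documents.append({
            "location identifier": well,
            "measurement document": [
                {
                    # Capitalized detection type, per the table above.
                    "detection type": "Absorbance",
                    # Device setting recorded on every measurement, not once per plate.
                    "detector wavelength setting": {"value": r["wavelength_nm"], "unit": "nm"},
                    "absorbance": {"value": r["value"], "unit": r["unit"]},
                }
                for r in items
            ],
        })
    return documents
```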
Code Export for Data Engineers
Generate standalone Python scripts that scientists can hand off to data engineers:

```bash
# Export parser code
python scripts/export_parser.py --input "data.csv" --vendor "VI_CELL_BLU" --output "parser_script.py"
```
The exported script:
- Has no external dependencies beyond pandas/allotropy
- Includes inline documentation
- Can run in Jupyter notebooks
- Is production-ready for data pipelines
File Structure
```
instrument-data-to-allotrope/
├── SKILL.md # This file
├── scripts/
│ ├── convert_to_asm.py # Main conversion script
│ ├── flatten_asm.py # ASM → 2D CSV conversion
│ ├── export_parser.py # Generate standalone parser code
│ └── validate_asm.py # Validate ASM output quality
└── references/
├── supported_instruments.md # Full instrument list with Vendor enums
├── asm_schema_overview.md # ASM structure reference
├── field_classification_guide.md # Where to put different field types
└── flattening_guide.md # How flattening works
```
Usage Examples
Example 1: Vi-CELL BLU file
User: "Convert this cell counting data to Allotrope format"
[uploads viCell_Results.xlsx]
Claude:
1. Detects Vi-CELL BLU (95% confidence)
2. Converts using allotropy native parser
3. Outputs:
- viCell_Results_asm.json (full ASM)
- viCell_Results_flat.csv (2D format)
- viCell_parser.py (exportable code)
Example 2: Request for code handoff
User: "I need to give our data engineer code to parse NanoDrop files"
Claude:
1. Generates self-contained Python script
2. Includes sample input/output
3. Documents all assumptions
4. Provides Jupyter notebook version
Example 3: LIMS-ready flattened output
User: "Convert this ELISA data to a CSV I can upload to our LIMS"
Claude:
1. Parses plate reader data
2. Generates flattened CSV with columns:
- sample_identifier, well_position, measurement_value, measurement_unit
- instrument_serial_number, analysis_datetime, assay_type
3. Validates against common LIMS import requirements
Implementation Notes
Installing allotropy
```bash
pip install allotropy --break-system-packages
```

Handling parse failures
If allotropy native parsing fails:
- Log the error for debugging
- Fall back to flexible parser
- Report reduced metadata completeness to user
- Suggest exporting a different format from the instrument
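The fallback steps above can be wrapped in one function. This sketch takes the native and fallback parsers as callables, so nothing in it assumes a particular parser API; `convert_with_fallback` and `ConversionResult` are hypothetical names. The `complete_metadata` flag is what you would surface to the user when the fallback path was taken.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict

logger = logging.getLogger("asm_conversion")


@dataclass
class ConversionResult:
    asm: Dict[str, Any]
    parser: str               # "native" or "fallback"
    complete_metadata: bool   # False means: tell the user metadata is reduced


def convert_with_fallback(
    path: str,
    native: Callable[[str], Dict[str, Any]],
    fallback: Callable[[str], Dict[str, Any]],
) -> ConversionResult:
    """Try the native parser first; on failure, log the error and use the fallback."""
    try:
        return ConversionResult(native(path), "native", True)
    except Exception:  # parser errors vary by vendor, so catch broadly and log
        logger.exception("Native parsing failed for %s; using fallback parser", path)
        return ConversionResult(fallback(path), "fallback", False)
```

With allotropy, `native` could be `lambda p: allotrope_from_file(p, Vendor.BECKMAN_VI_CELL_BLU)`, matching the Quick Start call.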
ASM Schema Validation
Validate output against Allotrope schemas when available:
```python
import jsonschema

# Schema URLs in references/asm_schema_overview.md
jsonschema.validate(instance=asm, schema=schema)  # asm and schema: already-loaded dicts
```