PyOpenMS

Overview

PyOpenMS provides Python bindings to the OpenMS library for computational mass spectrometry, enabling analysis of proteomics and metabolomics data. Use it to read and write mass spectrometry file formats, process spectral data, detect features, identify peptides and proteins, and perform quantitative analysis.

Installation

Install using uv:
```bash
uv pip install pyopenms
```
Verify installation:
```python
import pyopenms
print(pyopenms.__version__)
```

Core Capabilities

PyOpenMS organizes functionality into these domains:

1. File I/O and Data Formats

Handle mass spectrometry file formats and convert between representations.
Supported formats: mzML, mzXML, TraML, mzTab, FASTA, pepXML, protXML, mzIdentML, featureXML, consensusXML, idXML
Basic file reading:
```python
import pyopenms as ms

# Read an mzML file into memory
exp = ms.MSExperiment()
ms.MzMLFile().load("data.mzML", exp)

# Access spectra
for spectrum in exp:
    mz, intensity = spectrum.get_peaks()
    print(f"Spectrum: {len(mz)} peaks")
```

**For detailed file handling**: See `references/file_io.md`

2. Signal Processing

Process raw spectral data with smoothing, filtering, centroiding, and normalization.
Basic spectrum processing:
```python
import pyopenms as ms

exp = ms.MSExperiment()
ms.MzMLFile().load("data.mzML", exp)

# Smooth all spectra with a Gaussian filter (intended for profile data)
gaussian = ms.GaussFilter()
params = gaussian.getParameters()
params.setValue("gaussian_width", 0.1)
gaussian.setParameters(params)
gaussian.filterExperiment(exp)
```

**For algorithm details**: See `references/signal_processing.md`

3. Feature Detection

Detect and link features across spectra and samples for quantitative analysis.
```python
import pyopenms as ms

# Detect features in `exp`, an MSExperiment holding centroided data
exp.updateRanges()
ff = ms.FeatureFinder()
features = ms.FeatureMap()
params = ff.getParameters("centroided")
ff.run("centroided", exp, features, params, ms.FeatureMap())  # last argument: empty seed map
```

**For complete workflows**: See `references/feature_detection.md`

4. Peptide and Protein Identification

Integrate with search engines and process identification results.
Supported engines: Comet, Mascot, MSGFPlus, XTandem, OMSSA, Myrimatch
Basic identification workflow:
```python
import pyopenms as ms

# Load identification data
protein_ids = []
peptide_ids = []
ms.IdXMLFile().load("identifications.idXML", protein_ids, peptide_ids)

# Estimate FDR; hits need target/decoy annotations (e.g. from PeptideIndexer)
# and their scores are replaced by q-values
fdr = ms.FalseDiscoveryRate()
fdr.apply(peptide_ids)
```

**For detailed workflows**: See `references/identification.md`

5. Metabolomics Analysis

Perform untargeted metabolomics preprocessing and analysis.
Typical workflow:
  1. Load and process raw data
  2. Detect features
  3. Align retention times across samples
  4. Link features to consensus map
  5. Annotate with compound databases
**For complete metabolomics workflows**: See `references/metabolomics.md`

Data Structures

PyOpenMS uses these primary objects:
  • MSExperiment: Collection of spectra and chromatograms
  • MSSpectrum: Single mass spectrum with m/z and intensity pairs
  • MSChromatogram: Chromatographic trace
  • Feature: Detected chromatographic peak with quality metrics
  • FeatureMap: Collection of features
  • PeptideIdentification: Search results for peptides
  • ProteinIdentification: Search results for proteins
**For detailed documentation**: See `references/data_structures.md`

Common Workflows

Quick Start: Load and Explore Data

```python
import pyopenms as ms

# Load mzML file
exp = ms.MSExperiment()
ms.MzMLFile().load("sample.mzML", exp)

# Get basic statistics
print(f"Number of spectra: {exp.getNrSpectra()}")
print(f"Number of chromatograms: {exp.getNrChromatograms()}")

# Examine first spectrum
spec = exp.getSpectrum(0)
print(f"MS level: {spec.getMSLevel()}")
print(f"Retention time: {spec.getRT()}")
mz, intensity = spec.get_peaks()
print(f"Peaks: {len(mz)}")
```

Parameter Management

Most algorithms use a parameter system:
```python
import pyopenms as ms

# Get algorithm parameters
algo = ms.GaussFilter()
params = algo.getParameters()

# View available parameters
for param in params.keys():
    print(f"{param}: {params.getValue(param)}")

# Modify parameters
params.setValue("gaussian_width", 0.2)
algo.setParameters(params)
```

Export to Pandas

Convert data to pandas DataFrames for analysis:
```python
import pyopenms as ms
import pandas as pd

# Load feature map
fm = ms.FeatureMap()
ms.FeatureXMLFile().load("features.featureXML", fm)

# Convert to DataFrame
df = fm.get_df()
print(df.head())
```

Integration with Other Tools

PyOpenMS integrates with:
  • Pandas: Export data to DataFrames
  • NumPy: Work with peak arrays
  • Scikit-learn: Machine learning on MS data
  • Matplotlib/Seaborn: Visualization
  • R: Via rpy2 bridge

Resources

References

  • references/file_io.md
    - Comprehensive file format handling
  • references/signal_processing.md
    - Signal processing algorithms
  • references/feature_detection.md
    - Feature detection and linking
  • references/identification.md
    - Peptide and protein identification
  • references/metabolomics.md
    - Metabolomics-specific workflows
  • references/data_structures.md
    - Core objects and data structures