PyOpenMS

Overview

PyOpenMS provides Python bindings to the OpenMS library for computational mass spectrometry, enabling analysis of proteomics and metabolomics data. Use it to read and write mass spectrometry file formats, process spectral data, detect features, identify peptides and proteins, and perform quantitative analysis.

Installation

Install using uv:

```bash
uv pip install pyopenms
```

Verify installation:

```python
import pyopenms
print(pyopenms.__version__)
```

Core Capabilities

PyOpenMS organizes functionality into these domains:

1. File I/O and Data Formats

Handle mass spectrometry file formats and convert between representations.

Supported formats: mzML, mzXML, TraML, mzTab, FASTA, pepXML, protXML, mzIdentML, featureXML, consensusXML, idXML

Basic file reading:

```python
import pyopenms as ms

# Read mzML file
exp = ms.MSExperiment()
ms.MzMLFile().load("data.mzML", exp)

# Access spectra
for spectrum in exp:
    mz, intensity = spectrum.get_peaks()
    print(f"Spectrum: {len(mz)} peaks")
```


**For detailed file handling**: See `references/file_io.md`

2. Signal Processing

Process raw spectral data with smoothing, filtering, centroiding, and normalization.

Basic spectrum processing:

```python
# Smooth all spectra in `exp` with a Gaussian filter
# (gaussian_width is given in m/z and should roughly match the peak width)
gaussian = ms.GaussFilter()
params = gaussian.getParameters()
params.setValue("gaussian_width", 0.1)
gaussian.setParameters(params)
gaussian.filterExperiment(exp)
```


**For algorithm details**: See `references/signal_processing.md`

3. Feature Detection

Detect and link features across spectra and samples for quantitative analysis.

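```python
# Detect features (assumes `exp` holds centroided MS1 data).
# Uses the older pyopenms FeatureFinder API; check the docs for your installed version.
exp.updateRanges()
features = ms.FeatureMap()
params = ms.FeatureFinder().getParameters("centroided")
ff = ms.FeatureFinder()
ff.run("centroided", exp, features, params, ms.FeatureMap())  # last argument: empty seed map
```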


**For complete workflows**: See `references/feature_detection.md`

4. Peptide and Protein Identification

Integrate with search engines and process identification results.

Supported engines: Comet, Mascot, MSGFPlus, XTandem, OMSSA, Myrimatch

Basic identification workflow:

```python
import pyopenms as ms

# Load identification data
protein_ids = []
peptide_ids = []
ms.IdXMLFile().load("identifications.idXML", protein_ids, peptide_ids)

# Estimate FDR from target/decoy annotations (e.g. added by PeptideIndexer).
# This converts hit scores to q-values; filtering by a threshold is a separate step.
fdr = ms.FalseDiscoveryRate()
fdr.apply(peptide_ids)
```


**For detailed workflows**: See `references/identification.md`
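
`FalseDiscoveryRate` is based on the target-decoy approach. The sketch below is a conceptual illustration only, not OpenMS's implementation, which has its own options and tie handling. It estimates FDR as decoys/targets above each score threshold and converts that into q-values, where a q-value is the smallest FDR at which a hit would still be accepted. Scores and decoy labels are made up, higher scores are assumed better, and ties are not treated specially.

```python
def target_decoy_qvalues(hits):
    """Return (score, is_decoy, q_value) tuples, best score first.

    hits: iterable of (score, is_decoy); a higher score is assumed to be better.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)

    # FDR estimate at each threshold: decoys / targets among hits scoring at least this well
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the minimum FDR over this threshold and all more permissive ones
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


hits = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
results = target_decoy_qvalues(hits)
accepted = [score for score, is_decoy, q in results if not is_decoy and q <= 0.01]
print(accepted)  # target hits passing a 1% q-value cutoff → [0.9, 0.8]
```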

5. Metabolomics Analysis

Perform untargeted metabolomics preprocessing and analysis.

Typical workflow:

1. Load and process raw data
2. Detect features
3. Align retention times across samples
4. Link features into a consensus map
5. Annotate with compound databases

**For complete metabolomics workflows**: See `references/metabolomics.md`
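
The linking step can be illustrated with a deliberately simplified, hypothetical greedy grouping in plain Python. Features from different samples land in the same group when their RT and m/z agree within tolerances. This is not OpenMS's feature grouping algorithm, and it skips RT alignment. The function name, tolerances and feature values are all made up.

```python
def link_features(samples, rt_tol=5.0, mz_tol_ppm=10.0):
    """Greedily group (rt, mz, intensity) features across samples (toy example).

    Returns a list of groups; each group keeps the RT/m/z of its first feature
    and maps sample index -> intensity.
    """
    groups = []
    for sample_idx, features in enumerate(samples):
        for rt, mz, intensity in features:
            match = None
            for group in groups:
                ppm = abs(group["mz"] - mz) / group["mz"] * 1e6
                if (sample_idx not in group["members"]
                        and abs(group["rt"] - rt) <= rt_tol
                        and ppm <= mz_tol_ppm):
                    match = group
                    break
            if match is None:
                groups.append({"rt": rt, "mz": mz, "members": {sample_idx: intensity}})
            else:
                match["members"][sample_idx] = intensity
    return groups


sample_a = [(100.0, 180.0634, 1.0e5), (250.0, 303.1234, 5.0e4)]
sample_b = [(102.0, 180.0640, 1.2e5), (400.0, 500.2000, 3.0e4)]
for g in link_features([sample_a, sample_b]):
    print(f"RT {g['rt']:.1f}, m/z {g['mz']:.4f}: {g['members']}")
```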

Data Structures

PyOpenMS uses these primary objects:

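- **MSExperiment**: Collection of spectra and chromatograms
- **MSSpectrum**: Single mass spectrum with m/z and intensity pairs
- **MSChromatogram**: Chromatographic trace (retention time and intensity pairs)
- **Feature**: Detected analyte signal spanning retention time and m/z, with intensity, charge, and quality metrics
- **FeatureMap**: Collection of features
- **PeptideIdentification**: Peptide search results for a single spectrum (a list of peptide hits)
- **ProteinIdentification**: Protein-level search results and search run metadata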

**For detailed documentation**: See `references/data_structures.md`

Common Workflows

Quick Start: Load and Explore Data

```python
import pyopenms as ms

# Load mzML file
exp = ms.MSExperiment()
ms.MzMLFile().load("sample.mzML", exp)

# Get basic statistics
print(f"Number of spectra: {exp.getNrSpectra()}")
print(f"Number of chromatograms: {exp.getNrChromatograms()}")

# Examine first spectrum
spec = exp.getSpectrum(0)
print(f"MS level: {spec.getMSLevel()}")
print(f"Retention time: {spec.getRT()}")
mz, intensity = spec.get_peaks()
print(f"Peaks: {len(mz)}")
```

Parameter Management

Most algorithms use a parameter system:

```python
import pyopenms as ms

# Get algorithm parameters
algo = ms.GaussFilter()
params = algo.getParameters()

# View available parameters
for param in params.keys():
    print(f"{param}: {params.getValue(param)}")

# Modify parameters and pass them back to the algorithm
params.setValue("gaussian_width", 0.2)
algo.setParameters(params)
```

Export to Pandas

Convert data to pandas DataFrames for analysis:

```python
import pyopenms as ms
import pandas as pd

# Load feature map
fm = ms.FeatureMap()
ms.FeatureXMLFile().load("features.featureXML", fm)

# Convert to DataFrame
df = fm.get_df()
print(df.head())
```

Integration with Other Tools

PyOpenMS integrates with:

- **Pandas**: Export data to DataFrames
- **NumPy**: Work with peak arrays
- **Scikit-learn**: Machine learning on MS data
- **Matplotlib/Seaborn**: Visualization
- **R**: Via rpy2 bridge

Resources

- Official documentation: https://pyopenms.readthedocs.io
- OpenMS documentation: https://www.openms.org
- GitHub: https://github.com/OpenMS/OpenMS

References

- `references/file_io.md`: Comprehensive file format handling
- `references/signal_processing.md`: Signal processing algorithms
- `references/feature_detection.md`: Feature detection and linking
- `references/identification.md`: Peptide and protein identification
- `references/metabolomics.md`: Metabolomics-specific workflows
- `references/data_structures.md`: Core objects and data structures
