pyOpenMS

Overview

pyOpenMS is an open-source Python library for mass spectrometry data analysis in proteomics and metabolomics. Process LC-MS/MS data, perform peptide identification, detect and quantify features, and integrate with common proteomics tools (Comet, Mascot, MSGF+, Percolator, MSstats) using Python bindings to the OpenMS C++ library.

When to Use This Skill

This skill should be used when:

Processing mass spectrometry data (mzML, mzXML files)
Performing peak picking and feature detection in LC-MS data
Conducting peptide and protein identification workflows
Quantifying metabolites or proteins
Integrating proteomics or metabolomics tools into Python pipelines
Working with OpenMS tools and file formats

Core Capabilities

1. File I/O and Data Import/Export

Handle diverse mass spectrometry file formats efficiently:

Supported Formats:

mzML/mzXML: Primary raw MS data formats (profile or centroid)
FASTA: Protein/peptide sequence databases
mzTab: Standardized reporting format for identification and quantification
mzIdentML: Peptide and protein identification data
TraML: Transition lists for targeted experiments
pepXML/protXML: Search engine results

Reading mzML Files:

python

import pyopenms as oms

# Load MS data
exp = oms.MSExperiment()
oms.MzMLFile().load("input_data.mzML", exp)

# Access basic information
print(f"Number of spectra: {exp.getNrSpectra()}")
print(f"Number of chromatograms: {exp.getNrChromatograms()}")

Writing mzML Files:

python

# Save processed data
oms.MzMLFile().store("output_data.mzML", exp)

File Encoding: pyOpenMS automatically handles Base64 encoding, zlib compression, and Numpress compression internally.

2. MS Data Structures and Manipulation

Work with core mass spectrometry data structures. See

references/data_structures.md

for comprehensive details.

MSSpectrum - Individual mass spectrum:

python

# Create spectrum with metadata
spectrum = oms.MSSpectrum()
spectrum.setRT(205.2)  # Retention time in seconds
spectrum.setMSLevel(2)  # MS2 spectrum

# Set peak data (m/z, intensity arrays)
mz_array = [100.5, 200.3, 300.7, 400.2]
intensity_array = [1000, 5000, 3000, 2000]
spectrum.set_peaks((mz_array, intensity_array))

# Add precursor information for MS2
precursor = oms.Precursor()
precursor.setMZ(450.5)
precursor.setCharge(2)
spectrum.setPrecursors([precursor])

MSExperiment - Complete LC-MS/MS run:

python

# Create experiment and add spectra
exp = oms.MSExperiment()
exp.addSpectrum(spectrum)

# Access spectra
first_spectrum = exp.getSpectrum(0)
for spec in exp:
    print(f"RT: {spec.getRT()}, MS Level: {spec.getMSLevel()}")

MSChromatogram - Extracted ion chromatogram:

python

# Create chromatogram
chrom = oms.MSChromatogram()
chrom.set_peaks(([10.5, 11.2, 11.8], [1000, 5000, 3000]))  # RT, intensity
exp.addChromatogram(chrom)

Efficient Peak Access:

python

# Get peaks as numpy arrays for fast processing
mz_array, intensity_array = spectrum.get_peaks()

# Modify and set back
intensity_array *= 2  # Double all intensities
spectrum.set_peaks((mz_array, intensity_array))

3. Chemistry and Peptide Handling

Perform chemical calculations for proteomics and metabolomics. See

references/chemistry.md

for detailed examples.

Molecular Formulas and Mass Calculations:

python

# Create empirical formula
formula = oms.EmpiricalFormula("C6H12O6")  # Glucose
print(f"Monoisotopic mass: {formula.getMonoWeight()}")
print(f"Average mass: {formula.getAverageWeight()}")

# Formula arithmetic
water = oms.EmpiricalFormula("H2O")
dehydrated = formula - water

# Isotope-specific formulas
heavy_carbon = oms.EmpiricalFormula("(13)C6H12O6")

Isotopic Distributions:

python

# Generate coarse isotope pattern (unit mass resolution)
coarse_gen = oms.CoarseIsotopePatternGenerator()
pattern = coarse_gen.run(formula)

# Generate fine structure (high resolution)
fine_gen = oms.FineIsotopePatternGenerator(0.01)  # 0.01 Da resolution
fine_pattern = fine_gen.run(formula)

Amino Acids and Residues:

python

# Access residue information
res_db = oms.ResidueDB()
leucine = res_db.getResidue("Leucine")
print(f"L monoisotopic mass: {leucine.getMonoWeight()}")
print(f"L formula: {leucine.getFormula()}")
print(f"L pKa: {leucine.getPka()}")

Peptide Sequences:

python

# Create peptide sequence
peptide = oms.AASequence.fromString("PEPTIDE")
print(f"Peptide mass: {peptide.getMonoWeight()}")
print(f"Formula: {peptide.getFormula()}")

# Add modifications
modified = oms.AASequence.fromString("PEPTIDEM(Oxidation)")
print(f"Modified mass: {modified.getMonoWeight()}")

# Theoretical fragmentation
ions = []
for i in range(1, peptide.size()):
    b_ion = peptide.getPrefix(i)
    y_ion = peptide.getSuffix(i)
    ions.append(('b', i, b_ion.getMonoWeight()))
    ions.append(('y', i, y_ion.getMonoWeight()))

Protein Digestion:

python

# Enzymatic digestion
dig = oms.ProteaseDigestion()
dig.setEnzyme("Trypsin")
dig.setMissedCleavages(2)

protein_seq = oms.AASequence.fromString("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK")
peptides = []
dig.digest(protein_seq, peptides)

for pep in peptides:
    print(f"{pep.toString()}: {pep.getMonoWeight():.2f} Da")

Modifications:

python

# Access modification database
mod_db = oms.ModificationsDB()
oxidation = mod_db.getModification("Oxidation")
print(f"Oxidation mass diff: {oxidation.getDiffMonoMass()}")
print(f"Residues: {oxidation.getResidues()}")

4. Signal Processing and Filtering

Apply algorithms to process and filter MS data. See

references/algorithms.md

for comprehensive coverage.

Spectral Smoothing:

python

# Gaussian smoothing
gauss_filter = oms.GaussFilter()
params = gauss_filter.getParameters()
params.setValue("gaussian_width", 0.2)
gauss_filter.setParameters(params)
gauss_filter.filterExperiment(exp)

# Savitzky-Golay filter
sg_filter = oms.SavitzkyGolayFilter()
sg_filter.filterExperiment(exp)

Peak Filtering:

python

# Keep only N largest peaks per spectrum
n_largest = oms.NLargest()
params = n_largest.getParameters()
params.setValue("n", 100)  # Keep top 100 peaks
n_largest.setParameters(params)
n_largest.filterExperiment(exp)

# Threshold filtering
threshold_filter = oms.ThresholdMower()
params = threshold_filter.getParameters()
params.setValue("threshold", 1000.0)  # Remove peaks below 1000 intensity
threshold_filter.setParameters(params)
threshold_filter.filterExperiment(exp)

# Window-based filtering
window_filter = oms.WindowMower()
params = window_filter.getParameters()
params.setValue("windowsize", 50.0)  # 50 m/z windows
params.setValue("peakcount", 10)     # Keep 10 highest per window
window_filter.setParameters(params)
window_filter.filterExperiment(exp)

Spectrum Normalization:

python

normalizer = oms.Normalizer()
normalizer.filterExperiment(exp)

MS Level Filtering:

python

# Keep only MS2 spectra
exp.filterMSLevel(2)

# Filter by retention time range
exp.filterRT(100.0, 500.0)  # Keep RT between 100-500 seconds

# Filter by m/z range
exp.filterMZ(400.0, 1500.0)  # Keep m/z between 400-1500

5. Feature Detection and Quantification

Detect and quantify features in LC-MS data:

Peak Picking (Centroiding):

python

# Convert profile data to centroid
picker = oms.PeakPickerHiRes()
params = picker.getParameters()
params.setValue("signal_to_noise", 1.0)
picker.setParameters(params)

exp_centroided = oms.MSExperiment()
picker.pickExperiment(exp, exp_centroided)

Feature Detection:

python

# Detect features across LC-MS runs
feature_finder = oms.FeatureFinderMultiplex()

features = oms.FeatureMap()
feature_finder.run(exp, features, params)

print(f"Found {features.size()} features")
for feature in features:
    print(f"m/z: {feature.getMZ():.4f}, RT: {feature.getRT():.2f}, "
          f"Intensity: {feature.getIntensity():.0f}")

Feature Linking (Map Alignment):

python

# Link features across multiple samples
feature_grouper = oms.FeatureGroupingAlgorithmQT()
consensus_map = oms.ConsensusMap()

# Provide multiple feature maps from different samples
feature_maps = [features1, features2, features3]
feature_grouper.group(feature_maps, consensus_map)

6. Peptide Identification Workflows

Integrate with search engines and process identification results:

Database Searching:

python

# Prepare parameters for search engine
params = oms.Param()
params.setValue("database", "uniprot_human.fasta")
params.setValue("precursor_mass_tolerance", 10.0)  # ppm
params.setValue("fragment_mass_tolerance", 0.5)     # Da
params.setValue("enzyme", "Trypsin")
params.setValue("missed_cleavages", 2)

# Variable modifications
params.setValue("variable_modifications", ["Oxidation (M)", "Phospho (STY)"])

# Fixed modifications
params.setValue("fixed_modifications", ["Carbamidomethyl (C)"])

FDR Control:

python

# False discovery rate estimation
fdr = oms.FalseDiscoveryRate()
fdr_threshold = 0.01  # 1% FDR

# Apply to peptide identifications
protein_ids = []
peptide_ids = []
oms.IdXMLFile().load("search_results.idXML", protein_ids, peptide_ids)

fdr.apply(protein_ids, peptide_ids)

7. Metabolomics Workflows

Analyze small molecule data:

Adduct Detection:

python

# Common metabolite adducts
adducts = ["[M+H]+", "[M+Na]+", "[M+K]+", "[M-H]-", "[M+Cl]-"]

# Feature annotation with adducts
for feature in features:
    mz = feature.getMZ()
    # Calculate neutral mass for each adduct hypothesis
    for adduct in adducts:
        # Annotation logic
        pass

Isotope Pattern Matching:

python

# Compare experimental to theoretical isotope patterns
experimental_pattern = []  # Extract from feature
theoretical = coarse_gen.run(formula)

# Calculate similarity score
similarity = compare_isotope_patterns(experimental_pattern, theoretical)

8. Quality Control and Visualization

Monitor data quality and visualize results:

Basic Statistics:

python

# Calculate TIC (Total Ion Current)
tic_values = []
rt_values = []
for spectrum in exp:
    if spectrum.getMSLevel() == 1:
        tic = sum(spectrum.get_peaks()[1])  # Sum intensities
        tic_values.append(tic)
        rt_values.append(spectrum.getRT())

# Base peak chromatogram
bpc_values = []
for spectrum in exp:
    if spectrum.getMSLevel() == 1:
        max_intensity = max(spectrum.get_peaks()[1]) if spectrum.size() > 0 else 0
        bpc_values.append(max_intensity)

Plotting (with pyopenms.plotting or matplotlib):

python

import matplotlib.pyplot as plt

# Plot TIC
plt.figure(figsize=(10, 4))
plt.plot(rt_values, tic_values)
plt.xlabel('Retention Time (s)')
plt.ylabel('Total Ion Current')
plt.title('TIC')
plt.show()

# Plot single spectrum
spectrum = exp.getSpectrum(0)
mz, intensity = spectrum.get_peaks()
plt.stem(mz, intensity, basefmt=' ')
plt.xlabel('m/z')
plt.ylabel('Intensity')
plt.title(f'Spectrum at RT {spectrum.getRT():.2f}s')
plt.show()

Common Workflows

Complete LC-MS/MS Processing Pipeline

python

import pyopenms as oms

# 1. Load data
exp = oms.MSExperiment()
oms.MzMLFile().load("raw_data.mzML", exp)

# 2. Filter and smooth
exp.filterMSLevel(1)  # Keep only MS1 for feature detection
gauss = oms.GaussFilter()
gauss.filterExperiment(exp)

# 3. Peak picking
picker = oms.PeakPickerHiRes()
exp_centroid = oms.MSExperiment()
picker.pickExperiment(exp, exp_centroid)

# 4. Feature detection
ff = oms.FeatureFinderMultiplex()
features = oms.FeatureMap()
ff.run(exp_centroid, features, oms.Param())

# 5. Export results
oms.FeatureXMLFile().store("features.featureXML", features)
print(f"Detected {features.size()} features")

Theoretical Peptide Mass Calculation

python

# Calculate masses for peptide with modifications
peptide = oms.AASequence.fromString("PEPTIDEK")
print(f"Unmodified [M+H]+: {peptide.getMonoWeight() + 1.007276:.4f}")

# With modification
modified = oms.AASequence.fromString("PEPTIDEM(Oxidation)K")
print(f"Oxidized [M+H]+: {modified.getMonoWeight() + 1.007276:.4f}")

# Calculate for different charge states
for z in [1, 2, 3]:
    mz = (peptide.getMonoWeight() + z * 1.007276) / z
    print(f"[M+{z}H]^{z}+: {mz:.4f}")

Installation

Ensure pyOpenMS is installed before using this skill:

bash

# Via conda (recommended)
conda install -c bioconda pyopenms

# Via pip
pip install pyopenms

Integration with Other Tools

pyOpenMS integrates seamlessly with:

Search Engines: Comet, Mascot, MSGF+, MSFragger, Sage, SpectraST
Post-processing: Percolator, MSstats, Epiphany
Metabolomics: SIRIUS, CSI:FingerID
Data Analysis: Pandas, NumPy, SciPy for downstream analysis
Visualization: Matplotlib, Seaborn for plotting

Resources

references/

Detailed documentation on core concepts:

data_structures.md - Comprehensive guide to MSExperiment, MSSpectrum, MSChromatogram, and peak data handling
algorithms.md - Complete reference for signal processing, filtering, feature detection, and quantification algorithms
chemistry.md - In-depth coverage of chemistry calculations, peptide handling, modifications, and isotope distributions

Load these references when needing detailed information about specific pyOpenMS capabilities.

Best Practices

File Format: Always use mzML for raw MS data (standardized, well-supported)
Peak Access: Use
```
get_peaks()
```
and
```
set_peaks()
```
with numpy arrays for efficient processing
Parameters: Always check and configure algorithm parameters via
```
getParameters()
```
and
```
setParameters()
```
Memory: For large datasets, process spectra iteratively rather than loading entire experiments
Validation: Check data integrity (MS levels, RT ordering, precursor information) after loading
Modifications: Use standard modification names from UniMod database
Units: RT in seconds, m/z in Thomson (Da/charge), intensity in arbitrary units

Common Patterns

Algorithm Application Pattern:

python

# 1. Instantiate algorithm
algorithm = oms.SomeAlgorithm()

# 2. Get and configure parameters
params = algorithm.getParameters()
params.setValue("parameter_name", value)
algorithm.setParameters(params)

# 3. Apply to data
algorithm.filterExperiment(exp)  # or .process(), .run(), depending on algorithm

File I/O Pattern:

python

# Read
data_container = oms.DataContainer()  # MSExperiment, FeatureMap, etc.
oms.FileHandler().load("input.format", data_container)

# Process
# ... manipulate data_container ...

# Write
oms.FileHandler().store("output.format", data_container)

Getting Help

Documentation: https://pyopenms.readthedocs.io/
API Reference: Browse class documentation for detailed method signatures
OpenMS Website: https://www.openms.org/
GitHub Issues: https://github.com/OpenMS/OpenMS/issues

pyopenms

NPX Install

Tags

SKILL.md Content

pyOpenMS

Overview

When to Use This Skill

Core Capabilities

1. File I/O and Data Import/Export

2. MS Data Structures and Manipulation

3. Chemistry and Peptide Handling

4. Signal Processing and Filtering

5. Feature Detection and Quantification

6. Peptide Identification Workflows

7. Metabolomics Workflows

8. Quality Control and Visualization

Common Workflows

Complete LC-MS/MS Processing Pipeline

Theoretical Peptide Mass Calculation

Installation

Integration with Other Tools

Resources

references/

Best Practices

Common Patterns

Getting Help