matchms

Overview

Matchms is an open-source Python library for mass spectrometry data processing and analysis. Import spectra from various formats, standardize metadata, filter peaks, calculate spectral similarities, and build reproducible analytical workflows.

Core Capabilities

1. Importing and Exporting Mass Spectrometry Data

Load spectra from multiple file formats and export processed data:

```python
from matchms.importing import load_from_mgf, load_from_mzml, load_from_msp, load_from_json
from matchms.exporting import save_as_mgf, save_as_msp, save_as_json

# Import spectra
spectra = list(load_from_mgf("spectra.mgf"))
spectra = list(load_from_mzml("data.mzML"))
spectra = list(load_from_msp("library.msp"))

# Export processed spectra
save_as_mgf(spectra, "output.mgf")
save_as_json(spectra, "output.json")
```

**Supported formats:**
- mzML and mzXML (raw mass spectrometry formats)
- MGF (Mascot Generic Format)
- MSP (spectral library format)
- JSON (GNPS-compatible)
- metabolomics-USI references
- Pickle (Python serialization)

For detailed importing/exporting documentation, consult `references/importing_exporting.md`.
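For orientation, an MGF file is plain text in which each spectrum sits between `BEGIN IONS` and `END IONS`, with `KEY=VALUE` metadata lines followed by `m/z intensity` pairs. The sketch below is a minimal standard-library reader written only to illustrate that layout. The name `read_mgf` and the sample record are hypothetical, and this is not how `load_from_mgf` is implemented; use the matchms loaders for real data.

```python
import io


def read_mgf(handle):
    """Yield (metadata, peaks) for every BEGIN IONS ... END IONS block."""
    metadata, peaks, inside = {}, [], False
    for raw in handle:
        line = raw.strip()
        if line == "BEGIN IONS":
            metadata, peaks, inside = {}, [], True
        elif line == "END IONS":
            if inside:
                yield metadata, peaks
            inside = False
        elif inside and line:
            if "=" in line and not line[0].isdigit():
                # Metadata line such as PEPMASS=250.5
                key, value = line.split("=", 1)
                metadata[key.strip().lower()] = value.strip()
            else:
                # Peak line: m/z and intensity separated by whitespace
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))


# Hypothetical single-spectrum file content
sample = """BEGIN IONS
TITLE=example spectrum
PEPMASS=250.5
CHARGE=1+
100.0 10.0
150.0 50.0
END IONS
"""

records = list(read_mgf(io.StringIO(sample)))
```

Metadata values stay as strings here; turning fields such as `pepmass` into typed, harmonized metadata is the job of the filters described in the next section.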

2. Spectrum Filtering and Processing

Apply filters to standardize metadata and refine peak data:

```python
from matchms.filtering import default_filters, normalize_intensities
from matchms.filtering import select_by_relative_intensity, require_minimum_number_of_peaks

# Apply default metadata harmonization filters
spectrum = default_filters(spectrum)

# Normalize peak intensities
spectrum = normalize_intensities(spectrum)

# Filter peaks by relative intensity
spectrum = select_by_relative_intensity(spectrum, intensity_from=0.01, intensity_to=1.0)

# Require minimum peaks
spectrum = require_minimum_number_of_peaks(spectrum, n_required=5)
```

**Filter categories:**
- **Metadata processing**: Harmonize compound names, derive chemical structures, standardize adducts, correct charges
- **Peak filtering**: Normalize intensities, select by m/z or intensity, remove precursor peaks
- **Quality control**: Require minimum peaks, validate precursor m/z, ensure metadata completeness
- **Chemical annotation**: Add fingerprints, derive InChI/SMILES, repair structural mismatches

Matchms provides 40+ filters. For the complete filter reference, consult `references/filtering.md`.
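To make the effect of the peak filters concrete, here is a plain-Python sketch of the three operations used above, applied to a list of `(m/z, intensity)` tuples. The helper names `normalize`, `select_relative`, and `require_min_peaks` are hypothetical stand-ins rather than matchms functions, and the sketch ignores metadata entirely. It mirrors one behaviour worth knowing: matchms filters that enforce a requirement, such as `require_minimum_number_of_peaks`, return `None` when a spectrum fails the check.

```python
def normalize(peaks):
    """Scale intensities so that the most intense peak equals 1.0."""
    max_intensity = max((intensity for _, intensity in peaks), default=0.0)
    if max_intensity <= 0:
        return []
    return [(mz, intensity / max_intensity) for mz, intensity in peaks]


def select_relative(peaks, intensity_from=0.0, intensity_to=1.0):
    """Keep peaks whose normalized intensity lies in [intensity_from, intensity_to]."""
    return [
        (mz, intensity)
        for mz, intensity in normalize(peaks)
        if intensity_from <= intensity <= intensity_to
    ]


def require_min_peaks(peaks, n_required):
    """Return the peaks unchanged, or None if there are fewer than n_required."""
    return peaks if len(peaks) >= n_required else None


# Hypothetical raw peak list: (m/z, intensity)
raw_peaks = [(50.0, 4.0), (100.0, 200.0), (150.0, 1.0), (200.0, 50.0)]

kept = select_relative(raw_peaks, intensity_from=0.01)  # drops the 0.5 % peak at m/z 150
accepted = require_min_peaks(kept, n_required=3)         # 3 peaks left, so it passes
rejected = require_min_peaks(kept, n_required=5)         # too few peaks, so None
```

Because a filter can return `None`, code that chains filters by hand should check for it before calling the next one.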

3. Calculating Spectral Similarities

Compare spectra using various similarity metrics:

```python
from matchms import calculate_scores
from matchms.similarity import CosineGreedy, ModifiedCosine, CosineHungarian

# Calculate cosine similarity (fast, greedy algorithm)
scores = calculate_scores(references=library_spectra,
                          queries=query_spectra,
                          similarity_function=CosineGreedy())

# Calculate modified cosine (accounts for precursor m/z differences)
scores = calculate_scores(references=library_spectra,
                          queries=query_spectra,
                          similarity_function=ModifiedCosine(tolerance=0.1))

# Get best matches
best_matches = scores.scores_by_query(query_spectra[0], sort=True)[:10]
```

**Available similarity functions:**
- **CosineGreedy/CosineHungarian**: Peak-based cosine similarity with different matching algorithms
- **ModifiedCosine**: Cosine similarity accounting for precursor mass differences
- **NeutralLossesCosine**: Similarity based on neutral loss patterns
- **FingerprintSimilarity**: Molecular structure similarity using fingerprints
- **MetadataMatch**: Compare user-defined metadata fields
- **PrecursorMzMatch/ParentMassMatch**: Simple mass-based filtering

For detailed similarity function documentation, consult `references/similarity.md`.
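The difference between a plain and a modified cosine is easier to see on a toy example. The sketch below is a simplified, standard-library illustration of greedy peak matching, not the matchms implementation: the function `greedy_cosine`, its `shift` parameter, and the peak lists are assumptions made for this example, and it omits the m/z and intensity weighting options the library offers. With `shift=0.0` only peaks at nearly the same m/z are paired; passing the precursor m/z difference as `shift` additionally pairs peaks offset by that amount.

```python
from math import sqrt


def greedy_cosine(peaks_a, peaks_b, tolerance=0.1, shift=0.0):
    """Return (score, n_matches) for two lists of (mz, intensity) tuples."""
    # Collect every candidate pair within tolerance, directly or after the shift.
    candidates = []
    for i, (mz_a, int_a) in enumerate(peaks_a):
        for j, (mz_b, int_b) in enumerate(peaks_b):
            delta = mz_b - mz_a
            if abs(delta) <= tolerance or (shift and abs(delta - shift) <= tolerance):
                candidates.append((int_a * int_b, i, j))

    # Greedy assignment: largest intensity products first, each peak used once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    total, n_matches = 0.0, 0
    for product, i, j in candidates:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        total += product
        n_matches += 1

    norm_a = sqrt(sum(intensity ** 2 for _, intensity in peaks_a))
    norm_b = sqrt(sum(intensity ** 2 for _, intensity in peaks_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    return total / (norm_a * norm_b), n_matches


# Hypothetical spectra: two of the query peaks are shifted by +14 m/z
reference = [(100.0, 0.5), (150.0, 1.0), (200.0, 0.2)]
query = [(100.0, 0.5), (164.0, 1.0), (214.0, 0.2)]

plain_score, plain_matches = greedy_cosine(reference, query)
modified_score, modified_matches = greedy_cosine(reference, query, shift=14.0)
```

In this example the plain cosine pairs only the unshifted peak at m/z 100, while the shifted variant pairs all three peaks. A Hungarian-style matcher would replace the greedy loop with an optimal assignment, at higher computational cost.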

4. Building Processing Pipelines

Create reproducible, multi-step analysis workflows:

```python
from matchms import SpectrumProcessor
from matchms.filtering import default_filters, normalize_intensities
from matchms.filtering import select_by_relative_intensity, remove_peaks_around_precursor_mz

# Define a processing pipeline
processor = SpectrumProcessor([
    default_filters,
    normalize_intensities,
    lambda s: select_by_relative_intensity(s, intensity_from=0.01),
    lambda s: remove_peaks_around_precursor_mz(s, mz_tolerance=17),
])

# Apply to all spectra
processed_spectra = [processor(s) for s in spectra]
```
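Conceptually, a processing pipeline is an ordered list of functions applied one after another, stopping early when a step rejects the input. The sketch below shows that pattern in plain Python on lists of `(m/z, intensity)` tuples. The names `run_pipeline`, `drop_low_peaks`, and `require_peaks` are hypothetical and are not part of matchms; `functools.partial` is used here as one way to bind filter parameters.

```python
from functools import partial


def drop_low_peaks(peaks, threshold):
    """Keep peaks with intensity at or above the threshold."""
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


def require_peaks(peaks, n_required):
    """Return the peak list unchanged, or None if it is too short."""
    return peaks if len(peaks) >= n_required else None


def run_pipeline(peaks, steps):
    """Apply each step in order; stop as soon as a step rejects the input."""
    for step in steps:
        peaks = step(peaks)
        if peaks is None:
            return None
    return peaks


# The order of steps matters: low peaks are dropped before counting.
steps = [
    partial(drop_low_peaks, threshold=0.05),
    partial(require_peaks, n_required=2),
]

# Hypothetical library of two spectra
library = {
    "good": [(100.0, 0.5), (150.0, 1.0), (200.0, 0.01)],
    "sparse": [(100.0, 1.0), (150.0, 0.02)],
}

processed = {name: run_pipeline(peaks, steps) for name, peaks in library.items()}
kept = {name: peaks for name, peaks in processed.items() if peaks is not None}
```

Keeping the step list in one place is what makes the workflow reproducible: the same list can be logged, versioned, and reapplied to new data.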

5. Working with Spectrum Objects

The core `Spectrum` class contains mass spectral data:

```python
from matchms import Spectrum
import numpy as np

# Create a spectrum
mz = np.array([100.0, 150.0, 200.0, 250.0])
intensities = np.array([0.1, 0.5, 0.9, 0.3])
metadata = {"precursor_mz": 250.5, "ionmode": "positive"}
spectrum = Spectrum(mz=mz, intensities=intensities, metadata=metadata)

# Access spectrum properties
print(spectrum.peaks.mz)             # m/z values
print(spectrum.peaks.intensities)    # Intensity values
print(spectrum.get("precursor_mz"))  # Metadata field

# Visualize spectra (reference_spectrum is a second Spectrum object)
spectrum.plot()
spectrum.plot_against(reference_spectrum)
```

6. Metadata Management

Standardize and harmonize spectrum metadata:

```python
from matchms.filtering import derive_inchi_from_smiles, derive_inchikey_from_inchi
from matchms.filtering import add_fingerprint

# Metadata is automatically harmonized
spectrum.set("Precursor_mz", 250.5)  # Gets harmonized to lowercase key
print(spectrum.get("precursor_mz"))  # Returns 250.5

# Derive chemical information
spectrum = derive_inchi_from_smiles(spectrum)
spectrum = derive_inchikey_from_inchi(spectrum)
# Supported fingerprint types are "daylight", "morgan1", "morgan2", and "morgan3"
spectrum = add_fingerprint(spectrum, fingerprint_type="morgan2", nbits=2048)
```
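Fingerprints added this way are bit vectors, and structural similarity is typically computed by comparing which bits are set. As a minimal sketch, the function below computes the Jaccard (Tanimoto) index on two sets of on-bit positions. The name `jaccard_similarity` and the bit positions are made up for illustration; in practice `FingerprintSimilarity` performs this comparison on the stored fingerprints.

```python
def jaccard_similarity(bits_a, bits_b):
    """Jaccard index: shared on-bits divided by all on-bits in either fingerprint."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # two empty fingerprints share no evidence of similarity
    return len(bits_a & bits_b) / len(union)


# Hypothetical on-bit positions of two 2048-bit fingerprints
fingerprint_a = {1, 5, 9, 42}
fingerprint_b = {5, 9, 100}

score = jaccard_similarity(fingerprint_a, fingerprint_b)  # 2 shared / 5 total
```

Unlike the peak-based cosine scores, this measure depends only on the annotated structures, so it is only available for spectra that carry a SMILES or InChI.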

Common Workflows

`references/workflows.md` contains detailed examples of typical mass spectrometry analysis workflows, including:
- Loading and preprocessing spectral libraries
- Matching unknown spectra against reference libraries
- Quality filtering and data cleaning
- Large-scale similarity comparisons
- Network-based spectral clustering

Installation

```bash
uv pip install matchms
```

For molecular structure processing (SMILES, InChI):

```bash
uv pip install matchms[chemistry]
```

Reference Documentation

Detailed reference documentation is available in the `references/` directory:
- `filtering.md`: Complete filter function reference with descriptions
- `similarity.md`: All similarity metrics and when to use them
- `importing_exporting.md`: File format details and I/O operations
- `workflows.md`: Common analysis patterns and examples

Load these references as needed for detailed information about specific matchms capabilities.