Matchms
Overview
Matchms is an open-source Python library for mass spectrometry data processing and analysis. Import spectra from various formats, standardize metadata, filter peaks, calculate spectral similarities, and build reproducible analytical workflows.
Core Capabilities
1. Importing and Exporting Mass Spectrometry Data

Load spectra from multiple file formats and export processed data:

```python
from matchms.importing import load_from_mgf, load_from_mzml, load_from_msp, load_from_json
from matchms.exporting import save_as_mgf, save_as_msp, save_as_json

# Import spectra
spectra = list(load_from_mgf("spectra.mgf"))
spectra = list(load_from_mzml("data.mzML"))
spectra = list(load_from_msp("library.msp"))

# Export processed spectra
save_as_mgf(spectra, "output.mgf")
save_as_json(spectra, "output.json")
```

**Supported formats:**
- mzML and mzXML (raw mass spectrometry formats)
- MGF (Mascot Generic Format)
- MSP (spectral library format)
- JSON (GNPS-compatible)
- metabolomics-USI references
- Pickle (Python serialization)

For detailed importing/exporting documentation, consult `references/importing_exporting.md`.
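To make the MGF format listed above more concrete, here is a minimal standard-library sketch of what a loader has to do with such a file. It is not how matchms parses MGF; `parse_mgf` and the sample text `MGF_TEXT` are hypothetical names and data made up for illustration, and real files carry more fields and edge cases.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

# Hypothetical two-record MGF text, invented for this illustration.
MGF_TEXT = """\
BEGIN IONS
TITLE=example_1
PEPMASS=250.5
CHARGE=1+
100.0 10.0
150.0 50.0
END IONS
BEGIN IONS
TITLE=example_2
PEPMASS=301.2
120.1 5.0
END IONS
"""


def parse_mgf(text: str) -> List[Tuple[Dict[str, str], List[Peak]]]:
    """Parse MGF text into (metadata, peaks) records. Illustrative only."""
    records = []
    metadata: Dict[str, str] = {}
    peaks: List[Peak] = []
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            # Start a fresh record.
            metadata, peaks, inside = {}, [], True
        elif line == "END IONS":
            if inside:
                records.append((metadata, peaks))
            inside = False
        elif inside and "=" in line:
            # KEY=VALUE metadata line; keys are lowercased for consistency.
            key, value = line.split("=", 1)
            metadata[key.lower()] = value
        elif inside:
            # Peak line: m/z followed by intensity.
            mz, intensity = line.split()[:2]
            peaks.append((float(mz), float(intensity)))
    return records


records = parse_mgf(MGF_TEXT)
for meta, peak_list in records:
    print(meta["title"], len(peak_list))
```

For real data, prefer the matchms loaders shown above, which also build `Spectrum` objects.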
2. Spectrum Filtering and Processing
Apply comprehensive filters to standardize metadata and refine peak data:

```python
from matchms.filtering import default_filters, normalize_intensities
from matchms.filtering import select_by_relative_intensity, require_minimum_number_of_peaks

# Apply default metadata harmonization filters
spectrum = default_filters(spectrum)

# Normalize peak intensities
spectrum = normalize_intensities(spectrum)

# Filter peaks by relative intensity
spectrum = select_by_relative_intensity(spectrum, intensity_from=0.01, intensity_to=1.0)

# Require minimum peaks
spectrum = require_minimum_number_of_peaks(spectrum, n_required=5)
```

**Filter categories:**
- **Metadata processing**: Harmonize compound names, derive chemical structures, standardize adducts, correct charges
- **Peak filtering**: Normalize intensities, select by m/z or intensity, remove precursor peaks
- **Quality control**: Require minimum peaks, validate precursor m/z, ensure metadata completeness
- **Chemical annotation**: Add fingerprints, derive InChI/SMILES, repair structural mismatches

Matchms provides 40+ filters. For the complete filter reference, consult `references/filtering.md`.
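The peak-filtering steps above can be sketched on plain lists of `(m/z, intensity)` tuples. This is a rough illustration of the idea, not the matchms implementation; the helper names `normalize`, `select_relative`, and `require_min_peaks` and the sample peaks are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def normalize(peaks: List[Peak]) -> List[Peak]:
    """Scale intensities so that the largest peak has intensity 1.0."""
    if not peaks:
        return []
    top = max(intensity for _, intensity in peaks)
    if top <= 0:
        return list(peaks)
    return [(mz, intensity / top) for mz, intensity in peaks]


def select_relative(peaks: List[Peak], low: float = 0.01, high: float = 1.0) -> List[Peak]:
    """Keep peaks whose (already normalized) intensity lies in [low, high]."""
    return [(mz, i) for mz, i in peaks if low <= i <= high]


def require_min_peaks(peaks: List[Peak], n_required: int = 5) -> Optional[List[Peak]]:
    """Return the peaks unchanged, or None if there are fewer than n_required."""
    return peaks if len(peaks) >= n_required else None


# Hypothetical raw peaks, invented for this illustration.
raw_peaks = [
    (100.0, 2.0), (150.0, 400.0), (200.0, 1000.0),
    (250.0, 5.0), (300.0, 80.0), (350.0, 120.0),
]

cleaned = select_relative(normalize(raw_peaks), low=0.01)
print(cleaned)                                   # low-intensity noise peaks are gone
print(require_min_peaks(cleaned, n_required=5))  # too few peaks left for this threshold
```

The order matters in this sketch: the relative-intensity cut assumes intensities were normalized first.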
3. Calculating Spectral Similarities
Compare spectra using various similarity metrics:

```python
from matchms import calculate_scores
from matchms.similarity import CosineGreedy, ModifiedCosine, CosineHungarian

# Calculate cosine similarity (fast, greedy algorithm)
scores = calculate_scores(references=library_spectra,
                          queries=query_spectra,
                          similarity_function=CosineGreedy())

# Calculate modified cosine (accounts for precursor m/z differences)
scores = calculate_scores(references=library_spectra,
                          queries=query_spectra,
                          similarity_function=ModifiedCosine(tolerance=0.1))

# Get best matches
best_matches = scores.scores_by_query(query_spectra[0], sort=True)[:10]
```

**Available similarity functions:**
- **CosineGreedy/CosineHungarian**: Peak-based cosine similarity with different matching algorithms
- **ModifiedCosine**: Cosine similarity accounting for precursor mass differences
- **NeutralLossesCosine**: Similarity based on neutral loss patterns
- **FingerprintSimilarity**: Molecular structure similarity using fingerprints
- **MetadataMatch**: Compare user-defined metadata fields
- **PrecursorMzMatch/ParentMassMatch**: Simple mass-based filtering

For detailed similarity function documentation, consult `references/similarity.md`.
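As a rough picture of what a greedy, peak-based cosine score computes, the sketch below pairs peaks within an m/z tolerance, accepts the pairs with the largest intensity products first, and normalizes by the intensity norms of both spectra. It is an assumption-laden illustration, not the matchms code: `greedy_cosine` is a hypothetical helper, and it ignores options such as m/z or intensity weighting.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def greedy_cosine(ref: List[Peak], query: List[Peak], tolerance: float = 0.1) -> Tuple[float, int]:
    """Return (score, number of matched peak pairs) using greedy pairing."""
    # Collect every candidate pair whose m/z values lie within the tolerance.
    candidates = []
    for i, (mz_r, int_r) in enumerate(ref):
        for j, (mz_q, int_q) in enumerate(query):
            if abs(mz_r - mz_q) <= tolerance:
                candidates.append((int_r * int_q, i, j))

    # Greedy step: accept the largest products first, using each peak at most once.
    candidates.sort(reverse=True)
    used_ref, used_query = set(), set()
    total, matches = 0.0, 0
    for product, i, j in candidates:
        if i in used_ref or j in used_query:
            continue
        used_ref.add(i)
        used_query.add(j)
        total += product
        matches += 1

    # Normalize by the Euclidean norms of both intensity vectors.
    norm = math.sqrt(sum(x * x for _, x in ref)) * math.sqrt(sum(x * x for _, x in query))
    score = total / norm if norm > 0 else 0.0
    return score, matches


# Hypothetical spectra, invented for this illustration.
reference = [(100.0, 0.1), (150.0, 0.5), (200.0, 0.9), (250.0, 0.3)]
query = [(100.05, 0.2), (200.0, 1.0), (300.0, 0.4)]

score, n_matches = greedy_cosine(reference, query, tolerance=0.1)
print(f"score={score:.3f}, matched peaks={n_matches}")
```

A Hungarian-style variant would replace the greedy loop with an optimal assignment over the same candidate pairs, at higher computational cost.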
4. Building Processing Pipelines
Create reproducible, multi-step analysis workflows:

```python
from matchms import SpectrumProcessor
from matchms.filtering import default_filters, normalize_intensities
from matchms.filtering import select_by_relative_intensity, remove_peaks_around_precursor_mz

# Define a processing pipeline
processor = SpectrumProcessor([
    default_filters,
    normalize_intensities,
    lambda s: select_by_relative_intensity(s, intensity_from=0.01),
    lambda s: remove_peaks_around_precursor_mz(s, mz_tolerance=17)
])

# Apply to all spectra
processed_spectra = [processor(s) for s in spectra]
```
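The pipeline idea can be illustrated without matchms: apply filters in order and stop as soon as one rejects the spectrum. Some matchms filters, such as `require_minimum_number_of_peaks`, return `None` for a spectrum that fails the check, so it helps to drop `None` results afterwards. Everything in this sketch (the dict-based spectrum, `keep_peaks_above`, `require_peaks`, `run_pipeline`) is a hypothetical stand-in, not the `SpectrumProcessor` API.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a spectrum: a dict with "metadata" and "peaks".
SpectrumDict = Dict[str, object]
Filter = Callable[[SpectrumDict], Optional[SpectrumDict]]


def keep_peaks_above(min_intensity: float) -> Filter:
    """Build a filter that drops peaks below min_intensity."""
    def apply(spectrum: SpectrumDict) -> Optional[SpectrumDict]:
        peaks = [(mz, i) for mz, i in spectrum["peaks"] if i >= min_intensity]
        # Return a new dict so the input spectrum is left untouched.
        return {**spectrum, "peaks": peaks}
    return apply


def require_peaks(n_required: int) -> Filter:
    """Build a filter that rejects spectra with fewer than n_required peaks."""
    def apply(spectrum: SpectrumDict) -> Optional[SpectrumDict]:
        return spectrum if len(spectrum["peaks"]) >= n_required else None
    return apply


def run_pipeline(spectrum: SpectrumDict, filters: List[Filter]) -> Optional[SpectrumDict]:
    """Apply filters in order; stop early once a filter returns None."""
    current: Optional[SpectrumDict] = spectrum
    for step in filters:
        current = step(current)
        if current is None:
            return None
    return current


filters = [keep_peaks_above(0.05), require_peaks(2)]

# Hypothetical input spectra, invented for this illustration.
spectra = [
    {"metadata": {"id": "a"}, "peaks": [(100.0, 0.5), (150.0, 1.0), (200.0, 0.01)]},
    {"metadata": {"id": "b"}, "peaks": [(120.0, 1.0), (180.0, 0.02)]},
]

results = [run_pipeline(s, filters) for s in spectra]
processed = [s for s in results if s is not None]  # discard rejected spectra
print([s["metadata"]["id"] for s in processed])
```

Keeping each step a plain function that takes and returns a spectrum is what makes the steps easy to reorder and the whole workflow reproducible.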
5. Working with Spectrum Objects
The core `Spectrum` class contains mass spectral data:

```python
from matchms import Spectrum
import numpy as np

# Create a spectrum
mz = np.array([100.0, 150.0, 200.0, 250.0])
intensities = np.array([0.1, 0.5, 0.9, 0.3])
metadata = {"precursor_mz": 250.5, "ionmode": "positive"}
spectrum = Spectrum(mz=mz, intensities=intensities, metadata=metadata)

# Access spectrum properties
print(spectrum.peaks.mz)             # m/z values
print(spectrum.peaks.intensities)    # Intensity values
print(spectrum.get("precursor_mz"))  # Metadata field

# Visualize spectra
spectrum.plot()
spectrum.plot_against(reference_spectrum)
```
6. Metadata Management
Standardize and harmonize spectrum metadata:

```python
# Metadata is automatically harmonized
spectrum.set("Precursor_mz", 250.5)  # Gets harmonized to lowercase key
print(spectrum.get("precursor_mz"))  # Returns 250.5

# Derive chemical information
from matchms.filtering import derive_inchi_from_smiles, derive_inchikey_from_inchi
from matchms.filtering import add_fingerprint

spectrum = derive_inchi_from_smiles(spectrum)
spectrum = derive_inchikey_from_inchi(spectrum)
spectrum = add_fingerprint(spectrum, fingerprint_type="morgan", nbits=2048)
```
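The key harmonization described above (a key set as `Precursor_mz` is read back as `precursor_mz`) can be pictured with a small dictionary wrapper. This is a simplified sketch, assuming that harmonization only means trimming and lowercasing the key; the `Metadata` class is hypothetical and matchms applies further rules that are not shown here.

```python
from typing import Any, Dict


class Metadata:
    """Hypothetical metadata store that lowercases keys on every access."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    @staticmethod
    def _harmonize(key: str) -> str:
        # Assumption for this sketch: harmonizing means strip + lowercase only.
        return key.strip().lower()

    def set(self, key: str, value: Any) -> "Metadata":
        self._data[self._harmonize(key)] = value
        return self

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(self._harmonize(key), default)


meta = Metadata()
meta.set("Precursor_mz", 250.5)
print(meta.get("precursor_mz"))
```

Normalizing keys on both `set` and `get` means callers never need to remember which spelling a given file format used.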
Common Workflows
For typical mass spectrometry analysis workflows, including:
- Loading and preprocessing spectral libraries
- Matching unknown spectra against reference libraries
- Quality filtering and data cleaning
- Large-scale similarity comparisons
- Network-based spectral clustering

Consult `references/workflows.md` for detailed examples.
Installation
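```bash
uv pip install matchms
```

For molecular structure processing (SMILES, InChI), install the `chemistry` extra. The quotes keep shells such as zsh from interpreting the square brackets:

```bash
uv pip install "matchms[chemistry]"
```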
Reference Documentation
Detailed reference documentation is available in the `references/` directory:
- `filtering.md` - Complete filter function reference with descriptions
- `similarity.md` - All similarity metrics and when to use them
- `importing_exporting.md` - File format details and I/O operations
- `workflows.md` - Common analysis patterns and examples
Load these references as needed for detailed information about specific matchms capabilities.