Medchem

Overview

Medchem is a Python library for molecular filtering and prioritization in drug discovery workflows. It provides hundreds of established and novel molecular filters, structural alerts, and medicinal chemistry rules for triaging and prioritizing compound libraries at scale. Rules and filters are context-specific; use them as guidelines in combination with domain expertise.

When to Use This Skill

This skill should be used when:

Applying drug-likeness rules (Lipinski, Veber, etc.) to compound libraries
Filtering molecules by structural alerts or PAINS patterns
Prioritizing compounds for lead optimization
Assessing compound quality and medicinal chemistry properties
Detecting reactive or problematic functional groups
Calculating molecular complexity metrics

Installation

```bash
uv pip install medchem
```

Core Capabilities

1. Medicinal Chemistry Rules

Apply established drug-likeness rules to molecules using the `medchem.rules` module.

Available Rules:

Rule of Five (Lipinski)
Rule of Oprea
Rule of CNS
Rule of leadlike (soft and strict)
Rule of three
Rule of Reos
Rule of drug
Rule of Veber
Golden triangle
PAINS filters

Single Rule Application:

```python
import medchem as mc

# Apply Rule of Five to a SMILES string
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
passes = mc.rules.basic_rules.rule_of_five(smiles)
# Returns: True

# Check specific rules
passes_oprea = mc.rules.basic_rules.rule_of_oprea(smiles)
passes_cns = mc.rules.basic_rules.rule_of_cns(smiles)
```


**Multiple Rules with RuleFilters:**

```python
import datamol as dm
import medchem as mc

# Load molecules (example input; replace with your own SMILES)
smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", "c1ccccc1"]
mols = [dm.to_mol(smiles) for smiles in smiles_list]

# Create filter with multiple rules
rfilter = mc.rules.RuleFilters(
    rule_list=[
        "rule_of_five",
        "rule_of_oprea",
        "rule_of_cns",
        "rule_of_leadlike_soft",
    ]
)

# Apply filters with parallelization
results = rfilter(
    mols=mols,
    n_jobs=-1,  # Use all CPU cores
    progress=True,
)
```

**Result Format:**
Results are returned as dictionaries with pass/fail status and detailed information for each rule.
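
The rules listed in this section are, at heart, threshold checks on a handful of molecular properties. The following library-independent sketch shows that idea for the Rule of Five and the Veber rule, working from precomputed descriptors. The `Descriptors` container, the `RULES` registry, `apply_rules`, and the `pass_all` key are hypothetical names chosen for illustration and are not part of medchem. The sketch requires all four Lipinski criteria to hold, whereas Lipinski's original formulation tolerates a single violation; the aspirin values are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mw: float        # molecular weight, Da
    logp: float      # calculated logP
    hbd: int         # hydrogen-bond donors
    hba: int         # hydrogen-bond acceptors
    tpsa: float      # topological polar surface area, Å^2
    rot_bonds: int   # rotatable bonds


def rule_of_five(d: Descriptors) -> bool:
    # Lipinski thresholds: MW <= 500, logP <= 5, HBD <= 5, HBA <= 10
    return d.mw <= 500 and d.logp <= 5 and d.hbd <= 5 and d.hba <= 10


def rule_of_veber(d: Descriptors) -> bool:
    # Veber thresholds: rotatable bonds <= 10 and TPSA <= 140 Å^2
    return d.rot_bonds <= 10 and d.tpsa <= 140


# Hypothetical registry mapping rule names to check functions
RULES = {"rule_of_five": rule_of_five, "rule_of_veber": rule_of_veber}


def apply_rules(d: Descriptors, rule_list: list[str]) -> dict[str, bool]:
    """Return one pass/fail flag per requested rule, plus an overall flag."""
    flags = {name: RULES[name](d) for name in rule_list}
    flags["pass_all"] = all(flags.values())
    return flags


# Approximate descriptor values for aspirin, for illustration only
aspirin = Descriptors(mw=180.16, logp=1.2, hbd=1, hba=4, tpsa=63.6, rot_bonds=3)
result = apply_rules(aspirin, ["rule_of_five", "rule_of_veber"])
```

Keeping one flag per rule, rather than a single verdict, makes it easy to see which rule rejected a compound.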

2. Structural Alert Filters

Detect potentially problematic structural patterns using the `medchem.structural` module.

Available Filters:

Common Alerts - General structural alerts derived from ChEMBL curation and literature
NIBR Filters - Novartis Institutes for BioMedical Research filter set
Lilly Demerits - Eli Lilly's demerit-based system (275 rules, molecules rejected at >100 demerits)

Common Alerts:

```python
import datamol as dm
import medchem as mc

# Create filter
alert_filter = mc.structural.CommonAlertsFilters()

# Check single molecule
mol = dm.to_mol("c1ccccc1")
has_alerts, details = alert_filter.check_mol(mol)

# Batch filtering with parallelization
mol_list = [mol]  # example input; replace with your own molecules
results = alert_filter(mols=mol_list, n_jobs=-1, progress=True)
```


**NIBR Filters:**

```python
import medchem as mc

# Apply NIBR filters
nibr_filter = mc.structural.NIBRFilters()
results = nibr_filter(mols=mol_list, n_jobs=-1)
```


**Lilly Demerits:**

```python
import medchem as mc

# Calculate Lilly demerits
lilly = mc.structural.LillyDemeritsFilters()
results = lilly(mols=mol_list, n_jobs=-1)

# Each result includes demerit score and whether it passes (≤100 demerits)
```
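
The additive logic behind a demerit system can be sketched without the library: each matched pattern contributes a penalty, and a molecule is rejected once the total exceeds 100. The pattern names, demerit values, and the `score_demerits` helper below are hypothetical and exist only to illustrate the threshold; the sketch models the additive part alone and makes no claim about how medchem or the Lilly rule set assign individual demerits.

```python
REJECT_THRESHOLD = 100  # molecules are rejected above this total, per the text


def score_demerits(hits: dict[str, int]) -> tuple[int, bool]:
    """Sum per-pattern demerits and report whether the molecule passes."""
    total = sum(hits.values())
    return total, total <= REJECT_THRESHOLD


# Hypothetical pattern names and demerit values, for illustration only
total_a, passes_a = score_demerits({"pattern_a": 30, "pattern_b": 50})
total_b, passes_b = score_demerits(
    {"pattern_a": 30, "pattern_b": 50, "pattern_c": 40}
)
```

A molecule with no matched patterns scores zero and passes, and a total of exactly 100 still passes because rejection applies only above the threshold.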

3. Functional API for High-Level Operations

The `medchem.functional` module provides convenient functions for common workflows.

Quick Filtering:

```python
import medchem as mc

# Apply NIBR filters to a list
filter_ok = mc.functional.nibr_filter(mols=mol_list, n_jobs=-1)

# Apply common alerts
alert_results = mc.functional.common_alerts_filter(mols=mol_list, n_jobs=-1)
```

4. Chemical Groups Detection

Identify specific chemical groups and functional groups using `medchem.groups`.

Available Groups:

Hinge binders
Phosphate binders
Michael acceptors
Reactive groups
Custom SMARTS patterns

Usage:

```python
import medchem as mc

# Create group detector
group = mc.groups.ChemicalGroup(groups=["hinge_binders"])

# Check for matches
has_matches = group.has_match(mol_list)

# Get detailed match information
matches = group.get_matches(mol)
```

5. Named Catalogs

Access curated collections of chemical structures through `medchem.catalogs`.

Available Catalogs:

Functional groups
Protecting groups
Common reagents
Standard fragments

Usage:

```python
import medchem as mc

# Access named catalogs
catalogs = mc.catalogs.NamedCatalogs

# Use catalog for matching
catalog = catalogs.get("functional_groups")
matches = catalog.get_matches(mol)
```

6. Molecular Complexity

Calculate complexity metrics that approximate synthetic accessibility using `medchem.complexity`.

Common Metrics:

Bertz complexity
Whitlock complexity
Barone complexity

Usage:

```python
import medchem as mc

# Calculate complexity
complexity_score = mc.complexity.calculate_complexity(mol)

# Filter by complexity threshold
complex_filter = mc.complexity.ComplexityFilter(max_complexity=500)
results = complex_filter(mols=mol_list)
```

7. Constraints Filtering

Apply custom property-based constraints using `medchem.constraints`.

Example Constraints:

Molecular weight ranges
LogP bounds
TPSA limits
Rotatable bond counts

Usage:

```python
import medchem as mc

# Define constraints
constraints = mc.constraints.Constraints(
    mw_range=(200, 500),
    logp_range=(-2, 5),
    tpsa_max=140,
    rotatable_bonds_max=10,
)

# Apply constraints
results = constraints(mols=mol_list, n_jobs=-1)
```
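
To make the effect of such bounds concrete, the following library-independent sketch checks a dictionary of precomputed properties against the same four limits and reports which ones are violated. `PropertyConstraints`, its `violations` method, and the property keys (`mw`, `logp`, `tpsa`, `rot_bonds`) are hypothetical names for illustration and do not reflect medchem's implementation; the aspirin values are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyConstraints:
    """Hypothetical container for property bounds (not a medchem class)."""
    mw_range: tuple[float, float] = (200, 500)
    logp_range: tuple[float, float] = (-2, 5)
    tpsa_max: float = 140
    rotatable_bonds_max: int = 10

    def violations(self, props: dict[str, float]) -> list[str]:
        """Return the names of the constraints that `props` violates."""
        failed = []
        if not self.mw_range[0] <= props["mw"] <= self.mw_range[1]:
            failed.append("mw_range")
        if not self.logp_range[0] <= props["logp"] <= self.logp_range[1]:
            failed.append("logp_range")
        if props["tpsa"] > self.tpsa_max:
            failed.append("tpsa_max")
        if props["rot_bonds"] > self.rotatable_bonds_max:
            failed.append("rotatable_bonds_max")
        return failed


limits = PropertyConstraints()

# Approximate values for aspirin, for illustration only
aspirin = {"mw": 180.16, "logp": 1.2, "tpsa": 63.6, "rot_bonds": 3}
aspirin_violations = limits.violations(aspirin)
```

With these bounds aspirin is flagged only for falling below the 200 Da lower molecular weight limit, which shows how a range constraint can exclude small, otherwise well-behaved molecules.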

8. Medchem Query Language

Use a specialized query language for complex filtering criteria.

Query Examples:

```
# Molecules passing Ro5 AND not having common alerts
"rule_of_five AND NOT common_alerts"

# CNS-like molecules with low complexity
"rule_of_cns AND complexity < 400"

# Leadlike molecules without Lilly demerits
"rule_of_leadlike AND lilly_demerits == 0"
```

**Usage:**

```python
import medchem as mc

# Parse and apply query
query = mc.query.parse("rule_of_five AND NOT common_alerts")
results = query.apply(mols=mol_list, n_jobs=-1)
```
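
As a rough illustration of how a boolean query such as `rule_of_five AND NOT common_alerts` can be evaluated, here is a small recursive-descent evaluator over precomputed per-molecule flags. It is a sketch only: the grammar (`AND`, `OR`, `NOT`, parentheses, and names) is an assumption, the `tokenize` and `evaluate` functions are hypothetical, and numeric comparisons such as `complexity < 400` are not handled. It does not describe medchem's actual parser.

```python
import re

# Tokens: parentheses, the three operators, or a flag name
TOKEN = re.compile(r"\s*(\(|\)|AND\b|OR\b|NOT\b|[A-Za-z_][A-Za-z0-9_]*)")
OPERATORS = {"AND", "OR", "NOT", ")"}


def tokenize(query: str) -> list[str]:
    """Split a query string into tokens, rejecting unknown characters."""
    query = query.rstrip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected input: {query[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(query: str, flags: dict[str, bool]) -> bool:
    """Evaluate a boolean query against named flags (NOT > AND > OR)."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()  # parse first so tokens are always consumed
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token is None or token in OPERATORS:
            raise ValueError(f"unexpected token: {token!r}")
        return flags[token]  # KeyError for names without a flag

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result


# Hypothetical per-molecule flags, e.g. produced by earlier filtering steps
flags = {"rule_of_five": True, "common_alerts": False}
keep = evaluate("rule_of_five AND NOT common_alerts", flags)
```

Evaluating against precomputed flags keeps the expensive filter calls separate from the cheap boolean combination, so the same flags can be reused across several queries.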

Workflow Patterns

Pattern 1: Initial Triage of Compound Library

Filter a large compound collection to identify drug-like candidates.

```python
import datamol as dm
import medchem as mc
import pandas as pd

# Load compound library
df = pd.read_csv("compounds.csv")
mols = [dm.to_mol(smi) for smi in df["smiles"]]

# Apply primary filters
rule_filter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_veber"])
rule_results = rule_filter(mols=mols, n_jobs=-1, progress=True)

# Apply structural alerts
alert_filter = mc.structural.CommonAlertsFilters()
alert_results = alert_filter(mols=mols, n_jobs=-1, progress=True)

# Combine results
df["passes_rules"] = rule_results["pass"]
df["has_alerts"] = alert_results["has_alerts"]
df["drug_like"] = df["passes_rules"] & ~df["has_alerts"]

# Save filtered compounds
filtered_df = df[df["drug_like"]]
filtered_df.to_csv("filtered_compounds.csv", index=False)
```

Pattern 2: Lead Optimization Filtering

Apply stricter criteria during lead optimization.

```python
import medchem as mc

# Create comprehensive filter
filters = {
    "rules": mc.rules.RuleFilters(rule_list=["rule_of_leadlike_strict"]),
    "alerts": mc.structural.NIBRFilters(),
    "lilly": mc.structural.LillyDemeritsFilters(),
    "complexity": mc.complexity.ComplexityFilter(max_complexity=400),
}

# Apply all filters (candidate_mols: the molecules under evaluation)
results = {}
for name, filt in filters.items():
    results[name] = filt(mols=candidate_mols, n_jobs=-1)

# Identify compounds passing all filters, one flag per molecule
# (assumes each result exposes a per-molecule "pass" sequence)
passes_all = [all(flags) for flags in zip(*(r["pass"] for r in results.values()))]
```

Pattern 3: Identify Specific Chemical Groups

Find molecules containing specific functional groups or scaffolds.

```python
import medchem as mc

# Create group detector for multiple groups
group_detector = mc.groups.ChemicalGroup(
    groups=["hinge_binders", "phosphate_binders"]
)

# Screen library
matches = group_detector.get_all_matches(mol_list)

# Filter molecules with desired groups
mol_with_groups = [mol for mol, match in zip(mol_list, matches) if match]
```

Best Practices

Context Matters: Don't blindly apply filters. Understand the biological target and chemical space.
Combine Multiple Filters: Use rules, structural alerts, and domain knowledge together for better decisions.
Use Parallelization: For large datasets (>1000 molecules), always use `n_jobs=-1` for parallel processing.
Iterative Refinement: Start with broad filters (Ro5), then apply more specific criteria (CNS, leadlike) as needed.
Document Filtering Decisions: Track which molecules were filtered out and why for reproducibility.
Validate Results: Remember that marketed drugs often fail standard filters—use these as guidelines, not absolute rules.
Consider Prodrugs: Molecules designed as prodrugs may intentionally violate standard medicinal chemistry rules.
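
One way to follow the advice on documenting filtering decisions is to record, for every compound, which checks it failed, and to write that record alongside the filtered output. The sketch below uses only the standard library; the `triage` and `to_csv` helpers, the check names, and the compound records are hypothetical and stand in for flags that would come from real filters.

```python
import csv
import io


def triage(compounds: dict, checks: dict) -> list[dict]:
    """Run each named check on each compound and record any failures."""
    log = []
    for compound_id, record in compounds.items():
        failed = [name for name, check in checks.items() if not check(record)]
        log.append(
            {
                "id": compound_id,
                "kept": not failed,
                "failed_checks": ";".join(failed),
            }
        )
    return log


def to_csv(log: list[dict]) -> str:
    """Serialize the decision log as CSV text for later auditing."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "kept", "failed_checks"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(log)
    return buffer.getvalue()


# Hypothetical compound records and checks, for illustration only
compounds = {
    "cmpd-1": {"mw": 180.2, "has_alert": False},
    "cmpd-2": {"mw": 612.7, "has_alert": True},
}
checks = {
    "mw_le_500": lambda r: r["mw"] <= 500,
    "no_alert": lambda r: not r["has_alert"],
}

log = triage(compounds, checks)
report = to_csv(log)
```

Because rejected compounds stay in the log with their reasons, a later review can revisit borderline cases, such as prodrugs, without rerunning the filters.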

Resources

references/api_guide.md

Comprehensive API reference covering all medchem modules with detailed function signatures, parameters, and return types.

references/rules_catalog.md

Complete catalog of available rules, filters, and alerts with descriptions, thresholds, and literature references.

scripts/filter_molecules.py

Production-ready script for batch filtering workflows. Supports multiple input formats (CSV, SDF, SMILES), configurable filter combinations, and detailed reporting.

Usage:

```bash
python scripts/filter_molecules.py input.csv --rules rule_of_five,rule_of_cns --alerts nibr --output filtered.csv
```

Documentation

Official documentation: https://medchem-docs.datamol.io/
GitHub repository: https://github.com/datamol-io/medchem
