Medchem

Overview

Medchem is a Python library for molecular filtering and prioritization in drug discovery workflows. It provides hundreds of well-established and novel molecular filters, structural alerts, and medicinal chemistry rules for triaging and prioritizing compound libraries at scale. Rules and filters are context-specific; use them as guidelines, combined with domain expertise.

When to Use This Skill

This skill should be used when:
  • Applying drug-likeness rules (Lipinski, Veber, etc.) to compound libraries
  • Filtering molecules by structural alerts or PAINS patterns
  • Prioritizing compounds for lead optimization
  • Assessing compound quality and medicinal chemistry properties
  • Detecting reactive or problematic functional groups
  • Calculating molecular complexity metrics

Installation

```bash
uv pip install medchem
```

Core Capabilities

1. Medicinal Chemistry Rules

Apply established drug-likeness rules to molecules using the `medchem.rules` module.

**Available Rules:**
  • Rule of Five (Lipinski)
  • Rule of Oprea
  • Rule of CNS
  • Rule of leadlike (soft and strict)
  • Rule of three
  • Rule of Reos
  • Rule of drug
  • Rule of Veber
  • Golden triangle
  • PAINS filters

**Single Rule Application:**

```python
import medchem as mc

# Apply Rule of Five to a SMILES string
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
passes = mc.rules.basic_rules.rule_of_five(smiles)
# Returns: True

# Check specific rules
passes_oprea = mc.rules.basic_rules.rule_of_oprea(smiles)
passes_cns = mc.rules.basic_rules.rule_of_cns(smiles)
```

**Multiple Rules with RuleFilters:**

```python
import datamol as dm
import medchem as mc

# Load molecules (smiles_list is an iterable of SMILES strings defined elsewhere)
mols = [dm.to_mol(smiles) for smiles in smiles_list]

# Create filter with multiple rules
rfilter = mc.rules.RuleFilters(
    rule_list=[
        "rule_of_five",
        "rule_of_oprea",
        "rule_of_cns",
        "rule_of_leadlike_soft",
    ]
)

# Apply filters with parallelization
results = rfilter(
    mols=mols,
    n_jobs=-1,  # Use all CPU cores
    progress=True,
)
```

**Result Format:**
Results are returned as dictionaries with pass/fail status and detailed information for each rule.
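
To make concrete what a rule such as the Rule of Five checks, here is a minimal sketch over precomputed descriptors. It does not call medchem: the `Descriptors` class, the `passes_rule_of_five` helper, and the approximate aspirin values are illustrative assumptions, and the thresholds are the commonly cited Lipinski cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mw: float    # molecular weight (Da)
    logp: float  # octanol-water partition coefficient
    n_hbd: int   # hydrogen-bond donors
    n_hba: int   # hydrogen-bond acceptors


def passes_rule_of_five(d: Descriptors) -> bool:
    """Commonly cited Lipinski cutoffs: MW <= 500, logP <= 5, HBD <= 5, HBA <= 10."""
    return d.mw <= 500 and d.logp <= 5 and d.n_hbd <= 5 and d.n_hba <= 10


# Approximate values for aspirin, used only for illustration
aspirin = Descriptors(mw=180.16, logp=1.3, n_hbd=1, n_hba=4)
print(passes_rule_of_five(aspirin))
```

In practice the descriptors would come from a cheminformatics toolkit; the point here is only that a rule reduces to a conjunction of simple threshold checks.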

2. Structural Alert Filters

Detect potentially problematic structural patterns using the `medchem.structural` module.

**Available Filters:**
  1. Common Alerts - General structural alerts derived from ChEMBL curation and literature
  2. NIBR Filters - Novartis Institutes for BioMedical Research filter set
  3. Lilly Demerits - Eli Lilly's demerit-based system (275 rules, molecules rejected at >100 demerits)

**Common Alerts:**

```python
import datamol as dm
import medchem as mc

# Create filter
alert_filter = mc.structural.CommonAlertsFilters()

# Check single molecule
mol = dm.to_mol("c1ccccc1")
has_alerts, details = alert_filter.check_mol(mol)

# Batch filtering with parallelization
results = alert_filter(
    mols=mol_list,
    n_jobs=-1,
    progress=True,
)
```

**NIBR Filters:**

```python
import medchem as mc

# Apply NIBR filters
nibr_filter = mc.structural.NIBRFilters()
results = nibr_filter(mols=mol_list, n_jobs=-1)
```

**Lilly Demerits:**

```python
import medchem as mc

# Calculate Lilly demerits
lilly = mc.structural.LillyDemeritsFilters()
results = lilly(mols=mol_list, n_jobs=-1)

# Each result includes demerit score and whether it passes (≤100 demerits)
```
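
The demerit idea can be illustrated without the library: each matched rule contributes some demerits, and a molecule is rejected once the total exceeds the cutoff stated above. The sketch below assumes that simple additive model; the rule names and demerit values are hypothetical and do not come from the Lilly rule set.

```python
REJECT_ABOVE = 100  # molecules with more than 100 demerits are rejected


def total_demerits(matched_rules: dict[str, int]) -> int:
    """Sum the demerits contributed by each matched rule."""
    return sum(matched_rules.values())


def passes_demerits(matched_rules: dict[str, int], cutoff: int = REJECT_ABOVE) -> bool:
    """A molecule passes when its total demerits do not exceed the cutoff."""
    return total_demerits(matched_rules) <= cutoff


# Hypothetical rule names and demerit values, for illustration only
clean = {"hypothetical_rule_a": 30, "hypothetical_rule_b": 50}
flagged = {"hypothetical_rule_a": 30, "hypothetical_rule_c": 80}

print(total_demerits(clean), passes_demerits(clean))
print(total_demerits(flagged), passes_demerits(flagged))
```

Compared with a hard pass/fail alert, an additive score lets several mild liabilities accumulate into a rejection while tolerating any single one of them.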

3. Functional API for High-Level Operations

The `medchem.functional` module provides convenient functions for common workflows.

**Quick Filtering:**

```python
import medchem as mc

# Apply NIBR filters to a list
filter_ok = mc.functional.nibr_filter(
    mols=mol_list,
    n_jobs=-1,
)

# Apply common alerts
alert_results = mc.functional.common_alerts_filter(
    mols=mol_list,
    n_jobs=-1,
)
```

4. Chemical Groups Detection

Identify specific chemical groups and functional groups using `medchem.groups`.

**Available Groups:**
  • Hinge binders
  • Phosphate binders
  • Michael acceptors
  • Reactive groups
  • Custom SMARTS patterns

**Usage:**

```python
import medchem as mc

# Create group detector
group = mc.groups.ChemicalGroup(groups=["hinge_binders"])

# Check for matches
has_matches = group.has_match(mol_list)

# Get detailed match information
matches = group.get_matches(mol)
```

5. Named Catalogs

Access curated collections of chemical structures through `medchem.catalogs`.

**Available Catalogs:**
  • Functional groups
  • Protecting groups
  • Common reagents
  • Standard fragments

**Usage:**

```python
import medchem as mc

# Access named catalogs
catalogs = mc.catalogs.NamedCatalogs

# Use catalog for matching
catalog = catalogs.get("functional_groups")
matches = catalog.get_matches(mol)
```

6. Molecular Complexity

Calculate complexity metrics that approximate synthetic accessibility using `medchem.complexity`.

**Common Metrics:**
  • Bertz complexity
  • Whitlock complexity
  • Barone complexity

**Usage:**

```python
import medchem as mc

# Calculate complexity
complexity_score = mc.complexity.calculate_complexity(mol)

# Filter by complexity threshold
complex_filter = mc.complexity.ComplexityFilter(max_complexity=500)
results = complex_filter(mols=mol_list)
```

7. Constraints Filtering

Apply custom property-based constraints using `medchem.constraints`.

**Example Constraints:**
  • Molecular weight ranges
  • LogP bounds
  • TPSA limits
  • Rotatable bond counts

**Usage:**

```python
import medchem as mc

# Define constraints
constraints = mc.constraints.Constraints(
    mw_range=(200, 500),
    logp_range=(-2, 5),
    tpsa_max=140,
    rotatable_bonds_max=10,
)

# Apply constraints
results = constraints(mols=mol_list, n_jobs=-1)
```
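
Property constraints of this kind amount to range checks on descriptors. The following sketch shows that logic in plain Python, reusing the bounds from the example above; the `BOUNDS` table, the `violations` helper, and the descriptor values are illustrative assumptions and are not part of medchem.

```python
from typing import Optional

# Hypothetical (min, max) bounds per property; None means unbounded on that side
BOUNDS: dict[str, tuple[Optional[float], Optional[float]]] = {
    "mw": (200, 500),
    "logp": (-2, 5),
    "tpsa": (None, 140),
    "rotatable_bonds": (None, 10),
}


def violations(props: dict[str, float]) -> list[str]:
    """Return the names of properties that fall outside their bounds."""
    failed = []
    for name, (low, high) in BOUNDS.items():
        value = props[name]
        if (low is not None and value < low) or (high is not None and value > high):
            failed.append(name)
    return failed


# Illustrative descriptor values, not computed from a real structure
candidate = {"mw": 342.4, "logp": 2.8, "tpsa": 75.0, "rotatable_bonds": 5}
too_polar = {"mw": 480.0, "logp": 0.5, "tpsa": 165.0, "rotatable_bonds": 12}

print(violations(candidate))  # []
print(violations(too_polar))  # ['tpsa', 'rotatable_bonds']
```

Returning the list of violated properties, rather than a single boolean, keeps the reason for each rejection available for later review.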

8. Medchem Query Language

Use a specialized query language for complex filtering criteria.

**Query Examples:**

```
# Molecules passing Ro5 AND not having common alerts
"rule_of_five AND NOT common_alerts"

# CNS-like molecules with low complexity
"rule_of_cns AND complexity < 400"

# Leadlike molecules without Lilly demerits
"rule_of_leadlike AND lilly_demerits == 0"
```

**Usage:**

```python
import medchem as mc

# Parse and apply query
query = mc.query.parse("rule_of_five AND NOT common_alerts")
results = query.apply(mols=mol_list, n_jobs=-1)
```
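
Semantically, a query such as "rule_of_five AND NOT common_alerts" is a boolean expression evaluated once per molecule. The sketch below spells that out over precomputed flags; the molecule ids, the flag values, and the `ro5_and_no_alerts` helper are hypothetical, and no query parsing is involved.

```python
# Hypothetical per-molecule flags, as they might come out of earlier filter steps
flags = [
    {"id": "mol_1", "rule_of_five": True, "common_alerts": False},
    {"id": "mol_2", "rule_of_five": True, "common_alerts": True},
    {"id": "mol_3", "rule_of_five": False, "common_alerts": False},
]


def ro5_and_no_alerts(row: dict) -> bool:
    """Equivalent of 'rule_of_five AND NOT common_alerts' for one molecule."""
    return row["rule_of_five"] and not row["common_alerts"]


selected = [row["id"] for row in flags if ro5_and_no_alerts(row)]
print(selected)  # ['mol_1']
```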

Workflow Patterns

Pattern 1: Initial Triage of Compound Library

Filter a large compound collection to identify drug-like candidates.

```python
import datamol as dm
import medchem as mc
import pandas as pd

# Load compound library
df = pd.read_csv("compounds.csv")
mols = [dm.to_mol(smi) for smi in df["smiles"]]

# Apply primary filters
rule_filter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_veber"])
rule_results = rule_filter(mols=mols, n_jobs=-1, progress=True)

# Apply structural alerts
alert_filter = mc.structural.CommonAlertsFilters()
alert_results = alert_filter(mols=mols, n_jobs=-1, progress=True)

# Combine results
df["passes_rules"] = rule_results["pass"]
df["has_alerts"] = alert_results["has_alerts"]
df["drug_like"] = df["passes_rules"] & ~df["has_alerts"]

# Save filtered compounds
filtered_df = df[df["drug_like"]]
filtered_df.to_csv("filtered_compounds.csv", index=False)
```

Pattern 2: Lead Optimization Filtering

Apply stricter criteria during lead optimization.

```python
import medchem as mc

# Create comprehensive filter
filters = {
    "rules": mc.rules.RuleFilters(rule_list=["rule_of_leadlike_strict"]),
    "alerts": mc.structural.NIBRFilters(),
    "lilly": mc.structural.LillyDemeritsFilters(),
    "complexity": mc.complexity.ComplexityFilter(max_complexity=400),
}

# Apply all filters
results = {}
for name, filt in filters.items():
    results[name] = filt(mols=candidate_mols, n_jobs=-1)

# Identify compounds passing all filters
passes_all = all(r["pass"] for r in results.values())
```
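
When each filter yields one pass/fail value per molecule, "passing all filters" is an element-wise AND across filters rather than a single boolean. The sketch below shows that combination with plain lists; the filter names and flag values are made up, and real medchem results may need to be converted to this shape first.

```python
# Hypothetical pass/fail flags: one list per filter, one entry per molecule
results = {
    "rules": [True, True, False, True],
    "alerts": [True, False, True, True],
    "complexity": [True, True, True, False],
}

# zip(*...) regroups the flags per molecule; all() then requires every filter to pass
passes_all = [all(flags) for flags in zip(*results.values())]
print(passes_all)  # [True, False, False, False]

# Indices of molecules that survive every filter
survivors = [i for i, ok in enumerate(passes_all) if ok]
print(survivors)  # [0]
```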

Pattern 3: Identify Specific Chemical Groups

Find molecules containing specific functional groups or scaffolds.

```python
import medchem as mc

# Create group detector for multiple groups
group_detector = mc.groups.ChemicalGroup(
    groups=["hinge_binders", "phosphate_binders"]
)

# Screen library
matches = group_detector.get_all_matches(mol_list)

# Filter molecules with desired groups
mol_with_groups = [mol for mol, match in zip(mol_list, matches) if match]
```

Best Practices

  1. Context Matters: Don't blindly apply filters. Understand the biological target and chemical space.
  2. Combine Multiple Filters: Use rules, structural alerts, and domain knowledge together for better decisions.
  3. Use Parallelization: For large datasets (>1000 molecules), always use `n_jobs=-1` for parallel processing.
  4. Iterative Refinement: Start with broad filters (Ro5), then apply more specific criteria (CNS, leadlike) as needed.
  5. Document Filtering Decisions: Track which molecules were filtered out and why for reproducibility.
  6. Validate Results: Remember that marketed drugs often fail standard filters; use these as guidelines, not absolute rules.
  7. Consider Prodrugs: Molecules designed as prodrugs may intentionally violate standard medicinal chemistry rules.
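
For the practice of documenting filtering decisions, one simple approach is to keep a per-molecule record of which filters failed and a summary count per filter. This is a minimal sketch with hypothetical molecule ids and outcomes; here `True` means the molecule passed that filter.

```python
from collections import Counter

# Hypothetical filter outcomes per molecule: filter name -> passed?
outcomes = {
    "mol_1": {"rule_of_five": True, "common_alerts": True},
    "mol_2": {"rule_of_five": False, "common_alerts": True},
    "mol_3": {"rule_of_five": False, "common_alerts": False},
}

# Record, for each rejected molecule, which filters it failed
rejection_log = {
    mol_id: [name for name, passed in checks.items() if not passed]
    for mol_id, checks in outcomes.items()
    if not all(checks.values())
}
print(rejection_log)

# Summarize how many molecules each filter removed
failure_counts = Counter(name for failed in rejection_log.values() for name in failed)
print(failure_counts)
```

Writing such a log alongside the filtered output makes it possible to revisit individual rejections later, for example when a prodrug or a known drug class was filtered out unintentionally.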

Resources

references/api_guide.md

Comprehensive API reference covering all medchem modules with detailed function signatures, parameters, and return types.

references/rules_catalog.md

Complete catalog of available rules, filters, and alerts with descriptions, thresholds, and literature references.

scripts/filter_molecules.py

Production-ready script for batch filtering workflows. Supports multiple input formats (CSV, SDF, SMILES), configurable filter combinations, and detailed reporting.

**Usage:**

```bash
python scripts/filter_molecules.py input.csv --rules rule_of_five,rule_of_cns --alerts nibr --output filtered.csv
```
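
The script itself is not reproduced here. As a rough sketch of how a command line of that shape could be parsed, the example below uses `argparse` from the standard library; `build_parser` and `comma_list` are hypothetical helpers and should not be read as the script's actual code.

```python
import argparse


def comma_list(value: str) -> list[str]:
    """Split 'a,b,c' into ['a', 'b', 'c'], dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the usage line (assumed layout)."""
    parser = argparse.ArgumentParser(description="Batch-filter molecules (sketch)")
    parser.add_argument("input", help="input file, e.g. a CSV with a SMILES column")
    parser.add_argument("--rules", type=comma_list, default=[], help="comma-separated rule names")
    parser.add_argument("--alerts", type=comma_list, default=[], help="comma-separated alert sets")
    parser.add_argument("--output", required=True, help="path for the filtered output")
    return parser


# Parse the same arguments as in the usage line, passed explicitly for illustration
args = build_parser().parse_args(
    ["input.csv", "--rules", "rule_of_five,rule_of_cns", "--alerts", "nibr", "--output", "filtered.csv"]
)
print(args.rules, args.alerts, args.output)
```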

Documentation
