Medchem
Overview
Medchem is a Python library for molecular filtering and prioritization in drug discovery workflows. It provides hundreds of established and novel molecular filters, structural alerts, and medicinal chemistry rules for triaging and prioritizing compound libraries at scale. Rules and filters are context-specific: treat them as guidelines and combine them with domain expertise.
When to Use This Skill
This skill should be used when:
- Applying drug-likeness rules (Lipinski, Veber, etc.) to compound libraries
- Filtering molecules by structural alerts or PAINS patterns
- Prioritizing compounds for lead optimization
- Assessing compound quality and medicinal chemistry properties
- Detecting reactive or problematic functional groups
- Calculating molecular complexity metrics
Installation
```bash
uv pip install medchem
```
Core Capabilities
1. Medicinal Chemistry Rules
Apply established drug-likeness rules to molecules using the `medchem.rules` module.
**Available Rules:**
- Rule of Five (Lipinski)
- Rule of Oprea
- Rule of CNS
- Rule of leadlike (soft and strict)
- Rule of three
- Rule of Reos
- Rule of drug
- Rule of Veber
- Golden triangle
- PAINS filters
**Single Rule Application:**
```python
import medchem as mc

# Apply Rule of Five to a SMILES string
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
passes = mc.rules.basic_rules.rule_of_five(smiles)
# Returns: True

# Check specific rules
passes_oprea = mc.rules.basic_rules.rule_of_oprea(smiles)
passes_cns = mc.rules.basic_rules.rule_of_cns(smiles)
```
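For intuition about what a rule such as the Rule of Five checks, the sketch below applies Lipinski's four thresholds to precomputed descriptors. It is a plain-Python illustration, not medchem's implementation: the `Descriptors` class, the `passes_rule_of_five` helper, and the approximate property values are assumptions made for this example, and the strict variant (all four criteria must hold) is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # calculated octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count


def passes_rule_of_five(d: Descriptors) -> bool:
    """Strict variant: every one of Lipinski's four criteria must hold."""
    return (
        d.mol_weight <= 500
        and d.logp <= 5
        and d.h_donors <= 5
        and d.h_acceptors <= 10
    )


# Approximate, illustrative values for aspirin (assumed, not computed here).
aspirin = Descriptors(mol_weight=180.2, logp=1.3, h_donors=1, h_acceptors=4)
# A hypothetical large, lipophilic molecule that violates two criteria.
greasy = Descriptors(mol_weight=720.0, logp=7.1, h_donors=2, h_acceptors=9)

print(passes_rule_of_five(aspirin))
print(passes_rule_of_five(greasy))
```

In practice the descriptors would come from a cheminformatics toolkit rather than being typed in by hand; the point here is only the shape of the threshold check.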
**Multiple Rules with RuleFilters:**
```python
import datamol as dm
import medchem as mc

# Example input: any iterable of SMILES strings
smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", "c1ccccc1"]

# Load molecules
mols = [dm.to_mol(smiles) for smiles in smiles_list]

# Create filter with multiple rules
rfilter = mc.rules.RuleFilters(
    rule_list=[
        "rule_of_five",
        "rule_of_oprea",
        "rule_of_cns",
        "rule_of_leadlike_soft",
    ]
)

# Apply filters with parallelization
results = rfilter(
    mols=mols,
    n_jobs=-1,  # Use all CPU cores
    progress=True,
)
```
**Result Format:**
Results are returned as dictionaries with pass/fail status and detailed information for each rule.
2. Structural Alert Filters
Detect potentially problematic structural patterns using the `medchem.structural` module.
**Available Filters:**
- Common Alerts - General structural alerts derived from ChEMBL curation and literature
- NIBR Filters - Novartis Institutes for BioMedical Research filter set
- Lilly Demerits - Eli Lilly's demerit-based system (275 rules, molecules rejected at >100 demerits)
**Common Alerts:**
```python
import datamol as dm
import medchem as mc

# Create filter
alert_filter = mc.structural.CommonAlertsFilters()

# Check single molecule
mol = dm.to_mol("c1ccccc1")
has_alerts, details = alert_filter.check_mol(mol)

# Batch filtering with parallelization
results = alert_filter(
    mols=mol_list,
    n_jobs=-1,
    progress=True,
)
```
**NIBR Filters:**
```python
import medchem as mc

# Apply NIBR filters
nibr_filter = mc.structural.NIBRFilters()
results = nibr_filter(mols=mol_list, n_jobs=-1)
```
**Lilly Demerits:**
```python
import medchem as mc

# Calculate Lilly demerits
lilly = mc.structural.LillyDemeritsFilters()
results = lilly(mols=mol_list, n_jobs=-1)
# Each result includes demerit score and whether it passes (≤100 demerits)
```
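The demerit mechanism described above can be pictured as a running total compared against a cutoff. The following is a minimal sketch of that idea only: the alert names and per-alert demerit values are invented for illustration and are not taken from the Lilly rule set, and `score_molecule` is a hypothetical helper rather than a medchem function.

```python
DEMERIT_CUTOFF = 100  # molecules scoring above this are rejected

# Hypothetical alert -> demerit table (values invented for illustration).
EXAMPLE_DEMERITS = {
    "aldehyde": 60,
    "long_alkyl_chain": 30,
    "nitro_group": 50,
}


def score_molecule(matched_alerts):
    """Sum demerits for the matched alerts and apply the cutoff.

    Returns a (total_demerits, passes) tuple; a molecule passes when its
    total is less than or equal to DEMERIT_CUTOFF.
    """
    total = sum(EXAMPLE_DEMERITS[name] for name in matched_alerts)
    return total, total <= DEMERIT_CUTOFF


print(score_molecule(["aldehyde", "long_alkyl_chain"]))  # (90, True)
print(score_molecule(["aldehyde", "nitro_group"]))       # (110, False)
```

A cumulative score lets several minor liabilities add up to a rejection, which a set of independent pass/fail alerts cannot express.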
3. Functional API for High-Level Operations
The `medchem.functional` module provides convenient functions for common workflows.
**Quick Filtering:**
```python
import medchem as mc

# Apply NIBR filters to a list
filter_ok = mc.functional.nibr_filter(
    mols=mol_list,
    n_jobs=-1,
)

# Apply common alerts
alert_results = mc.functional.common_alerts_filter(
    mols=mol_list,
    n_jobs=-1,
)
```
4. Chemical Groups Detection
Identify specific chemical groups and functional groups using `medchem.groups`.
**Available Groups:**
- Hinge binders
- Phosphate binders
- Michael acceptors
- Reactive groups
- Custom SMARTS patterns
**Usage:**
```python
import medchem as mc

# Create group detector
group = mc.groups.ChemicalGroup(groups=["hinge_binders"])

# Check for matches
has_matches = group.has_match(mol_list)

# Get detailed match information
matches = group.get_matches(mol)
```
5. Named Catalogs
Access curated collections of chemical structures through `medchem.catalogs`.
**Available Catalogs:**
- Functional groups
- Protecting groups
- Common reagents
- Standard fragments
**Usage:**
```python
import medchem as mc

# Access named catalogs
catalogs = mc.catalogs.NamedCatalogs

# Use catalog for matching
catalog = catalogs.get("functional_groups")
matches = catalog.get_matches(mol)
```
6. Molecular Complexity
Calculate complexity metrics that approximate synthetic accessibility using `medchem.complexity`.
**Common Metrics:**
- Bertz complexity
- Whitlock complexity
- Barone complexity
**Usage:**
```python
import medchem as mc

# Calculate complexity
complexity_score = mc.complexity.calculate_complexity(mol)

# Filter by complexity threshold
complex_filter = mc.complexity.ComplexityFilter(max_complexity=500)
results = complex_filter(mols=mol_list)
```
7. Constraints Filtering
Apply custom property-based constraints using `medchem.constraints`.
**Example Constraints:**
- Molecular weight ranges
- LogP bounds
- TPSA limits
- Rotatable bond counts
**Usage:**
```python
import medchem as mc

# Define constraints
constraints = mc.constraints.Constraints(
    mw_range=(200, 500),
    logp_range=(-2, 5),
    tpsa_max=140,
    rotatable_bonds_max=10,
)

# Apply constraints
results = constraints(mols=mol_list, n_jobs=-1)
```
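To make the idea of property-based constraints concrete, here is a small library-independent sketch that checks precomputed properties against inclusive bounds and reports which constraints failed. The `CONSTRAINTS` table, the `failed_constraints` helper, and the property values are assumptions for illustration; treating the bounds as inclusive is also an assumption, not a statement about medchem's behaviour.

```python
from typing import Dict, List, Optional, Tuple

Bounds = Tuple[Optional[float], Optional[float]]  # (min, max); None = unbounded

# The same limits as in the usage example, expressed as inclusive bounds.
CONSTRAINTS: Dict[str, Bounds] = {
    "mw": (200, 500),
    "logp": (-2, 5),
    "tpsa": (None, 140),
    "rotatable_bonds": (None, 10),
}


def failed_constraints(
    props: Dict[str, float],
    constraints: Dict[str, Bounds] = CONSTRAINTS,
) -> List[str]:
    """Return the names of the constraints that the properties violate."""
    failures = []
    for name, (low, high) in constraints.items():
        value = props[name]
        if low is not None and value < low:
            failures.append(name)
        elif high is not None and value > high:
            failures.append(name)
    return failures


# Hypothetical precomputed properties for two molecules.
ok_props = {"mw": 350.4, "logp": 2.1, "tpsa": 85.0, "rotatable_bonds": 5}
bad_props = {"mw": 150.0, "logp": 6.3, "tpsa": 85.0, "rotatable_bonds": 12}

print(failed_constraints(ok_props))
print(failed_constraints(bad_props))
```

Returning the list of failed constraint names, rather than a single boolean, keeps the reason for each rejection available for later review.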
8. Medchem Query Language
Use a specialized query language for complex filtering criteria.
**Query Examples:**
```
# Molecules passing Ro5 AND not having common alerts
"rule_of_five AND NOT common_alerts"

# CNS-like molecules with low complexity
"rule_of_cns AND complexity < 400"

# Leadlike molecules without Lilly demerits
"rule_of_leadlike AND lilly_demerits == 0"
```
**Usage:**
```python
import medchem as mc

# Parse and apply query
query = mc.query.parse("rule_of_five AND NOT common_alerts")
results = query.apply(mols=mol_list, n_jobs=-1)
```
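To illustrate how a boolean query such as `"rule_of_five AND NOT common_alerts"` could be evaluated once per-molecule flags are known, the sketch below implements a tiny recursive-descent evaluator. It is not medchem's parser: `tokenize`, `evaluate`, and the `flags` dictionary are hypothetical, and only `AND`, `OR`, `NOT`, and parentheses are handled (numeric comparisons such as `complexity < 400` are deliberately left out).

```python
import re

# A token is a parenthesis or a word; AND/OR/NOT are words with special meaning.
TOKEN_RE = re.compile(r"\s*(\(|\)|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(query):
    """Split a query string into parentheses, operators, and flag names."""
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"Unexpected text at position {pos}: {query[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(query, flags):
    """Evaluate a boolean query against a dict of precomputed flags.

    Precedence from lowest to highest: OR, AND, NOT, atom.
    """
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            advance()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "NOT":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = advance()
        if token == "(":
            value = parse_or()
            if advance() != ")":
                raise ValueError("Missing closing parenthesis")
            return value
        if token is None or token in {")", "AND", "OR"}:
            raise ValueError(f"Unexpected token: {token!r}")
        return bool(flags[token])

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"Unexpected trailing token: {peek()!r}")
    return result


# Hypothetical per-molecule flags, e.g. produced by earlier filtering steps.
flags = {"rule_of_five": True, "common_alerts": False}
print(evaluate("rule_of_five AND NOT common_alerts", flags))
```

The right-hand operand is parsed before the values are combined so that Python's short-circuiting never skips tokens that still need to be consumed.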
Workflow Patterns
Pattern 1: Initial Triage of Compound Library
Filter a large compound collection to identify drug-like candidates.
```python
import datamol as dm
import medchem as mc
import pandas as pd

# Load compound library
df = pd.read_csv("compounds.csv")
mols = [dm.to_mol(smi) for smi in df["smiles"]]

# Apply primary filters
rule_filter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_veber"])
rule_results = rule_filter(mols=mols, n_jobs=-1, progress=True)

# Apply structural alerts
alert_filter = mc.structural.CommonAlertsFilters()
alert_results = alert_filter(mols=mols, n_jobs=-1, progress=True)

# Combine results
df["passes_rules"] = rule_results["pass"]
df["has_alerts"] = alert_results["has_alerts"]
df["drug_like"] = df["passes_rules"] & ~df["has_alerts"]

# Save filtered compounds
filtered_df = df[df["drug_like"]]
filtered_df.to_csv("filtered_compounds.csv", index=False)
```
Pattern 2: Lead Optimization Filtering
Apply stricter criteria during lead optimization.
```python
import medchem as mc

# Create comprehensive filter
filters = {
    "rules": mc.rules.RuleFilters(rule_list=["rule_of_leadlike_strict"]),
    "alerts": mc.structural.NIBRFilters(),
    "lilly": mc.structural.LillyDemeritsFilters(),
    "complexity": mc.complexity.ComplexityFilter(max_complexity=400),
}

# Apply all filters
results = {}
for name, filt in filters.items():
    results[name] = filt(mols=candidate_mols, n_jobs=-1)

# Identify compounds passing all filters, one flag per molecule
passes_all = [
    all(flags) for flags in zip(*(r["pass"] for r in results.values()))
]
```
Pattern 3: Identify Specific Chemical Groups
Find molecules containing specific functional groups or scaffolds.
```python
import medchem as mc

# Create group detector for multiple groups
group_detector = mc.groups.ChemicalGroup(
    groups=["hinge_binders", "phosphate_binders"]
)

# Screen library
matches = group_detector.get_all_matches(mol_list)

# Filter molecules with desired groups
mol_with_groups = [mol for mol, match in zip(mol_list, matches) if match]
```
Best Practices
- Context Matters: Don't blindly apply filters. Understand the biological target and chemical space.
- Combine Multiple Filters: Use rules, structural alerts, and domain knowledge together for better decisions.
- Use Parallelization: For large datasets (>1000 molecules), always use `n_jobs=-1` for parallel processing.
- Iterative Refinement: Start with broad filters (Ro5), then apply more specific criteria (CNS, leadlike) as needed.
- Document Filtering Decisions: Track which molecules were filtered out and why for reproducibility.
- Validate Results: Remember that marketed drugs often fail standard filters; use these as guidelines, not absolute rules.
- Consider Prodrugs: Molecules designed as prodrugs may intentionally violate standard medicinal chemistry rules.
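One way to follow the "Document Filtering Decisions" practice is to write a small log with one row per molecule and the reasons it was rejected. The sketch below uses only the standard library; `write_filter_log`, the column names, and the example decisions are hypothetical and invented for illustration.

```python
import csv
import io


def write_filter_log(decisions, handle):
    """Write one row per molecule recording its outcome and the reasons."""
    writer = csv.writer(handle)
    writer.writerow(["smiles", "kept", "reasons"])
    for smiles, reasons in decisions:
        # A molecule is kept only when no filter reported a reason to drop it.
        writer.writerow([smiles, not reasons, ";".join(reasons)])


# Hypothetical decisions: (SMILES, list of reasons the molecule was rejected).
# The outcomes are invented for illustration, not computed by any filter.
decisions = [
    ("CC(=O)OC1=CC=CC=C1C(=O)O", []),
    ("O=CCCCCCCCCCCC", ["rule_of_five", "common_alerts"]),
]

buffer = io.StringIO()
write_filter_log(decisions, buffer)
print(buffer.getvalue())
```

Passing an open file handle instead of the in-memory buffer writes the same log to disk next to the filtered output.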
Resources
references/api_guide.md
Comprehensive API reference covering all medchem modules with detailed function signatures, parameters, and return types.
references/rules_catalog.md
Complete catalog of available rules, filters, and alerts with descriptions, thresholds, and literature references.
scripts/filter_molecules.py
Production-ready script for batch filtering workflows. Supports multiple input formats (CSV, SDF, SMILES), configurable filter combinations, and detailed reporting.
**Usage:**
```bash
python scripts/filter_molecules.py input.csv --rules rule_of_five,rule_of_cns --alerts nibr --output filtered.csv
```
Documentation
Official documentation: https://medchem-docs.datamol.io/
GitHub repository: https://github.com/datamol-io/medchem