tooluniverse-small-molecule-discovery


Small Molecule Discovery Skill

Systematic small molecule identification, characterization, and sourcing using PubChem, ChEMBL, BindingDB, ADMET-AI, SwissADME, eMolecules, and Enamine. Covers the full pipeline from compound name to structure, activity, ADMET properties, and commercial procurement.

Domain Reasoning

Drug-likeness is not a binary property. Lipinski's Rule of 5 was derived from orally administered, passively absorbed drugs and has many well-known exceptions: natural products, macrocycles, PROTACs, and many approved drugs violate one or more rules. The relevant question is not "does this pass Ro5?" but "does this compound's physicochemical profile match the requirements of the target, the intended route of administration, and the therapeutic context?" Focus on the specific requirements, not rigid rules.

LOOK UP, DON'T GUESS

  • Compound identity (CID, ChEMBL ID, SMILES): call `PubChem_get_CID_by_compound_name` and `ChEMBL_search_molecules`; do not assume IDs from memory.
  • ADMET properties: run `SwissADME_calculate_adme` or `ADMETAI_predict_*` on the actual SMILES; do not estimate logP, TPSA, or bioavailability.
  • Binding affinities against a target: query `ChEMBL_search_activities` or `BindingDB_get_ligands_by_uniprot`; never cite IC50 values from memory.
  • Commercial availability: check `eMolecules_search` or `Enamine_search_catalog`; do not assume availability.

KEY PRINCIPLES:
  1. Resolve identity first - Always get CID and ChEMBL ID before research
  2. SMILES required for property prediction - Extract canonical SMILES from PubChem early
  3. English names in tools - Use IUPAC or common English names; avoid abbreviations in tool calls
  4. BindingDB is often unavailable - Fall back to ChEMBL activities when BindingDB times out
  5. eMolecules/Enamine return URLs - These tools generate search URLs, not direct data; note this to user

COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
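
As a minimal sketch of this principle, the snippet below summarizes pChEMBL values per target using only the standard library. The record layout (`target_name`, `pchembl_value`), the target labels, and the numbers are illustrative assumptions, not real ToolUniverse output; with pandas installed, a `groupby` would do the same job.

```python
from collections import defaultdict
from statistics import median

# Made-up activity records for illustration only. In practice these would
# come from a retrieval call; the field names here are assumptions.
records = [
    {"target_name": "TARGET_A", "pchembl_value": 8.2},
    {"target_name": "TARGET_A", "pchembl_value": 7.6},
    {"target_name": "TARGET_B", "pchembl_value": 6.9},
    {"target_name": "TARGET_B", "pchembl_value": None},  # missing measurement
    {"target_name": "TARGET_C", "pchembl_value": "7.1"},  # values may arrive as strings
]


def summarize_by_target(rows):
    """Return {target: (count, median pChEMBL)}, skipping missing values."""
    grouped = defaultdict(list)
    for row in rows:
        value = row.get("pchembl_value")
        if value is not None:
            grouped[row["target_name"]].append(float(value))
    return {target: (len(values), median(values)) for target, values in grouped.items()}


summary = summarize_by_target(records)
# Report targets from highest to lowest median pChEMBL.
for target, (count, med) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(f"{target}: n={count}, median pChEMBL={med:.2f}")
```

The point is to report the computed numbers themselves rather than a description of how one might compute them.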

When to Use

  • "Find information about compound X"
  • "What is the drug-likeness of this SMILES?"
  • "Show binding affinities for EGFR inhibitors"
  • "Search for compounds similar to imatinib"
  • "Is this compound commercially available?"
  • "What are the ADMET properties of this molecule?"
  • "Find ChEMBL activities for target Y"
  • "Predict targets for this small molecule"

Key Tools

| Tool | Purpose | Key Params |
|------|---------|------------|
| `PubChem_get_CID_by_compound_name` | Name to CID lookup | `compound_name` |
| `PubChem_get_CID_by_SMILES` | SMILES to CID lookup | `smiles` |
| `PubChem_get_compound_properties_by_CID` | MW, formula, SMILES, InChIKey | `cid`, `properties` |
| `PubChem_search_compounds_by_similarity` | Find structurally similar compounds | `smiles`, `threshold` (0-100) |
| `PubChem_search_compounds_by_substructure` | Substructure search | `smiles` |
| `PubChem_get_compound_synonyms_by_CID` | All names/synonyms | `cid` |
| `ChEMBL_search_molecules` | Search ChEMBL by name or ID | `query` |
| `ChEMBL_get_molecule` | Full ChEMBL molecule record | `chembl_id` |
| `ChEMBL_search_similar_molecules` | Similarity search in ChEMBL | `query` (SMILES or ChEMBL ID) |
| `ChEMBL_search_activities` | Binding affinities and assay data | `molecule_chembl_id`, `target_chembl_id`, `pchembl_value__gte` |
| `ChEMBL_get_drug_mechanisms` | MOA for approved drugs | `drug_chembl_id` or `drug_name` |
| `ChEMBL_search_targets` | Find targets by name | `query`, `organism` |
| `ChEMBL_get_target_activities` | All ligands for a target | `target_chembl_id` |
| `SwissADME_calculate_adme` | Physicochemical + ADMET properties | `operation="calculate_adme"`, `smiles` |
| `SwissADME_check_druglikeness` | Lipinski, Veber, Egan rules | `operation="check_druglikeness"`, `smiles` |
| `ADMETAI_predict_physicochemical_properties` | MW, logP, TPSA, HBD/HBA | `smiles` (list) |
| `ADMETAI_predict_bioavailability` | Oral bioavailability prediction | `smiles` (list) |
| `ADMETAI_predict_BBB_penetrance` | Blood-brain barrier permeability | `smiles` (list) |
| `ADMETAI_predict_toxicity` | hERG, DILI, mutagenicity | `smiles` (list) |
| `ADMETAI_predict_CYP_interactions` | CYP450 inhibition/substrate | `smiles` (list) |
| `SwissTargetPrediction_predict` | Predict protein targets for compound | `operation="predict"`, `smiles` |
| `eMolecules_search` | Find commercially available compounds | `query` (name or keyword) |
| `eMolecules_search_smiles` | Structure-based commercial search | `smiles` |
| `eMolecules_get_vendors` | Find vendors for a specific compound | `compound_id` |
| `Enamine_search_catalog` | Search Enamine screening library | `query` |
| `Enamine_search_smiles` | Search Enamine by structure | `smiles` |
| `Enamine_get_libraries` | List Enamine compound libraries | (none required) |

Workflow

Phase 1: Compound Identification


Step 1: Name -> CID (PubChem canonical identity)

PubChem_get_CID_by_compound_name(compound_name="imatinib")

-> CID: 5291

Step 2: Get SMILES and properties (needed for all downstream tools)

PubChem_get_compound_properties_by_CID(
    cid="5291",
    properties="MolecularFormula,MolecularWeight,CanonicalSMILES,InChIKey,IUPACName"
)

-> canonical SMILES, InChIKey (global identifier)

Step 3: Get ChEMBL ID (for activity data)

ChEMBL_search_molecules(query="imatinib")

-> ChEMBL ID (e.g., "CHEMBL941")

Step 4: Get all synonyms (brand names, INN, etc.)

PubChem_get_compound_synonyms_by_CID(cid="5291")

**ID resolution priority**:
1. Start with PubChem CID (most universal)
2. Get ChEMBL ID (for bioactivity data)
3. Use canonical SMILES for structure-based searches and ADMET

Phase 2: Structure-Based Search

**Similarity search** (find analogs):
PubChem_search_compounds_by_similarity(
    smiles="CANONICAL_SMILES",
    threshold=85   # Tanimoto threshold 0-100; 85 = highly similar
)

Returns: list of CIDs of similar compounds

ChEMBL_search_similar_molecules(query="CHEMBL941")  # Or SMILES

Returns: ChEMBL entries sorted by similarity

**Substructure search** (find compounds containing a scaffold):
PubChem_search_compounds_by_substructure(smiles="SCAFFOLD_SMILES")

Returns: CIDs of compounds containing the scaffold
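
The `threshold` parameter is easier to reason about with the underlying formula in view. The sketch below computes a Tanimoto coefficient on toy fingerprints, modelled as sets of on-bit positions, and scales it to the 0-100 range used by the similarity search. The bit positions are made up and the function name is hypothetical; real fingerprints are computed by the service, so this only illustrates the arithmetic.

```python
def tanimoto_percent(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient |A & B| / |A | B|, scaled to 0-100.

    Fingerprints are modelled as sets of "on" bit positions.
    """
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention chosen for this sketch when both are empty
    return 100.0 * len(fp_a & fp_b) / len(union)


# Toy fingerprints (made-up bit positions, not real molecules).
query = {1, 4, 7, 9, 12, 15, 20}
analog = {1, 4, 7, 9, 12, 15, 21}

score = tanimoto_percent(query, analog)  # 6 shared bits / 8 distinct bits
print(score, "passes threshold=85:", score >= 85)
```

Two fingerprints that differ in a single bit out of seven already fall to 75, which is why a threshold of 85 or more returns only close analogs.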

Phase 3: Bioactivity and Binding Affinity

**Get all activities for a compound** (across all targets):
ChEMBL_search_activities(
    molecule_chembl_id="CHEMBL941",
    pchembl_value__gte=6,   # pIC50/Ki >= 6 = IC50/Ki <= 1 µM
    limit=50
)

Returns: assay_type, target_name, pchembl_value, units

**Get all ligands for a target**:

First find target ChEMBL ID

ChEMBL_search_targets(query="EGFR", organism="Homo sapiens")

-> target_chembl_id, e.g., "CHEMBL203"

ChEMBL_get_target_activities(target_chembl_id="CHEMBL203")

Returns: all compounds with binding data against this target

**BindingDB** (when available; often times out):
BindingDB_get_ligands_by_uniprot(uniprot_id="P00533")  # EGFR

Returns: Ki, IC50, Kd data with literature references

Note: BindingDB REST API is frequently unavailable; fall back to ChEMBL


**pChEMBL Value interpretation**:
| pChEMBL | IC50 / Ki | Affinity |
|---------|-----------|---------|
| >= 9 | <= 1 nM | Very potent |
| >= 7 | <= 100 nM | Potent |
| >= 6 | <= 1 µM | Moderate |
| >= 5 | <= 10 µM | Weak |
| < 5 | > 10 µM | Inactive |
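
The table follows from the definition of pChEMBL as the negative base-10 logarithm of the molar IC50, Ki, or comparable value. The sketch below converts a concentration in nM to that scale and applies the labels from the table; the function names are hypothetical helpers, not part of any tool.

```python
import math


def pchembl_from_nm(value_nm: float) -> float:
    """Convert an IC50/Ki/Kd given in nM to -log10(molar concentration)."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


def affinity_label(pchembl: float) -> str:
    """Map a pChEMBL value to the affinity categories in the table above."""
    if pchembl >= 9:
        return "Very potent"
    if pchembl >= 7:
        return "Potent"
    if pchembl >= 6:
        return "Moderate"
    if pchembl >= 5:
        return "Weak"
    return "Inactive"


for nm in (1, 100, 1000, 10000, 50000):
    p = pchembl_from_nm(nm)
    print(f"{nm} nM -> pChEMBL {p:.2f} ({affinity_label(p)})")
```

Working on the log scale also makes averaging replicate measurements more sensible than averaging raw concentrations.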

Phase 4: Drug-likeness and ADMET

**SwissADME** (comprehensive; requires SMILES as a string, not a list):
SwissADME_calculate_adme(
    operation="calculate_adme",
    smiles="CANONICAL_SMILES"
)

Returns: physicochemical, lipophilicity, water solubility, pharmacokinetics, drug-likeness scores (Lipinski, Veber, Egan, Muegge), PAINS alerts

SwissADME_check_druglikeness(
    operation="check_druglikeness",
    smiles="CANONICAL_SMILES"
)

Returns: Lipinski/Veber/Egan pass/fail + lead-likeness

**ADMET-AI** (ML-based; requires SMILES as a list; install tooluniverse[ml]):
ADMETAI_predict_physicochemical_properties(smiles=["CANONICAL_SMILES"])
ADMETAI_predict_bioavailability(smiles=["CANONICAL_SMILES"])
ADMETAI_predict_BBB_penetrance(smiles=["CANONICAL_SMILES"])
ADMETAI_predict_toxicity(smiles=["CANONICAL_SMILES"])
ADMETAI_predict_CYP_interactions(smiles=["CANONICAL_SMILES"])

**Note**: ADMET-AI requires `pip install tooluniverse[ml]`. If unavailable, use SwissADME as fallback.

**Key drug-likeness rules**:
- **Lipinski Ro5**: MW <= 500, logP <= 5, HBD <= 5, HBA <= 10 (oral drugs)
- **Veber**: TPSA <= 140 Ų, rotatable bonds <= 10 (oral bioavailability)
- **Lead-like**: MW <= 350, logP <= 3, HBD <= 3, HBA <= 6 (fragment/lead)
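
To make the thresholds concrete, here is a small sketch that checks a property dictionary against the Lipinski and Veber cutoffs listed above and reports which ones are violated. The dictionary keys (`mw`, `logp`, `hbd`, `hba`, `tpsa`, `rotb`) and the example values are assumptions for illustration; real values should come from the property tools, not from estimates. In line with the domain reasoning earlier, the output is a list of flags to interpret in context, not a pass/fail verdict.

```python
def rule_violations(props: dict) -> dict:
    """Return the violated Lipinski and Veber criteria for one compound."""
    lipinski_checks = [
        ("MW <= 500", props["mw"] <= 500),
        ("logP <= 5", props["logp"] <= 5),
        ("HBD <= 5", props["hbd"] <= 5),
        ("HBA <= 10", props["hba"] <= 10),
    ]
    veber_checks = [
        ("TPSA <= 140", props["tpsa"] <= 140),
        ("rotatable bonds <= 10", props["rotb"] <= 10),
    ]
    return {
        "lipinski": [name for name, ok in lipinski_checks if not ok],
        "veber": [name for name, ok in veber_checks if not ok],
    }


# Hypothetical property values, not taken from any real compound.
small = {"mw": 320.0, "logp": 2.1, "hbd": 2, "hba": 5, "tpsa": 75.0, "rotb": 4}
large = {"mw": 900.0, "logp": 6.0, "hbd": 6, "hba": 12, "tpsa": 200.0, "rotb": 15}

print(rule_violations(small))
print(rule_violations(large))
```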

Phase 5: Target Prediction

When you have a novel compound and want to predict targets:
SwissTargetPrediction_predict(
    operation="predict",
    smiles="CANONICAL_SMILES"
)

Returns: predicted protein targets with probability scores

Note: SwissTargetPrediction uses structural similarity to known drug-target pairs and may time out for complex molecules.

Phase 6: Commercial Availability

**eMolecules** (aggregates 200+ suppliers; returns a search URL, not direct data):
eMolecules_search(query="compound_name")

-> Returns search_url to visit on eMolecules.com

eMolecules_search_smiles(smiles="CANONICAL_SMILES")

-> Returns URL for exact/similar structure search

**Enamine** (37B+ make-on-demand compounds; returns URL when API unavailable):
Enamine_search_catalog(query="compound_name")

-> If API available: returns catalog entries with catalog_id, price

-> If API unavailable: returns search_url for manual search

Enamine_search_smiles(smiles="CANONICAL_SMILES")

-> Exact or similarity structure search

Enamine_get_libraries()

-> Lists available Enamine screening collections


**Note**: eMolecules and Enamine APIs frequently return search URLs rather than live data. Present these to the user as "search here" links.
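
A rough sketch of how a result might be turned into a user-facing message follows. The response shape is an assumption: the `search_url` field is mentioned above, but the `results` key, the function name, and the example URL are hypothetical placeholders, so adapt them to whatever the tools actually return.

```python
def describe_availability(result: dict, source: str) -> str:
    """Summarize a vendor-search result, preferring live entries over a URL."""
    entries = result.get("results") or []  # "results" is an assumed key
    if entries:
        return f"{source}: {len(entries)} catalog entries returned"
    url = result.get("search_url")
    if url:
        return f"{source}: no live data; search here: {url}"
    return f"{source}: no entries and no search URL returned"


# Placeholder responses for illustration only.
url_only = {"search_url": "https://example.org/search?q=compound_name"}
with_entries = {"results": [{"catalog_id": "ID-1"}, {"catalog_id": "ID-2"}]}

print(describe_availability(url_only, "eMolecules"))
print(describe_availability(with_entries, "Enamine"))
```

Making the "no live data" case explicit keeps a search link from being mistaken for confirmed availability.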

---

Tool Parameter Reference

| Tool | Required Params | Notes |
|------|-----------------|-------|
| `PubChem_get_CID_by_compound_name` | `compound_name` | Returns list of CIDs; take first or most relevant |
| `PubChem_get_CID_by_SMILES` | `smiles` | Use canonical SMILES |
| `PubChem_get_compound_properties_by_CID` | `cid`, `properties` | `cid` as string; `properties` comma-separated |
| `PubChem_search_compounds_by_similarity` | `smiles` | `threshold` (int 0-100, default 90) |
| `PubChem_search_compounds_by_substructure` | `smiles` | Returns CIDs matching scaffold |
| `ChEMBL_search_molecules` | `query` | Name, ChEMBL ID, or InChIKey |
| `ChEMBL_get_molecule` | `chembl_id` | Full format: "CHEMBL941" not "941" |
| `ChEMBL_search_similar_molecules` | `query` | SMILES or ChEMBL ID |
| `ChEMBL_search_activities` | `molecule_chembl_id` OR `target_chembl_id` | Use `pchembl_value__gte=6` to filter potent |
| `ChEMBL_get_drug_mechanisms` | `drug_chembl_id` or `drug_name` | For approved drugs only |
| `ChEMBL_search_targets` | `query` | Add `organism="Homo sapiens"` to filter human |
| `ChEMBL_get_target_activities` | `target_chembl_id` | Returns all ligands for target |
| `SwissADME_calculate_adme` | `operation="calculate_adme"`, `smiles` | SMILES as string (not list) |
| `SwissADME_check_druglikeness` | `operation="check_druglikeness"`, `smiles` | SMILES as string |
| `ADMETAI_predict_*` | `smiles` | Must be a list: `["SMILES"]` not `"SMILES"` |
| `SwissTargetPrediction_predict` | `operation="predict"`, `smiles` | May time out |
| `eMolecules_search` | `query` | Returns search URL (no live data) |
| `eMolecules_search_smiles` | `smiles` | Canonical SMILES |
| `eMolecules_get_vendors` | `compound_id` | eMolecules internal ID |
| `Enamine_search_catalog` | `query` | Returns URL when API unavailable |
| `Enamine_search_smiles` | `smiles` | `search_type`: "exact", "similarity", "substructure" |
| `Enamine_get_compound` | `enamine_id` | Enamine-specific catalog ID |
| `BindingDB_get_ligands_by_uniprot` | `uniprot_id` | Frequently unavailable; use ChEMBL as fallback |
| `BindingDB_get_targets_by_compound` | `smiles` | SMILES-based target lookup |
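
Two of the notes above concern argument shape: ChEMBL IDs need the full `CHEMBL` prefix, and SMILES must be a string for SwissADME but a list for ADMET-AI. The helpers below are one possible way to normalize inputs before a call; their names are hypothetical and they are not part of ToolUniverse.

```python
import re


def normalize_chembl_id(value) -> str:
    """Return the full 'CHEMBL<digits>' form, accepting '941' or 'chembl941'."""
    text = str(value).strip().upper()
    if re.fullmatch(r"\d+", text):
        return f"CHEMBL{text}"
    if re.fullmatch(r"CHEMBL\d+", text):
        return text
    raise ValueError(f"not a ChEMBL ID: {value!r}")


def smiles_as_list(smiles) -> list:
    """Shape expected by the ADMET-AI tools: a list of SMILES strings."""
    if isinstance(smiles, str):
        return [smiles]
    return list(smiles)


def smiles_as_string(smiles) -> str:
    """Shape expected by the SwissADME tools: a single SMILES string."""
    if isinstance(smiles, str):
        return smiles
    items = list(smiles)
    if len(items) != 1:
        raise ValueError("expected exactly one SMILES")
    return items[0]


print(normalize_chembl_id("941"))      # CHEMBL941
print(smiles_as_list("CCO"))           # ['CCO']
print(smiles_as_string(["CCO"]))       # CCO
```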

Common Patterns

Pattern 1: Full Compound Profile

Input: Compound name (e.g., "imatinib")
Flow:
  1. PubChem_get_CID_by_compound_name -> CID
  2. ChEMBL_search_molecules -> ChEMBL ID
  3. PubChem_get_compound_properties_by_CID -> canonical SMILES + physicochemical props
  4. SwissADME_calculate_adme / ADMETAI_predict_* -> ADMET profile
  5. ChEMBL_search_activities(molecule_chembl_id) -> binding data
  6. ChEMBL_get_drug_mechanisms -> MOA (if approved drug)
Output: Complete compound profile with identity, ADMET, and activity data
Pattern 2: Analog Discovery

Input: Reference compound SMILES
Flow:
  1. PubChem_search_compounds_by_similarity(smiles, threshold=85) -> similar CIDs
  2. ChEMBL_search_similar_molecules(query=smiles) -> ChEMBL analogs
  3. For each hit: PubChem_get_compound_properties_by_CID -> properties
  4. SwissADME_check_druglikeness -> filter by drug-likeness
Output: Ranked list of analogs with properties and drug-likeness scores (add ChEMBL_search_activities per hit if activity data is needed)
Pattern 3: Target-Based Compound Search

Input: Target name (e.g., "EGFR")
Flow:
  1. ChEMBL_search_targets(query="EGFR", organism="Homo sapiens") -> target_chembl_id
  2. ChEMBL_get_target_activities(target_chembl_id) -> all ligands with Ki/IC50
  3. Filter by pchembl_value >= 7 (potent compounds)
  4. For top hits: SwissADME_check_druglikeness -> assess drug-likeness
  5. eMolecules_search(query=compound_name) -> check commercial availability
Output: Prioritized list of potent, drug-like, commercially available compounds
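
Step 3 of this flow can be sketched as below: keep the best pChEMBL value per molecule, drop anything under the cutoff, and rank what remains. The field names (`molecule_chembl_id`, `pchembl_value`) mirror the parameters used in this document but the exact response layout is an assumption, and the molecule labels and values are placeholders.

```python
def top_potent(rows, min_pchembl=7.0, limit=10):
    """Return [(molecule_id, best pChEMBL)] at or above the cutoff, best first."""
    best = {}
    for row in rows:
        value = row.get("pchembl_value")
        if value is None:
            continue  # activity without a pChEMBL value
        value = float(value)  # values may arrive as strings
        molecule = row["molecule_chembl_id"]
        if value >= min_pchembl and value > best.get(molecule, float("-inf")):
            best[molecule] = value
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Placeholder activity rows for illustration only.
activities = [
    {"molecule_chembl_id": "MOL_A", "pchembl_value": 7.2},
    {"molecule_chembl_id": "MOL_A", "pchembl_value": 8.5},
    {"molecule_chembl_id": "MOL_B", "pchembl_value": 6.5},
    {"molecule_chembl_id": "MOL_C", "pchembl_value": "9.1"},
    {"molecule_chembl_id": "MOL_D", "pchembl_value": None},
]

print(top_potent(activities))
```

Taking the maximum per molecule is one choice among several; a median across assays is more conservative when measurements disagree.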
Pattern 4: ADMET Risk Assessment

Input: Novel compound SMILES
Flow:
  1. SwissADME_calculate_adme(operation="calculate_adme", smiles) -> full ADMET
  2. ADMETAI_predict_toxicity(smiles=[smiles]) -> hERG, DILI, mutagenicity
  3. ADMETAI_predict_CYP_interactions(smiles=[smiles]) -> drug-drug interaction risk
  4. ADMETAI_predict_BBB_penetrance(smiles=[smiles]) -> CNS penetration
Output: ADMET risk profile with flagged liabilities
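
One way to produce the flagged-liabilities output is sketched below. It assumes the predictions have already been collected into a flat dictionary of endpoint name to probability-like score; the endpoint names and the 0.5 cutoffs are illustrative assumptions, not documented tool output or validated decision thresholds.

```python
def flag_liabilities(predictions: dict, cutoffs: dict) -> list:
    """Return [(endpoint, score)] for scores at or above their cutoff, worst first."""
    flagged = []
    for endpoint, cutoff in cutoffs.items():
        score = predictions.get(endpoint)
        if score is not None and score >= cutoff:
            flagged.append((endpoint, score))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical scores and cutoffs for illustration only.
predicted = {"hERG": 0.8, "DILI": 0.3, "AMES": 0.55}
cutoffs = {"hERG": 0.5, "DILI": 0.5, "AMES": 0.5, "CYP3A4_inhibition": 0.5}

for endpoint, score in flag_liabilities(predicted, cutoffs):
    print(f"FLAG {endpoint}: {score:.2f}")
```

Endpoints missing from the predictions are skipped rather than treated as safe, so it is worth reporting which endpoints were not evaluated.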

Fallback Chains

| Primary | Fallback | When |
|---------|----------|------|
| `BindingDB_get_ligands_by_uniprot` | `ChEMBL_get_target_activities` | BindingDB API unavailable |
| `ADMETAI_predict_*` | `SwissADME_calculate_adme` | ml dependencies not installed |
| `Enamine_search_catalog` | Returns URL only | API returns HTTP 500 (common) |
| `SwissTargetPrediction_predict` | `ChEMBL_search_similar_molecules` + known targets | Prediction times out |
| `PubChem_get_CID_by_compound_name` | `ChEMBL_search_molecules(query=name)` | Name not in PubChem |
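
The chains above share one control flow: try the primary source, and move to the next on an error or an empty result. A generic sketch follows; `first_successful` and the stub functions are hypothetical stand-ins for real tool calls, used here so the example runs without network access.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def first_successful(
    steps: Sequence[Tuple[str, Callable[[], Any]]],
) -> Tuple[Optional[str], Any, List[str]]:
    """Run callables in order; return (name, result, errors) for the first non-empty result."""
    errors: List[str] = []
    for name, call in steps:
        try:
            result = call()
        except Exception as exc:  # timeouts, HTTP errors, missing dependencies
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result, errors
        errors.append(f"{name}: empty result")
    return None, None, errors


def bindingdb_stub():
    """Stand-in for a primary call that fails."""
    raise TimeoutError("request timed out")


def chembl_stub():
    """Stand-in for a fallback call that succeeds."""
    return [{"molecule_chembl_id": "MOL_A"}]


source, data, problems = first_successful(
    [("bindingdb", bindingdb_stub), ("chembl", chembl_stub)]
)
print(source, data, problems)
```

Returning the error list alongside the data lets the final report state which source was actually used and why the primary was skipped.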

Limitations

  • BindingDB: REST API frequently times out; ChEMBL is the reliable alternative for binding data
  • Enamine API: Returns HTTP 500 often; tool provides search URL as fallback
  • eMolecules: No public API; tool generates search URLs only
  • ADMET-AI: Requires `pip install tooluniverse[ml]`; not always available in base install
  • SwissTargetPrediction: Web scraping-based; may time out for complex molecules
  • SMILES format: ADMET-AI requires a list `["SMILES"]`; SwissADME requires a string `"SMILES"`
  • ChEMBL IDs: Always use full format `"CHEMBL941"`, never just `"941"`