tooluniverse-chemical-compound-retrieval

Chemical Compound Information Retrieval

Retrieve comprehensive chemical compound data with proper disambiguation and cross-database validation.
IMPORTANT: Always use English compound names and search terms in tool calls, even if the user writes in another language (e.g., translate "阿司匹林" to "aspirin"). Only try original-language terms as a fallback if English returns no results. Respond in the user's language.

Workflow Overview

Phase 0: Clarify (if needed)
Phase 1: Disambiguate Compound Identity
Phase 2: Retrieve Data (Internal)
Phase 3: Report Compound Profile

Phase 0: Clarification (When Needed)

Ask the user ONLY if:
  • Compound name is highly ambiguous (e.g., "vitamin E" → α, β, γ, δ-tocopherol?)
  • Multiple distinct compounds share the name (e.g., "aspirin" is clear; "sterol" is not)
Skip clarification for:
  • Unambiguous drug names (aspirin, ibuprofen, metformin)
  • Specific identifiers provided (CID, ChEMBL ID, SMILES)
  • Clear structural queries (SMILES, InChI)
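
Most of this decision follows from what kind of input arrived. A rough triage is sketched below; `classify_query` and its patterns are hypothetical helpers, not part of ToolUniverse, and the SMILES test is only a heuristic (it accepts strings built from organic-subset atoms and SMILES punctuation, so a very short name could be misread). Only a `name` result should ever lead to a clarifying question.

```python
import re

# Characters allowed outside brackets: organic-subset atoms (after folding Cl/Br),
# aromatic lowercase atoms, ring-closure digits, and SMILES punctuation.
_SMILES_CHARS = set("BCNOPSFIbcnops0123456789=#()[]+-@/\\%.:")


def classify_query(query: str) -> str:
    """Return 'cid', 'chembl_id', 'inchi', 'smiles', or 'name' (heuristic)."""
    q = query.strip()
    if q.isdigit():
        return "cid"
    if re.fullmatch(r"CHEMBL\d+", q, flags=re.IGNORECASE):
        return "chembl_id"
    if q.startswith("InChI="):
        return "inchi"
    # Drop bracket atoms such as [nH] or [C@@H], fold two-letter halogens,
    # then check that only SMILES characters remain.
    core = re.sub(r"\[[^\]]*\]", "", q).replace("Cl", "C").replace("Br", "B")
    if q and " " not in q and set(core) <= _SMILES_CHARS:
        return "smiles"
    return "name"


for text in ["2244", "CHEMBL25", "CC(=O)Oc1ccccc1C(=O)O", "vitamin E"]:
    print(text, "->", classify_query(text))
```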

Phase 1: Compound Disambiguation

1.1 Resolve Primary Identifier

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Strategy depends on input type (user_provided_* are None when not supplied)
if user_provided_cid:
    cid = user_provided_cid
elif user_provided_smiles:
    result = tu.tools.PubChem_get_CID_by_SMILES(smiles=user_provided_smiles)
    cid = result["data"]["cid"]
elif user_provided_name:
    result = tu.tools.PubChem_get_CID_by_compound_name(compound_name=user_provided_name)
    cid = result["data"]["cid"]
```

1.2 Cross-Reference Identifiers

Always establish compound identity across both databases:

```python
# PubChem → ChEMBL cross-reference
chembl_id = None  # stays None when ChEMBL has no match; Phase 2 checks this
chembl_result = tu.tools.ChEMBL_search_compounds(query=compound_name, limit=5)
if chembl_result["data"]:
    chembl_id = chembl_result["data"][0]["molecule_chembl_id"]
```
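
Taking the first ChEMBL hit is only safe if it is the same molecule PubChem resolved. One way to check, sketched here under assumptions, is to compare standard InChIKeys from both records: the first 14-character block hashes the connectivity layer, while the later blocks carry stereochemistry, isotopes, and protonation. How each tool response exposes its InChIKey is not shown in this guide, so the hypothetical `compare_inchikeys` helper takes plain strings.

```python
def compare_inchikeys(pubchem_key: str, chembl_key: str) -> str:
    """Classify two standard InChIKeys as identical, same connectivity, or different."""
    a = pubchem_key.strip().upper()
    b = chembl_key.strip().upper()
    if a == b:
        return "identical"
    # Block 1 (before the first hyphen) hashes the connectivity layer; blocks 2-3
    # hash stereo/isotope layers and protonation, so they may differ for isomers.
    if a.split("-")[0] == b.split("-")[0]:
        return "same_connectivity"
    return "different"


# Aspirin's standard InChIKey compared with a lowercase copy of itself
status = compare_inchikeys("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "bsynrymutxbxsq-uhfffaoysa-n")
print(status)  # identical
```

A `same_connectivity` result is the cue to note stereoisomers (section 1.3) rather than silently accept the hit; `different` means the cross-reference should be dropped or searched again.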

1.3 Handle Naming Collisions

For generic names (e.g., "vitamin", "steroid", "acid"):
  • Search returns multiple CIDs → present top matches with structures
  • Verify SMILES/InChI matches user intent
  • Note stereoisomers or salt forms if relevant
Identity Resolution Checklist:
  • PubChem CID established
  • ChEMBL ID cross-referenced (if exists)
  • Canonical SMILES captured
  • Stereochemistry noted (if relevant)
  • Salt forms identified (if applicable)
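
One way to track this checklist is a small record that separates the items that are always required (CID, canonical SMILES) from the conditional ones. The `CompoundIdentity` dataclass below is a hypothetical sketch, not a ToolUniverse type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompoundIdentity:
    """Hypothetical record of the Identity Resolution Checklist items."""
    cid: Optional[int] = None
    chembl_id: Optional[str] = None         # may stay None: not every compound is in ChEMBL
    canonical_smiles: Optional[str] = None
    stereo_note: Optional[str] = None       # fill only when stereochemistry is relevant
    salt_form: Optional[str] = None         # fill only when a salt form applies

    def blocking_gaps(self) -> List[str]:
        """Always-required checklist items that are still missing."""
        gaps = []
        if self.cid is None:
            gaps.append("PubChem CID")
        if not self.canonical_smiles:
            gaps.append("Canonical SMILES")
        return gaps

    def chembl_display(self) -> str:
        """Value for the report's ChEMBL ID row."""
        return self.chembl_id or "N/A"


ident = CompoundIdentity(cid=2244, canonical_smiles="CC(=O)Oc1ccccc1C(=O)O")
print(ident.blocking_gaps(), ident.chembl_display())
```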

Phase 2: Data Retrieval (Internal)

Retrieve all data silently. Do NOT narrate the search process.

2.1 Core Properties (PubChem)

```python
# Basic properties
props = tu.tools.PubChem_get_compound_properties_by_CID(cid=cid)

# Bioactivity summary
bio = tu.tools.PubChem_get_bioactivity_summary_by_CID(cid=cid)

# Drug label (if approved drug)
drug = tu.tools.PubChem_get_drug_label_info_by_CID(cid=cid)

# Structure image
image = tu.tools.PubChem_get_compound_2D_image_by_CID(cid=cid)
```
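
Because this phase runs silently, a single failing call should not abort the profile. The sketch below wraps each retrieval so that an exception becomes `None`, which the report later shows as unavailable. It assumes failures surface as raised exceptions, which may not hold for every tool; `gather_sections` and the fake tools are hypothetical stand-ins for the `tu.tools.*` calls above.

```python
from typing import Any, Callable, Dict, Optional


def gather_sections(calls: Dict[str, Callable[..., Any]], cid: int) -> Dict[str, Optional[Any]]:
    """Run every retrieval for one CID; a call that raises is recorded as None."""
    results: Dict[str, Optional[Any]] = {}
    for section, tool in calls.items():
        try:
            results[section] = tool(cid=cid)
        except Exception:
            # Broad on purpose: the section is marked unavailable
            # instead of surfacing the error to the user.
            results[section] = None
    return results


# Stand-ins for tu.tools.* (hypothetical response shape)
def fake_properties(cid):
    return {"data": {"cid": cid}}


def fake_drug_label(cid):
    raise RuntimeError("no label for this compound")


sections = gather_sections({"props": fake_properties, "drug": fake_drug_label}, cid=2244)
print(sections)
```

With the real tools, the mapping would hold the callables themselves, for example `{"props": tu.tools.PubChem_get_compound_properties_by_CID, "bio": tu.tools.PubChem_get_bioactivity_summary_by_CID}`.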

2.2 Bioactivity Data (ChEMBL)

```python
if chembl_id:
    # Detailed bioactivity
    activity = tu.tools.ChEMBL_get_bioactivity_by_chemblid(chembl_id=chembl_id)

    # Protein targets
    targets = tu.tools.ChEMBL_get_target_by_chemblid(chembl_id=chembl_id)

    # Assay data
    assays = tu.tools.ChEMBL_get_assays_by_chemblid(chembl_id=chembl_id)
```
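
The Key Target Interactions table in the report needs only the most potent measurement per target. A possible reduction is sketched below. The record keys (`target_pref_name`, `standard_type`, `standard_value`, `standard_units`) are the field names ChEMBL uses for activity records; whether `ChEMBL_get_bioactivity_by_chemblid` returns them unchanged is an assumption to verify against a real response. Sorting mixes IC50 and Ki values, which are not strictly comparable, so treat the order as a rough ranking.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str, float, str]


def top_target_rows(records: List[dict], activity_types: Sequence[str] = ("IC50", "Ki"),
                    limit: int = 5) -> List[Row]:
    """Return (target, activity type, value, units) rows, most potent first."""
    best: Dict[Tuple[str, str], float] = {}
    for rec in records:
        if rec.get("standard_type") not in activity_types or rec.get("standard_units") != "nM":
            continue
        try:
            value = float(rec["standard_value"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records without a numeric value
        key = (rec.get("target_pref_name") or "Unknown target", rec["standard_type"])
        if key not in best or value < best[key]:
            best[key] = value  # lower IC50/Ki means more potent
    rows = [(target, kind, value, "nM") for (target, kind), value in best.items()]
    rows.sort(key=lambda row: row[2])
    return rows[:limit]


def to_markdown(rows: List[Row]) -> str:
    lines = ["| Target | Activity Type | Value | Units |", "|---|---|---|---|"]
    lines += [f"| {t} | {kind} | {v:g} | {u} |" for t, kind, v, u in rows]
    return "\n".join(lines)


# Made-up records in ChEMBL's activity field layout
records = [
    {"target_pref_name": "Target A", "standard_type": "IC50", "standard_value": "120", "standard_units": "nM"},
    {"target_pref_name": "Target A", "standard_type": "IC50", "standard_value": "35.5", "standard_units": "nM"},
    {"target_pref_name": "Target B", "standard_type": "Ki", "standard_value": "8", "standard_units": "nM"},
    {"target_pref_name": "Target C", "standard_type": "EC50", "standard_value": "1", "standard_units": "nM"},
    {"target_pref_name": "Target D", "standard_type": "IC50", "standard_value": None, "standard_units": "nM"},
]
print(to_markdown(top_target_rows(records)))
```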

2.3 Optional Extended Data

```python
# Patents (for drugs)
patents = tu.tools.PubChem_get_associated_patents_by_CID(cid=cid)

# Similar compounds (for SAR)
similar = tu.tools.PubChem_search_compounds_by_similarity(cid=cid, threshold=85)
```
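
The `threshold=85` argument presumably follows PubChem's convention for 2D similarity search, where the threshold is a Tanimoto score expressed as a percentage (85 means a coefficient of at least 0.85). The sketch below shows that scoring on fingerprints modeled as sets of "on" bit positions. The toy bit sets are invented for illustration, and `similar_above` is a hypothetical helper that also keeps the top 5 used in the report's Related Compounds table.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_above(query: Set[int], candidates: Dict[str, Set[int]],
                  threshold: int = 85, top_n: int = 5) -> List[Tuple[str, float]]:
    """Candidates scoring at least `threshold` percent, most similar first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in candidates.items()]
    hits = [(name, score) for name, score in scored if score >= threshold / 100]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_n]


# Toy bit sets, not real PubChem fingerprints
query = set(range(20))
candidates = {"close": set(range(19)), "near": set(range(18)), "far": set(range(16))}
hits = similar_above(query, candidates)
print(hits)
```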

Fallback Chains

| Primary | Fallback | Use fallback when |
|---|---|---|
| PubChem_get_CID_by_compound_name | ChEMBL_search_compounds → get SMILES → PubChem_get_CID_by_SMILES | Name lookup failed |
| ChEMBL_get_bioactivity_by_chemblid | PubChem_get_bioactivity_summary_by_CID | ChEMBL ID unavailable |
| PubChem_get_drug_label_info_by_CID | Note "Drug label unavailable" | Not an approved drug |
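
Each row reads as an ordered list of attempts: keep the first one that returns data, otherwise fall through to the note. A hypothetical runner for that pattern is sketched below. Treating an exception or a missing or empty `data` field as a miss is an assumption about the response shape, and the lambdas stand in for real tool calls such as `lambda: tu.tools.PubChem_get_CID_by_compound_name(compound_name=name)`.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def first_available(attempts: Sequence[Tuple[str, Callable[[], Any]]]) -> Tuple[Optional[str], Any]:
    """Return (label, result) for the first attempt whose response has non-empty data."""
    for label, attempt in attempts:
        try:
            result = attempt()
        except Exception:
            continue  # an error counts as a miss; try the next step in the chain
        if isinstance(result, dict) and result.get("data"):
            return label, result
    return None, None


label, result = first_available([
    ("PubChem name lookup", lambda: {"data": None}),               # simulated miss
    ("ChEMBL → SMILES → PubChem", lambda: {"data": {"cid": 2244}}),
])
print(label, result)
```

When every attempt misses and the runner returns `(None, None)`, write the table's note instead, for example "Drug label unavailable".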

Phase 3: Report Compound Profile

Output Structure

Present results as a Compound Profile Report following the template below. Hide all search-process details.

Compound Profile: [Compound Name]

Identity

| Property | Value |
|---|---|
| PubChem CID | [cid] |
| ChEMBL ID | [chembl_id or "N/A"] |
| IUPAC Name | [full name] |
| Common Names | [synonyms] |

Chemical Properties

Molecular Descriptors

| Property | Value | Drug-Likeness |
|---|---|---|
| Formula | C₉H₈O₄ | - |
| Molecular Weight | 180.16 g/mol | ✓ (<500) |
| LogP | 1.19 | ✓ (-2 to 5) |
| H-Bond Donors | 1 | ✓ (≤5) |
| H-Bond Acceptors | 4 | ✓ (≤10) |
| Polar Surface Area | 63.6 Ų | ✓ (≤140) |
| Rotatable Bonds | 3 | ✓ (≤10) |

Structural Representation

  • SMILES: CC(=O)Oc1ccccc1C(=O)O
  • InChI: InChI=1S/C9H8O4/...
[2D structure image if available]

Bioactivity Profile

Summary

  • Active in: [X] assays out of [Y] tested
  • Primary Targets: [list top targets]
  • Mechanism: [if known]

Key Target Interactions (from ChEMBL)

| Target | Activity Type | Value | Units |
|---|---|---|---|
| [Target 1] | IC50 | [value] | nM |
| [Target 2] | Ki | [value] | nM |

Drug Information (if applicable)

Clinical Status

| Property | Value |
|---|---|
| Approval Status | [Approved/Investigational/N/A] |
| Drug Class | [therapeutic class] |
| Indication | [approved uses] |
| Route | [oral/IV/topical/etc.] |

Safety

  • Black Box Warning: [Yes/No]
  • Major Interactions: [if any]

Related Compounds (if retrieved)

Top 5 structurally similar compounds:
| CID | Name | Similarity | Key Difference |
|---|---|---|---|
| [cid] | [name] | 95% | [note] |

Data Sources

  • PubChem: [CID link]
  • ChEMBL: [ChEMBL ID link]
  • Retrieved: [date]

---

Data Quality Tiers

Apply these tiers when assessing data completeness:

| Tier | Symbol | Criteria |
|---|---|---|
| Complete | ●●● | All core properties + bioactivity + drug info |
| Substantial | ●●○ | Core properties + either bioactivity or drug info |
| Basic | ●○○ | Core properties only |
| Minimal | ○○○ | CID/name only, limited data |

Include the tier in the report header:

```markdown
**Data Completeness**: ●●● Complete (properties, bioactivity, drug data)
```
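
The tier rules and the header line can be derived from three availability flags. A minimal sketch follows; the hypothetical `completeness_header` function reproduces the example header exactly when all three kinds of data are present, and treats anything without core properties as Minimal.

```python
def completeness_header(has_properties: bool, has_bioactivity: bool, has_drug_info: bool) -> str:
    """Map available data to a tier and render the report header line."""
    if has_properties and has_bioactivity and has_drug_info:
        symbol, label = "●●●", "Complete"
    elif has_properties and (has_bioactivity or has_drug_info):
        symbol, label = "●●○", "Substantial"
    elif has_properties:
        symbol, label = "●○○", "Basic"
    else:
        symbol, label = "○○○", "Minimal"
    present = [name for name, ok in (("properties", has_properties),
                                     ("bioactivity", has_bioactivity),
                                     ("drug data", has_drug_info)) if ok]
    detail = f" ({', '.join(present)})" if present else ""
    return f"**Data Completeness**: {symbol} {label}{detail}"


print(completeness_header(True, True, True))
```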

Completeness Checklist

Every compound profile MUST include these sections (even if "unavailable"):

Identity (Required)

  • PubChem CID
  • ChEMBL ID (or "N/A")
  • IUPAC name
  • Canonical SMILES

Properties (Required)

  • Molecular formula
  • Molecular weight
  • LogP
  • Lipinski rule assessment
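
A possible sketch of the Lipinski assessment and the Drug-Likeness column follows. It assumes the properties response uses PubChem's property names (`MolecularWeight`, `XLogP`, `HBondDonorCount`, `HBondAcceptorCount`, `TPSA`, `RotatableBondCount`) and may return numbers as strings, hence the `float()` conversion. The per-row limits mirror the Molecular Descriptors table, while the violation count uses only the four classic rule-of-five criteria (MW under 500, LogP at most 5, at most 5 donors, at most 10 acceptors). XLogP is a computed LogP, so it can differ slightly from a measured value.

```python
# Per-row limits mirroring the Molecular Descriptors table (keys assume PubChem names)
TABLE_LIMITS = {
    "MolecularWeight": ("<500", lambda v: v < 500),
    "XLogP": ("-2 to 5", lambda v: -2 <= v <= 5),
    "HBondDonorCount": ("≤5", lambda v: v <= 5),
    "HBondAcceptorCount": ("≤10", lambda v: v <= 10),
    "TPSA": ("≤140", lambda v: v <= 140),
    "RotatableBondCount": ("≤10", lambda v: v <= 10),
}

# Classic rule-of-five criteria (the -2 lower LogP bound is not part of Lipinski)
LIPINSKI = {
    "MolecularWeight": lambda v: v < 500,
    "XLogP": lambda v: v <= 5,
    "HBondDonorCount": lambda v: v <= 5,
    "HBondAcceptorCount": lambda v: v <= 10,
}


def drug_likeness(props: dict):
    """Return (per-property marks for the table, number of Lipinski violations)."""
    values = {k: float(v) for k, v in props.items() if k in TABLE_LIMITS and v is not None}
    marks = {}
    for key, (label, passes) in TABLE_LIMITS.items():
        if key in values:
            marks[key] = f"{'✓' if passes(values[key]) else '✗'} ({label})"
    violations = sum(1 for key, passes in LIPINSKI.items()
                     if key in values and not passes(values[key]))
    return marks, violations


# Values from the example table above
aspirin = {"MolecularWeight": "180.16", "XLogP": 1.19, "HBondDonorCount": 1,
           "HBondAcceptorCount": 4, "TPSA": 63.6, "RotatableBondCount": 3}
marks, violations = drug_likeness(aspirin)
print(marks, violations)
```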

Bioactivity (Required)

  • Activity summary (or "No bioactivity data")
  • Primary targets (or "Unknown")
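
Keeping the required fields present even when data is missing can be handled by filling placeholders before rendering. A minimal sketch: the field labels and the "N/A", "No bioactivity data", and "Unknown" placeholders come from this checklist, fields without a named placeholder default to "unavailable", and `fill_required` is a hypothetical helper.

```python
PLACEHOLDERS = {
    "PubChem CID": "unavailable",
    "ChEMBL ID": "N/A",
    "IUPAC name": "unavailable",
    "Canonical SMILES": "unavailable",
    "Molecular formula": "unavailable",
    "Molecular weight": "unavailable",
    "LogP": "unavailable",
    "Lipinski rule assessment": "unavailable",
    "Activity summary": "No bioactivity data",
    "Primary targets": "Unknown",
}


def fill_required(profile: dict) -> dict:
    """Return a copy in which every required field is present, using checklist defaults."""
    filled = dict(profile)
    for field_name, placeholder in PLACEHOLDERS.items():
        if filled.get(field_name) in (None, "", []):
            filled[field_name] = placeholder
    return filled


print(fill_required({"PubChem CID": 2244, "Primary targets": []}))
```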

Drug Info (If Approved Drug)

  • Approval status
  • Indication
  • Drug class

Always Include

  • Data sources with links
  • Retrieval date
  • Quality tier assessment
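
The source links and retrieval date can be generated from the identifiers. The PubChem URL below is PubChem's compound page pattern; the ChEMBL report-card URL is an assumption based on the ChEMBL web interface and should be checked against the current ChEMBL site. `data_sources` is a hypothetical helper.

```python
from datetime import date
from typing import List, Optional


def data_sources(cid: int, chembl_id: Optional[str], retrieved: Optional[date] = None) -> List[str]:
    """Render the Data Sources bullets with links and an ISO retrieval date."""
    retrieved = retrieved or date.today()
    lines = [f"PubChem: https://pubchem.ncbi.nlm.nih.gov/compound/{cid}"]
    if chembl_id:
        # Assumed URL pattern; verify before relying on it.
        lines.append(f"ChEMBL: https://www.ebi.ac.uk/chembl/compound_report_card/{chembl_id}/")
    else:
        lines.append("ChEMBL: N/A")
    lines.append(f"Retrieved: {retrieved.isoformat()}")
    return lines


for line in data_sources(2244, "CHEMBL25", date(2024, 1, 15)):
    print("  •", line)
```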

Common Use Cases

Drug Property Check

User: "Tell me about metformin" → Full compound profile with drug information emphasis

Structure Verification

User: "Verify this SMILES: CC(=O)Oc1ccccc1C(=O)O" → Disambiguation-focused profile, confirm identity

SAR Analysis

User: "Find compounds similar to ibuprofen" → Similarity search + comparative property table

Target Identification

User: "What proteins does gefitinib target?" → ChEMBL bioactivity emphasis with target list

Error Handling

| Error | Response |
|---|---|
| "Compound not found" | Try synonyms, verify spelling, offer SMILES search |
| "No ChEMBL ID" | Note in Identity section, continue with PubChem data |
| "No bioactivity data" | Include section with "No bioactivity screening data available" |
| "API timeout" | Retry once, note unavailable data with "(retrieval failed)" |

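The "retry once" rule for timeouts can be sketched as follows. This assumes a timeout surfaces as Python's built-in `TimeoutError`; the exception actually raised by ToolUniverse or its HTTP layer may differ and should be caught instead. `call_with_one_retry` and its `delay_s` pause are hypothetical.

```python
import time
from typing import Any, Callable

RETRIEVAL_FAILED = "(retrieval failed)"


def call_with_one_retry(tool: Callable[..., Any], *, delay_s: float = 1.0, **kwargs: Any) -> Any:
    """Call tool(**kwargs); on a timeout, wait delay_s and retry exactly once."""
    for attempt in range(2):
        try:
            return tool(**kwargs)
        except TimeoutError:
            if attempt == 0:
                time.sleep(delay_s)  # brief pause before the single retry
    # Both attempts timed out: the caller writes this marker into the report.
    return RETRIEVAL_FAILED


def always_slow(cid):
    raise TimeoutError("simulated timeout")


print(call_with_one_retry(always_slow, delay_s=0, cid=2244))
```

With a real tool the call would look like `call_with_one_retry(tu.tools.PubChem_get_compound_properties_by_CID, cid=cid)`.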

Tool Reference

PubChem (Chemical Database)

| Tool | Purpose |
|---|---|
| PubChem_get_CID_by_compound_name | Name → CID |
| PubChem_get_CID_by_SMILES | Structure → CID |
| PubChem_get_compound_properties_by_CID | Molecular properties |
| PubChem_get_compound_2D_image_by_CID | Structure visualization |
| PubChem_get_bioactivity_summary_by_CID | Activity overview |
| PubChem_get_drug_label_info_by_CID | FDA drug labels |
| PubChem_get_associated_patents_by_CID | IP information |
| PubChem_search_compounds_by_similarity | Find analogs |
| PubChem_search_compounds_by_substructure | Substructure search |

ChEMBL (Bioactivity Database)

| Tool | Purpose |
|---|---|
| ChEMBL_search_compounds | Name/structure search |
| ChEMBL_get_compound_by_chemblid | Compound details |
| ChEMBL_get_bioactivity_by_chemblid | Activity data |
| ChEMBL_get_target_by_chemblid | Protein targets |
| ChEMBL_search_targets | Target search |
| ChEMBL_get_assays_by_chemblid | Assay metadata |