tooluniverse-chemical-compound-retrieval
Chemical Compound Information Retrieval
Retrieve comprehensive chemical compound data with proper disambiguation and cross-database validation.
IMPORTANT: Always use English compound names and search terms in tool calls, even if the user writes in another language (e.g., translate "阿司匹林" to "aspirin"). Only try original-language terms as a fallback if English returns no results. Respond in the user's language.
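The English-first rule amounts to a fixed lookup order, sketched below. This is an illustration only: `KNOWN_TRANSLATIONS` and `search_terms` are hypothetical names, and in practice the translation would come from the assistant itself rather than a fixed table.

```python
# Hypothetical lookup table for illustration; not part of ToolUniverse.
KNOWN_TRANSLATIONS = {
    "阿司匹林": "aspirin",
    "布洛芬": "ibuprofen",
    "二甲双胍": "metformin",
}


def search_terms(user_term: str) -> list[str]:
    """Return the search terms in the order they should be tried."""
    term = user_term.strip()
    english = KNOWN_TRANSLATIONS.get(term)
    if english is None:
        # No translation known: the term as written is all we have.
        return [term]
    # English first; the original-language term is only a fallback.
    return [english, term]
```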
Workflow Overview
Phase 0: Clarify (if needed)
↓
Phase 1: Disambiguate Compound Identity
↓
Phase 2: Retrieve Data (Internal)
↓
Phase 3: Report Compound Profile

Phase 0: Clarification (When Needed)
Ask the user ONLY if:
- Compound name is highly ambiguous (e.g., "vitamin E" → α, β, γ, δ-tocopherol?)
- Multiple distinct compounds share the name (e.g., "aspirin" is clear; "sterol" is not)
Skip clarification for:
- Unambiguous drug names (aspirin, ibuprofen, metformin)
- Specific identifiers provided (CID, ChEMBL ID, SMILES)
- Clear structural queries (SMILES, InChI)
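The clarification rules above can be sketched as a single predicate. This is a rough heuristic under stated assumptions: `AMBIGUOUS_NAMES` and `needs_clarification` are hypothetical names, the ambiguous set holds only the examples this document mentions, and the caller is assumed to know when the input is a SMILES string (detecting SMILES reliably from text alone is not attempted here).

```python
import re

# Generic names taken from the examples in this document; extend as needed.
AMBIGUOUS_NAMES = {"vitamin e", "vitamin", "sterol", "steroid", "acid"}


def needs_clarification(query: str, is_structure: bool = False) -> bool:
    """Return True only when the user should be asked which compound they mean."""
    q = query.strip()
    if is_structure:
        return False  # SMILES supplied by the caller: unambiguous
    if q.isdigit():
        return False  # looks like a PubChem CID
    if re.fullmatch(r"CHEMBL\d+", q, flags=re.IGNORECASE):
        return False  # looks like a ChEMBL ID
    if q.startswith("InChI="):
        return False  # InChI string
    return q.lower() in AMBIGUOUS_NAMES
```

Anything not in the ambiguous set is treated as a specific name and passed straight to Phase 1.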
Phase 1: Compound Disambiguation
1.1 Resolve Primary Identifier
```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Strategy depends on input type
if user_provided_cid:
    cid = user_provided_cid
elif user_provided_smiles:
    result = tu.tools.PubChem_get_CID_by_SMILES(smiles=user_provided_smiles)
    cid = result["data"]["cid"]
elif user_provided_name:
    result = tu.tools.PubChem_get_CID_by_compound_name(compound_name=user_provided_name)
    cid = result["data"]["cid"]
```
1.2 Cross-Reference Identifiers
Always establish compound identity across both databases:
```python
# PubChem → ChEMBL cross-reference
chembl_id = None  # stays None when ChEMBL has no match
chembl_result = tu.tools.ChEMBL_search_compounds(query=compound_name, limit=5)
if chembl_result["data"]:
    chembl_id = chembl_result["data"][0]["molecule_chembl_id"]
```
1.3 Handle Naming Collisions
For generic names (e.g., "vitamin", "steroid", "acid"):
- Search returns multiple CIDs → present top matches with structures
- Verify SMILES/InChI matches user intent
- Note stereoisomers or salt forms if relevant
Identity Resolution Checklist:
- PubChem CID established
- ChEMBL ID cross-referenced (if exists)
- Canonical SMILES captured
- Stereochemistry noted (if relevant)
- Salt forms identified (if applicable)
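One way to track the checklist is a small record type, sketched below. `CompoundIdentity` and its field names are illustrative and not a ToolUniverse type. The sketch assumes only the CID and canonical SMILES are mandatory, since the other items are conditional in the checklist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompoundIdentity:
    """Identity record mirroring the checklist; names are illustrative."""

    cid: Optional[int] = None
    chembl_id: Optional[str] = None
    canonical_smiles: Optional[str] = None
    stereochemistry_note: Optional[str] = None
    salt_form: Optional[str] = None

    def missing_required(self) -> list[str]:
        """List the mandatory checklist items that are still unresolved."""
        missing = []
        if self.cid is None:
            missing.append("PubChem CID")
        if not self.canonical_smiles:
            missing.append("Canonical SMILES")
        return missing


aspirin = CompoundIdentity(
    cid=2244,
    chembl_id="CHEMBL25",
    canonical_smiles="CC(=O)Oc1ccccc1C(=O)O",
)
```

An empty list from `missing_required()` means Phase 2 can start.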
Phase 2: Data Retrieval (Internal)
Retrieve all data silently. Do NOT narrate the search process.
2.1 Core Properties (PubChem)
```python
# Basic properties
props = tu.tools.PubChem_get_compound_properties_by_CID(cid=cid)

# Bioactivity summary
bio = tu.tools.PubChem_get_bioactivity_summary_by_CID(cid=cid)

# Drug label (if approved drug)
drug = tu.tools.PubChem_get_drug_label_info_by_CID(cid=cid)

# Structure image
image = tu.tools.PubChem_get_compound_2D_image_by_CID(cid=cid)
```
2.2 Bioactivity Data (ChEMBL)
```python
if chembl_id:
    # Detailed bioactivity
    activity = tu.tools.ChEMBL_get_bioactivity_by_chemblid(chembl_id=chembl_id)
    # Protein targets
    targets = tu.tools.ChEMBL_get_target_by_chemblid(chembl_id=chembl_id)
    # Assay data
    assays = tu.tools.ChEMBL_get_assays_by_chemblid(chembl_id=chembl_id)
```

2.3 Optional Extended Data
```python
# Patents (for drugs)
patents = tu.tools.PubChem_get_associated_patents_by_CID(cid=cid)

# Similar compounds (for SAR)
similar = tu.tools.PubChem_search_compounds_by_similarity(cid=cid, threshold=85)
```

Fallback Chains
| Primary | Fallback | Notes |
|---|---|---|
| PubChem_get_CID_by_compound_name | ChEMBL_search_compounds → get SMILES → PubChem_get_CID_by_SMILES | Name lookup failed |
| ChEMBL_get_bioactivity_by_chemblid | PubChem_get_bioactivity_summary_by_CID | ChEMBL ID unavailable |
| PubChem_get_drug_label_info_by_CID | Note "Drug label unavailable" | Not an approved drug |
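A fallback chain like the ones in the table can be expressed as "try each step in order, keep the first usable result". The sketch below is an assumption about how that might be coded: `first_successful` is a hypothetical helper, and the two lookup functions are stand-ins for the real tool calls.

```python
from typing import Any, Callable, Optional, Sequence


def first_successful(steps: Sequence[Callable[[], Any]]) -> Optional[Any]:
    """Run each step in order and return the first non-empty result.

    A step counts as failed if it raises or returns a falsy value
    (None, empty dict, empty list).
    """
    for step in steps:
        try:
            result = step()
        except Exception:
            continue  # treat any error as "try the next fallback"
        if result:
            return result
    return None


# Stand-in functions for the example; real code would call the tools.
def lookup_by_name():
    raise LookupError("name lookup failed")


def lookup_via_chembl_smiles():
    return {"cid": 2244}


cid_result = first_successful([lookup_by_name, lookup_via_chembl_smiles])
```

Passing zero-argument callables keeps the chain lazy, so a fallback only runs when the primary lookup has failed.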
Phase 3: Report Compound Profile
Output Structure
Present results as a Compound Profile Report. Hide all search process details.
```markdown
Compound Profile: [Compound Name]

Identity
| Property | Value |
|---|---|
| PubChem CID | [cid] |
| ChEMBL ID | [chembl_id or "N/A"] |
| IUPAC Name | [full name] |
| Common Names | [synonyms] |

Chemical Properties

Molecular Descriptors
| Property | Value | Drug-Likeness |
|---|---|---|
| Formula | C₉H₈O₄ | - |
| Molecular Weight | 180.16 g/mol | ✓ (≤500) |
| LogP | 1.19 | ✓ (-2 to 5) |
| H-Bond Donors | 1 | ✓ (≤5) |
| H-Bond Acceptors | 4 | ✓ (≤10) |
| Polar Surface Area | 63.6 Ų | ✓ (≤140) |
| Rotatable Bonds | 3 | ✓ (≤10) |

Structural Representation
- SMILES: `CC(=O)Oc1ccccc1C(=O)O`
- InChI: `InChI=1S/C9H8O4/...`

[2D structure image if available]

Bioactivity Profile

Summary
- Active in: [X] assays out of [Y] tested
- Primary Targets: [list top targets]
- Mechanism: [if known]

Key Target Interactions (from ChEMBL)
| Target | Activity Type | Value | Units |
|---|---|---|---|
| [Target 1] | IC50 | [value] | nM |
| [Target 2] | Ki | [value] | nM |

Drug Information (if applicable)

Clinical Status
| Property | Value |
|---|---|
| Approval Status | [Approved/Investigational/N/A] |
| Drug Class | [therapeutic class] |
| Indication | [approved uses] |
| Route | [oral/IV/topical/etc.] |

Safety
- Black Box Warning: [Yes/No]
- Major Interactions: [if any]

Related Compounds (if retrieved)
Top 5 structurally similar compounds:
| CID | Name | Similarity | Key Difference |
|---|---|---|---|
| [cid] | [name] | 95% | [note] |

Data Sources
- PubChem: [CID link]
- ChEMBL: [ChEMBL ID link]
- Retrieved: [date]

---
```

Data Quality Tiers
Apply these tiers when assessing data completeness:
| Tier | Symbol | Criteria |
|---|---|---|
| Complete | ●●● | All core properties + bioactivity + drug info |
| Substantial | ●●○ | Core properties + bioactivity OR drug info |
| Basic | ●○○ | Core properties only |
| Minimal | ○○○ | CID/name only, limited data |
Include in report header:
```markdown
**Data Completeness**: ●●● Complete (properties, bioactivity, drug data)
```
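The tier table maps directly onto three yes/no checks. The sketch below shows one possible mapping. The function names are hypothetical, and treating "no core properties" as Minimal regardless of other data is an assumption the table does not spell out.

```python
TIERS = {
    "Complete": "●●●",
    "Substantial": "●●○",
    "Basic": "●○○",
    "Minimal": "○○○",
}


def completeness_tier(has_properties: bool, has_bioactivity: bool, has_drug_info: bool) -> str:
    """Map the available data onto one of the four tier names."""
    if has_properties and has_bioactivity and has_drug_info:
        return "Complete"
    if has_properties and (has_bioactivity or has_drug_info):
        return "Substantial"
    if has_properties:
        return "Basic"
    # Assumption: without core properties the profile is Minimal.
    return "Minimal"


def completeness_header(has_properties: bool, has_bioactivity: bool, has_drug_info: bool) -> str:
    """Build the report header line for the computed tier."""
    tier = completeness_tier(has_properties, has_bioactivity, has_drug_info)
    return f"**Data Completeness**: {TIERS[tier]} {tier}"
```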
Completeness Checklist
Every compound profile MUST include these sections (even if "unavailable"):
Identity (Required)
- PubChem CID
- ChEMBL ID (or "N/A")
- IUPAC name
- Canonical SMILES
Properties (Required)
- Molecular formula
- Molecular weight
- LogP
- Lipinski rule assessment
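The Lipinski assessment can be computed from four of the retrieved properties. The sketch below uses the standard rule-of-five thresholds. The function names are hypothetical, and reading "at most one violation" as a pass is a common convention rather than something this document specifies.

```python
def lipinski_violations(mol_weight: float, logp: float, h_donors: int, h_acceptors: int) -> list[str]:
    """Return the rule-of-five criteria that the compound violates."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if logp > 5:
        violations.append("LogP > 5")
    if h_donors > 5:
        violations.append("H-bond donors > 5")
    if h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def lipinski_assessment(mol_weight: float, logp: float, h_donors: int, h_acceptors: int) -> str:
    """Summarize the result as a short string for the report."""
    count = len(lipinski_violations(mol_weight, logp, h_donors, h_acceptors))
    # Convention assumed here: at most one violation still passes.
    status = "Pass" if count <= 1 else "Fail"
    return f"{status}, violations: {count}"
```

With the aspirin values from the report template (180.16 g/mol, LogP 1.19, 1 donor, 4 acceptors), no criterion is violated.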
Bioactivity (Required)
- Activity summary (or "No bioactivity data")
- Primary targets (or "Unknown")
Drug Info (If Approved Drug)
- Approval status
- Indication
- Drug class
Always Include
- Data sources with links
- Retrieval date
- Quality tier assessment
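The Data Sources block can be generated from the identifiers resolved in Phase 1. The sketch below is illustrative: `data_sources_section` is a hypothetical helper, and the two URL patterns are assumptions based on the public PubChem and ChEMBL compound pages, so verify them before relying on the links.

```python
from datetime import date
from typing import Optional


def data_sources_section(cid: int, chembl_id: Optional[str], retrieved: date) -> str:
    """Build the Data Sources list, writing N/A when there is no ChEMBL ID."""
    # URL patterns are assumptions, not taken from this document.
    lines = [f"- PubChem: https://pubchem.ncbi.nlm.nih.gov/compound/{cid}"]
    if chembl_id:
        lines.append(
            f"- ChEMBL: https://www.ebi.ac.uk/chembl/compound_report_card/{chembl_id}/"
        )
    else:
        lines.append("- ChEMBL: N/A")
    lines.append(f"- Retrieved: {retrieved.isoformat()}")
    return "\n".join(lines)
```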
Common Use Cases
Drug Property Check
User: "Tell me about metformin"
→ Full compound profile with drug information emphasis
Structure Verification
User: "Verify this SMILES: CC(=O)Oc1ccccc1C(=O)O"
→ Disambiguation-focused profile, confirm identity
SAR Analysis
User: "Find compounds similar to ibuprofen"
→ Similarity search + comparative property table
Target Identification
User: "What proteins does gefitinib target?"
→ ChEMBL bioactivity emphasis with target list
Error Handling
| Error | Response |
|---|---|
| "Compound not found" | Try synonyms, verify spelling, offer SMILES search |
| "No ChEMBL ID" | Note in Identity section, continue with PubChem data |
| "No bioactivity data" | Include section with "No bioactivity screening data available" |
| "API timeout" | Retry once, note unavailable data with "(retrieval failed)" |
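The "retry once, then mark as failed" row can be sketched as a small wrapper. This is a sketch under assumptions: `call_with_retry` is a hypothetical helper, and it assumes a timeout surfaces as Python's built-in `TimeoutError`. A real tool call may signal timeouts differently, for example through an error field in the returned dict.

```python
from typing import Any, Callable, Tuple


def call_with_retry(fetch: Callable[[], Any], retries: int = 1) -> Tuple[Any, str]:
    """Call fetch, retrying on TimeoutError; return (value, note).

    The note is empty on success and "(retrieval failed)" when every
    attempt timed out, so it can be appended to the affected report field.
    """
    for _ in range(retries + 1):
        try:
            return fetch(), ""
        except TimeoutError:
            continue  # assumption: timeouts raise TimeoutError
    return None, "(retrieval failed)"
```

Other exceptions are deliberately not caught, so a genuine bug is not reported as missing data.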
Tool Reference
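PubChem (Chemical Database)
| Tool | Purpose |
|---|---|
| `PubChem_get_CID_by_compound_name` | Name → CID |
| `PubChem_get_CID_by_SMILES` | Structure → CID |
| `PubChem_get_compound_properties_by_CID` | Molecular properties |
| `PubChem_get_compound_2D_image_by_CID` | Structure visualization |
| `PubChem_get_bioactivity_summary_by_CID` | Activity overview |
| `PubChem_get_drug_label_info_by_CID` | FDA drug labels |
| `PubChem_get_associated_patents_by_CID` | IP information |
| `PubChem_search_compounds_by_similarity` | Find analogs |
|  | Substructure search |

ChEMBL (Bioactivity Database)
| Tool | Purpose |
|---|---|
| `ChEMBL_search_compounds` | Name/structure search |
|  | Compound details |
| `ChEMBL_get_bioactivity_by_chemblid` | Activity data |
| `ChEMBL_get_target_by_chemblid` | Protein targets |
|  | Target search |
| `ChEMBL_get_assays_by_chemblid` | Assay metadata |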