pubchem-database
PubChem Database
Overview
PubChem is the world's largest freely available chemical database, with 110M+ compounds and 270M+ bioactivities. Using the PUG-REST API and PubChemPy, you can query chemical structures by name, CID, or SMILES, retrieve molecular properties, perform similarity and substructure searches, and access bioactivity data.
When to Use This Skill
This skill should be used when:
- Searching for chemical compounds by name, structure (SMILES/InChI), or molecular formula
- Retrieving molecular properties (MW, LogP, TPSA, hydrogen bonding descriptors)
- Performing similarity searches to find structurally related compounds
- Conducting substructure searches for specific chemical motifs
- Accessing bioactivity data from screening assays
- Converting between chemical identifier formats (CID, SMILES, InChI)
- Batch processing multiple compounds for drug-likeness screening or property analysis
Core Capabilities
1. Chemical Structure Search
Search for compounds using multiple identifier types:

**By Chemical Name**:
```python
import pubchempy as pcp

compounds = pcp.get_compounds('aspirin', 'name')
compound = compounds[0]
```

**By CID (Compound ID)**:
```python
compound = pcp.Compound.from_cid(2244)  # Aspirin
```

**By SMILES**:
```python
compound = pcp.get_compounds('CC(=O)OC1=CC=CC=C1C(=O)O', 'smiles')[0]
```

**By InChI**:
```python
compound = pcp.get_compounds('InChI=1S/C9H8O4/...', 'inchi')[0]
```

**By Molecular Formula**:
```python
compounds = pcp.get_compounds('C9H8O4', 'formula')
# Returns all compounds matching this formula
```
2. Property Retrieval
Retrieve molecular properties for compounds using either high-level or low-level approaches:

**Using PubChemPy (Recommended)**:
```python
import pubchempy as pcp

# Get compound object with all properties
compound = pcp.get_compounds('caffeine', 'name')[0]

# Access individual properties
molecular_formula = compound.molecular_formula
molecular_weight = compound.molecular_weight
iupac_name = compound.iupac_name
smiles = compound.canonical_smiles
inchi = compound.inchi
xlogp = compound.xlogp  # Partition coefficient
tpsa = compound.tpsa  # Topological polar surface area
```

**Get Specific Properties**:
```python
import pubchempy as pcp

# Request only specific properties
properties = pcp.get_properties(
    ['MolecularFormula', 'MolecularWeight', 'CanonicalSMILES', 'XLogP'],
    'aspirin',
    'name'
)
# Returns list of dictionaries
```

**Batch Property Retrieval**:
```python
import pandas as pd
import pubchempy as pcp

compound_names = ['aspirin', 'ibuprofen', 'paracetamol']
all_properties = []
for name in compound_names:
    props = pcp.get_properties(
        ['MolecularFormula', 'MolecularWeight', 'XLogP'],
        name,
        'name'
    )
    all_properties.extend(props)
df = pd.DataFrame(all_properties)
```

**Available Properties**: MolecularFormula, MolecularWeight, CanonicalSMILES, IsomericSMILES, InChI, InChIKey, IUPACName, XLogP, TPSA, HBondDonorCount, HBondAcceptorCount, RotatableBondCount, Complexity, Charge, and many more (see `references/api_reference.md` for the complete list).
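When the CIDs are already known, several compounds can be covered by a single PUG-REST property request instead of one request per name. The sketch below only builds such a URL and makes no network call. The helper name `build_property_url` is hypothetical, and the CID 3672 (ibuprofen) is added here for illustration.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(cids, properties, output="JSON"):
    """Build one PUG-REST property URL covering several CIDs (hypothetical helper)."""
    if not cids or not properties:
        raise ValueError("cids and properties must both be non-empty")
    # CIDs and property names are comma-separated inside the URL path
    cid_part = ",".join(str(int(cid)) for cid in cids)
    prop_part = ",".join(quote(prop, safe="") for prop in properties)
    return f"{PUG_REST_BASE}/compound/cid/{cid_part}/property/{prop_part}/{output}"


# 2244 is aspirin; 3672 (ibuprofen) is an extra CID used only for illustration
url = build_property_url([2244, 3672], ["MolecularWeight", "XLogP"])
print(url)
```

The resulting URL can be passed to `requests.get` in the same way as the other direct API examples in this document.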
3. Similarity Search
Find structurally similar compounds using Tanimoto similarity:
```python
import pubchempy as pcp

# Start with a query compound
query_compound = pcp.get_compounds('gefitinib', 'name')[0]
query_smiles = query_compound.canonical_smiles

# Perform similarity search
similar_compounds = pcp.get_compounds(
    query_smiles,
    'smiles',
    searchtype='similarity',
    Threshold=85,  # Similarity threshold (0-100)
    MaxRecords=50
)

# Process results
for compound in similar_compounds[:10]:
    print(f"CID {compound.cid}: {compound.iupac_name}")
    print(f"    MW: {compound.molecular_weight}")
```

**Note**: Similarity searches are asynchronous for large queries and may take 15-30 seconds to complete. PubChemPy handles the asynchronous pattern automatically.
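To make the `Threshold` value more concrete, the following sketch computes a Tanimoto coefficient by hand: the number of shared fingerprint bits divided by the number of bits set in either fingerprint. The bit sets are toy values invented for illustration, not real PubChem fingerprints, and the function name `tanimoto` is hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two collections of 'on' fingerprint bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Toy fingerprints (made-up bit positions, for illustration only)
query_bits = {1, 4, 7, 9, 12, 15}
candidate_bits = {1, 4, 7, 9, 12, 20}

score = tanimoto(query_bits, candidate_bits)  # 5 shared bits / 7 bits in the union
print(f"Tanimoto: {score:.2f}")               # prints "Tanimoto: 0.71"
print("Passes Threshold=85:", score * 100 >= 85)  # prints "Passes Threshold=85: False"
```

Under this reading, a threshold of 85 corresponds to a coefficient of at least 0.85, so the toy candidate above would not be returned.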
4. Substructure Search
Find compounds containing a specific structural motif:
```python
import pubchempy as pcp

# Search for compounds containing pyridine ring
pyridine_smiles = 'c1ccncc1'
matches = pcp.get_compounds(
    pyridine_smiles,
    'smiles',
    searchtype='substructure',
    MaxRecords=100
)
print(f"Found {len(matches)} compounds containing pyridine")
```

**Common Substructures**:
- Benzene ring: `c1ccccc1`
- Pyridine: `c1ccncc1`
- Phenol: `c1ccc(O)cc1`
- Carboxylic acid: `C(=O)O`
5. Format Conversion
Convert between different chemical structure formats:
```python
import pubchempy as pcp

compound = pcp.get_compounds('aspirin', 'name')[0]

# Convert to different formats
smiles = compound.canonical_smiles
inchi = compound.inchi
inchikey = compound.inchikey
cid = compound.cid

# Download structure files
# Argument order is (outformat, path, identifier, namespace)
pcp.download('SDF', 'aspirin.sdf', 'aspirin', 'name', overwrite=True)
pcp.download('JSON', 'aspirin.json', 2244, 'cid', overwrite=True)
```
6. Structure Visualization
Generate 2D structure images:
```python
import pubchempy as pcp
import requests

# Download compound structure as PNG
# Argument order is (outformat, path, identifier, namespace)
pcp.download('PNG', 'caffeine.png', 'caffeine', 'name', overwrite=True)

# Using direct URL (via requests)
cid = 2244  # Aspirin
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/PNG?image_size=large"
response = requests.get(url)
response.raise_for_status()  # Fail early instead of writing an error page to disk
with open('structure.png', 'wb') as f:
    f.write(response.content)
```
7. Synonym Retrieval
Get all known names and synonyms for a compound:
```python
import pubchempy as pcp

synonyms_data = pcp.get_synonyms('aspirin', 'name')
if synonyms_data:
    cid = synonyms_data[0]['CID']
    synonyms = synonyms_data[0]['Synonym']
    print(f"CID {cid} has {len(synonyms)} synonyms:")
    for syn in synonyms[:10]:  # First 10
        print(f"  - {syn}")
```
8. Bioactivity Data Access
Retrieve biological activity data from assays:
```python
import requests

# Get bioassay summary for a compound
cid = 2244  # Aspirin
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/assaysummary/JSON"
response = requests.get(url)
if response.status_code == 200:
    data = response.json()
    # Process bioassay information
    table = data.get('Table', {})
    rows = table.get('Row', [])
    print(f"Found {len(rows)} bioassay records")
```

**For more complex bioactivity queries**, use the `scripts/bioactivity_query.py` helper script, which provides:
- Bioassay summaries with activity outcome filtering
- Assay target identification
- Search for compounds by biological target
- Active compound lists for specific assays
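As a rough illustration of what "processing" the assay summary might look like, the sketch below tallies rows by activity outcome. It assumes a response shape in which `Table` holds a `Columns.Column` list of column names and each `Row` holds a matching `Cell` list, and that one column is named `Activity Outcome`. Both are assumptions to check against a real response, and the sample data is hand-written rather than taken from PubChem.

```python
from collections import Counter


def count_activity_outcomes(data, column_name="Activity Outcome"):
    """Count rows per activity outcome in an assay-summary style table (assumed layout)."""
    table = data.get("Table", {})
    columns = table.get("Columns", {}).get("Column", [])
    if column_name not in columns:
        return Counter()  # column missing: nothing to count
    idx = columns.index(column_name)
    outcomes = Counter()
    for row in table.get("Row", []):
        cells = row.get("Cell", [])
        if idx < len(cells):
            outcomes[cells[idx] or "Unspecified"] += 1
    return outcomes


# Hand-written sample in the assumed layout (not real PubChem data)
sample = {
    "Table": {
        "Columns": {"Column": ["AID", "Activity Outcome"]},
        "Row": [
            {"Cell": [1, "Active"]},
            {"Cell": [2, "Inactive"]},
            {"Cell": [3, "Inactive"]},
        ],
    }
}
print(dict(count_activity_outcomes(sample)))  # prints {'Active': 1, 'Inactive': 2}
```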
9. Comprehensive Compound Annotations
Access detailed compound information through PUG-View:
```python
import requests

cid = 2244
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/{cid}/JSON"
response = requests.get(url)
if response.status_code == 200:
    annotations = response.json()
    # Contains extensive data including:
    # - Chemical and Physical Properties
    # - Drug and Medication Information
    # - Pharmacology and Biochemistry
    # - Safety and Hazards
    # - Toxicity
    # - Literature references
    # - Patents
```

**Get Specific Section**:
```python
# Get only drug information (spaces in the heading are encoded as '+')
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/{cid}/JSON?heading=Drug+and+Medication+Information"
```
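PUG-View responses are deeply nested, so a quick way to see which sections are present is to walk the tree and list the headings. The sketch below assumes the JSON has a top-level `Record` whose `Section` entries carry a `TOCHeading` and optional nested `Section` lists. Treat these key names as assumptions to verify against an actual response. The sample record is hand-written, and `list_headings` is a hypothetical helper.

```python
def list_headings(sections, depth=0):
    """Yield (depth, heading) pairs from a nested list of PUG-View style sections."""
    for section in sections:
        heading = section.get("TOCHeading")
        if heading:
            yield depth, heading
        # Recurse into subsections, if any
        yield from list_headings(section.get("Section", []), depth + 1)


# Hand-written sample in the assumed layout (not a real PubChem response)
sample = {
    "Record": {
        "RecordNumber": 2244,
        "Section": [
            {
                "TOCHeading": "Drug and Medication Information",
                "Section": [{"TOCHeading": "Drug Indication"}],
            },
            {"TOCHeading": "Safety and Hazards"},
        ],
    }
}

for depth, heading in list_headings(sample["Record"]["Section"]):
    print("  " * depth + heading)
```

With a real response, the same call would be made on `annotations["Record"]["Section"]`, provided the layout matches the assumption above.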
Installation Requirements
Install PubChemPy for Python-based access:
```bash
uv pip install pubchempy
```

For direct API access and bioactivity queries:
```bash
uv pip install requests
```

Optional for data analysis:
```bash
uv pip install pandas
```
Helper Scripts
This skill includes Python scripts for common PubChem tasks:
scripts/compound_search.py
Provides utility functions for searching and retrieving compound information:

**Key Functions**:
- `search_by_name(name, max_results=10)`: Search compounds by name
- `search_by_smiles(smiles)`: Search by SMILES string
- `get_compound_by_cid(cid)`: Retrieve compound by CID
- `get_compound_properties(identifier, namespace, properties)`: Get specific properties
- `similarity_search(smiles, threshold, max_records)`: Perform similarity search
- `substructure_search(smiles, max_records)`: Perform substructure search
- `get_synonyms(identifier, namespace)`: Get all synonyms
- `batch_search(identifiers, namespace, properties)`: Batch search multiple compounds
- `download_structure(identifier, namespace, format, filename)`: Download structures
- `print_compound_info(compound)`: Print formatted compound information

**Usage**:
```python
from scripts.compound_search import search_by_name, get_compound_properties

# Search for a compound
compounds = search_by_name('ibuprofen')

# Get specific properties
props = get_compound_properties('aspirin', 'name', ['MolecularWeight', 'XLogP'])
```
scripts/bioactivity_query.py
Provides functions for retrieving biological activity data:

**Key Functions**:
- `get_bioassay_summary(cid)`: Get bioassay summary for compound
- `get_compound_bioactivities(cid, activity_outcome)`: Get filtered bioactivities
- `get_assay_description(aid)`: Get detailed assay information
- `get_assay_targets(aid)`: Get biological targets for assay
- `search_assays_by_target(target_name, max_results)`: Find assays by target
- `get_active_compounds_in_assay(aid, max_results)`: Get active compounds
- `get_compound_annotations(cid, section)`: Get PUG-View annotations
- `summarize_bioactivities(cid)`: Generate bioactivity summary statistics
- `find_compounds_by_bioactivity(target, threshold, max_compounds)`: Find compounds by target

**Usage**:
```python
from scripts.bioactivity_query import get_bioassay_summary, summarize_bioactivities

# Get bioactivity summary
summary = summarize_bioactivities(2244)  # Aspirin
print(f"Total assays: {summary['total_assays']}")
print(f"Active: {summary['active']}, Inactive: {summary['inactive']}")
```
API Rate Limits and Best Practices
Rate Limits:
- Maximum 5 requests per second
- Maximum 400 requests per minute
- Maximum 300 seconds running time per minute
Best Practices:
- Use CIDs for repeated queries: CIDs are more efficient than names or structures
- Cache results locally: Store frequently accessed data
- Batch requests: Combine multiple queries when possible
- Implement delays: Add 0.2-0.3 second delays between requests
- Handle errors gracefully: Check for HTTP errors and missing data
- Use PubChemPy: Higher-level abstraction handles many edge cases
- Leverage asynchronous pattern: For large similarity/substructure searches
- Specify MaxRecords: Limit results to avoid timeouts
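Two of the practices above, adding delays and caching results, can be combined in a small wrapper. This is a minimal sketch rather than part of the skill's scripts: the names `Throttle` and `make_cached_fetcher` are hypothetical, the cache is in-memory only, and a stand-in fetch function replaces real network calls so the block runs offline.

```python
import time
from functools import lru_cache


class Throttle:
    """Enforce a minimum interval between successive calls (hypothetical helper)."""

    def __init__(self, min_interval=0.25, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def make_cached_fetcher(fetch, throttle, maxsize=1024):
    """Wrap fetch(key) so repeated keys hit an in-memory cache and new keys are throttled."""

    @lru_cache(maxsize=maxsize)
    def cached(key):
        throttle.wait()
        return fetch(key)

    return cached


# Stand-in for a real request function, so the example makes no network calls
calls = []


def fake_fetch(cid):
    calls.append(cid)
    return {"CID": cid}


fetch_cid = make_cached_fetcher(fake_fetch, Throttle(min_interval=0.25))
fetch_cid(2244)
fetch_cid(2244)  # second call is answered from the cache
print(f"Underlying fetches: {len(calls)}")  # prints "Underlying fetches: 1"
```

A 0.25 second interval sits inside the 0.2-0.3 second range suggested above. Keying the cache on CIDs also follows the advice to prefer CIDs for repeated queries.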
Error Handling:
```python
import pubchempy as pcp
from pubchempy import BadRequestError, NotFoundError, TimeoutError

try:
    compound = pcp.get_compounds('query', 'name')[0]
except NotFoundError:
    print("Compound not found")
except BadRequestError:
    print("Invalid request format")
except TimeoutError:
    print("Request timed out - try reducing scope")
except IndexError:
    print("No results returned")
```
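For transient failures such as timeouts, one common option is to retry with an increasing delay before giving up. The sketch below is a generic helper, not something PubChemPy provides: `call_with_retries` and its parameters are hypothetical, and a simulated flaky function stands in for a real query.

```python
import time


def call_with_retries(func, *args, retries=3, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func, retrying on the given exceptions with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ... by default


# Simulated flaky call: fails twice, then succeeds (no network involved)
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retries(flaky, retry_on=(TimeoutError,), sleep=lambda s: None)
print(result, attempts["n"])  # prints "ok 3"

# With PubChemPy, the same helper might be used as (assumption, not run here):
# call_with_retries(pcp.get_compounds, 'aspirin', 'name', retry_on=(pcp.TimeoutError,))
```

Restricting `retry_on` to timeout-style errors avoids retrying requests that fail for permanent reasons, such as a malformed query.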
Common Workflows
Workflow 1: Chemical Identifier Conversion Pipeline
Convert between different chemical identifiers:
```python
import pubchempy as pcp

# Start with any identifier type
compound = pcp.get_compounds('caffeine', 'name')[0]

# Extract all identifier formats
identifiers = {
    'CID': compound.cid,
    'Name': compound.iupac_name,
    'SMILES': compound.canonical_smiles,
    'InChI': compound.inchi,
    'InChIKey': compound.inchikey,
    'Formula': compound.molecular_formula
}
```
Workflow 2: Drug-Like Property Screening
Screen compounds using Lipinski's Rule of Five:
```python
import pubchempy as pcp

def check_drug_likeness(compound_name):
    compound = pcp.get_compounds(compound_name, 'name')[0]
    # Lipinski's Rule of Five
    rules = {
        # float() guards against PubChemPy versions that return the weight as a string
        'MW <= 500': float(compound.molecular_weight) <= 500,
        # Compare against None so that a legitimate XLogP of 0 is still evaluated
        'LogP <= 5': compound.xlogp <= 5 if compound.xlogp is not None else None,
        'HBD <= 5': compound.h_bond_donor_count <= 5,
        'HBA <= 10': compound.h_bond_acceptor_count <= 10
    }
    violations = sum(1 for v in rules.values() if v is False)
    return rules, violations

rules, violations = check_drug_likeness('aspirin')
print(f"Lipinski violations: {violations}")
```
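For batch screening, the same rules can be applied to property dictionaries of the kind `get_properties` returns, without building a full compound object for each entry. The sketch below is one possible shape for that. `lipinski_violations` is a hypothetical helper, the property keys are taken from the Available Properties list, and the two records are made-up values for illustration rather than real PubChem data.

```python
def to_float(value):
    """Convert a property value to float, returning None if missing or unparseable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def lipinski_violations(props):
    """Return (violation_count, missing_keys) for one property dictionary."""
    limits = {
        "MolecularWeight": 500,
        "XLogP": 5,
        "HBondDonorCount": 5,
        "HBondAcceptorCount": 10,
    }
    violations, missing = 0, []
    for key, limit in limits.items():
        value = to_float(props.get(key))
        if value is None:
            missing.append(key)  # unknown values are reported, not counted as violations
        elif value > limit:
            violations += 1
    return violations, missing


# Made-up records for illustration (weights given as strings on purpose)
records = [
    {"CID": 1, "MolecularWeight": "180.16", "XLogP": 1.2,
     "HBondDonorCount": 1, "HBondAcceptorCount": 4},
    {"CID": 2, "MolecularWeight": "812.0",
     "HBondDonorCount": 6, "HBondAcceptorCount": 12},
]
for rec in records:
    print(rec["CID"], lipinski_violations(rec))
# prints:
# 1 (0, [])
# 2 (3, ['XLogP'])
```

Reporting missing properties separately keeps a compound with no XLogP value from silently passing or failing the screen.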
Workflow 3: Finding Similar Drug Candidates
Identify structurally similar compounds to a known drug:
```python
import pubchempy as pcp

# Start with known drug
reference_drug = pcp.get_compounds('imatinib', 'name')[0]
reference_smiles = reference_drug.canonical_smiles

# Find similar compounds
similar = pcp.get_compounds(
    reference_smiles,
    'smiles',
    searchtype='similarity',
    Threshold=85,
    MaxRecords=20
)

# Filter by drug-like properties
candidates = []
for comp in similar:
    mw, xlogp = comp.molecular_weight, comp.xlogp
    # Skip compounds with missing values; "is None" keeps a legitimate XLogP of 0
    if mw is None or xlogp is None:
        continue
    # float() guards against PubChemPy versions that return the weight as a string
    if 200 <= float(mw) <= 600 and -1 <= xlogp <= 5:
        candidates.append(comp)
print(f"Found {len(candidates)} drug-like candidates")
```
Workflow 4: Batch Compound Property Comparison
Compare properties across multiple compounds:
```python
import pubchempy as pcp
import pandas as pd

compound_list = ['aspirin', 'ibuprofen', 'naproxen', 'celecoxib']
properties_list = []
for name in compound_list:
    try:
        compound = pcp.get_compounds(name, 'name')[0]
        properties_list.append({
            'Name': name,
            'CID': compound.cid,
            'Formula': compound.molecular_formula,
            'MW': compound.molecular_weight,
            'LogP': compound.xlogp,
            'TPSA': compound.tpsa,
            'HBD': compound.h_bond_donor_count,
            'HBA': compound.h_bond_acceptor_count
        })
    except Exception as e:
        print(f"Error processing {name}: {e}")
df = pd.DataFrame(properties_list)
print(df.to_string(index=False))
```
Workflow 5: Substructure-Based Virtual Screening
Screen for compounds containing specific pharmacophores:
```python
import pubchempy as pcp

# Define pharmacophore (e.g., sulfonamide group)
pharmacophore_smiles = 'S(=O)(=O)N'

# Search for compounds containing this substructure
hits = pcp.get_compounds(
    pharmacophore_smiles,
    'smiles',
    searchtype='substructure',
    MaxRecords=100
)

# Further filter by properties
# float() guards against PubChemPy versions that return the weight as a string
filtered_hits = [
    comp for comp in hits
    if comp.molecular_weight and float(comp.molecular_weight) < 500
]
print(f"Found {len(filtered_hits)} compounds with desired substructure")
```
Reference Documentation
For detailed API documentation, including complete property lists, URL patterns, advanced query options, and more examples, consult `references/api_reference.md`. This comprehensive reference includes:
- Complete PUG-REST API endpoint documentation
- Full list of available molecular properties
- Asynchronous request handling patterns
- PubChemPy API reference
- PUG-View API for annotations
- Common workflows and use cases
- Links to official PubChem documentation
Troubleshooting
Compound Not Found:
- Try alternative names or synonyms
- Use CID if known
- Check spelling and chemical name format
Timeout Errors:
- Reduce MaxRecords parameter
- Add delays between requests
- Use CIDs instead of names for faster queries
Empty Property Values:
- Not all properties are available for all compounds
- Check if property exists before accessing: `if compound.xlogp is not None:`
- Some properties are only available for certain compound types
Rate Limit Exceeded:
- Implement delays (0.2-0.3 seconds) between requests
- Use batch operations where possible
- Consider caching results locally
Similarity/Substructure Search Hangs:
- These are asynchronous operations that may take 15-30 seconds
- PubChemPy handles polling automatically
- Reduce MaxRecords if timing out
Additional Resources
- PubChem Home: https://pubchem.ncbi.nlm.nih.gov/
- PUG-REST Documentation: https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest
- PUG-REST Tutorial: https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest-tutorial
- PubChemPy Documentation: https://pubchempy.readthedocs.io/
- PubChemPy GitHub: https://github.com/mcs07/PubChemPy