pubchem-database

PubChem Database


Overview


PubChem is the world's largest freely available chemical database, with 110M+ compounds and 270M+ bioactivities. Query chemical structures by name, CID, or SMILES; retrieve molecular properties; perform similarity and substructure searches; and access bioactivity data through the PUG-REST API and PubChemPy.

When to Use This Skill


This skill should be used when:
  • Searching for chemical compounds by name, structure (SMILES/InChI), or molecular formula
  • Retrieving molecular properties (MW, LogP, TPSA, hydrogen bonding descriptors)
  • Performing similarity searches to find structurally related compounds
  • Conducting substructure searches for specific chemical motifs
  • Accessing bioactivity data from screening assays
  • Converting between chemical identifier formats (CID, SMILES, InChI)
  • Batch processing multiple compounds for drug-likeness screening or property analysis

Core Capabilities


1. Chemical Structure Search

Search for compounds using multiple identifier types:

By Chemical Name:
```python
import pubchempy as pcp

compounds = pcp.get_compounds('aspirin', 'name')
compound = compounds[0]
```

By CID (Compound ID):
```python
compound = pcp.Compound.from_cid(2244)  # Aspirin
```

By SMILES:
```python
compound = pcp.get_compounds('CC(=O)OC1=CC=CC=C1C(=O)O', 'smiles')[0]
```

By InChI:
```python
compound = pcp.get_compounds('InChI=1S/C9H8O4/...', 'inchi')[0]
```

By Molecular Formula:
```python
compounds = pcp.get_compounds('C9H8O4', 'formula')
# Returns all compounds matching this formula
```
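
Several of the lookups above index the result with `[0]`, which raises `IndexError` when a search returns nothing. As a minimal sketch, the hypothetical helper `first_match` below wraps any search callable and returns `None` instead. The `fake_search` stub is an assumption made so the example runs offline; in practice `pcp.get_compounds` could be passed in its place.

```python
def first_match(search, identifier, namespace):
    """Return the first search result, or None when the search comes back empty."""
    results = search(identifier, namespace)
    return results[0] if results else None


def fake_search(identifier, namespace):
    """Offline stand-in for pcp.get_compounds (illustrative only)."""
    known = {("aspirin", "name"): ["compound-2244"]}
    return known.get((identifier, namespace), [])


print(first_match(fake_search, "aspirin", "name"))
print(first_match(fake_search, "not-a-compound", "name"))
```

With PubChemPy installed, the same call shape would be `first_match(pcp.get_compounds, 'aspirin', 'name')`.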

2. Property Retrieval

Retrieve molecular properties for compounds using either high-level or low-level approaches:

Using PubChemPy (Recommended):
```python
import pubchempy as pcp

# Get compound object with all properties
compound = pcp.get_compounds('caffeine', 'name')[0]

# Access individual properties
molecular_formula = compound.molecular_formula
molecular_weight = compound.molecular_weight
iupac_name = compound.iupac_name
smiles = compound.canonical_smiles
inchi = compound.inchi
xlogp = compound.xlogp  # Partition coefficient
tpsa = compound.tpsa  # Topological polar surface area
```

**Get Specific Properties**:
```python
# Request only specific properties
properties = pcp.get_properties(
    ['MolecularFormula', 'MolecularWeight', 'CanonicalSMILES', 'XLogP'],
    'aspirin',
    'name'
)
# Returns list of dictionaries
```

**Batch Property Retrieval**:
```python
import pandas as pd
import pubchempy as pcp

compound_names = ['aspirin', 'ibuprofen', 'paracetamol']
all_properties = []

for name in compound_names:
    props = pcp.get_properties(
        ['MolecularFormula', 'MolecularWeight', 'XLogP'],
        name,
        'name'
    )
    all_properties.extend(props)

df = pd.DataFrame(all_properties)
```

Available Properties: MolecularFormula, MolecularWeight, CanonicalSMILES, IsomericSMILES, InChI, InChIKey, IUPACName, XLogP, TPSA, HBondDonorCount, HBondAcceptorCount, RotatableBondCount, Complexity, Charge, and many more (see `references/api_reference.md` for the complete list).
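
Property values in the returned list of dictionaries are not guaranteed to be numeric; `MolecularWeight` in particular may arrive as a string. The sketch below shows one way to coerce selected fields to `float` before analysis. The helper name `normalize_properties`, the choice of fields, and the hand-written sample rows are all assumptions for illustration.

```python
NUMERIC_FIELDS = ("MolecularWeight", "XLogP", "TPSA")


def normalize_properties(rows, numeric_fields=NUMERIC_FIELDS):
    """Copy each property dict, converting the selected fields to float.

    Fields that are missing or cannot be parsed are set to None.
    """
    cleaned = []
    for row in rows:
        out = dict(row)  # copy so the input rows are not mutated
        for field in numeric_fields:
            value = out.get(field)
            try:
                out[field] = float(value) if value is not None else None
            except (TypeError, ValueError):
                out[field] = None
        cleaned.append(out)
    return cleaned


# Hand-written rows in the list-of-dicts shape (second row is a made-up edge case).
sample = [
    {"CID": 2244, "MolecularFormula": "C9H8O4", "MolecularWeight": "180.16", "XLogP": 1.2},
    {"CID": 0, "MolecularFormula": "", "MolecularWeight": "n/a"},
]
rows = normalize_properties(sample)
print(rows)
```

The cleaned rows can be passed straight to `pd.DataFrame(rows)`, where the `None` entries become missing values.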

3. Similarity Search

Find structurally similar compounds using Tanimoto similarity:
```python
import pubchempy as pcp

# Start with a query compound
query_compound = pcp.get_compounds('gefitinib', 'name')[0]
query_smiles = query_compound.canonical_smiles

# Perform similarity search
similar_compounds = pcp.get_compounds(
    query_smiles,
    'smiles',
    searchtype='similarity',
    Threshold=85,  # Similarity threshold (0-100)
    MaxRecords=50
)

# Process results
for compound in similar_compounds[:10]:
    print(f"CID {compound.cid}: {compound.iupac_name}")
    print(f"  MW: {compound.molecular_weight}")
```

**Note**: Similarity searches are asynchronous for large queries and may take 15-30 seconds to complete. PubChemPy handles the asynchronous pattern automatically.
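
To make the `Threshold` value concrete, here is a toy calculation of the Tanimoto coefficient: the number of fingerprint bits two molecules share, divided by the number of bits set in either. PubChem computes this server-side on its own fingerprints; the bit positions below are invented purely for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Invented bit positions, not real PubChem fingerprints.
fp_query = {1, 4, 7, 9, 12}
fp_candidate = {1, 4, 7, 12, 15}

score = tanimoto(fp_query, fp_candidate)
print(f"Tanimoto = {score:.2f} ({score * 100:.0f} on the 0-100 Threshold scale)")
```

Here 4 shared bits out of 6 distinct bits give about 0.67, so this pair would not pass a `Threshold=85` search.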

4. Substructure Search

Find compounds containing a specific structural motif:
```python
import pubchempy as pcp

# Search for compounds containing pyridine ring
pyridine_smiles = 'c1ccncc1'
matches = pcp.get_compounds(
    pyridine_smiles,
    'smiles',
    searchtype='substructure',
    MaxRecords=100
)
print(f"Found {len(matches)} compounds containing pyridine")
```

**Common Substructures**:
- Benzene ring: `c1ccccc1`
- Pyridine: `c1ccncc1`
- Phenol: `c1ccc(O)cc1`
- Carboxylic acid: `C(=O)O`
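
For direct API access, a substructure query can also be expressed as a PUG-REST URL. The sketch below assumes the synchronous `fastsubstructure` endpoint and percent-encodes the SMILES in the URL path; the helper name `build_substructure_url` is hypothetical, and SMILES with characters that are awkward in a path may need a different way of passing the structure.

```python
from urllib.parse import quote, urlencode

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_substructure_url(smiles, max_records=100):
    """Build a 'fastsubstructure' URL that returns matching CIDs as JSON."""
    # Percent-encode the SMILES so characters such as '(' and '=' are safe in the path.
    path = f"{PUG_REST_BASE}/compound/fastsubstructure/smiles/{quote(smiles, safe='')}/cids/JSON"
    return f"{path}?{urlencode({'MaxRecords': max_records})}"


for label, smiles in {"pyridine": "c1ccncc1", "carboxylic acid": "C(=O)O"}.items():
    print(label, build_substructure_url(smiles))
```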

5. Format Conversion

Convert between different chemical structure formats:
```python
import pubchempy as pcp

compound = pcp.get_compounds('aspirin', 'name')[0]

# Convert to different formats
smiles = compound.canonical_smiles
inchi = compound.inchi
inchikey = compound.inchikey
cid = compound.cid

# Download structure files
# pcp.download takes the output path as its second argument, before the identifier
pcp.download('SDF', 'aspirin.sdf', 'aspirin', 'name', overwrite=True)
pcp.download('JSON', 'aspirin.json', 2244, 'cid', overwrite=True)
```
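
When identifiers arrive in mixed formats, it can help to guess the namespace before querying. The sketch below uses only the shape of the string: all digits for a CID, the 14-10-1 uppercase pattern for an InChIKey, and the `InChI=` prefix for an InChI. The function name `guess_namespace` is hypothetical, and SMILES strings are deliberately not detected because they cannot be told apart from names reliably.

```python
import re

# InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def guess_namespace(identifier):
    """Guess 'cid', 'inchikey', 'inchi' or 'name' from the identifier's shape."""
    text = str(identifier).strip()
    if text.isdigit():
        return "cid"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    if text.startswith("InChI="):
        return "inchi"
    # Everything else, including SMILES, falls back to a name lookup.
    return "name"


for example in (2244, "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "InChI=1S/C9H8O4/...", "aspirin"):
    print(example, "->", guess_namespace(example))
```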

6. Structure Visualization

Generate 2D structure images:
```python
import pubchempy as pcp
import requests

# Download compound structure as PNG
# (output path comes second, before the identifier)
pcp.download('PNG', 'caffeine.png', 'caffeine', 'name', overwrite=True)

# Using direct URL (via requests)
cid = 2244  # Aspirin
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/PNG?image_size=large"
response = requests.get(url)
response.raise_for_status()  # stop here rather than saving an error page as an image

with open('structure.png', 'wb') as f:
    f.write(response.content)
```

7. Synonym Retrieval

Get all known names and synonyms for a compound:
```python
import pubchempy as pcp

synonyms_data = pcp.get_synonyms('aspirin', 'name')

if synonyms_data:
    cid = synonyms_data[0]['CID']
    synonyms = synonyms_data[0]['Synonym']

    print(f"CID {cid} has {len(synonyms)} synonyms:")
    for syn in synonyms[:10]:  # First 10
        print(f"  - {syn}")
```
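
Synonym lists often mix trade names, systematic names, and registry numbers. As one way to post-process them, the sketch below picks out strings that look like CAS Registry Numbers and pass the CAS check-digit rule. The helper names are hypothetical, and the sample list is hand-written rather than fetched from PubChem.

```python
import re

CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(candidate):
    """Return True if the string has CAS format and a correct check digit."""
    match = CAS_RE.match(candidate)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... from the rightmost; the sum modulo 10 is the check digit.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def extract_cas_numbers(synonyms):
    """Keep only the synonyms that are valid CAS Registry Numbers."""
    return [s for s in synonyms if is_valid_cas(s)]


# Hand-written sample; "50-78-3" has a deliberately wrong check digit.
synonyms = ["aspirin", "ACETYLSALICYLIC ACID", "50-78-2", "50-78-3"]
print(extract_cas_numbers(synonyms))  # → ['50-78-2']
```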

8. Bioactivity Data Access

Retrieve biological activity data from assays:
```python
import requests

# Get bioassay summary for a compound
cid = 2244  # Aspirin
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/assaysummary/JSON"

response = requests.get(url)
if response.status_code == 200:
    data = response.json()
    # Process bioassay information
    table = data.get('Table', {})
    rows = table.get('Row', [])
    print(f"Found {len(rows)} bioassay records")
```

**For more complex bioactivity queries**, use the `scripts/bioactivity_query.py` helper script which provides:
- Bioassay summaries with activity outcome filtering
- Assay target identification
- Search for compounds by biological target
- Active compound lists for specific assays
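
The bioassay summary arrives as a table, so a common next step is to turn each row into a dictionary and tally the outcomes. The sketch below assumes column names live under `Table.Columns.Column`, that each row holds its values under `Cell`, and that there is an `Activity Outcome` column; only `Table` and `Row` are confirmed by the example above. The sample payload is hand-written with placeholder assay IDs.

```python
from collections import Counter


def rows_as_dicts(payload):
    """Pair each row's cells with the column names (assumed layout, see above)."""
    table = payload.get("Table", {})
    columns = table.get("Columns", {}).get("Column", [])
    return [dict(zip(columns, row.get("Cell", []))) for row in table.get("Row", [])]


def count_outcomes(payload, column="Activity Outcome"):
    """Count rows per value of the outcome column."""
    return Counter(row.get(column, "Unknown") for row in rows_as_dicts(payload))


# Hand-written payload in the assumed shape; the AIDs are placeholders.
sample = {
    "Table": {
        "Columns": {"Column": ["AID", "Activity Outcome"]},
        "Row": [
            {"Cell": ["1", "Active"]},
            {"Cell": ["2", "Inactive"]},
            {"Cell": ["3", "Inactive"]},
        ],
    }
}
print(count_outcomes(sample))
```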

9. Comprehensive Compound Annotations

Access detailed compound information through PUG-View:
```python
import requests

cid = 2244
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/{cid}/JSON"

response = requests.get(url)
if response.status_code == 200:
    annotations = response.json()
    # Contains extensive data including:
    # - Chemical and Physical Properties
    # - Drug and Medication Information
    # - Pharmacology and Biochemistry
    # - Safety and Hazards
    # - Toxicity
    # - Literature references
    # - Patents
```

Get Specific Section: PUG-View accepts a `heading` query parameter that limits the response to a single section, for example to get only the drug information.
Installation Requirements

Install PubChemPy for Python-based access:
```bash
uv pip install pubchempy
```

For direct API access and bioactivity queries:
```bash
uv pip install requests
```

Optional for data analysis:
```bash
uv pip install pandas
```

Helper Scripts

This skill includes Python scripts for common PubChem tasks:

scripts/compound_search.py

Provides utility functions for searching and retrieving compound information:

Key Functions:
  • `search_by_name(name, max_results=10)`: Search compounds by name
  • `search_by_smiles(smiles)`: Search by SMILES string
  • `get_compound_by_cid(cid)`: Retrieve compound by CID
  • `get_compound_properties(identifier, namespace, properties)`: Get specific properties
  • `similarity_search(smiles, threshold, max_records)`: Perform similarity search
  • `substructure_search(smiles, max_records)`: Perform substructure search
  • `get_synonyms(identifier, namespace)`: Get all synonyms
  • `batch_search(identifiers, namespace, properties)`: Batch search multiple compounds
  • `download_structure(identifier, namespace, format, filename)`: Download structures
  • `print_compound_info(compound)`: Print formatted compound information

Usage:
```python
from scripts.compound_search import search_by_name, get_compound_properties

# Search for a compound
compounds = search_by_name('ibuprofen')

# Get specific properties
props = get_compound_properties('aspirin', 'name', ['MolecularWeight', 'XLogP'])
```

scripts/bioactivity_query.py

Provides functions for retrieving biological activity data:

Key Functions:
  • `get_bioassay_summary(cid)`: Get bioassay summary for compound
  • `get_compound_bioactivities(cid, activity_outcome)`: Get filtered bioactivities
  • `get_assay_description(aid)`: Get detailed assay information
  • `get_assay_targets(aid)`: Get biological targets for assay
  • `search_assays_by_target(target_name, max_results)`: Find assays by target
  • `get_active_compounds_in_assay(aid, max_results)`: Get active compounds
  • `get_compound_annotations(cid, section)`: Get PUG-View annotations
  • `summarize_bioactivities(cid)`: Generate bioactivity summary statistics
  • `find_compounds_by_bioactivity(target, threshold, max_compounds)`: Find compounds by target

Usage:
```python
from scripts.bioactivity_query import get_bioassay_summary, summarize_bioactivities

# Get bioactivity summary
summary = summarize_bioactivities(2244)  # Aspirin
print(f"Total assays: {summary['total_assays']}")
print(f"Active: {summary['active']}, Inactive: {summary['inactive']}")
```

API Rate Limits and Best Practices

Rate Limits:
  • Maximum 5 requests per second
  • Maximum 400 requests per minute
  • Maximum 300 seconds running time per minute

Best Practices:
  1. Use CIDs for repeated queries: CIDs are more efficient than names or structures
  2. Cache results locally: Store frequently accessed data
  3. Batch requests: Combine multiple queries when possible
  4. Implement delays: Add 0.2-0.3 second delays between requests
  5. Handle errors gracefully: Check for HTTP errors and missing data
  6. Use PubChemPy: Higher-level abstraction handles many edge cases
  7. Leverage asynchronous pattern: For large similarity/substructure searches
  8. Specify MaxRecords: Limit results to avoid timeouts

Error Handling:
```python
import pubchempy as pcp
from pubchempy import BadRequestError, NotFoundError, TimeoutError

try:
    compound = pcp.get_compounds('query', 'name')[0]
except NotFoundError:
    print("Compound not found")
except BadRequestError:
    print("Invalid request format")
except TimeoutError:
    print("Request timed out - try reducing scope")
except IndexError:
    # get_compounds returned an empty list, so [0] failed
    print("No results returned")
```
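
The "implement delays" advice can be packaged as a small throttle that enforces a minimum gap between calls. This is a sketch only: the class name `Throttle` is hypothetical, it covers just the per-second limit, and the clock and sleep functions are injectable so the behaviour can be checked without waiting. A 0.2 second gap works out to at most 5 requests per second and 300 per minute.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (default 0.2 s)."""

    def __init__(self, min_interval=0.2, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle()
start = time.monotonic()
for cid in (2244, 3672, 1983):
    throttle.wait()
    # A request such as pcp.Compound.from_cid(cid) would go here.
elapsed = time.monotonic() - start
print(f"3 request slots took {elapsed:.2f}s")
```

Calling `throttle.wait()` immediately before each request keeps the spacing logic in one place instead of scattering `time.sleep` calls through the code.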

Common Workflows


Workflow 1: Chemical Identifier Conversion Pipeline

Convert between different chemical identifiers:
```python
import pubchempy as pcp

# Start with any identifier type
compound = pcp.get_compounds('caffeine', 'name')[0]

# Extract all identifier formats
identifiers = {
    'CID': compound.cid,
    'Name': compound.iupac_name,
    'SMILES': compound.canonical_smiles,
    'InChI': compound.inchi,
    'InChIKey': compound.inchikey,
    'Formula': compound.molecular_formula
}
```

Workflow 2: Drug-Like Property Screening

Screen compounds using Lipinski's Rule of Five:
```python
import pubchempy as pcp

def check_drug_likeness(compound_name):
    compound = pcp.get_compounds(compound_name, 'name')[0]

    # Lipinski's Rule of Five
    # float() guards against the weight being returned as a string
    # "is not None" keeps a legitimate XLogP of 0 from being skipped
    rules = {
        'MW <= 500': float(compound.molecular_weight) <= 500,
        'LogP <= 5': compound.xlogp <= 5 if compound.xlogp is not None else None,
        'HBD <= 5': compound.h_bond_donor_count <= 5,
        'HBA <= 10': compound.h_bond_acceptor_count <= 10
    }

    violations = sum(1 for v in rules.values() if v is False)
    return rules, violations

rules, violations = check_drug_likeness('aspirin')
print(f"Lipinski violations: {violations}")
```
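
The rule logic can also be kept separate from the network call, which makes it easy to test and to reuse on values from any source. The sketch below is one such pure function; the name `lipinski_violations` is hypothetical, missing values are skipped rather than counted, and the second call uses made-up numbers for a compound that breaks three rules.

```python
def lipinski_violations(mw, xlogp, hbd, hba):
    """Return the Rule-of-Five criteria that are violated; None values are skipped."""
    checks = {
        "MW <= 500": (mw, 500),
        "LogP <= 5": (xlogp, 5),
        "HBD <= 5": (hbd, 5),
        "HBA <= 10": (hba, 10),
    }
    return [
        name
        for name, (value, limit) in checks.items()
        # float() also accepts numeric strings such as "650.0"
        if value is not None and float(value) > limit
    ]


print(lipinski_violations(180.16, 1.2, 1, 4))  # aspirin-like values → []
print(lipinski_violations("650.0", 0, 6, 12))  # made-up values → ['MW <= 500', 'HBD <= 5', 'HBA <= 10']
```

A LogP of exactly 0 is treated as a real value here, not as missing data.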

Workflow 3: Finding Similar Drug Candidates

Identify structurally similar compounds to a known drug:
```python
import pubchempy as pcp

# Start with known drug
reference_drug = pcp.get_compounds('imatinib', 'name')[0]
reference_smiles = reference_drug.canonical_smiles

# Find similar compounds
similar = pcp.get_compounds(
    reference_smiles,
    'smiles',
    searchtype='similarity',
    Threshold=85,
    MaxRecords=20
)

# Filter by drug-like properties
candidates = []
for comp in similar:
    # float() guards against the weight being returned as a string
    mw = float(comp.molecular_weight) if comp.molecular_weight else None
    if mw is not None and 200 <= mw <= 600:
        # "is not None" keeps a legitimate XLogP of 0 in the candidate set
        if comp.xlogp is not None and -1 <= comp.xlogp <= 5:
            candidates.append(comp)

print(f"Found {len(candidates)} drug-like candidates")
```

Workflow 4: Batch Compound Property Comparison

Compare properties across multiple compounds:
```python
import pubchempy as pcp
import pandas as pd

compound_list = ['aspirin', 'ibuprofen', 'naproxen', 'celecoxib']

properties_list = []
for name in compound_list:
    try:
        compound = pcp.get_compounds(name, 'name')[0]
        properties_list.append({
            'Name': name,
            'CID': compound.cid,
            'Formula': compound.molecular_formula,
            'MW': compound.molecular_weight,
            'LogP': compound.xlogp,
            'TPSA': compound.tpsa,
            'HBD': compound.h_bond_donor_count,
            'HBA': compound.h_bond_acceptor_count
        })
    except Exception as e:
        print(f"Error processing {name}: {e}")

df = pd.DataFrame(properties_list)
print(df.to_string(index=False))
```
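
For longer compound lists, issuing one request per compound is slow and counts against the rate limit. One option, sketched below, is to resolve names to CIDs once and then request properties in fixed-size batches. The helper `chunked` and the batch size of 3 are illustrative choices, and the CIDs are placeholders.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


placeholder_cids = list(range(1, 8))  # stand-ins for real CIDs

for batch in chunked(placeholder_cids, 3):
    # Each batch could be sent as a single call, for example:
    # pcp.get_properties(['MolecularWeight', 'XLogP'], batch, 'cid')
    print(batch)
```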

Workflow 5: Substructure-Based Virtual Screening

Screen for compounds containing specific pharmacophores:
```python
import pubchempy as pcp

# Define pharmacophore (e.g., sulfonamide group)
pharmacophore_smiles = 'S(=O)(=O)N'

# Search for compounds containing this substructure
hits = pcp.get_compounds(
    pharmacophore_smiles,
    'smiles',
    searchtype='substructure',
    MaxRecords=100
)

# Further filter by properties
# float() guards against the weight being returned as a string
filtered_hits = [
    comp for comp in hits
    if comp.molecular_weight and float(comp.molecular_weight) < 500
]

print(f"Found {len(filtered_hits)} compounds with desired substructure")
```

Reference Documentation

For detailed API documentation, including complete property lists, URL patterns, advanced query options, and more examples, consult `references/api_reference.md`. This comprehensive reference includes:
  • Complete PUG-REST API endpoint documentation
  • Full list of available molecular properties
  • Asynchronous request handling patterns
  • PubChemPy API reference
  • PUG-View API for annotations
  • Common workflows and use cases
  • Links to official PubChem documentation

Troubleshooting

Compound Not Found:
  • Try alternative names or synonyms
  • Use CID if known
  • Check spelling and chemical name format

Timeout Errors:
  • Reduce the MaxRecords parameter
  • Add delays between requests
  • Use CIDs instead of names for faster queries

Empty Property Values:
  • Not all properties are available for all compounds
  • Check that a property exists before using it: `if compound.xlogp is not None:` (a plain `if compound.xlogp:` would also skip a valid value of 0)
  • Some properties are only available for certain compound types

Rate Limit Exceeded:
  • Implement delays (0.2-0.3 seconds) between requests
  • Use batch operations where possible
  • Consider caching results locally

Similarity/Substructure Search Hangs:
  • These are asynchronous operations that may take 15-30 seconds
  • PubChemPy handles polling automatically
  • Reduce MaxRecords if timing out
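
Local caching, suggested above for rate-limit problems, can be as simple as a dictionary in front of the lookup function. The sketch below is a minimal in-memory version; the class name `CachedLookup` is hypothetical, and `stub_fetch` stands in for a real call such as `pcp.get_compounds` so the example runs offline.

```python
class CachedLookup:
    """Memoize results of a fetch function by (namespace, identifier)."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.misses = 0  # number of times the underlying fetch was actually called

    def get(self, identifier, namespace="name"):
        key = (namespace, str(identifier))
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._fetch(identifier, namespace)
        return self._cache[key]


def stub_fetch(identifier, namespace):
    """Offline stand-in for a network lookup (illustrative only)."""
    return f"{namespace}:{identifier}"


lookup = CachedLookup(stub_fetch)
lookup.get("aspirin")
lookup.get("aspirin")      # served from the cache, no second fetch
lookup.get(2244, "cid")
print(f"fetches performed: {lookup.misses}")  # → fetches performed: 2
```

The cache lives only for the lifetime of the process; persisting it to disk would be a separate design choice.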

Additional Resources
