pubchem-database

PubChem Database

Overview

PubChem is the world's largest freely available chemical database, with 110M+ compounds and 270M+ bioactivities. Through the PUG-REST API and PubChemPy you can query chemical structures by name, CID, or SMILES; retrieve molecular properties; run similarity and substructure searches; and access bioactivity data.

When to Use This Skill

This skill should be used when:

Searching for chemical compounds by name, structure (SMILES/InChI), or molecular formula
Retrieving molecular properties (MW, LogP, TPSA, hydrogen bonding descriptors)
Performing similarity searches to find structurally related compounds
Conducting substructure searches for specific chemical motifs
Accessing bioactivity data from screening assays
Converting between chemical identifier formats (CID, SMILES, InChI)
Batch processing multiple compounds for drug-likeness screening or property analysis

Core Capabilities

1. Chemical Structure Search

Search for compounds using multiple identifier types:

By Chemical Name:

```python
import pubchempy as pcp

compounds = pcp.get_compounds('aspirin', 'name')
compound = compounds[0]
```

By CID (Compound ID):

```python
compound = pcp.Compound.from_cid(2244)  # Aspirin
```

By SMILES:

```python
compound = pcp.get_compounds('CC(=O)OC1=CC=CC=C1C(=O)O', 'smiles')[0]
```

By InChI:

```python
compound = pcp.get_compounds('InChI=1S/C9H8O4/...', 'inchi')[0]
```

By Molecular Formula:

```python
compounds = pcp.get_compounds('C9H8O4', 'formula')
# Returns all compounds matching this formula
```
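When identifiers arrive from mixed sources, it can help to guess the namespace before calling `get_compounds`. The sketch below is a heuristic and not part of PubChemPy: `guess_namespace` is a hypothetical helper that recognizes only CIDs, InChI strings, and InChIKeys, and falls back to `'name'` for everything else. SMILES strings and formulas still need their namespace passed explicitly.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
_INCHIKEY_RE = re.compile(r'^[A-Z]{14}-[A-Z]{10}-[A-Z]$')


def guess_namespace(identifier):
    """Hypothetical helper: guess a PubChemPy namespace for an identifier."""
    text = str(identifier).strip()
    if text.isdigit():
        return 'cid'
    if text.startswith('InChI='):
        return 'inchi'
    if _INCHIKEY_RE.match(text):
        return 'inchikey'
    return 'name'  # fallback; SMILES and formulas are not detected


for example in (2244, 'BSYNRYMUTXBXSQ-UHFFFAOYSA-N', 'aspirin'):
    print(example, '->', guess_namespace(example))
```

The returned string can be passed as the second argument of `pcp.get_compounds`.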

2. Property Retrieval

Retrieve molecular properties for compounds using either high-level or low-level approaches:

Using PubChemPy (Recommended):

```python
import pubchempy as pcp

# Get compound object with all properties
compound = pcp.get_compounds('caffeine', 'name')[0]

# Access individual properties
molecular_formula = compound.molecular_formula
molecular_weight = compound.molecular_weight
iupac_name = compound.iupac_name
smiles = compound.canonical_smiles
inchi = compound.inchi
xlogp = compound.xlogp  # Partition coefficient
tpsa = compound.tpsa  # Topological polar surface area
```

**Get Specific Properties**:
```python
# Request only specific properties
properties = pcp.get_properties(
    ['MolecularFormula', 'MolecularWeight', 'CanonicalSMILES', 'XLogP'],
    'aspirin',
    'name'
)
# Returns list of dictionaries
```

**Batch Property Retrieval**:
```python
import pandas as pd

compound_names = ['aspirin', 'ibuprofen', 'paracetamol']
all_properties = []

for name in compound_names:
    props = pcp.get_properties(
        ['MolecularFormula', 'MolecularWeight', 'XLogP'],
        name,
        'name'
    )
    all_properties.extend(props)

df = pd.DataFrame(all_properties)
```

Available Properties: MolecularFormula, MolecularWeight, CanonicalSMILES, IsomericSMILES, InChI, InChIKey, IUPACName, XLogP, TPSA, HBondDonorCount, HBondAcceptorCount, RotatableBondCount, Complexity, Charge, and many more (see `references/api_reference.md` for the complete list).
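The batch loop issues one request per name. When CIDs are already known, PUG-REST accepts a comma-separated CID list, so the same properties can be requested in a single call. A minimal sketch of the URL construction only; `build_property_url` is a hypothetical helper name, and no request is sent here:

```python
BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(cids, properties):
    """Build one PUG-REST property URL covering several CIDs (hypothetical helper)."""
    cid_part = ",".join(str(cid) for cid in cids)
    prop_part = ",".join(properties)
    return f"{BASE_URL}/compound/cid/{cid_part}/property/{prop_part}/JSON"


# 2244 is aspirin, 3672 is ibuprofen
url = build_property_url([2244, 3672], ["MolecularFormula", "MolecularWeight", "XLogP"])
print(url)
```

Fetching this URL with `requests.get(url)` counts as one request against the rate limits rather than one per compound.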

3. Similarity Search

Find structurally similar compounds using Tanimoto similarity:

```python
import pubchempy as pcp

# Start with a query compound
query_compound = pcp.get_compounds('gefitinib', 'name')[0]
query_smiles = query_compound.canonical_smiles

# Perform similarity search
similar_compounds = pcp.get_compounds(
    query_smiles,
    'smiles',
    searchtype='similarity',
    Threshold=85,  # Similarity threshold (0-100)
    MaxRecords=50
)

# Process results
for compound in similar_compounds[:10]:
    print(f"CID {compound.cid}: {compound.iupac_name}")
    print(f"  MW: {compound.molecular_weight}")
```

**Note**: Similarity searches are asynchronous for large queries and may take 15-30 seconds to complete. PubChemPy handles the asynchronous pattern automatically.
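To make the `Threshold` value more concrete, the Tanimoto coefficient is the number of shared fingerprint bits divided by the number of distinct bits set in either fingerprint. The sketch below is illustrative only: PubChem computes the score server-side on its own fingerprints, and the bit sets here are toy data.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Toy fingerprints: 3 shared bits out of 5 distinct bits
score = tanimoto({1, 4, 7, 9}, {1, 4, 7, 12})
print(f"{score:.2f}")  # 0.60
```

On this scale, `Threshold=85` corresponds to a coefficient of at least 0.85.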

4. Substructure Search

Find compounds containing a specific structural motif:

```python
import pubchempy as pcp

# Search for compounds containing pyridine ring
pyridine_smiles = 'c1ccncc1'

matches = pcp.get_compounds(
    pyridine_smiles,
    'smiles',
    searchtype='substructure',
    MaxRecords=100
)

print(f"Found {len(matches)} compounds containing pyridine")
```

**Common Substructures**:
- Benzene ring: `c1ccccc1`
- Pyridine: `c1ccncc1`
- Phenol: `c1ccc(O)cc1`
- Carboxylic acid: `C(=O)O`

5. Format Conversion

Convert between different chemical structure formats:

```python
import pubchempy as pcp

compound = pcp.get_compounds('aspirin', 'name')[0]

# Convert to different formats
smiles = compound.canonical_smiles
inchi = compound.inchi
inchikey = compound.inchikey
cid = compound.cid

# Download structure files
# Argument order is (format, output path, identifier, namespace)
pcp.download('SDF', 'aspirin.sdf', 'aspirin', 'name', overwrite=True)
pcp.download('JSON', 'aspirin.json', '2244', 'cid', overwrite=True)
```

6. Structure Visualization

Generate 2D structure images:

```python
import pubchempy as pcp
import requests

# Download compound structure as PNG
# Argument order is (format, output path, identifier, namespace)
pcp.download('PNG', 'caffeine.png', 'caffeine', 'name', overwrite=True)

# Using direct URL (via requests)
cid = 2244  # Aspirin
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/PNG?image_size=large"
response = requests.get(url)
response.raise_for_status()  # Stop here rather than save an error page as a PNG

with open('structure.png', 'wb') as f:
    f.write(response.content)
```

7. Synonym Retrieval

Get all known names and synonyms for a compound:

```python
import pubchempy as pcp

synonyms_data = pcp.get_synonyms('aspirin', 'name')

if synonyms_data:
    cid = synonyms_data[0]['CID']
    synonyms = synonyms_data[0]['Synonym']

    print(f"CID {cid} has {len(synonyms)} synonyms:")
    for syn in synonyms[:10]:  # First 10
        print(f"  - {syn}")
```

8. Bioactivity Data Access

Retrieve biological activity data from assays:

```python
import requests
import json

# Get bioassay summary for a compound
cid = 2244  # Aspirin
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/assaysummary/JSON"

response = requests.get(url)
if response.status_code == 200:
    data = response.json()
    # Process bioassay information
    table = data.get('Table', {})
    rows = table.get('Row', [])
    print(f"Found {len(rows)} bioassay records")
```

**For more complex bioactivity queries**, use the `scripts/bioactivity_query.py` helper script which provides:
- Bioassay summaries with activity outcome filtering
- Assay target identification
- Search for compounds by biological target
- Active compound lists for specific assays
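Once the summary table is loaded, a common next step is to tally rows by activity outcome. The sketch below assumes the response lists column names under `Table.Columns.Column`, stores each row's values under `Cell`, and includes a column named `Activity Outcome`; check these assumptions against a real response before relying on them. The sample data is synthetic, and `count_activity_outcomes` is a hypothetical helper.

```python
from collections import Counter


def count_activity_outcomes(data, column_name="Activity Outcome"):
    """Count rows per outcome in an assay-summary style table.

    Assumes data["Table"]["Columns"]["Column"] lists the column names and
    each entry of data["Table"]["Row"] holds its values under "Cell".
    """
    table = data.get("Table", {})
    columns = table.get("Columns", {}).get("Column", [])
    if column_name not in columns:
        return Counter()  # unexpected layout: report nothing rather than guess
    idx = columns.index(column_name)
    return Counter(row["Cell"][idx] for row in table.get("Row", []))


# Synthetic example data, not real PubChem output
sample = {
    "Table": {
        "Columns": {"Column": ["AID", "Activity Outcome"]},
        "Row": [
            {"Cell": ["1", "Active"]},
            {"Cell": ["2", "Inactive"]},
            {"Cell": ["3", "Inactive"]},
        ],
    }
}
outcomes = count_activity_outcomes(sample)
print(dict(outcomes))
```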

9. Comprehensive Compound Annotations

Access detailed compound information through PUG-View:

```python
import requests

cid = 2244
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/{cid}/JSON"

response = requests.get(url)
if response.status_code == 200:
    annotations = response.json()
    # Contains extensive data including:
    # - Chemical and Physical Properties
    # - Drug and Medication Information
    # - Pharmacology and Biochemistry
    # - Safety and Hazards
    # - Toxicity
    # - Literature references
    # - Patents
```

Get Specific Section:

```python
# Get only drug information
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/{cid}/JSON?heading=Drug and Medication Information"
```
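The heading in the section URL contains spaces. `requests` normally encodes these when it sends the request, but encoding them explicitly keeps the URL valid when it is logged or passed to other tools. A minimal sketch, where `build_section_url` is a hypothetical helper name:

```python
from urllib.parse import quote

PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound"


def build_section_url(cid, heading):
    """Build a PUG-View URL for one heading, with the heading percent-encoded."""
    return f"{PUG_VIEW_BASE}/{cid}/JSON?heading={quote(heading)}"


section_url = build_section_url(2244, "Drug and Medication Information")
print(section_url)
```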

Installation Requirements

Install PubChemPy for Python-based access:

```bash
uv pip install pubchempy
```

For direct API access and bioactivity queries:

```bash
uv pip install requests
```

Optional for data analysis:

```bash
uv pip install pandas
```

Helper Scripts

This skill includes Python scripts for common PubChem tasks:

scripts/compound_search.py

Provides utility functions for searching and retrieving compound information:

Key Functions:

- `search_by_name(name, max_results=10)`: Search compounds by name
- `search_by_smiles(smiles)`: Search by SMILES string
- `get_compound_by_cid(cid)`: Retrieve compound by CID
- `get_compound_properties(identifier, namespace, properties)`: Get specific properties
- `similarity_search(smiles, threshold, max_records)`: Perform similarity search
- `substructure_search(smiles, max_records)`: Perform substructure search
- `get_synonyms(identifier, namespace)`: Get all synonyms
- `batch_search(identifiers, namespace, properties)`: Batch search multiple compounds
- `download_structure(identifier, namespace, format, filename)`: Download structures
- `print_compound_info(compound)`: Print formatted compound information

Usage:

```python
from scripts.compound_search import search_by_name, get_compound_properties

# Search for a compound
compounds = search_by_name('ibuprofen')

# Get specific properties
props = get_compound_properties('aspirin', 'name', ['MolecularWeight', 'XLogP'])
```

scripts/bioactivity_query.py

Provides functions for retrieving biological activity data:

Key Functions:

- `get_bioassay_summary(cid)`: Get bioassay summary for compound
- `get_compound_bioactivities(cid, activity_outcome)`: Get filtered bioactivities
- `get_assay_description(aid)`: Get detailed assay information
- `get_assay_targets(aid)`: Get biological targets for assay
- `search_assays_by_target(target_name, max_results)`: Find assays by target
- `get_active_compounds_in_assay(aid, max_results)`: Get active compounds
- `get_compound_annotations(cid, section)`: Get PUG-View annotations
- `summarize_bioactivities(cid)`: Generate bioactivity summary statistics
- `find_compounds_by_bioactivity(target, threshold, max_compounds)`: Find compounds by target

Usage:

```python
from scripts.bioactivity_query import get_bioassay_summary, summarize_bioactivities

# Get bioactivity summary
summary = summarize_bioactivities(2244)  # Aspirin
print(f"Total assays: {summary['total_assays']}")
print(f"Active: {summary['active']}, Inactive: {summary['inactive']}")
```

API Rate Limits and Best Practices

Rate Limits:

Maximum 5 requests per second
Maximum 400 requests per minute
Maximum 300 seconds running time per minute

Best Practices:

Use CIDs for repeated queries: CIDs are more efficient than names or structures
Cache results locally: Store frequently accessed data
Batch requests: Combine multiple queries when possible
Implement delays: Add 0.2-0.3 second delays between requests
Handle errors gracefully: Check for HTTP errors and missing data
Use PubChemPy: Higher-level abstraction handles many edge cases
Leverage asynchronous pattern: For large similarity/substructure searches
Specify MaxRecords: Limit results to avoid timeouts
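The delay recommendation can be enforced with a small throttle object instead of scattered `time.sleep` calls. The sketch below is one possible approach, not something PubChemPy provides: `Throttle` is a hypothetical class, and the 0.25 second default is simply a value inside the suggested 0.2-0.3 second range.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, min_interval=0.25, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle()
for cid in (2244, 3672):
    throttle.wait()  # call immediately before each request
    print(f"would request CID {cid}")
```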

Error Handling:

```python
import pubchempy as pcp
from pubchempy import BadRequestError, NotFoundError, TimeoutError

try:
    compound = pcp.get_compounds('query', 'name')[0]
except NotFoundError:
    print("Compound not found")
except BadRequestError:
    print("Invalid request format")
except TimeoutError:
    print("Request timed out - try reducing scope")
except IndexError:
    print("No results returned")
```
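For transient failures such as timeouts, retrying after a growing delay is often enough. The sketch below shows a generic exponential backoff wrapper; `retry_with_backoff` and its parameters are hypothetical and not part of PubChemPy, and the demo uses a stand-in function rather than a real request.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=0.5,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Stand-in for a flaky request: fails twice, then succeeds
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = retry_with_backoff(flaky_request, base_delay=0.01)
print(result, "after", state["calls"], "calls")
```

With PubChemPy, the callable could be something like `lambda: pcp.get_compounds('aspirin', 'name')`, with `retry_on` narrowed to the timeout exception so that genuine input errors are not retried.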

Common Workflows

Workflow 1: Chemical Identifier Conversion Pipeline

Convert between different chemical identifiers:

```python
import pubchempy as pcp

# Start with any identifier type
compound = pcp.get_compounds('caffeine', 'name')[0]

# Extract all identifier formats
identifiers = {
    'CID': compound.cid,
    'Name': compound.iupac_name,
    'SMILES': compound.canonical_smiles,
    'InChI': compound.inchi,
    'InChIKey': compound.inchikey,
    'Formula': compound.molecular_formula
}
```

Workflow 2: Drug-Like Property Screening

Screen compounds using Lipinski's Rule of Five:

```python
import pubchempy as pcp

def check_drug_likeness(compound_name):
    compound = pcp.get_compounds(compound_name, 'name')[0]

    # Lipinski's Rule of Five
    # float() guards against versions that return molecular_weight as a string
    rules = {
        'MW <= 500': float(compound.molecular_weight) <= 500,
        # Compare against None so that an XLogP of 0 is still evaluated
        'LogP <= 5': compound.xlogp <= 5 if compound.xlogp is not None else None,
        'HBD <= 5': compound.h_bond_donor_count <= 5,
        'HBA <= 10': compound.h_bond_acceptor_count <= 10
    }

    violations = sum(1 for v in rules.values() if v is False)
    return rules, violations

rules, violations = check_drug_likeness('aspirin')
print(f"Lipinski violations: {violations}")
```
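The same rule check can run on the dictionaries returned by `get_properties`, which avoids building full `Compound` objects when screening many compounds. The sketch below assumes the keys `MolecularWeight`, `XLogP`, `HBondDonorCount`, and `HBondAcceptorCount`, and that numeric values may arrive as strings. `lipinski_violations` is a hypothetical helper, and the aspirin values are illustrative.

```python
def lipinski_violations(props):
    """Count Rule-of-Five violations in one property dictionary.

    Missing values are skipped rather than counted as violations.
    """
    limits = {
        "MolecularWeight": 500,
        "XLogP": 5,
        "HBondDonorCount": 5,
        "HBondAcceptorCount": 10,
    }
    violations = 0
    for key, limit in limits.items():
        value = props.get(key)
        if value is None:
            continue
        if float(value) > limit:  # float() also accepts numeric strings
            violations += 1
    return violations


# Illustrative values for aspirin; MolecularWeight is a string on purpose
aspirin = {"CID": 2244, "MolecularWeight": "180.16", "XLogP": 1.2,
           "HBondDonorCount": 1, "HBondAcceptorCount": 4}
print(lipinski_violations(aspirin))  # 0
```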

Workflow 3: Finding Similar Drug Candidates

Identify structurally similar compounds to a known drug:

```python
import pubchempy as pcp

# Start with known drug
reference_drug = pcp.get_compounds('imatinib', 'name')[0]
reference_smiles = reference_drug.canonical_smiles

# Find similar compounds
similar = pcp.get_compounds(
    reference_smiles,
    'smiles',
    searchtype='similarity',
    Threshold=85,
    MaxRecords=20
)

# Filter by drug-like properties
candidates = []
for comp in similar:
    if comp.molecular_weight is None or comp.xlogp is None:
        continue  # Skip compounds with missing data; an XLogP of 0 is valid
    if 200 <= float(comp.molecular_weight) <= 600 and -1 <= comp.xlogp <= 5:
        candidates.append(comp)

print(f"Found {len(candidates)} drug-like candidates")
```

Workflow 4: Batch Compound Property Comparison

Compare properties across multiple compounds:

```python
import pubchempy as pcp
import pandas as pd

compound_list = ['aspirin', 'ibuprofen', 'naproxen', 'celecoxib']

properties_list = []
for name in compound_list:
    try:
        compound = pcp.get_compounds(name, 'name')[0]
        properties_list.append({
            'Name': name,
            'CID': compound.cid,
            'Formula': compound.molecular_formula,
            'MW': compound.molecular_weight,
            'LogP': compound.xlogp,
            'TPSA': compound.tpsa,
            'HBD': compound.h_bond_donor_count,
            'HBA': compound.h_bond_acceptor_count
        })
    except Exception as e:
        print(f"Error processing {name}: {e}")

df = pd.DataFrame(properties_list)
print(df.to_string(index=False))
```

Workflow 5: Substructure-Based Virtual Screening

Screen for compounds containing specific pharmacophores:

```python
import pubchempy as pcp

# Define pharmacophore (e.g., sulfonamide group)
pharmacophore_smiles = 'S(=O)(=O)N'

# Search for compounds containing this substructure
hits = pcp.get_compounds(
    pharmacophore_smiles,
    'smiles',
    searchtype='substructure',
    MaxRecords=100
)

# Further filter by properties
# float() guards against versions that return molecular_weight as a string
filtered_hits = [
    comp for comp in hits
    if comp.molecular_weight and float(comp.molecular_weight) < 500
]

print(f"Found {len(filtered_hits)} compounds with desired substructure")
```

Reference Documentation

For detailed API documentation, including complete property lists, URL patterns, advanced query options, and more examples, consult `references/api_reference.md`. This comprehensive reference includes:

Complete PUG-REST API endpoint documentation
Full list of available molecular properties
Asynchronous request handling patterns
PubChemPy API reference
PUG-View API for annotations
Common workflows and use cases
Links to official PubChem documentation

Troubleshooting

Compound Not Found:

Try alternative names or synonyms
Use CID if known
Check spelling and chemical name format

Timeout Errors:

Reduce MaxRecords parameter
Add delays between requests
Use CIDs instead of names for faster queries

Empty Property Values:

Not all properties are available for all compounds
Check if property exists before accessing: `if compound.xlogp is not None:` (a plain truthiness test would also skip a valid value of 0)
Some properties only available for certain compound types

Rate Limit Exceeded:

Implement delays (0.2-0.3 seconds) between requests
Use batch operations where possible
Consider caching results locally

Similarity/Substructure Search Hangs:

These are asynchronous operations that may take 15-30 seconds
PubChemPy handles polling automatically
Reduce MaxRecords if timing out
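PubChemPy hides the polling, but when calling PUG-REST directly the same pattern has to be written by hand. The sketch below assumes a pending response carries a top-level `Waiting` entry and that the caller supplies a `fetch` function returning the decoded JSON for a list key (for example, a thin wrapper around `requests.get`). `poll_listkey`, the polling interval, and the poll limit are all hypothetical choices.

```python
import time


def poll_listkey(fetch, list_key, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll until the response no longer contains a "Waiting" entry.

    fetch(list_key) must return the decoded JSON for that list key.
    """
    for _ in range(max_polls):
        data = fetch(list_key)
        if "Waiting" not in data:
            return data
        sleep(interval)
    raise TimeoutError(f"list key {list_key} still pending after {max_polls} polls")


# Stand-in for the network: pending twice, then a synthetic result
responses = iter([
    {"Waiting": {"ListKey": "123"}},
    {"Waiting": {"ListKey": "123"}},
    {"IdentifierList": {"CID": [2244]}},
])
final = poll_listkey(lambda key: next(responses), "123", interval=0.01)
print(final)
```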

Additional Resources

PubChem Home: https://pubchem.ncbi.nlm.nih.gov/
PUG-REST Documentation: https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest
PUG-REST Tutorial: https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest-tutorial
PubChemPy Documentation: https://pubchempy.readthedocs.io/
PubChemPy GitHub: https://github.com/mcs07/PubChemPy
