chembl-database

ChEMBL Database

ChEMBL is the European Bioinformatics Institute's repository of bioactive compound data, containing over 2 million compounds, 19 million bioactivity measurements, and 13,000+ drug targets.

Use Cases

  • Find potent inhibitors for a protein target
  • Search for compounds similar to a known drug
  • Retrieve drug mechanism of action data
  • Filter compounds by molecular properties (Lipinski, etc.)
  • Export bioactivity data for ML or analysis

Installation

bash
uv pip install chembl_webresource_client

Basic Usage

python
from chembl_webresource_client.new_client import new_client

# Fetch compound by identifier
mol = new_client.molecule.get('CHEMBL192')

# Retrieve target data
tgt = new_client.target.get('CHEMBL203')

# Query activity measurements
acts = new_client.activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=50
)

Available Endpoints

| Resource | Description |
|---|---|
| molecule | Compound structures and properties |
| target | Biological targets |
| activity | Bioassay measurements |
| assay | Experimental protocols |
| drug | Approved drug data |
| mechanism | Drug mechanisms of action |
| drug_indication | Therapeutic indications |
| similarity | Structure similarity search |
| substructure | Substructure search |
| document | Literature references |
| cell_line | Cell line data |
| protein_class | Protein classifications |
| image | SVG molecular images |

Query Operators

The client uses Django-style filtering:
| Operator | Function | Example |
|---|---|---|
| __exact | Exact match | pref_name__exact='Aspirin' |
| __icontains | Case-insensitive substring | pref_name__icontains='kinase' |
| __lte, __gte | Less/greater than or equal | standard_value__lte=10 |
| __lt, __gt | Less/greater than | pchembl_value__gt=7 |
| __range | Value within range | alogp__range=[-1, 5] |
| __in | Value in list | target_chembl_id__in=['CHEMBL203'] |
| __isnull | Null check | pchembl_value__isnull=False |
| __startswith | Prefix match | pref_name__startswith='Proto' |
| __regex | Regular expression | pref_name__regex='^[A-Z]{3}' |
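
To make the keyword convention concrete, the sketch below splits a Django-style filter key into its field path and operator. It is an illustration only, not the client's own parsing code: `split_lookup` and `KNOWN_OPERATORS` are hypothetical names, and the default of `exact` for a key with no suffix is an assumption borrowed from Django's behaviour.

```python
# Operators taken from the table above; any other trailing segment is
# treated as part of the field path.
KNOWN_OPERATORS = {
    "exact", "icontains", "lte", "gte", "lt", "gt",
    "range", "in", "isnull", "startswith", "regex",
}


def split_lookup(key):
    """Split 'field__path__op' into (['field', 'path'], 'op').

    A key with no recognised operator suffix is assumed to mean 'exact'.
    """
    parts = key.split("__")
    if len(parts) > 1 and parts[-1] in KNOWN_OPERATORS:
        return parts[:-1], parts[-1]
    return parts, "exact"


examples = [
    "pref_name__icontains",
    "molecule_properties__mw_freebase__lte",
    "target_chembl_id",
]
for key in examples:
    print(key, "->", split_lookup(key))
```

Read this way, `molecule_properties__mw_freebase__lte=300` means "follow `molecule_properties`, then `mw_freebase`, and compare with less-than-or-equal".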

Common Workflows

Find Target Inhibitors

python
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Get potent BRAF inhibitors (IC50 <= 100 nM)
braf_hits = activity.filter(
    target_chembl_id='CHEMBL5145',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

for hit in braf_hits:
    print(f"{hit['molecule_chembl_id']}: {hit['standard_value']} nM")

Search by Target Name

python
from chembl_webresource_client.new_client import new_client

target = new_client.target
activity = new_client.activity

# Find CDK targets
cdk_targets = target.filter(
    pref_name__icontains='cyclin-dependent kinase',
    target_type='SINGLE PROTEIN'
)
target_ids = [t['target_chembl_id'] for t in cdk_targets]

# Get activities for the first five of these targets
cdk_activities = activity.filter(
    target_chembl_id__in=target_ids[:5],
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

Structure Similarity Search

python
from chembl_webresource_client.new_client import new_client

sim = new_client.similarity

# Find molecules at least 80% similar to ibuprofen
ibuprofen_smiles = 'CC(C)Cc1ccc(cc1)C(C)C(=O)O'
matches = sim.filter(smiles=ibuprofen_smiles, similarity=80)

for m in matches:
    print(f"{m['molecule_chembl_id']}: {m['similarity']}%")
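
For intuition about what a threshold such as 80 means, the sketch below computes a Tanimoto coefficient on two toy fingerprints. It assumes the similarity score is a Tanimoto coefficient over fingerprint bits, which is the usual definition for this kind of search. The bit sets are invented for illustration and are not real ChEMBL fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Toy fingerprints (hypothetical bit positions)
query = {1, 4, 7, 9, 12}
candidate = {1, 4, 7, 9, 15}

score = tanimoto(query, candidate)
passes_threshold = score * 100 >= 80  # same percent scale as similarity=80

print(f"Tanimoto: {score:.2f}, passes 80% cutoff: {passes_threshold}")
```

Four shared bits out of six distinct bits gives a score of about 0.67, so this candidate would fall below an 80% cutoff.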

Substructure Search

python
from chembl_webresource_client.new_client import new_client

sub = new_client.substructure

# Find compounds with benzimidazole core
benzimidazole = 'c1ccc2[nH]cnc2c1'
compounds = sub.filter(smiles=benzimidazole)

Filter by Molecular Properties

python
from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

# Fragment-like compounds (these thresholds follow the "rule of three",
# which is stricter than Lipinski's rule of five)
fragments = mol.filter(
    molecule_properties__mw_freebase__lte=300,
    molecule_properties__alogp__lte=3,
    molecule_properties__hbd__lte=3,
    molecule_properties__hba__lte=3
)
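
As a local counterpart to the server-side filter, the following sketch counts Lipinski rule-of-five violations from a dictionary shaped like the `molecule_properties` block. The function name and sample values are hypothetical. The casts assume numeric fields may arrive as strings, so they are converted before comparison.

```python
def count_ro5_violations(props):
    """Count Lipinski rule-of-five violations in a molecule_properties-style dict."""
    checks = [
        float(props["mw_freebase"]) > 500,  # molecular weight above 500 Da
        float(props["alogp"]) > 5,          # calculated LogP above 5
        int(props["hbd"]) > 5,              # more than 5 H-bond donors
        int(props["hba"]) > 10,             # more than 10 H-bond acceptors
    ]
    return sum(checks)


# Illustrative property sets (made-up values, not fetched from ChEMBL)
small_molecule = {"mw_freebase": "206.28", "alogp": "3.07", "hbd": 1, "hba": 1}
large_molecule = {"mw_freebase": "812.0", "alogp": "6.2", "hbd": 6, "hba": 12}

print(count_ro5_violations(small_molecule))
print(count_ro5_violations(large_molecule))
```

ChEMBL already reports this count as `num_ro5_violations`, so the sketch is mainly useful for checking a custom threshold set locally.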

Drug Mechanisms of Action

python
from chembl_webresource_client.new_client import new_client

mech = new_client.mechanism
drug_ind = new_client.drug_indication

# Get mechanism of metformin
metformin_id = 'CHEMBL1431'
mechanisms = mech.filter(molecule_chembl_id=metformin_id)

for m in mechanisms:
    print(f"Target: {m['target_chembl_id']}")
    print(f"Action: {m['action_type']}")

# Get approved indications
indications = drug_ind.filter(molecule_chembl_id=metformin_id)

Generate Molecule Images

python
from chembl_webresource_client.new_client import new_client

img = new_client.image

# Get SVG of caffeine
caffeine_svg = img.get('CHEMBL113')

with open('caffeine.svg', 'w') as f:
    f.write(caffeine_svg)

Key Response Fields

Molecule Properties

| Field | Description |
|---|---|
| molecule_chembl_id | ChEMBL identifier |
| pref_name | Preferred name |
| molecule_structures.canonical_smiles | SMILES string |
| molecule_structures.standard_inchi_key | InChI key |
| molecule_properties.mw_freebase | Molecular weight |
| molecule_properties.alogp | Calculated LogP |
| molecule_properties.hba / hbd | H-bond acceptors/donors |
| molecule_properties.psa | Polar surface area |
| molecule_properties.rtb | Rotatable bonds |
| molecule_properties.num_ro5_violations | Lipinski violations |
| molecule_properties.qed_weighted | QED drug-likeness |
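
The dotted names in the table describe nested dictionaries in each returned record. A small helper like the one below (hypothetical, not part of the client) reads such a path while tolerating a missing nested block. The sample record is invented and only mimics the shape of a molecule response.

```python
def get_path(record, dotted_path, default=None):
    """Follow a dotted path such as 'molecule_properties.alogp' through nested dicts."""
    current = record
    for key in dotted_path.split("."):
        # Stop early if the current level is not a dict or the key is absent/None
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current


# Hypothetical record; values are illustrative only
record = {
    "molecule_chembl_id": "CHEMBL_EXAMPLE",
    "molecule_structures": None,  # nested block missing
    "molecule_properties": {"alogp": "1.31", "hbd": 1},
}

print(get_path(record, "molecule_properties.alogp"))
print(get_path(record, "molecule_structures.canonical_smiles", default="n/a"))
```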

Activity Fields

| Field | Description |
|---|---|
| molecule_chembl_id | Compound ID |
| target_chembl_id | Target ID |
| standard_type | Measurement type (IC50, Ki, EC50) |
| standard_value | Numeric value |
| standard_units | Units (nM, uM) |
| pchembl_value | Normalized -log10 value |
| data_validity_comment | Quality flag |
| potential_duplicate | Duplicate indicator |
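
The "normalized -log10 value" is the negative base-10 logarithm of the measurement expressed in molar units, so 100 nM corresponds to 7 and 1 uM to 6. The sketch below shows only that arithmetic. ChEMBL computes `pchembl_value` itself, and `to_pchembl` and `UNIT_TO_MOLAR` are hypothetical names.

```python
import math

# Conversion factors from common concentration units to molar
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_pchembl(value, units):
    """Return -log10 of a concentration after converting it to molar."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[units])


for value, units in [(100, "nM"), (1, "uM"), (50, "nM")]:
    print(f"{value} {units} -> {to_pchembl(value, units):.2f}")
```

Because the scale is logarithmic, a threshold such as `pchembl_value__gt=7` selects measurements more potent than 100 nM.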

Target Fields

| Field | Description |
|---|---|
| target_chembl_id | ChEMBL target ID |
| pref_name | Preferred name |
| target_type | SINGLE PROTEIN, PROTEIN COMPLEX, etc. |
| organism | Species |

Mechanism Fields

| Field | Description |
|---|---|
| molecule_chembl_id | Drug ID |
| target_chembl_id | Target ID |
| mechanism_of_action | Description |
| action_type | INHIBITOR, AGONIST, ANTAGONIST, etc. |

Export to DataFrame

python
import pandas as pd
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

results = activity.filter(
    target_chembl_id='CHEMBL217',  # dopamine D2 receptor, matching the output filename
    standard_type='Ki',
    pchembl_value__isnull=False
)

df = pd.DataFrame(list(results))
df.to_csv('dopamine_d2_ligands.csv', index=False)
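
Exported activity tables often contain several measurements for the same compound. One common way to summarize them is a per-compound median of `pchembl_value`. The sketch below does that with the standard library on invented records. The helper name and sample IDs are hypothetical, and the `float` cast assumes values may arrive as strings.

```python
from collections import defaultdict
from statistics import median


def median_pchembl_by_molecule(records):
    """Group activity records by molecule and return the median pchembl_value of each."""
    grouped = defaultdict(list)
    for rec in records:
        value = rec.get("pchembl_value")
        if value is None:
            continue  # skip measurements without a normalized value
        grouped[rec["molecule_chembl_id"]].append(float(value))
    return {mol_id: median(values) for mol_id, values in grouped.items()}


# Invented records shaped like activity results
sample = [
    {"molecule_chembl_id": "CHEMBL_A", "pchembl_value": "7.0"},
    {"molecule_chembl_id": "CHEMBL_A", "pchembl_value": "8.0"},
    {"molecule_chembl_id": "CHEMBL_B", "pchembl_value": "6.2"},
    {"molecule_chembl_id": "CHEMBL_B", "pchembl_value": None},
]

summary = median_pchembl_by_molecule(sample)
print(summary)
```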

Configuration

python
from chembl_webresource_client.settings import Settings

cfg = Settings.Instance()

cfg.CACHING = True           # Enable response caching
cfg.CACHE_EXPIRE = 43200     # Cache TTL in seconds (12 hours)
cfg.TIMEOUT = 60             # Request timeout in seconds
cfg.TOTAL_RETRIES = 5        # Retry attempts

Data Quality Notes

  • ChEMBL data is manually curated, but check the data_validity_comment field for flagged records
  • Check potential_duplicate flags when aggregating results
  • Use pchembl_value for normalized comparisons across assay types
  • Treat activity values without standard_units with caution
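
The notes above can be sketched as a simple record filter applied after a query. This is one possible reading of the checks, not an official recipe: `is_clean` is a hypothetical helper, the records are invented, and the duplicate flag is assumed to arrive as a boolean, 0/1 integer, or the string "1".

```python
def is_clean(record):
    """Keep a record only if it has no validity comment, no duplicate flag, and has units."""
    if record.get("data_validity_comment"):
        return False  # curators attached a validity warning
    if record.get("potential_duplicate") in (True, 1, "1"):
        return False  # flagged as a likely duplicate measurement
    if not record.get("standard_units"):
        return False  # value cannot be interpreted without units
    return True


# Invented records shaped like activity results
records = [
    {"molecule_chembl_id": "CHEMBL_A", "standard_units": "nM",
     "data_validity_comment": None, "potential_duplicate": 0},
    {"molecule_chembl_id": "CHEMBL_B", "standard_units": "nM",
     "data_validity_comment": "Outside typical range", "potential_duplicate": 0},
    {"molecule_chembl_id": "CHEMBL_C", "standard_units": None,
     "data_validity_comment": None, "potential_duplicate": 0},
    {"molecule_chembl_id": "CHEMBL_D", "standard_units": "nM",
     "data_validity_comment": None, "potential_duplicate": 1},
]

clean = [r for r in records if is_clean(r)]
print([r["molecule_chembl_id"] for r in clean])
```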

Best Practices

  1. Use caching - Reduces API load and improves performance
  2. Filter early - Apply filters to reduce data transfer
  3. Limit results - Use [:n] slicing for testing
  4. Check validity - Inspect data_validity_comment fields
  5. Use pchembl_value - Normalized values enable cross-assay comparison
  6. Batch queries - Use the __in operator for multiple IDs
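
For the batching advice, a long ID list can be split into fixed-size groups so that each `__in` query stays a manageable size. The helper below is a generic sketch. The batch size of 2 is only for display, the IDs are placeholders, and the commented-out query shows where a real call would go.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder IDs standing in for a longer list of ChEMBL target IDs
target_ids = ["ID_1", "ID_2", "ID_3", "ID_4", "ID_5"]

batches = list(chunked(target_ids, 2))
for batch in batches:
    # With the client, each batch could be passed to one query, for example:
    # activity.filter(target_chembl_id__in=batch, standard_type='IC50')
    print(batch)
```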

Error Handling

python
from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

try:
    result = mol.get('INVALID_ID')
except Exception as e:
    if '404' in str(e):
        print("Compound not found")
    elif '503' in str(e):
        print("Service unavailable - retry later")
    else:
        raise
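
The "retry later" branch can be turned into an application-level retry with exponential backoff. The sketch below is generic and does not use any client-specific exception class. `with_retries` is a hypothetical helper, and `flaky_fetch` is a stand-in for a client call so the example runs without network access. Like the example above, it detects a 503 by inspecting the error message, which is an assumption about how the error is reported.

```python
import time


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on a 503-style error wait and retry, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:
            is_last = attempt == attempts - 1
            if "503" not in str(exc) or is_last:
                raise  # not retryable, or out of attempts
            sleep(base_delay * 2 ** attempt)


calls = {"count": 0}


def flaky_fetch():
    """Stand-in for a client call: fails twice with a 503-style error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("503 Service Unavailable")
    return "ok"


# The sleep function is replaced with a no-op so the demo finishes instantly
result = with_retries(flaky_fetch, attempts=5, sleep=lambda seconds: None)
print(result, calls["count"])
```

The client's own `TOTAL_RETRIES` setting already retries at the HTTP layer, so a wrapper like this is only worth adding for longer outages.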

External Links
