chembl-database

ChEMBL Database

ChEMBL is the European Bioinformatics Institute's repository of bioactive compound data, containing over 2 million compounds, 19 million bioactivity measurements, and 13,000+ drug targets.

Use Cases

- Find potent inhibitors for a protein target
- Search for compounds similar to a known drug
- Retrieve drug mechanism of action data
- Filter compounds by molecular properties (Lipinski, etc.)
- Export bioactivity data for ML or analysis

Installation

```bash
uv pip install chembl_webresource_client
```

Basic Usage

```python
from chembl_webresource_client.new_client import new_client

# Fetch compound by identifier
mol = new_client.molecule.get('CHEMBL192')

# Retrieve target data
tgt = new_client.target.get('CHEMBL203')

# Query activity measurements
acts = new_client.activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=50
)
```

Available Endpoints

Resource	Description
`molecule`	Compound structures and properties
`target`	Biological targets
`activity`	Bioassay measurements
`assay`	Experimental protocols
`drug`	Approved drug data
`mechanism`	Drug mechanisms of action
`drug_indication`	Therapeutic indications
`similarity`	Structure similarity search
`substructure`	Substructure search
`document`	Literature references
`cell_line`	Cell line data
`protein_class`	Protein classifications
`image`	SVG molecular images

Query Operators

The client uses Django-style filtering:

Operator	Function	Example
`__exact`	Exact match	`pref_name__exact='Aspirin'`
`__icontains`	Case-insensitive substring	`pref_name__icontains='kinase'`
`__lte`, `__gte`	Less than or equal / greater than or equal	`standard_value__lte=10`
`__lt`, `__gt`	Less than / greater than	`pchembl_value__gt=7`
`__range`	Value within range	`alogp__range=[-1, 5]`
`__in`	Value in list	`target_chembl_id__in=['CHEMBL203']`
`__isnull`	Null check	`pchembl_value__isnull=False`
`__startswith`	Prefix match	`pref_name__startswith='Proto'`
`__regex`	Regular expression	`pref_name__regex='^[A-Z]{3}'`
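
To make the operator semantics concrete, here is a small local sketch that applies the lookups above to plain dictionaries. It is an illustration only: `LOOKUPS` and `matches` are hypothetical names, and the client itself does not evaluate filters this way (it forwards them to the ChEMBL web service).

```python
import re

# Hypothetical local re-implementation of the lookups, for illustration only.
LOOKUPS = {
    "exact": lambda value, arg: value == arg,
    "icontains": lambda value, arg: arg.lower() in str(value).lower(),
    "lte": lambda value, arg: value <= arg,
    "gte": lambda value, arg: value >= arg,
    "lt": lambda value, arg: value < arg,
    "gt": lambda value, arg: value > arg,
    "range": lambda value, arg: arg[0] <= value <= arg[1],
    "in": lambda value, arg: value in arg,
    "isnull": lambda value, arg: (value is None) == arg,
    "startswith": lambda value, arg: str(value).startswith(arg),
    "regex": lambda value, arg: re.search(arg, str(value)) is not None,
}


def matches(record, **filters):
    """Return True if `record` satisfies every `field__operator=value` filter."""
    for key, arg in filters.items():
        field, sep, op = key.rpartition("__")
        if not sep or op not in LOOKUPS:
            # No recognised operator suffix: treat the whole key as an exact match.
            field, op = key, "exact"
        value = record.get(field)
        if value is None and op != "isnull":
            return False  # a missing value cannot satisfy a comparison
        if not LOOKUPS[op](value, arg):
            return False
    return True


# Made-up record, shaped like a target entry with a few activity fields mixed in.
record = {"pref_name": "Cyclin-dependent kinase 2", "pchembl_value": 7.5}
print(matches(record, pref_name__icontains="kinase", pchembl_value__gt=7))
```

Several keyword arguments in one call are combined with AND, which mirrors how the multi-argument `filter(...)` calls in this document are written.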

Common Workflows

Find Target Inhibitors

```python
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Get potent BRAF inhibitors (IC50 <= 100 nM)
braf_hits = activity.filter(
    target_chembl_id='CHEMBL5145',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

for hit in braf_hits:
    print(f"{hit['molecule_chembl_id']}: {hit['standard_value']} nM")
```
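
A query like this can return several measurements for the same compound. The sketch below keeps only the most potent value per molecule. `best_hit_per_molecule` is a hypothetical helper, the sample IDs are made up, and the `float()` cast assumes that `standard_value` may arrive as a string.

```python
def best_hit_per_molecule(hits):
    """Keep the lowest standard_value seen for each molecule_chembl_id."""
    best = {}
    for hit in hits:
        raw = hit.get("standard_value")
        if raw is None:
            continue  # skip records without a numeric value
        value = float(raw)  # assumption: values may arrive as strings
        mol_id = hit["molecule_chembl_id"]
        if mol_id not in best or value < best[mol_id]:
            best[mol_id] = value
    # Sort from most to least potent (smallest IC50 first).
    return sorted(best.items(), key=lambda item: item[1])


# Made-up records standing in for the results of an activity query.
sample_hits = [
    {"molecule_chembl_id": "MOL_A", "standard_value": "50.0"},
    {"molecule_chembl_id": "MOL_B", "standard_value": "5"},
    {"molecule_chembl_id": "MOL_A", "standard_value": "12.5"},
]
for mol_id, ic50 in best_hit_per_molecule(sample_hits):
    print(f"{mol_id}: {ic50} nM")
```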

Search by Target Name

```python
from chembl_webresource_client.new_client import new_client

target = new_client.target
activity = new_client.activity

# Find CDK targets
cdk_targets = target.filter(
    pref_name__icontains='cyclin-dependent kinase',
    target_type='SINGLE PROTEIN'
)

target_ids = [t['target_chembl_id'] for t in cdk_targets]

# Get activities for these targets (first five targets only)
cdk_activities = activity.filter(
    target_chembl_id__in=target_ids[:5],
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)
```

Structure Similarity Search

```python
from chembl_webresource_client.new_client import new_client

sim = new_client.similarity

# Find molecules at least 80% similar to ibuprofen
ibuprofen_smiles = 'CC(C)Cc1ccc(cc1)C(C)C(=O)O'
matches = sim.filter(smiles=ibuprofen_smiles, similarity=80)

for m in matches:
    print(f"{m['molecule_chembl_id']}: {m['similarity']}%")
```
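
For intuition about what a similarity percentage means, the sketch below computes a Tanimoto coefficient between two sets of fingerprint bits. It assumes that the service reports a Tanimoto-style score on molecular fingerprints. The bit sets here are invented, and `tanimoto_percent` is a hypothetical helper, not part of the client.

```python
def tanimoto_percent(bits_a, bits_b):
    """Tanimoto similarity of two sets of 'on' fingerprint bits, as a percentage."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return 100.0 * len(a & b) / len(union)


# Invented bit positions; real fingerprints have hundreds or thousands of bits.
query_bits = {1, 2, 3, 4}
candidate_bits = {2, 3, 4, 5}
print(tanimoto_percent(query_bits, candidate_bits))  # 3 shared / 5 total bits
```

Under this measure, a threshold of 80 means that shared bits make up at least 80% of all bits set in either fingerprint.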

Substructure Search

```python
from chembl_webresource_client.new_client import new_client

sub = new_client.substructure

# Find compounds with benzimidazole core
benzimidazole = 'c1ccc2[nH]cnc2c1'
compounds = sub.filter(smiles=benzimidazole)
```

Filter by Molecular Properties

```python
from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

# Fragment-like compounds (rule-of-three-style cutoffs, stricter than Lipinski's rule of five)
fragments = mol.filter(
    molecule_properties__mw_freebase__lte=300,
    molecule_properties__alogp__lte=3,
    molecule_properties__hbd__lte=3,
    molecule_properties__hba__lte=3
)
```
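
The same kind of check can be run locally on records that have already been downloaded. The sketch below counts Lipinski rule-of-five violations (molecular weight above 500, logP above 5, more than 5 H-bond donors, more than 10 H-bond acceptors) from a `molecule_properties` dictionary. `ro5_violations` is a hypothetical helper, the sample values are illustrative rather than taken from ChEMBL, and the `float()` cast assumes that some properties may arrive as strings.

```python
# Upper limits of Lipinski's rule of five, keyed by ChEMBL property name.
RO5_LIMITS = {"mw_freebase": 500, "alogp": 5, "hbd": 5, "hba": 10}


def ro5_violations(props):
    """Count rule-of-five violations in a molecule_properties dictionary."""
    violations = 0
    for field, limit in RO5_LIMITS.items():
        raw = props.get(field)
        if raw is None:
            continue  # missing property: not counted as a violation here
        if float(raw) > limit:
            violations += 1
    return violations


# Illustrative values only.
small_molecule = {"mw_freebase": "180.16", "alogp": "1.31", "hbd": 1, "hba": 3}
print(ro5_violations(small_molecule))
```

ChEMBL also reports its own count in `molecule_properties.num_ro5_violations`, so a local check like this is mainly useful when the thresholds need to be customised.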

Drug Mechanisms of Action

```python
from chembl_webresource_client.new_client import new_client

mech = new_client.mechanism
drug_ind = new_client.drug_indication

# Get mechanism of metformin
metformin_id = 'CHEMBL1431'
mechanisms = mech.filter(molecule_chembl_id=metformin_id)

for m in mechanisms:
    print(f"Target: {m['target_chembl_id']}")
    print(f"Action: {m['action_type']}")

# Get approved indications
indications = drug_ind.filter(molecule_chembl_id=metformin_id)
```

Generate Molecule Images

```python
from chembl_webresource_client.new_client import new_client

img = new_client.image

# Get SVG of caffeine
caffeine_svg = img.get('CHEMBL113')

with open('caffeine.svg', 'w') as f:
    f.write(caffeine_svg)
```

Key Response Fields

Molecule Properties

Field	Description
`molecule_chembl_id`	ChEMBL identifier
`pref_name`	Preferred name
`molecule_structures.canonical_smiles`	SMILES string
`molecule_structures.standard_inchi_key`	InChI key
`molecule_properties.mw_freebase`	Molecular weight
`molecule_properties.alogp`	Calculated LogP
`molecule_properties.hba` / `hbd`	H-bond acceptors/donors
`molecule_properties.psa`	Polar surface area
`molecule_properties.rtb`	Rotatable bonds
`molecule_properties.num_ro5_violations`	Lipinski violations
`molecule_properties.qed_weighted`	QED drug-likeness
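
The dotted names in this table refer to nested dictionaries in each molecule record. A minimal sketch of a safe accessor follows. `get_nested` is a hypothetical helper, and the sample record is hand-written for illustration. It also assumes that a nested block can be missing or null for some entries, which is why the helper returns a default instead of raising.

```python
def get_nested(record, dotted_path, default=None):
    """Follow a dotted path such as 'molecule_structures.canonical_smiles'."""
    current = record
    for key in dotted_path.split("."):
        # Stop early if we hit a non-dict or a missing/null value on the way down.
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current


# Hand-written record shaped like a molecule entry; not fetched from the API.
record = {
    "molecule_chembl_id": "CHEMBL25",
    "molecule_structures": {"canonical_smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    "molecule_properties": None,
}
print(get_nested(record, "molecule_structures.canonical_smiles"))
print(get_nested(record, "molecule_properties.alogp", default="n/a"))
```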

Activity Fields

Field	Description
`molecule_chembl_id`	Compound ID
`target_chembl_id`	Target ID
`standard_type`	Measurement type (IC50, Ki, EC50)
`standard_value`	Numeric value
`standard_units`	Units (nM, uM)
`pchembl_value`	Negative log10 of the molar activity value (for example, 100 nM gives 7)
`data_validity_comment`	Quality flag
`potential_duplicate`	Duplicate indicator
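
The relationship between `standard_value`, `standard_units`, and a negative-log10 value can be sketched as follows. `p_value` and `UNIT_TO_MOLAR` are hypothetical names. The sketch shows the arithmetic only; whether ChEMBL assigns a `pchembl_value` to a given record depends on its own curation rules, which are not reproduced here.

```python
import math

# Conversion factors from common concentration units to molar.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def p_value(standard_value, standard_units="nM"):
    """Return -log10 of a concentration after converting it to molar."""
    molar = float(standard_value) * UNIT_TO_MOLAR[standard_units]
    if molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(molar)


print(round(p_value(100, "nM"), 2))  # 100 nM is 1e-7 M
print(round(p_value("1", "uM"), 2))  # 1 uM is 1e-6 M
```

Higher values mean higher potency, and a tenfold drop in concentration raises the value by one unit.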

Target Fields

Field	Description
`target_chembl_id`	ChEMBL target ID
`pref_name`	Preferred name
`target_type`	SINGLE PROTEIN, PROTEIN COMPLEX, etc.
`organism`	Species

Mechanism Fields

Field	Description
`molecule_chembl_id`	Drug ID
`target_chembl_id`	Target ID
`mechanism_of_action`	Description
`action_type`	INHIBITOR, AGONIST, ANTAGONIST, etc.

Export to DataFrame

```python
import pandas as pd
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Ki measurements for the dopamine D2 receptor (CHEMBL217) that have a pChEMBL value
results = activity.filter(
    target_chembl_id='CHEMBL217',
    standard_type='Ki',
    pchembl_value__isnull=False
)

# Materialize the lazy result set, then write it to CSV
df = pd.DataFrame(list(results))
df.to_csv('dopamine_d2_ligands.csv', index=False)
```

Configuration

```python
from chembl_webresource_client.settings import Settings

cfg = Settings.Instance()

cfg.CACHING = True           # Enable response caching
cfg.CACHE_EXPIRE = 43200     # Cache TTL in seconds (12 hours)
cfg.TIMEOUT = 60             # Request timeout in seconds
cfg.TOTAL_RETRIES = 5        # Retry attempts
```

Data Quality Notes

- ChEMBL data is manually curated, but still check the `data_validity_comment` field of each record
- Check the `potential_duplicate` flag when aggregating results
- Use `pchembl_value` for normalized comparisons across assay types
- Treat activity values without `standard_units` with caution
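
The notes above can be combined into a small aggregation step. The sketch below drops flagged and duplicate records, then takes the median `pchembl_value` per compound. `median_pchembl` is a hypothetical helper and the sample records are invented. It also assumes that `potential_duplicate` may be encoded as `0`/`1`, a boolean, or a string, so it normalises the flag before testing it.

```python
from collections import defaultdict
from statistics import median


def median_pchembl(records):
    """Median pchembl_value per molecule, ignoring flagged or duplicate records."""
    grouped = defaultdict(list)
    for rec in records:
        if rec.get("data_validity_comment"):
            continue  # a non-empty comment marks a possible data issue
        # Assumption: the flag may come back as 0/1, True/False, or a string.
        if str(rec.get("potential_duplicate", 0)).lower() in {"1", "true"}:
            continue
        if rec.get("pchembl_value") is None:
            continue
        grouped[rec["molecule_chembl_id"]].append(float(rec["pchembl_value"]))
    return {mol_id: median(values) for mol_id, values in grouped.items()}


# Invented records standing in for activity query results.
sample = [
    {"molecule_chembl_id": "MOL_A", "pchembl_value": "7.0",
     "data_validity_comment": None, "potential_duplicate": 0},
    {"molecule_chembl_id": "MOL_A", "pchembl_value": "8.0",
     "data_validity_comment": None, "potential_duplicate": 0},
    {"molecule_chembl_id": "MOL_B", "pchembl_value": "6.2",
     "data_validity_comment": None, "potential_duplicate": 1},
]
print(median_pchembl(sample))
```

Using the median rather than the mean is a design choice that limits the influence of a single outlying measurement.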

Best Practices

- Use caching - Reduces API load and improves performance
- Filter early - Apply filters to reduce data transfer
- Limit results - Use `[:n]` slicing for testing
- Check validity - Inspect `data_validity_comment` fields
- Use pchembl_value - Normalized values enable cross-assay comparison
- Batch queries - Use the `__in` operator for multiple IDs
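
When the ID list is long, one way to apply the batching advice is to split it into fixed-size chunks and issue one `__in` query per chunk. This is a sketch under assumptions: `chunked` and `fetch_activities_in_batches` are hypothetical helpers, and the batch size of 50 is an arbitrary choice rather than a documented limit.

```python
def chunked(ids, size=50):
    """Yield consecutive slices of `ids` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_activities_in_batches(target_ids, size=50):
    """Query activities batch by batch (needs network access; not called here)."""
    # Imported lazily so that `chunked` can be used without the client installed.
    from chembl_webresource_client.new_client import new_client

    rows = []
    for batch in chunked(target_ids, size):
        rows.extend(new_client.activity.filter(target_chembl_id__in=batch))
    return rows


print(list(chunked(["T1", "T2", "T3", "T4", "T5"], size=2)))
```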

Error Handling

```python
from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

try:
    result = mol.get('INVALID_ID')
except Exception as e:
    # The status code is matched in the exception message
    if '404' in str(e):
        print("Compound not found")
    elif '503' in str(e):
        print("Service unavailable - retry later")
    else:
        raise
```
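
For the "retry later" case, an application-level retry with exponential backoff is one option, on top of whatever the client's own `TOTAL_RETRIES` setting does. In the sketch below, `with_retries` is a hypothetical helper. It reuses the string match on `503` from the example above, which is a heuristic rather than a documented error contract.

```python
import time


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()`, retrying with exponential backoff on apparent 503 errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            transient = "503" in str(exc)  # heuristic, as in the example above
            if not transient or attempt == attempts:
                raise  # permanent error, or out of attempts
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Usage sketch (requires network access and the client, so it is left commented out):
# from chembl_webresource_client.new_client import new_client
# aspirin = with_retries(lambda: new_client.molecule.get("CHEMBL25"))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.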

External Links

ChEMBL: https://www.ebi.ac.uk/chembl/
API Documentation: https://chembl.gitbook.io/chembl-interface-documentation
Python Client: https://github.com/chembl/chembl_webresource_client
