chembl-database
ChEMBL Database
ChEMBL is the European Bioinformatics Institute's repository of bioactive compound data, containing over 2 million compounds, 19 million bioactivity measurements, and 13,000+ drug targets.
Use Cases
- Find potent inhibitors for a protein target
- Search for compounds similar to a known drug
- Retrieve drug mechanism of action data
- Filter compounds by molecular properties (Lipinski, etc.)
- Export bioactivity data for ML or analysis
Installation
```bash
uv pip install chembl_webresource_client
```
Basic Usage
```python
from chembl_webresource_client.new_client import new_client

# Fetch compound by identifier
mol = new_client.molecule.get('CHEMBL192')

# Retrieve target data
tgt = new_client.target.get('CHEMBL203')

# Query activity measurements
acts = new_client.activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=50
)
```
Available Endpoints
| Resource | Description |
|---|---|
| `molecule` | Compound structures and properties |
| `target` | Biological targets |
| `activity` | Bioassay measurements |
| `assay` | Experimental protocols |
| `drug` | Approved drug data |
| `mechanism` | Drug mechanisms of action |
| `drug_indication` | Therapeutic indications |
| `similarity` | Structure similarity search |
| `substructure` | Substructure search |
| `document` | Literature references |
| `cell_line` | Cell line data |
| | Protein classifications |
| `image` | SVG molecular images |
Query Operators
The client uses Django-style filtering:
| Operator | Function | Example |
|---|---|---|
| `__exact` | Exact match | |
| `__icontains` | Case-insensitive substring | `pref_name__icontains='cyclin-dependent kinase'` |
| `__lte`, `__gte` | Less/greater than or equal | `standard_value__lte=50` |
| `__lt`, `__gt` | Less/greater than | |
| `__range` | Value within range | |
| `__in` | Value in list | `target_chembl_id__in=target_ids[:5]` |
| `__isnull` | Null check | `pchembl_value__isnull=False` |
| `__startswith` | Prefix match | |
| `__regex` | Regular expression | |
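Each lookup is written as a field name followed by a double-underscore suffix and passed as a keyword argument. The sketch below only assembles such keyword arguments into a dict, so it runs without network access. The helper name `build_activity_filters` is hypothetical; the field names are the ones used in the examples elsewhere in this document.

```python
def build_activity_filters(target_ids, max_value_nm, activity_type="IC50"):
    """Assemble Django-style lookup kwargs for an activity query.

    Hypothetical helper for illustration: it only builds a dict and
    makes no request to ChEMBL.
    """
    return {
        "target_chembl_id__in": list(target_ids),  # value in list
        "standard_type": activity_type,            # exact match (no suffix)
        "standard_value__lte": max_value_nm,       # less than or equal
        "standard_units": "nM",
        "pchembl_value__isnull": False,            # null check
    }


filters = build_activity_filters(["CHEMBL203", "CHEMBL5145"], 100)
print(filters)

# With the client installed, the dict can be unpacked into a query:
#   new_client.activity.filter(**filters)
```

Keeping the lookups in a plain dict makes it easy to log or reuse the exact query conditions alongside exported results.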
Common Workflows
Find Target Inhibitors
```python
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Get potent BRAF inhibitors (IC50 <= 100 nM)
braf_hits = activity.filter(
    target_chembl_id='CHEMBL5145',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

for hit in braf_hits:
    print(f"{hit['molecule_chembl_id']}: {hit['standard_value']} nM")
```
Search by Target Name
```python
from chembl_webresource_client.new_client import new_client

target = new_client.target
activity = new_client.activity

# Find CDK targets
cdk_targets = target.filter(
    pref_name__icontains='cyclin-dependent kinase',
    target_type='SINGLE PROTEIN'
)
target_ids = [t['target_chembl_id'] for t in cdk_targets]

# Get activities for the first five of these targets
cdk_activities = activity.filter(
    target_chembl_id__in=target_ids[:5],
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)
```
Structure Similarity Search
```python
from chembl_webresource_client.new_client import new_client

sim = new_client.similarity

# Find molecules at least 80% similar to ibuprofen
ibuprofen_smiles = 'CC(C)Cc1ccc(cc1)C(C)C(=O)O'
matches = sim.filter(smiles=ibuprofen_smiles, similarity=80)

for m in matches:
    print(f"{m['molecule_chembl_id']}: {m['similarity']}%")
```
Substructure Search
```python
from chembl_webresource_client.new_client import new_client

sub = new_client.substructure

# Find compounds with a benzimidazole core
benzimidazole = 'c1ccc2[nH]cnc2c1'
compounds = sub.filter(smiles=benzimidazole)
```
Filter by Molecular Properties
```python
from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

# Fragment-like compounds (rule of three: MW <= 300, ALogP <= 3, HBD <= 3, HBA <= 3)
fragments = mol.filter(
    molecule_properties__mw_freebase__lte=300,
    molecule_properties__alogp__lte=3,
    molecule_properties__hbd__lte=3,
    molecule_properties__hba__lte=3
)
```
Drug Mechanisms of Action
```python
from chembl_webresource_client.new_client import new_client

mech = new_client.mechanism
drug_ind = new_client.drug_indication

# Get mechanism of metformin
metformin_id = 'CHEMBL1431'
mechanisms = mech.filter(molecule_chembl_id=metformin_id)

for m in mechanisms:
    print(f"Target: {m['target_chembl_id']}")
    print(f"Action: {m['action_type']}")

# Get the indications recorded for metformin
indications = drug_ind.filter(molecule_chembl_id=metformin_id)
```
Generate Molecule Images
```python
from chembl_webresource_client.new_client import new_client

img = new_client.image

# Get SVG of caffeine
caffeine_svg = img.get('CHEMBL113')

# The payload may arrive as bytes or str depending on the client version,
# so pick the file mode to match
mode = 'wb' if isinstance(caffeine_svg, bytes) else 'w'
with open('caffeine.svg', mode) as f:
    f.write(caffeine_svg)
```
Key Response Fields
Molecule Properties
| Field | Description |
|---|---|
| `molecule_chembl_id` | ChEMBL identifier |
| `pref_name` | Preferred name |
| `canonical_smiles` | SMILES string |
| `standard_inchi_key` | InChI key |
| `mw_freebase` | Molecular weight of the parent (free base) form |
| `alogp` | Calculated LogP |
| `hba`, `hbd` | H-bond acceptors/donors |
| `psa` | Polar surface area |
| `rtb` | Rotatable bonds |
| `num_ro5_violations` | Lipinski violations |
| `qed_weighted` | QED drug-likeness |
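ChEMBL already reports a Lipinski violation count, but recomputing it from the individual properties shows how these fields fit together. The following is a minimal sketch, assuming the properties are nested under `molecule_properties` (as the filter syntax in this document suggests) and that numeric values may arrive as strings or be missing. The function name and the sample record are hypothetical and hand-written, not fetched from ChEMBL.

```python
def count_ro5_violations(record):
    """Count Lipinski rule-of-five violations from a molecule record.

    Returns None when any required property is missing, so incomplete
    records are not silently treated as compliant.
    """
    props = record.get("molecule_properties") or {}
    # Rule of five: MW <= 500, LogP <= 5, H-bond donors <= 5, acceptors <= 10
    limits = {"mw_freebase": 500, "alogp": 5, "hbd": 5, "hba": 10}
    violations = 0
    for field, limit in limits.items():
        value = props.get(field)
        if value is None:
            return None
        if float(value) > limit:
            violations += 1
    return violations


# Illustrative record with made-up values
sample = {
    "molecule_chembl_id": "CHEMBL_EXAMPLE",
    "molecule_properties": {
        "mw_freebase": "612.7",
        "alogp": "5.8",
        "hbd": 2,
        "hba": 7,
    },
}
print(count_ro5_violations(sample))  # 2 (molecular weight and LogP exceed their limits)
```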
Activity Fields
| Field | Description |
|---|---|
| `molecule_chembl_id` | Compound ID |
| `target_chembl_id` | Target ID |
| `standard_type` | Measurement type (IC50, Ki, EC50) |
| `standard_value` | Numeric value |
| `standard_units` | Units (nM, uM) |
| `pchembl_value` | Normalized -log10 value |
| `data_validity_comment` | Quality flag |
| `potential_duplicate` | Duplicate indicator |
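The normalized value is a negative base-10 logarithm of a molar concentration, so a 100 nM measurement corresponds to 7.0. The sketch below shows only that arithmetic. The helper name and unit table are assumptions made for illustration, and the code does not reproduce whatever curation rules ChEMBL applies before assigning a `pchembl_value`.

```python
import math

# Conversion factors from common concentration units to mol/L
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_value(standard_value, standard_units):
    """Convert a concentration to its -log10(molar) form.

    Hypothetical helper: accepts numbers or numeric strings and raises
    on unknown units or non-positive values.
    """
    if standard_units not in UNIT_TO_MOLAR:
        raise ValueError(f"Unsupported units: {standard_units!r}")
    molar = float(standard_value) * UNIT_TO_MOLAR[standard_units]
    if molar <= 0:
        raise ValueError("Concentration must be positive")
    return -math.log10(molar)


print(round(to_p_value(100, "nM"), 2))    # 7.0
print(round(to_p_value("2.5", "uM"), 2))  # 5.6
```

Working on the log scale is what makes values reported in different units directly comparable.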
Target Fields
| Field | Description |
|---|---|
| `target_chembl_id` | ChEMBL target ID |
| `pref_name` | Preferred name |
| `target_type` | SINGLE PROTEIN, PROTEIN COMPLEX, etc. |
| `organism` | Species |
Mechanism Fields
| Field | Description |
|---|---|
| `molecule_chembl_id` | Drug ID |
| `target_chembl_id` | Target ID |
| `mechanism_of_action` | Description |
| `action_type` | INHIBITOR, AGONIST, ANTAGONIST, etc. |
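When a drug has several mechanism records, grouping target IDs by action type gives a compact summary. This is a small sketch over plain dictionaries shaped like the fields above. The function name and the sample records (including their placeholder IDs) are hypothetical, and records without an action type are collected under an `UNKNOWN` key as a design choice of this example.

```python
from collections import defaultdict


def targets_by_action(mechanisms):
    """Group target IDs by action type from mechanism-like records."""
    grouped = defaultdict(set)
    for record in mechanisms:
        action = record.get("action_type") or "UNKNOWN"
        target_id = record.get("target_chembl_id")
        if target_id:  # skip records with no target assigned
            grouped[action].add(target_id)
    # Sorted lists give stable, readable output
    return {action: sorted(ids) for action, ids in grouped.items()}


# Hand-written placeholder records, not real ChEMBL data
sample_mechanisms = [
    {"target_chembl_id": "CHEMBL_TARGET_A", "action_type": "INHIBITOR"},
    {"target_chembl_id": "CHEMBL_TARGET_C", "action_type": None},
    {"target_chembl_id": "CHEMBL_TARGET_B", "action_type": "INHIBITOR"},
    {"target_chembl_id": None, "action_type": "AGONIST"},
]
summary = targets_by_action(sample_mechanisms)
print(summary)
```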
Export to DataFrame
```python
import pandas as pd
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Ki measurements for the dopamine D2 receptor (CHEMBL217)
# that have a pchembl_value
results = activity.filter(
    target_chembl_id='CHEMBL217',
    standard_type='Ki',
    pchembl_value__isnull=False
)

df = pd.DataFrame(list(results))
df.to_csv('dopamine_d2_ligands.csv', index=False)
```
Configuration
```python
from chembl_webresource_client.settings import Settings

cfg = Settings.Instance()
cfg.CACHING = True        # Enable response caching
cfg.CACHE_EXPIRE = 43200  # Cache TTL in seconds (12 hours)
cfg.TIMEOUT = 60          # Request timeout in seconds
cfg.TOTAL_RETRIES = 5     # Retry attempts
```
Data Quality Notes
- ChEMBL data is manually curated, but verify `data_validity_comment` fields
- Check `potential_duplicate` flags when aggregating results
- Use `pchembl_value` for normalized comparisons across assay types
- Activity values without `standard_units` should be used cautiously
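These checks can be applied to downloaded records before any aggregation. The sketch below is one possible filter over plain dictionaries, assuming each record carries the four fields named in the notes. The function name and sample records are hypothetical, and dropping (rather than flagging) questionable records is a policy choice of this example, not a ChEMBL recommendation.

```python
def clean_activities(records):
    """Keep only records that pass the basic quality checks."""
    kept = []
    for record in records:
        if record.get("data_validity_comment"):
            continue  # curators attached a validity warning
        if record.get("potential_duplicate") in (1, "1", True):
            continue  # flagged as a likely duplicate
        if not record.get("standard_units"):
            continue  # value has no units, so it cannot be compared
        if record.get("pchembl_value") is None:
            continue  # no normalized value for cross-assay comparison
        kept.append(record)
    return kept


# Hand-written placeholder records, not real ChEMBL data
sample_records = [
    {"molecule_chembl_id": "A", "data_validity_comment": None,
     "potential_duplicate": 0, "standard_units": "nM", "pchembl_value": "7.2"},
    {"molecule_chembl_id": "B", "data_validity_comment": "Outside typical range",
     "potential_duplicate": 0, "standard_units": "nM", "pchembl_value": "3.1"},
    {"molecule_chembl_id": "C", "data_validity_comment": None,
     "potential_duplicate": 1, "standard_units": "nM", "pchembl_value": "6.8"},
    {"molecule_chembl_id": "D", "data_validity_comment": None,
     "potential_duplicate": 0, "standard_units": None, "pchembl_value": None},
]
cleaned = clean_activities(sample_records)
print([r["molecule_chembl_id"] for r in cleaned])  # ['A']
```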
Best Practices
- Use caching - Reduces API load and improves performance
- Filter early - Apply filters to reduce data transfer
- Limit results - Use slicing `[:n]` for testing
- Check validity - Inspect `data_validity_comment` fields
- Use pchembl_value - Normalized values enable cross-assay comparison
- Batch queries - Use the `__in` operator for multiple IDs
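For long ID lists, the batching and slicing advice can be combined by splitting the list into chunks and issuing one `__in` query per chunk. The following sketch shows only the chunking step, which runs offline. The helper name `chunked` and the batch size of 2 are illustrative assumptions, not limits documented by ChEMBL.

```python
def chunked(ids, size):
    """Yield consecutive slices of `ids` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


target_ids = ["CHEMBL203", "CHEMBL5145", "CHEMBL279"]
batches = list(chunked(target_ids, 2))
print(batches)  # [['CHEMBL203', 'CHEMBL5145'], ['CHEMBL279']]

# With the client installed, each batch would go into a single query, and
# a slice keeps the result set small while testing:
#   new_client.activity.filter(target_chembl_id__in=batch)[:20]
```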
Error Handling
```python
from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

try:
    result = mol.get('INVALID_ID')
except Exception as e:
    # The HTTP status code is matched in the exception message
    if '404' in str(e):
        print("Compound not found")
    elif '503' in str(e):
        print("Service unavailable - retry later")
    else:
        raise
```
External Links
- ChEMBL: https://www.ebi.ac.uk/chembl/
- API Documentation: https://chembl.gitbook.io/chembl-interface-documentation
- Python Client: https://github.com/chembl/chembl_webresource_client