chembl-database
ChEMBL Database
Overview
ChEMBL is a manually curated database of bioactive molecules maintained by the European Bioinformatics Institute (EMBL-EBI). It contains over 2 million compounds, 19 million bioactivity measurements, more than 13,000 drug targets, and data on approved drugs and clinical candidates (exact counts vary by release). Use the ChEMBL Python client to access and query this data programmatically for drug discovery and medicinal chemistry research.
When to Use This Skill
Use this skill for:
- Compound searches: Finding molecules by name, structure, or properties
- Target information: Retrieving data about proteins, enzymes, or biological targets
- Bioactivity data: Querying IC50, Ki, EC50, or other activity measurements
- Drug information: Looking up approved drugs, mechanisms, or indications
- Structure searches: Performing similarity or substructure searches
- Cheminformatics: Analyzing molecular properties and drug-likeness
- Target-ligand relationships: Exploring compound-target interactions
- Drug discovery: Identifying inhibitors, agonists, or bioactive molecules
Installation and Setup
Python Client
The ChEMBL Python client is required for programmatic access:
```bash
uv pip install chembl_webresource_client
```
Basic Usage Pattern
```python
from chembl_webresource_client.new_client import new_client

# Access different endpoints
molecule = new_client.molecule
target = new_client.target
activity = new_client.activity
drug = new_client.drug
```
Core Capabilities
1. Molecule Queries
Retrieve by ChEMBL ID:
```python
molecule = new_client.molecule
aspirin = molecule.get('CHEMBL25')
```
Search by name:
```python
results = molecule.filter(pref_name__icontains='aspirin')
```
Filter by properties:
```python
# Find small molecules (MW <= 500) with favorable LogP (AlogP <= 5)
results = molecule.filter(
    molecule_properties__mw_freebase__lte=500,
    molecule_properties__alogp__lte=5
)
```
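Molecule records returned by `molecule.get()` and `molecule.filter()` behave like nested dictionaries. The helper below pulls a few commonly used fields out of one record. `summarize_molecule` is hypothetical and is not part of the client. It assumes the `molecule_structures` and `molecule_properties` blocks can be missing or `None`, which can happen for biologics, so treat it as a sketch.

```python
from typing import Any, Optional


def summarize_molecule(record: dict[str, Any]) -> dict[str, Optional[str]]:
    """Flatten a ChEMBL molecule record into a small summary (hypothetical helper)."""
    # Nested blocks may be absent or None, so fall back to empty dicts
    structures = record.get("molecule_structures") or {}
    properties = record.get("molecule_properties") or {}
    return {
        "chembl_id": record.get("molecule_chembl_id"),
        "name": record.get("pref_name"),
        "smiles": structures.get("canonical_smiles"),
        "inchi_key": structures.get("standard_inchi_key"),
        # Property values are passed through unchanged (often strings)
        "mw_freebase": properties.get("mw_freebase"),
        "alogp": properties.get("alogp"),
    }


# Usage with the live client (needs network access):
#   print(summarize_molecule(new_client.molecule.get('CHEMBL25')))
```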
2. Target Queries
Retrieve target information:
```python
target = new_client.target
egfr = target.get('CHEMBL203')
```
Search for specific target types:
```python
# Single-protein targets with "kinase" in the preferred name
# (a name match, so kinases named differently are missed)
kinases = target.filter(
    target_type='SINGLE PROTEIN',
    pref_name__icontains='kinase'
)
```
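A name search like the one above returns targets from many organisms. The sketch below groups the fetched records by organism on the client side. It assumes each target record carries `target_type`, `organism` and `target_chembl_id` fields. The helper name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Iterable


def single_proteins_by_organism(targets: Iterable[dict[str, Any]]) -> dict[str, list[str]]:
    """Group SINGLE PROTEIN target IDs by organism (hypothetical helper)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for t in targets:
        if t.get("target_type") != "SINGLE PROTEIN":
            continue  # skip complexes, families, cell lines, etc.
        grouped[t.get("organism") or "unknown"].append(t["target_chembl_id"])
    return dict(grouped)


# Usage with the live client (needs network access):
#   print(single_proteins_by_organism(kinases[:50]))
```

Filtering on the server, for example with `organism='Homo sapiens'` in `target.filter()`, avoids downloading records you will discard anyway.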
3. Bioactivity Data
Query activities for a target:
```python
activity = new_client.activity

# Find potent EGFR inhibitors (IC50 <= 100 nM)
results = activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)
```
Get the activities of a compound that have a pChEMBL value:
```python
compound_activities = activity.filter(
    molecule_chembl_id='CHEMBL25',
    pchembl_value__isnull=False
)
```
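A compound usually has several activity records for the same target. The sketch below keeps the lowest `standard_value` per molecule; for IC50 or Ki, a lower value means more potent. It assumes `standard_value` may arrive as a string or `None`. It compares values only within one unit, since mixing nM and uM would be meaningless. The function names are hypothetical.

```python
from typing import Any, Iterable, Optional


def _to_float(value: Any) -> Optional[float]:
    """Convert an API value (often a string) to float, or None if impossible."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def best_value_per_molecule(records: Iterable[dict[str, Any]], units: str = "nM") -> dict[str, float]:
    """Return the lowest standard_value per molecule for records in `units`."""
    best: dict[str, float] = {}
    for rec in records:
        if rec.get("standard_units") != units:
            continue  # values in other units are not comparable
        value = _to_float(rec.get("standard_value"))
        if value is None:
            continue  # e.g. qualitative results without a number
        mol = rec["molecule_chembl_id"]
        if mol not in best or value < best[mol]:
            best[mol] = value
    return best


# Usage with the live client (needs network access):
#   best = best_value_per_molecule(results)
```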
4. Structure-Based Searches
Similarity search:
```python
similarity = new_client.similarity

# Find compounds similar to aspirin
similar = similarity.filter(
    smiles='CC(=O)Oc1ccccc1C(=O)O',
    similarity=85  # 85% similarity threshold
)
```
Substructure search:
```python
substructure = new_client.substructure

# Find compounds containing a benzene ring (matches a very large number of compounds)
results = substructure.filter(smiles='c1ccccc1')
```
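Ranking similarity hits before further queries is often useful. The sketch below assumes each hit carries a `similarity` score, possibly as a string, next to `molecule_chembl_id`; hits without a score are skipped. The helper is hypothetical and only sorts records you have already fetched.

```python
from typing import Any, Iterable


def rank_similarity_hits(hits: Iterable[dict[str, Any]], min_similarity: float = 0.0) -> list[tuple[str, float]]:
    """Sort hits by similarity, highest first (hypothetical helper)."""
    ranked = []
    for hit in hits:
        try:
            score = float(hit["similarity"])  # assumed field name
        except (KeyError, TypeError, ValueError):
            continue  # no usable score on this record
        if score >= min_similarity:
            ranked.append((hit["molecule_chembl_id"], score))
    # Highest score first; ties broken by ChEMBL ID for a stable order
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Usage with the live client (needs network access):
#   top = rank_similarity_hits(similar, min_similarity=90)[:10]
```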
5. Drug Information
Retrieve drug data:
```python
drug = new_client.drug
drug_info = drug.get('CHEMBL25')
```
Get mechanisms of action:
```python
mechanism = new_client.mechanism
mechanisms = mechanism.filter(molecule_chembl_id='CHEMBL25')
```
Query drug indications:
```python
drug_indication = new_client.drug_indication
indications = drug_indication.filter(molecule_chembl_id='CHEMBL25')
```
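Mechanism records are easier to scan as one line each. The sketch below assumes the `action_type`, `target_chembl_id` and `mechanism_of_action` fields, any of which may be `None`. The formatting and the helper name are illustrative choices.

```python
from typing import Any, Iterable


def mechanism_summary(mechanisms: Iterable[dict[str, Any]]) -> list[str]:
    """Render mechanism records as 'ACTION -> target (description)' lines."""
    lines = []
    for m in mechanisms:
        action = m.get("action_type") or "UNKNOWN"
        target = m.get("target_chembl_id") or "no target"
        description = m.get("mechanism_of_action") or ""
        lines.append(f"{action} -> {target} ({description})")
    return lines


# Usage with the live client (needs network access):
#   for line in mechanism_summary(mechanisms):
#       print(line)
```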
Query Workflow
Workflow 1: Finding Inhibitors for a Target
1. Identify the target by searching by name:
```python
# Gene symbols such as EGFR usually appear among target synonyms rather than in pref_name
targets = new_client.target.filter(
    target_synonym__icontains='EGFR',
    organism='Homo sapiens',
    target_type='SINGLE PROTEIN'
)
target_id = targets[0]['target_chembl_id']  # inspect `targets` before trusting the first hit
```
2. Query bioactivity data for that target:
```python
activities = new_client.activity.filter(
    target_chembl_id=target_id,
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)
```
3. Extract compound IDs and retrieve details:
```python
# One compound often has many activity records, so deduplicate the IDs first
compound_ids = sorted({act['molecule_chembl_id'] for act in activities})
compounds = new_client.molecule.filter(molecule_chembl_id__in=compound_ids)
```
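Step 3 can be made more defensive. The sketch below removes duplicate IDs while keeping first-seen order, then splits them into batches. It rests on an assumption: that very long `__in` lists are better sent as several smaller queries. The batch size of 50 is an arbitrary illustrative value, not a documented ChEMBL limit. Both helper names are hypothetical.

```python
from typing import Iterable, Iterator


def unique_in_order(ids: Iterable[str]) -> list[str]:
    """Drop duplicate IDs while preserving first-seen order."""
    seen: set[str] = set()
    ordered = []
    for chembl_id in ids:
        if chembl_id not in seen:
            seen.add(chembl_id)
            ordered.append(chembl_id)
    return ordered


def chunks(ids: list[str], size: int = 50) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Usage with the live client (needs network access):
#   ids = unique_in_order(act['molecule_chembl_id'] for act in activities)
#   compounds = []
#   for batch in chunks(ids):
#       compounds.extend(new_client.molecule.filter(molecule_chembl_id__in=batch))
```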
Workflow 2: Analyzing a Known Drug
1. Get drug information:
```python
drug_id = 'CHEMBL1234'  # placeholder: substitute the drug's ChEMBL ID
drug_info = new_client.drug.get(drug_id)
```
2. Retrieve mechanisms:
```python
mechanisms = new_client.mechanism.filter(molecule_chembl_id=drug_id)
```
3. Find all bioactivities:
```python
activities = new_client.activity.filter(molecule_chembl_id=drug_id)
```
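A quick overview of a drug's bioactivity record counts the measurement types and units before any deeper analysis. The sketch assumes activity records carry `standard_type` and `standard_units` fields. The helper name is hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def activity_profile(activities: Iterable[dict[str, Any]]) -> Counter:
    """Count activity records per (standard_type, standard_units) pair."""
    return Counter(
        (a.get("standard_type"), a.get("standard_units")) for a in activities
    )


# Usage with the live client (needs network access):
#   for (kind, units), n in activity_profile(activities).most_common(10):
#       print(kind, units, n)
```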
Workflow 3: Structure-Activity Relationship (SAR) Study
1. Find similar compounds:
```python
query_smiles = 'CC(=O)Oc1ccccc1C(=O)O'  # example (aspirin); substitute your query SMILES
similar = new_client.similarity.filter(smiles=query_smiles, similarity=80)
```
2. Get activities for each compound:
```python
# Keep each compound's activities instead of overwriting them on every iteration
activities_by_compound = {}
for compound in similar:
    chembl_id = compound['molecule_chembl_id']
    activities_by_compound[chembl_id] = list(
        new_client.activity.filter(molecule_chembl_id=chembl_id)
    )
```
3. Analyze property-activity relationships using the molecular properties included in the results.
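For step 3, one simple starting table pairs each compound's AlogP with its best pChEMBL value. This sketch makes three assumptions:

- Similarity hits carry a `molecule_properties` block with an `alogp` value.
- The activity records from step 2 have been collected into a dict keyed by ChEMBL ID.
- Numeric fields may arrive as strings or `None`.

The function names are hypothetical.

```python
from typing import Any, Optional


def _as_float(value: Any) -> Optional[float]:
    """Convert an API value (often a string or None) to float, or None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def sar_rows(hits: list[dict[str, Any]],
             activities_by_compound: dict[str, list[dict[str, Any]]]
             ) -> list[tuple[str, Optional[float], Optional[float]]]:
    """Pair each hit's AlogP with its highest pChEMBL value (hypothetical helper)."""
    rows = []
    for hit in hits:
        chembl_id = hit["molecule_chembl_id"]
        props = hit.get("molecule_properties") or {}
        values = [
            v for v in (_as_float(a.get("pchembl_value"))
                        for a in activities_by_compound.get(chembl_id, []))
            if v is not None
        ]
        rows.append((chembl_id, _as_float(props.get("alogp")),
                     max(values) if values else None))
    return rows
```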
Filter Operators
ChEMBL supports Django-style query filters:
- `__exact` - Exact match
- `__iexact` - Case-insensitive exact match
- `__contains` / `__icontains` - Substring matching
- `__startswith` / `__endswith` - Prefix/suffix matching
- `__gt`, `__gte`, `__lt`, `__lte` - Numeric comparisons
- `__range` - Value in range
- `__in` - Value in list
- `__isnull` - Null/not null check
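The real filtering happens on the ChEMBL server. To make the semantics of each operator concrete, here is a toy local evaluator over plain dicts. It is an illustration only, not the client's code. It handles flat field names only, so nested lookups such as `molecule_properties__alogp__lte` are out of scope. It also treats numeric comparisons as float comparisons, which matches the string-valued numbers in API records.

```python
from typing import Any


def matches(record: dict[str, Any], **filters: Any) -> bool:
    """Toy evaluation of flat Django-style lookups against one record."""
    for key, expected in filters.items():
        field, _, op = key.partition("__")
        op = op or "exact"  # a bare field name means exact match
        value = record.get(field)
        if op == "isnull":
            ok = (value is None) == bool(expected)
        elif value is None:
            ok = False  # every other lookup fails on a missing value
        elif op == "exact":
            ok = value == expected
        elif op == "iexact":
            ok = str(value).lower() == str(expected).lower()
        elif op == "contains":
            ok = str(expected) in str(value)
        elif op == "icontains":
            ok = str(expected).lower() in str(value).lower()
        elif op == "startswith":
            ok = str(value).startswith(str(expected))
        elif op == "endswith":
            ok = str(value).endswith(str(expected))
        elif op == "in":
            ok = value in expected
        elif op in ("gt", "gte", "lt", "lte"):
            v, e = float(value), float(expected)
            ok = {"gt": v > e, "gte": v >= e, "lt": v < e, "lte": v <= e}[op]
        elif op == "range":
            low, high = expected
            ok = float(low) <= float(value) <= float(high)
        else:
            raise ValueError(f"lookup not covered by this sketch: {op}")
        if not ok:
            return False
    return True
```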
Data Export and Analysis
Convert results to pandas DataFrame for analysis:
```python
import pandas as pd
from chembl_webresource_client.new_client import new_client

activities = new_client.activity.filter(target_chembl_id='CHEMBL203')
df = pd.DataFrame(list(activities))

# Analyze results; standard_value may be returned as text, so convert it first
df['standard_value'] = pd.to_numeric(df['standard_value'], errors='coerce')
print(df['standard_value'].describe())
print(df.groupby('standard_type').size())
```
Performance Optimization
Caching
The client automatically caches results for 24 hours. Configure caching:
```python
from chembl_webresource_client.settings import Settings

# Disable caching
Settings.Instance().CACHING = False

# Adjust cache expiration (seconds); 86400 s = 24 hours
Settings.Instance().CACHE_EXPIRE = 86400
```
Lazy Evaluation
Queries execute only when data is accessed. Convert to list to force execution:
```python
# Query is not executed yet
results = molecule.filter(pref_name__icontains='aspirin')

# Force execution
results_list = list(results)
```
Pagination
Results are paginated automatically. Iterate through all results:
```python
# `record` avoids shadowing the `activity` endpoint variable used earlier
for record in new_client.activity.filter(target_chembl_id='CHEMBL203'):
    # Process each activity record
    print(record['molecule_chembl_id'])
```
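The client hides paging, but the mechanism combines lazy evaluation and pagination. It requests one page, yields its records, and stops when the response reports no next page. The toy generator below mimics this with an injected `fetch_page(offset, limit)` function, so nothing is fetched until the caller iterates.

This is not the client's implementation. The `limit`/`offset` parameters and the `page_meta`/`next` response shape are assumed from the ChEMBL REST API's JSON format. The records key differs per endpoint, so it is a parameter.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]


def iterate_pages(fetch_page: Callable[[int, int], Page], records_key: str,
                  limit: int = 20) -> Iterator[dict[str, Any]]:
    """Lazily yield records page by page until page_meta has no 'next' link."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)  # runs only when the consumer asks for more
        yield from page[records_key]
        if not page["page_meta"].get("next"):
            return
        offset += limit
```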
Common Use Cases
Find Kinase Inhibitors
```python
# Identify kinase targets (name-based match)
kinases = new_client.target.filter(
    target_type='SINGLE PROTEIN',
    pref_name__icontains='kinase'
)

# Collect potent inhibitors (IC50 <= 50 nM) for the first 5 kinases
potent_by_target = {}
for kinase in kinases[:5]:
    potent_by_target[kinase['target_chembl_id']] = list(new_client.activity.filter(
        target_chembl_id=kinase['target_chembl_id'],
        standard_type='IC50',
        standard_units='nM',
        standard_value__lte=50
    ))
```
Explore Drug Repurposing
```python
# Get approved drugs
drugs = new_client.drug.filter()

# For each drug, collect the targets from its mechanisms of action
targets_by_drug = {}
for drug in drugs[:10]:
    mechanisms = new_client.mechanism.filter(
        molecule_chembl_id=drug['molecule_chembl_id']
    )
    targets_by_drug[drug['molecule_chembl_id']] = [m['target_chembl_id'] for m in mechanisms]
```
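Inverting a drug-to-targets mapping shows which targets several drugs share, a common starting point for repurposing hypotheses. The sketch assumes the mechanism results have been gathered into a plain dict from drug ChEMBL ID to target ChEMBL IDs. `None` entries are skipped because some mechanisms have no target assigned. The helper name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional


def shared_targets(targets_by_drug: dict[str, Iterable[Optional[str]]],
                   min_drugs: int = 2) -> dict[str, list[str]]:
    """Map each target to the drugs acting on it, keeping targets hit by >= min_drugs drugs."""
    drugs_by_target: dict[str, set[str]] = defaultdict(set)
    for drug_id, target_ids in targets_by_drug.items():
        for target_id in target_ids:
            if target_id:  # skip mechanisms without a target
                drugs_by_target[target_id].add(drug_id)
    return {
        target: sorted(drugs)
        for target, drugs in drugs_by_target.items()
        if len(drugs) >= min_drugs
    }
```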
Virtual Screening
```python
# Find compounds with desired properties
candidates = new_client.molecule.filter(
    molecule_properties__mw_freebase__range=[300, 500],
    molecule_properties__alogp__lte=5,
    molecule_properties__hba__lte=10,
    molecule_properties__hbd__lte=5
)
```
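ChEMBL molecule records already include a `num_ro5_violations` property. A local check is still handy for custom thresholds or for re-scoring records you have fetched. The sketch below counts Lipinski-style violations from a `molecule_properties` dict. It uses the same fields as the query above and assumes values may arrive as strings. It returns `None` when a property is missing rather than guessing. The helper name is hypothetical.

```python
from typing import Any, Optional

# Upper limits for a Rule-of-Five-style check (same fields as the query above)
RO5_LIMITS = {"mw_freebase": 500, "alogp": 5, "hbd": 5, "hba": 10}


def ro5_violations(props: dict[str, Any], limits: dict[str, float] = RO5_LIMITS) -> Optional[int]:
    """Count how many properties exceed their limit, or None if any is missing."""
    count = 0
    for field, limit in limits.items():
        value = props.get(field)
        if value is None:
            return None  # cannot judge with a missing property
        if float(value) > limit:
            count += 1
    return count
```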
Resources
scripts/example_queries.py
Ready-to-use Python functions demonstrating common ChEMBL query patterns:
- `get_molecule_info()` - Retrieve molecule details by ID
- `search_molecules_by_name()` - Name-based molecule search
- `find_molecules_by_properties()` - Property-based filtering
- `get_bioactivity_data()` - Query bioactivities for targets
- `find_similar_compounds()` - Similarity searching
- `substructure_search()` - Substructure matching
- `get_drug_info()` - Retrieve drug information
- `find_kinase_inhibitors()` - Specialized kinase inhibitor search
- `export_to_dataframe()` - Convert results to pandas DataFrame
Consult this script for implementation details and usage examples.
references/api_reference.md
Comprehensive API documentation including:
- Complete endpoint listing (molecule, target, activity, assay, drug, etc.)
- All filter operators and query patterns
- Molecular properties and bioactivity fields
- Advanced query examples
- Configuration and performance tuning
- Error handling and rate limiting
Refer to this document when detailed API information is needed or when troubleshooting queries.
Important Notes
Data Reliability
- ChEMBL data is manually curated but may contain inconsistencies
- Always check the `data_validity_comment` field in activity records
- Be aware of `potential_duplicate` flags
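Flagged records can be set aside before analysis. The sketch below separates activity records that carry a `data_validity_comment` or a `potential_duplicate` flag. It does not assume how the flag is encoded: `True`, `1` and `"1"` all count as set, while `False`, `0`, `"0"` and `None` do not. The helper names are hypothetical.

```python
from typing import Any, Iterable


def _flag_set(value: Any) -> bool:
    """Interpret a flag that may be a bool, an int, or a string."""
    return str(value).strip().lower() in {"1", "true"}


def split_flagged(activities: Iterable[dict[str, Any]]
                  ) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (clean, flagged) lists of activity records."""
    clean, flagged = [], []
    for act in activities:
        if act.get("data_validity_comment") or _flag_set(act.get("potential_duplicate")):
            flagged.append(act)
        else:
            clean.append(act)
    return clean, flagged
```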
Units and Standards
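- Bioactivity values use standard units (nM, uM, etc.)
- `pchembl_value` provides a normalized activity value on a negative log10 molar scale (for example, an IC50 of 100 nM corresponds to a pChEMBL value of 7)
- Check `standard_type` to understand the measurement type (IC50, Ki, EC50, etc.)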
Rate Limiting
- Respect ChEMBL's fair usage policies
- Use caching to minimize repeated requests
- Consider bulk downloads for large datasets
- Avoid hammering the API with rapid consecutive requests
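When a request fails, a script that backs off is gentler on the service than one that retries immediately. The generic helper below is a sketch and is not part of the ChEMBL client. It retries a zero-argument callable with exponentially growing pauses. The `sleep` parameter can be replaced for testing. Catching every `Exception` keeps the sketch short; in practice you would narrow it to network errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on failure after 1x, 2x, 4x, ... base_delay seconds."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage with the live client (needs network access):
#   aspirin = with_backoff(lambda: new_client.molecule.get('CHEMBL25'))
```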
Chemical Structure Formats
- SMILES strings are the primary structure format
- Standard InChI and InChIKey identifiers are available for compounds
- SVG images can be generated via the image endpoint
Additional Resources
- ChEMBL website: https://www.ebi.ac.uk/chembl/
- API documentation: https://www.ebi.ac.uk/chembl/api/data/docs
- Python client GitHub: https://github.com/chembl/chembl_webresource_client
- Interface documentation: https://chembl.gitbook.io/chembl-interface-documentation/
- Example notebooks: https://github.com/chembl/notebooks