zinc-database

ZINC Database

Overview

ZINC is a freely accessible repository of more than 230 million purchasable compounds, maintained by UCSF. It supports search by ZINC ID or SMILES, similarity searches, download of 3D-ready structures for docking, and analog discovery for virtual screening and drug discovery.

When to Use This Skill

This skill should be used when:

Virtual screening: Finding compounds for molecular docking studies
Lead discovery: Identifying commercially available compounds for drug development
Structure searches: Performing similarity or analog searches by SMILES
Compound retrieval: Looking up molecules by ZINC IDs or supplier codes
Chemical space exploration: Exploring purchasable chemical diversity
Docking studies: Accessing 3D-ready molecular structures
Analog searches: Finding similar compounds based on structural similarity
Supplier queries: Identifying compounds from specific chemical vendors
Random sampling: Obtaining random compound sets for screening

Database Versions

ZINC has evolved through multiple versions:

ZINC22 (Current): Largest version with 230+ million purchasable compounds and multi-billion scale make-on-demand compounds
ZINC20: Still maintained, focused on lead-like and drug-like compounds
ZINC15: Predecessor version, legacy but still documented

This skill primarily focuses on ZINC22, the most current and comprehensive version.

Access Methods

Web Interface

Primary access point: https://zinc.docking.org/
Interactive searching: https://cartblanche22.docking.org/

API Access

All ZINC22 searches can be performed programmatically via the CartBlanche22 API:

Base URL:

https://cartblanche22.docking.org/

All API endpoints return data in text or JSON format with customizable fields.

Core Capabilities

1. Search by ZINC ID

Retrieve specific compounds using their ZINC identifiers.

Web interface: https://cartblanche22.docking.org/search/zincid

API endpoint:

bash

curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001&output_fields=smiles,zinc_id"

Multiple IDs:

bash

curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001,ZINC000000000002&output_fields=smiles,zinc_id,tranche"

Response fields: `zinc_id`, `smiles`, `sub_id`, `supplier_code`, `catalogs`, `tranche` (includes H-count, LogP, MW, phase)
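
As a minimal sketch, the single-ID and multi-ID calls in this section can be generated by one helper. The function name `substances_url` is hypothetical, and the URL shape simply mirrors the curl examples here; it has not been checked against a live server.

```python
BASE_URL = "https://cartblanche22.docking.org/"

def substances_url(zinc_ids, output_fields=("zinc_id", "smiles")):
    """Build a substances.txt URL in the style of the curl examples in this section."""
    if isinstance(zinc_ids, str):
        zinc_ids = [zinc_ids]  # accept a single ID as well as a list of IDs
    if not zinc_ids:
        raise ValueError("at least one ZINC ID is required")
    ids = ",".join(zinc_ids)
    fields = ",".join(output_fields)
    return f"{BASE_URL}substances.txt:zinc_id={ids}&output_fields={fields}"

# Multi-ID request with three output fields
print(substances_url(["ZINC000000000001", "ZINC000000000002"],
                     ("smiles", "zinc_id", "tranche")))
```

The resulting string can be passed to curl or to any HTTP client.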

2. Search by SMILES

Find compounds by chemical structure using SMILES notation, with optional distance parameters for analog searching.

Web interface: https://cartblanche22.docking.org/search/smiles

API endpoint:

bash

curl "https://cartblanche22.docking.org/smiles.txt:smiles=<SMILES>&dist=4&adist=4"

Parameters:

`smiles`: Query SMILES string (URL-encoded if necessary)
`dist`: Maximum distance from the query structure, given as a small integer (default: 0 for exact match)
`adist`: Alternative distance parameter for broader searches (default: 0)
`output_fields`: Comma-separated list of desired output fields

Example - Exact match:

bash

curl "https://cartblanche22.docking.org/smiles.txt:smiles=c1ccccc1"

Example - Similarity search:

bash

curl "https://cartblanche22.docking.org/smiles.txt:smiles=c1ccccc1&dist=3&output_fields=zinc_id,smiles,tranche"
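
SMILES strings often contain characters such as `#`, `=`, `[` and `(` that are unsafe in a URL. The sketch below percent-encodes the query with the standard library before building the request. The helper name `smiles_url` is hypothetical, and whether the server requires or accepts encoded SMILES in this URL style is an assumption to verify.

```python
from urllib.parse import quote

BASE_URL = "https://cartblanche22.docking.org/"

def smiles_url(smiles, dist=0, adist=0, output_fields=None):
    """Build a smiles.txt URL with a percent-encoded SMILES query."""
    # safe="" encodes every reserved character, including '/', '#', '=' and brackets
    encoded = quote(smiles, safe="")
    url = f"{BASE_URL}smiles.txt:smiles={encoded}&dist={dist}&adist={adist}"
    if output_fields:
        url += "&output_fields=" + ",".join(output_fields)
    return url

# Benzene needs no escaping; acetic acid shows the encoded parentheses and '='
print(smiles_url("c1ccccc1", dist=3, output_fields=["zinc_id", "smiles", "tranche"]))
print(smiles_url("CC(=O)O"))
```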

3. Search by Supplier Codes

Query compounds from specific chemical suppliers or retrieve all molecules from particular catalogs.

Web interface: https://cartblanche22.docking.org/search/catitems

API endpoint:

bash

curl "https://cartblanche22.docking.org/catitems.txt:catitem_id=SUPPLIER-CODE-123"

Use cases:

Verify compound availability from specific vendors
Retrieve all compounds from a catalog
Cross-reference supplier codes with ZINC IDs

4. Random Compound Sampling

Generate random compound sets for screening or benchmarking purposes.

Web interface: https://cartblanche22.docking.org/search/random

API endpoint:

bash

curl "https://cartblanche22.docking.org/substance/random.txt:count=100"

Parameters:

`count`: Number of random compounds to retrieve (default: 100)
`subset`: Filter by subset (e.g., 'lead-like', 'drug-like', 'fragment')
`output_fields`: Customize returned data fields

Example - Random lead-like molecules:

bash

curl "https://cartblanche22.docking.org/substance/random.txt:count=1000&subset=lead-like&output_fields=zinc_id,smiles,tranche"

Common Workflows

Workflow 1: Preparing a Docking Library

Define search criteria based on target properties or desired chemical space

Query ZINC22 using appropriate search method:

bash

# Example: get a random sample of drug-like compounds (LogP and MW can be filtered afterwards via the tranche column)
curl "https://cartblanche22.docking.org/substance/random.txt:count=10000&subset=drug-like&output_fields=zinc_id,smiles,tranche" > docking_library.txt

Parse results to extract ZINC IDs and SMILES:

python

import pandas as pd

# Load results
df = pd.read_csv('docking_library.txt', sep='\t')

# Filter by properties in tranche data
# Tranche format: H##P###M###-phase
# H = H-bond donors, P = LogP*10, M = MW

Download 3D structures for docking using ZINC ID or download from file repositories
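
The filtering step that the comments in this workflow only outline could look like the sketch below. It uses the standard library rather than pandas so it runs without extra dependencies. It assumes the response is tab-separated with a header row containing a `tranche` column, and that tranche codes follow the `H##P###M###-phase` layout described in this document. The sample rows are made up for illustration and are not real ZINC records.

```python
import csv
import io
import re

# Tranche layout as described in this document: H##P###M###-phase
TRANCHE_RE = re.compile(r"H(\d+)P(\d+)M(\d+)-(\d+)")

def filter_by_tranche(tsv_text, max_logp=5.0, max_mw=500):
    """Keep rows whose tranche code decodes to LogP and MW within the limits."""
    kept = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        match = TRANCHE_RE.fullmatch(row["tranche"])
        if not match:
            continue  # skip rows that do not follow the assumed layout
        logp = int(match.group(2)) / 10.0  # P### stores LogP * 10
        mw = int(match.group(3))           # M### stores MW in Da
        if logp <= max_logp and mw <= max_mw:
            kept.append(row)
    return kept

# Made-up rows: the second exceeds the LogP limit, the third the MW limit
sample = (
    "zinc_id\tsmiles\ttranche\n"
    "ZINC000000000001\tCCO\tH05P035M400-0\n"
    "ZINC000000000002\tCCN\tH02P062M350-0\n"
    "ZINC000000000003\tCCC\tH01P020M620-0\n"
)
for row in filter_by_tranche(sample):
    print(row["zinc_id"], row["tranche"])
```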

Workflow 2: Finding Analogs of a Hit Compound

Obtain SMILES of the hit compound:

python

hit_smiles = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"  # Example: Ibuprofen

Perform similarity search with distance threshold:

bash

curl "https://cartblanche22.docking.org/smiles.txt:smiles=CC(C)Cc1ccc(cc1)C(C)C(=O)O&dist=5&output_fields=zinc_id,smiles,catalogs" > analogs.txt

Analyze results to identify purchasable analogs:

python

import pandas as pd

analogs = pd.read_csv('analogs.txt', sep='\t')
print(f"Found {len(analogs)} analogs")
print(analogs[['zinc_id', 'smiles', 'catalogs']].head(10))

Retrieve 3D structures for the most promising analogs

Workflow 3: Batch Compound Retrieval

Compile list of ZINC IDs from literature, databases, or previous screens:

python

zinc_ids = [
    "ZINC000000000001",
    "ZINC000000000002",
    "ZINC000000000003"
]
zinc_ids_str = ",".join(zinc_ids)

Query ZINC22 API:

bash

curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001,ZINC000000000002,ZINC000000000003&output_fields=zinc_id,smiles,supplier_code,catalogs"

Process results for downstream analysis or purchasing
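
For long ID lists, one request per batch keeps URLs at a manageable length. The sketch below splits a list into comma-joined chunks ready for the `zinc_id=` parameter. The batch size of 100 is an arbitrary assumption, since this document does not state a server-side limit, and `batch_ids` is a hypothetical helper name.

```python
def batch_ids(zinc_ids, batch_size=100):
    """Yield comma-joined chunks of at most batch_size ZINC IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(zinc_ids), batch_size):
        yield ",".join(zinc_ids[start:start + batch_size])

# Five placeholder IDs split into batches of two
ids = [f"ZINC{n:012d}" for n in range(1, 6)]
for chunk in batch_ids(ids, batch_size=2):
    print(chunk)
```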

Workflow 4: Chemical Space Sampling

Select subset parameters based on screening goals:
- Fragment: MW < 250, good for fragment-based drug discovery
- Lead-like: MW 250-350, LogP ≤ 3.5
- Drug-like: MW 350-500, follows Lipinski's Rule of Five

Generate random sample:

bash

curl "https://cartblanche22.docking.org/substance/random.txt:count=5000&subset=lead-like&output_fields=zinc_id,smiles,tranche" > chemical_space_sample.txt

Analyze chemical diversity and prepare for virtual screening
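
The subset thresholds listed in this workflow can be written as a small classifier, for example to check which subset a known compound would fall into. This is only a sketch of the cutoffs as stated here: it treats MW 350 as lead-like, checks only the MW range for drug-like (the remaining Lipinski criteria are not evaluated), and `classify_subset` is a hypothetical name.

```python
def classify_subset(mw, logp):
    """Return the subset name suggested by the thresholds in this workflow, or None."""
    if mw < 250:
        return "fragment"
    if mw <= 350 and logp <= 3.5:
        return "lead-like"
    if 350 < mw <= 500:
        return "drug-like"  # MW range only; other Rule-of-Five criteria are not checked
    return None

for mw, logp in [(180, 1.2), (300, 2.0), (420, 4.0), (600, 2.0)]:
    print(mw, logp, classify_subset(mw, logp))
```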

Output Fields

Customize API responses with the `output_fields` parameter:

Available fields:

`zinc_id`: ZINC identifier
`smiles`: SMILES string representation
`sub_id`: Internal substance ID
`supplier_code`: Vendor catalog number
`catalogs`: List of suppliers offering the compound
`tranche`: Encoded molecular properties (H-count, LogP, MW, reactivity phase)

Example:

bash

curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001&output_fields=zinc_id,smiles,catalogs,tranche"
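
A typo in a field name is easy to miss when assembling URLs by hand. As a minimal sketch, the helper below checks requested fields against the list given in this section before building the parameter. The list reflects only what this document names; the server may accept other fields, and `output_fields_param` is a hypothetical name.

```python
# Fields named in this document; the live API may support more
KNOWN_FIELDS = ("zinc_id", "smiles", "sub_id", "supplier_code", "catalogs", "tranche")

def output_fields_param(fields):
    """Return an output_fields=... query fragment after validating the names."""
    unknown = [name for name in fields if name not in KNOWN_FIELDS]
    if unknown:
        raise ValueError(f"unknown output field(s): {', '.join(unknown)}")
    return "output_fields=" + ",".join(fields)

print(output_fields_param(["zinc_id", "smiles", "catalogs", "tranche"]))
```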

Tranche System

ZINC organizes compounds into "tranches" based on molecular properties:

Format:

H##P###M###-phase

H##: Number of hydrogen bond donors (00-99)
P###: LogP × 10 (e.g., P035 = LogP 3.5)
M###: Molecular weight in Daltons (e.g., M400 = 400 Da)
phase: Reactivity classification

Example tranche:

H05P035M400-0

5 H-bond donors
LogP = 3.5
MW = 400 Da
Reactivity phase 0

Use tranche data to filter compounds by drug-likeness criteria.
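
To make the encoding concrete, the sketch below goes in the opposite direction and builds a tranche code from property values, following the layout described in this section. It is an illustration of that layout only: `format_tranche` is a hypothetical helper, and negative LogP values are rejected because the format as described here has no sign field.

```python
def format_tranche(h_donors, logp, mw, phase):
    """Encode properties as H##P###M###-phase, per the layout described in this section."""
    if logp < 0:
        raise ValueError("the layout described here cannot represent a negative LogP")
    logp_code = round(logp * 10)  # P### stores LogP * 10
    mw_code = round(mw)           # M### stores MW in Da
    return f"H{h_donors:02d}P{logp_code:03d}M{mw_code:03d}-{phase}"

print(format_tranche(5, 3.5, 400, 0))  # H05P035M400-0
```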

Downloading 3D Structures

For molecular docking, 3D structures are available via file repositories:

File repository: https://files.docking.org/zinc22/

Structures are organized by tranches and available in multiple formats:

MOL2: Multi-molecule format with 3D coordinates
SDF: Structure-data file format
DB2.GZ: Compressed database format for DOCK

Refer to ZINC documentation at https://wiki.docking.org for downloading protocols and batch access methods.

Python Integration

Using curl with Python

python

import subprocess
from urllib.parse import quote

BASE_URL = "https://cartblanche22.docking.org/"

def _curl(url):
    """Fetch a URL with curl and return the response body as text."""
    # --silent hides the progress meter; --fail plus check=True raises on HTTP errors
    result = subprocess.run(['curl', '--silent', '--fail', url],
                            capture_output=True, text=True, check=True)
    return result.stdout

def query_zinc_by_id(zinc_id, output_fields="zinc_id,smiles,catalogs"):
    """Query ZINC22 by ZINC ID."""
    return _curl(f"{BASE_URL}substances.txt:zinc_id={zinc_id}&output_fields={output_fields}")

def search_by_smiles(smiles, dist=0, adist=0, output_fields="zinc_id,smiles"):
    """Search ZINC22 by SMILES with optional distance parameters."""
    # Percent-encode the SMILES so '#', '=', and brackets are not misread by curl or the server
    encoded = quote(smiles, safe="")
    return _curl(f"{BASE_URL}smiles.txt:smiles={encoded}&dist={dist}&adist={adist}&output_fields={output_fields}")

def get_random_compounds(count=100, subset=None, output_fields="zinc_id,smiles,tranche"):
    """Get random compounds from ZINC22."""
    url = f"{BASE_URL}substance/random.txt:count={count}&output_fields={output_fields}"
    if subset:
        url += f"&subset={subset}"
    return _curl(url)

Parsing Results

python

import re
from io import StringIO

import pandas as pd

# Query ZINC and parse as DataFrame
# (request the tranche field explicitly; it is not in query_zinc_by_id's default output fields)
result = query_zinc_by_id("ZINC000000000001", output_fields="zinc_id,smiles,tranche")
df = pd.read_csv(StringIO(result), sep='\t')

# Extract tranche properties
def parse_tranche(tranche_str):
    """Parse ZINC tranche code to extract properties."""
    # Format: H##P###M###-phase
    match = re.match(r'H(\d+)P(\d+)M(\d+)-(\d+)', tranche_str)
    if match:
        return {
            'h_donors': int(match.group(1)),
            'logP': int(match.group(2)) / 10.0,
            'mw': int(match.group(3)),
            'phase': int(match.group(4))
        }
    return None

df['tranche_props'] = df['tranche'].apply(parse_tranche)

Best Practices

Query Optimization

Start specific: Begin with exact searches before expanding to similarity searches
Use appropriate distance parameters: Small dist values (1-3) for close analogs, larger (5-10) for diverse analogs
Limit output fields: Request only necessary fields to reduce data transfer
Batch queries: Combine multiple ZINC IDs in a single API call when possible
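
The "start specific, then widen" advice can be sketched as a loop that raises `dist` step by step and stops at the first non-empty result. The `fetch` argument is a caller-supplied function, for example a thin wrapper around `search_by_smiles` from the Python Integration section. The sketch assumes it returns result lines only, with any header row already removed, and `widening_search` is a hypothetical name.

```python
def widening_search(fetch, smiles, dists=(0, 1, 2, 3)):
    """Try increasing distance thresholds; return (dist, lines) for the first hit."""
    for dist in dists:
        body = fetch(smiles, dist)
        lines = [line for line in body.splitlines() if line.strip()]
        if lines:
            return dist, lines
    return None, []  # nothing found within the allowed distances

# Stand-in for a real query: pretends that hits appear only from dist=2 onward
def fake_fetch(smiles, dist):
    return "ZINC000000000001\tc1ccccc1\n" if dist >= 2 else ""

print(widening_search(fake_fetch, "c1ccccc1"))
```

Passing the fetch function in keeps the loop independent of how requests are made, which also makes it easy to test without network access.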

Performance Considerations

Rate limiting: Respect server resources; avoid rapid consecutive requests
Caching: Store frequently accessed compounds locally
Parallel downloads: When downloading 3D structures, use parallel wget or aria2c for file repositories
Subset filtering: Use lead-like, drug-like, or fragment subsets to reduce search space
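
Rate limiting and caching can be combined in one small wrapper, sketched below. The one-second minimum interval is an assumption, since this document gives no official limit, and `PoliteClient` is a hypothetical class name. The `fetch` callable is supplied by the caller, and the clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time

class PoliteClient:
    """Wrap a fetch function with an in-memory cache and a minimum delay between calls."""

    def __init__(self, fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, url):
        if url in self._cache:
            return self._cache[url]  # cached responses cost no request and no delay
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)  # pause until the minimum interval has elapsed
        self._last_call = self._clock()
        self._cache[url] = self._fetch(url)
        return self._cache[url]
```

A persistent cache (for example, files keyed by URL) would serve the same purpose across sessions; the in-memory dictionary here keeps the sketch short.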

Data Quality

Verify availability: Supplier catalogs change; confirm compound availability before large orders
Check stereochemistry: SMILES may not fully specify stereochemistry; verify 3D structures
Validate structures: Use cheminformatics tools (RDKit, OpenBabel) to verify structure validity
Cross-reference: When possible, cross-check with other databases (PubChem, ChEMBL)

Resources

references/api_reference.md

Comprehensive documentation including:

Complete API endpoint reference
URL syntax and parameter specifications
Advanced query patterns and examples
File repository organization and access
Bulk download methods
Error handling and troubleshooting
Integration with molecular docking software

Consult this document for detailed technical information and advanced usage patterns.

Important Disclaimers

Data Reliability

ZINC explicitly states: "We do not guarantee the quality of any molecule for any purpose and take no responsibility for errors arising from the use of this database."

Compound availability may change without notice
Structure representations may contain errors
Supplier information should be verified independently
Use appropriate validation before experimental work

Appropriate Use

ZINC is intended for academic and research purposes in drug discovery
Verify licensing terms for commercial use
Respect intellectual property when working with patented compounds
Follow your institution's guidelines for compound procurement

Additional Resources

ZINC Website: https://zinc.docking.org/
CartBlanche22 Interface: https://cartblanche22.docking.org/
ZINC Wiki: https://wiki.docking.org/
File Repository: https://files.docking.org/zinc22/
GitHub: https://github.com/docking-org/
Primary Publication: Sterling and Irwin, J. Chem. Inf. Model. 2015 (ZINC15)
ZINC22 Publication: Tingle et al., J. Chem. Inf. Model. 2023

Citations

When using ZINC in publications, cite the appropriate version:

ZINC22: Tingle, B. I., et al. "ZINC-22: A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery." Journal of Chemical Information and Modeling 2023.

ZINC15: Sterling, T.; Irwin, J. J. "ZINC 15 – Ligand Discovery for Everyone." Journal of Chemical Information and Modeling 2015, 55, 2324–2337.
