diffdock

DiffDock: Molecular Docking with Diffusion Models

Overview

DiffDock is a diffusion-based deep learning tool for molecular docking that predicts the 3D binding poses of small-molecule ligands on protein targets. It is among the state-of-the-art methods for computational docking, a key step in structure-based drug discovery and chemical biology.

Core Capabilities:

Predict ligand binding poses with high accuracy using deep learning
Support protein structures (PDB files) or sequences (via ESMFold)
Process single complexes or batch virtual screening campaigns
Generate confidence scores to assess prediction reliability
Handle diverse ligand inputs (SMILES, SDF, MOL2)

Key Distinction: DiffDock predicts binding poses (3D structure) and confidence (prediction certainty), NOT binding affinity (ΔG, Kd). Always combine with scoring functions (GNINA, MM/GBSA) for affinity assessment.

When to Use This Skill

Use this skill when a request matches any of the following:

"Dock this ligand to a protein" or "predict binding pose"
"Run molecular docking" or "perform protein-ligand docking"
"Virtual screening" or "screen compound library"
"Where does this molecule bind?" or "predict binding site"
Structure-based drug design or lead optimization tasks
Tasks involving PDB files + SMILES strings or ligand structures
Batch docking of multiple protein-ligand pairs

Installation and Environment Setup

Check Environment Status

Before proceeding with DiffDock tasks, verify the environment setup:

bash

# Use the provided setup checker
python scripts/setup_check.py

This script validates the Python version, PyTorch with CUDA, PyTorch Geometric, RDKit, ESM, and other dependencies.

Installation Options

Option 1: Conda (Recommended)

bash

git clone https://github.com/gcorso/DiffDock.git
cd DiffDock
conda env create --file environment.yml
conda activate diffdock

Option 2: Docker

bash

docker pull rbgcsail/diffdock
docker run -it --gpus all --entrypoint /bin/bash rbgcsail/diffdock
micromamba activate diffdock

Important Notes:

GPU strongly recommended (10-100x speedup vs CPU)
First run pre-computes SO(2)/SO(3) lookup tables (~2-5 minutes)
Model checkpoints (~500MB) download automatically if not present

Core Workflows

Workflow 1: Single Protein-Ligand Docking

Use Case: Dock one ligand to one protein target

Input Requirements:

Protein: PDB file OR amino acid sequence
Ligand: SMILES string OR structure file (SDF/MOL2)

Command:

bash

python -m inference \
  --config default_inference_args.yaml \
  --protein_path protein.pdb \
  --ligand "CC(=O)Oc1ccccc1C(=O)O" \
  --out_dir results/single_docking/

Alternative (protein sequence):

bash

python -m inference \
  --config default_inference_args.yaml \
  --protein_sequence "MSKGEELFTGVVPILVELDGDVNGHKF..." \
  --ligand ligand.sdf \
  --out_dir results/sequence_docking/
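The two commands above differ only in how the protein is supplied. As a minimal sketch, the argument list can be assembled in Python so that exactly one protein input is enforced before the run starts. The helper name `build_inference_command` is hypothetical and not part of DiffDock; only the flags shown in the commands above are used.

```python
from typing import List, Optional


def build_inference_command(
    ligand: str,
    out_dir: str,
    protein_path: Optional[str] = None,
    protein_sequence: Optional[str] = None,
    config: str = "default_inference_args.yaml",
) -> List[str]:
    """Assemble the argument list for one docking run (hypothetical helper)."""
    # Exactly one protein input is expected: a PDB path or a sequence.
    if (protein_path is None) == (protein_sequence is None):
        raise ValueError("Provide exactly one of protein_path or protein_sequence")

    cmd = ["python", "-m", "inference", "--config", config]
    if protein_path is not None:
        cmd += ["--protein_path", protein_path]
    else:
        cmd += ["--protein_sequence", protein_sequence]
    cmd += ["--ligand", ligand, "--out_dir", out_dir]
    return cmd


cmd = build_inference_command(
    ligand="CC(=O)Oc1ccccc1C(=O)O",
    out_dir="results/single_docking/",
    protein_path="protein.pdb",
)
print(" ".join(cmd))
# To execute from the DiffDock checkout: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with SMILES strings that contain parentheses or `#`.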

Output Structure:

results/single_docking/
├── rank_1.sdf          # Top-ranked pose
├── rank_2.sdf          # Second-ranked pose
├── ...
├── rank_10.sdf         # 10th pose (default: 10 samples)
└── confidence_scores.txt
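A plain alphabetical sort of the output directory would place `rank_10.sdf` before `rank_2.sdf`. The sketch below orders pose files by their numeric rank instead. It assumes the `rank_N.sdf` naming shown in the listing above; if your DiffDock version names files differently, the pattern would need adjusting. The function name `ranked_pose_files` is illustrative.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Naming assumed from the output listing above.
RANK_PATTERN = re.compile(r"^rank_(\d+)\.sdf$")


def ranked_pose_files(out_dir: str) -> List[Path]:
    """Return pose files ordered by numeric rank (rank_2 before rank_10)."""
    ranked = []
    for path in Path(out_dir).iterdir():
        match = RANK_PATTERN.match(path.name)
        if match:
            ranked.append((int(match.group(1)), path))
    ranked.sort(key=lambda item: item[0])
    return [path for _, path in ranked]


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["rank_10.sdf", "rank_2.sdf", "rank_1.sdf", "confidence_scores.txt"]:
        (Path(demo_dir) / name).touch()
    demo_order = [p.name for p in ranked_pose_files(demo_dir)]
    print(demo_order)
```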

Workflow 2: Batch Processing Multiple Complexes

Use Case: Dock multiple ligands to proteins, virtual screening campaigns

Step 1: Prepare Batch CSV

Use the provided script to create or validate batch input:

bash

# Create template
python scripts/prepare_batch_csv.py --create --output batch_input.csv

# Validate existing CSV
python scripts/prepare_batch_csv.py my_input.csv --validate

**CSV Format:**
```csv
complex_name,protein_path,ligand_description,protein_sequence
complex1,protein1.pdb,CC(=O)Oc1ccccc1C(=O)O,
complex2,,COc1ccc(C#N)cc1,MSKGEELFT...
complex3,protein3.pdb,ligand3.sdf,
```

Required Columns:

- `complex_name`: Unique identifier
- `protein_path`: PDB file path (leave empty if using sequence)
- `ligand_description`: SMILES string or ligand file path
- `protein_sequence`: Amino acid sequence (leave empty if using PDB)
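The column rules above can be checked with a few lines of standard-library Python. This is a minimal sketch and not the logic of `prepare_batch_csv.py`, whose internals are not shown here. It reads the "leave empty" notes as meaning that each row carries exactly one of `protein_path` or `protein_sequence`, which is an interpretation. The function name `validate_batch_csv` is hypothetical.

```python
import csv
import io
from typing import List

REQUIRED_COLUMNS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]


def validate_batch_csv(text: str) -> List[str]:
    """Return human-readable problems; an empty list means the rows look consistent."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    seen_names = set()
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        name = (row["complex_name"] or "").strip()
        has_path = bool((row["protein_path"] or "").strip())
        has_seq = bool((row["protein_sequence"] or "").strip())
        if not name:
            problems.append(f"line {line_no}: empty complex_name")
        elif name in seen_names:
            problems.append(f"line {line_no}: duplicate complex_name '{name}'")
        seen_names.add(name)
        if has_path == has_seq:  # both filled or both empty
            problems.append(f"line {line_no}: need exactly one of protein_path / protein_sequence")
        if not (row["ligand_description"] or "").strip():
            problems.append(f"line {line_no}: empty ligand_description")
    return problems


example = """complex_name,protein_path,ligand_description,protein_sequence
complex1,protein1.pdb,CC(=O)Oc1ccccc1C(=O)O,
complex2,,COc1ccc(C#N)cc1,MSKGEELFT
complex3,protein3.pdb,ligand3.sdf,
"""
print(validate_batch_csv(example))
```

The check does not confirm that files exist or that SMILES strings parse; it only covers the structural rules listed above.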

Step 2: Run Batch Docking

bash

python -m inference \
  --config default_inference_args.yaml \
  --protein_ligand_csv batch_input.csv \
  --out_dir results/batch/ \
  --batch_size 10

For Large Virtual Screening (>100 compounds):

Pre-compute protein embeddings for faster processing:

bash

# Pre-compute embeddings
python datasets/esm_embedding_preparation.py \
  --protein_ligand_csv screening_input.csv \
  --out_file protein_embeddings.pt

# Run with pre-computed embeddings
python -m inference \
  --config default_inference_args.yaml \
  --protein_ligand_csv screening_input.csv \
  --esm_embeddings_path protein_embeddings.pt \
  --out_dir results/screening/

Workflow 3: Analyzing Results

After docking completes, analyze confidence scores and rank predictions:

bash

# Analyze all results
python scripts/analyze_results.py results/batch/

# Show top 5 per complex
python scripts/analyze_results.py results/batch/ --top 5

# Filter by confidence threshold
python scripts/analyze_results.py results/batch/ --threshold 0.0

# Export to CSV
python scripts/analyze_results.py results/batch/ --export summary.csv

# Show top 20 predictions across all complexes
python scripts/analyze_results.py results/batch/ --best 20


The analysis script:
- Parses confidence scores from all predictions
- Classifies as High (>0), Moderate (-1.5 to 0), or Low (<-1.5)
- Ranks predictions within and across complexes
- Generates statistical summaries
- Exports results to CSV for downstream analysis
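The classification thresholds listed above can be written as a small function. This is a sketch of the stated rule only, not the code of `analyze_results.py`. Scores of exactly 0 and exactly -1.5 are treated as Moderate, since High is defined as strictly greater than 0 and Low as strictly less than -1.5. The example scores are made up for illustration.

```python
from typing import Dict, List, Tuple


def classify_confidence(score: float) -> str:
    """Map a DiffDock confidence score to the level names used in this document."""
    if score > 0:
        return "High"
    if score >= -1.5:
        return "Moderate"
    return "Low"


def rank_poses(scores: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Sort poses from most to least confident and attach the level label."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, classify_confidence(score)) for name, score in ordered]


# Illustrative scores only; real values come from a docking run.
example_scores = {"pose_a": -0.8, "pose_b": 0.4, "pose_c": -2.1}
for name, score, level in rank_poses(example_scores):
    print(f"{name}\t{score:+.2f}\t{level}")
```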

Confidence Score Interpretation

Understanding Scores:

| Score Range | Confidence Level | Interpretation |
| --- | --- | --- |
| > 0 | High | Strong prediction, likely accurate |
| -1.5 to 0 | Moderate | Reasonable prediction, validate carefully |
| < -1.5 | Low | Uncertain prediction, requires validation |

Critical Notes:

Confidence ≠ Affinity: High confidence means model certainty about structure, NOT strong binding
Context Matters: Adjust expectations for:
- Large ligands (>500 Da): Lower confidence expected
- Multiple protein chains: May decrease confidence
- Novel protein families: May underperform
Multiple Samples: Review top 3-5 predictions, look for consensus

For detailed guidance: Read `references/confidence_and_limitations.md` using the Read tool

Parameter Customization

Using Custom Configuration

Create custom configuration for specific use cases:

bash

# Copy template
cp assets/custom_inference_config.yaml my_config.yaml

# Edit parameters (see template for presets)

# Then run with custom config
python -m inference \
  --config my_config.yaml \
  --protein_ligand_csv input.csv \
  --out_dir results/

Key Parameters to Adjust

Sampling Density:

- `samples_per_complex: 10` → Increase to 20-40 for difficult cases
- More samples = better coverage but longer runtime

Inference Steps:

- `inference_steps: 20` → Increase to 25-30 for higher accuracy
- More steps = potentially better quality but slower

Temperature Parameters (control diversity):

- `temp_sampling_tor: 7.04` → Increase for flexible ligands (8-10)
- `temp_sampling_tor: 7.04` → Decrease for rigid ligands (5-6)
- Higher temperature = more diverse poses

Presets Available in Template:

- High Accuracy: More samples + steps, lower temperature
- Fast Screening: Fewer samples, faster
- Flexible Ligands: Increased torsion temperature
- Rigid Ligands: Decreased torsion temperature

For complete parameter reference: Read `references/parameters_reference.md` using the Read tool
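As a rough sketch of how the adjustments above combine, the function below starts from the quoted defaults and applies overrides for difficult cases and for ligand flexibility. The specific values chosen (20 samples, 25 steps, torsion temperature 8.0 or 6.0) are arbitrary picks inside the ranges given above, not the contents of the preset template, and `suggest_overrides` is a hypothetical name.

```python
from typing import Dict, Union

Number = Union[int, float]

# Defaults quoted in the parameter notes above.
DEFAULTS: Dict[str, Number] = {
    "samples_per_complex": 10,
    "inference_steps": 20,
    "temp_sampling_tor": 7.04,
}


def suggest_overrides(difficult: bool = False, ligand: str = "default") -> Dict[str, Number]:
    """Return parameter values for a run; ligand is 'default', 'flexible', or 'rigid'."""
    params = dict(DEFAULTS)
    if difficult:
        params["samples_per_complex"] = 20  # assumption: low end of the 20-40 range
        params["inference_steps"] = 25      # assumption: low end of the 25-30 range
    if ligand == "flexible":
        params["temp_sampling_tor"] = 8.0   # assumption: within the 8-10 range
    elif ligand == "rigid":
        params["temp_sampling_tor"] = 6.0   # assumption: within the 5-6 range
    elif ligand != "default":
        raise ValueError(f"unknown ligand type: {ligand}")
    return params


def to_yaml_lines(params: Dict[str, Number]) -> str:
    """Render flat key/value pairs as YAML-style lines for pasting into a config."""
    return "\n".join(f"{key}: {value}" for key, value in params.items())


print(to_yaml_lines(suggest_overrides(difficult=True, ligand="flexible")))
```

The printed lines can be copied into `my_config.yaml` next to the other settings; any keys beyond these three should come from the template itself.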

Advanced Techniques

Ensemble Docking (Protein Flexibility)

For proteins with known flexibility, dock to multiple conformations:

python

# Create ensemble CSV
import pandas as pd

conformations = ["conf1.pdb", "conf2.pdb", "conf3.pdb"]
ligand = "CC(=O)Oc1ccccc1C(=O)O"

# One row per protein conformation, all docking the same ligand
data = {
    "complex_name": [f"ensemble_{i}" for i in range(len(conformations))],
    "protein_path": conformations,
    "ligand_description": [ligand] * len(conformations),
    "protein_sequence": [""] * len(conformations),
}

pd.DataFrame(data).to_csv("ensemble_input.csv", index=False)


Run docking with increased sampling:
```bash
python -m inference \
  --config default_inference_args.yaml \
  --protein_ligand_csv ensemble_input.csv \
  --samples_per_complex 20 \
  --out_dir results/ensemble/
```

Integration with Scoring Functions

DiffDock generates poses; combine with other tools for affinity:

GNINA (Fast neural network scoring):

bash

for pose in results/*.sdf; do
    gnina -r protein.pdb -l "$pose" --score_only
done
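The same loop can be driven from Python when the scores need to be collected alongside other results. The sketch below only builds the command lists, using the flags from the shell loop above; `gnina_score_commands` is a hypothetical helper and the pose paths are placeholders. Parsing GNINA's printed output is left out because its format is not described here.

```python
from typing import List, Sequence


def gnina_score_commands(receptor: str, poses: Sequence[str]) -> List[List[str]]:
    """Build one `gnina --score_only` command per pose file."""
    return [["gnina", "-r", receptor, "-l", pose, "--score_only"] for pose in poses]


# Placeholder pose paths for illustration.
commands = gnina_score_commands(
    "protein.pdb",
    ["results/rank_1.sdf", "results/rank_2.sdf"],
)
for command in commands:
    print(" ".join(command))
    # With gnina installed, each command could be run with:
    # subprocess.run(command, capture_output=True, text=True, check=True)
```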

MM/GBSA (More accurate, slower): Use AmberTools MMPBSA.py or gmx_MMPBSA after energy minimization

Free Energy Calculations (Most accurate): Use OpenMM + OpenFE or GROMACS for FEP/TI calculations

Recommended Workflow:

DiffDock → Generate poses with confidence scores
Visual inspection → Check structural plausibility
GNINA or MM/GBSA → Rescore and rank by affinity
Experimental validation → Biochemical assays

Limitations and Scope

DiffDock IS Designed For:

Small molecule ligands (typically 100-1000 Da)
Drug-like organic compounds
Small peptides (<20 residues)
Single or multi-chain proteins

DiffDock IS NOT Designed For:

Large biomolecules (protein-protein docking) → Use DiffDock-PP or AlphaFold-Multimer
Large peptides (>20 residues) → Use alternative methods
Covalent docking → Use specialized covalent docking tools
Binding affinity prediction → Combine with scoring functions
Membrane proteins → Not specifically trained, use with caution

For complete limitations: Read `references/confidence_and_limitations.md` using the Read tool

Troubleshooting

Common Issues

Issue: Low confidence scores across all predictions

Cause: Large/unusual ligands, unclear binding site, protein flexibility
Solution: Increase `samples_per_complex` (20-40), try ensemble docking, validate protein structure

Issue: Out of memory errors

Cause: GPU memory insufficient for batch size
Solution: Reduce the batch size (e.g. `--batch_size 2`) or process fewer complexes at once
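One way to process fewer complexes at once is to split the batch CSV into smaller files and run them one after another. The following is a minimal standard-library sketch; `split_batch_csv` is a hypothetical helper, and a suitable chunk size depends on the available GPU memory.

```python
import csv
import io
from typing import List


def split_batch_csv(text: str, rows_per_chunk: int) -> List[str]:
    """Split CSV text into chunks that each repeat the header row."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return []
    rows = list(reader)

    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_chunk])
        chunks.append(buffer.getvalue())
    return chunks


batch = """complex_name,protein_path,ligand_description,protein_sequence
complex1,protein1.pdb,CC(=O)Oc1ccccc1C(=O)O,
complex2,protein2.pdb,COc1ccc(C#N)cc1,
complex3,protein3.pdb,ligand3.sdf,
"""
chunks = split_batch_csv(batch, rows_per_chunk=2)
for index, chunk in enumerate(chunks):
    print(f"--- chunk {index} ---")
    print(chunk, end="")
```

Each chunk can be written to its own file and passed to `--protein_ligand_csv` in a separate run.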

Issue: Slow performance

Cause: Running on CPU instead of GPU

Solution: Verify CUDA with `python -c "import torch; print(torch.cuda.is_available())"` and run on a GPU

Issue: Unrealistic binding poses

Cause: Poor protein preparation, ligand too large, wrong binding site
Solution: Check protein for missing residues, remove far waters, consider specifying binding site

Issue: "Module not found" errors

Cause: Missing dependencies or wrong environment
Solution: Run `python scripts/setup_check.py` to diagnose

Performance Optimization

For Best Results:

Use GPU (essential for practical use)
Pre-compute ESM embeddings for repeated protein use
Batch process multiple complexes together
Start with default parameters, then tune if needed
Validate protein structures (resolve missing residues)
Use canonical SMILES for ligands

Graphical User Interface

For interactive use, launch the web interface:

bash

python app/main.py

Navigate to http://localhost:7860

Or use the online demo without installation:
- https://huggingface.co/spaces/reginabarzilaygroup/DiffDock-Web

Resources

Helper Scripts (`scripts/`)

`prepare_batch_csv.py`: Create and validate batch input CSV files

- Create templates with example entries
- Validate file paths and SMILES strings
- Check for required columns and format issues

`analyze_results.py`: Analyze confidence scores and rank predictions

- Parse results from single or batch runs
- Generate statistical summaries
- Export to CSV for downstream analysis
- Identify top predictions across complexes

`setup_check.py`: Verify DiffDock environment setup

- Check Python version and dependencies
- Verify PyTorch and CUDA availability
- Test RDKit and PyTorch Geometric installation
- Provide installation instructions if needed

Reference Documentation (`references/`)

`parameters_reference.md`: Complete parameter documentation

All command-line options and configuration parameters
Default values and acceptable ranges
Temperature parameters for controlling diversity
Model checkpoint locations and version flags

Read this file when users need:

Detailed parameter explanations
Fine-tuning guidance for specific systems
Alternative sampling strategies

`confidence_and_limitations.md`: Confidence score interpretation and tool limitations

Detailed confidence score interpretation
When to trust predictions
Scope and limitations of DiffDock
Integration with complementary tools
Troubleshooting prediction quality

Read this file when users need:

Help interpreting confidence scores
Understanding when NOT to use DiffDock
Guidance on combining with other tools
Validation strategies

`workflows_examples.md`: Comprehensive workflow examples

Detailed installation instructions
Step-by-step examples for all workflows
Advanced integration patterns
Troubleshooting common issues
Best practices and optimization tips

Read this file when users need:

Complete workflow examples with code
Integration with GNINA, OpenMM, or other tools
Virtual screening workflows
Ensemble docking procedures

Assets (`assets/`)

`batch_template.csv`: Template for batch processing

Pre-formatted CSV with required columns
Example entries showing different input types
Ready to customize with actual data

`custom_inference_config.yaml`: Configuration template

Annotated YAML with all parameters
Four preset configurations for common use cases
Detailed comments explaining each parameter
Ready to customize and use

Best Practices

- Always verify the environment with `setup_check.py` before starting large jobs
- Validate batch CSVs with `prepare_batch_csv.py` to catch errors early
- Start with defaults, then tune parameters based on system-specific needs
- Generate multiple samples (10-40) for robust predictions
- Visually inspect top poses before downstream analysis
- Combine with scoring functions for affinity assessment
- Use confidence scores for initial ranking, not final decisions
- Pre-compute embeddings for virtual screening campaigns
- Document the parameters used for reproducibility
- Validate results experimentally when possible

Citations

When using DiffDock, cite the appropriate papers:

DiffDock-L (current default model):

Corso et al. (2024) "Deep Confident Steps to New Pockets: Strategies for Docking Generalization"
ICLR 2024, arXiv:2402.18396

Original DiffDock:

Corso et al. (2023) "DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking"
ICLR 2023, arXiv:2210.01776

Additional Resources

GitHub Repository: https://github.com/gcorso/DiffDock
Online Demo: https://huggingface.co/spaces/reginabarzilaygroup/DiffDock-Web
DiffDock-L Paper: https://arxiv.org/abs/2402.18396
Original Paper: https://arxiv.org/abs/2210.01776
