diffdock

DiffDock: Molecular Docking with Diffusion Models

Overview

DiffDock is a diffusion-based deep learning tool for molecular docking that predicts 3D binding poses of small molecule ligands to protein targets. It is a leading deep learning approach to computational docking and supports structure-based drug discovery and chemical biology.
Core Capabilities:
  • Predict ligand binding poses with high accuracy using deep learning
  • Support protein structures (PDB files) or sequences (via ESMFold)
  • Process single complexes or batch virtual screening campaigns
  • Generate confidence scores to assess prediction reliability
  • Handle diverse ligand inputs (SMILES, SDF, MOL2)
Key Distinction: DiffDock predicts binding poses (3D structure) and confidence (prediction certainty), NOT binding affinity (ΔG, Kd). Always combine with scoring functions (GNINA, MM/GBSA) for affinity assessment.

When to Use This Skill

Use this skill when the request matches or involves any of the following:
  • "Dock this ligand to a protein" or "predict binding pose"
  • "Run molecular docking" or "perform protein-ligand docking"
  • "Virtual screening" or "screen compound library"
  • "Where does this molecule bind?" or "predict binding site"
  • Structure-based drug design or lead optimization tasks
  • Tasks involving PDB files + SMILES strings or ligand structures
  • Batch docking of multiple protein-ligand pairs

Installation and Environment Setup

Check Environment Status

Before proceeding with DiffDock tasks, verify the environment setup:
bash
# Use the provided setup checker
python scripts/setup_check.py
This script validates Python version, PyTorch with CUDA, PyTorch Geometric, RDKit, ESM, and other dependencies.

Installation Options

Option 1: Conda (Recommended)
bash
git clone https://github.com/gcorso/DiffDock.git
cd DiffDock
conda env create --file environment.yml
conda activate diffdock
Option 2: Docker
bash
docker pull rbgcsail/diffdock
docker run -it --gpus all --entrypoint /bin/bash rbgcsail/diffdock
micromamba activate diffdock
Important Notes:
  • GPU strongly recommended (10-100x speedup vs CPU)
  • First run pre-computes SO(2)/SO(3) lookup tables (~2-5 minutes)
  • Model checkpoints (~500MB) download automatically if not present

Core Workflows

Workflow 1: Single Protein-Ligand Docking

Use Case: Dock one ligand to one protein target
Input Requirements:
  • Protein: PDB file OR amino acid sequence
  • Ligand: SMILES string OR structure file (SDF/MOL2)
Command:
bash
python -m inference \
  --config default_inference_args.yaml \
  --protein_path protein.pdb \
  --ligand "CC(=O)Oc1ccccc1C(=O)O" \
  --out_dir results/single_docking/
Alternative (protein sequence):
bash
python -m inference \
  --config default_inference_args.yaml \
  --protein_sequence "MSKGEELFTGVVPILVELDGDVNGHKF..." \
  --ligand ligand.sdf \
  --out_dir results/sequence_docking/
Output Structure:
results/single_docking/
├── rank_1.sdf          # Top-ranked pose
├── rank_2.sdf          # Second-ranked pose
├── ...
├── rank_10.sdf         # 10th pose (default: 10 samples)
└── confidence_scores.txt
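As a rough illustration of working with this output layout, the sketch below collects the pose files from an output directory in rank order. It assumes the `rank_<N>.sdf` naming shown in the listing above; the helper name `ranked_poses` is hypothetical, and the pattern may need adjusting if your DiffDock version names files differently.

```python
import re
from pathlib import Path

# Assumption: pose files are named rank_<N>.sdf, as in the listing above.
RANK_PATTERN = re.compile(r"^rank_(\d+)\.sdf$")


def ranked_poses(out_dir):
    """Return (rank, path) pairs for pose files in out_dir, best rank first."""
    poses = []
    for path in Path(out_dir).iterdir():
        match = RANK_PATTERN.match(path.name)
        if match:
            poses.append((int(match.group(1)), path))
    # Sort numerically so rank_10.sdf follows rank_9.sdf, not rank_1.sdf.
    return sorted(poses, key=lambda pair: pair[0])
```

Sorting on the parsed integer rather than the filename avoids the lexicographic trap where `rank_10.sdf` sorts before `rank_2.sdf`.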

Workflow 2: Batch Processing Multiple Complexes

Use Case: Dock multiple ligands to proteins, virtual screening campaigns
Step 1: Prepare Batch CSV
Use the provided script to create or validate batch input:
bash
# Create template
python scripts/prepare_batch_csv.py --create --output batch_input.csv

# Validate existing CSV
python scripts/prepare_batch_csv.py my_input.csv --validate

**CSV Format:**
```csv
complex_name,protein_path,ligand_description,protein_sequence
complex1,protein1.pdb,CC(=O)Oc1ccccc1C(=O)O,
complex2,,COc1ccc(C#N)cc1,MSKGEELFT...
complex3,protein3.pdb,ligand3.sdf,
```
Required Columns:
  • complex_name: Unique identifier
  • protein_path: PDB file path (leave empty if using sequence)
  • ligand_description: SMILES string or ligand file path
  • protein_sequence: Amino acid sequence (leave empty if using PDB)
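The column rules above can be checked with a few lines of standard-library Python. This is only a sketch, not the logic of `scripts/prepare_batch_csv.py`: the function name `check_batch_csv` is hypothetical, and treating "both protein_path and protein_sequence filled in" as an error is an assumption drawn from the "leave empty" wording.

```python
import csv
import io

REQUIRED_COLUMNS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]


def check_batch_csv(text):
    """Return a list of human-readable problems found in batch CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems, seen = [], set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        name = (row["complex_name"] or "").strip()
        has_path = bool((row["protein_path"] or "").strip())
        has_seq = bool((row["protein_sequence"] or "").strip())
        if not name:
            problems.append(f"line {line_no}: empty complex_name")
        elif name in seen:
            problems.append(f"line {line_no}: duplicate complex_name {name!r}")
        seen.add(name)
        # Assumption: exactly one protein source should be given per row.
        if has_path == has_seq:
            problems.append(f"line {line_no}: give exactly one of protein_path or protein_sequence")
        if not (row["ligand_description"] or "").strip():
            problems.append(f"line {line_no}: empty ligand_description")
    return problems
```

An empty list means the rows satisfy the column rules; it does not confirm that the referenced files exist or that the SMILES strings parse.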
Step 2: Run Batch Docking
bash
python -m inference \
  --config default_inference_args.yaml \
  --protein_ligand_csv batch_input.csv \
  --out_dir results/batch/ \
  --batch_size 10
For Large Virtual Screening (>100 compounds):
Pre-compute protein embeddings for faster processing:
bash
# Pre-compute embeddings
python datasets/esm_embedding_preparation.py \
  --protein_ligand_csv screening_input.csv \
  --out_file protein_embeddings.pt

# Run with pre-computed embeddings
python -m inference \
  --config default_inference_args.yaml \
  --protein_ligand_csv screening_input.csv \
  --esm_embeddings_path protein_embeddings.pt \
  --out_dir results/screening/
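For a screening campaign against a single target, an input file such as `screening_input.csv` could be generated along these lines. This is a minimal sketch using the four-column layout described in Step 1; the function name `write_screening_csv` and the ligand names in the usage note are hypothetical.

```python
import csv


def write_screening_csv(out_path, protein_path, smiles_by_name):
    """Write one row per ligand, all docked against the same protein file."""
    fieldnames = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for name, smiles in smiles_by_name.items():
            writer.writerow({
                "complex_name": name,
                "protein_path": protein_path,
                "ligand_description": smiles,
                "protein_sequence": "",  # left empty because a PDB path is given
            })
    return len(smiles_by_name)
```

For example, `write_screening_csv("screening_input.csv", "protein.pdb", {"aspirin": "CC(=O)Oc1ccccc1C(=O)O"})` writes a header plus one row. Using the dictionary keys as `complex_name` keeps the identifiers unique by construction.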

Workflow 3: Analyzing Results

After docking completes, analyze confidence scores and rank predictions:
bash
# Analyze all results
python scripts/analyze_results.py results/batch/

# Show top 5 per complex
python scripts/analyze_results.py results/batch/ --top 5

# Filter by confidence threshold
python scripts/analyze_results.py results/batch/ --threshold 0.0

# Export to CSV
python scripts/analyze_results.py results/batch/ --export summary.csv

# Show top 20 predictions across all complexes
python scripts/analyze_results.py results/batch/ --best 20
The analysis script:
  • Parses confidence scores from all predictions
  • Classifies as High (>0), Moderate (-1.5 to 0), or Low (<-1.5)
  • Ranks predictions within and across complexes
  • Generates statistical summaries
  • Exports results to CSV for downstream analysis

Confidence Score Interpretation

Understanding Scores:
| Score Range | Confidence Level | Interpretation |
|---|---|---|
| > 0 | High | Strong prediction, likely accurate |
| -1.5 to 0 | Moderate | Reasonable prediction, validate carefully |
| < -1.5 | Low | Uncertain prediction, requires validation |
Critical Notes:
  1. Confidence ≠ Affinity: High confidence means model certainty about structure, NOT strong binding
  2. Context Matters: Adjust expectations for:
    • Large ligands (>500 Da): Lower confidence expected
    • Multiple protein chains: May decrease confidence
    • Novel protein families: May underperform
  3. Multiple Samples: Review top 3-5 predictions, look for consensus
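The score ranges above translate directly into a small classifier. The sketch below is illustrative rather than the code in `scripts/analyze_results.py`; the function names are hypothetical, and placing the boundary values 0 and -1.5 in the Moderate band follows the "-1.5 to 0" range as written.

```python
def confidence_level(score):
    """Map a DiffDock confidence score to the levels described above."""
    if score > 0:
        return "High"
    if score >= -1.5:
        return "Moderate"
    return "Low"


def summarize(scores, top_n=5):
    """Label the top_n highest-scoring poses; higher scores rank first."""
    ranked = sorted(scores, reverse=True)[:top_n]
    return [(score, confidence_level(score)) for score in ranked]
```

Reviewing the labels of the top 3-5 poses together, rather than only the best one, is one simple way to look for the consensus mentioned in note 3.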
For detailed guidance: Read references/confidence_and_limitations.md using the Read tool

Parameter Customization

Using Custom Configuration

Create custom configuration for specific use cases:
bash
# Copy template
cp assets/custom_inference_config.yaml my_config.yaml

# Edit parameters (see template for presets)

# Then run with custom config
python -m inference \
  --config my_config.yaml \
  --protein_ligand_csv input.csv \
  --out_dir results/

Key Parameters to Adjust

Sampling Density:
  • samples_per_complex: 10 → Increase to 20-40 for difficult cases
  • More samples = better coverage but longer runtime
Inference Steps:
  • inference_steps: 20 → Increase to 25-30 for higher accuracy
  • More steps = potentially better quality but slower
Temperature Parameters (control diversity):
  • temp_sampling_tor: 7.04 → Increase for flexible ligands (8-10)
  • temp_sampling_tor: 7.04 → Decrease for rigid ligands (5-6)
  • Higher temperature = more diverse poses
Presets Available in Template:
  1. High Accuracy: More samples + steps, lower temperature
  2. Fast Screening: Fewer samples, faster
  3. Flexible Ligands: Increased torsion temperature
  4. Rigid Ligands: Decreased torsion temperature
For complete parameter reference: Read references/parameters_reference.md using the Read tool
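One way to keep parameter choices reproducible is to hold them as named presets and expand them into command-line tokens. The sketch below is an assumption-laden illustration: the preset names and values are picked from the ranges listed above and are not the contents of assets/custom_inference_config.yaml, and it assumes each parameter can be passed as a `--name value` flag in the same way as `--samples_per_complex`.

```python
# Illustrative values chosen from the ranges above; they are assumptions,
# not the presets shipped in assets/custom_inference_config.yaml.
PRESETS = {
    "difficult_case": {"samples_per_complex": 40, "inference_steps": 30},
    "flexible_ligand": {"temp_sampling_tor": 9.0},
    "rigid_ligand": {"temp_sampling_tor": 5.5},
}


def override_args(preset_name):
    """Turn a preset into a flat list of --key value command-line tokens."""
    args = []
    for key, value in PRESETS[preset_name].items():
        args += [f"--{key}", str(value)]
    return args
```

The returned list can be appended to a base command list before launching inference, which also leaves a record of exactly which overrides were used.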

Advanced Techniques

Ensemble Docking (Protein Flexibility)

For proteins with known flexibility, dock to multiple conformations:
python
# Create ensemble CSV
import pandas as pd

conformations = ["conf1.pdb", "conf2.pdb", "conf3.pdb"]
ligand = "CC(=O)Oc1ccccc1C(=O)O"
data = {
    "complex_name": [f"ensemble_{i}" for i in range(len(conformations))],
    "protein_path": conformations,
    "ligand_description": [ligand] * len(conformations),
    "protein_sequence": [""] * len(conformations),
}
pd.DataFrame(data).to_csv("ensemble_input.csv", index=False)
Run docking with increased sampling:
```bash
python -m inference \
  --config default_inference_args.yaml \
  --protein_ligand_csv ensemble_input.csv \
  --samples_per_complex 20 \
  --out_dir results/ensemble/
```

Integration with Scoring Functions

DiffDock generates poses; combine with other tools for affinity:
GNINA (Fast neural network scoring):
bash
for pose in results/*.sdf; do
    gnina -r protein.pdb -l "$pose" --score_only
done
MM/GBSA (More accurate, slower): Use AmberTools MMPBSA.py or gmx_MMPBSA after energy minimization
Free Energy Calculations (Most accurate): Use OpenMM + OpenFE or GROMACS for FEP/TI calculations
Recommended Workflow:
  1. DiffDock → Generate poses with confidence scores
  2. Visual inspection → Check structural plausibility
  3. GNINA or MM/GBSA → Rescore and rank by affinity
  4. Experimental validation → Biochemical assays
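Step 3 of this workflow can be scripted by building one GNINA command per selected pose. The sketch below only constructs the argument lists, using the same `-r`, `-l`, and `--score_only` options as the shell loop above; the function name `rescoring_commands` is hypothetical, and it assumes the pose list is already ordered best-first.

```python
def rescoring_commands(protein_path, pose_paths, top_k=3):
    """Build one GNINA --score_only command per pose, for the first top_k poses.

    pose_paths is assumed to be ordered best-first (e.g. by DiffDock rank).
    """
    commands = []
    for pose in pose_paths[:top_k]:
        commands.append(["gnina", "-r", protein_path, "-l", str(pose), "--score_only"])
    return commands
```

Each list can then be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes it easy to inspect or log the commands before spending compute on them.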

Limitations and Scope

DiffDock IS Designed For:
  • Small molecule ligands (typically 100-1000 Da)
  • Drug-like organic compounds
  • Small peptides (<20 residues)
  • Single or multi-chain proteins
DiffDock IS NOT Designed For:
  • Large biomolecules (protein-protein docking) → Use DiffDock-PP or AlphaFold-Multimer
  • Large peptides (>20 residues) → Use alternative methods
  • Covalent docking → Use specialized covalent docking tools
  • Binding affinity prediction → Combine with scoring functions
  • Membrane proteins → Not specifically trained, use with caution
For complete limitations: Read references/confidence_and_limitations.md using the Read tool

Troubleshooting

Common Issues

Issue: Low confidence scores across all predictions
  • Cause: Large/unusual ligands, unclear binding site, protein flexibility
  • Solution: Increase samples_per_complex (20-40), try ensemble docking, validate protein structure
Issue: Out of memory errors
  • Cause: GPU memory insufficient for batch size
  • Solution: Reduce the batch size (e.g. --batch_size 2) or process fewer complexes at once
Issue: Slow performance
  • Cause: Running on CPU instead of GPU
  • Solution: Verify CUDA with python -c "import torch; print(torch.cuda.is_available())" and run on a GPU
Issue: Unrealistic binding poses
  • Cause: Poor protein preparation, ligand too large, wrong binding site
  • Solution: Check protein for missing residues, remove far waters, consider specifying binding site
Issue: "Module not found" errors
  • Cause: Missing dependencies or wrong environment
  • Solution: Run python scripts/setup_check.py to diagnose
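For the out-of-memory case, "process fewer complexes at once" can be done by splitting the batch CSV into smaller files and running them one after another. The following is a minimal sketch under the assumption that the input text is non-empty and starts with the header row; the function name `split_batch_csv` is hypothetical.

```python
import csv
import io


def split_batch_csv(text, rows_per_chunk):
    """Split batch CSV text into smaller CSV texts that each repeat the header."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(body), rows_per_chunk):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(body[start:start + rows_per_chunk])
        chunks.append(buffer.getvalue())
    return chunks
```

Each chunk can be written to its own file and passed to `--protein_ligand_csv` in a separate run, ideally with a separate `--out_dir` so results do not collide.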

Performance Optimization

For Best Results:
  1. Use GPU (essential for practical use)
  2. Pre-compute ESM embeddings for repeated protein use
  3. Batch process multiple complexes together
  4. Start with default parameters, then tune if needed
  5. Validate protein structures (resolve missing residues)
  6. Use canonical SMILES for ligands

Graphical User Interface

For interactive use, launch the web interface:
bash
python app/main.py
Or use the online demo without installation:
- https://huggingface.co/spaces/reginabarzilaygroup/DiffDock-Web

Resources

Helper Scripts (scripts/)

prepare_batch_csv.py: Create and validate batch input CSV files
  • Create templates with example entries
  • Validate file paths and SMILES strings
  • Check for required columns and format issues
analyze_results.py: Analyze confidence scores and rank predictions
  • Parse results from single or batch runs
  • Generate statistical summaries
  • Export to CSV for downstream analysis
  • Identify top predictions across complexes
setup_check.py: Verify DiffDock environment setup
  • Check Python version and dependencies
  • Verify PyTorch and CUDA availability
  • Test RDKit and PyTorch Geometric installation
  • Provide installation instructions if needed

Reference Documentation (references/)

parameters_reference.md: Complete parameter documentation
  • All command-line options and configuration parameters
  • Default values and acceptable ranges
  • Temperature parameters for controlling diversity
  • Model checkpoint locations and version flags
Read this file when users need:
  • Detailed parameter explanations
  • Fine-tuning guidance for specific systems
  • Alternative sampling strategies
confidence_and_limitations.md: Confidence score interpretation and tool limitations
  • Detailed confidence score interpretation
  • When to trust predictions
  • Scope and limitations of DiffDock
  • Integration with complementary tools
  • Troubleshooting prediction quality
Read this file when users need:
  • Help interpreting confidence scores
  • Understanding when NOT to use DiffDock
  • Guidance on combining with other tools
  • Validation strategies
workflows_examples.md: Comprehensive workflow examples
  • Detailed installation instructions
  • Step-by-step examples for all workflows
  • Advanced integration patterns
  • Troubleshooting common issues
  • Best practices and optimization tips
Read this file when users need:
  • Complete workflow examples with code
  • Integration with GNINA, OpenMM, or other tools
  • Virtual screening workflows
  • Ensemble docking procedures

Assets (assets/)

batch_template.csv: Template for batch processing
  • Pre-formatted CSV with required columns
  • Example entries showing different input types
  • Ready to customize with actual data
custom_inference_config.yaml: Configuration template
  • Annotated YAML with all parameters
  • Four preset configurations for common use cases
  • Detailed comments explaining each parameter
  • Ready to customize and use

Best Practices

  1. Always verify the environment with setup_check.py before starting large jobs
  2. Validate batch CSVs with prepare_batch_csv.py to catch errors early
  3. Start with defaults, then tune parameters based on system-specific needs
  4. Generate multiple samples (10-40) for robust predictions
  5. Visually inspect top poses before downstream analysis
  6. Combine with scoring functions for affinity assessment
  7. Use confidence scores for initial ranking, not final decisions
  8. Pre-compute embeddings for virtual screening campaigns
  9. Document parameters used for reproducibility
  10. Validate results experimentally when possible

Citations

When using DiffDock, cite the appropriate papers:
DiffDock-L (current default model):
Corso et al. (2024) "Deep Confident Steps to New Pockets: Strategies for Docking Generalization"
ICLR 2024, arXiv:2402.18396
Original DiffDock:
Corso et al. (2023) "DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking"
ICLR 2023, arXiv:2210.01776

Additional Resources
