# DiffDock: Molecular Docking with Diffusion Models
## Overview
DiffDock is a diffusion-based deep learning tool for molecular docking that predicts 3D binding poses of small molecule ligands to protein targets. It represents the state-of-the-art in computational docking, crucial for structure-based drug discovery and chemical biology.
Core Capabilities:
- Predict ligand binding poses with high accuracy using deep learning
- Support protein structures (PDB files) or sequences (via ESMFold)
- Process single complexes or batch virtual screening campaigns
- Generate confidence scores to assess prediction reliability
- Handle diverse ligand inputs (SMILES, SDF, MOL2)
Key Distinction: DiffDock predicts binding poses (3D structure) and confidence (prediction certainty), NOT binding affinity (ΔG, Kd). Always combine with scoring functions (GNINA, MM/GBSA) for affinity assessment.
## When to Use This Skill
This skill should be used when:
- "Dock this ligand to a protein" or "predict binding pose"
- "Run molecular docking" or "perform protein-ligand docking"
- "Virtual screening" or "screen compound library"
- "Where does this molecule bind?" or "predict binding site"
- Structure-based drug design or lead optimization tasks
- Tasks involving PDB files + SMILES strings or ligand structures
- Batch docking of multiple protein-ligand pairs
## Installation and Environment Setup
### Check Environment Status
Before proceeding with DiffDock tasks, verify the environment setup:

```bash
# Use the provided setup checker
python scripts/setup_check.py
```

This script validates the Python version, PyTorch with CUDA, PyTorch Geometric, RDKit, ESM, and other dependencies.
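As a rough illustration of what such a check does, a minimal dependency probe might look like this (a sketch only; the module names are assumptions based on the dependency list above, and the real `scripts/setup_check.py` also checks versions and CUDA):

```python
import importlib.util

def check_deps(modules=("torch", "torch_geometric", "rdkit", "esm")):
    """Return a mapping of module name -> whether it can be imported."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

if __name__ == "__main__":
    for mod, ok in check_deps().items():
        print(f"{mod}: {'OK' if ok else 'MISSING'}")
```

Prefer the provided script for real runs; this sketch only reports importability.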
### Installation Options
**Option 1: Conda (Recommended)**

```bash
git clone https://github.com/gcorso/DiffDock.git
cd DiffDock
conda env create --file environment.yml
conda activate diffdock
```

**Option 2: Docker**

```bash
docker pull rbgcsail/diffdock
docker run -it --gpus all --entrypoint /bin/bash rbgcsail/diffdock
micromamba activate diffdock
```

**Important Notes:**
- GPU strongly recommended (10-100x speedup vs CPU)
- First run pre-computes SO(2)/SO(3) lookup tables (~2-5 minutes)
- Model checkpoints (~500MB) download automatically if not present
## Core Workflows
### Workflow 1: Single Protein-Ligand Docking
**Use Case:** Dock one ligand to one protein target

**Input Requirements:**
- Protein: PDB file OR amino acid sequence
- Ligand: SMILES string OR structure file (SDF/MOL2)

**Command:**

```bash
python -m inference \
  --config default_inference_args.yaml \
  --protein_path protein.pdb \
  --ligand "CC(=O)Oc1ccccc1C(=O)O" \
  --out_dir results/single_docking/
```

**Alternative (protein sequence):**

```bash
python -m inference \
  --config default_inference_args.yaml \
  --protein_sequence "MSKGEELFTGVVPILVELDGDVNGHKF..." \
  --ligand ligand.sdf \
  --out_dir results/sequence_docking/
```

**Output Structure:**

```
results/single_docking/
├── rank_1.sdf             # Top-ranked pose
├── rank_2.sdf             # Second-ranked pose
├── ...
├── rank_10.sdf            # 10th pose (default: 10 samples)
└── confidence_scores.txt
```
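To iterate over the ranked poses programmatically, the output directory can be globbed; a small sketch, assuming the default `rank_N.sdf` naming shown above:

```python
import re
from pathlib import Path

def ranked_poses(out_dir):
    """Collect (rank, path) pairs for rank_N.sdf files, best rank first.

    Assumes the default DiffDock output naming shown above.
    """
    poses = []
    for sdf in Path(out_dir).glob("rank_*.sdf"):
        m = re.fullmatch(r"rank_(\d+)\.sdf", sdf.name)
        if m:
            poses.append((int(m.group(1)), sdf))
    return sorted(poses)
```

Numeric sorting matters here: a plain alphabetical sort would place `rank_10.sdf` before `rank_2.sdf`.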
### Workflow 2: Batch Processing Multiple Complexes
**Use Case:** Dock multiple ligands to proteins, virtual screening campaigns

**Step 1: Prepare Batch CSV**

Use the provided script to create or validate batch input:

```bash
# Create template
python scripts/prepare_batch_csv.py --create --output batch_input.csv

# Validate existing CSV
python scripts/prepare_batch_csv.py my_input.csv --validate
```

**CSV Format:**

```csv
complex_name,protein_path,ligand_description,protein_sequence
complex1,protein1.pdb,CC(=O)Oc1ccccc1C(=O)O,
complex2,,COc1ccc(C#N)cc1,MSKGEELFT...
complex3,protein3.pdb,ligand3.sdf,
```

**Required Columns:**
- `complex_name`: Unique identifier
- `protein_path`: PDB file path (leave empty if using sequence)
- `ligand_description`: SMILES string or ligand file path
- `protein_sequence`: Amino acid sequence (leave empty if using PDB)
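Such a CSV can also be generated directly from a list of ligands; a sketch using only the standard library (column names match the format above; the file names and entries are illustrative):

```python
import csv

def write_batch_csv(path, entries):
    """Write a DiffDock batch CSV from (name, protein_path, ligand, sequence) tuples."""
    fields = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(fields)
        writer.writerows(entries)

# Illustrative entries: one PDB + SMILES, one sequence + SMILES
write_batch_csv("batch_input.csv", [
    ("complex1", "protein1.pdb", "CC(=O)Oc1ccccc1C(=O)O", ""),
    ("complex2", "", "COc1ccc(C#N)cc1", "MSKGEELFT..."),
])
```

Still run `prepare_batch_csv.py --validate` afterwards to catch path and SMILES errors.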
**Step 2: Run Batch Docking**

```bash
python -m inference \
  --config default_inference_args.yaml \
  --protein_ligand_csv batch_input.csv \
  --out_dir results/batch/ \
  --batch_size 10
```

**For Large Virtual Screening (>100 compounds):**

Pre-compute protein embeddings for faster processing:

```bash
# Pre-compute embeddings
python datasets/esm_embedding_preparation.py \
  --protein_ligand_csv screening_input.csv \
  --out_file protein_embeddings.pt

# Run with pre-computed embeddings
python -m inference \
  --config default_inference_args.yaml \
  --protein_ligand_csv screening_input.csv \
  --esm_embeddings_path protein_embeddings.pt \
  --out_dir results/screening/
```
### Workflow 3: Analyzing Results

After docking completes, analyze confidence scores and rank predictions:

```bash
# Analyze all results
python scripts/analyze_results.py results/batch/

# Show top 5 per complex
python scripts/analyze_results.py results/batch/ --top 5

# Filter by confidence threshold
python scripts/analyze_results.py results/batch/ --threshold 0.0

# Export to CSV
python scripts/analyze_results.py results/batch/ --export summary.csv

# Show top 20 predictions across all complexes
python scripts/analyze_results.py results/batch/ --best 20
```

The analysis script:
- Parses confidence scores from all predictions
- Classifies each as High (>0), Moderate (-1.5 to 0), or Low (<-1.5)
- Ranks predictions within and across complexes
- Generates statistical summaries
- Exports results to CSV for downstream analysis
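An exported summary can then be filtered downstream in a few lines; a sketch assuming the CSV contains `complex_name` and `confidence` columns (the exact column names produced by the script are an assumption here):

```python
import csv

def top_hits(summary_csv, threshold=0.0):
    """Return rows whose confidence exceeds the threshold, best first.

    Assumes columns named 'complex_name' and 'confidence'.
    """
    with open(summary_csv) as fh:
        rows = [r for r in csv.DictReader(fh) if float(r["confidence"]) > threshold]
    return sorted(rows, key=lambda r: float(r["confidence"]), reverse=True)
```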
## Confidence Score Interpretation
Understanding Scores:
| Score Range | Confidence Level | Interpretation |
|---|---|---|
| > 0 | High | Strong prediction, likely accurate |
| -1.5 to 0 | Moderate | Reasonable prediction, validate carefully |
| < -1.5 | Low | Uncertain prediction, requires validation |
Critical Notes:
- Confidence ≠ Affinity: High confidence means model certainty about the structure, NOT strong binding
- Context Matters: Adjust expectations for:
  - Large ligands (>500 Da): Lower confidence expected
  - Multiple protein chains: May decrease confidence
  - Novel protein families: May underperform
- Multiple Samples: Review the top 3-5 predictions and look for consensus
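The thresholds above translate directly into code; a minimal classifier, with cutoffs taken from the table:

```python
def confidence_level(score):
    """Map a DiffDock confidence score to the level used in the table above."""
    if score > 0:
        return "High"
    if score >= -1.5:
        return "Moderate"
    return "Low"
```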
For detailed guidance: Read `references/confidence_and_limitations.md` using the Read tool.

## Parameter Customization
### Using Custom Configuration
Create a custom configuration for specific use cases:

```bash
# Copy template
cp assets/custom_inference_config.yaml my_config.yaml

# Edit parameters (see the template for presets),
# then run with the custom config
python -m inference \
  --config my_config.yaml \
  --protein_ligand_csv input.csv \
  --out_dir results/
```
### Key Parameters to Adjust

**Sampling Density:**
- `samples_per_complex: 10` → Increase to 20-40 for difficult cases
- More samples = better coverage but longer runtime

**Inference Steps:**
- `inference_steps: 20` → Increase to 25-30 for higher accuracy
- More steps = potentially better quality but slower

**Temperature Parameters (control diversity):**
- `temp_sampling_tor: 7.04` → Increase for flexible ligands (8-10), decrease for rigid ligands (5-6)
- Higher temperature = more diverse poses

**Presets Available in Template:**
- High Accuracy: More samples + steps, lower temperature
- Fast Screening: Fewer samples, faster
- Flexible Ligands: Increased torsion temperature
- Rigid Ligands: Decreased torsion temperature

For the complete parameter reference: Read `references/parameters_reference.md` using the Read tool.
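Put together, a hypothetical `my_config.yaml` tuned for a flexible ligand might contain the following (parameter names are taken from the defaults above; the exact values are illustrative, not prescriptive):

```yaml
# Flexible-ligand preset (illustrative values)
samples_per_complex: 20     # up from 10 for better pose coverage
inference_steps: 25         # up from 20 for higher accuracy
temp_sampling_tor: 9.0      # up from 7.04 to explore more torsion angles
```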
## Advanced Techniques
### Ensemble Docking (Protein Flexibility)
For proteins with known flexibility, dock to multiple conformations:

```python
# Create ensemble CSV
import pandas as pd

conformations = ["conf1.pdb", "conf2.pdb", "conf3.pdb"]
ligand = "CC(=O)Oc1ccccc1C(=O)O"
data = {
    "complex_name": [f"ensemble_{i}" for i in range(len(conformations))],
    "protein_path": conformations,
    "ligand_description": [ligand] * len(conformations),
    "protein_sequence": [""] * len(conformations),
}
pd.DataFrame(data).to_csv("ensemble_input.csv", index=False)
```

Run docking with increased sampling:

```bash
python -m inference \
  --config default_inference_args.yaml \
  --protein_ligand_csv ensemble_input.csv \
  --samples_per_complex 20 \
  --out_dir results/ensemble/
```

### Integration with Scoring Functions
DiffDock generates poses; combine with other tools for affinity:

**GNINA (fast neural network scoring):**

```bash
for pose in results/*.sdf; do
    gnina -r protein.pdb -l "$pose" --score_only
done
```

**MM/GBSA (more accurate, slower):**
Use AmberTools MMPBSA.py or gmx_MMPBSA after energy minimization.

**Free Energy Calculations (most accurate):**
Use OpenMM + OpenFE or GROMACS for FEP/TI calculations.

**Recommended Workflow:**
1. DiffDock → Generate poses with confidence scores
2. Visual inspection → Check structural plausibility
3. GNINA or MM/GBSA → Rescore and rank by affinity
4. Experimental validation → Biochemical assays
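As a sketch of step 3, rescored affinities can be merged with DiffDock's confidence to produce a combined ranking. The values here are purely illustrative; in practice the affinities would be parsed from GNINA or MM/GBSA output:

```python
def combined_rank(confidence, affinity):
    """Rank complexes by rescored affinity, breaking ties with DiffDock confidence.

    confidence: dict name -> DiffDock confidence (higher = more certain)
    affinity:   dict name -> predicted affinity, e.g. kcal/mol (lower = stronger)
    """
    return sorted(confidence.keys() & affinity.keys(),
                  key=lambda n: (affinity[n], -confidence[n]))

# Illustrative values only
ranking = combined_rank(
    confidence={"c1": 0.4, "c2": -0.2, "c3": 0.1},
    affinity={"c1": -7.2, "c2": -9.1, "c3": -7.2},
)
print(ranking)  # strongest predicted binder first
```

Affinity drives the ranking and confidence only breaks ties, reflecting the guidance above that confidence scores are for initial ranking, not final decisions.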
## Limitations and Scope
DiffDock IS Designed For:
- Small molecule ligands (typically 100-1000 Da)
- Drug-like organic compounds
- Small peptides (<20 residues)
- Single or multi-chain proteins
DiffDock IS NOT Designed For:
- Large biomolecules (protein-protein docking) → Use DiffDock-PP or AlphaFold-Multimer
- Large peptides (>20 residues) → Use alternative methods
- Covalent docking → Use specialized covalent docking tools
- Binding affinity prediction → Combine with scoring functions
- Membrane proteins → Not specifically trained, use with caution
For complete limitations: Read `references/confidence_and_limitations.md` using the Read tool.

## Troubleshooting
### Common Issues
**Issue: Low confidence scores across all predictions**
- Cause: Large/unusual ligands, unclear binding site, protein flexibility
- Solution: Increase `samples_per_complex` (20-40), try ensemble docking, validate the protein structure

**Issue: Out-of-memory errors**
- Cause: GPU memory insufficient for the batch size
- Solution: Reduce the batch size (e.g., `--batch_size 2`) or process fewer complexes at once

**Issue: Slow performance**
- Cause: Running on CPU instead of GPU
- Solution: Verify CUDA with `python -c "import torch; print(torch.cuda.is_available())"` and use a GPU

**Issue: Unrealistic binding poses**
- Cause: Poor protein preparation, ligand too large, wrong binding site
- Solution: Check the protein for missing residues, remove distant waters, consider specifying the binding site

**Issue: "Module not found" errors**
- Cause: Missing dependencies or wrong environment
- Solution: Run `python scripts/setup_check.py` to diagnose
### Performance Optimization
For Best Results:
- Use GPU (essential for practical use)
- Pre-compute ESM embeddings for repeated protein use
- Batch process multiple complexes together
- Start with default parameters, then tune if needed
- Validate protein structures (resolve missing residues)
- Use canonical SMILES for ligands
## Graphical User Interface

For interactive use, launch the web interface:

```bash
python app/main.py
```

Navigate to http://localhost:7860

Or use the online demo without installation:
- https://huggingface.co/spaces/reginabarzilaygroup/DiffDock-Web

## Resources
### Helper Scripts (`scripts/`)

**`prepare_batch_csv.py`**
- Create templates with example entries
- Validate file paths and SMILES strings
- Check for required columns and format issues

**`analyze_results.py`**
- Parse results from single or batch runs
- Generate statistical summaries
- Export to CSV for downstream analysis
- Identify top predictions across complexes

**`setup_check.py`**
- Check Python version and dependencies
- Verify PyTorch and CUDA availability
- Test RDKit and PyTorch Geometric installation
- Provide installation instructions if needed
### Reference Documentation (`references/`)

**`parameters_reference.md`**
- All command-line options and configuration parameters
- Default values and acceptable ranges
- Temperature parameters for controlling diversity
- Model checkpoint locations and version flags

Read this file when users need:
- Detailed parameter explanations
- Fine-tuning guidance for specific systems
- Alternative sampling strategies

**`confidence_and_limitations.md`**
- Detailed confidence score interpretation
- When to trust predictions
- Scope and limitations of DiffDock
- Integration with complementary tools
- Troubleshooting prediction quality

Read this file when users need:
- Help interpreting confidence scores
- Understanding when NOT to use DiffDock
- Guidance on combining with other tools
- Validation strategies

**`workflows_examples.md`**
- Detailed installation instructions
- Step-by-step examples for all workflows
- Advanced integration patterns
- Troubleshooting common issues
- Best practices and optimization tips

Read this file when users need:
- Complete workflow examples with code
- Integration with GNINA, OpenMM, or other tools
- Virtual screening workflows
- Ensemble docking procedures
### Assets (`assets/`)

**`batch_template.csv`**
- Pre-formatted CSV with required columns
- Example entries showing different input types
- Ready to customize with actual data

**`custom_inference_config.yaml`**
- Annotated YAML with all parameters
- Four preset configurations for common use cases
- Detailed comments explaining each parameter
- Ready to customize and use
## Best Practices

- Always verify the environment with `setup_check.py` before starting large jobs
- Validate batch CSVs with `prepare_batch_csv.py` to catch errors early
- Start with defaults, then tune parameters based on system-specific needs
- Generate multiple samples (10-40) for robust predictions
- Visually inspect top poses before downstream analysis
- Combine with scoring functions for affinity assessment
- Use confidence scores for initial ranking, not final decisions
- Pre-compute embeddings for virtual screening campaigns
- Document the parameters used for reproducibility
- Validate results experimentally when possible
## Citations

When using DiffDock, cite the appropriate papers:

**DiffDock-L (current default model):**
Corso et al. (2024) "Deep Confident Steps to New Pockets: Strategies for Docking Generalization"
ICLR 2024, arXiv:2402.18396

**Original DiffDock:**
Corso et al. (2023) "DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking"
ICLR 2023, arXiv:2210.01776

## Additional Resources
- GitHub Repository: https://github.com/gcorso/DiffDock
- Online Demo: https://huggingface.co/spaces/reginabarzilaygroup/DiffDock-Web
- DiffDock-L Paper: https://arxiv.org/abs/2402.18396
- Original Paper: https://arxiv.org/abs/2210.01776