protein-design-workflow
Protein Design Workflow Guide
Standard binder design pipeline
Overview

```
Target Preparation --> Backbone Generation --> Sequence Design
        |                       |                     |
        v                       v                     v
   (pdb skill)            (rfdiffusion)         (proteinmpnn)
                                |                     |
                                v                     v
                      Structure Validation --> Filtering
                                |                  |
                                v                  v
                        (alphafold/chai)     (protein-qc)
```

Phase 1: Target preparation
1.1 Obtain target structure
```bash
# Download from PDB
curl -o target.pdb "https://files.rcsb.org/download/XXXX.pdb"
```
1.2 Clean and prepare
```python
# Extract target chain
# Remove waters, ligands if needed
# Trim to binding region + 10 Å buffer
```
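
The three cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the guide's own implementation: the function names are hypothetical, the "binding region" is assumed to be a set of residue numbers you supply, and a residue is kept when any of its atoms lies within the buffer of any region atom. It relies on fixed-column PDB records and ignores insertion codes and alternate locations.

```python
import math

def parse_chain_atoms(pdb_text, chain_id):
    """Return ATOM records of one chain; HETATM lines (waters, ligands) are skipped."""
    atoms = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: chain ID in column 22, residue number in
        # columns 23-26, x/y/z coordinates in columns 31-54.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[21] != chain_id:
            continue
        atoms.append({
            "line": line,
            "resseq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms

def trim_to_region(atoms, region_resseqs, buffer_angstrom=10.0):
    """Keep whole residues with any atom within the buffer of any region atom."""
    region_xyz = [a["xyz"] for a in atoms if a["resseq"] in region_resseqs]
    keep = {
        a["resseq"]
        for a in atoms
        if any(math.dist(a["xyz"], r) <= buffer_angstrom for r in region_xyz)
    }
    return [a for a in atoms if a["resseq"] in keep]

def to_pdb_text(atoms):
    """Join the surviving ATOM lines back into PDB text."""
    return "\n".join(a["line"] for a in atoms) + "\nEND\n"
```

A typical call would read `target.pdb`, run `parse_chain_atoms(text, "A")`, pass the result through `trim_to_region`, and write `to_pdb_text(...)` to `target_prepared.pdb`. Trimming can leave gaps in the chain, so it is worth checking the result visually before moving on.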
1.3 Select hotspots
- Choose 3-6 exposed residues
- Prefer charged/aromatic (K, R, E, D, W, Y, F)
- Check surface accessibility
- Verify residue numbering
**Output**: `target_prepared.pdb`, hotspot list
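
Part of the hotspot checklist can be automated. The sketch below is an illustration with hypothetical names: it checks the hotspot count, the residue-type preference, and whether each hotspot number actually exists in the prepared structure. It does not compute surface accessibility, which needs a dedicated tool, and it assumes hotspots are written as chain letter plus residue number (for example `A45`).

```python
# Three-letter codes for the preferred types K, R, E, D, W, Y, F.
PREFERRED = {"LYS", "ARG", "GLU", "ASP", "TRP", "TYR", "PHE"}

def check_hotspots(pdb_text, hotspots):
    """Return a list of warning strings for a hotspot list such as ["A45", "A67"]."""
    # Map (chain, residue number) -> residue name from fixed-column ATOM records.
    residues = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 26:
            residues[(line[21], int(line[22:26]))] = line[17:20]

    warnings = []
    if not 3 <= len(hotspots) <= 6:
        warnings.append(f"expected 3-6 hotspots, got {len(hotspots)}")
    for hotspot in hotspots:
        key = (hotspot[0], int(hotspot[1:]))
        resname = residues.get(key)
        if resname is None:
            warnings.append(f"{hotspot}: not found in structure (check numbering)")
        elif resname not in PREFERRED:
            warnings.append(f"{hotspot}: {resname} is not charged/aromatic")
    return warnings
```

An empty list means the basic checks passed. A "not found" warning usually points to a numbering mismatch between the original PDB entry and the trimmed file.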
Phase 2: Backbone generation
Option A: RFdiffusion (diverse exploration)
```bash
modal run modal_rfdiffusion.py \
  --pdb target_prepared.pdb \
  --contigs "A1-150/0 70-100" \
  --hotspot "A45,A67,A89" \
  --num-designs 500
```
Option B: BindCraft (end-to-end)
```bash
modal run modal_bindcraft.py \
  --target-pdb target_prepared.pdb \
  --hotspots "A45,A67,A89" \
  --num-designs 100
```
**Output**: 100-500 backbone PDBs
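
In RFdiffusion's contig notation, `A1-150/0 70-100` keeps residues 1-150 of chain A as the fixed target, `/0` marks a chain break, and `70-100` requests a binder between 70 and 100 residues long. The helpers below are a small sketch for building the two argument strings used in the commands above from numeric inputs. The function names are hypothetical and are not part of either Modal script.

```python
def build_contigs(chain, first_res, last_res, min_len, max_len):
    """Build a binder-design contig string: fixed target segment, chain break, binder length range."""
    if first_res > last_res:
        raise ValueError("first_res must not exceed last_res")
    if min_len > max_len:
        raise ValueError("min_len must not exceed max_len")
    return f"{chain}{first_res}-{last_res}/0 {min_len}-{max_len}"

def build_hotspots(chain, residue_numbers):
    """Build a comma-separated hotspot string such as "A45,A67,A89"."""
    return ",".join(f"{chain}{number}" for number in residue_numbers)

print(build_contigs("A", 1, 150, 70, 100))
print(build_hotspots("A", [45, 67, 89]))
```

The target range should match the residue numbering in `target_prepared.pdb` after trimming, not the numbering of the full PDB entry.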
Phase 3: Sequence design
For RFdiffusion backbones
```bash
for backbone in backbones/*.pdb; do
  modal run modal_proteinmpnn.py \
    --pdb-path "$backbone" \
    --num-seq-per-target 8 \
    --sampling-temp 0.1
done
```
**Output**: 8 sequences per backbone (800-4000 total)
Phase 4: Structure validation
Predict complexes
```bash
# Prepare FASTA with binder + target
# binder:target format for multimer
modal run modal_colabfold.py \
  --input-faa all_sequences.fasta \
  --out-dir predictions/
```
**Output**: AF2 predictions with pLDDT, ipTM, PAE
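
ColabFold treats a colon inside a sequence as a chain separator, which is what the binder:target format refers to. A minimal sketch for assembling such a file follows. The function name and the design names are hypothetical, and it is an assumption that `modal_colabfold.py` passes the FASTA through to ColabFold unchanged.

```python
def complex_fasta(binders, target_seq):
    """Return FASTA text with one binder:target record per design.

    binders maps a design name to its binder sequence; target_seq is the
    target chain sequence shared by every record.
    """
    records = []
    for name, binder_seq in binders.items():
        # The colon separates the two chains of the complex.
        records.append(f">{name}\n{binder_seq}:{target_seq}")
    return "\n".join(records) + "\n"

# Toy sequences for illustration only.
example = complex_fasta({"design_0001": "MKTAYIAK", "design_0002": "MSEELLKK"}, "GSHMLEDP")
print(example)
```

Writing the returned text to `all_sequences.fasta` gives the input file named in the command above.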
Phase 5: Filtering and selection
Apply standard thresholds
```python
import pandas as pd

# Load metrics
designs = pd.read_csv('all_metrics.csv')

# Filter (thresholds assume pLDDT is on a 0-1 scale)
filtered = designs[
    (designs['pLDDT'] > 0.85) &
    (designs['ipTM'] > 0.50) &
    (designs['PAE_interface'] < 10) &
    (designs['scRMSD'] < 2.0) &
    (designs['esm2_pll'] > 0.0)
].copy()  # copy so the new column below is not written to a view of `designs`

# Rank by composite score
filtered['score'] = (
    0.3 * filtered['pLDDT'] +
    0.3 * filtered['ipTM'] +
    0.2 * (1 - filtered['PAE_interface'] / 20) +
    0.2 * filtered['esm2_pll']
)
top_designs = filtered.nlargest(50, 'score')
```
**Output**: 50-200 filtered candidates
Resource planning
Compute requirements
| Stage | GPU | Time (100 designs) |
|---|---|---|
| RFdiffusion | A10G | 30 min |
| ProteinMPNN | T4 | 15 min |
| ColabFold | A100 | 4-8 hours |
| Filtering | CPU | 15 min |
Total timeline
- Small campaign (100 designs): 8-12 hours
- Medium campaign (500 designs): 24-48 hours
- Large campaign (1000+ designs): 2-5 days
Quality checkpoints
After backbone generation
- Visual inspection of diverse backbones
- Secondary structure present
- No clashes with target
After sequence design
- ESM2 PLL > 0.0 for most sequences
- No unwanted cysteines (unless intentional)
- Reasonable sequence diversity
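
The cysteine and diversity checks can be sketched in a few lines. This is an illustration with hypothetical function names: it measures diversity as mean pairwise identity, position by position, which assumes all sequences have the same length (as sequences designed on one backbone do). It does not compute ESM2 PLL, which requires the language model itself.

```python
from itertools import combinations

def has_cysteine(seq):
    """True if the sequence contains at least one cysteine."""
    return "C" in seq.upper()

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_pairwise_identity(seqs):
    """Average identity over all sequence pairs; lower means more diverse."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(identity(a, b) for a, b in pairs) / len(pairs)
```

A mean pairwise identity close to 1.0 across a backbone's sequences suggests the sampling temperature is too low to give useful diversity.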
After validation
- pLDDT > 0.85
- ipTM > 0.50
- PAE_interface < 10
- Self-consistency RMSD (scRMSD) < 2.0 Å
Final selection
- Diverse sequences (cluster if needed)
- Manufacturable (no problematic motifs)
- Reasonable molecular weight
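
The manufacturability and molecular-weight checks can be roughed out as below. Everything here is an assumption for illustration: the guide does not say which motifs count as problematic, so the sketch flags two commonly cited ones (the N-glycosylation sequon N-X-S/T with X not proline, and Asn-Gly deamidation sites), and the mass uses the rule-of-thumb average of about 110 Da per residue rather than an exact calculation.

```python
import re

AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass, an approximation

# Example motif set only; replace with the liabilities relevant to your expression system.
FLAGGED_MOTIFS = {
    "N-glycosylation sequon": re.compile(r"N[^P][ST]"),
    "Asn-Gly deamidation site": re.compile(r"NG"),
}

def approx_mass_kda(seq):
    """Approximate molecular weight in kDa from sequence length."""
    return len(seq) * AVG_RESIDUE_MASS_DA / 1000.0

def find_motifs(seq):
    """Return (label, 1-based start position) for every flagged motif match."""
    hits = []
    for label, pattern in FLAGGED_MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((label, match.start() + 1))
    return hits
```

For the 70-100 residue binders requested earlier, this estimate gives roughly 8-11 kDa.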
Common issues
| Problem | Solution |
|---|---|
| Low ipTM | Check hotspots, increase designs |
| Poor diversity | Higher temperature, more backbones |
| High scRMSD | Backbone may be unusual |
| Low pLDDT | Check design quality |
Advanced workflows
Multi-tool combination
- RFdiffusion for initial backbones
- ColabDesign for refinement
- ProteinMPNN diversification
- AF2 final validation
Iterative refinement
- Run initial campaign
- Analyze failures
- Adjust hotspots/parameters
- Repeat with insights