protein-design-workflow

Protein Design Workflow Guide

Standard binder design pipeline

Overview

Target Preparation --> Backbone Generation --> Sequence Design
         |                     |                     |
         v                     v                     v
    (pdb skill)          (rfdiffusion)         (proteinmpnn)
                               |                     |
                               v                     v
                        Structure Validation --> Filtering
                               |                     |
                               v                     v
                         (alphafold/chai)      (protein-qc)

Phase 1: Target preparation

1.1 Obtain target structure

Download the target structure from the PDB.
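The download step can be scripted. The snippet below is a minimal sketch that builds the RCSB file-download URL for a four-character PDB ID and saves the file locally. The function names and the ID used in the usage note are placeholders chosen for illustration, not part of the original workflow.

```python
import urllib.request
from pathlib import Path

# Public RCSB file-download endpoint for legacy PDB-format entries.
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Return the download URL for a 4-character PDB ID (hypothetical helper)."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a valid 4-character PDB ID: {pdb_id!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id}.pdb"


def download_pdb(pdb_id: str, out_dir: str = ".") -> Path:
    """Fetch the entry over HTTPS and return the path of the saved file."""
    out_path = Path(out_dir) / f"{pdb_id.strip().upper()}.pdb"
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), out_path)
    return out_path
```

Calling `download_pdb("1ABC")` (with a real ID in place of the placeholder) would write `1ABC.pdb` to the current directory. Very large entries may only be distributed in mmCIF format, in which case this URL pattern would not apply.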

1.2 Clean and prepare

  • Extract the target chain
  • Remove waters and, if needed, ligands
  • Trim to the binding region plus a 10 Å buffer
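The three cleaning steps can be sketched in plain Python on the fixed-column PDB format. This is an illustration under stated assumptions, not the original script: the function names are hypothetical, "binding region" is taken to mean a user-supplied set of residue numbers, insertion codes are ignored, and dropping every `HETATM` record also drops modified residues such as selenomethionine.

```python
import math


def parse_atom(line):
    """Read chain ID, residue number, and coordinates from a PDB ATOM record."""
    return {
        "chain": line[21],
        "resnum": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def prepare_target(pdb_lines, chain, site_residues, buffer=10.0):
    """Return ATOM lines of `chain` lying within `buffer` angstroms of the site.

    Hypothetical helper: `site_residues` is a set of residue numbers that
    define the binding region.
    """
    # Steps 1 and 2: keep only ATOM records of the chosen chain.
    # Waters and ligands are HETATM records, so they are dropped here.
    atoms = [
        (line, parse_atom(line))
        for line in pdb_lines
        if line.startswith("ATOM") and line[21] == chain
    ]
    # Step 3: keep whole residues that have any atom within the buffer
    # distance of any binding-site atom.
    site_xyz = [a["xyz"] for _, a in atoms if a["resnum"] in site_residues]
    keep = {
        a["resnum"]
        for _, a in atoms
        if any(math.dist(a["xyz"], s) <= buffer for s in site_xyz)
    }
    return [line for line, a in atoms if a["resnum"] in keep]
```

Distance-based trimming can leave disconnected fragments, so the trimmed structure is worth inspecting before it is used as `target_prepared.pdb`.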

1.3 Select hotspots

  • Choose 3-6 exposed residues
  • Prefer charged/aromatic (K, R, E, D, W, Y, F)
  • Check surface accessibility
  • Verify residue numbering
Output: target_prepared.pdb, hotspot list
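The selection criteria above can be expressed as a small ranking function. The sketch below assumes relative solvent accessibility (0 to 1) has already been computed by some external tool; the 0.25 exposure cutoff, the tuple layout, and the function name are illustrative assumptions rather than values from this guide.

```python
# Charged and aromatic residue types preferred as hotspots.
PREFERRED = set("KREDWYF")


def pick_hotspots(residues, min_exposure=0.25, max_n=6):
    """Pick up to `max_n` exposed residues, preferring charged/aromatic ones.

    `residues` is a list of (chain, resnum, one_letter_aa, relative_exposure)
    tuples. The result is formatted like "A45,A67,A89".
    """
    exposed = [r for r in residues if r[3] >= min_exposure]
    # Preferred residue types first, then most exposed first.
    ranked = sorted(exposed, key=lambda r: (r[2] not in PREFERRED, -r[3]))
    # Report the chosen residues in chain and sequence order.
    chosen = sorted(ranked[:max_n], key=lambda r: (r[0], r[1]))
    return ",".join(f"{chain}{num}" for chain, num, _, _ in chosen)
```

The returned string has the same shape as the hotspot arguments passed to the design commands, which makes a residue-numbering mismatch between the prepared PDB and the hotspot list easier to spot.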

Phase 2: Backbone generation

Option A: RFdiffusion (diverse exploration)

bash
modal run modal_rfdiffusion.py \
  --pdb target_prepared.pdb \
  --contigs "A1-150/0 70-100" \
  --hotspot "A45,A67,A89" \
  --num-designs 500
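The `--contigs` value in the command above packs the target range and the binder length range into one string. Assuming the wrapper passes it through in RFdiffusion's usual contig syntax, `A1-150` keeps target chain A residues 1 to 150, `/0 ` marks a chain break, and `70-100` asks for a binder of 70 to 100 residues. A hypothetical helper for assembling such a string:

```python
def binder_contig(chain, first_res, last_res, min_len, max_len):
    """Build a contig string: fixed target segment, chain break, binder length range."""
    if first_res > last_res:
        raise ValueError("first_res must not exceed last_res")
    if min_len > max_len:
        raise ValueError("min_len must not exceed max_len")
    # "/0 " separates the target chain from the newly generated binder chain.
    return f"{chain}{first_res}-{last_res}/0 {min_len}-{max_len}"
```

The residue range should match the numbering in `target_prepared.pdb` after trimming, since a range that refers to removed residues would not resolve.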

Option B: BindCraft (end-to-end)

bash
modal run modal_bindcraft.py \
  --target-pdb target_prepared.pdb \
  --hotspots "A45,A67,A89" \
  --num-designs 100
Output: 100-500 backbone PDBs

Phase 3: Sequence design

For RFdiffusion backbones

bash
for backbone in backbones/*.pdb; do
  modal run modal_proteinmpnn.py \
    --pdb-path "$backbone" \
    --num-seq-per-target 8 \
    --sampling-temp 0.1
done
Output: 8 sequences per backbone (800-4000 total)

Phase 4: Structure validation

Predict complexes

bash
# Prepare a FASTA with binder + target,
# using the binder:target format for multimer prediction
modal run modal_colabfold.py \
  --input-faa all_sequences.fasta \
  --out-dir predictions/

Output: AF2 predictions with pLDDT, ipTM, PAE
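Preparing the multimer input amounts to joining each binder sequence to the target sequence with a colon, which is how ColabFold separates chains within one FASTA record. A minimal sketch, with a hypothetical function name and record names:

```python
def complex_fasta(binders, target_seq):
    """Format binder:target records for multimer prediction.

    `binders` maps a design name to its sequence. Each record places the
    binder first and the target second, separated by ':'.
    """
    records = [f">{name}\n{seq}:{target_seq}" for name, seq in binders.items()]
    return "\n".join(records) + "\n"
```

Writing the returned text to `all_sequences.fasta` gives one record per design. Because the target sequence is repeated in every record, it should be the sequence of the trimmed target actually used for design, not the full-length protein.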

Phase 5: Filtering and selection

Apply standard thresholds

python
import pandas as pd

# Load metrics
designs = pd.read_csv('all_metrics.csv')

# Filter (the thresholds assume pLDDT is on a 0-1 scale)
filtered = designs[
    (designs['pLDDT'] > 0.85)
    & (designs['ipTM'] > 0.50)
    & (designs['PAE_interface'] < 10)
    & (designs['scRMSD'] < 2.0)
    & (designs['esm2_pll'] > 0.0)
].copy()  # copy so that adding a column does not write to a view of designs

# Rank by composite score
filtered['score'] = (
    0.3 * filtered['pLDDT']
    + 0.3 * filtered['ipTM']
    + 0.2 * (1 - filtered['PAE_interface'] / 20)
    + 0.2 * filtered['esm2_pll']
)
top_designs = filtered.nlargest(50, 'score')

Output: 50-200 filtered candidates, with the top 50 by composite score kept in top_designs

Resource planning

Compute requirements

| Stage       | GPU  | Time (100 designs) |
|-------------|------|--------------------|
| RFdiffusion | A10G | 30 min             |
| ProteinMPNN | T4   | 15 min             |
| ColabFold   | A100 | 4-8 hours          |
| Filtering   | CPU  | 15 min             |

Total timeline

  • Small campaign (100 designs): 8-12 hours
  • Medium campaign (500 designs): 24-48 hours
  • Large campaign (1000+ designs): 2-5 days

Quality checkpoints

After backbone generation

  • Visual inspection of diverse backbones
  • Secondary structure present
  • No clashes with target

After sequence design

  • ESM2 PLL > 0.0 for most sequences
  • No unwanted cysteines (unless intentional)
  • Reasonable sequence diversity
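Two of these checks are easy to automate. The sketch below flags sequences containing cysteine and reports mean pairwise identity as a rough diversity measure. It assumes the sequences being compared were designed on the same backbone and therefore have equal length, so positions can be compared directly without alignment. The function names are illustrative.

```python
from itertools import combinations


def has_cysteine(seq):
    """True if the sequence contains at least one cysteine."""
    return "C" in seq.upper()


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(seqs):
    """Average identity over all sequence pairs; lower means more diverse."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(identity(a, b) for a, b in pairs) / len(pairs)
```

A mean pairwise identity close to 1.0 for the sequences of one backbone suggests the sampling temperature is too low to diversify them, which ties in with the "Poor diversity" entry under common issues.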

After validation

  • pLDDT > 0.85
  • ipTM > 0.50
  • PAE_interface < 10
  • Self-consistency RMSD (scRMSD) < 2.0 Å
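When a design fails validation, it helps to know which threshold it missed rather than just that it was filtered out. The following sketch applies the four cutoffs listed above to a single design's metrics and returns the names of the failed checks. The dictionary layout and function name are assumptions, with keys chosen to match the column names used in the filtering step.

```python
# (comparison, cutoff) for each validation metric listed above.
THRESHOLDS = {
    "pLDDT": (">", 0.85),
    "ipTM": (">", 0.50),
    "PAE_interface": ("<", 10.0),
    "scRMSD": ("<", 2.0),
}


def failed_checks(metrics):
    """Return the names of the thresholds that a design does not meet."""
    failed = []
    for name, (op, cutoff) in THRESHOLDS.items():
        value = metrics[name]
        passed = value > cutoff if op == ">" else value < cutoff
        if not passed:
            failed.append(name)
    return failed
```

Counting how often each metric appears in the failure lists across a campaign gives a quick picture of where designs are being lost.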

Final selection

  • Diverse sequences (cluster if needed)
  • Manufacturable (no problematic motifs)
  • Reasonable molecular weight
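The diversity and molecular-weight checks can be approximated in a few lines. This sketch picks cluster representatives greedily from a score-ordered list and estimates mass from sequence length. Both are rough stand-ins: the identity measure compares positions without alignment, the 0.8 identity cutoff is an arbitrary illustration, and 110 Da per residue is the common rule-of-thumb average rather than an exact calculation.

```python
def identity(a, b):
    """Crude identity: matching positions divided by the longer length (no alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        raise ValueError("sequences must not be empty")
    return sum(x == y for x, y in zip(a, b)) / longest


def greedy_representatives(seqs, max_identity=0.8):
    """Keep a sequence only if it is below `max_identity` to every kept one.

    `seqs` should be ordered best-first so each cluster is represented
    by its highest-scoring member.
    """
    reps = []
    for seq in seqs:
        if all(identity(seq, rep) < max_identity for rep in reps):
            reps.append(seq)
    return reps


def approx_mass_kda(seq, avg_residue_da=110.0):
    """Approximate molecular weight in kDa from sequence length."""
    return len(seq) * avg_residue_da / 1000.0
```

For binders of different lengths, a dedicated alignment-based clustering tool would give more reliable clusters than this position-wise comparison.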

Common issues

| Problem        | Solution                           |
|----------------|------------------------------------|
| Low ipTM       | Check hotspots, increase designs   |
| Poor diversity | Higher temperature, more backbones |
| High scRMSD    | Backbone may be unusual            |
| Low pLDDT      | Check design quality               |

Advanced workflows

Multi-tool combination

  1. RFdiffusion for initial backbones
  2. ColabDesign for refinement
  3. ProteinMPNN diversification
  4. AF2 final validation

Iterative refinement

  1. Run initial campaign
  2. Analyze failures
  3. Adjust hotspots/parameters
  4. Repeat with insights