solublempnn

SolubleMPNN Solubility-Optimized Design

Prerequisites

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| Python | 3.8+ | 3.10 |
| CUDA | 11.0+ | 11.7+ |
| GPU VRAM | 8GB | 16GB (T4) |
| RAM | 8GB | 16GB |

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal (recommended)

SolubleMPNN uses the ProteinMPNN Modal wrapper with the soluble model:

```bash
cd biomodals
modal run modal_proteinmpnn.py \
  --pdb-path backbone.pdb \
  --num-seq-per-target 16 \
  --sampling-temp 0.1 \
  --model-name v_48_020
```

GPU: T4 (16GB) | Timeout: 600s (default)

Option 2: Local installation

```bash
git clone https://github.com/dauparas/ProteinMPNN.git
cd ProteinMPNN
```

Use soluble model weights:

```bash
python protein_mpnn_run.py \
  --pdb_path backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1" \
  --model_name "v_48_020" \
  --use_soluble_model  # upstream flag: load the weights trained on soluble proteins only
```

Key parameters

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `--pdb_path` | required | path | Input structure |
| `--num_seq_per_target` | 1 | 1-1000 | Sequences per structure |
| `--sampling_temp` | "0.1" | "0.0001-1.0" | Temperature (string!) |
| `--model_name` | v_48_020 | string | Soluble model variant |
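The parameter table can be turned into a small command builder. This is a minimal sketch, not part of ProteinMPNN: `build_mpnn_command` is a hypothetical helper, the range checks simply mirror the table, and `--out_folder` is taken from the local-installation command. It shows one way to make sure the temperature is always passed as a string.

```python
import shlex


def build_mpnn_command(pdb_path, out_folder="output/", num_seq_per_target=1,
                       sampling_temp=0.1, model_name="v_48_020"):
    """Assemble the protein_mpnn_run.py argument list (hypothetical helper)."""
    # Range checks mirror the parameter table; they are not enforced upstream.
    if not 1 <= num_seq_per_target <= 1000:
        raise ValueError("num_seq_per_target must be in 1-1000")
    if not 0.0001 <= sampling_temp <= 1.0:
        raise ValueError("sampling_temp must be in 0.0001-1.0")
    return [
        "python", "protein_mpnn_run.py",
        "--pdb_path", str(pdb_path),
        "--out_folder", str(out_folder),
        "--num_seq_per_target", str(num_seq_per_target),
        # The script takes the temperature as a string, so format it explicitly.
        "--sampling_temp", f"{sampling_temp:g}",
        "--model_name", model_name,
    ]


if __name__ == "__main__":
    cmd = build_mpnn_command("backbone.pdb", num_seq_per_target=16)
    print(shlex.join(cmd))  # shell-quoted command line, ready to copy or run
```

The returned list can be handed directly to `subprocess.run(cmd, check=True)` from inside the ProteinMPNN checkout.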

Model Variants

| Model | Description | Use Case |
|-------|-------------|----------|
| v_48_002 | Standard | General design |
| v_48_020 | Soluble-trained | E. coli expression |
| v_48_030 | High solubility | Difficult targets |

Output format

```
output/
├── seqs/backbone.fa
└── backbone_pdb/backbone_0001.pdb
```

Sample output

Successful run

```
$ python protein_mpnn_run.py --pdb_path backbone.pdb --model_name v_48_020 --num_seq_per_target 8
Loading soluble model weights (v_48_020)...
Designing sequences for backbone.pdb
Generated 8 sequences in 2.1 seconds

output/seqs/backbone.fa:
>backbone_0001, score=1.31, global_score=1.24, seq_recovery=0.78
MKTAYIAKQRQISFVKSHFSRQLE...
>backbone_0002, score=1.28, global_score=1.21, seq_recovery=0.81
MKTAYIAKQRQISFVKSQFSRQLD...
```

What good output looks like:
  • Score: 1.0-2.0 (lower = more confident)
  • Reduced hydrophobic patches compared to standard MPNN
  • Improved charge distribution
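The three criteria above can be checked roughly from the FASTA file itself. The sketch below assumes the header layout shown in the sample output (`score=...` as a comma-separated field); the hydrophobic residue set and the K/R versus D/E charge count are simplifying assumptions, not a validated solubility predictor, and `parse_fasta` and `summarize` are hypothetical helper names.

```python
import re

# One common (not universal) choice of hydrophobic residues; an assumption.
HYDROPHOBIC = set("AVILMFWY")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(header, seq):
    """Extract the score field and compute two crude composition metrics."""
    # The lookbehind skips "global_score=" so only the plain score is matched.
    match = re.search(r"(?<![a-z_])score=([0-9.]+)", header)
    score = float(match.group(1)) if match else None
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    # Crude net charge: +1 for each K/R, -1 for each D/E (ignores His and termini).
    net_charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    return {
        "score": score,
        "score_in_range": score is not None and 1.0 <= score <= 2.0,
        "hydrophobic_fraction": hydrophobic_fraction,
        "net_charge": net_charge,
    }


if __name__ == "__main__":
    # Visible prefix of the first sample sequence; the real sequence is longer.
    demo = ">backbone_0001, score=1.31, global_score=1.24, seq_recovery=0.78\nMKTAYIAKQRQISFVKSHFSRQLE\n"
    for header, seq in parse_fasta(demo):
        print(header.split(",")[0], summarize(header, seq))
```

Comparing these numbers between a standard ProteinMPNN run and a SolubleMPNN run on the same backbone gives a quick sanity check before running a dedicated solubility predictor.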

Decision tree

Should I use SolubleMPNN?
├─ What expression system?
│  ├─ E. coli → SolubleMPNN ✓
│  ├─ Mammalian → ProteinMPNN (PTMs matter more)
│  └─ Yeast → Either
├─ History of expression problems?
│  ├─ Yes, aggregation → SolubleMPNN ✓
│  ├─ Yes, low yield → SolubleMPNN ✓
│  └─ No → ProteinMPNN is fine
├─ What's in the binding site?
│  ├─ Small molecule / ligand → Use LigandMPNN
│  └─ Nothing / protein only → SolubleMPNN ✓
└─ Need highest solubility?
   ├─ Yes → Use v_48_030 model
   └─ Standard → Use v_48_020 model
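The decision tree can also be written as a small function, which is handy when choosing a tool per target in a batch. The tree lists its questions independently, so the precedence used here (ligand first, then expression history, then expression system) is an assumption, and `recommend` is a hypothetical helper name.

```python
def recommend(expression_system, has_expression_problems=False,
              ligand_in_binding_site=False, need_highest_solubility=False):
    """Return (tool, model_name) following the decision tree above.

    model_name is None when the recommended tool is not SolubleMPNN.
    """
    # A small molecule or ligand in the binding site points to LigandMPNN.
    if ligand_in_binding_site:
        return ("LigandMPNN", None)

    system = expression_system.strip().lower()
    is_e_coli = system in {"e. coli", "e.coli", "ecoli"}

    # Aggregation or low yield, or E. coli expression, favors SolubleMPNN.
    # Mammalian and yeast targets without problems fall through to ProteinMPNN.
    if not (has_expression_problems or is_e_coli):
        return ("ProteinMPNN", None)

    model = "v_48_030" if need_highest_solubility else "v_48_020"
    return ("SolubleMPNN", model)


if __name__ == "__main__":
    print(recommend("E. coli"))
    print(recommend("Mammalian"))
    print(recommend("Yeast", has_expression_problems=True, need_highest_solubility=True))
```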

Typical performance

| Campaign Size | Time (T4) | Cost (Modal) | Notes |
|---------------|-----------|--------------|-------|
| 100 backbones × 8 seq | 15-20 min | ~$2 | Standard |
| 500 backbones × 8 seq | 1-1.5 h | ~$8 | Large campaign |

Expected improvement: +15-30% solubility score vs standard ProteinMPNN.
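For campaign sizes between the two rows of the table, a rough estimate can be interpolated. This sketch assumes runtime and cost scale linearly with the number of backbones at 8 sequences each, and it collapses each time range to its midpoint; both are assumptions, and `estimate_campaign` is a hypothetical helper. Treat the result as a planning figure only, especially outside the 100-500 range.

```python
# Reference points read off the performance table (8 sequences per backbone, T4).
# Time ranges are collapsed to their midpoints, which is an assumption.
SMALL = (100, 17.5, 2.0)   # backbones, minutes, USD
LARGE = (500, 75.0, 8.0)   # backbones, minutes, USD


def estimate_campaign(n_backbones):
    """Linearly interpolate (minutes, usd) between the two table rows."""
    if n_backbones <= 0:
        raise ValueError("n_backbones must be positive")
    n0, t0, c0 = SMALL
    n1, t1, c1 = LARGE
    frac = (n_backbones - n0) / (n1 - n0)
    minutes = t0 + frac * (t1 - t0)
    usd = c0 + frac * (c1 - c0)
    return minutes, usd


if __name__ == "__main__":
    minutes, usd = estimate_campaign(300)
    print(f"300 backbones: about {minutes:.0f} min, about ${usd:.2f}")
```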

Verify

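```bash
# With several files, grep -c prints one count per file. Each count should equal
# num_seq_per_target, so the total matches backbone_count × num_seq_per_target.
# If your version also writes the native input sequence as the first record,
# expect one extra per file.
grep -c "^>" output/seqs/*.fa
```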
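The same check can be scripted so that a large campaign reports only the files that are off. This is a minimal sketch: `count_fasta_records` and `check_counts` are hypothetical helpers, and `extra_records` is an illustrative knob for outputs that carry an additional record per file (for example the native input sequence).

```python
from pathlib import Path


def count_fasta_records(seq_dir):
    """Map each .fa file name in seq_dir to its number of '>' header lines."""
    counts = {}
    for fasta in sorted(Path(seq_dir).glob("*.fa")):
        with fasta.open() as handle:
            counts[fasta.name] = sum(1 for line in handle if line.startswith(">"))
    return counts


def check_counts(seq_dir, num_seq_per_target, extra_records=0):
    """Return {file_name: count} for files that do not have the expected count."""
    expected = num_seq_per_target + extra_records
    return {name: n for name, n in count_fasta_records(seq_dir).items() if n != expected}


if __name__ == "__main__":
    problems = check_counts("output/seqs", num_seq_per_target=8)
    if problems:
        print("Unexpected record counts:", problems)
    else:
        print("All FASTA files have the expected number of sequences.")
```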

Troubleshooting

  • Still insoluble: try v_48_030 (higher solubility bias).
  • Low diversity: increase the sampling temperature to 0.2.
  • Poor folding: use standard ProteinMPNN and optimize solubility later.
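"Low diversity" can be quantified before changing the temperature. The sketch below computes the mean pairwise sequence identity of the designs for one backbone; values close to 1.0 mean the sequences are nearly identical. It assumes all designs for a backbone have equal length, and `mean_pairwise_identity` is a hypothetical helper. What counts as "too similar" depends on the campaign.

```python
from itertools import combinations


def mean_pairwise_identity(seqs):
    """Average fraction of identical positions over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total = 0.0
    for a, b in pairs:
        if len(a) != len(b):
            raise ValueError("sequences must have equal length")
        total += sum(x == y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


if __name__ == "__main__":
    # Visible prefixes of the two sample sequences; the real designs are longer.
    designs = ["MKTAYIAKQRQISFVKSHFSRQLE", "MKTAYIAKQRQISFVKSQFSRQLD"]
    print(f"mean pairwise identity: {mean_pairwise_identity(designs):.3f}")
```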

Error interpretation

| Error | Cause | Fix |
|-------|-------|-----|
| `RuntimeError: CUDA out of memory` | Long protein or large batch | Reduce `batch_size` |
| `FileNotFoundError: v_48_020` | Missing model weights | Download soluble weights |
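The missing-weights error can be caught before a long run with a preflight check. This sketch assumes the weights live at `soluble_model_weights/<model_name>.pt` inside the cloned ProteinMPNN directory; that layout is an assumption about your checkout, and `find_weights` is a hypothetical helper, so adjust `weights_subdir` if your installation differs.

```python
from pathlib import Path


def find_weights(repo_dir, model_name="v_48_020", weights_subdir="soluble_model_weights"):
    """Return the path to the model weights, or raise FileNotFoundError early."""
    # Assumed layout: <repo_dir>/<weights_subdir>/<model_name>.pt
    path = Path(repo_dir) / weights_subdir / f"{model_name}.pt"
    if not path.is_file():
        raise FileNotFoundError(f"Missing model weights: {path}")
    return path


if __name__ == "__main__":
    try:
        print("Found weights at", find_weights("ProteinMPNN"))
    except FileNotFoundError as err:
        print(err)
```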

Next: Structure prediction for validation → `protein-qc` for filtering.