solublempnn

SolubleMPNN Solubility-Optimized Design

Prerequisites

Requirement	Minimum	Recommended
Python	3.8+	3.10
CUDA	11.0+	11.7+
GPU VRAM	8GB	16GB (T4)
RAM	8GB	16GB

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal (recommended)

SolubleMPNN uses the ProteinMPNN Modal wrapper with the soluble model:

bash

cd biomodals
modal run modal_proteinmpnn.py \
  --pdb-path backbone.pdb \
  --num-seq-per-target 16 \
  --sampling-temp 0.1 \
  --model-name v_48_020

GPU: T4 (16GB) | Timeout: 600s default

Option 2: Local installation

bash

git clone https://github.com/dauparas/ProteinMPNN.git
cd ProteinMPNN

# Use soluble model weights: --use_soluble_model selects them, --model_name picks the checkpoint
python protein_mpnn_run.py \
  --pdb_path backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1" \
  --model_name "v_48_020" \
  --use_soluble_model
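
When the local run is launched from a script rather than typed by hand, building the argument list in Python avoids line-continuation mistakes. The following is a minimal sketch, not part of the upstream repository: `build_mpnn_command` is a hypothetical helper, and it assumes the upstream `protein_mpnn_run.py` flag `--use_soluble_model` is what selects the soluble-trained weights.

```python
import shlex
from typing import List


def build_mpnn_command(
    pdb_path: str,
    out_folder: str,
    num_seq_per_target: int = 16,
    sampling_temp: str = "0.1",   # ProteinMPNN expects the temperature as a string
    model_name: str = "v_48_020",
    use_soluble_model: bool = True,
) -> List[str]:
    """Return the argument list for a local protein_mpnn_run.py call (hypothetical helper)."""
    cmd = [
        "python", "protein_mpnn_run.py",
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--num_seq_per_target", str(num_seq_per_target),
        "--sampling_temp", sampling_temp,
        "--model_name", model_name,
    ]
    if use_soluble_model:
        # Assumption: this upstream flag switches to the soluble-trained weights.
        cmd.append("--use_soluble_model")
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable shell command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_mpnn_command("backbone.pdb", "output/")))
```

Passing the list to `subprocess.run(cmd, check=True)` from inside the `ProteinMPNN` directory would execute the same command shown above.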

Key parameters

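Parameter	Default	Range	Description
`--pdb_path`	required	path	Input structure
`--num_seq_per_target`	1	1-1000	Sequences per structure
`--sampling_temp`	"0.1"	"0.0001-1.0"	Sampling temperature, passed as a string
`--model_name`	v_48_020	string	Checkpoint name; the suffix encodes the training noise level (020 = 0.20 Å)
`--use_soluble_model`	off	flag	Load the weights trained on soluble proteins only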
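
Because `--sampling_temp` is a string rather than a float, it is easy to pass a value the script will not parse as intended. The sketch below is a hypothetical helper (`format_sampling_temp` is not part of ProteinMPNN) that checks values against the range in the table above and joins them into one string. It assumes the upstream script accepts several space-separated temperatures in that string.

```python
from typing import Iterable

# Bounds taken from the "Range" column of the parameter table.
MIN_TEMP = 0.0001
MAX_TEMP = 1.0


def format_sampling_temp(temps: Iterable[float]) -> str:
    """Validate temperatures and return them as one space-separated string."""
    values = list(temps)
    if not values:
        raise ValueError("at least one temperature is required")
    for t in values:
        if not MIN_TEMP <= t <= MAX_TEMP:
            raise ValueError(f"temperature {t} is outside {MIN_TEMP}-{MAX_TEMP}")
    return " ".join(str(t) for t in values)
```

For example, `format_sampling_temp([0.1, 0.2])` produces the string `"0.1 0.2"`, which can be passed as the value of `--sampling_temp`.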

Model Variants

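Model	Description	Use Case
v_48_002	Trained with 0.02 Å backbone noise	High-resolution backbones
v_48_020	Trained with 0.20 Å backbone noise (default)	General design, E. coli expression
v_48_030	Trained with 0.30 Å backbone noise	Difficult targets with less precise backbones

The suffix is the training noise level, not a solubility setting. The soluble-trained weights are selected with `--use_soluble_model`, which applies to whichever checkpoint name is given.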

Output format

output/
├── seqs/backbone.fa
└── backbone_pdb/backbone_0001.pdb

Sample output

Successful run

$ python protein_mpnn_run.py --pdb_path backbone.pdb --model_name v_48_020 --num_seq_per_target 8
Loading soluble model weights (v_48_020)...
Designing sequences for backbone.pdb
Generated 8 sequences in 2.1 seconds

output/seqs/backbone.fa:
>backbone_0001, score=1.31, global_score=1.24, seq_recovery=0.78
MKTAYIAKQRQISFVKSHFSRQLE...
>backbone_0002, score=1.28, global_score=1.21, seq_recovery=0.81
MKTAYIAKQRQISFVKSQFSRQLD...
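
To rank designs by score, the FASTA headers need to be parsed into numbers. The sketch below assumes the header layout shown in the sample above (a name followed by comma-separated `key=value` pairs); `parse_header` and `read_designs` are hypothetical helper names, and real output may carry additional or differently named fields.

```python
from typing import Dict, Iterable, List, Tuple


def parse_header(header: str) -> Tuple[str, Dict[str, float]]:
    """Split '>name, key=value, ...' into the name and a dict of numeric fields."""
    fields = [f.strip() for f in header.lstrip(">").split(",")]
    metrics: Dict[str, float] = {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if not sep:
            continue  # not a key=value pair
        try:
            metrics[key] = float(value)
        except ValueError:
            pass  # keep only numeric fields
    return fields[0], metrics


def read_designs(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Collect one record per FASTA entry, joining wrapped sequence lines."""
    designs: List[Dict[str, object]] = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, metrics = parse_header(line)
            current = {"name": name, "sequence": "", **metrics}
            designs.append(current)
        elif current is not None:
            current["sequence"] += line
    return designs
```

With the records loaded, `sorted(designs, key=lambda d: d["score"])` puts the lowest-scoring (most confident) design first.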

What good output looks like:

Score: 1.0-2.0 (lower = more confident)
Reduced hydrophobic patches compared to standard MPNN
Improved charge distribution
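
The last two criteria can be approximated from sequence alone. The sketch below computes three crude proxies: hydrophobic fraction, longest hydrophobic run, and net charge. The residue sets are an assumption made for illustration (histidine is treated as neutral), and these numbers are not the solubility score the guide refers to; they only make side-by-side comparison with standard ProteinMPNN output easier.

```python
# Assumed residue classes for illustration only.
HYDROPHOBIC = set("AILMFVW")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues in the assumed hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def longest_hydrophobic_run(seq: str) -> int:
    """Length of the longest stretch of consecutive hydrophobic residues."""
    longest = current = 0
    for aa in seq:
        current = current + 1 if aa in HYDROPHOBIC else 0
        longest = max(longest, current)
    return longest


def net_charge(seq: str) -> int:
    """Count of K/R minus count of D/E (histidine ignored)."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
```

Running the same three functions on designs from both models for one backbone gives a quick check that the soluble model actually shifted composition in the expected direction.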

Decision tree

Should I use SolubleMPNN?
│
├─ What expression system?
│  ├─ E. coli → SolubleMPNN ✓
│  ├─ Mammalian → ProteinMPNN (PTMs matter more)
│  └─ Yeast → Either
│
├─ History of expression problems?
│  ├─ Yes, aggregation → SolubleMPNN ✓
│  ├─ Yes, low yield → SolubleMPNN ✓
│  └─ No → ProteinMPNN is fine
│
├─ What's in the binding site?
│  ├─ Small molecule / ligand → Use LigandMPNN
│  └─ Nothing / protein only → SolubleMPNN ✓
│
└─ Which checkpoint?
   ├─ Less precise backbone → Use v_48_030 model (0.30 Å training noise)
   └─ Standard → Use v_48_020 model
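
The tool-selection branches of the tree can be written as a small function for use in a pipeline. This is a sketch under one stated assumption: the tree lists its questions side by side, so the precedence used here (ligand first, then expression history, then expression system) is a choice made for illustration. `recommend_tool` is a hypothetical name, and checkpoint selection is left out.

```python
def recommend_tool(
    expression_system: str,
    has_expression_problems: bool,
    ligand_in_binding_site: bool,
) -> str:
    """Return 'LigandMPNN', 'SolubleMPNN', or 'ProteinMPNN' following the tree above."""
    # Assumed precedence: a ligand in the binding site overrides everything else.
    if ligand_in_binding_site:
        return "LigandMPNN"
    # Aggregation or low yield in past expression attempts points to SolubleMPNN.
    if has_expression_problems:
        return "SolubleMPNN"
    system = expression_system.strip().lower().replace(" ", "")
    if system in ("e.coli", "ecoli"):
        return "SolubleMPNN"
    # Mammalian: PTMs matter more. Yeast: either works, so default to standard.
    return "ProteinMPNN"
```

For instance, `recommend_tool("yeast", True, False)` returns `"SolubleMPNN"` because the expression history outranks the "either" answer for yeast under the assumed ordering.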

Typical performance

Campaign Size	Time (T4)	Cost (Modal)	Notes
100 backbones × 8 seq	15-20 min	~$2	Standard
500 backbones × 8 seq	1-1.5h	~$8	Large campaign

Expected improvement: +15-30% solubility score vs standard ProteinMPNN.
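
For campaign sizes not listed in the table, a linear extrapolation from the 100-backbone row gives a rough budget. The sketch below assumes time and cost scale linearly with backbone count at 8 sequences each; `estimate_campaign` is a hypothetical helper. The 500-backbone row suggests real runs scale slightly better than linear, so treat the result as an upper-leaning estimate.

```python
from typing import Tuple

# Reference row from the table: 100 backbones x 8 seq -> 15-20 min, ~$2 on a T4.
REF_BACKBONES = 100
REF_MIN_LOW, REF_MIN_HIGH = 15, 20
REF_COST_USD = 2


def estimate_campaign(backbones: int) -> Tuple[float, float, float]:
    """Return (min_minutes, max_minutes, cost_usd) by linear scaling."""
    low = backbones * REF_MIN_LOW / REF_BACKBONES
    high = backbones * REF_MIN_HIGH / REF_BACKBONES
    cost = backbones * REF_COST_USD / REF_BACKBONES
    return low, high, cost
```

Scaling to 500 backbones gives 75-100 minutes and about $10, compared with the 1-1.5h and ~$8 listed in the table.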

Verify

bash

grep -c "^>" output/seqs/*.fa  # Prints one count per file; the counts should sum to backbone_count × num_seq_per_target
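
For larger campaigns it helps to flag the specific files that came up short instead of reading counts by eye. The sketch below does the same header count in Python; `count_records` and `files_with_unexpected_count` are hypothetical helpers, and the expected count per file is whatever `--num_seq_per_target` was set to (adjust it if your output files include extra records).

```python
from pathlib import Path
from typing import Dict, List


def count_records(fasta_dir: str) -> Dict[str, int]:
    """Count FASTA header lines ('>') in every .fa file of a directory."""
    counts: Dict[str, int] = {}
    for path in sorted(Path(fasta_dir).glob("*.fa")):
        with path.open() as handle:
            counts[path.name] = sum(1 for line in handle if line.startswith(">"))
    return counts


def files_with_unexpected_count(counts: Dict[str, int], expected: int) -> List[str]:
    """Return the file names whose record count differs from the expected value."""
    return [name for name, n in counts.items() if n != expected]
```

Calling `files_with_unexpected_count(count_records("output/seqs"), 16)` returns an empty list when every backbone produced the requested 16 sequences.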

Troubleshooting

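Still insoluble: Confirm the run used `--use_soluble_model`; switching to v_48_030 changes the training noise level, not the solubility bias
Low diversity: Increase `--sampling_temp` to "0.2"
Poor folding: Use standard ProteinMPNN and optimize solubility later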

Error interpretation

Error	Cause	Fix
`RuntimeError: CUDA out of memory`	Long protein or large batch	Reduce batch_size
`FileNotFoundError: v_48_020`	Missing model weights	Download soluble weights
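
The missing-weights error can be caught before a run starts. The sketch below assumes the upstream repository layout, where soluble checkpoints sit at `soluble_model_weights/<model_name>.pt` inside the clone; `soluble_weights_path` and `missing_weights` are hypothetical helpers, so adjust the path if your checkout differs.

```python
from pathlib import Path
from typing import Iterable, List


def soluble_weights_path(repo_dir: str, model_name: str = "v_48_020") -> Path:
    """Build the assumed location of a soluble checkpoint inside the clone."""
    return Path(repo_dir) / "soluble_model_weights" / f"{model_name}.pt"


def missing_weights(repo_dir: str, model_names: Iterable[str] = ("v_48_020",)) -> List[str]:
    """Return the checkpoint names whose weight file is not present."""
    return [
        name for name in model_names
        if not soluble_weights_path(repo_dir, name).is_file()
    ]
```

An empty list from `missing_weights("ProteinMPNN", ["v_48_020", "v_48_030"])` means both checkpoints are in place.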

Next: Structure prediction for validation → protein-qc for filtering.
