solublempnn
SolubleMPNN Solubility-Optimized Design
Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.10 |
| CUDA | 11.0+ | 11.7+ |
| GPU VRAM | 8GB | 16GB (T4) |
| RAM | 8GB | 16GB |
How to run
First time? See Installation Guide to set up Modal and biomodals.
Option 1: Modal (recommended)
SolubleMPNN runs through the ProteinMPNN Modal wrapper with the soluble model selected:
```bash
cd biomodals
modal run modal_proteinmpnn.py \
  --pdb-path backbone.pdb \
  --num-seq-per-target 16 \
  --sampling-temp 0.1 \
  --model-name v_48_020
```
GPU: T4 (16GB) | Timeout: 600s default
Option 2: Local installation
```bash
git clone https://github.com/dauparas/ProteinMPNN.git
cd ProteinMPNN

# Use soluble model weights: --use_soluble_model loads the
# soluble-trained checkpoint named by --model_name
python protein_mpnn_run.py \
  --pdb_path backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1" \
  --use_soluble_model \
  --model_name "v_48_020"
```
Key parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| `--pdb_path` | required | path | Input structure |
| `--num_seq_per_target` | 1 | 1-1000 | Sequences per structure |
| `--sampling_temp` | "0.1" | "0.0001-1.0" | Sampling temperature (passed as a string) |
| `--model_name` | v_48_020 | string | Soluble model variant |
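As a rough sketch, the parameters in this table can be assembled into an argument list from Python. The helper name `build_mpnn_command` and its range checks are illustrative assumptions, not part of ProteinMPNN. `--use_soluble_model` is the upstream script's switch for the soluble-trained weights. The temperature is converted to a string before it is passed.

```python
from typing import List


def build_mpnn_command(
    pdb_path: str,
    out_folder: str = "output/",
    num_seq_per_target: int = 1,
    sampling_temp: float = 0.1,
    model_name: str = "v_48_020",
    use_soluble_model: bool = True,
) -> List[str]:
    """Assemble an argument list for protein_mpnn_run.py (hypothetical helper)."""
    # Ranges follow the parameter table; the checks themselves are an assumption.
    if not 1 <= num_seq_per_target <= 1000:
        raise ValueError("num_seq_per_target must be in 1-1000")
    if not 0.0001 <= sampling_temp <= 1.0:
        raise ValueError("sampling_temp must be in 0.0001-1.0")
    cmd = [
        "python", "protein_mpnn_run.py",
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--num_seq_per_target", str(num_seq_per_target),
        # The temperature is passed as a string, e.g. "0.1".
        "--sampling_temp", str(sampling_temp),
        "--model_name", model_name,
    ]
    if use_soluble_model:
        # Upstream flag that selects the soluble-trained weights.
        cmd.append("--use_soluble_model")
    return cmd


cmd = build_mpnn_command("backbone.pdb", num_seq_per_target=16)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from inside the ProteinMPNN checkout.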
Model Variants
| Model | Description | Use Case |
|---|---|---|
| v_48_002 | Standard | General design |
| v_48_020 | Soluble-trained | E. coli expression |
| v_48_030 | High solubility | Difficult targets |
Output format
```
output/
├── seqs/backbone.fa
└── backbone_pdb/backbone_0001.pdb
```
Sample output
Successful run
```
$ python protein_mpnn_run.py --pdb_path backbone.pdb --use_soluble_model --model_name v_48_020 --num_seq_per_target 8
Loading soluble model weights (v_48_020)...
Designing sequences for backbone.pdb
Generated 8 sequences in 2.1 seconds
```
output/seqs/backbone.fa:
```
>backbone_0001, score=1.31, global_score=1.24, seq_recovery=0.78
MKTAYIAKQRQISFVKSHFSRQLE...
>backbone_0002, score=1.28, global_score=1.21, seq_recovery=0.81
MKTAYIAKQRQISFVKSQFSRQLD...
```
What good output looks like:
- Score: 1.0-2.0 (lower = more confident)
- Reduced hydrophobic patches compared to standard MPNN
- Improved charge distribution
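These criteria can be checked roughly in code. The sketch below parses headers in the format shown in the sample output and reports, per sequence, the score, the fraction of hydrophobic residues, and an approximate net charge. The function names, the hydrophobic residue set (AILMFVWY), and the charge approximation (K/R as +1, D/E as -1, histidine ignored) are assumptions for illustration, not part of ProteinMPNN. The sample text reuses the truncated sequence prefixes from the sample output.

```python
from typing import Dict, List

HYDROPHOBIC = set("AILMFVWY")  # assumed residue set, for illustration only


def parse_fasta(text: str) -> List[Dict[str, object]]:
    """Parse records like '>name, score=1.31, ...' followed by sequence lines."""
    records: List[Dict[str, object]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = [f.strip() for f in line[1:].split(",")]
            record: Dict[str, object] = {"name": fields[0], "seq": ""}
            for field in fields[1:]:
                key, sep, value = field.partition("=")
                if sep:
                    try:
                        record[key] = float(value)
                    except ValueError:
                        record[key] = value  # keep non-numeric values as text
            records.append(record)
        elif records:
            records[-1]["seq"] = str(records[-1]["seq"]) + line
    return records


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues in the assumed hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq) if seq else 0.0


def net_charge(seq: str) -> int:
    """Crude net charge: count of K/R minus count of D/E."""
    return sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)


sample = """>backbone_0001, score=1.31, global_score=1.24, seq_recovery=0.78
MKTAYIAKQRQISFVKSHFSRQLE
>backbone_0002, score=1.28, global_score=1.21, seq_recovery=0.81
MKTAYIAKQRQISFVKSQFSRQLD
"""

# Rank by score (lower = more confident, per the guideline above).
for rec in sorted(parse_fasta(sample), key=lambda r: r["score"]):
    seq = str(rec["seq"])
    print(rec["name"], rec["score"], round(hydrophobic_fraction(seq), 2), net_charge(seq))
```

Comparing these numbers between standard ProteinMPNN and SolubleMPNN designs for the same backbone gives a quick, if coarse, view of whether hydrophobic content went down.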
Decision tree
```
Should I use SolubleMPNN?
│
├─ What expression system?
│   ├─ E. coli → SolubleMPNN ✓
│   ├─ Mammalian → ProteinMPNN (PTMs matter more)
│   └─ Yeast → Either
│
├─ History of expression problems?
│   ├─ Yes, aggregation → SolubleMPNN ✓
│   ├─ Yes, low yield → SolubleMPNN ✓
│   └─ No → ProteinMPNN is fine
│
├─ What's in the binding site?
│   ├─ Small molecule / ligand → Use LigandMPNN
│   └─ Nothing / protein only → SolubleMPNN ✓
│
└─ Need highest solubility?
    ├─ Yes → Use v_48_030 model
    └─ Standard → Use v_48_020 model
```
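The four questions in the tree can be folded into one function, as sketched below. The tree does not say which question wins when answers conflict, so the precedence used here is an assumption: a ligand in the binding site comes first, then E. coli expression or a history of expression problems, then the expression system. The function name and return strings are illustrative.

```python
def choose_tool(
    expression_system: str,
    expression_problems: bool = False,
    ligand_in_binding_site: bool = False,
    need_highest_solubility: bool = False,
) -> str:
    """Suggest a tool following the decision tree (assumed precedence)."""
    # A small molecule or ligand in the binding site points to LigandMPNN.
    if ligand_in_binding_site:
        return "LigandMPNN"
    system = expression_system.strip().lower()
    # E. coli expression, or past aggregation / low yield, points to SolubleMPNN.
    if system in {"e. coli", "e.coli", "ecoli"} or expression_problems:
        model = "v_48_030" if need_highest_solubility else "v_48_020"
        return f"SolubleMPNN ({model})"
    # Yeast with no known problems: either tool is acceptable.
    if system == "yeast":
        return "ProteinMPNN or SolubleMPNN"
    # Mammalian (or anything else) with no known problems.
    return "ProteinMPNN"


print(choose_tool("E. coli"))
print(choose_tool("mammalian"))
print(choose_tool("yeast", expression_problems=True, need_highest_solubility=True))
```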
Typical performance
| Campaign Size | Time (T4) | Cost (Modal) | Notes |
|---|---|---|---|
| 100 backbones × 8 seq | 15-20 min | ~$2 | Standard |
| 500 backbones × 8 seq | 1-1.5h | ~$8 | Large campaign |
Expected improvement: +15-30% solubility score vs standard ProteinMPNN.
Verify
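```bash
# Total across files; should match backbone_count × num_seq_per_target (plus one per backbone if the input sequence is written as the first record)
cat output/seqs/*.fa | grep -c "^>"
```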
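The same check can be done per file in Python, which makes it easier to spot a single backbone that produced too few sequences. This is a minimal sketch; the function name `count_fasta_records` is an assumption, and the path follows the output layout described earlier.

```python
import glob
import os
from typing import Dict


def count_fasta_records(pattern: str) -> Dict[str, int]:
    """Map each FASTA file matching `pattern` to its number of '>' header lines."""
    counts: Dict[str, int] = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            counts[path] = sum(1 for line in handle if line.startswith(">"))
    return counts


counts = count_fasta_records(os.path.join("output", "seqs", "*.fa"))
for path, n in counts.items():
    print(path, n)
print("total:", sum(counts.values()))
```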
Troubleshooting
Still insoluble: Try v_48_030 (higher solubility bias)
Low diversity: Increase temperature to 0.2
Poor folding: Use standard ProteinMPNN and optimize later
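To judge whether diversity is actually low before raising the temperature, one option is the mean pairwise identity across the designs for a backbone. The sketch below assumes all sequences have the same length, which holds for designs on a fixed backbone. The function names are illustrative, and the two example sequences are the truncated prefixes from the sample output.

```python
from itertools import combinations
from typing import List


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(seqs: List[str]) -> float:
    """Average identity over all unordered pairs; 1.0 if fewer than two sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 1.0
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


designs = [
    "MKTAYIAKQRQISFVKSHFSRQLE",
    "MKTAYIAKQRQISFVKSQFSRQLD",
]
print(round(mean_pairwise_identity(designs), 3))
```

Values close to 1.0 mean the designs are nearly identical. What counts as too similar depends on the campaign, so treat any cutoff as a judgment call.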
Error interpretation
| Cause | Fix |
|---|---|
| Long protein or large batch | Reduce batch_size |
| Missing model weights | Download soluble weights |
Next: Structure prediction for validation → protein-qc for filtering.