ligandmpnn

LigandMPNN Ligand-Aware Design

Prerequisites

Requirement	Minimum	Recommended
Python	3.8+	3.10
CUDA	11.0+	11.7+
GPU VRAM	8GB	16GB (T4)
RAM	8GB	16GB

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal (recommended)

bash

cd biomodals
modal run modal_ligandmpnn.py \
  --pdb-path protein_ligand.pdb \
  --num-seq-per-target 16 \
  --sampling-temp 0.1

GPU: T4 (16GB) | Timeout: 600s default

Option 2: Local installation

bash

git clone https://github.com/dauparas/LigandMPNN.git
cd LigandMPNN
bash get_model_params.sh "./model_params"  # download model weights (first run only)

python run.py \
  --pdb_path protein_ligand.pdb \
  --out_folder output/ \
  --num_seq_per_target 16

Key parameters

Parameter	Default	Range	Description
`--pdb_path`	required	path	PDB with ligand
`--num_seq_per_target`	1	1-1000	Sequences per structure
`--sampling_temp`	"0.1"	"0.0001-1.0"	Sampling temperature (pass the value as a string)
`--ligand_mpnn_use_side_chain_context`	true	bool	Use ligand context
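
A minimal sketch of assembling the local command from Python, assuming the flag names and ranges listed in the table above. `build_ligandmpnn_command` is a hypothetical helper for illustration, not part of LigandMPNN; it only builds the argument list, checks the listed ranges, and formats the temperature as a string.

```python
def build_ligandmpnn_command(pdb_path, out_folder, num_seq_per_target=1, sampling_temp=0.1):
    """Return an argv list for run.py using the flags from the parameter table.

    Hypothetical helper: it validates the documented ranges and builds the
    command, but does not run anything.
    """
    if not 1 <= num_seq_per_target <= 1000:
        raise ValueError("num_seq_per_target must be between 1 and 1000")
    temp = float(sampling_temp)
    if not 0.0001 <= temp <= 1.0:
        raise ValueError("sampling_temp must be between 0.0001 and 1.0")
    return [
        "python", "run.py",
        "--pdb_path", str(pdb_path),
        "--out_folder", str(out_folder),
        "--num_seq_per_target", str(num_seq_per_target),
        "--sampling_temp", str(temp),  # the table notes this value is a string
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from inside the LigandMPNN checkout.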

Ligand Specification

In PDB File

Ligand must be present as HETATM records:

ATOM    ...protein atoms...
HETATM  1  C1  LIG A 999      x.xxx  y.yyy  z.zzz  1.00  0.00           C
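
Before running a design, it can help to confirm that the input really contains ligand HETATM records. The sketch below assumes a standard fixed-column PDB file (residue name in columns 18-20, chain in column 22, residue number in columns 23-26) and treats `HOH`, `WAT`, and `H2O` as water; `find_ligands` and `WATER_NAMES` are hypothetical names, not part of LigandMPNN.

```python
from collections import Counter

# Assumption: residue names that denote water and should not count as ligands.
WATER_NAMES = {"HOH", "WAT", "H2O"}


def find_ligands(pdb_text):
    """Count HETATM atoms per (residue name, chain, residue number), skipping water."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()  # PDB columns 18-20
        chain_id = line[21:22].strip()  # PDB column 22
        res_seq = line[22:26].strip()   # PDB columns 23-26
        if not res_name or res_name in WATER_NAMES:
            continue
        counts[(res_name, chain_id, res_seq)] += 1
    return dict(counts)
```

An empty result means no non-water HETATM records were found, which is the first thing to check when the ligand is not recognized.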

Supported Ligand Types

Small molecules (HETATM)
Metals (Zn, Fe, Mg, Ca, etc.)
Cofactors (NAD, FAD, ATP)
DNA/RNA

Output format

output/
├── seqs/
│   └── protein.fa          # FASTA sequences
└── protein_pdb/
    └── protein_0001.pdb    # PDBs with designed sequence

Sample output

Successful run

$ python run.py --pdb_path enzyme_substrate.pdb --out_folder output/ --num_seq_per_target 8
Loading LigandMPNN model weights...
Processing enzyme_substrate.pdb
Found ligand: LIG (12 atoms)
Generated 8 sequences in 3.1 seconds

output/seqs/enzyme_substrate.fa:
>enzyme_substrate_0001, score=1.45, global_score=1.38
MKTAYIAKQRQISFVKSHFSRQLE...
>enzyme_substrate_0002, score=1.52, global_score=1.41
MKTAYIAKQRQISFVKSQFSRQLD...

What good output looks like:

Score: 1.0-2.0 (lower = more confident)
Ligand detected and incorporated in context
Active site residues preserved or optimized
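
The score check above can be sketched as a small parser, assuming FASTA headers of the form shown in the sample output (`>name, score=..., global_score=...`). `parse_fasta` and `filter_by_score` are hypothetical helpers, and the 1.0-2.0 window is taken from the guideline above rather than from any LigandMPNN default.

```python
def parse_fasta(fasta_text):
    """Return a list of (name, fields, sequence); fields maps header keys to floats."""
    records = []
    name, fields, seq_lines = None, {}, []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, fields, "".join(seq_lines)))
            parts = [p.strip() for p in line[1:].split(",")]
            name, fields, seq_lines = parts[0], {}, []
            for part in parts[1:]:
                key, sep, value = part.partition("=")
                if not sep:
                    continue
                try:
                    fields[key.strip()] = float(value)
                except ValueError:
                    pass  # ignore non-numeric header fields
        elif name is not None:
            seq_lines.append(line)
    if name is not None:
        records.append((name, fields, "".join(seq_lines)))
    return records


def filter_by_score(records, low=1.0, high=2.0):
    """Keep records whose score is within [low, high], lowest (most confident) first."""
    kept = [r for r in records if "score" in r[1] and low <= r[1]["score"] <= high]
    return sorted(kept, key=lambda r: r[1]["score"])
```

Sorting ascending follows the convention stated above that a lower score means higher confidence.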

Decision tree

Should I use LigandMPNN?
│
├─ What's in your binding site?
│  ├─ Small molecule / ligand → LigandMPNN ✓
│  ├─ Metal ion (Zn, Fe, etc.) → LigandMPNN ✓
│  ├─ Cofactor (NAD, FAD, ATP) → LigandMPNN ✓
│  ├─ DNA/RNA → LigandMPNN ✓
│  └─ Nothing / protein only → Use ProteinMPNN
│
├─ What type of design?
│  ├─ Enzyme active site → LigandMPNN ✓
│  ├─ Metal binding site → LigandMPNN ✓
│  ├─ Protein-protein binder → Use ProteinMPNN
│  └─ De novo scaffold → Use ProteinMPNN
│
└─ Priority?
   ├─ Solubility/expression → Consider SolubleMPNN
   └─ Ligand context accuracy → LigandMPNN ✓

Typical performance

Campaign Size	Time (T4)	Cost (Modal)	Notes
100 backbones × 8 seq	15-20 min	~$2	Standard
500 backbones × 8 seq	1-1.5h	~$8	Large campaign

Throughput: ~50-100 sequences/minute on T4 GPU.
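
As a rough planning aid, the throughput figure can be turned into a time estimate. This sketch simply divides the total sequence count by an assumed sequences-per-minute rate; it ignores start-up and model-loading time, so treat the result as a lower bound. `estimate_minutes` is a hypothetical helper.

```python
def estimate_minutes(num_backbones, seqs_per_backbone, seqs_per_minute):
    """Estimate generation time in minutes from a throughput figure."""
    if seqs_per_minute <= 0:
        raise ValueError("seqs_per_minute must be positive")
    total_sequences = num_backbones * seqs_per_backbone
    return total_sequences / seqs_per_minute


# 100 backbones x 8 sequences at the slow end of the quoted range (50 seq/min)
print(estimate_minutes(100, 8, 50))   # 16.0
# 500 backbones x 8 sequences at the same rate
print(estimate_minutes(500, 8, 50))   # 80.0
```

At 50 sequences per minute the estimates fall inside the table's 15-20 min and 1-1.5 h ranges.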

Verify

bash

cat output/seqs/*.fa | grep -c "^>"  # Total should equal backbone_count × num_seq_per_target
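
The same check can be sketched in Python when a per-file breakdown is useful, for example to find which backbone produced too few sequences. It assumes the `output/seqs/*.fa` layout shown earlier and one FASTA file per backbone; `count_fasta_records` and `check_counts` are hypothetical helpers.

```python
from pathlib import Path


def count_fasta_records(seqs_dir):
    """Return {file name: number of '>' header lines} for every .fa file in seqs_dir."""
    counts = {}
    for fasta in sorted(Path(seqs_dir).glob("*.fa")):
        with fasta.open() as handle:
            counts[fasta.name] = sum(1 for line in handle if line.startswith(">"))
    return counts


def check_counts(seqs_dir, backbone_count, num_seq_per_target):
    """Return (total_ok, off_files): whether the total matches, and files with an unexpected count."""
    counts = count_fasta_records(seqs_dir)
    total_ok = sum(counts.values()) == backbone_count * num_seq_per_target
    off_files = {name: n for name, n in counts.items() if n != num_seq_per_target}
    return total_ok, off_files
```

For example, `check_counts("output/seqs", 100, 8)` returns `True` and an empty dict only when every file holds exactly 8 records and the total is 800.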

Troubleshooting

Ligand not recognized: Check HETATM format, verify ligand residue name
Poor binding residues: Increase sampling around active site
Missing contacts: Verify ligand coordinates in PDB

Error interpretation

Error	Cause	Fix
`RuntimeError: CUDA out of memory`	Long protein or large batch	Reduce batch_size
`KeyError: 'LIG'`	Ligand not found in PDB	Check HETATM records
`ValueError: no ligand atoms`	Empty ligand	Verify ligand has atoms in PDB

Next: Structure prediction for validation → protein-qc for filtering.
