prody


ProDy - Protein Dynamics & Structural Biology

ProDy is designed to model the collective motions of proteins. It treats proteins as elastic networks, allowing researchers to predict functional movements and structural flexibility from a single PDB file or an ensemble of structures.

When to Use

  • Predicting protein flexibility and collective motions (ANM/GNM).
  • Performing Principal Component Analysis (PCA) on structural ensembles or MD trajectories.
  • Analyzing sequence conservation and co-evolution from multiple sequence alignments (Evol).
  • Comparing multiple protein structures (Ensemble analysis).
  • Identifying hinge regions and rigid domains in proteins.
  • Docking preparation and binding site analysis (druggability).
  • Filtering MD trajectories based on collective modes.

Reference Documentation

Official docs: http://prody.csb.pitt.edu/
Manual: http://prody.csb.pitt.edu/manual/
Search patterns: prody.parsePDB, prody.ANM, prody.GNM, prody.select, prody.Ensemble

Core Principles

Atom Selection Algebra

ProDy features a powerful selection language similar to those of VMD and PyMOL. You can select atoms by chain, residue, property, or proximity (e.g., 'protein and resname TRP and within 5 of resname HEM').
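Conceptually, each keyword in a selection string produces a per-atom boolean mask, and `and`/`or`/`not` combine those masks elementwise. The numpy sketch below illustrates that idea on a tiny hypothetical atom table. The arrays and the `select_indices` helper are made up for illustration and are not ProDy's implementation.

```python
import numpy as np

# Hypothetical per-atom attributes for six atoms (illustration only)
chain = np.array(['H', 'H', 'H', 'L', 'L', 'H'])
resname = np.array(['TRP', 'ALA', 'TRP', 'TRP', 'GLY', 'HEM'])
resnum = np.array([5, 6, 130, 7, 8, 200])
is_protein = np.array([True, True, True, True, True, False])


def select_indices(mask):
    """Return the indices of atoms where the boolean mask is True."""
    return np.flatnonzero(mask)


# 'protein and resname TRP': 'and' is an elementwise logical AND of masks
trp = select_indices(is_protein & (resname == 'TRP'))

# 'chain H and resnum 1 to 120' (range treated as inclusive in this sketch)
heavy = select_indices((chain == 'H') & (resnum >= 1) & (resnum <= 120))

# 'not protein': 'not' inverts the mask
non_protein = select_indices(~is_protein)

print(trp, heavy, non_protein)
```

Proximity keywords such as 'within 5 of ...' fit the same pattern: they produce a mask from a distance test against the reference atoms.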

Elastic Network Models (ENM)

  • GNM (Gaussian Network Model): Predicts the relative magnitudes of residue fluctuations (comparable to B-factors), but not their directions.
  • ANM (Anisotropic Network Model): Predicts both the direction and the magnitude of motion.
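The standard GNM and ANM definitions behind these summaries are given below for reference. Here r_c is the distance cutoff, γ is a uniform spring constant, and r_ij is the distance between nodes i and j. Γ⁻¹ denotes the pseudo-inverse of the Kirchhoff matrix Γ, which has one zero eigenvalue for a connected network. The ANM Hessian H is built from 3 × 3 super-elements.

```latex
\Gamma_{ij} =
\begin{cases}
-1 & i \neq j,\ r_{ij} \le r_c \\
0  & i \neq j,\ r_{ij} > r_c \\
-\sum_{k \neq i} \Gamma_{ik} & i = j
\end{cases}
\qquad
\langle \Delta R_i \cdot \Delta R_j \rangle = \frac{3 k_B T}{\gamma} \left[\Gamma^{-1}\right]_{ij},
\qquad
B_i = \frac{8\pi^2}{3} \langle \Delta R_i^2 \rangle

H_{ij} = -\frac{\gamma}{r_{ij}^2}\,(\mathbf{r}_j - \mathbf{r}_i)(\mathbf{r}_j - \mathbf{r}_i)^{\mathsf{T}}
\quad (i \neq j,\ r_{ij} \le r_c),
\qquad
H_{ii} = -\sum_{j \neq i} H_{ij}
```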

Ensembles

An ensemble is a collection of conformations of the same set of atoms (e.g., multiple NMR models or MD frames). The conformations are stored together in a single coordinate array, so statistical analyses such as PCA run quickly.
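Ensemble.iterpose() superposes conformations iteratively. Each conformation is fitted onto the current reference, the average structure becomes the new reference, and the cycle repeats until the average stops changing. The numpy sketch below shows this idea, using the Kabsch algorithm for each fit. `kabsch_superpose` and `iterpose` are illustrative names. Starting from the first frame as reference, the tolerance, and the iteration cap are all assumptions, not ProDy's internals.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Return mobile (n_atoms, 3) optimally rotated and translated onto target."""
    mob_center = mobile.mean(axis=0)
    tar_center = target.mean(axis=0)
    a = mobile - mob_center
    b = target - tar_center
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return a @ rotation + tar_center


def iterpose(coordsets, tol=1e-6, max_iter=100):
    """Iteratively superpose coordsets (n_frames, n_atoms, 3) onto their mean."""
    coords = np.array(coordsets, dtype=float)
    reference = coords[0]
    for _ in range(max_iter):
        coords = np.array([kabsch_superpose(c, reference) for c in coords])
        new_reference = coords.mean(axis=0)
        # RMSD between successive references decides convergence
        shift = np.sqrt(((new_reference - reference) ** 2).sum(axis=1).mean())
        reference = new_reference
        if shift < tol:
            break
    return coords
```

Superposing onto the running average, rather than onto one fixed frame, keeps the result from depending strongly on which conformation happens to come first.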

Quick Reference

Installation

```bash
pip install prody
```

Standard Imports

```python
import numpy as np
from prody import *

# Optional: for plotting
confProDy(auto_show=False)
```

Basic Pattern - Normal Mode Analysis

```python
from prody import *

# 1. Parse structure
atoms = parsePDB('1p38')
calphas = atoms.select('protein and calpha')

# 2. Build and solve ANM
anm = ANM('p38_anm')
anm.buildHessian(calphas)
anm.calcModes(n_modes=20)

# 3. Analyze results
for mode in anm[:3]:
    print(f"Mode {mode.getIndex()}: Variance = {mode.getVariance():.2f}")

# 4. Save for visualization (NMD format for VMD/PyMOL)
writeNMD('p38_modes.nmd', anm, calphas)
```

Critical Rules

✅ DO

  • Select C-alphas for NMA - For large systems, ENMs (ANM/GNM) are most effective and computationally efficient when applied only to C-alpha atoms.
  • Always Align Ensembles - Before performing PCA on a structural ensemble, superpose all frames, for example iteratively onto the ensemble average with ensemble.iterpose().
  • Use select() early - Filter your PDB object down to the necessary chains/atoms to save memory during Hessian matrix calculations.
  • Check Variance Coverage - When only a subset of modes is computed (e.g., n_modes=20), make sure those modes capture the motions, or the fraction of variance, that you intend to analyze.
  • Preserve Atom Orders - When comparing structures, use matchAlign() so that atom selections map to matching indices.
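One way to check variance coverage is to sort the mode variances, accumulate them, and find how many leading modes reach a target fraction. For PCA, the variance of a mode is its covariance eigenvalue. For ANM/GNM, it is proportional to 1/eigenvalue. The fraction is relative to the modes passed in, so it reflects the total variance only when all nonzero modes are included. The sketch below uses a hypothetical `modes_needed` helper, and the 80% threshold is only an example.

```python
import numpy as np


def modes_needed(variances, fraction=0.8):
    """Smallest number of leading modes whose cumulative variance reaches `fraction`.

    Returns (n_modes, cumulative_fractions), with variances sorted in descending order.
    """
    v = np.sort(np.asarray(variances, dtype=float))[::-1]
    cumulative = np.cumsum(v) / v.sum()
    n = int(np.searchsorted(cumulative, fraction)) + 1
    # Guard against round-off pushing the index past the last mode
    return min(n, len(v)), cumulative


n, cumulative = modes_needed([5.0, 3.0, 1.0, 1.0], fraction=0.75)
print(n, cumulative)
```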

❌ DON'T

  • Run NMA on raw PDBs - PDB files often have missing loops or alternate locations (multiple occupancies). Clean the structure or select specific chains before analysis.
  • Ignore the "Zero Modes" - Rigid-body translations and rotations give an ANM six zero-frequency modes (a GNM has one). ProDy's calcModes() discards them by default (zeros=False), so anm[0] is already the slowest internal mode. If you keep them with calcModes(zeros=True), or diagonalize the Hessian yourself, real biological motion starts at mode index 6.
  • Calculate Hessian for All-Atom large proteins - An all-atom ENM builds a 3N×3N Hessian for N atoms. Assuming roughly 10 atoms per residue, a 1000-residue protein has about 10,000 atoms. That gives a 30,000×30,000 matrix, which needs about 7.2 GB in dense double precision.
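The sketch below shows where the six zero modes come from. It builds a dense ANM Hessian with numpy, using a uniform spring constant and the standard super-element form, then counts near-zero eigenvalues. `anm_hessian`, `count_zero_modes`, and the relative tolerance are illustrative assumptions, not ProDy's implementation. For a single connected network, the count is 6: three rigid-body translations plus three rotations.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Dense 3N x 3N ANM Hessian for coordinates of shape (N, 3)."""
    n = len(coords)
    diff = coords[None, :, :] - coords[:, None, :]      # diff[i, j] = r_j - r_i
    dist2 = (diff ** 2).sum(axis=-1)
    contact = (dist2 <= cutoff ** 2) & ~np.eye(n, dtype=bool)

    blocks = np.zeros((n, n, 3, 3))
    i, j = np.nonzero(contact)
    d = diff[i, j]
    # Off-diagonal super-elements: -gamma * d d^T / |d|^2 for each contact
    blocks[i, j] = -gamma * d[:, :, None] * d[:, None, :] / dist2[i, j][:, None, None]
    # Diagonal super-elements balance each row: H_ii = -sum_{j != i} H_ij
    blocks[np.arange(n), np.arange(n)] = -blocks.sum(axis=1)
    # (i, j, a, b) -> (3i + a, 3j + b)
    return blocks.transpose(0, 2, 1, 3).reshape(3 * n, 3 * n)


def count_zero_modes(hessian, rel_tol=1e-8):
    """Number of eigenvalues that are zero relative to the largest one."""
    eigvals = np.linalg.eigvalsh(hessian)
    return int((eigvals < rel_tol * eigvals.max()).sum())
```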

Anti-Patterns (NEVER)

```python
from prody import *

# ❌ BAD: Iterating over atoms to find distances
# for a1 in atoms:
#     for a2 in atoms:
#         ...  # O(N^2) Python loop

# ✅ GOOD: Use selection algebra
nearby = atoms.select('within 5 of resname LIG')

# ❌ BAD: PCA on unaligned frames
pca = PCA('test')
pca.buildCovariance(coord_array)  # Wrong!

# ✅ GOOD: Create an Ensemble and iterpose
ens = Ensemble('test')
ens.setCoords(atoms)
ens.addCoordset(trajectory)
ens.iterpose()  # Crucial step
pca = PCA('test')
pca.buildCovariance(ens)

# ❌ BAD: Using rigid-body modes for biology. This only happens when zero modes
# are kept; with the default calcModes(zeros=False), anm[0] is an internal mode.
anm.calcModes(zeros=True)
slow_mode = anm[0]  # This is just a translation/rotation
```
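A selection such as 'within 5 of resname LIG' is a distance test against every ligand atom. The numpy sketch below contrasts the anti-pattern loop with a broadcast version. Both helper names are hypothetical. The broadcast version builds an N × M distance array, so for very large systems a spatial index such as a KD-tree scales better.

```python
import numpy as np


def within_loop(coords, ref_coords, cutoff):
    """Anti-pattern: O(N*M) Python loop over atom pairs."""
    hits = []
    for i, a in enumerate(coords):
        for b in ref_coords:
            if np.sum((a - b) ** 2) <= cutoff ** 2:
                hits.append(i)
                break
    return np.array(hits, dtype=int)


def within_vectorized(coords, ref_coords, cutoff):
    """Indices of atoms in coords within `cutoff` of any atom in ref_coords."""
    diff = coords[:, None, :] - ref_coords[None, :, :]   # shape (N, M, 3)
    dist2 = (diff ** 2).sum(axis=-1)                      # shape (N, M)
    return np.flatnonzero((dist2 <= cutoff ** 2).any(axis=1))
```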

Atom Selection & Manipulation

Powerful Queries

```python
atoms = parsePDB('3hhr')

# Chain and residue range
heavy_chain = atoms.select('chain H and resnum 1 to 120')

# Chemical properties
backbone = atoms.select('backbone')
hydrophobic = atoms.select('resname ALA VAL ILE LEU MET PHE TYR TRP')

# Proximity (binding site)
site = atoms.select('protein and within 10 of resname ATP')

# Geometric center (select() returns None when nothing matches)
if site is not None:
    center = calcCenter(site)
```

Elastic Network Models (ENM)

GNM (Fluctuations)

```python
gnm = GNM('1p38_gnm')
gnm.buildKirchhoff(calphas, cutoff=10.0)
gnm.calcModes()

# Cross-correlations (how atoms move together)
cross_corr = calcCrossCorr(gnm)

# Square fluctuations (proportional to theoretical B-factors)
sq_flucts = calcSqFlucts(gnm)
```
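For intuition, the quantities returned by calcSqFlucts and calcCrossCorr can be derived directly from the Kirchhoff matrix. The numpy sketch below assumes a uniform spring constant and a single cutoff; `gnm_analysis` is an illustrative helper. The covariance here is the pseudo-inverse built from all nonzero modes. Results computed from only a subset of modes (for example the 20 that calcModes() computes by default) will differ accordingly.

```python
import numpy as np


def gnm_analysis(coords, cutoff=10.0, gamma=1.0, rel_tol=1e-8):
    """Kirchhoff matrix, square fluctuations, and cross-correlations for (N, 3) coords."""
    n = len(coords)
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    contact = (dist2 <= cutoff ** 2) & ~np.eye(n, dtype=bool)
    # Off-diagonal: -gamma per contact; diagonal: gamma times the number of contacts
    kirchhoff = -gamma * contact.astype(float)
    kirchhoff[np.diag_indices(n)] = -kirchhoff.sum(axis=1)

    eigvals, eigvecs = np.linalg.eigh(kirchhoff)
    nonzero = eigvals > rel_tol * eigvals.max()
    vals, vecs = eigvals[nonzero], eigvecs[:, nonzero]

    # Pseudo-inverse of the Kirchhoff matrix from the nonzero modes
    covariance = (vecs / vals) @ vecs.T
    sq_flucts = np.diag(covariance).copy()
    cross_corr = covariance / np.sqrt(np.outer(sq_flucts, sq_flucts))
    n_zero = int(np.count_nonzero(~nonzero))
    return kirchhoff, sq_flucts, cross_corr, n_zero
```

For a simple linear chain, the ends fluctuate more than the middle, which is the qualitative pattern expected for loose termini.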

ANM (Directions of motion)

```python
anm = ANM('1p38_anm')
anm.buildHessian(calphas, cutoff=15.0)
anm.calcModes()

# Getting the hinge regions (where motion changes direction)
hinges = findHinges(anm[0])  # From the slowest mode
```
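In GNM, hinge residues are commonly identified as crossover points, where the slowest mode's fluctuation profile changes sign and the two sides move in opposite directions. The sketch below applies that criterion to a 1-D GNM mode. It illustrates the general idea only; findHinges may be implemented differently, and the criterion does not apply directly to 3-D ANM mode vectors. `hinge_sites` is a hypothetical helper.

```python
import numpy as np


def hinge_sites(mode):
    """Indices i where a 1-D mode changes sign between residue i and i + 1.

    Exact zeros are not treated as crossings in this sketch.
    """
    signs = np.sign(np.asarray(mode, dtype=float))
    return np.flatnonzero(signs[:-1] * signs[1:] < 0)


# Usage: slowest nonzero mode of a 10-residue linear-chain Kirchhoff matrix
n = 10
kirchhoff = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
kirchhoff[0, 0] = kirchhoff[-1, -1] = 1
eigvals, eigvecs = np.linalg.eigh(kirchhoff)
slowest = eigvecs[:, 1]  # index 0 is the zero mode
print(hinge_sites(slowest))
```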

Ensemble Analysis and PCA

Structural Comparison

```python
from prody import *

# Parse multiple structures
pdb_ids = ['1p38', '1zz2', '1ywr']
structures = [parsePDB(pid) for pid in pdb_ids]

# Align and match: buildPDBEnsemble maps every structure onto the first one,
# tolerating missing residues, so all members share one atom set;
# iterpose() then superposes them
ensemble = buildPDBEnsemble(structures)
ensemble.iterpose()

# PCA
pca = PCA('p38_pca')
pca.buildCovariance(ensemble)
pca.calcModes()

# Project structures onto the first two PCs
projection = calcProjection(ensemble, pca[:2])
```
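PCA of an aligned ensemble diagonalizes the covariance of the flattened coordinates. A projection is then the dot product of each frame's deviation from the mean with the chosen modes. The numpy sketch below assumes the coordsets are already superposed. `pca_modes` and `project` are illustrative names, and any extra scaling that ProDy's projection functions apply is not reproduced here.

```python
import numpy as np


def pca_modes(coordsets):
    """Eigenvalues (descending), modes as columns, and mean of (n_frames, n_atoms, 3)."""
    n_frames = coordsets.shape[0]
    x = coordsets.reshape(n_frames, -1)
    mean = x.mean(axis=0)
    dev = x - mean
    cov = dev.T @ dev / n_frames
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], mean


def project(coordsets, modes, mean):
    """Project each frame's deviation from the mean onto the given modes."""
    x = coordsets.reshape(coordsets.shape[0], -1)
    return (x - mean) @ modes
```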

Evolutionary Analysis (Evol)

Sequence Conservation and Co-evolution

```python
from prody import *

# Load multiple sequence alignment (MSA)
msa = parseMSA('p38_alignment.fasta')

# Calculate conservation (Shannon entropy, one value per alignment column)
entropy = calcShannonEntropy(msa)

# Mutual information (co-evolution), a matrix over alignment columns
mi = buildMutinfoMatrix(msa)

# Direct coupling analysis (DCA), as a direct information matrix
dca = buildDirectInfoMatrix(msa)
```
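Both measures are simple to state. The Shannon entropy of a column is H = -Σ p log p over residue frequencies, and the mutual information between columns i and j is H_i + H_j - H_ij. The sketch below assumes an MSA given as equal-length strings and uses natural logarithms. It treats every character, gaps included, as its own symbol, whereas ProDy's functions may handle gaps and ambiguous residues differently. `shannon_entropy` and `mutual_information` are illustrative names.

```python
from collections import Counter

import numpy as np


def column_entropy(column):
    """Shannon entropy (natural log) of a sequence of symbols."""
    counts = np.array(list(Counter(column).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def shannon_entropy(msa):
    """Per-column entropy for an MSA given as equal-length strings."""
    return np.array([column_entropy(col) for col in zip(*msa)])


def mutual_information(msa):
    """Symmetric matrix of MI(i, j) = H_i + H_j - H_ij between columns."""
    cols = list(zip(*msa))
    length = len(cols)
    mi = np.zeros((length, length))
    for i in range(length):
        for j in range(i + 1, length):
            h_ij = column_entropy(list(zip(cols[i], cols[j])))
            mi[i, j] = mi[j, i] = column_entropy(cols[i]) + column_entropy(cols[j]) - h_ij
    return mi
```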

Practical Workflows

1. Identifying Functional "Hinges"

```python
from prody import *

def get_protein_hinges(pdb_id):
    atoms = parsePDB(pdb_id)
    calphas = atoms.select('protein and calpha')

    anm = ANM(pdb_id)
    anm.buildHessian(calphas)
    anm.calcModes()

    # Hinge residues for the first two functional modes
    # (zero modes are dropped by default, so anm[0] is the slowest internal mode)
    hinges_m1 = findHinges(anm[0])
    hinges_m2 = findHinges(anm[1])

    return list(set(hinges_m1) | set(hinges_m2))
```

2. Comparing MD Trajectory to ANM Modes

```python
from prody import *

def compare_md_to_anm(md_traj, p_pdb):
    """md_traj: coordinates of shape (n_frames, n_calphas, 3) for the same
    C-alpha atoms, in the same order, as the selection below."""
    # 1. ANM from static structure
    atoms = parsePDB(p_pdb)
    calphas = atoms.select('calpha')
    anm = ANM('static')
    anm.buildHessian(calphas)
    anm.calcModes()

    # 2. PCA from MD, on the same C-alphas so mode vectors have equal length
    ens = Ensemble('md')
    ens.setCoords(calphas)
    ens.addCoordset(md_traj)
    ens.iterpose()
    pca = PCA('md_pca')
    pca.buildCovariance(ens)
    pca.calcModes()

    # 3. Overlap (inner product of normalized modes)
    overlap = calcOverlap(anm[0], pca[0])
    return overlap
```
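The overlap between two modes is the cosine of the angle between the mode vectors. The cumulative overlap measures how well a set of orthonormal modes spans a given vector. The numpy sketch below uses illustrative helper names. Both vectors must describe the same atoms in the same order, which is why the ANM and the PCA should be built on the same C-alpha selection.

```python
import numpy as np


def overlap(a, b):
    """Cosine of the angle between two mode vectors (signed)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cumulative_overlap(vector, modes):
    """Norm of the projection of a unit vector onto orthonormal modes (dim, k)."""
    v = vector / np.linalg.norm(vector)
    return float(np.sqrt(((modes.T @ v) ** 2).sum()))
```

A single-mode overlap close to 1 in absolute value means the two methods predict essentially the same motion; the sign only reflects the arbitrary orientation of eigenvectors.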

3. Druggability Analysis (TRAWLER/ProDy Integration)

```python
import numpy as np
from prody import *

# Note: Full druggability analysis usually involves 'hotspot' calculations
def binding_site_flexibility(atoms, lig_resname):
    calphas = atoms.select('calpha')
    # C-alpha atoms within 8 Å of the ligand define the site
    site = atoms.select(f'calpha and within 8 of resname {lig_resname}')
    if site is None:
        return np.array([])

    # Calculate GNM on all C-alphas to give the site its structural context
    gnm = GNM('site')
    gnm.buildKirchhoff(calphas)
    gnm.calcModes()

    # High fluctuations = likely flexible binding site
    flucts = calcSqFlucts(gnm)
    # flucts follows calphas order; site.getIndices() are atom-group indices
    in_site = np.isin(calphas.getIndices(), site.getIndices())
    return flucts[in_site]
```

Performance Optimization

Memory Management with Large Hessians

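For very large complexes (e.g., ribosomes and viral capsids), use sparse matrices or the Hierarchical Network Model (HNM) if available. Sparse storage helps because each node interacts only with neighbors inside the cutoff. The number of nonzero Hessian entries therefore grows roughly linearly with the number of nodes, while dense storage grows quadratically.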
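The arithmetic behind the dense-versus-sparse trade-off is straightforward. A dense 3N × 3N Hessian in double precision takes (3N)² × 8 bytes, but only the 3 × 3 blocks of contacting pairs, plus the diagonal blocks, are nonzero. The numpy sketch below uses illustrative helper names. Its sparse estimate counts stored values only and ignores index overhead. Computing all pairwise distances, as done here, is itself quadratic, so the sketch is meant for small illustrations only.

```python
import numpy as np


def dense_hessian_bytes(n_nodes, bytes_per_value=8):
    """Memory for a dense (3N x 3N) Hessian."""
    return (3 * n_nodes) ** 2 * bytes_per_value


def hessian_memory(coords, cutoff=15.0, bytes_per_value=8):
    """(dense_bytes, sparse_value_bytes) for an ENM Hessian of (N, 3) coords."""
    n = len(coords)
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    # Nonzero 3x3 blocks: every contact pair plus the N diagonal blocks
    nonzero_blocks = int((dist2 <= cutoff ** 2).sum())
    return dense_hessian_bytes(n, bytes_per_value), nonzero_blocks * 9 * bytes_per_value
```

For N = 10,000 nodes (a 30,000 × 30,000 Hessian), dense_hessian_bytes(10_000) evaluates to 7.2 × 10⁹ bytes.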

Parallel MSA Parsing

When dealing with massive alignments (100k+ sequences), use parseMSA with specific memory-efficient flags.

Common Pitfalls and Solutions

Atom Mapping Mismatch

When comparing two PDBs of the same protein, one of them might have missing residues.

```python
# ❌ Problem: ensemble.addCoordset(pdb2) fails due to different atom counts

# ✅ Solution: Use matchAlign. It superposes pdb2 onto pdb1 and returns a tuple
# (mobile_map, target_map, seqid, overlap) for the best-matching chains, or None.
match = matchAlign(pdb2, pdb1)
if match:
    mobile_map, target_map = match[0], match[1]
    ensemble = Ensemble('pdb1_vs_pdb2')
    ensemble.setCoords(target_map.getCoords())    # reference: matched atoms of pdb1
    ensemble.addCoordset(target_map.getCoords())
    ensemble.addCoordset(mobile_map.getCoords())  # same atom count by construction
```

Non-Standard Residues

ProDy might not recognize unusual ligands as 'hetero' or 'protein'.

```python
# ✅ Solution: Use specific residue names or 'all'
ligand = atoms.select('resname MYL')
```

Hessian Singularities

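If your protein has disconnected parts, the ANM Hessian has six zero eigenvalues per connected component, i.e. more than the usual six. Likewise, the GNM Kirchhoff matrix has one zero eigenvalue per component.

```python
import numpy as np

# ✅ Solution: Check connectivity (use the same cutoff as your model). The
# Kirchhoff matrix has one zero eigenvalue per connected component.
gnm = GNM('connectivity')
gnm.buildKirchhoff(atoms.select('protein and calpha'), cutoff=10.0)
n_components = int((np.linalg.eigvalsh(gnm.getKirchhoff()) < 1e-6).sum())
if n_components > 1:
    print(f"Warning: {n_components} disconnected components found!")
```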
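Counting connected components does not require an eigendecomposition. A breadth-first search over the contact graph gives the same number, and each component contributes one zero GNM mode and six zero ANM modes. The sketch below uses a hypothetical `count_components` helper and assumes the same distance cutoff as the network model.

```python
from collections import deque

import numpy as np


def count_components(coords, cutoff=10.0):
    """Number of connected components of the contact graph of (N, 3) coords."""
    n = len(coords)
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    adjacency = dist2 <= cutoff ** 2
    seen = np.zeros(n, dtype=bool)
    n_components = 0
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search from an unvisited node marks one whole component
        n_components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in np.flatnonzero(adjacency[node] & ~seen):
                seen[neighbor] = True
                queue.append(neighbor)
    return n_components
```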

Best Practices

  1. Always select C-alpha atoms for NMA on large proteins to reduce computational cost.
  2. Align all structures in an ensemble with iterpose() before performing PCA.
  3. Filter PDB structures early using selection algebra to reduce memory usage.
  4. Remember that rigid-body motions give an ANM six zero modes (a GNM has one). ProDy's calcModes() drops them by default, so anm[0] is the slowest internal mode; if you keep them with zeros=True, biological motion starts at mode 6.
  5. Use matchAlign() when comparing structures with different atom counts or missing residues.
  6. Check for disconnected components before building Hessian matrices.
  7. Use appropriate cutoff distances for C-alpha networks (ProDy's defaults are 10 Å for GNM and 15 Å for ANM).
  8. Validate the eigensolver output (expected number of zero modes, fraction of variance captured) to ensure meaningful results.
  9. Save modes in NMD format for visualization in VMD or PyMOL.
  10. Consider using sparse matrix representations for very large systems.
ProDy brings the "dynamic" into structural biology. By treating proteins as elastic networks, it links static structural snapshots to the collective motions that proteins undergo at the molecular scale.