ProDy - Protein Dynamics & Structural Biology
ProDy is designed to model the collective motions of proteins. It treats proteins as elastic networks, allowing researchers to predict functional movements and structural flexibility from a single PDB file or an ensemble of structures.
When to Use
- Predicting protein flexibility and collective motions (ANM/GNM).
- Performing Principal Component Analysis (PCA) on structural ensembles or MD trajectories.
- Analyzing structural conservation and co-evolution (Evol).
- Comparing multiple protein structures (Ensemble analysis).
- Identifying hinge regions and rigid domains in proteins.
- Docking preparation and binding site analysis (druggability).
- Filtering MD trajectories based on collective modes.
Reference Documentation
Official docs: http://prody.csb.pitt.edu/
Manual: http://prody.csb.pitt.edu/manual/
Search patterns: `prody.parsePDB`, `prody.ANM`, `prody.GNM`, `prody.select`, `prody.Ensemble`
Core Principles
Atom Selection Algebra
ProDy features a powerful selection language similar to those of VMD and PyMOL. You can select atoms by chain, residue, property, or proximity (e.g., `'protein and resname TRP and within 5 of resname HEM'`).
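The proximity clause is the least obvious part of the syntax. Below is a minimal NumPy sketch of what `within R of <group>` means: select every atom whose distance to at least one atom of the reference group is at most R. The helper `within_of` and the toy coordinates are hypothetical. ProDy evaluates such selections with its own optimized code, so this only illustrates the semantics.

```python
import numpy as np


def within_of(coords, ref_coords, cutoff):
    """Hypothetical helper (not a ProDy function): indices of atoms in `coords`
    lying within `cutoff` of any atom in `ref_coords`."""
    coords = np.asarray(coords, dtype=float)
    ref_coords = np.asarray(ref_coords, dtype=float)
    # Pairwise distances via broadcasting: shape (n_atoms, n_ref)
    diffs = coords[:, None, :] - ref_coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # An atom is selected if its closest reference atom is within the cutoff
    return np.flatnonzero(dists.min(axis=1) <= cutoff)


# Toy system: four atoms on the x-axis and a one-atom "ligand" at the origin
atoms_xyz = np.array([[1.0, 0, 0], [4.0, 0, 0], [6.0, 0, 0], [-3.0, 0, 0]])
ligand_xyz = np.array([[0.0, 0, 0]])
print(within_of(atoms_xyz, ligand_xyz, 5.0))  # [0 1 3]
```

Using broadcasting instead of nested Python loops is the same idea as the selection-algebra advice in the anti-patterns below. The whole distance matrix is computed in vectorized form.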
Elastic Network Models (ENM)
- GNM (Gaussian Network Model): Predicts magnitude of fluctuations (B-factors).
- ANM (Anisotropic Network Model): Predicts direction and magnitude of motion.
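The difference between the two models is easiest to see in the matrices they diagonalize. The NumPy sketch below follows the textbook definitions, not ProDy's code. GNM builds an N×N Kirchhoff (connectivity) matrix, so each node has a single scalar fluctuation. ANM builds a 3N×3N Hessian from 3×3 blocks, so each node gets a direction. Both use a uniform spring constant gamma = 1. The helper names and random coordinates are hypothetical. Counting near-zero eigenvalues shows one trivial mode for GNM and six for ANM (three translations, three rotations) in a connected, rigid network.

```python
import numpy as np


def build_kirchhoff(coords, cutoff, gamma=1.0):
    """Hypothetical helper: N x N GNM Kirchhoff matrix (-gamma per contact, diagonal = degree)."""
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contacts = (dists <= cutoff) & ~np.eye(n, dtype=bool)
    kirchhoff = -gamma * contacts.astype(float)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    return kirchhoff


def build_hessian(coords, cutoff, gamma=1.0):
    """Hypothetical helper: 3N x 3N ANM Hessian assembled from 3x3 super-elements."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            # Diagonal super-elements balance the off-diagonal ones
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    return hessian


def count_zero_modes(matrix, tol=1e-8):
    """Number of eigenvalues that are zero relative to the largest one."""
    eigvals = np.linalg.eigvalsh(matrix)
    return int(np.sum(np.abs(eigvals) < tol * np.abs(eigvals).max()))


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(10, 3))  # toy "C-alpha" positions in Å
kirchhoff = build_kirchhoff(coords, cutoff=30.0)  # cutoff large enough to link all pairs
hessian = build_hessian(coords, cutoff=30.0)
print(kirchhoff.shape, count_zero_modes(kirchhoff))  # (10, 10) 1
print(hessian.shape, count_zero_modes(hessian))      # (30, 30) 6
```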
Ensembles
A collection of structures (e.g., multiple NMR models or MD frames) stored in a way that allows for rapid statistical analysis and PCA.
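The frames of an ensemble have to be superposed before any statistics are computed. Below is a minimal NumPy sketch of that idea. It assumes equal-length, matched coordinate sets. Each frame is optimally rotated onto a target with the Kabsch algorithm; the target is then replaced by the mean structure and the process repeats. This is the purpose of ProDy's `iterpose()`. However, the function names and the fixed iteration count here are hypothetical and do not reproduce ProDy's implementation.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Hypothetical helper: `mobile` rotated and translated to minimize its RMSD to `target` (both (N, 3))."""
    mob_c = mobile - mobile.mean(axis=0)
    tar_c = target - target.mean(axis=0)
    # SVD of the 3x3 correlation matrix yields the optimal rotation
    u, _, vt = np.linalg.svd(mob_c.T @ tar_c)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = np.diag([1.0, 1.0, np.sign(np.linalg.det(u @ vt))])
    return mob_c @ (u @ d @ vt) + target.mean(axis=0)


def iterative_superpose(frames, n_iter=10):
    """Hypothetical helper: align all frames to the first one, then repeatedly re-align to the mean."""
    frames = np.asarray(frames, dtype=float)
    target = frames[0]
    for _ in range(n_iter):
        frames = np.array([kabsch_superpose(f, target) for f in frames])
        target = frames.mean(axis=0)
    return frames


def rmsd(a, b):
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


# Three randomly rotated and shifted copies of the same 20-atom structure
rng = np.random.default_rng(1)
reference = rng.normal(size=(20, 3))
frames = []
for _ in range(3):
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1  # make it a proper rotation
    frames.append(reference @ q + rng.normal(size=3))
aligned = iterative_superpose(frames)
# Only rigid-body differences existed, so this is at floating-point noise level
print(max(rmsd(f, aligned[0]) for f in aligned))
```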
Quick Reference
Installation
```bash
pip install prody
```
Standard Imports
```python
import numpy as np
from prody import *

# Optional: for plotting
confProDy(auto_show=False)
```
Basic Pattern - Normal Mode Analysis
```python
from prody import *

# 1. Parse structure
atoms = parsePDB('1p38')
calphas = atoms.select('protein and calpha')

# 2. Build and solve ANM
anm = ANM('p38_anm')
anm.buildHessian(calphas)
anm.calcModes(n_modes=20)  # zero-frequency rigid-body modes are excluded by default

# 3. Analyze results
for mode in anm[:3]:
    print(f"Mode {mode.getIndex()}: Variance = {mode.getVariance():.2f}")

# 4. Save for visualization (NMD format for VMD/PyMOL)
writeNMD('p38_modes.nmd', anm, calphas)
```
Critical Rules
✅ DO
- Select C-alphas for NMA - For large systems, ENMs (ANM/GNM) are most effective and computationally efficient when applied only to C-alpha atoms.
- Always Align Ensembles - Before performing PCA on a structural ensemble, make sure all frames are superposed, e.g. with `ensemble.iterpose()`.
- Use select() early - Filter your PDB object down to the chains/atoms you actually need, to save memory during Hessian matrix calculations.
- Check Variance Coverage - Make sure the computed modes capture most of the variance you care about (for example, the fractions reported by `calcFractVariance()`).
- Preserve Atom Orders - When comparing structures, make sure atom selections yield the same atoms in the same order, e.g. with `matchAlign()`.
❌ DON'T
- Run NMA on raw PDBs - PDBs often have missing loops or multiple occupancies. Clean or select specific chains before analysis.
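- Ignore the "Zero Modes" - An ANM of a connected structure has 6 zero-frequency modes (rigid-body translations/rotations) that carry no internal motion. ProDy's `calcModes()` discards them by default (`zeros=False`), so `anm[0]` is already the slowest internal mode. If you keep them with `zeros=True`, real biological motion starts at mode index 6.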
- Calculate Hessian for All-Atom large proteins - An all-atom ENM builds a 3N×3N Hessian for N atoms. A 1000-residue protein has on the order of 10,000 atoms, which gives a roughly 30,000×30,000 matrix needing about 7 GB as dense double-precision floats.
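The memory figure follows from simple arithmetic: a dense 3N×3N matrix of 8-byte double-precision values needs (3N)² × 8 bytes. The small helper below is hypothetical and meant only for back-of-the-envelope estimates. It ignores the extra workspace an eigensolver needs on top of the matrix itself.

```python
def dense_hessian_gb(n_atoms, bytes_per_value=8):
    """Hypothetical helper: memory in GB (10**9 bytes) for a dense 3N x 3N float64 Hessian."""
    dim = 3 * n_atoms
    return dim * dim * bytes_per_value / 1e9


# C-alpha model of a 1000-residue protein: one node per residue
print(dense_hessian_gb(1_000))   # 0.072
# All-atom model with ~10,000 atoms, i.e. a 30,000 x 30,000 matrix
print(dense_hessian_gb(10_000))  # 7.2
```

The quadratic scaling is why coarse-graining to C-alphas cuts memory by roughly a factor of 100 for this example.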
Anti-Patterns (NEVER)
```python
from prody import *

# ❌ BAD: Iterating over atoms to find distances
for a1 in atoms:
    for a2 in atoms: ...  # O(N^2) Python loop

# ✅ GOOD: Use selection algebra
nearby = atoms.select('within 5 of resname LIG')

# ❌ BAD: PCA on unaligned frames
pca = PCA('test'); pca.buildCovariance(coord_array)  # Wrong!

# ✅ GOOD: Create an Ensemble and iterpose
ens = Ensemble(atoms)
ens.addCoordset(trajectory)
ens.iterpose()  # Crucial step
pca = PCA('test')
pca.buildCovariance(ens)

# ❌ BAD: Keeping the zero modes and reading them as biology
anm.calcModes(zeros=True)
slow_mode = anm[0]  # With zeros=True, this is just a translation/rotation
```
Atom Selection & Manipulation
Powerful Queries
```python
from prody import *

atoms = parsePDB('3hhr')

# Chain and residue range
heavy_chain = atoms.select('chain H and resnum 1 to 120')

# Chemical properties
backbone = atoms.select('backbone')
hydrophobic = atoms.select('resname ALA VAL ILE LEU MET PHE TYR TRP')

# Proximity (binding site)
site = atoms.select('protein and within 10 of resname ATP')

# Geometric center (select() returns None when nothing matches, so check first)
if site is not None:
    center = calcCenter(site)
```
Elastic Network Models (ENM)
GNM (Fluctuations)
```python
from prody import *

calphas = parsePDB('1p38').select('protein and calpha')
gnm = GNM('1p38_gnm')
gnm.buildKirchhoff(calphas, cutoff=10.0)
gnm.calcModes()

# Cross-correlations (how residues move together)
cross_corr = calcCrossCorr(gnm)

# Square fluctuations (proportional to theoretical B-factors)
sq_flucts = calcSqFlucts(gnm)
```
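Both quantities come from the inverse of the Kirchhoff matrix taken over its non-zero modes. The diagonal is proportional to each residue's mean-square fluctuation. Normalizing the off-diagonal elements by the diagonal gives correlations between -1 and 1. The NumPy sketch below computes both directly for a toy bead chain. It is a textbook illustration under stated assumptions: unit spring constant, arbitrary units, a hypothetical helper name, and all non-zero modes. ProDy's functions work from the modes you actually computed (20 by default), so their values will differ somewhat.

```python
import numpy as np


def gnm_fluctuations(coords, cutoff=7.3, gamma=1.0):
    """Hypothetical helper: (square fluctuations, normalized cross-correlations) over all non-zero GNM modes."""
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contacts = (dists <= cutoff) & ~np.eye(n, dtype=bool)
    kirchhoff = -gamma * contacts.astype(float)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)
    keep = eigvals > 1e-8 * eigvals.max()  # drop the zero mode(s)
    # Covariance (up to a constant factor) = sum_k v_k v_k^T / lambda_k
    cov = (eigvecs[:, keep] / eigvals[keep]) @ eigvecs[:, keep].T
    sq_flucts = np.diag(cov).copy()
    cross_corr = cov / np.sqrt(np.outer(sq_flucts, sq_flucts))
    return sq_flucts, cross_corr


# Toy straight chain: 12 beads 3.8 Å apart; the 7.3 Å cutoff (chosen only for this toy) links sequence neighbors only
chain = np.column_stack([np.arange(12) * 3.8, np.zeros(12), np.zeros(12)])
sq, corr = gnm_fluctuations(chain)
print(np.round(sq, 3))           # largest at the two chain ends
print(np.round(corr[0, -1], 3))  # negative: the two ends move in opposite directions
```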
ANM (Directions of motion)
```python
from prody import *

calphas = parsePDB('1p38').select('protein and calpha')
anm = ANM('1p38_anm')
anm.buildHessian(calphas, cutoff=15.0)
anm.calcModes()

# Getting the hinge regions (where motion changes direction)
hinges = findHinges(anm[0])  # From the slowest non-zero mode
```
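In the GNM picture, hinge sites are commonly taken to be where the components of a slow mode change sign, i.e. where two parts of the protein move in opposite directions along that mode. The NumPy sketch below implements only that crossover criterion. The helper names are hypothetical, it is not ProDy's hinge routine, and real use would also need a mapping back to residue numbers.

```python
import numpy as np


def sign_change_hinges(mode):
    """Hypothetical helper: indices i where mode[i] and mode[i + 1] have opposite signs."""
    signs = np.sign(np.asarray(mode, dtype=float))
    return np.flatnonzero(signs[:-1] * signs[1:] < 0)


def slowest_gnm_mode(coords, cutoff=7.3):
    """Hypothetical helper: eigenvector of the smallest non-zero Kirchhoff eigenvalue."""
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    kirchhoff = -1.0 * ((dists <= cutoff) & ~np.eye(n, dtype=bool))
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    _, eigvecs = np.linalg.eigh(kirchhoff)
    return eigvecs[:, 1]  # column 0 is the zero mode of a connected network


# The slowest mode splits a straight 20-bead chain into two halves with opposite signs
chain = np.column_stack([np.arange(20) * 3.8, np.zeros(20), np.zeros(20)])
print(sign_change_hinges(slowest_gnm_mode(chain)))  # [9]
```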
Ensemble Analysis and PCA
Structural Comparison
```python
from prody import *

# Parse multiple structures
pdb_ids = ['1p38', '1zz2', '1ywr']
structures = [parsePDB(pid) for pid in pdb_ids]

# Align and match: map the C-alphas of each structure onto chain A of the reference
ref_chain = structures[0].select('calpha').getHierView().getChain('A')
ensemble = PDBEnsemble('p38_set')
ensemble.setAtoms(ref_chain)
ensemble.setCoords(ref_chain)
for s in structures:
    # mapOntoChain returns a list of (AtomMap, chain, seqid, overlap) tuples
    atommap = mapOntoChain(s, ref_chain)[0][0]
    # Residues missing from s are weighted out via the 'mapped' flag
    ensemble.addCoordset(atommap, weights=atommap.getFlags('mapped'))
ensemble.iterpose()

# PCA
pca = PCA('p38_pca')
pca.buildCovariance(ensemble)
pca.calcModes()

# Project structures onto PCs
projection = calcProjection(ensemble, pca[:2])
```
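At its core, PCA of an aligned ensemble is an eigendecomposition of the covariance matrix of the flattened (3N) coordinates. The projection of each structure is the dot product of its deviation from the mean with each principal component. Below is a minimal NumPy sketch. It assumes the frames are already superposed and share the same atoms in the same order. The function names are hypothetical, not ProDy's API, and dividing by the number of frames is just one normalization choice; others only rescale the variances.

```python
import numpy as np


def pca_modes(frames):
    """Hypothetical helper. frames: (n_frames, n_atoms, 3), already superposed.
    Returns (variances, components, mean), largest variance first."""
    x = np.asarray(frames, dtype=float).reshape(len(frames), -1)  # (n_frames, 3N)
    mean = x.mean(axis=0)
    dev = x - mean
    cov = dev.T @ dev / len(frames)  # 3N x 3N covariance
    variances, components = np.linalg.eigh(cov)
    order = np.argsort(variances)[::-1]
    return variances[order], components[:, order], mean


def project(frames, components, mean, n_pcs=2):
    """Hypothetical helper: coordinates of each frame along the first n_pcs components."""
    x = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    return (x - mean) @ components[:, :n_pcs]


# Toy ensemble: 5 atoms, only atom 0 slides along x across 11 frames
rng = np.random.default_rng(2)
base = rng.normal(size=(5, 3))
shifts = np.linspace(-1.0, 1.0, 11)
frames = np.array([base + np.outer(np.eye(5)[0], [s, 0.0, 0.0]) for s in shifts])
variances, components, mean = pca_modes(frames)
proj = project(frames, components, mean)
print(np.round(variances[:3], 3))  # one dominant component, the rest ~0
```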
Evolutionary Analysis (Evol)
Sequence Conservation and Co-evolution
```python
from prody import *

# Load Multiple Sequence Alignment (MSA)
msa = parseMSA('p38_alignment.fasta')

# Calculate conservation (Shannon entropy per column)
entropy = calcShannonEntropy(msa)

# Mutual information (co-evolution)
mi = buildMutinfoMatrix(msa)

# Direct coupling analysis (direct information matrix)
di = buildDirectInfoMatrix(msa)
```
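Column conservation is measured by the Shannon entropy of the residue distribution in each MSA column. A fully conserved column scores 0, and more variable columns score higher. Below is a minimal pure-NumPy sketch on a toy alignment. It uses the natural logarithm and simply drops gaps; both are assumptions of this sketch, and ProDy's `calcShannonEntropy` may treat gaps and normalization differently. The helper name is hypothetical.

```python
from collections import Counter

import numpy as np


def column_entropy(msa_rows, gap='-'):
    """Hypothetical helper: Shannon entropy (natural log) of each column of equal-length aligned strings."""
    entropies = []
    for column in zip(*msa_rows):
        counts = Counter(ch for ch in column if ch != gap)
        if not counts:
            entropies.append(0.0)  # all-gap column: treated as zero in this sketch
            continue
        freqs = np.array(list(counts.values()), dtype=float)
        freqs /= freqs.sum()
        # sum of p * log(1/p); every term is >= 0
        entropies.append(float((freqs * np.log(1.0 / freqs)).sum()))
    return np.array(entropies)


toy_msa = ["ACDK", "ACEK", "ACDR", "ACER"]
print(np.round(column_entropy(toy_msa), 3))  # columns 0-1 conserved, columns 2-3 split 50/50
```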
Practical Workflows
1. Identifying Functional "Hinges"
```python
from prody import *

def get_protein_hinges(pdb_id):
    atoms = parsePDB(pdb_id)
    calphas = atoms.select('protein and calpha')
    anm = ANM(pdb_id)
    anm.buildHessian(calphas)
    anm.calcModes()
    # Hinge residues for the two slowest non-zero (functional) modes
    hinges_m1 = findHinges(anm[0])
    hinges_m2 = findHinges(anm[1])
    return sorted(set(hinges_m1) | set(hinges_m2))
```
2. Comparing MD Trajectory to ANM Modes
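```python
from prody import *

def compare_md_to_anm(md_traj, p_pdb):
    """md_traj: C-alpha coordinates, shape (n_frames, n_calpha, 3), in the same order as the PDB selection."""
    # 1. ANM from the static structure
    calphas = parsePDB(p_pdb).select('calpha')
    anm = ANM('static')
    anm.buildHessian(calphas)
    anm.calcModes()
    # 2. PCA from MD, on the same C-alphas so both mode vectors have length 3N
    ens = Ensemble('md')
    ens.setCoords(calphas.getCoords())
    ens.addCoordset(md_traj)
    ens.iterpose()
    pca = PCA('md_pca')
    pca.buildCovariance(ens)
    pca.calcModes()
    # 3. Overlap (normalized inner product of the two slowest modes)
    overlap = calcOverlap(anm[0], pca[0])
    return overlap
```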
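The overlap between an ANM mode and a PCA component is the cosine of the angle between two 3N-dimensional vectors. That is why both models must be built on the same atoms in the same order. Eigenvector signs are arbitrary, so only the magnitude is meaningful. Below is a minimal NumPy sketch of that definition. The function name is hypothetical; ProDy's `calcOverlap` is the library routine to use in practice.

```python
import numpy as np


def mode_overlap(mode_a, mode_b):
    """Hypothetical helper: cosine between two equal-length (3N) mode vectors."""
    a = np.ravel(mode_a).astype(float)
    b = np.ravel(mode_b).astype(float)
    if a.shape != b.shape:
        raise ValueError("modes must describe the same atoms (equal 3N length)")
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


v = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])    # two atoms moving together along x
w = np.array([-2.0, 0.0, 0.0, -2.0, 0.0, 0.0])  # same motion, opposite sign and scale
u = np.array([0.0, 1.0, 0.0, 0.0, -1.0, 0.0])   # a different, orthogonal motion
print(abs(mode_overlap(v, w)), abs(mode_overlap(v, u)))
```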
3. Druggability Analysis (TRAWLER/ProDy Integration)
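```python
import numpy as np
from prody import *

# Note: full druggability analysis usually involves 'hotspot' calculations
def binding_site_flexibility(atoms, lig_resname):
    calphas = atoms.select('protein and calpha')
    # C-alphas of residues that have any atom within 8 Å of the ligand
    site_ca = atoms.select('protein and calpha and '
                           f'(same residue as within 8 of resname {lig_resname})')
    # The GNM covers the whole protein; the site is only read out afterwards
    gnm = GNM('site')
    gnm.buildKirchhoff(calphas)
    gnm.calcModes()
    # High fluctuations = likely flexible binding site
    flucts = calcSqFlucts(gnm)  # one value per C-alpha, in calphas order
    # Map the site C-alphas onto their positions within the calphas selection
    mask = np.isin(calphas.getIndices(), site_ca.getIndices())
    return flucts[mask]
```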
Performance Optimization
Memory Management with Large Hessians
For very large complexes (Ribosomes, Capsids), use sparse matrices or the Hierarchical Network Model (HNM) if available.
Parallel MSA Parsing
When dealing with massive alignments (100k+ sequences), use `parseMSA` with its memory-efficient options.
Common Pitfalls and Solutions
Atom Mapping Mismatch
When comparing two PDBs of the same protein, one might have missing residues.
```python
from prody import *

# ❌ Problem: ensemble.addCoordset(pdb2) fails due to different atom counts
# ✅ Solution: use matchAlign; it superposes pdb2 onto pdb1 and returns
# (mobile, mobile_map, target_map, seqid, overlap), or None if no chains match
result = matchAlign(pdb2, pdb1)
if result is not None:
    mobile, mobile_map, target_map, seqid, overlap = result
    ensemble = Ensemble('matched')
    ensemble.setCoords(target_map.getCoords())
    ensemble.addCoordset(mobile_map.getCoords())
```
Non-Standard Residues
ProDy might not recognize unusual ligands as 'hetero' or 'protein'.
```python
from prody import *

# ✅ Solution: Use specific residue names or 'all'
ligand = atoms.select('resname MYL')
```
Hessian Singularities
If your protein has disconnected parts, the Hessian will have more than 6 zero eigenvalues.
```python
from scipy.sparse.csgraph import connected_components
# ✅ Solution: count components of the C-alpha contact network (same cutoff as the ANM)
gnm = GNM('check'); gnm.buildKirchhoff(atoms.select('protein and calpha'), cutoff=15.0)
n_comp, _ = connected_components(gnm.getKirchhoff() < 0, directed=False)
if n_comp > 1:
    print(f"Warning: {n_comp} disconnected components found!")
```
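The idea behind such a connectivity check can be shown without ProDy. Treat residues as nodes, join any two that lie within the network cutoff, and count connected components with a breadth-first search. Each rigid component contributes its own six rigid-body zero modes to an ANM Hessian, so k components give 6k zero eigenvalues. The helper name and toy coordinates below are hypothetical; this is a sketch, not a ProDy function.

```python
from collections import deque

import numpy as np


def count_components(coords, cutoff):
    """Hypothetical helper: number of connected components of the contact graph (edges within `cutoff`)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adjacency = dists <= cutoff
    seen = np.zeros(n, dtype=bool)
    n_components = 0
    for start in range(n):
        if seen[start]:
            continue
        n_components += 1
        seen[start] = True
        queue = deque([start])
        while queue:  # breadth-first search over contacts
            node = queue.popleft()
            for nb in np.flatnonzero(adjacency[node] & ~seen):
                seen[nb] = True
                queue.append(nb)
    return n_components


# Two compact clusters of 5 "residues" separated by 100 Å
rng = np.random.default_rng(3)
cluster = rng.uniform(0.0, 5.0, size=(5, 3))
two_domains = np.vstack([cluster, cluster + [100.0, 0.0, 0.0]])
print(count_components(two_domains, cutoff=15.0))  # 2
```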
Best Practices
- Always select C-alpha atoms for NMA on large proteins to reduce computational cost.
- Align all structures in an ensemble with `iterpose()` before performing PCA.
- Filter PDB structures early using selection algebra to reduce memory usage.
- Remember that ANM modes 0-5 are rigid-body motions only if you keep them (`calcModes(zeros=True)`); by default ProDy drops them, so `anm[0]` is already the slowest internal mode.
- Use `matchAlign()` when comparing structures with different atom counts or missing residues.
- Check for disconnected components before building Hessian matrices.
- Use appropriate cutoff distances (typically 10-15 Å for C-alpha networks).
- Validate eigensolver convergence to ensure meaningful results.
- Save modes in NMD format for visualization in VMD or PyMOL.
- Consider using sparse matrix representations for very large systems.
ProDy is the essential toolkit for the "Dynamic" in Structural Biology. By treating proteins as physical networks, it provides a bridge between static snapshots and the vibrating reality of life at the molecular scale.