mdanalysis

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

MDAnalysis - Molecular Dynamics Analysis

MDAnalysis - 分子动力学分析

Python library for reading, writing, and analyzing molecular dynamics trajectories and structural files.
用于读取、写入和分析分子动力学轨迹与结构文件的Python库。

When to Use

适用场景

  • Loading MD trajectories (DCD, XTC, TRR, NetCDF, etc.)
  • RMSD and RMSF calculations
  • Distance, angle, and dihedral analysis
  • Atom selections (VMD-like syntax)
  • Hydrogen bond analysis
  • Solvent Accessible Surface Area (SASA)
  • Protein secondary structure analysis
  • Membrane system analysis
  • Water/ion distribution analysis
  • Trajectory alignment and fitting
  • Custom trajectory analysis
  • Converting between file formats
  • 加载MD轨迹(DCD、XTC、TRR、NetCDF等格式)
  • RMSD与RMSF计算
  • 距离、角度与二面角分析
  • 原子选择(类VMD语法)
  • 氢键分析
  • 溶剂可及表面积(SASA)计算
  • 蛋白质二级结构分析
  • 膜系统分析
  • 水/离子分布分析
  • 轨迹对齐与拟合
  • 自定义轨迹分析
  • 文件格式转换

Reference Documentation

参考文档

Official docs: https://www.mdanalysis.org/docs/
Search patterns:
MDAnalysis.Universe
,
MDAnalysis.analysis.rms
,
MDAnalysis.analysis.distances
官方文档https://www.mdanalysis.org/docs/
搜索关键词
MDAnalysis.Universe
MDAnalysis.analysis.rms
MDAnalysis.analysis.distances

Core Principles

核心原则

Use MDAnalysis For

使用MDAnalysis的场景

TaskModuleExample
Load trajectory
Universe
Universe(topology, trajectory)
RMSD calculation
analysis.rms
RMSD(mobile, ref)
Atom selection
select_atoms
u.select_atoms('protein')
Distance analysis
analysis.distances
distance_array(pos1, pos2)
H-bond analysis
analysis.hbonds
HydrogenBondAnalysis()
SASA calculation
analysis.sasa
SASAnalysis()
Contacts analysis
analysis.contacts
Contacts()
Trajectory writing
Writer
with Writer() as W
任务模块示例
加载轨迹
Universe
Universe(topology, trajectory)
计算RMSD
analysis.rms
RMSD(mobile, ref)
原子选择
select_atoms
u.select_atoms('protein')
距离分析
analysis.distances
distance_array(pos1, pos2)
氢键分析
analysis.hbonds
HydrogenBondAnalysis()
SASA计算
analysis.sasa
SASAnalysis()
接触分析
analysis.contacts
Contacts()
轨迹写入
Writer
with Writer() as W

Do NOT Use For

不适合使用MDAnalysis的场景

  • Running MD simulations (use GROMACS, AMBER, NAMD)
  • Force field calculations (use OpenMM, MDTraj)
  • Quantum chemistry (use PySCF, Qiskit)
  • Protein structure prediction (use AlphaFold, RosettaFold)
  • Initial structure building (use Biopython, PyMOL)
  • 运行MD模拟(使用GROMACS、AMBER、NAMD)
  • 力场计算(使用OpenMM、MDTraj)
  • 量子化学计算(使用PySCF、Qiskit)
  • 蛋白质结构预测(使用AlphaFold、RosettaFold)
  • 初始结构构建(使用Biopython、PyMOL)

Quick Reference

快速参考

Installation

安装

bash
undefined
bash
undefined

pip

pip安装

pip install MDAnalysis
pip install MDAnalysis

With additional analysis modules

安装包含额外分析模块的版本

pip install MDAnalysis[analysis]
pip install MDAnalysis[analysis]

conda

conda安装

conda install -c conda-forge mdanalysis
conda install -c conda-forge mdanalysis

Development version

安装开发版本

Standard Imports

标准导入

python
undefined
python
undefined

Core imports

核心模块导入

import MDAnalysis as mda from MDAnalysis import Universe from MDAnalysis.analysis import rms, align, distances
import MDAnalysis as mda from MDAnalysis import Universe from MDAnalysis.analysis import rms, align, distances

Common analysis modules

常用分析模块

from MDAnalysis.analysis.rms import RMSD, RMSF from MDAnalysis.analysis.distances import distance_array from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis from MDAnalysis.analysis.dihedrals import Dihedral
from MDAnalysis.analysis.rms import RMSD, RMSF from MDAnalysis.analysis.distances import distance_array from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis from MDAnalysis.analysis.dihedrals import Dihedral

Utilities

工具模块

import numpy as np import matplotlib.pyplot as plt
undefined
import numpy as np import matplotlib.pyplot as plt
undefined

Basic Pattern - Load and Analyze

基础模式 - 加载与分析

python
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSD
python
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSD

Load trajectory

加载轨迹

u = mda.Universe('topology.pdb', 'trajectory.dcd')
u = mda.Universe('topology.pdb', 'trajectory.dcd')

Select atoms

选择原子

protein = u.select_atoms('protein') ca_atoms = u.select_atoms('protein and name CA')
protein = u.select_atoms('protein') ca_atoms = u.select_atoms('protein and name CA')

Calculate RMSD

计算RMSD

rmsd_analysis = RMSD(protein, protein, select='backbone') rmsd_analysis.run()
rmsd_analysis = RMSD(protein, protein, select='backbone') rmsd_analysis.run()

Access results

获取结果

rmsd = rmsd_analysis.results.rmsd print(f"RMSD over time: {rmsd[:, 2]}") # Column 2 is RMSD
undefined
rmsd = rmsd_analysis.results.rmsd print(f"随时间变化的RMSD: {rmsd[:, 2]}") # 第2列为RMSD数值
undefined

Basic Pattern - Atom Selection

基础模式 - 原子选择

python
import MDAnalysis as mda

u = mda.Universe('structure.pdb')
python
import MDAnalysis as mda

u = mda.Universe('structure.pdb')

Various selections (VMD-like syntax)

多种选择方式(类VMD语法)

protein = u.select_atoms('protein') backbone = u.select_atoms('backbone') ca = u.select_atoms('name CA') resid_10 = u.select_atoms('resid 10') within_5A = u.select_atoms('around 5 resid 10') water = u.select_atoms('resname WAT or resname HOH')
print(f"Number of protein atoms: {len(protein)}") print(f"Number of CA atoms: {len(ca)}")
undefined
protein = u.select_atoms('protein') backbone = u.select_atoms('backbone') ca = u.select_atoms('name CA') resid_10 = u.select_atoms('resid 10') within_5A = u.select_atoms('around 5 resid 10') water = u.select_atoms('resname WAT or resname HOH')
print(f"蛋白质原子数量: {len(protein)}") print(f"CA原子数量: {len(ca)}")
undefined

Critical Rules

关键规则

✅ DO

✅ 建议做法

  • Close trajectory files - Use context managers or close explicitly
  • Use atom selections efficiently - Cache selections for reuse
  • Check trajectory length - Verify n_frames before analysis
  • Use vectorized operations - Leverage NumPy for speed
  • Align trajectories - Align before RMSD calculations
  • Handle periodic boundaries - Use PBC-aware distance calculations
  • Validate atom groups - Check empty selections
  • Use appropriate frames - Slice trajectories if needed
  • Save intermediate results - Don't recompute expensive calculations
  • Check units - MDAnalysis uses Angstroms and picoseconds
  • 关闭轨迹文件 - 使用上下文管理器或显式关闭
  • 高效使用原子选择 - 缓存选择结果以便复用
  • 检查轨迹长度 - 分析前验证帧数
  • 使用向量化操作 - 借助NumPy提升速度
  • 对齐轨迹 - 计算RMSD前先对齐轨迹
  • 处理周期性边界 - 使用支持PBC的距离计算方法
  • 验证原子组 - 检查选择结果是否为空
  • 使用合适的帧 - 必要时对轨迹进行切片
  • 保存中间结果 - 避免重复计算耗时操作
  • 检查单位 - MDAnalysis使用埃(Angstroms)和皮秒(picoseconds)作为单位

❌ DON'T

❌ 禁止做法

  • Load entire trajectory in memory - Stream through frames
  • Ignore PBC - Always consider periodic boundary conditions
  • Forget to align - RMSD without alignment is meaningless
  • Use wrong atom names - Check topology for correct names
  • Mix coordinate systems - Be consistent with units
  • Ignore missing atoms - Handle incomplete residues
  • Recompute unnecessarily - Cache expensive calculations
  • Use string selections in loops - Parse once, reuse
  • Forget to unwrap coordinates - Handle molecules split by PBC
  • Ignore memory limits - Process large trajectories in chunks
  • 将整个轨迹加载到内存 - 逐帧流式处理
  • 忽略PBC - 始终考虑周期性边界条件
  • 忘记对齐轨迹 - 未对齐的RMSD计算无意义
  • 使用错误的原子名称 - 检查拓扑文件确认正确名称
  • 混合坐标系 - 保持单位一致性
  • 忽略缺失原子 - 处理不完整的残基
  • 不必要的重复计算 - 缓存耗时计算结果
  • 在循环中使用字符串选择 - 解析一次后复用
  • 忘记展开坐标 - 处理被PBC分割的分子
  • 忽略内存限制 - 分块处理大型轨迹

Anti-Patterns (NEVER)

反模式(绝对避免)

python
import MDAnalysis as mda
import numpy as np
python
import MDAnalysis as mda
import numpy as np

❌ BAD: Loading entire trajectory in memory

❌ 错误:将整个轨迹加载到内存

u = mda.Universe('top.pdb', 'traj.dcd') all_coords = [] for ts in u.trajectory: all_coords.append(u.atoms.positions.copy()) all_coords = np.array(all_coords) # Huge memory usage!
u = mda.Universe('top.pdb', 'traj.dcd') all_coords = [] for ts in u.trajectory: all_coords.append(u.atoms.positions.copy()) all_coords = np.array(all_coords) # 内存占用极大!

✅ GOOD: Process frame by frame

✅ 正确:逐帧处理

u = mda.Universe('top.pdb', 'traj.dcd') for ts in u.trajectory: # Process current frame coords = u.atoms.positions # Do analysis... # Move to next frame automatically
u = mda.Universe('top.pdb', 'traj.dcd') for ts in u.trajectory: # 处理当前帧 coords = u.atoms.positions # 执行分析... # 自动跳转到下一帧

❌ BAD: RMSD without alignment

❌ 错误:未对齐轨迹就计算RMSD

rmsd_values = [] for ts in u.trajectory: rmsd = rms.rmsd(mobile.positions, reference.positions) rmsd_values.append(rmsd) # Wrong! Not aligned!
rmsd_values = [] for ts in u.trajectory: rmsd = rms.rmsd(mobile.positions, reference.positions) rmsd_values.append(rmsd) # 错误!未对齐!

✅ GOOD: Align before RMSD

✅ 正确:先对齐再计算RMSD

from MDAnalysis.analysis.rms import RMSD R = RMSD(mobile, reference, select='backbone') R.run() rmsd_values = R.results.rmsd[:, 2]
from MDAnalysis.analysis.rms import RMSD R = RMSD(mobile, reference, select='backbone') R.run() rmsd_values = R.results.rmsd[:, 2]

❌ BAD: Creating selection in loop

❌ 错误:在循环中创建选择

for ts in u.trajectory: ca = u.select_atoms('name CA') # Parsed every frame! # Do something with ca
for ts in u.trajectory: ca = u.select_atoms('name CA') # 每帧都重新解析! # 使用ca执行操作

✅ GOOD: Create selection once

✅ 正确:仅创建一次选择

ca = u.select_atoms('name CA') for ts in u.trajectory: # Use ca (automatically updated each frame) positions = ca.positions
ca = u.select_atoms('name CA') for ts in u.trajectory: # 使用ca(每帧自动更新) positions = ca.positions

❌ BAD: Ignoring periodic boundaries

❌ 错误:忽略周期性边界

distance = np.linalg.norm(atom1.position - atom2.position)
distance = np.linalg.norm(atom1.position - atom2.position)

✅ GOOD: PBC-aware distance

✅ 正确:支持PBC的距离计算

from MDAnalysis.lib.distances import distance_array dist = distance_array( atom1.position[np.newaxis, :], atom2.position[np.newaxis, :], box=u.dimensions )[0, 0]
from MDAnalysis.lib.distances import distance_array dist = distance_array( atom1.position[np.newaxis, :], atom2.position[np.newaxis, :], box=u.dimensions )[0, 0]

❌ BAD: Not checking for empty selections

❌ 错误:未检查空选择

selection = u.select_atoms('resname XYZ')
selection = u.select_atoms('resname XYZ')

Continue without checking if selection is empty!

未检查选择是否为空就继续!

avg_pos = selection.center_of_mass() # May crash!
avg_pos = selection.center_of_mass() # 可能崩溃!

✅ GOOD: Validate selections

✅ 正确:验证选择结果

selection = u.select_atoms('resname XYZ') if len(selection) == 0: print("Warning: No atoms found matching selection") else: avg_pos = selection.center_of_mass()
undefined
selection = u.select_atoms('resname XYZ') if len(selection) == 0: print("警告:未找到匹配选择条件的原子") else: avg_pos = selection.center_of_mass()
undefined

Loading Trajectories (Universe)

加载轨迹(Universe)

Basic Universe Creation

基础Universe创建

python
import MDAnalysis as mda
python
import MDAnalysis as mda

Single structure file

加载单个结构文件

u = mda.Universe('protein.pdb')
u = mda.Universe('protein.pdb')

Topology + trajectory

加载拓扑文件+轨迹文件

u = mda.Universe('topology.pdb', 'trajectory.dcd')
u = mda.Universe('topology.pdb', 'trajectory.dcd')

Multiple trajectories (concatenated)

加载多个轨迹文件(拼接)

u = mda.Universe('top.pdb', 'traj1.dcd', 'traj2.dcd', 'traj3.dcd')
u = mda.Universe('top.pdb', 'traj1.dcd', 'traj2.dcd', 'traj3.dcd')

Different formats

加载不同格式的文件

u = mda.Universe('system.gro', 'traj.xtc') # GROMACS u = mda.Universe('system.psf', 'traj.dcd') # CHARMM/NAMD u = mda.Universe('system.prmtop', 'traj.nc') # AMBER
u = mda.Universe('system.gro', 'traj.xtc') # GROMACS格式 u = mda.Universe('system.psf', 'traj.dcd') # CHARMM/NAMD格式 u = mda.Universe('system.prmtop', 'traj.nc') # AMBER格式

From memory (numpy arrays)

从内存加载(numpy数组)

coords = np.random.rand(100, 3) # 100 atoms, xyz u = mda.Universe.empty(100, trajectory=True) u.atoms.positions = coords
print(f"Number of atoms: {len(u.atoms)}") print(f"Number of frames: {len(u.trajectory)}") print(f"Total time: {u.trajectory.totaltime} ps")
undefined
coords = np.random.rand(100, 3) # 100个原子的xyz坐标 u = mda.Universe.empty(100, trajectory=True) u.atoms.positions = coords
print(f"原子数量: {len(u.atoms)}") print(f"帧数: {len(u.trajectory)}") print(f"总时长: {u.trajectory.totaltime} ps")
undefined

Trajectory Information

轨迹信息

python
import MDAnalysis as mda

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Trajectory properties

轨迹属性

traj = u.trajectory print(f"Number of frames: {traj.n_frames}") print(f"Time step: {traj.dt} ps") print(f"Total time: {traj.totaltime} ps")
traj = u.trajectory print(f"帧数: {traj.n_frames}") print(f"时间步长: {traj.dt} ps") print(f"总时长: {traj.totaltime} ps")

Current frame info

当前帧信息

print(f"Current frame: {traj.frame}") print(f"Current time: {traj.time} ps") print(f"Box dimensions: {u.dimensions}") # [a, b, c, alpha, beta, gamma]
print(f"当前帧: {traj.frame}") print(f"当前时间: {traj.time} ps") print(f"盒子尺寸: {u.dimensions}") # [a, b, c, alpha, beta, gamma]

Iterate through frames

遍历帧

for i, ts in enumerate(u.trajectory): if i >= 5: break print(f"Frame {ts.frame}: time = {ts.time:.2f} ps")
for i, ts in enumerate(u.trajectory): if i >= 5: break print(f"帧 {ts.frame}: 时间 = {ts.time:.2f} ps")

Jump to specific frame

跳转到指定帧

u.trajectory[100] # Go to frame 100 print(f"Now at frame: {u.trajectory.frame}")
u.trajectory[100] # 跳转到第100帧 print(f"当前帧: {u.trajectory.frame}")

Slice trajectory

切片轨迹

for ts in u.trajectory[::10]: # Every 10th frame print(f"Frame {ts.frame}")
undefined
for ts in u.trajectory[::10]: # 每10帧取一帧 print(f"帧 {ts.frame}")
undefined

Working with Multiple Trajectories

处理多个轨迹文件

python
import MDAnalysis as mda
python
import MDAnalysis as mda

Load multiple trajectory files

加载多个轨迹文件

u = mda.Universe('top.pdb', 'part1.dcd', 'part2.dcd', 'part3.dcd')
print(f"Total frames from all trajectories: {len(u.trajectory)}")
u = mda.Universe('top.pdb', 'part1.dcd', 'part2.dcd', 'part3.dcd')
print(f"所有轨迹的总帧数: {len(u.trajectory)}")

Or use ChainReader explicitly

或显式使用ChainReader

from MDAnalysis.coordinates.chain import ChainReader
trajectories = ['part1.dcd', 'part2.dcd', 'part3.dcd'] u = mda.Universe('top.pdb', trajectories, continuous=True)
from MDAnalysis.coordinates.chain import ChainReader
trajectories = ['part1.dcd', 'part2.dcd', 'part3.dcd'] u = mda.Universe('top.pdb', trajectories, continuous=True)

Access frame indices

访问帧索引

for i, ts in enumerate(u.trajectory[::100]): print(f"Global frame {i}: actual frame {ts.frame}, time {ts.time:.2f}")
undefined
for i, ts in enumerate(u.trajectory[::100]): print(f"全局索引 {i}: 实际帧 {ts.frame}, 时间 {ts.time:.2f}")
undefined

Atom Selections

原子选择

Basic Selection Syntax

基础选择语法

python
import MDAnalysis as mda

u = mda.Universe('system.pdb')
python
import MDAnalysis as mda

u = mda.Universe('system.pdb')

Protein selections

蛋白质选择

protein = u.select_atoms('protein') backbone = u.select_atoms('backbone') # N, CA, C mainchain = u.select_atoms('backbone or name O')
protein = u.select_atoms('protein') backbone = u.select_atoms('backbone') # N、CA、C原子 mainchain = u.select_atoms('backbone or name O')

Specific residues

特定残基选择

resid_10 = u.select_atoms('resid 10') resid_range = u.select_atoms('resid 10:20') resid_list = u.select_atoms('resid 10 15 20 25')
resid_10 = u.select_atoms('resid 10') resid_range = u.select_atoms('resid 10:20') resid_list = u.select_atoms('resid 10 15 20 25')

Residue names

残基名称选择

water = u.select_atoms('resname WAT or resname HOH') ions = u.select_atoms('resname NA or resname CL')
water = u.select_atoms('resname WAT or resname HOH') ions = u.select_atoms('resname NA or resname CL')

Atom names

原子名称选择

ca_atoms = u.select_atoms('name CA') hydrogens = u.select_atoms('name H*') # Wildcards
ca_atoms = u.select_atoms('name CA') hydrogens = u.select_atoms('name H*') # 使用通配符

By element

按元素选择

carbons = u.select_atoms('element C')
carbons = u.select_atoms('element C')

Segments

按片段选择

seg_a = u.select_atoms('segid A')
print(f"Protein atoms: {len(protein)}") print(f"Water molecules: {len(water)}") print(f"CA atoms: {len(ca_atoms)}")
undefined
seg_a = u.select_atoms('segid A')
print(f"蛋白质原子数量: {len(protein)}") print(f"水分子数量: {len(water)}") print(f"CA原子数量: {len(ca_atoms)}")
undefined

Advanced Selections

高级选择

python
import MDAnalysis as mda

u = mda.Universe('protein_solvent.pdb')
python
import MDAnalysis as mda

u = mda.Universe('protein_solvent.pdb')

Geometric selections

几何选择

around_10 = u.select_atoms('around 5.0 protein') # Within 5Å of protein within_box = u.select_atoms('prop x < 50 and prop y < 50')
around_10 = u.select_atoms('around 5.0 protein') # 距离蛋白质5Å范围内的原子 within_box = u.select_atoms('prop x < 50 and prop y < 50')

Spatial queries

空间查询

resid_10 = u.select_atoms('resid 10') near_res10 = u.select_atoms('around 10.0 resid 10 and not resid 10')
resid_10 = u.select_atoms('resid 10') near_res10 = u.select_atoms('around 10.0 resid 10 and not resid 10')

Combining selections

组合选择

hydrophobic = u.select_atoms( 'resname ALA or resname VAL or resname LEU or ' 'resname ILE or resname PHE or resname TRP or resname MET' )
charged = u.select_atoms( 'resname ARG or resname LYS or resname ASP or resname GLU' )
hydrophobic = u.select_atoms( 'resname ALA or resname VAL or resname LEU or ' 'resname ILE or resname PHE or resname TRP or resname MET' )
charged = u.select_atoms( 'resname ARG or resname LYS or resname ASP or resname GLU' )

Boolean operations

布尔运算

not_protein = u.select_atoms('not protein') protein_no_h = u.select_atoms('protein and not name H*')
not_protein = u.select_atoms('not protein') protein_no_h = u.select_atoms('protein and not name H*')

By property

按属性选择

high_bfactor = u.select_atoms('protein and prop beta > 50')
high_bfactor = u.select_atoms('protein and prop beta > 50')

SMARTS matching (requires RDKit)

SMARTS匹配(需要RDKit)

aromatic = u.select_atoms('smarts c1ccccc1')

aromatic = u.select_atoms('smarts c1ccccc1')

print(f"Hydrophobic residues: {len(hydrophobic)}") print(f"Charged residues: {len(charged)}")
undefined
print(f"疏水残基数量: {len(hydrophobic)}") print(f"带电残基数量: {len(charged)}")
undefined

Dynamic Selections

动态选择

python
import MDAnalysis as mda

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Regular selection (static)

常规选择(静态)

protein = u.select_atoms('protein')
protein = u.select_atoms('protein')

Dynamic selection (updated each frame)

动态选择(每帧更新)

Example: Select water within 3.5Å of protein

示例:选择距离蛋白质3.5Å范围内的水

water_near_protein = u.select_atoms( 'resname WAT and around 3.5 protein', updating=True )
water_near_protein = u.select_atoms( 'resname WAT and around 3.5 protein', updating=True )

Track number of nearby water molecules over time

跟踪每帧中靠近蛋白质的水分子数量

water_counts = [] for ts in u.trajectory: # water_near_protein automatically updates water_counts.append(len(water_near_protein))
print(f"Water molecules near protein per frame:") print(f" Min: {min(water_counts)}") print(f" Max: {max(water_counts)}") print(f" Mean: {sum(water_counts)/len(water_counts):.1f}")
water_counts = [] for ts in u.trajectory: # water_near_protein会自动更新 water_counts.append(len(water_near_protein))
print(f"每帧中靠近蛋白质的水分子数量:") print(f" 最小值: {min(water_counts)}") print(f" 最大值: {max(water_counts)}") print(f" 平均值: {sum(water_counts)/len(water_counts):.1f}")

Another example: ions near active site

另一个示例:活性位点附近的离子

active_site = u.select_atoms('resid 100:110') nearby_ions = u.select_atoms( '(resname NA or resname CL) and around 8.0 resid 100:110', updating=True )
for i, ts in enumerate(u.trajectory[::10]): print(f"Frame {ts.frame}: {len(nearby_ions)} ions near active site")
undefined
active_site = u.select_atoms('resid 100:110') nearby_ions = u.select_atoms( '(resname NA or resname CL) and around 8.0 resid 100:110', updating=True )
for i, ts in enumerate(u.trajectory[::10]): print(f"帧 {ts.frame}: 活性位点附近有 {len(nearby_ions)} 个离子")
undefined

Selection Groups and Operations

选择组与操作

python
import MDAnalysis as mda

u = mda.Universe('protein.pdb')
python
import MDAnalysis as mda

u = mda.Universe('protein.pdb')

Create multiple selections

创建多个选择结果

protein = u.select_atoms('protein') backbone = u.select_atoms('backbone') ca_atoms = u.select_atoms('name CA')
protein = u.select_atoms('protein') backbone = u.select_atoms('backbone') ca_atoms = u.select_atoms('name CA')

Combine selections

组合选择结果

combined = protein | backbone # Union (OR) intersection = protein & backbone # Intersection (AND) difference = protein - backbone # Difference (NOT)
combined = protein | backbone # 并集(OR) intersection = protein & backbone # 交集(AND) difference = protein - backbone # 差集(NOT)

Access properties

访问属性

print(f"Combined: {len(combined)} atoms") print(f"Center of mass: {protein.center_of_mass()}") print(f"Center of geometry: {protein.center_of_geometry()}") print(f"Total mass: {protein.total_mass()}") print(f"Total charge: {protein.total_charge()}")
print(f"组合后的原子数量: {len(combined)}") print(f"质心: {protein.center_of_mass()}") print(f"几何中心: {protein.center_of_geometry()}") print(f"总质量: {protein.total_mass()}") print(f"总电荷: {protein.total_charge()}")

Iterate through atoms

遍历原子

for atom in ca_atoms[:5]: print(f"Atom {atom.name} {atom.resname}{atom.resid}: {atom.position}")
for atom in ca_atoms[:5]: print(f"原子 {atom.name} {atom.resname}{atom.resid}: {atom.position}")

Iterate through residues

遍历残基

for residue in protein.residues[:5]: print(f"Residue {residue.resname}{residue.resid}: {residue.atoms.n_atoms} atoms")
for residue in protein.residues[:5]: print(f"残基 {residue.resname}{residue.resid}: {residue.atoms.n_atoms} 个原子")

Iterate through segments

遍历片段

for segment in u.segments: print(f"Segment {segment.segid}: {len(segment.atoms)} atoms")
undefined
for segment in u.segments: print(f"片段 {segment.segid}: {len(segment.atoms)} 个原子")
undefined

RMSD and RMSF Analysis

RMSD与RMSF分析

RMSD Calculation

RMSD计算

python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt

Load trajectory

加载轨迹

u = mda.Universe('topology.pdb', 'trajectory.dcd')
u = mda.Universe('topology.pdb', 'trajectory.dcd')

Define reference (first frame)

定义参考结构(第一帧)

reference = u.copy() reference.trajectory[0]
reference = u.copy() reference.trajectory[0]

Select atoms for RMSD (typically backbone or CA)

选择用于RMSD计算的原子(通常是主链或CA原子)

mobile = u.select_atoms('backbone') ref = reference.select_atoms('backbone')
mobile = u.select_atoms('backbone') ref = reference.select_atoms('backbone')

Calculate RMSD with alignment

带对齐的RMSD计算

R = rms.RMSD(mobile, ref, select='backbone', ref_frame=0) R.run()
R = rms.RMSD(mobile, ref, select='backbone', ref_frame=0) R.run()

Extract results

提取结果

rmsd_data = R.results.rmsd time = rmsd_data[:, 1] # Time in ps rmsd_values = rmsd_data[:, 2] # RMSD in Angstroms
rmsd_data = R.results.rmsd time = rmsd_data[:, 1] # 时间(单位:ps) rmsd_values = rmsd_data[:, 2] # RMSD(单位:埃)

Plot

绘图

plt.figure(figsize=(10, 6)) plt.plot(time, rmsd_values) plt.xlabel('Time (ps)') plt.ylabel('RMSD (Å)') plt.title('Backbone RMSD over time') plt.grid(True) plt.savefig('rmsd.png', dpi=300)
print(f"Mean RMSD: {np.mean(rmsd_values):.2f} Å") print(f"Max RMSD: {np.max(rmsd_values):.2f} Å") print(f"Final RMSD: {rmsd_values[-1]:.2f} Å")
undefined
plt.figure(figsize=(10, 6)) plt.plot(time, rmsd_values) plt.xlabel('时间 (ps)') plt.ylabel('RMSD (Å)') plt.title('主链RMSD随时间变化') plt.grid(True) plt.savefig('rmsd.png', dpi=300)
print(f"平均RMSD: {np.mean(rmsd_values):.2f} Å") print(f"最大RMSD: {np.max(rmsd_values):.2f} Å") print(f"最终RMSD: {rmsd_values[-1]:.2f} Å")
undefined

RMSD to Multiple References

与多个参考结构的RMSD比较

python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Compare to multiple reference structures

与多个参考结构比较

references = { 'crystal': mda.Universe('crystal.pdb'), 'equilibrated': mda.Universe('equilibrated.pdb'), 'apo': mda.Universe('apo_structure.pdb') }
results = {}
for ref_name, ref_universe in references.items(): mobile = u.select_atoms('backbone') ref = ref_universe.select_atoms('backbone')
R = rms.RMSD(mobile, ref, select='backbone')
R.run()

results[ref_name] = R.results.rmsd[:, 2]
references = { '晶体结构': mda.Universe('crystal.pdb'), '平衡后结构': mda.Universe('equilibrated.pdb'), 'apo结构': mda.Universe('apo_structure.pdb') }
results = {}
for ref_name, ref_universe in references.items(): mobile = u.select_atoms('backbone') ref = ref_universe.select_atoms('backbone')
R = rms.RMSD(mobile, ref, select='backbone')
R.run()

results[ref_name] = R.results.rmsd[:, 2]

Plot comparison

绘制比较图

import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6)) for ref_name, rmsd_values in results.items(): time = np.arange(len(rmsd_values)) * u.trajectory.dt plt.plot(time, rmsd_values, label=ref_name)
plt.xlabel('Time (ps)') plt.ylabel('RMSD (Å)') plt.legend() plt.title('RMSD to different reference structures') plt.grid(True) plt.savefig('rmsd_comparison.png', dpi=300)
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6)) for ref_name, rmsd_values in results.items(): time = np.arange(len(rmsd_values)) * u.trajectory.dt plt.plot(time, rmsd_values, label=ref_name)
plt.xlabel('时间 (ps)') plt.ylabel('RMSD (Å)') plt.legend() plt.title('与不同参考结构的RMSD比较') plt.grid(True) plt.savefig('rmsd_comparison.png', dpi=300)

Statistics

统计信息

for ref_name, rmsd_values in results.items(): print(f"{ref_name}: mean={np.mean(rmsd_values):.2f} Å, " f"std={np.std(rmsd_values):.2f} Å")
undefined
for ref_name, rmsd_values in results.items(): print(f"{ref_name}: 平均值={np.mean(rmsd_values):.2f} Å, " f"标准差={np.std(rmsd_values):.2f} Å")
undefined

RMSF Calculation (Fluctuations)

RMSF计算(波动分析)

python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Select CA atoms (typical for RMSF)

选择CA原子(RMSF计算的常用选择)

ca_atoms = u.select_atoms('protein and name CA')
ca_atoms = u.select_atoms('protein and name CA')

Calculate RMSF

计算RMSF

rmsf_analysis = rms.RMSF(ca_atoms, verbose=True) rmsf_analysis.run()
rmsf_analysis = rms.RMSF(ca_atoms, verbose=True) rmsf_analysis.run()

Extract results

提取结果

rmsf_values = rmsf_analysis.results.rmsf
rmsf_values = rmsf_analysis.results.rmsf

Plot RMSF vs residue number

绘制RMSF与残基编号的关系图

residue_numbers = [atom.resid for atom in ca_atoms]
plt.figure(figsize=(12, 6)) plt.plot(residue_numbers, rmsf_values, linewidth=2) plt.xlabel('Residue Number') plt.ylabel('RMSF (Å)') plt.title('Per-residue Fluctuations (RMSF)') plt.grid(True) plt.savefig('rmsf.png', dpi=300)
residue_numbers = [atom.resid for atom in ca_atoms]
plt.figure(figsize=(12, 6)) plt.plot(residue_numbers, rmsf_values, linewidth=2) plt.xlabel('残基编号') plt.ylabel('RMSF (Å)') plt.title('残基水平波动(RMSF)') plt.grid(True) plt.savefig('rmsf.png', dpi=300)

Identify flexible regions (high RMSF)

识别柔性区域(高RMSD)

threshold = np.mean(rmsf_values) + np.std(rmsf_values) flexible_residues = [resid for resid, rmsf in zip(residue_numbers, rmsf_values) if rmsf > threshold]
print(f"Mean RMSF: {np.mean(rmsf_values):.2f} Å") print(f"Flexible regions (RMSF > {threshold:.2f}): {flexible_residues}")
threshold = np.mean(rmsf_values) + np.std(rmsf_values) flexible_residues = [resid for resid, rmsf in zip(residue_numbers, rmsf_values) if rmsf > threshold]
print(f"平均RMSF: {np.mean(rmsf_values):.2f} Å") print(f"柔性区域(RMSF > {threshold:.2f}): {flexible_residues}")

RMSF per atom (all heavy atoms)

所有重原子的RMSF

heavy_atoms = u.select_atoms('protein and not name H*') rmsf_all = rms.RMSF(heavy_atoms) rmsf_all.run()
print(f"\nMean RMSF (all heavy atoms): {np.mean(rmsf_all.results.rmsf):.2f} Å")
undefined
heavy_atoms = u.select_atoms('protein and not name H*') rmsf_all = rms.RMSF(heavy_atoms) rmsf_all.run()
print(f"\n所有重原子的平均RMSF: {np.mean(rmsf_all.results.rmsf):.2f} Å")
undefined

2D RMSD Matrix

二维RMSD矩阵

python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Select atoms

选择原子

selection = u.select_atoms('backbone')
selection = u.select_atoms('backbone')

Compute pairwise RMSD matrix

计算两两RMSD矩阵

n_frames = len(u.trajectory) rmsd_matrix = np.zeros((n_frames, n_frames))
n_frames = len(u.trajectory) rmsd_matrix = np.zeros((n_frames, n_frames))

Store coordinates for all frames

存储所有帧的坐标

coords = [] for ts in u.trajectory: coords.append(selection.positions.copy())
coords = [] for ts in u.trajectory: coords.append(selection.positions.copy())

Calculate pairwise RMSD

计算两两RMSD

for i in range(n_frames): for j in range(i, n_frames): # Align j to i rmsd_value = rms.rmsd(coords[j], coords[i], center=True, superposition=True) rmsd_matrix[i, j] = rmsd_value rmsd_matrix[j, i] = rmsd_value
for i in range(n_frames): for j in range(i, n_frames): # 将j帧对齐到i帧 rmsd_value = rms.rmsd(coords[j], coords[i], center=True, superposition=True) rmsd_matrix[i, j] = rmsd_value rmsd_matrix[j, i] = rmsd_value

Plot matrix

绘制矩阵图

plt.figure(figsize=(10, 8)) plt.imshow(rmsd_matrix, cmap='viridis', origin='lower') plt.colorbar(label='RMSD (Å)') plt.xlabel('Frame') plt.ylabel('Frame') plt.title('Pairwise RMSD Matrix') plt.savefig('rmsd_matrix.png', dpi=300)
plt.figure(figsize=(10, 8)) plt.imshow(rmsd_matrix, cmap='viridis', origin='lower') plt.colorbar(label='RMSD (Å)') plt.xlabel('帧') plt.ylabel('帧') plt.title('两两RMSD矩阵') plt.savefig('rmsd_matrix.png', dpi=300)

Identify clusters (frames with low mutual RMSD)

识别聚类(两两RMSD低的帧)

threshold = 2.0 # Angstroms similar_frames = np.where(rmsd_matrix < threshold) print(f"Frame pairs with RMSD < {threshold} Å: {len(similar_frames[0])}")
undefined
threshold = 2.0 # 埃 similar_frames = np.where(rmsd_matrix < threshold) print(f"RMSD < {threshold} Å 的帧对数量: {len(similar_frames[0])}")
undefined

Distance Analysis

距离分析

Simple Distance Calculations

简单距离计算

python
import MDAnalysis as mda
from MDAnalysis.analysis import distances
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis import distances
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Select two groups of atoms

选择两组原子

group1 = u.select_atoms('resid 10 and name CA') group2 = u.select_atoms('resid 50 and name CA')
group1 = u.select_atoms('resid 10 and name CA') group2 = u.select_atoms('resid 50 and name CA')

Calculate distance over trajectory

计算随时间变化的距离

dist_array = [] for ts in u.trajectory: # PBC-aware distance d = distances.distance_array( group1.positions, group2.positions, box=u.dimensions )[0, 0] dist_array.append(d)
dist_array = np.array(dist_array) time = np.arange(len(dist_array)) * u.trajectory.dt
dist_array = [] for ts in u.trajectory: # 支持PBC的距离计算 d = distances.distance_array( group1.positions, group2.positions, box=u.dimensions )[0, 0] dist_array.append(d)
dist_array = np.array(dist_array) time = np.arange(len(dist_array)) * u.trajectory.dt

Plot

绘图

plt.figure(figsize=(10, 6)) plt.plot(time, dist_array) plt.xlabel('Time (ps)') plt.ylabel('Distance (Å)') plt.title('Distance between residues 10 and 50') plt.grid(True) plt.savefig('distance.png', dpi=300)
print(f"Mean distance: {np.mean(dist_array):.2f} Å") print(f"Min distance: {np.min(dist_array):.2f} Å") print(f"Max distance: {np.max(dist_array):.2f} Å")
undefined
plt.figure(figsize=(10, 6)) plt.plot(time, dist_array) plt.xlabel('时间 (ps)') plt.ylabel('距离 (Å)') plt.title('残基10与50的CA原子距离随时间变化') plt.grid(True) plt.savefig('distance.png', dpi=300)
print(f"平均距离: {np.mean(dist_array):.2f} Å") print(f"最小距离: {np.min(dist_array):.2f} Å") print(f"最大距离: {np.max(dist_array):.2f} Å")
undefined

Distance Matrix

距离矩阵

python
import MDAnalysis as mda
from MDAnalysis.analysis import distances
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('protein.pdb')
python
import MDAnalysis as mda
from MDAnalysis.analysis import distances
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('protein.pdb')

Select CA atoms

选择CA原子

ca_atoms = u.select_atoms('protein and name CA')
ca_atoms = u.select_atoms('protein and name CA')

Calculate distance matrix

计算距离矩阵

dist_matrix = distances.distance_array( ca_atoms.positions, ca_atoms.positions, box=u.dimensions )
dist_matrix = distances.distance_array( ca_atoms.positions, ca_atoms.positions, box=u.dimensions )

Plot

绘图

plt.figure(figsize=(10, 8)) plt.imshow(dist_matrix, cmap='viridis', origin='lower') plt.colorbar(label='Distance (Å)') plt.xlabel('CA atom index') plt.ylabel('CA atom index') plt.title('CA Distance Matrix') plt.savefig('distance_matrix.png', dpi=300)
plt.figure(figsize=(10, 8)) plt.imshow(dist_matrix, cmap='viridis', origin='lower') plt.colorbar(label='距离 (Å)') plt.xlabel('CA原子索引') plt.ylabel('CA原子索引') plt.title('CA原子距离矩阵') plt.savefig('distance_matrix.png', dpi=300)

Find contacts (CA atoms within 8Å)

识别接触对(距离小于8Å的CA原子)

contact_threshold = 8.0 contacts = np.where((dist_matrix < contact_threshold) & (dist_matrix > 0)) n_contacts = len(contacts[0]) // 2 # Divide by 2 (symmetric matrix)
print(f"Number of CA-CA contacts within {contact_threshold} Å: {n_contacts}")
undefined
contact_threshold = 8.0 contacts = np.where((dist_matrix < contact_threshold) & (dist_matrix > 0)) n_contacts = len(contacts[0]) // 2 # 对称矩阵,除以2去重
print(f"距离 < {contact_threshold} Å 的CA-CA接触对数量: {n_contacts}")
undefined

Contact Analysis Over Time

随时间变化的接触分析

python
import MDAnalysis as mda
from MDAnalysis.analysis.contacts import Contacts
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis.contacts import Contacts
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Define two groups

定义两组原子

group1 = u.select_atoms('resid 1-50 and name CA') group2 = u.select_atoms('resid 51-100 and name CA')
group1 = u.select_atoms('resid 1-50 and name CA') group2 = u.select_atoms('resid 51-100 and name CA')

Calculate contacts over trajectory

计算随时间变化的接触对

ca = Contacts( u, selection=(group1, group2), refgroup=(group1, group2), radius=8.0, # Contact cutoff in Angstroms method='hard_cut' # or 'soft_cut' for smooth cutoff )
ca.run()
ca = Contacts( u, selection=(group1, group2), refgroup=(group1, group2), radius=8.0, # 接触阈值(单位:埃) method='hard_cut' # 或'soft_cut'使用平滑阈值 )
ca.run()

Extract results

提取结果

time = ca.results.timeseries[:, 0] n_contacts = ca.results.timeseries[:, 1]
time = ca.results.timeseries[:, 0] n_contacts = ca.results.timeseries[:, 1]

Plot

绘图

plt.figure(figsize=(10, 6)) plt.plot(time, n_contacts) plt.xlabel('Time (ps)') plt.ylabel('Number of Contacts') plt.title('Inter-domain Contacts') plt.grid(True) plt.savefig('contacts.png', dpi=300)
print(f"Mean contacts: {np.mean(n_contacts):.1f}") print(f"Contact stability: {np.std(n_contacts):.1f} (lower = more stable)")
undefined
plt.figure(figsize=(10, 6)) plt.plot(time, n_contacts) plt.xlabel('时间 (ps)') plt.ylabel('接触对数量') plt.title('结构域间接触对') plt.grid(True) plt.savefig('contacts.png', dpi=300)
print(f"平均接触对数量: {np.mean(n_contacts):.1f}") print(f"接触对稳定性: {np.std(n_contacts):.1f}(值越小越稳定)")
undefined

Radius of Gyration

回转半径

python
import MDAnalysis as mda
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

protein = u.select_atoms('protein')
python
import MDAnalysis as mda
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

protein = u.select_atoms('protein')

Calculate radius of gyration over time

计算随时间变化的回转半径

rg_values = [] for ts in u.trajectory: rg = protein.radius_of_gyration() rg_values.append(rg)
rg_values = np.array(rg_values) time = np.arange(len(rg_values)) * u.trajectory.dt
rg_values = [] for ts in u.trajectory: rg_values.append(protein.radius_of_gyration())
rg_values = np.array(rg_values) time = np.arange(len(rg_values)) * u.trajectory.dt

Plot

绘图

plt.figure(figsize=(10, 6)) plt.plot(time, rg_values) plt.xlabel('Time (ps)') plt.ylabel('Radius of Gyration (Å)') plt.title('Protein Compactness') plt.grid(True) plt.savefig('radius_of_gyration.png', dpi=300)
print(f"Mean Rg: {np.mean(rg_values):.2f} Å") print(f"Rg fluctuation: {np.std(rg_values):.2f} Å")
plt.figure(figsize=(10, 6)) plt.plot(time, rg_values) plt.xlabel('时间 (ps)') plt.ylabel('回转半径 (Å)') plt.title('蛋白质紧凑性分析') plt.grid(True) plt.savefig('radius_of_gyration.png', dpi=300)
print(f"平均回转半径: {np.mean(rg_values):.2f} Å") print(f"回转半径波动: {np.std(rg_values):.2f} Å")

Check if protein is folding/unfolding

检查蛋白质是否发生折叠/去折叠

if np.std(rg_values) > 2.0: print("Warning: Large Rg fluctuations indicate conformational changes")
undefined
if np.std(rg_values) > 2.0: print("警告:回转半径波动大,表明存在构象变化")
undefined

Hydrogen Bond Analysis

氢键分析

Basic H-Bond Analysis

基础氢键分析

python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Setup hydrogen bond analysis

设置氢键分析

hbonds = HydrogenBondAnalysis( universe=u, donors_sel='protein', hydrogens_sel='protein', acceptors_sel='protein', d_h_cutoff=1.2, # Donor-H distance cutoff (Å) d_a_cutoff=3.0, # Donor-Acceptor distance cutoff (Å) d_h_a_angle_cutoff=150 # Angle cutoff (degrees) )
hbonds = HydrogenBondAnalysis( universe=u, donors_sel='protein', hydrogens_sel='protein', acceptors_sel='protein', d_h_cutoff=1.2, # 供体-氢距离阈值(单位:埃) d_a_cutoff=3.0, # 供体-受体距离阈值(单位:埃) d_h_a_angle_cutoff=150 # 供体-氢-受体角度阈值(单位:度) )

Run analysis

运行分析

hbonds.run(verbose=True)
hbonds.run(verbose=True)

Access results

获取结果

hbond_results = hbonds.results.hbonds
print(f"Total hydrogen bonds detected: {len(hbond_results)}")
hbond_results = hbonds.results.hbonds
print(f"检测到的氢键总数: {len(hbond_results)}")

Count H-bonds per frame

统计每帧的氢键数量

frames = hbond_results[:, 0].astype(int) unique_frames, counts = np.unique(frames, return_counts=True)
frames = hbond_results[:, 0].astype(int) unique_frames, counts = np.unique(frames, return_counts=True)

Plot

绘图

time = unique_frames * u.trajectory.dt plt.figure(figsize=(10, 6)) plt.plot(time, counts) plt.xlabel('Time (ps)') plt.ylabel('Number of H-bonds') plt.title('Intramolecular Hydrogen Bonds') plt.grid(True) plt.savefig('hbonds_time.png', dpi=300)
print(f"Mean H-bonds per frame: {np.mean(counts):.1f}") print(f"Std H-bonds per frame: {np.std(counts):.1f}")
undefined
time = unique_frames * u.trajectory.dt plt.figure(figsize=(10, 6)) plt.plot(time, counts) plt.xlabel('时间 (ps)') plt.ylabel('氢键数量') plt.title('分子内氢键') plt.grid(True) plt.savefig('hbonds_time.png', dpi=300)
print(f"每帧平均氢键数量: {np.mean(counts):.1f}") print(f"每帧氢键数量标准差: {np.std(counts):.1f}")
undefined

Persistent H-Bond Analysis

持续性氢键分析

python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
from collections import Counter

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
from collections import Counter

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Run H-bond analysis

运行氢键分析

hbonds = HydrogenBondAnalysis( universe=u, donors_sel='protein', hydrogens_sel='protein', acceptors_sel='protein' ) hbonds.run()
hbonds = HydrogenBondAnalysis( universe=u, donors_sel='protein', hydrogens_sel='protein', acceptors_sel='protein' ) hbonds.run()

Extract results

提取结果

hbond_data = hbonds.results.hbonds
hbond_data = hbonds.results.hbonds

Count H-bond occurrences

统计氢键出现次数

Each row: [frame, donor_idx, hydrogen_idx, acceptor_idx, distance, angle]

每行数据: [帧索引, 供体索引, 氢原子索引, 受体索引, 距离, 角度]

hbond_pairs = [] for row in hbond_data: donor_idx = int(row[1]) acceptor_idx = int(row[3]) hbond_pairs.append((donor_idx, acceptor_idx))
hbond_pairs = [] for row in hbond_data: donor_idx = int(row[1]) acceptor_idx = int(row[3]) hbond_pairs.append((donor_idx, acceptor_idx))

Count frequency

统计频率

hbond_counts = Counter(hbond_pairs) total_frames = len(u.trajectory)
hbond_counts = Counter(hbond_pairs) total_frames = len(u.trajectory)

Find persistent H-bonds (present in >50% of frames)

找出持续性氢键(在超过50%的帧中出现)

threshold = 0.5 * total_frames persistent_hbonds = {pair: count for pair, count in hbond_counts.items() if count > threshold}
print(f"Persistent H-bonds (>{threshold/total_frames:.0%} occupancy):") for (donor_idx, acceptor_idx), count in sorted(persistent_hbonds.items(), key=lambda x: x[1], reverse=True): donor_atom = u.atoms[donor_idx] acceptor_atom = u.atoms[acceptor_idx] occupancy = count / total_frames
print(f"  {donor_atom.resname}{donor_atom.resid}:{donor_atom.name} -> "
      f"{acceptor_atom.resname}{acceptor_atom.resid}:{acceptor_atom.name} "
      f"({occupancy:.1%} occupancy)")
undefined
threshold = 0.5 * total_frames persistent_hbonds = {pair: count for pair, count in hbond_counts.items() if count > threshold}
print(f"持续性氢键(出现占比 >{threshold/total_frames:.0%}):") for (donor_idx, acceptor_idx), count in sorted(persistent_hbonds.items(), key=lambda x: x[1], reverse=True): donor_atom = u.atoms[donor_idx] acceptor_atom = u.atoms[acceptor_idx] occupancy = count / total_frames
print(f"  {donor_atom.resname}{donor_atom.resid}:{donor_atom.name} -> "
      f"{acceptor_atom.resname}{acceptor_atom.resid}:{acceptor_atom.name} "
      f"(出现占比: {occupancy:.1%})")
undefined

Protein-Ligand H-Bonds

蛋白质-配体氢键分析

python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis

u = mda.Universe('complex.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis

u = mda.Universe('complex.pdb', 'trajectory.dcd')

Separate H-bond analyses

分别进行氢键分析

1. Ligand as donor, protein as acceptor

1. 配体作为供体,蛋白质作为受体

hbonds_lig_don = HydrogenBondAnalysis( universe=u, donors_sel='resname LIG', hydrogens_sel='resname LIG', acceptors_sel='protein' ) hbonds_lig_don.run()
hbonds_lig_don = HydrogenBondAnalysis( universe=u, donors_sel='resname LIG', hydrogens_sel='resname LIG', acceptors_sel='protein' ) hbonds_lig_don.run()

2. Protein as donor, ligand as acceptor

2. 蛋白质作为供体,配体作为受体

hbonds_prot_don = HydrogenBondAnalysis( universe=u, donors_sel='protein', hydrogens_sel='protein', acceptors_sel='resname LIG' ) hbonds_prot_don.run()
hbonds_prot_don = HydrogenBondAnalysis( universe=u, donors_sel='protein', hydrogens_sel='protein', acceptors_sel='resname LIG' ) hbonds_prot_don.run()

Combine results

合并结果

total_hbonds_per_frame = [] for frame in range(len(u.trajectory)): n_lig_don = len(hbonds_lig_don.results.hbonds[ hbonds_lig_don.results.hbonds[:, 0] == frame ]) n_prot_don = len(hbonds_prot_don.results.hbonds[ hbonds_prot_don.results.hbonds[:, 0] == frame ]) total_hbonds_per_frame.append(n_lig_don + n_prot_don)
import matplotlib.pyplot as plt import numpy as np
time = np.arange(len(total_hbonds_per_frame)) * u.trajectory.dt plt.figure(figsize=(10, 6)) plt.plot(time, total_hbonds_per_frame) plt.xlabel('Time (ps)') plt.ylabel('Number of Protein-Ligand H-bonds') plt.title('Protein-Ligand Hydrogen Bonding') plt.grid(True) plt.savefig('protein_ligand_hbonds.png', dpi=300)
print(f"Mean protein-ligand H-bonds: {np.mean(total_hbonds_per_frame):.1f}")
undefined
total_hbonds_per_frame = [] for frame in range(len(u.trajectory)): n_lig_don = len(hbonds_lig_don.results.hbonds[ hbonds_lig_don.results.hbonds[:, 0] == frame ]) n_prot_don = len(hbonds_prot_don.results.hbonds[ hbonds_prot_don.results.hbonds[:, 0] == frame ]) total_hbonds_per_frame.append(n_lig_don + n_prot_don)
import matplotlib.pyplot as plt import numpy as np
time = np.arange(len(total_hbonds_per_frame)) * u.trajectory.dt plt.figure(figsize=(10, 6)) plt.plot(time, total_hbonds_per_frame) plt.xlabel('时间 (ps)') plt.ylabel('蛋白质-配体氢键数量') plt.title('蛋白质-配体氢键') plt.grid(True) plt.savefig('protein_ligand_hbonds.png', dpi=300)
print(f"平均蛋白质-配体氢键数量: {np.mean(total_hbonds_per_frame):.1f}")
undefined

Dihedral Angle Analysis

二面角分析

Backbone Dihedrals (Ramachandran)

主链二面角(Ramachandran图)

python
import MDAnalysis as mda
from MDAnalysis.analysis.dihedrals import Ramachandran
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis.dihedrals import Ramachandran
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Select residues (exclude first and last)

选择残基(排除第一个和最后一个残基)

residues = u.select_atoms('protein').residues[1:-1]
residues = u.select_atoms('protein').residues[1:-1]

Calculate Ramachandran angles

计算Ramachandran角

rama = Ramachandran(residues) rama.run()
rama = Ramachandran(residues) rama.run()

Extract phi and psi angles

提取phi和psi角

angles = rama.results.angles
angles = rama.results.angles

Plot Ramachandran plot

绘制Ramachandran图

phi = angles[:, :, 0].flatten() psi = angles[:, :, 1].flatten()
plt.figure(figsize=(8, 8)) plt.hexbin(phi, psi, gridsize=50, cmap='Blues', mincnt=1) plt.colorbar(label='Counts') plt.xlabel('Phi (degrees)') plt.ylabel('Psi (degrees)') plt.title('Ramachandran Plot') plt.xlim(-180, 180) plt.ylim(-180, 180) plt.axhline(0, color='gray', linestyle='--', alpha=0.3) plt.axvline(0, color='gray', linestyle='--', alpha=0.3) plt.savefig('ramachandran.png', dpi=300)
phi = angles[:, :, 0].flatten() psi = angles[:, :, 1].flatten()
plt.figure(figsize=(8, 8)) plt.hexbin(phi, psi, gridsize=50, cmap='Blues', mincnt=1) plt.colorbar(label='计数') plt.xlabel('Phi (度)') plt.ylabel('Psi (度)') plt.title('Ramachandran图') plt.xlim(-180, 180) plt.ylim(-180, 180) plt.axhline(0, color='gray', linestyle='--', alpha=0.3) plt.axvline(0, color='gray', linestyle='--', alpha=0.3) plt.savefig('ramachandran.png', dpi=300)

Check for outliers (non-favored regions)

检查异常值(非优势区域)

Alpha-helix region: phi ~ -60, psi ~ -45

阿尔法螺旋区域: phi ~ -60, psi ~ -45

Beta-sheet region: phi ~ -120, psi ~ 120

贝塔折叠区域: phi ~ -120, psi ~ 120

alpha_region = (np.abs(phi + 60) < 30) & (np.abs(psi + 45) < 30) beta_region = (np.abs(phi + 120) < 30) & (np.abs(psi - 120) < 30) favored = alpha_region | beta_region
print(f"Percentage in favored regions: {np.sum(favored)/len(phi)*100:.1f}%")
undefined
alpha_region = (np.abs(phi + 60) < 30) & (np.abs(psi + 45) < 30) beta_region = (np.abs(phi + 120) < 30) & (np.abs(psi - 120) < 30) favored = alpha_region | beta_region
print(f"处于优势区域的比例: {np.sum(favored)/len(phi)*100:.1f}%")
undefined

Custom Dihedral Analysis

自定义二面角分析

python
import MDAnalysis as mda
from MDAnalysis.analysis.dihedrals import Dihedral
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis.dihedrals import Dihedral
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Define atoms for dihedral (4 consecutive atoms)

定义用于计算二面角的4个连续原子

Example: Chi1 angle of a specific residue

示例:特定残基的Chi1角

residue = u.select_atoms('resid 42') atoms = [ residue.select_atoms('name N')[0], residue.select_atoms('name CA')[0], residue.select_atoms('name CB')[0], residue.select_atoms('name CG')[0] ]
residue = u.select_atoms('resid 42') atoms = [ residue.select_atoms('name N')[0], residue.select_atoms('name CA')[0], residue.select_atoms('name CB')[0], residue.select_atoms('name CG')[0] ]

Calculate dihedral over trajectory

计算随时间变化的二面角

dihedral_angles = [] for ts in u.trajectory: # Get positions positions = np.array([atom.position for atom in atoms])
# Calculate dihedral
from MDAnalysis.lib.distances import calc_dihedrals
angle = calc_dihedrals(
    positions[0], positions[1], positions[2], positions[3],
    box=u.dimensions
)
dihedral_angles.append(np.degrees(angle))
dihedral_angles = np.array(dihedral_angles) time = np.arange(len(dihedral_angles)) * u.trajectory.dt
dihedral_angles = [] for ts in u.trajectory: # 获取原子位置 positions = np.array([atom.position for atom in atoms])
# 计算二面角
from MDAnalysis.lib.distances import calc_dihedrals
angle = calc_dihedrals(
    positions[0], positions[1], positions[2], positions[3],
    box=u.dimensions
)
dihedral_angles.append(np.degrees(angle))
dihedral_angles = np.array(dihedral_angles) time = np.arange(len(dihedral_angles)) * u.trajectory.dt

Plot

绘图

plt.figure(figsize=(10, 6)) plt.plot(time, dihedral_angles) plt.xlabel('Time (ps)') plt.ylabel('Dihedral Angle (degrees)') plt.title('Chi1 Angle of Residue 42') plt.ylim(-180, 180) plt.grid(True) plt.savefig('chi1_angle.png', dpi=300)
plt.figure(figsize=(10, 6)) plt.plot(time, dihedral_angles) plt.xlabel('时间 (ps)') plt.ylabel('二面角 (度)') plt.title('残基42的Chi1角') plt.ylim(-180, 180) plt.grid(True) plt.savefig('chi1_angle.png', dpi=300)

Histogram

绘制直方图

plt.figure(figsize=(8, 6)) plt.hist(dihedral_angles, bins=36, range=(-180, 180), edgecolor='black') plt.xlabel('Dihedral Angle (degrees)') plt.ylabel('Frequency') plt.title('Chi1 Angle Distribution') plt.savefig('chi1_histogram.png', dpi=300)
print(f"Mean dihedral angle: {np.mean(dihedral_angles):.1f}°") print(f"Std dihedral angle: {np.std(dihedral_angles):.1f}°")
undefined
plt.figure(figsize=(8, 6)) plt.hist(dihedral_angles, bins=36, range=(-180, 180), edgecolor='black') plt.xlabel('二面角 (度)') plt.ylabel('频率') plt.title('Chi1角分布') plt.savefig('chi1_histogram.png', dpi=300)
print(f"平均二面角: {np.mean(dihedral_angles):.1f}°") print(f"二面角标准差: {np.std(dihedral_angles):.1f}°")
undefined

Solvent Accessible Surface Area (SASA)

溶剂可及表面积(SASA)

Basic SASA Calculation

基础SASA计算

python
import MDAnalysis as mda
from MDAnalysis.analysis.sasa import SASAnalysis
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis.sasa import SASAnalysis
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Calculate SASA for protein

计算蛋白质的SASA

sasa = SASAnalysis(u.select_atoms('protein')) sasa.run(verbose=True)
sasa = SASAnalysis(u.select_atoms('protein')) sasa.run(verbose=True)

Extract results

提取结果

sasa_values = sasa.results.sasa time = np.arange(len(sasa_values)) * u.trajectory.dt
sasa_values = sasa.results.sasa time = np.arange(len(sasa_values)) * u.trajectory.dt

Plot

绘图

plt.figure(figsize=(10, 6)) plt.plot(time, sasa_values) plt.xlabel('Time (ps)') plt.ylabel('SASA (Ų)') plt.title('Protein Solvent Accessible Surface Area') plt.grid(True) plt.savefig('sasa.png', dpi=300)
print(f"Mean SASA: {np.mean(sasa_values):.1f} Ų") print(f"SASA range: {np.min(sasa_values):.1f} - {np.max(sasa_values):.1f} Ų")
undefined
plt.figure(figsize=(10, 6)) plt.plot(time, sasa_values) plt.xlabel('时间 (ps)') plt.ylabel('SASA (Ų)') plt.title('蛋白质溶剂可及表面积') plt.grid(True) plt.savefig('sasa.png', dpi=300)
print(f"平均SASA: {np.mean(sasa_values):.1f} Ų") print(f"SASA范围: {np.min(sasa_values):.1f} - {np.max(sasa_values):.1f} Ų")
undefined

Per-Residue SASA

残基水平SASA

python
import MDAnalysis as mda
from MDAnalysis.analysis.sasa import SASAnalysis
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('protein.pdb')

protein = u.select_atoms('protein')
python
import MDAnalysis as mda
from MDAnalysis.analysis.sasa import SASAnalysis
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('protein.pdb')

protein = u.select_atoms('protein')

Calculate SASA for each residue

计算每个残基的SASA

residue_sasa = [] residue_ids = []
for residue in protein.residues: sasa = SASAnalysis(residue.atoms) sasa.run() residue_sasa.append(sasa.results.sasa[0]) residue_ids.append(residue.resid)
residue_sasa = np.array(residue_sasa)
residue_sasa = [] residue_ids = []
for residue in protein.residues: sasa = SASAnalysis(residue.atoms) sasa.run() residue_sasa.append(sasa.results.sasa[0]) residue_ids.append(residue.resid)
residue_sasa = np.array(residue_sasa)

Plot

绘图

plt.figure(figsize=(12, 6)) plt.bar(residue_ids, residue_sasa) plt.xlabel('Residue Number') plt.ylabel('SASA (Ų)') plt.title('Per-Residue Solvent Accessible Surface Area') plt.savefig('per_residue_sasa.png', dpi=300)
plt.figure(figsize=(12, 6)) plt.bar(residue_ids, residue_sasa) plt.xlabel('残基编号') plt.ylabel('SASA (Ų)') plt.title('残基水平溶剂可及表面积') plt.savefig('per_residue_sasa.png', dpi=300)

Identify buried vs exposed residues

识别埋藏与暴露残基

threshold = 40.0 # Ų buried = [resid for resid, sasa in zip(residue_ids, residue_sasa) if sasa < threshold] exposed = [resid for resid, sasa in zip(residue_ids, residue_sasa) if sasa >= threshold]
print(f"Buried residues (SASA < {threshold} Ų): {len(buried)}") print(f"Exposed residues (SASA >= {threshold} Ų): {len(exposed)}") print(f"Mean residue SASA: {np.mean(residue_sasa):.1f} Ų")
undefined
threshold = 40.0 # Ų buried = [resid for resid, sasa in zip(residue_ids, residue_sasa) if sasa < threshold] exposed = [resid for resid, sasa in zip(residue_ids, residue_sasa) if sasa >= threshold]
print(f"埋藏残基(SASA < {threshold} Ų): {len(buried)}") print(f"暴露残基(SASA >= {threshold} Ų): {len(exposed)}") print(f"残基平均SASA: {np.mean(residue_sasa):.1f} Ų")
undefined

Trajectory Alignment and Transformation

轨迹对齐与变换

Align Trajectory to Reference

将轨迹对齐到参考结构

python
import MDAnalysis as mda
from MDAnalysis.analysis import align
from MDAnalysis.analysis.rms import rmsd
python
import MDAnalysis as mda
from MDAnalysis.analysis import align
from MDAnalysis.analysis.rms import rmsd

Load trajectory

加载轨迹

mobile = mda.Universe('topology.pdb', 'trajectory.dcd') reference = mda.Universe('reference.pdb')
mobile = mda.Universe('topology.pdb', 'trajectory.dcd') reference = mda.Universe('reference.pdb')

Align entire trajectory to reference

将整个轨迹对齐到参考结构

alignment = align.AlignTraj( mobile, reference, select='backbone', filename='aligned_trajectory.dcd', weights='mass' ) alignment.run()
print(f"Trajectory aligned and saved to aligned_trajectory.dcd")
alignment = align.AlignTraj( mobile, reference, select='backbone', filename='aligned_trajectory.dcd', weights='mass' ) alignment.run()
print(f"轨迹已对齐并保存到aligned_trajectory.dcd")

Verify alignment - RMSD should be lower

验证对齐效果 - RMSD应降低

mobile_original = mda.Universe('topology.pdb', 'trajectory.dcd') mobile_aligned = mda.Universe('topology.pdb', 'aligned_trajectory.dcd')
ref_backbone = reference.select_atoms('backbone')
mobile_original = mda.Universe('topology.pdb', 'trajectory.dcd') mobile_aligned = mda.Universe('topology.pdb', 'aligned_trajectory.dcd')
ref_backbone = reference.select_atoms('backbone')

Calculate RMSD before and after alignment

计算对齐前后的RMSD

rmsd_before = [] rmsd_after = []
for ts_orig, ts_align in zip(mobile_original.trajectory, mobile_aligned.trajectory): orig_backbone = mobile_original.select_atoms('backbone') align_backbone = mobile_aligned.select_atoms('backbone')
rmsd_before.append(rmsd(orig_backbone.positions, ref_backbone.positions, 
                        center=True, superposition=True))
rmsd_after.append(rmsd(align_backbone.positions, ref_backbone.positions,
                       center=False, superposition=False))
import numpy as np print(f"Mean RMSD before alignment: {np.mean(rmsd_before):.2f} Å") print(f"Mean RMSD after alignment: {np.mean(rmsd_after):.2f} Å")
undefined
rmsd_before = [] rmsd_after = []
for ts_orig, ts_align in zip(mobile_original.trajectory, mobile_aligned.trajectory): orig_backbone = mobile_original.select_atoms('backbone') align_backbone = mobile_aligned.select_atoms('backbone')
rmsd_before.append(rmsd(orig_backbone.positions, ref_backbone.positions, 
                        center=True, superposition=True))
rmsd_after.append(rmsd(align_backbone.positions, ref_backbone.positions,
                       center=False, superposition=False))
import numpy as np print(f"对齐前平均RMSD: {np.mean(rmsd_before):.2f} Å") print(f"对齐后平均RMSD: {np.mean(rmsd_after):.2f} Å")
undefined

PBC Unwrapping

PBC展开

python
import MDAnalysis as mda
from MDAnalysis.transformations import unwrap

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.transformations import unwrap

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Molecules can be split across periodic boundaries

分子可能被周期性边界分割

Unwrap to make them whole

展开使分子保持完整

protein = u.select_atoms('protein')
protein = u.select_atoms('protein')

Add unwrap transformation

添加展开变换

workflow = [unwrap(protein)] u.trajectory.add_transformations(*workflow)
workflow = [unwrap(protein)] u.trajectory.add_transformations(*workflow)

Now iterate - protein will be automatically unwrapped

现在遍历轨迹 - 蛋白质会自动展开

for ts in u.trajectory[:5]: com = protein.center_of_mass() print(f"Frame {ts.frame}: Protein COM = {com}")
for ts in u.trajectory[:5]: com = protein.center_of_mass() print(f"帧 {ts.frame}: 蛋白质质心 = {com}")

Save unwrapped trajectory

保存展开后的轨迹

with mda.Writer('unwrapped.dcd', protein.n_atoms) as W: for ts in u.trajectory: W.write(protein)
print("Unwrapped trajectory saved")
undefined
with mda.Writer('unwrapped.dcd', protein.n_atoms) as W: for ts in u.trajectory: W.write(protein)
print("展开后的轨迹已保存")
undefined

On-the-Fly Transformations

实时变换

python
import MDAnalysis as mda
from MDAnalysis.transformations import (
    center_in_box,
    wrap,
    unwrap,
    fit_rot_trans
)

u = mda.Universe('topology.pdb', 'trajectory.dcd')
reference = mda.Universe('reference.pdb')

protein = u.select_atoms('protein')
ref_protein = reference.select_atoms('protein')
python
import MDAnalysis as mda
from MDAnalysis.transformations import (
    center_in_box,
    wrap,
    unwrap,
    fit_rot_trans
)

u = mda.Universe('topology.pdb', 'trajectory.dcd')
reference = mda.Universe('reference.pdb')

protein = u.select_atoms('protein')
ref_protein = reference.select_atoms('protein')

Chain multiple transformations

链式多个变换

workflow = [ unwrap(protein), # 1. Unwrap protein center_in_box(protein), # 2. Center protein in box fit_rot_trans(protein, ref_protein) # 3. Align to reference ]
u.trajectory.add_transformations(*workflow)
workflow = [ unwrap(protein), # 1. 展开蛋白质 center_in_box(protein), # 2. 将蛋白质居中到盒子 fit_rot_trans(protein, ref_protein) # 3. 对齐到参考结构 ]
u.trajectory.add_transformations(*workflow)

All transformations applied automatically when iterating

遍历轨迹时会自动应用所有变换

for ts in u.trajectory[:10]: # Protein is now unwrapped, centered, and aligned rmsd_value = mda.analysis.rms.rmsd(protein.positions, ref_protein.positions) print(f"Frame {ts.frame}: RMSD = {rmsd_value:.2f} Å")
undefined
for ts in u.trajectory[:10]: # 蛋白质现在已展开、居中并对齐 rmsd_value = mda.analysis.rms.rmsd(protein.positions, ref_protein.positions) print(f"帧 {ts.frame}: RMSD = {rmsd_value:.2f} Å")
undefined

Advanced Analysis

高级分析

Principal Component Analysis (PCA)

主成分分析(PCA)

python
import MDAnalysis as mda
from MDAnalysis.analysis import pca
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis import pca
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Select atoms for PCA (usually CA atoms)

选择用于PCA的原子(通常是CA原子)

ca_atoms = u.select_atoms('protein and name CA')
ca_atoms = u.select_atoms('protein and name CA')

Perform PCA

执行PCA

pc = pca.PCA(u, select='protein and name CA', align=True) pc.run()
pc = pca.PCA(u, select='protein and name CA', align=True) pc.run()

Extract results

提取结果

n_pcs = min(10, pc.results.n_components) variance = pc.results.variance[:n_pcs] cumulative_variance = pc.results.cumulative_variance[:n_pcs]
n_pcs = min(10, pc.results.n_components) variance = pc.results.variance[:n_pcs] cumulative_variance = pc.results.cumulative_variance[:n_pcs]

Plot variance explained

绘制方差解释率

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
ax1.bar(range(1, n_pcs + 1), variance) ax1.set_xlabel('Principal Component') ax1.set_ylabel('Variance') ax1.set_title('Variance per PC')
ax2.plot(range(1, n_pcs + 1), cumulative_variance, marker='o') ax2.set_xlabel('Number of PCs') ax2.set_ylabel('Cumulative Variance') ax2.set_title('Cumulative Variance Explained') ax2.grid(True)
plt.tight_layout() plt.savefig('pca_variance.png', dpi=300)
print(f"Variance explained by first 3 PCs: {cumulative_variance[2]:.1%}")
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
ax1.bar(range(1, n_pcs + 1), variance) ax1.set_xlabel('主成分') ax1.set_ylabel('方差') ax1.set_title('各主成分的方差')
ax2.plot(range(1, n_pcs + 1), cumulative_variance, marker='o') ax2.set_xlabel('主成分数量') ax2.set_ylabel('累计方差解释率') ax2.set_title('累计方差解释率') ax2.grid(True)
plt.tight_layout() plt.savefig('pca_variance.png', dpi=300)
print(f"前3个主成分解释的方差: {cumulative_variance[2]:.1%}")

Project trajectory onto PC1 and PC2

将轨迹投影到PC1和PC2

projection = pc.transform(ca_atoms, n_components=2)
plt.figure(figsize=(8, 8)) plt.scatter(projection[:, 0], projection[:, 1], c=range(len(projection)), cmap='viridis', alpha=0.6) plt.colorbar(label='Frame') plt.xlabel('PC1') plt.ylabel('PC2') plt.title('Trajectory in PC1-PC2 Space') plt.savefig('pca_projection.png', dpi=300)
projection = pc.transform(ca_atoms, n_components=2)
plt.figure(figsize=(8, 8)) plt.scatter(projection[:, 0], projection[:, 1], c=range(len(projection)), cmap='viridis', alpha=0.6) plt.colorbar(label='帧') plt.xlabel('PC1') plt.ylabel('PC2') plt.title('轨迹在PC1-PC2空间的投影') plt.savefig('pca_projection.png', dpi=300)

Identify frames representing different conformational states

识别代表不同构象状态的帧

Cluster in PC space or select extremes

在PC空间聚类或选择极端值

pc1_min_frame = np.argmin(projection[:, 0]) pc1_max_frame = np.argmax(projection[:, 0])
print(f"PC1 minimum conformation: frame {pc1_min_frame}") print(f"PC1 maximum conformation: frame {pc1_max_frame}")
undefined
pc1_min_frame = np.argmin(projection[:, 0]) pc1_max_frame = np.argmax(projection[:, 0])
print(f"PC1最小构象对应的帧: {pc1_min_frame}") print(f"PC1最大构象对应的帧: {pc1_max_frame}")
undefined

Density Analysis

密度分析

python
import MDAnalysis as mda
from MDAnalysis.analysis import density
import numpy as np

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda
from MDAnalysis.analysis import density
import numpy as np

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Calculate water density around protein

计算蛋白质周围的水密度

water = u.select_atoms('resname WAT and name O')
water = u.select_atoms('resname WAT and name O')

Create density grid

创建密度网格

dens = density.DensityAnalysis(water, delta=1.0) # 1 Angstrom grid dens.run()
dens = density.DensityAnalysis(water, delta=1.0) # 1埃网格 dens.run()

Access density grid

获取密度网格

density_grid = dens.results.density
print(f"Density grid shape: {density_grid.grid.shape}") print(f"Mean density: {np.mean(density_grid.grid):.4f} atoms/ų") print(f"Max density: {np.max(density_grid.grid):.4f} atoms/ų")
density_grid = dens.results.density
print(f"密度网格形状: {density_grid.grid.shape}") print(f"平均密度: {np.mean(density_grid.grid):.4f} atoms/ų") print(f"最大密度: {np.max(density_grid.grid):.4f} atoms/ų")

Export for visualization (can view in VMD, PyMOL, ChimeraX)

导出用于可视化(可在VMD、PyMOL、ChimeraX中查看)

density_grid.export('water_density.dx', type='double') print("Density exported to water_density.dx")
density_grid.export('water_density.dx', type='double') print("密度已导出到water_density.dx")

Find high-density regions (binding sites)

寻找高密度区域(结合位点)

threshold = np.mean(density_grid.grid) + 2 * np.std(density_grid.grid) high_density_points = np.where(density_grid.grid > threshold)
print(f"Number of high-density voxels: {len(high_density_points[0])}")
undefined
threshold = np.mean(density_grid.grid) + 2 * np.std(density_grid.grid) high_density_points = np.where(density_grid.grid > threshold)
print(f"高密度体素数量: {len(high_density_points[0])}")
undefined

Native Contacts (Q Analysis)

天然接触(Q分析)

python
import MDAnalysis as mda
from MDAnalysis.analysis.contacts import Contacts
import numpy as np
import matplotlib.pyplot as plt
python
import MDAnalysis as mda
from MDAnalysis.analysis.contacts import Contacts
import numpy as np
import matplotlib.pyplot as plt

Load native structure and trajectory

加载天然结构和轨迹

native = mda.Universe('native.pdb') u = mda.Universe('topology.pdb', 'trajectory.dcd')
native = mda.Universe('native.pdb') u = mda.Universe('topology.pdb', 'trajectory.dcd')

Get native contacts (CA atoms within 8 Angstroms)

获取天然接触(距离小于8埃的CA原子对)

native_ca = native.select_atoms('protein and name CA') native_distances = mda.analysis.distances.distance_array( native_ca.positions, native_ca.positions )
native_ca = native.select_atoms('protein and name CA') native_distances = mda.analysis.distances.distance_array( native_ca.positions, native_ca.positions )

Define native contacts

定义天然接触

cutoff = 8.0 native_contacts = np.where((native_distances < cutoff) & (native_distances > 0)) n_native_contacts = len(native_contacts[0]) // 2
print(f"Number of native contacts: {n_native_contacts}")
cutoff = 8.0 native_contacts = np.where((native_distances < cutoff) & (native_distances > 0)) n_native_contacts = len(native_contacts[0]) // 2
print(f"天然接触对数量: {n_native_contacts}")

Calculate Q (fraction of native contacts) over trajectory

计算随时间变化的Q值(天然接触保留比例)

q_values = [] mobile_ca = u.select_atoms('protein and name CA')
for ts in u.trajectory: # Calculate current distances current_distances = mda.analysis.distances.distance_array( mobile_ca.positions, mobile_ca.positions )
# Count how many native contacts are still present
maintained_contacts = current_distances[native_contacts] < cutoff
q = np.sum(maintained_contacts) / n_native_contacts
q_values.append(q)
q_values = np.array(q_values) time = np.arange(len(q_values)) * u.trajectory.dt
q_values = [] mobile_ca = u.select_atoms('protein and name CA')
for ts in u.trajectory: # 计算当前距离 current_distances = mda.analysis.distances.distance_array( mobile_ca.positions, mobile_ca.positions )
# 统计仍保留的天然接触对数量
maintained_contacts = current_distances[native_contacts] < cutoff
q = np.sum(maintained_contacts) / n_native_contacts
q_values.append(q)
q_values = np.array(q_values) time = np.arange(len(q_values)) * u.trajectory.dt

Plot

绘图

plt.figure(figsize=(10, 6)) plt.plot(time, q_values) plt.xlabel('Time (ps)') plt.ylabel('Q (Fraction of Native Contacts)') plt.title('Native Contacts Conservation') plt.ylim(0, 1) plt.grid(True) plt.savefig('native_contacts.png', dpi=300)
print(f"Mean Q: {np.mean(q_values):.2f}") print(f"Q at final frame: {q_values[-1]:.2f}")
undefined
plt.figure(figsize=(10, 6)) plt.plot(time, q_values) plt.xlabel('时间 (ps)') plt.ylabel('Q值(天然接触保留比例)') plt.title('天然接触保留情况') plt.ylim(0, 1) plt.grid(True) plt.savefig('native_contacts.png', dpi=300)
print(f"平均Q值: {np.mean(q_values):.2f}") print(f"最后一帧Q值: {q_values[-1]:.2f}")
undefined

Integration with Other Tools

与其他工具集成

Converting to Biopython

转换为Biopython格式

python
import MDAnalysis as mda
from Bio.PDB import PDBIO
from Bio.PDB.Structure import Structure
from Bio.PDB.Model import Model
from Bio.PDB.Chain import Chain
from Bio.PDB.Residue import Residue
from Bio.PDB.Atom import Atom

def mda_to_biopython(universe, frame=0):
    """
    Convert MDAnalysis Universe to Biopython Structure.
    
    Args:
        universe: MDAnalysis Universe
        frame: Frame number to convert
    """
    # Go to specific frame
    universe.trajectory[frame]
    
    # Create Biopython structure
    structure = Structure('structure')
    model = Model(0)
    structure.add(model)
    
    # Get current segment
    for seg in universe.segments:
        chain = Chain(seg.segid if seg.segid else 'A')
        model.add(chain)
        
        for residue in seg.residues:
            # Create residue
            resname = residue.resname
            resid = residue.resid
            bio_residue = Residue((' ', resid, ' '), resname, '')
            chain.add(bio_residue)
            
            # Add atoms
            for atom in residue.atoms:
                bio_atom = Atom(
                    name=atom.name,
                    coord=atom.position,
                    bfactor=0.0,
                    occupancy=1.0,
                    altloc=' ',
                    fullname=atom.name,
                    serial_number=atom.ix,
                    element=atom.element if hasattr(atom, 'element') else atom.name[0]
                )
                bio_residue.add(bio_atom)
    
    return structure
python
import MDAnalysis as mda
from Bio.PDB import PDBIO
from Bio.PDB.Structure import Structure
from Bio.PDB.Model import Model
from Bio.PDB.Chain import Chain
from Bio.PDB.Residue import Residue
from Bio.PDB.Atom import Atom

def mda_to_biopython(universe, frame=0):
    """
    将MDAnalysis Universe转换为Biopython Structure。
    
    参数:
        universe: MDAnalysis Universe对象
        frame: 要转换的帧索引
    """
    # 跳转到指定帧
    universe.trajectory[frame]
    
    # 创建Biopython结构
    structure = Structure('structure')
    model = Model(0)
    structure.add(model)
    
    # 处理每个片段
    for seg in universe.segments:
        chain = Chain(seg.segid if seg.segid else 'A')
        model.add(chain)
        
        for residue in seg.residues:
            # 创建残基
            resname = residue.resname
            resid = residue.resid
            bio_residue = Residue((' ', resid, ' '), resname, '')
            chain.add(bio_residue)
            
            # 添加原子
            for atom in residue.atoms:
                bio_atom = Atom(
                    name=atom.name,
                    coord=atom.position,
                    bfactor=0.0,
                    occupancy=1.0,
                    altloc=' ',
                    fullname=atom.name,
                    serial_number=atom.ix,
                    element=atom.element if hasattr(atom, 'element') else atom.name[0]
                )
                bio_residue.add(bio_atom)
    
    return structure

Example usage

示例用法

u = mda.Universe('protein.pdb', 'trajectory.dcd')
u = mda.Universe('protein.pdb', 'trajectory.dcd')

Convert frame 100 to Biopython

将第100帧转换为Biopython格式

bio_structure = mda_to_biopython(u, frame=100)
bio_structure = mda_to_biopython(u, frame=100)

Save using Biopython

使用Biopython保存

io = PDBIO() io.set_structure(bio_structure) io.save('frame_100_biopython.pdb')
print("Converted to Biopython and saved")
undefined
io = PDBIO() io.set_structure(bio_structure) io.save('frame_100_biopython.pdb')
print("已转换为Biopython格式并保存")
undefined

Exporting for VMD/PyMOL

导出用于VMD/PyMOL

python
import MDAnalysis as mda

u = mda.Universe('topology.pdb', 'trajectory.dcd')
python
import MDAnalysis as mda

u = mda.Universe('topology.pdb', 'trajectory.dcd')

Select specific atoms

选择特定原子

protein = u.select_atoms('protein') important_residues = u.select_atoms('resid 10 20 30 40 50')
protein = u.select_atoms('protein') important_residues = u.select_atoms('resid 10 20 30 40 50')

Write selection to PDB

将选择结果写入PDB文件

protein.write('protein_only.pdb') important_residues.write('important_residues.pdb')
protein.write('protein_only.pdb') important_residues.write('important_residues.pdb')

Write trajectory (DCD format for VMD)

写入轨迹(VMD支持的DCD格式)

with mda.Writer('protein_trajectory.dcd', protein.n_atoms) as W: for ts in u.trajectory: W.write(protein)
with mda.Writer('protein_trajectory.dcd', protein.n_atoms) as W: for ts in u.trajectory: W.write(protein)

Write specific frames

写入特定帧

frames_to_save = [0, 50, 100, 150, 200] with mda.Writer('key_frames.pdb', protein.n_atoms) as W: for frame_idx in frames_to_save: u.trajectory[frame_idx] W.write(protein)
print("Exported files for visualization")
undefined
frames_to_save = [0, 50, 100, 150, 200] with mda.Writer('key_frames.pdb', protein.n_atoms) as W: for frame_idx in frames_to_save: u.trajectory[frame_idx] W.write(protein)
print("已导出用于可视化的文件")
undefined

Parallel Processing with Dask

使用Dask并行处理

python
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSD
from dask import delayed, compute
import numpy as np

def calculate_rmsd_chunk(topology, trajectory_chunk, reference):
    """
    Calculate RMSD for a chunk of trajectory.
    
    Args:
        topology: Topology file
        trajectory_chunk: List of trajectory files
        reference: Reference Universe
    """
    u = mda.Universe(topology, *trajectory_chunk)
    mobile = u.select_atoms('backbone')
    ref = reference.select_atoms('backbone')
    
    R = RMSD(mobile, ref, select='backbone')
    R.run()
    
    return R.results.rmsd[:, 2]
python
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSD
from dask import delayed, compute
import numpy as np

def calculate_rmsd_chunk(topology, trajectory_chunk, reference):
    """
    计算轨迹片段的RMSD。
    
    参数:
        topology: 拓扑文件路径
        trajectory_chunk: 轨迹文件列表
        reference: 参考Universe对象
    """
    u = mda.Universe(topology, *trajectory_chunk)
    mobile = u.select_atoms('backbone')
    ref = reference.select_atoms('backbone')
    
    R = RMSD(mobile, ref, select='backbone')
    R.run()
    
    return R.results.rmsd[:, 2]

Split trajectory into chunks

将轨迹分割为片段

topology = 'topology.pdb' trajectories = [f'traj_part{i}.dcd' for i in range(10)] reference = mda.Universe('reference.pdb')
topology = 'topology.pdb' trajectories = [f'traj_part{i}.dcd' for i in range(10)] reference = mda.Universe('reference.pdb')

Create delayed tasks

分割为多个片段

chunk_size = 2 chunks = [trajectories[i:i+chunk_size] for i in range(0, len(trajectories), chunk_size)]
tasks = [delayed(calculate_rmsd_chunk)(topology, chunk, reference) for chunk in chunks]
chunk_size = 2 chunks = [trajectories[i:i+chunk_size] for i in range(0, len(trajectories), chunk_size)]
tasks = [delayed(calculate_rmsd_chunk)(topology, chunk, reference) for chunk in chunks]

Compute in parallel

并行计算

results = compute(*tasks)
results = compute(*tasks)

Combine results

合并结果

rmsd_all = np.concatenate(results)
print(f"Total frames processed: {len(rmsd_all)}") print(f"Mean RMSD: {np.mean(rmsd_all):.2f} Å")
undefined
rmsd_all = np.concatenate(results)
print(f"处理的总帧数: {len(rmsd_all)}") print(f"平均RMSD: {np.mean(rmsd_all):.2f} Å")
undefined

Practical Workflows

实用工作流

Complete Protein Stability Analysis

完整蛋白质稳定性分析

python
import MDAnalysis as mda
from MDAnalysis.analysis import rms, align
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt

def analyze_protein_stability(topology, trajectory, output_prefix='analysis'):
    """
    Complete workflow for protein stability analysis.
    
    Analyzes:
    - RMSD (backbone, all-atom)
    - RMSF per residue
    - Radius of gyration
    - Hydrogen bonds
    - Secondary structure retention (if possible)
    """
    print("Loading trajectory...")
    u = mda.Universe(topology, trajectory)
    
    # Create reference (first frame)
    reference = u.copy()
    reference.trajectory[0]
    
    results = {}
    
    # 1. RMSD Analysis
    print("Calculating RMSD...")
    protein = u.select_atoms('protein')
    ref_protein = reference.select_atoms('protein')
    
    # Backbone RMSD
    rmsd_bb = rms.RMSD(protein, ref_protein, select='backbone')
    rmsd_bb.run()
    
    # All-atom RMSD
    rmsd_all = rms.RMSD(protein, ref_protein, select='protein')
    rmsd_all.run()
    
    results['rmsd_backbone'] = rmsd_bb.results.rmsd[:, 2]
    results['rmsd_allatom'] = rmsd_all.results.rmsd[:, 2]
    
    # 2. RMSF Analysis
    print("Calculating RMSF...")
    ca_atoms = u.select_atoms('protein and name CA')
    rmsf_analysis = rms.RMSF(ca_atoms)
    rmsf_analysis.run()
    
    results['rmsf'] = rmsf_analysis.results.rmsf
    results['residue_ids'] = [atom.resid for atom in ca_atoms]
    
    # 3. Radius of Gyration
    print("Calculating Radius of Gyration...")
    rg_values = []
    for ts in u.trajectory:
        rg_values.append(protein.radius_of_gyration())
    results['radius_gyration'] = np.array(rg_values)
    
    # 4. Hydrogen Bonds
    print("Analyzing hydrogen bonds...")
    hbonds = HydrogenBondAnalysis(
        universe=u,
        donors_sel='protein',
        hydrogens_sel='protein',
        acceptors_sel='protein'
    )
    hbonds.run()
    
    # Count H-bonds per frame
    hbond_counts = []
    for frame in range(len(u.trajectory)):
        n_hbonds = len(hbonds.results.hbonds[
            hbonds.results.hbonds[:, 0] == frame
        ])
        hbond_counts.append(n_hbonds)
    results['hbond_counts'] = np.array(hbond_counts)
    
    # Generate plots
    print("Generating plots...")
    time = np.arange(len(u.trajectory)) * u.trajectory.dt
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # RMSD
    axes[0, 0].plot(time, results['rmsd_backbone'], label='Backbone')
    axes[0, 0].plot(time, results['rmsd_allatom'], label='All-atom', alpha=0.7)
    axes[0, 0].set_xlabel('Time (ps)')
    axes[0, 0].set_ylabel('RMSD (Å)')
    axes[0, 0].set_title('RMSD over time')
    axes[0, 0].legend()
    axes[0, 0].grid(True)
    
    # RMSF
    axes[0, 1].plot(results['residue_ids'], results['rmsf'])
    axes[0, 1].set_xlabel('Residue Number')
    axes[0, 1].set_ylabel('RMSF (Å)')
    axes[0, 1].set_title('Per-residue Fluctuations')
    axes[0, 1].grid(True)
    
    # Radius of Gyration
    axes[1, 0].plot(time, results['radius_gyration'])
    axes[1, 0].set_xlabel('Time (ps)')
    axes[1, 0].set_ylabel('Rg (Å)')
    axes[1, 0].set_title('Radius of Gyration')
    axes[1, 0].grid(True)
    
    # Hydrogen Bonds
    axes[1, 1].plot(time, results['hbond_counts'])
    axes[1, 1].set_xlabel('Time (ps)')
    axes[1, 1].set_ylabel('Number of H-bonds')
    axes[1, 1].set_title('Intramolecular Hydrogen Bonds')
    axes[1, 1].grid(True)
    
    plt.tight_layout()
    plt.savefig(f'{output_prefix}_stability.png', dpi=300)
    print(f"Saved plot: {output_prefix}_stability.png")
    
    # Print summary
    print("\n" + "="*60)
    print("STABILITY ANALYSIS SUMMARY")
    print("="*60)
    print(f"Trajectory length: {len(u.trajectory)} frames ({time[-1]:.1f} ps)")
    print(f"\nRMSD (backbone):")
    print(f"  Mean: {np.mean(results['rmsd_backbone']):.2f} Å")
    print(f"  Max: {np.max(results['rmsd_backbone']):.2f} Å")
    print(f"  Final: {results['rmsd_backbone'][-1]:.2f} Å")
    print(f"\nRMSF:")
    print(f"  Mean: {np.mean(results['rmsf']):.2f} Å")
    print(f"  Most flexible residue: {results['residue_ids'][np.argmax(results['rmsf'])]}")
    print(f"\nRadius of Gyration:")
    print(f"  Mean: {np.mean(results['radius_gyration']):.2f} Å")
    print(f"  Std: {np.std(results['radius_gyration']):.2f} Å")
    print(f"\nHydrogen Bonds:")
    print(f"  Mean: {np.mean(results['hbond_counts']):.1f}")
    print(f"  Range: {np.min(results['hbond_counts'])}-{np.max(results['hbond_counts'])}")
    
    # Stability assessment
    print("\n" + "="*60)
    print("STABILITY ASSESSMENT")
    print("="*60)
    
    if np.mean(results['rmsd_backbone']) < 2.0:
        print("✓ Structure is STABLE (low RMSD)")
    elif np.mean(results['rmsd_backbone']) < 4.0:
        print("⚠ Structure shows MODERATE fluctuations")
    else:
        print("✗ Structure is UNSTABLE (high RMSD)")
    
    if np.std(results['radius_gyration']) < 1.0:
        print("✓ Protein maintains COMPACT structure")
    else:
        print("⚠ Protein shows conformational EXPANSION/CONTRACTION")
    
    return results
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms, align
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt

def analyze_protein_stability(topology, trajectory, output_prefix='analysis'):
    """
    完整的蛋白质稳定性分析工作流。
    
    分析内容:
    - RMSD(主链、全原子)
    - 残基水平RMSF
    - 回转半径
    - 氢键
    - 二级结构保留(如果支持)
    """
    print("加载轨迹...")
    u = mda.Universe(topology, trajectory)
    
    # 创建参考结构(第一帧)
    reference = u.copy()
    reference.trajectory[0]
    
    results = {}
    
    # 1. RMSD分析
    print("计算RMSD...")
    protein = u.select_atoms('protein')
    ref_protein = reference.select_atoms('protein')
    
    # 主链RMSD
    rmsd_bb = rms.RMSD(protein, ref_protein, select='backbone')
    rmsd_bb.run()
    
    # 全原子RMSD
    rmsd_all = rms.RMSD(protein, ref_protein, select='protein')
    rmsd_all.run()
    
    results['rmsd_backbone'] = rmsd_bb.results.rmsd[:, 2]
    results['rmsd_allatom'] = rmsd_all.results.rmsd[:, 2]
    
    # 2. RMSF分析
    print("计算RMSF...")
    ca_atoms = u.select_atoms('protein and name CA')
    rmsf_analysis = rms.RMSF(ca_atoms)
    rmsf_analysis.run()
    
    results['rmsf'] = rmsf_analysis.results.rmsf
    results['residue_ids'] = [atom.resid for atom in ca_atoms]
    
    # 3. 回转半径
    print("计算回转半径...")
    rg_values = []
    for ts in u.trajectory:
        rg_values.append(protein.radius_of_gyration())
    results['radius_gyration'] = np.array(rg_values)
    
    # 4. 氢键分析
    print("分析氢键...")
    hbonds = HydrogenBondAnalysis(
        universe=u,
        donors_sel='protein',
        hydrogens_sel='protein',
        acceptors_sel='protein'
    )
    hbonds.run()
    
    # 统计每帧的氢键数量
    hbond_counts = []
    for frame in range(len(u.trajectory)):
        n_hbonds = len(hbonds.results.hbonds[
            hbonds.results.hbonds[:, 0] == frame
        ])
        hbond_counts.append(n_hbonds)
    results['hbond_counts'] = np.array(hbond_counts)
    
    # 生成绘图
    print("生成绘图...")
    time = np.arange(len(u.trajectory)) * u.trajectory.dt
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # RMSD图
    axes[0, 0].plot(time, results['rmsd_backbone'], label='主链')
    axes[0, 0].plot(time, results['rmsd_allatom'], label='全原子', alpha=0.7)
    axes[0, 0].set_xlabel('时间 (ps)')
    axes[0, 0].set_ylabel('RMSD (Å)')
    axes[0, 0].set_title('RMSD随时间变化')
    axes[0, 0].legend()
    axes[0, 0].grid(True)
    
    # RMSF图
    axes[0, 1].plot(results['residue_ids'], results['rmsf'])
    axes[0, 1].set_xlabel('残基编号')
    axes[0, 1].set_ylabel('RMSF (Å)')
    axes[0, 1].set_title('残基水平波动')
    axes[0, 1].grid(True)
    
    # 回转半径图
    axes[1, 0].plot(time, results['radius_gyration'])
    axes[1, 0].set_xlabel('时间 (ps)')
    axes[1, 0].set_ylabel('Rg (Å)')
    axes[1, 0].set_title('回转半径')
    axes[1, 0].grid(True)
    
    # 氢键图
    axes[1, 1].plot(time, results['hbond_counts'])
    axes[1, 1].set_xlabel('时间 (ps)')
    axes[1, 1].set_ylabel('氢键数量')
    axes[1, 1].set_title('分子内氢键')
    axes[1, 1].grid(True)
    
    plt.tight_layout()
    plt.savefig(f'{output_prefix}_stability.png', dpi=300)
    print(f"绘图已保存: {output_prefix}_stability.png")
    
    # 打印总结
    print("\n" + "="*60)
    print("稳定性分析总结")
    print("="*60)
    print(f"轨迹长度: {len(u.trajectory)} 帧({time[-1]:.1f} ps)")
    print(f"\nRMSD(主链):")
    print(f"  平均值: {np.mean(results['rmsd_backbone']):.2f} Å")
    print(f"  最大值: {np.max(results['rmsd_backbone']):.2f} Å")
    print(f"  最后一帧值: {results['rmsd_backbone'][-1]:.2f} Å")
    print(f"\nRMSF:")
    print(f"  平均值: {np.mean(results['rmsf']):.2f} Å")
    print(f"  最柔性残基: {results['residue_ids'][np.argmax(results['rmsf'])]}")
    print(f"\n回转半径:")
    print(f"  平均值: {np.mean(results['radius_gyration']):.2f} Å")
    print(f"  标准差: {np.std(results['radius_gyration']):.2f} Å")
    print(f"\n氢键:")
    print(f"  平均值: {np.mean(results['hbond_counts']):.1f}")
    print(f"  范围: {np.min(results['hbond_counts'])}-{np.max(results['hbond_counts'])}")
    
    # 稳定性评估
    print("\n" + "="*60)
    print("稳定性评估")
    print("="*60)
    
    if np.mean(results['rmsd_backbone']) < 2.0:
        print("✓ 结构稳定(RMSD低)")
    elif np.mean(results['rmsd_backbone']) < 4.0:
        print("⚠ 结构存在中等波动")
    else:
        print("✗ 结构不稳定(RMSD高)")
    
    if np.std(results['radius_gyration']) < 1.0:
        print("✓ 蛋白质保持紧凑结构")
    else:
        print("⚠ 蛋白质存在构象扩张/收缩")
    
    return results

Example usage

示例用法

results = analyze_protein_stability( 'topology.pdb', 'trajectory.dcd', output_prefix='protein' )
undefined
results = analyze_protein_stability( 'topology.pdb', 'trajectory.dcd', output_prefix='protein' )
undefined

Protein-Ligand Binding Analysis

蛋白质-配体结合分析

python
import MDAnalysis as mda
from MDAnalysis.analysis import distances, contacts
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt

def analyze_protein_ligand_binding(topology, trajectory, 
                                   ligand_resname='LIG',
                                   binding_site_resids=None):
    """
    Comprehensive protein-ligand binding analysis.
    
    Args:
        topology: Topology file
        trajectory: Trajectory file
        ligand_resname: Residue name of ligand
        binding_site_resids: List of binding site residue IDs
    """
    print("Loading complex...")
    u = mda.Universe(topology, trajectory)
    
    protein = u.select_atoms('protein')
    ligand = u.select_atoms(f'resname {ligand_resname}')
    
    if len(ligand) == 0:
        raise ValueError(f"No ligand found with resname {ligand_resname}")
    
    print(f"Found ligand with {len(ligand)} atoms")
    
    results = {}
    
    # 1. Ligand RMSD
    print("Calculating ligand RMSD...")
    from MDAnalysis.analysis.rms import RMSD
    ligand_rmsd = RMSD(ligand, ligand, select=f'resname {ligand_resname}',
                       ref_frame=0)
    ligand_rmsd.run()
    results['ligand_rmsd'] = ligand_rmsd.results.rmsd[:, 2]
    
    # 2. Protein-Ligand Distance
    print("Calculating protein-ligand distance...")
    if binding_site_resids:
        binding_site = u.select_atoms(f'protein and resid {" ".join(map(str, binding_site_resids))}')
    else:
        binding_site = protein
    
    min_distances = []
    for ts in u.trajectory:
        dist_array = distances.distance_array(
            ligand.center_of_mass()[np.newaxis, :],
            binding_site.positions,
            box=u.dimensions
        )
        min_distances.append(np.min(dist_array))
    results['min_distance'] = np.array(min_distances)
    
    # 3. Protein-Ligand Contacts
    print("Analyzing contacts...")
    ca = contacts.Contacts(
        u,
        selection=(ligand, protein),
        refgroup=(ligand, protein),
        radius=4.5
    )
    ca.run()
    results['contacts'] = ca.results.timeseries[:, 1]
    
    # 4. Protein-Ligand H-bonds
    print("Analyzing hydrogen bonds...")
    # Ligand as donor
    hb_lig_don = HydrogenBondAnalysis(
        universe=u,
        donors_sel=f'resname {ligand_resname}',
        hydrogens_sel=f'resname {ligand_resname}',
        acceptors_sel='protein'
    )
    hb_lig_don.run()
    
    # Protein as donor
    hb_prot_don = HydrogenBondAnalysis(
        universe=u,
        donors_sel='protein',
        hydrogens_sel='protein',
        acceptors_sel=f'resname {ligand_resname}'
    )
    hb_prot_don.run()
    
    # Combine
    hbond_counts = []
    for frame in range(len(u.trajectory)):
        n1 = len(hb_lig_don.results.hbonds[hb_lig_don.results.hbonds[:, 0] == frame])
        n2 = len(hb_prot_don.results.hbonds[hb_prot_don.results.hbonds[:, 0] == frame])
        hbond_counts.append(n1 + n2)
    results['hbonds'] = np.array(hbond_counts)
    
    # Plot results
    print("Generating plots...")
    time = np.arange(len(u.trajectory)) * u.trajectory.dt
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Ligand RMSD
    axes[0, 0].plot(time, results['ligand_rmsd'])
    axes[0, 0].set_xlabel('Time (ps)')
    axes[0, 0].set_ylabel('Ligand RMSD (Å)')
    axes[0, 0].set_title('Ligand Stability')
    axes[0, 0].grid(True)
    
    # Distance to binding site
    axes[0, 1].plot(time, results['min_distance'])
    axes[0, 1].set_xlabel('Time (ps)')
    axes[0, 1].set_ylabel('Min Distance (Å)')
    axes[0, 1].set_title('Ligand-Protein Proximity')
    axes[0, 1].grid(True)
    
    # Contacts
    axes[1, 0].plot(time, results['contacts'])
    axes[1, 0].set_xlabel('Time (ps)')
    axes[1, 0].set_ylabel('Number of Contacts')
    axes[1, 0].set_title('Protein-Ligand Contacts (< 4.5 Å)')
    axes[1, 0].grid(True)
    
    # H-bonds
    axes[1, 1].plot(time, results['hbonds'])
    axes[1, 1].set_xlabel('Time (ps)')
    axes[1, 1].set_ylabel('Number of H-bonds')
    axes[1, 1].set_title('Protein-Ligand Hydrogen Bonds')
    axes[1, 1].grid(True)
    
    plt.tight_layout()
    plt.savefig('protein_ligand_analysis.png', dpi=300)
    
    # Summary
    print("\n" + "="*60)
    print("PROTEIN-LIGAND BINDING SUMMARY")
    print("="*60)
    print(f"\nLigand RMSD:")
    print(f"  Mean: {np.mean(results['ligand_rmsd']):.2f} Å")
    print(f"  Max: {np.max(results['ligand_rmsd']):.2f} Å")
    print(f"\nLigand-Protein Distance:")
    print(f"  Mean: {np.mean(results['min_distance']):.2f} Å")
    print(f"  Min: {np.min(results['min_distance']):.2f} Å")
    print(f"\nContacts:")
    print(f"  Mean: {np.mean(results['contacts']):.1f}")
    print(f"  Range: {np.min(results['contacts'])}-{np.max(results['contacts'])}")
    print(f"\nHydrogen Bonds:")
    print(f"  Mean: {np.mean(results['hbonds']):.1f}")
    print(f"  Max: {np.max(results['hbonds'])}")
    
    # Binding assessment
    print("\n" + "="*60)
    print("BINDING ASSESSMENT")
    print("="*60)
    
    if np.mean(results['ligand_rmsd']) < 2.0:
        print("✓ Ligand remains STABLE in binding site")
    else:
        print("⚠ Ligand shows SIGNIFICANT movement")
    
    if np.mean(results['contacts']) > 5:
        print("✓ STRONG protein-ligand interactions")
    elif np.mean(results['contacts']) > 2:
        print("⚠ MODERATE protein-ligand interactions")
    else:
        print("✗ WEAK protein-ligand interactions")
    
    if np.mean(results['hbonds']) >= 2:
        print(f"✓ Ligand forms stable H-bonds ({np.mean(results['hbonds']):.1f} average)")
    elif np.mean(results['hbonds']) >= 1:
        print(f"⚠ Limited H-bonding ({np.mean(results['hbonds']):.1f} average)")
    else:
        print("✗ Minimal H-bonding interaction")
    
    return results
python
import MDAnalysis as mda
from MDAnalysis.analysis import distances, contacts
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt

def analyze_protein_ligand_binding(topology, trajectory, 
                                   ligand_resname='LIG',
                                   binding_site_resids=None):
    """
    完整的蛋白质-配体结合分析工作流。
    
    参数:
        topology: 拓扑文件路径
        trajectory: 轨迹文件路径
        ligand_resname: 配体的残基名称
        binding_site_resids: 结合位点残基ID列表
    """
    print("加载复合物...")
    u = mda.Universe(topology, trajectory)
    
    protein = u.select_atoms('protein')
    ligand = u.select_atoms(f'resname {ligand_resname}')
    
    if len(ligand) == 0:
        raise ValueError(f"未找到残基名称为 {ligand_resname} 的配体")
    
    print(f"找到配体,包含 {len(ligand)} 个原子")
    
    results = {}
    
    # 1. 配体RMSD
    print("计算配体RMSD...")
    from MDAnalysis.analysis.rms import RMSD
    ligand_rmsd = RMSD(ligand, ligand, select=f'resname {ligand_resname}',
                       ref_frame=0)
    ligand_rmsd.run()
    results['ligand_rmsd'] = ligand_rmsd.results.rmsd[:, 2]
    
    # 2. 蛋白质-配体距离
    print("计算蛋白质-配体距离...")
    if binding_site_resids:
        binding_site = u.select_atoms(f'protein and resid {" ".join(map(str, binding_site_resids))}')
    else:
        binding_site = protein
    
    min_distances = []
    for ts in u.trajectory:
        dist_array = distances.distance_array(
            ligand.center_of_mass()[np.newaxis, :],
            binding_site.positions,
            box=u.dimensions
        )
        min_distances.append(np.min(dist_array))
    results['min_distance'] = np.array(min_distances)
    
    # 3. 蛋白质-配体接触对
    print("分析接触对...")
    ca = contacts.Contacts(
        u,
        selection=(ligand, protein),
        refgroup=(ligand, protein),
        radius=4.5
    )
    ca.run()
    results['contacts'] = ca.results.timeseries[:, 1]
    
    # 4. 蛋白质-配体氢键
    print("分析氢键...")
    # 配体作为供体
    hb_lig_don = HydrogenBondAnalysis(
        universe=u,
        donors_sel=f'resname {ligand_resname}',
        hydrogens_sel=f'resname {ligand_resname}',
        acceptors_sel='protein'
    )
    hb_lig_don.run()
    
    # 蛋白质作为供体
    hb_prot_don = HydrogenBondAnalysis(
        universe=u,
        donors_sel='protein',
        hydrogens_sel='protein',
        acceptors_sel=f'resname {ligand_resname}'
    )
    hb_prot_don.run()
    
    # 合并结果
    hbond_counts = []
    for frame in range(len(u.trajectory)):
        n1 = len(hb_lig_don.results.hbonds[hb_lig_don.results.hbonds[:, 0] == frame])
        n2 = len(hb_prot_don.results.hbonds[hb_prot_don.results.hbonds[:, 0] == frame])
        hbond_counts.append(n1 + n2)
    results['hbonds'] = np.array(hbond_counts)
    
    # 生成绘图
    print("生成绘图...")
    time = np.arange(len(u.trajectory)) * u.trajectory.dt
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # 配体RMSD图
    axes[0, 0].plot(time, results['ligand_rmsd'])
    axes[0, 0].set_xlabel('时间 (ps)')
    axes[0, 0].set_ylabel('配体RMSD (Å)')
    axes[0, 0].set_title('配体稳定性')
    axes[0, 0].grid(True)
    
    # 配体-蛋白质距离图
    axes[0, 1].plot(time, results['min_distance'])
    axes[0, 1].set_xlabel('时间 (ps)')
    axes[0, 1].set_ylabel('最小距离 (Å)')
    axes[0, 1].set_title('配体-蛋白质接近程度')
    axes[0, 1].grid(True)
    
    # 接触对图
    axes[1, 0].plot(time, results['contacts'])
    axes[1, 0].set_xlabel('时间 (ps)')
    axes[1, 0].set_ylabel('接触对数量')
    axes[1, 0].set_title('蛋白质-配体接触对(< 4.5 Å)')
    axes[1, 0].grid(True)
    
    # 氢键图
    axes[1, 1].plot(time, results['hbonds'])
    axes[1, 1].set_xlabel('时间 (ps)')
    axes[1, 1].set_ylabel('氢键数量')
    axes[1, 1].set_title('蛋白质-配体氢键')
    axes[1, 1].grid(True)
    
    plt.tight_layout()
    plt.savefig('protein_ligand_analysis.png', dpi=300)
    
    # 打印总结
    print("\n" + "="*60)
    print("蛋白质-配体结合分析总结")
    print("="*60)
    print(f"\n配体RMSD:")
    print(f"  平均值: {np.mean(results['ligand_rmsd']):.2f} Å")
    print(f"  最大值: {np.max(results['ligand_rmsd']):.2f} Å")
    print(f"\n配体-蛋白质距离:")
    print(f"  平均值: {np.mean(results['min_distance']):.2f} Å")
    print(f"  最小值: {np.min(results['min_distance']):.2f} Å")
    print(f"\n接触对:")
    print(f"  平均值: {np.mean(results['contacts']):.1f}")
    print(f"  范围: {np.min(results['contacts'])}-{np.max(results['contacts'])}")
    print(f"\n氢键:")
    print(f"  平均值: {np.mean(results['hbonds']):.1f}")
    print(f"  最大值: {np.max(results['hbonds'])}")
    
    # 结合评估
    print("\n" + "="*60)
    print("结合评估")
    print("="*60)
    
    if np.mean(results['ligand_rmsd']) < 2.0:
        print("✓ 配体在结合位点保持稳定")
    else:
        print("⚠ 配体存在显著移动")
    
    if np.mean(results['contacts']) > 5:
        print("✓ 蛋白质-配体相互作用强")
    elif np.mean(results['contacts']) > 2:
        print("⚠ 蛋白质-配体相互作用中等")
    else:
        print("✗ 蛋白质-配体相互作用弱")
    
    if np.mean(results['hbonds']) >= 2:
        print(f"✓ 配体形成稳定氢键(平均 {np.mean(results['hbonds']):.1f} 个)")
    elif np.mean(results['hbonds']) >= 1:
        print(f"⚠ 氢键形成有限(平均 {np.mean(results['hbonds']):.1f} 个)")
    else:
        print("✗ 氢键相互作用极少")
    
    return results

Example usage

示例用法

results = analyze_protein_ligand_binding( 'complex.pdb', 'trajectory.dcd', ligand_resname='LIG', binding_site_resids=[45, 46, 89, 92, 115, 118] )

This comprehensive MDAnalysis guide covers 50+ examples for molecular dynamics trajectory analysis!
results = analyze_protein_ligand_binding( 'complex.pdb', 'trajectory.dcd', ligand_resname='LIG', binding_site_resids=[45, 46, 89, 92, 115, 118] )

这份全面的MDAnalysis指南涵盖了50多个分子动力学轨迹分析示例!