mdanalysis
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseMDAnalysis - Molecular Dynamics Analysis
MDAnalysis - 分子动力学分析
Python library for reading, writing, and analyzing molecular dynamics trajectories and structural files.
用于读取、写入和分析分子动力学轨迹与结构文件的Python库。
When to Use
适用场景
- Loading MD trajectories (DCD, XTC, TRR, NetCDF, etc.)
- RMSD and RMSF calculations
- Distance, angle, and dihedral analysis
- Atom selections (VMD-like syntax)
- Hydrogen bond analysis
- Solvent Accessible Surface Area (SASA)
- Protein secondary structure analysis
- Membrane system analysis
- Water/ion distribution analysis
- Trajectory alignment and fitting
- Custom trajectory analysis
- Converting between file formats
- 加载MD轨迹(DCD、XTC、TRR、NetCDF等格式)
- RMSD与RMSF计算
- 距离、角度与二面角分析
- 原子选择(类VMD语法)
- 氢键分析
- 溶剂可及表面积(SASA)计算
- 蛋白质二级结构分析
- 膜系统分析
- 水/离子分布分析
- 轨迹对齐与拟合
- 自定义轨迹分析
- 文件格式转换
Reference Documentation
参考文档
Official docs: https://www.mdanalysis.org/docs/
Search patterns:, ,
Search patterns:
MDAnalysis.UniverseMDAnalysis.analysis.rmsMDAnalysis.analysis.distances官方文档:https://www.mdanalysis.org/docs/
搜索关键词:、、
搜索关键词:
MDAnalysis.UniverseMDAnalysis.analysis.rmsMDAnalysis.analysis.distancesCore Principles
核心原则
Use MDAnalysis For
使用MDAnalysis的场景
| Task | Module | Example |
|---|---|---|
| Load trajectory | | |
| RMSD calculation | | |
| Atom selection | | |
| Distance analysis | | |
| H-bond analysis | | |
| SASA calculation | | |
| Contacts analysis | | |
| Trajectory writing | | |
| 任务 | 模块 | 示例 |
|---|---|---|
| 加载轨迹 | | |
| 计算RMSD | | |
| 原子选择 | | |
| 距离分析 | | |
| 氢键分析 | | |
| SASA计算 | | |
| 接触分析 | | |
| 轨迹写入 | | |
Do NOT Use For
不适合使用MDAnalysis的场景
- Running MD simulations (use GROMACS, AMBER, NAMD)
- Force field calculations (use OpenMM, MDTraj)
- Quantum chemistry (use PySCF, Qiskit)
- Protein structure prediction (use AlphaFold, RosettaFold)
- Initial structure building (use Biopython, PyMOL)
- 运行MD模拟(使用GROMACS、AMBER、NAMD)
- 力场计算(使用OpenMM、MDTraj)
- 量子化学计算(使用PySCF、Qiskit)
- 蛋白质结构预测(使用AlphaFold、RosettaFold)
- 初始结构构建(使用Biopython、PyMOL)
Quick Reference
快速参考
Installation
安装
bash
undefinedbash
undefinedpip
pip安装
pip install MDAnalysis
pip install MDAnalysis
With additional analysis modules
安装包含额外分析模块的版本
pip install MDAnalysis[analysis]
pip install MDAnalysis[analysis]
conda
conda安装
conda install -c conda-forge mdanalysis
conda install -c conda-forge mdanalysis
Development version
安装开发版本
pip install git+https://github.com/MDAnalysis/mdanalysis.git
undefinedpip install git+https://github.com/MDAnalysis/mdanalysis.git
undefinedStandard Imports
标准导入
python
undefinedpython
undefinedCore imports
核心模块导入
import MDAnalysis as mda
from MDAnalysis import Universe
from MDAnalysis.analysis import rms, align, distances
import MDAnalysis as mda
from MDAnalysis import Universe
from MDAnalysis.analysis import rms, align, distances
Common analysis modules
常用分析模块
from MDAnalysis.analysis.rms import RMSD, RMSF
from MDAnalysis.analysis.distances import distance_array
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
from MDAnalysis.analysis.dihedrals import Dihedral
from MDAnalysis.analysis.rms import RMSD, RMSF
from MDAnalysis.analysis.distances import distance_array
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
from MDAnalysis.analysis.dihedrals import Dihedral
Utilities
工具模块
import numpy as np
import matplotlib.pyplot as plt
undefinedimport numpy as np
import matplotlib.pyplot as plt
undefinedBasic Pattern - Load and Analyze
基础模式 - 加载与分析
python
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSDpython
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSDLoad trajectory
加载轨迹
u = mda.Universe('topology.pdb', 'trajectory.dcd')
u = mda.Universe('topology.pdb', 'trajectory.dcd')
Select atoms
选择原子
protein = u.select_atoms('protein')
ca_atoms = u.select_atoms('protein and name CA')
protein = u.select_atoms('protein')
ca_atoms = u.select_atoms('protein and name CA')
Calculate RMSD
计算RMSD
rmsd_analysis = RMSD(protein, protein, select='backbone')
rmsd_analysis.run()
rmsd_analysis = RMSD(protein, protein, select='backbone')
rmsd_analysis.run()
Access results
获取结果
rmsd = rmsd_analysis.results.rmsd
print(f"RMSD over time: {rmsd[:, 2]}") # Column 2 is RMSD
undefinedrmsd = rmsd_analysis.results.rmsd
print(f"随时间变化的RMSD: {rmsd[:, 2]}") # 第2列为RMSD数值
undefinedBasic Pattern - Atom Selection
基础模式 - 原子选择
python
import MDAnalysis as mda
u = mda.Universe('structure.pdb')python
import MDAnalysis as mda
u = mda.Universe('structure.pdb')Various selections (VMD-like syntax)
多种选择方式(类VMD语法)
protein = u.select_atoms('protein')
backbone = u.select_atoms('backbone')
ca = u.select_atoms('name CA')
resid_10 = u.select_atoms('resid 10')
within_5A = u.select_atoms('around 5 resid 10')
water = u.select_atoms('resname WAT or resname HOH')
print(f"Number of protein atoms: {len(protein)}")
print(f"Number of CA atoms: {len(ca)}")
undefinedprotein = u.select_atoms('protein')
backbone = u.select_atoms('backbone')
ca = u.select_atoms('name CA')
resid_10 = u.select_atoms('resid 10')
within_5A = u.select_atoms('around 5 resid 10')
water = u.select_atoms('resname WAT or resname HOH')
print(f"蛋白质原子数量: {len(protein)}")
print(f"CA原子数量: {len(ca)}")
undefinedCritical Rules
关键规则
✅ DO
✅ 建议做法
- Close trajectory files - Use context managers or close explicitly
- Use atom selections efficiently - Cache selections for reuse
- Check trajectory length - Verify n_frames before analysis
- Use vectorized operations - Leverage NumPy for speed
- Align trajectories - Align before RMSD calculations
- Handle periodic boundaries - Use PBC-aware distance calculations
- Validate atom groups - Check empty selections
- Use appropriate frames - Slice trajectories if needed
- Save intermediate results - Don't recompute expensive calculations
- Check units - MDAnalysis uses Angstroms and picoseconds
- 关闭轨迹文件 - 使用上下文管理器或显式关闭
- 高效使用原子选择 - 缓存选择结果以便复用
- 检查轨迹长度 - 分析前验证帧数
- 使用向量化操作 - 借助NumPy提升速度
- 对齐轨迹 - 计算RMSD前先对齐轨迹
- 处理周期性边界 - 使用支持PBC的距离计算方法
- 验证原子组 - 检查选择结果是否为空
- 使用合适的帧 - 必要时对轨迹进行切片
- 保存中间结果 - 避免重复计算耗时操作
- 检查单位 - MDAnalysis使用埃(Angstroms)和皮秒(picoseconds)作为单位
❌ DON'T
❌ 禁止做法
- Load entire trajectory in memory - Stream through frames
- Ignore PBC - Always consider periodic boundary conditions
- Forget to align - RMSD without alignment is meaningless
- Use wrong atom names - Check topology for correct names
- Mix coordinate systems - Be consistent with units
- Ignore missing atoms - Handle incomplete residues
- Recompute unnecessarily - Cache expensive calculations
- Use string selections in loops - Parse once, reuse
- Forget to unwrap coordinates - Handle molecules split by PBC
- Ignore memory limits - Process large trajectories in chunks
- 将整个轨迹加载到内存 - 逐帧流式处理
- 忽略PBC - 始终考虑周期性边界条件
- 忘记对齐轨迹 - 未对齐的RMSD计算无意义
- 使用错误的原子名称 - 检查拓扑文件确认正确名称
- 混合坐标系 - 保持单位一致性
- 忽略缺失原子 - 处理不完整的残基
- 不必要的重复计算 - 缓存耗时计算结果
- 在循环中使用字符串选择 - 解析一次后复用
- 忘记展开坐标 - 处理被PBC分割的分子
- 忽略内存限制 - 分块处理大型轨迹
Anti-Patterns (NEVER)
反模式(绝对避免)
python
import MDAnalysis as mda
import numpy as nppython
import MDAnalysis as mda
import numpy as np❌ BAD: Loading entire trajectory in memory
❌ 错误:将整个轨迹加载到内存
u = mda.Universe('top.pdb', 'traj.dcd')
all_coords = []
for ts in u.trajectory:
all_coords.append(u.atoms.positions.copy())
all_coords = np.array(all_coords) # Huge memory usage!
u = mda.Universe('top.pdb', 'traj.dcd')
all_coords = []
for ts in u.trajectory:
all_coords.append(u.atoms.positions.copy())
all_coords = np.array(all_coords) # 内存占用极大!
✅ GOOD: Process frame by frame
✅ 正确:逐帧处理
u = mda.Universe('top.pdb', 'traj.dcd')
for ts in u.trajectory:
# Process current frame
coords = u.atoms.positions
# Do analysis...
# Move to next frame automatically
u = mda.Universe('top.pdb', 'traj.dcd')
for ts in u.trajectory:
# 处理当前帧
coords = u.atoms.positions
# 执行分析...
# 自动跳转到下一帧
❌ BAD: RMSD without alignment
❌ 错误:未对齐轨迹就计算RMSD
rmsd_values = []
for ts in u.trajectory:
rmsd = rms.rmsd(mobile.positions, reference.positions)
rmsd_values.append(rmsd) # Wrong! Not aligned!
rmsd_values = []
for ts in u.trajectory:
rmsd = rms.rmsd(mobile.positions, reference.positions)
rmsd_values.append(rmsd) # 错误!未对齐!
✅ GOOD: Align before RMSD
✅ 正确:先对齐再计算RMSD
from MDAnalysis.analysis.rms import RMSD
R = RMSD(mobile, reference, select='backbone')
R.run()
rmsd_values = R.results.rmsd[:, 2]
from MDAnalysis.analysis.rms import RMSD
R = RMSD(mobile, reference, select='backbone')
R.run()
rmsd_values = R.results.rmsd[:, 2]
❌ BAD: Creating selection in loop
❌ 错误:在循环中创建选择
for ts in u.trajectory:
ca = u.select_atoms('name CA') # Parsed every frame!
# Do something with ca
for ts in u.trajectory:
ca = u.select_atoms('name CA') # 每帧都重新解析!
# 使用ca执行操作
✅ GOOD: Create selection once
✅ 正确:仅创建一次选择
ca = u.select_atoms('name CA')
for ts in u.trajectory:
# Use ca (automatically updated each frame)
positions = ca.positions
ca = u.select_atoms('name CA')
for ts in u.trajectory:
# 使用ca(每帧自动更新)
positions = ca.positions
❌ BAD: Ignoring periodic boundaries
❌ 错误:忽略周期性边界
distance = np.linalg.norm(atom1.position - atom2.position)
distance = np.linalg.norm(atom1.position - atom2.position)
✅ GOOD: PBC-aware distance
✅ 正确:支持PBC的距离计算
from MDAnalysis.lib.distances import distance_array
dist = distance_array(
atom1.position[np.newaxis, :],
atom2.position[np.newaxis, :],
box=u.dimensions
)[0, 0]
from MDAnalysis.lib.distances import distance_array
dist = distance_array(
atom1.position[np.newaxis, :],
atom2.position[np.newaxis, :],
box=u.dimensions
)[0, 0]
❌ BAD: Not checking for empty selections
❌ 错误:未检查空选择
selection = u.select_atoms('resname XYZ')
selection = u.select_atoms('resname XYZ')
Continue without checking if selection is empty!
未检查选择是否为空就继续!
avg_pos = selection.center_of_mass() # May crash!
avg_pos = selection.center_of_mass() # 可能崩溃!
✅ GOOD: Validate selections
✅ 正确:验证选择结果
selection = u.select_atoms('resname XYZ')
if len(selection) == 0:
print("Warning: No atoms found matching selection")
else:
avg_pos = selection.center_of_mass()
undefinedselection = u.select_atoms('resname XYZ')
if len(selection) == 0:
print("警告:未找到匹配选择条件的原子")
else:
avg_pos = selection.center_of_mass()
undefinedLoading Trajectories (Universe)
加载轨迹(Universe)
Basic Universe Creation
基础Universe创建
python
import MDAnalysis as mdapython
import MDAnalysis as mdaSingle structure file
加载单个结构文件
u = mda.Universe('protein.pdb')
u = mda.Universe('protein.pdb')
Topology + trajectory
加载拓扑文件+轨迹文件
u = mda.Universe('topology.pdb', 'trajectory.dcd')
u = mda.Universe('topology.pdb', 'trajectory.dcd')
Multiple trajectories (concatenated)
加载多个轨迹文件(拼接)
u = mda.Universe('top.pdb', 'traj1.dcd', 'traj2.dcd', 'traj3.dcd')
u = mda.Universe('top.pdb', 'traj1.dcd', 'traj2.dcd', 'traj3.dcd')
Different formats
加载不同格式的文件
u = mda.Universe('system.gro', 'traj.xtc') # GROMACS
u = mda.Universe('system.psf', 'traj.dcd') # CHARMM/NAMD
u = mda.Universe('system.prmtop', 'traj.nc') # AMBER
u = mda.Universe('system.gro', 'traj.xtc') # GROMACS格式
u = mda.Universe('system.psf', 'traj.dcd') # CHARMM/NAMD格式
u = mda.Universe('system.prmtop', 'traj.nc') # AMBER格式
From memory (numpy arrays)
从内存加载(numpy数组)
coords = np.random.rand(100, 3) # 100 atoms, xyz
u = mda.Universe.empty(100, trajectory=True)
u.atoms.positions = coords
print(f"Number of atoms: {len(u.atoms)}")
print(f"Number of frames: {len(u.trajectory)}")
print(f"Total time: {u.trajectory.totaltime} ps")
undefinedcoords = np.random.rand(100, 3) # 100个原子的xyz坐标
u = mda.Universe.empty(100, trajectory=True)
u.atoms.positions = coords
print(f"原子数量: {len(u.atoms)}")
print(f"帧数: {len(u.trajectory)}")
print(f"总时长: {u.trajectory.totaltime} ps")
undefinedTrajectory Information
轨迹信息
python
import MDAnalysis as mda
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
u = mda.Universe('topology.pdb', 'trajectory.dcd')Trajectory properties
轨迹属性
traj = u.trajectory
print(f"Number of frames: {traj.n_frames}")
print(f"Time step: {traj.dt} ps")
print(f"Total time: {traj.totaltime} ps")
traj = u.trajectory
print(f"帧数: {traj.n_frames}")
print(f"时间步长: {traj.dt} ps")
print(f"总时长: {traj.totaltime} ps")
Current frame info
当前帧信息
print(f"Current frame: {traj.frame}")
print(f"Current time: {traj.time} ps")
print(f"Box dimensions: {u.dimensions}") # [a, b, c, alpha, beta, gamma]
print(f"当前帧: {traj.frame}")
print(f"当前时间: {traj.time} ps")
print(f"盒子尺寸: {u.dimensions}") # [a, b, c, alpha, beta, gamma]
Iterate through frames
遍历帧
for i, ts in enumerate(u.trajectory):
if i >= 5:
break
print(f"Frame {ts.frame}: time = {ts.time:.2f} ps")
for i, ts in enumerate(u.trajectory):
if i >= 5:
break
print(f"帧 {ts.frame}: 时间 = {ts.time:.2f} ps")
Jump to specific frame
跳转到指定帧
u.trajectory[100] # Go to frame 100
print(f"Now at frame: {u.trajectory.frame}")
u.trajectory[100] # 跳转到第100帧
print(f"当前帧: {u.trajectory.frame}")
Slice trajectory
切片轨迹
for ts in u.trajectory[::10]: # Every 10th frame
print(f"Frame {ts.frame}")
undefinedfor ts in u.trajectory[::10]: # 每10帧取一帧
print(f"帧 {ts.frame}")
undefinedWorking with Multiple Trajectories
处理多个轨迹文件
python
import MDAnalysis as mdapython
import MDAnalysis as mdaLoad multiple trajectory files
加载多个轨迹文件
u = mda.Universe('top.pdb', 'part1.dcd', 'part2.dcd', 'part3.dcd')
print(f"Total frames from all trajectories: {len(u.trajectory)}")
u = mda.Universe('top.pdb', 'part1.dcd', 'part2.dcd', 'part3.dcd')
print(f"所有轨迹的总帧数: {len(u.trajectory)}")
Or use ChainReader explicitly
或显式使用ChainReader
from MDAnalysis.coordinates.chain import ChainReader
trajectories = ['part1.dcd', 'part2.dcd', 'part3.dcd']
u = mda.Universe('top.pdb', trajectories, continuous=True)
from MDAnalysis.coordinates.chain import ChainReader
trajectories = ['part1.dcd', 'part2.dcd', 'part3.dcd']
u = mda.Universe('top.pdb', trajectories, continuous=True)
Access frame indices
访问帧索引
for i, ts in enumerate(u.trajectory[::100]):
print(f"Global frame {i}: actual frame {ts.frame}, time {ts.time:.2f}")
undefinedfor i, ts in enumerate(u.trajectory[::100]):
print(f"全局索引 {i}: 实际帧 {ts.frame}, 时间 {ts.time:.2f}")
undefinedAtom Selections
原子选择
Basic Selection Syntax
基础选择语法
python
import MDAnalysis as mda
u = mda.Universe('system.pdb')python
import MDAnalysis as mda
u = mda.Universe('system.pdb')Protein selections
蛋白质选择
protein = u.select_atoms('protein')
backbone = u.select_atoms('backbone') # N, CA, C
mainchain = u.select_atoms('backbone or name O')
protein = u.select_atoms('protein')
backbone = u.select_atoms('backbone') # N、CA、C原子
mainchain = u.select_atoms('backbone or name O')
Specific residues
特定残基选择
resid_10 = u.select_atoms('resid 10')
resid_range = u.select_atoms('resid 10:20')
resid_list = u.select_atoms('resid 10 15 20 25')
resid_10 = u.select_atoms('resid 10')
resid_range = u.select_atoms('resid 10:20')
resid_list = u.select_atoms('resid 10 15 20 25')
Residue names
残基名称选择
water = u.select_atoms('resname WAT or resname HOH')
ions = u.select_atoms('resname NA or resname CL')
water = u.select_atoms('resname WAT or resname HOH')
ions = u.select_atoms('resname NA or resname CL')
Atom names
原子名称选择
ca_atoms = u.select_atoms('name CA')
hydrogens = u.select_atoms('name H*') # Wildcards
ca_atoms = u.select_atoms('name CA')
hydrogens = u.select_atoms('name H*') # 使用通配符
By element
按元素选择
carbons = u.select_atoms('element C')
carbons = u.select_atoms('element C')
Segments
按片段选择
seg_a = u.select_atoms('segid A')
print(f"Protein atoms: {len(protein)}")
print(f"Water molecules: {len(water)}")
print(f"CA atoms: {len(ca_atoms)}")
undefinedseg_a = u.select_atoms('segid A')
print(f"蛋白质原子数量: {len(protein)}")
print(f"水分子数量: {len(water)}")
print(f"CA原子数量: {len(ca_atoms)}")
undefinedAdvanced Selections
高级选择
python
import MDAnalysis as mda
u = mda.Universe('protein_solvent.pdb')python
import MDAnalysis as mda
u = mda.Universe('protein_solvent.pdb')Geometric selections
几何选择
around_10 = u.select_atoms('around 5.0 protein') # Within 5Å of protein
within_box = u.select_atoms('prop x < 50 and prop y < 50')
around_10 = u.select_atoms('around 5.0 protein') # 距离蛋白质5Å范围内的原子
within_box = u.select_atoms('prop x < 50 and prop y < 50')
Spatial queries
空间查询
resid_10 = u.select_atoms('resid 10')
near_res10 = u.select_atoms('around 10.0 resid 10 and not resid 10')
resid_10 = u.select_atoms('resid 10')
near_res10 = u.select_atoms('around 10.0 resid 10 and not resid 10')
Combining selections
组合选择
hydrophobic = u.select_atoms(
'resname ALA or resname VAL or resname LEU or '
'resname ILE or resname PHE or resname TRP or resname MET'
)
charged = u.select_atoms(
'resname ARG or resname LYS or resname ASP or resname GLU'
)
hydrophobic = u.select_atoms(
'resname ALA or resname VAL or resname LEU or '
'resname ILE or resname PHE or resname TRP or resname MET'
)
charged = u.select_atoms(
'resname ARG or resname LYS or resname ASP or resname GLU'
)
Boolean operations
布尔运算
not_protein = u.select_atoms('not protein')
protein_no_h = u.select_atoms('protein and not name H*')
not_protein = u.select_atoms('not protein')
protein_no_h = u.select_atoms('protein and not name H*')
By property
按属性选择
high_bfactor = u.select_atoms('protein and prop beta > 50')
high_bfactor = u.select_atoms('protein and prop beta > 50')
SMARTS matching (requires RDKit)
SMARTS匹配(需要RDKit)
aromatic = u.select_atoms('smarts c1ccccc1')
aromatic = u.select_atoms('smarts c1ccccc1')
print(f"Hydrophobic residues: {len(hydrophobic)}")
print(f"Charged residues: {len(charged)}")
undefinedprint(f"疏水残基数量: {len(hydrophobic)}")
print(f"带电残基数量: {len(charged)}")
undefinedDynamic Selections
动态选择
python
import MDAnalysis as mda
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
u = mda.Universe('topology.pdb', 'trajectory.dcd')Regular selection (static)
常规选择(静态)
protein = u.select_atoms('protein')
protein = u.select_atoms('protein')
Dynamic selection (updated each frame)
动态选择(每帧更新)
Example: Select water within 3.5Å of protein
示例:选择距离蛋白质3.5Å范围内的水
water_near_protein = u.select_atoms(
'resname WAT and around 3.5 protein',
updating=True
)
water_near_protein = u.select_atoms(
'resname WAT and around 3.5 protein',
updating=True
)
Track number of nearby water molecules over time
跟踪每帧中靠近蛋白质的水分子数量
water_counts = []
for ts in u.trajectory:
# water_near_protein automatically updates
water_counts.append(len(water_near_protein))
print(f"Water molecules near protein per frame:")
print(f" Min: {min(water_counts)}")
print(f" Max: {max(water_counts)}")
print(f" Mean: {sum(water_counts)/len(water_counts):.1f}")
water_counts = []
for ts in u.trajectory:
# water_near_protein会自动更新
water_counts.append(len(water_near_protein))
print(f"每帧中靠近蛋白质的水分子数量:")
print(f" 最小值: {min(water_counts)}")
print(f" 最大值: {max(water_counts)}")
print(f" 平均值: {sum(water_counts)/len(water_counts):.1f}")
Another example: ions near active site
另一个示例:活性位点附近的离子
active_site = u.select_atoms('resid 100:110')
nearby_ions = u.select_atoms(
'(resname NA or resname CL) and around 8.0 resid 100:110',
updating=True
)
for i, ts in enumerate(u.trajectory[::10]):
print(f"Frame {ts.frame}: {len(nearby_ions)} ions near active site")
undefinedactive_site = u.select_atoms('resid 100:110')
nearby_ions = u.select_atoms(
'(resname NA or resname CL) and around 8.0 resid 100:110',
updating=True
)
for i, ts in enumerate(u.trajectory[::10]):
print(f"帧 {ts.frame}: 活性位点附近有 {len(nearby_ions)} 个离子")
undefinedSelection Groups and Operations
选择组与操作
python
import MDAnalysis as mda
u = mda.Universe('protein.pdb')python
import MDAnalysis as mda
u = mda.Universe('protein.pdb')Create multiple selections
创建多个选择结果
protein = u.select_atoms('protein')
backbone = u.select_atoms('backbone')
ca_atoms = u.select_atoms('name CA')
protein = u.select_atoms('protein')
backbone = u.select_atoms('backbone')
ca_atoms = u.select_atoms('name CA')
Combine selections
组合选择结果
combined = protein | backbone # Union (OR)
intersection = protein & backbone # Intersection (AND)
difference = protein - backbone # Difference (NOT)
combined = protein | backbone # 并集(OR)
intersection = protein & backbone # 交集(AND)
difference = protein - backbone # 差集(NOT)
Access properties
访问属性
print(f"Combined: {len(combined)} atoms")
print(f"Center of mass: {protein.center_of_mass()}")
print(f"Center of geometry: {protein.center_of_geometry()}")
print(f"Total mass: {protein.total_mass()}")
print(f"Total charge: {protein.total_charge()}")
print(f"组合后的原子数量: {len(combined)}")
print(f"质心: {protein.center_of_mass()}")
print(f"几何中心: {protein.center_of_geometry()}")
print(f"总质量: {protein.total_mass()}")
print(f"总电荷: {protein.total_charge()}")
Iterate through atoms
遍历原子
for atom in ca_atoms[:5]:
print(f"Atom {atom.name} {atom.resname}{atom.resid}: {atom.position}")
for atom in ca_atoms[:5]:
print(f"原子 {atom.name} {atom.resname}{atom.resid}: {atom.position}")
Iterate through residues
遍历残基
for residue in protein.residues[:5]:
print(f"Residue {residue.resname}{residue.resid}: {residue.atoms.n_atoms} atoms")
for residue in protein.residues[:5]:
print(f"残基 {residue.resname}{residue.resid}: {residue.atoms.n_atoms} 个原子")
Iterate through segments
遍历片段
for segment in u.segments:
print(f"Segment {segment.segid}: {len(segment.atoms)} atoms")
undefinedfor segment in u.segments:
print(f"片段 {segment.segid}: {len(segment.atoms)} 个原子")
undefinedRMSD and RMSF Analysis
RMSD与RMSF分析
RMSD Calculation
RMSD计算
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as pltpython
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as pltLoad trajectory
加载轨迹
u = mda.Universe('topology.pdb', 'trajectory.dcd')
u = mda.Universe('topology.pdb', 'trajectory.dcd')
Define reference (first frame)
定义参考结构(第一帧)
reference = u.copy()
reference.trajectory[0]
reference = u.copy()
reference.trajectory[0]
Select atoms for RMSD (typically backbone or CA)
选择用于RMSD计算的原子(通常是主链或CA原子)
mobile = u.select_atoms('backbone')
ref = reference.select_atoms('backbone')
mobile = u.select_atoms('backbone')
ref = reference.select_atoms('backbone')
Calculate RMSD with alignment
带对齐的RMSD计算
R = rms.RMSD(mobile, ref, select='backbone', ref_frame=0)
R.run()
R = rms.RMSD(mobile, ref, select='backbone', ref_frame=0)
R.run()
Extract results
提取结果
rmsd_data = R.results.rmsd
time = rmsd_data[:, 1] # Time in ps
rmsd_values = rmsd_data[:, 2] # RMSD in Angstroms
rmsd_data = R.results.rmsd
time = rmsd_data[:, 1] # 时间(单位:ps)
rmsd_values = rmsd_data[:, 2] # RMSD(单位:埃)
Plot
绘图
plt.figure(figsize=(10, 6))
plt.plot(time, rmsd_values)
plt.xlabel('Time (ps)')
plt.ylabel('RMSD (Å)')
plt.title('Backbone RMSD over time')
plt.grid(True)
plt.savefig('rmsd.png', dpi=300)
print(f"Mean RMSD: {np.mean(rmsd_values):.2f} Å")
print(f"Max RMSD: {np.max(rmsd_values):.2f} Å")
print(f"Final RMSD: {rmsd_values[-1]:.2f} Å")
undefinedplt.figure(figsize=(10, 6))
plt.plot(time, rmsd_values)
plt.xlabel('时间 (ps)')
plt.ylabel('RMSD (Å)')
plt.title('主链RMSD随时间变化')
plt.grid(True)
plt.savefig('rmsd.png', dpi=300)
print(f"平均RMSD: {np.mean(rmsd_values):.2f} Å")
print(f"最大RMSD: {np.max(rmsd_values):.2f} Å")
print(f"最终RMSD: {rmsd_values[-1]:.2f} Å")
undefinedRMSD to Multiple References
与多个参考结构的RMSD比较
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
u = mda.Universe('topology.pdb', 'trajectory.dcd')Compare to multiple reference structures
与多个参考结构比较
references = {
'crystal': mda.Universe('crystal.pdb'),
'equilibrated': mda.Universe('equilibrated.pdb'),
'apo': mda.Universe('apo_structure.pdb')
}
results = {}
for ref_name, ref_universe in references.items():
mobile = u.select_atoms('backbone')
ref = ref_universe.select_atoms('backbone')
R = rms.RMSD(mobile, ref, select='backbone')
R.run()
results[ref_name] = R.results.rmsd[:, 2]references = {
'晶体结构': mda.Universe('crystal.pdb'),
'平衡后结构': mda.Universe('equilibrated.pdb'),
'apo结构': mda.Universe('apo_structure.pdb')
}
results = {}
for ref_name, ref_universe in references.items():
mobile = u.select_atoms('backbone')
ref = ref_universe.select_atoms('backbone')
R = rms.RMSD(mobile, ref, select='backbone')
R.run()
results[ref_name] = R.results.rmsd[:, 2]Plot comparison
绘制比较图
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
for ref_name, rmsd_values in results.items():
time = np.arange(len(rmsd_values)) * u.trajectory.dt
plt.plot(time, rmsd_values, label=ref_name)
plt.xlabel('Time (ps)')
plt.ylabel('RMSD (Å)')
plt.legend()
plt.title('RMSD to different reference structures')
plt.grid(True)
plt.savefig('rmsd_comparison.png', dpi=300)
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
for ref_name, rmsd_values in results.items():
time = np.arange(len(rmsd_values)) * u.trajectory.dt
plt.plot(time, rmsd_values, label=ref_name)
plt.xlabel('时间 (ps)')
plt.ylabel('RMSD (Å)')
plt.legend()
plt.title('与不同参考结构的RMSD比较')
plt.grid(True)
plt.savefig('rmsd_comparison.png', dpi=300)
Statistics
统计信息
for ref_name, rmsd_values in results.items():
print(f"{ref_name}: mean={np.mean(rmsd_values):.2f} Å, "
f"std={np.std(rmsd_values):.2f} Å")
undefinedfor ref_name, rmsd_values in results.items():
print(f"{ref_name}: 平均值={np.mean(rmsd_values):.2f} Å, "
f"标准差={np.std(rmsd_values):.2f} Å")
undefinedRMSF Calculation (Fluctuations)
RMSF计算(波动分析)
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')Select CA atoms (typical for RMSF)
选择CA原子(RMSF计算的常用选择)
ca_atoms = u.select_atoms('protein and name CA')
ca_atoms = u.select_atoms('protein and name CA')
Calculate RMSF
计算RMSF
rmsf_analysis = rms.RMSF(ca_atoms, verbose=True)
rmsf_analysis.run()
rmsf_analysis = rms.RMSF(ca_atoms, verbose=True)
rmsf_analysis.run()
Extract results
提取结果
rmsf_values = rmsf_analysis.results.rmsf
rmsf_values = rmsf_analysis.results.rmsf
Plot RMSF vs residue number
绘制RMSF与残基编号的关系图
residue_numbers = [atom.resid for atom in ca_atoms]
plt.figure(figsize=(12, 6))
plt.plot(residue_numbers, rmsf_values, linewidth=2)
plt.xlabel('Residue Number')
plt.ylabel('RMSF (Å)')
plt.title('Per-residue Fluctuations (RMSF)')
plt.grid(True)
plt.savefig('rmsf.png', dpi=300)
residue_numbers = [atom.resid for atom in ca_atoms]
plt.figure(figsize=(12, 6))
plt.plot(residue_numbers, rmsf_values, linewidth=2)
plt.xlabel('残基编号')
plt.ylabel('RMSF (Å)')
plt.title('残基水平波动(RMSF)')
plt.grid(True)
plt.savefig('rmsf.png', dpi=300)
Identify flexible regions (high RMSF)
识别柔性区域(高RMSD)
threshold = np.mean(rmsf_values) + np.std(rmsf_values)
flexible_residues = [resid for resid, rmsf in zip(residue_numbers, rmsf_values)
if rmsf > threshold]
print(f"Mean RMSF: {np.mean(rmsf_values):.2f} Å")
print(f"Flexible regions (RMSF > {threshold:.2f}): {flexible_residues}")
threshold = np.mean(rmsf_values) + np.std(rmsf_values)
flexible_residues = [resid for resid, rmsf in zip(residue_numbers, rmsf_values)
if rmsf > threshold]
print(f"平均RMSF: {np.mean(rmsf_values):.2f} Å")
print(f"柔性区域(RMSF > {threshold:.2f}): {flexible_residues}")
RMSF per atom (all heavy atoms)
所有重原子的RMSF
heavy_atoms = u.select_atoms('protein and not name H*')
rmsf_all = rms.RMSF(heavy_atoms)
rmsf_all.run()
print(f"\nMean RMSF (all heavy atoms): {np.mean(rmsf_all.results.rmsf):.2f} Å")
undefinedheavy_atoms = u.select_atoms('protein and not name H*')
rmsf_all = rms.RMSF(heavy_atoms)
rmsf_all.run()
print(f"\n所有重原子的平均RMSF: {np.mean(rmsf_all.results.rmsf):.2f} Å")
undefined2D RMSD Matrix
二维RMSD矩阵
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')Select atoms
选择原子
selection = u.select_atoms('backbone')
selection = u.select_atoms('backbone')
Compute pairwise RMSD matrix
计算两两RMSD矩阵
n_frames = len(u.trajectory)
rmsd_matrix = np.zeros((n_frames, n_frames))
n_frames = len(u.trajectory)
rmsd_matrix = np.zeros((n_frames, n_frames))
Store coordinates for all frames
存储所有帧的坐标
coords = []
for ts in u.trajectory:
coords.append(selection.positions.copy())
coords = []
for ts in u.trajectory:
coords.append(selection.positions.copy())
Calculate pairwise RMSD
计算两两RMSD
for i in range(n_frames):
for j in range(i, n_frames):
# Align j to i
rmsd_value = rms.rmsd(coords[j], coords[i], center=True, superposition=True)
rmsd_matrix[i, j] = rmsd_value
rmsd_matrix[j, i] = rmsd_value
for i in range(n_frames):
for j in range(i, n_frames):
# 将j帧对齐到i帧
rmsd_value = rms.rmsd(coords[j], coords[i], center=True, superposition=True)
rmsd_matrix[i, j] = rmsd_value
rmsd_matrix[j, i] = rmsd_value
Plot matrix
绘制矩阵图
plt.figure(figsize=(10, 8))
plt.imshow(rmsd_matrix, cmap='viridis', origin='lower')
plt.colorbar(label='RMSD (Å)')
plt.xlabel('Frame')
plt.ylabel('Frame')
plt.title('Pairwise RMSD Matrix')
plt.savefig('rmsd_matrix.png', dpi=300)
plt.figure(figsize=(10, 8))
plt.imshow(rmsd_matrix, cmap='viridis', origin='lower')
plt.colorbar(label='RMSD (Å)')
plt.xlabel('帧')
plt.ylabel('帧')
plt.title('两两RMSD矩阵')
plt.savefig('rmsd_matrix.png', dpi=300)
Identify clusters (frames with low mutual RMSD)
识别聚类(两两RMSD低的帧)
threshold = 2.0 # Angstroms
similar_frames = np.where(rmsd_matrix < threshold)
print(f"Frame pairs with RMSD < {threshold} Å: {len(similar_frames[0])}")
undefinedthreshold = 2.0 # 埃
similar_frames = np.where(rmsd_matrix < threshold)
print(f"RMSD < {threshold} Å 的帧对数量: {len(similar_frames[0])}")
undefinedDistance Analysis
距离分析
Simple Distance Calculations
简单距离计算
python
import MDAnalysis as mda
from MDAnalysis.analysis import distances
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis import distances
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')Select two groups of atoms
选择两组原子
group1 = u.select_atoms('resid 10 and name CA')
group2 = u.select_atoms('resid 50 and name CA')
group1 = u.select_atoms('resid 10 and name CA')
group2 = u.select_atoms('resid 50 and name CA')
Calculate distance over trajectory
计算随时间变化的距离
dist_array = []
for ts in u.trajectory:
# PBC-aware distance
d = distances.distance_array(
group1.positions,
group2.positions,
box=u.dimensions
)[0, 0]
dist_array.append(d)
dist_array = np.array(dist_array)
time = np.arange(len(dist_array)) * u.trajectory.dt
dist_array = []
for ts in u.trajectory:
# 支持PBC的距离计算
d = distances.distance_array(
group1.positions,
group2.positions,
box=u.dimensions
)[0, 0]
dist_array.append(d)
dist_array = np.array(dist_array)
time = np.arange(len(dist_array)) * u.trajectory.dt
Plot
绘图
plt.figure(figsize=(10, 6))
plt.plot(time, dist_array)
plt.xlabel('Time (ps)')
plt.ylabel('Distance (Å)')
plt.title('Distance between residues 10 and 50')
plt.grid(True)
plt.savefig('distance.png', dpi=300)
print(f"Mean distance: {np.mean(dist_array):.2f} Å")
print(f"Min distance: {np.min(dist_array):.2f} Å")
print(f"Max distance: {np.max(dist_array):.2f} Å")
undefinedplt.figure(figsize=(10, 6))
plt.plot(time, dist_array)
plt.xlabel('时间 (ps)')
plt.ylabel('距离 (Å)')
plt.title('残基10与50的CA原子距离随时间变化')
plt.grid(True)
plt.savefig('distance.png', dpi=300)
print(f"平均距离: {np.mean(dist_array):.2f} Å")
print(f"最小距离: {np.min(dist_array):.2f} Å")
print(f"最大距离: {np.max(dist_array):.2f} Å")
undefinedDistance Matrix
距离矩阵
python
import MDAnalysis as mda
from MDAnalysis.analysis import distances
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('protein.pdb')python
import MDAnalysis as mda
from MDAnalysis.analysis import distances
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('protein.pdb')Select CA atoms
选择CA原子
ca_atoms = u.select_atoms('protein and name CA')
ca_atoms = u.select_atoms('protein and name CA')
Calculate distance matrix
计算距离矩阵
dist_matrix = distances.distance_array(
ca_atoms.positions,
ca_atoms.positions,
box=u.dimensions
)
dist_matrix = distances.distance_array(
ca_atoms.positions,
ca_atoms.positions,
box=u.dimensions
)
Plot
绘图
plt.figure(figsize=(10, 8))
plt.imshow(dist_matrix, cmap='viridis', origin='lower')
plt.colorbar(label='Distance (Å)')
plt.xlabel('CA atom index')
plt.ylabel('CA atom index')
plt.title('CA Distance Matrix')
plt.savefig('distance_matrix.png', dpi=300)
plt.figure(figsize=(10, 8))
plt.imshow(dist_matrix, cmap='viridis', origin='lower')
plt.colorbar(label='距离 (Å)')
plt.xlabel('CA原子索引')
plt.ylabel('CA原子索引')
plt.title('CA原子距离矩阵')
plt.savefig('distance_matrix.png', dpi=300)
Find contacts (CA atoms within 8Å)
识别接触对(距离小于8Å的CA原子)
contact_threshold = 8.0
contacts = np.where((dist_matrix < contact_threshold) & (dist_matrix > 0))
n_contacts = len(contacts[0]) // 2 # Divide by 2 (symmetric matrix)
print(f"Number of CA-CA contacts within {contact_threshold} Å: {n_contacts}")
undefinedcontact_threshold = 8.0
contacts = np.where((dist_matrix < contact_threshold) & (dist_matrix > 0))
n_contacts = len(contacts[0]) // 2 # 对称矩阵,除以2去重
print(f"距离 < {contact_threshold} Å 的CA-CA接触对数量: {n_contacts}")
undefinedContact Analysis Over Time
随时间变化的接触分析
python
import MDAnalysis as mda
from MDAnalysis.analysis.contacts import Contacts
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis.contacts import Contacts
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')Define two groups
定义两组原子
group1 = u.select_atoms('resid 1-50 and name CA')
group2 = u.select_atoms('resid 51-100 and name CA')
group1 = u.select_atoms('resid 1-50 and name CA')
group2 = u.select_atoms('resid 51-100 and name CA')
Calculate contacts over trajectory
计算随时间变化的接触对
ca = Contacts(
u,
selection=(group1, group2),
refgroup=(group1, group2),
radius=8.0, # Contact cutoff in Angstroms
method='hard_cut' # or 'soft_cut' for smooth cutoff
)
ca.run()
ca = Contacts(
u,
selection=(group1, group2),
refgroup=(group1, group2),
radius=8.0, # 接触阈值(单位:埃)
method='hard_cut' # 或'soft_cut'使用平滑阈值
)
ca.run()
Extract results
提取结果
time = ca.results.timeseries[:, 0]
n_contacts = ca.results.timeseries[:, 1]
time = ca.results.timeseries[:, 0]
n_contacts = ca.results.timeseries[:, 1]
Plot
绘图
plt.figure(figsize=(10, 6))
plt.plot(time, n_contacts)
plt.xlabel('Time (ps)')
plt.ylabel('Number of Contacts')
plt.title('Inter-domain Contacts')
plt.grid(True)
plt.savefig('contacts.png', dpi=300)
print(f"Mean contacts: {np.mean(n_contacts):.1f}")
print(f"Contact stability: {np.std(n_contacts):.1f} (lower = more stable)")
undefinedplt.figure(figsize=(10, 6))
plt.plot(time, n_contacts)
plt.xlabel('时间 (ps)')
plt.ylabel('接触对数量')
plt.title('结构域间接触对')
plt.grid(True)
plt.savefig('contacts.png', dpi=300)
print(f"平均接触对数量: {np.mean(n_contacts):.1f}")
print(f"接触对稳定性: {np.std(n_contacts):.1f}(值越小越稳定)")
undefinedRadius of Gyration
回转半径
python
import MDAnalysis as mda
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')
protein = u.select_atoms('protein')python
import MDAnalysis as mda
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')
protein = u.select_atoms('protein')Calculate radius of gyration over time
计算随时间变化的回转半径
rg_values = []
for ts in u.trajectory:
rg = protein.radius_of_gyration()
rg_values.append(rg)
rg_values = np.array(rg_values)
time = np.arange(len(rg_values)) * u.trajectory.dt
rg_values = []
for ts in u.trajectory:
rg_values.append(protein.radius_of_gyration())
rg_values = np.array(rg_values)
time = np.arange(len(rg_values)) * u.trajectory.dt
Plot
绘图
plt.figure(figsize=(10, 6))
plt.plot(time, rg_values)
plt.xlabel('Time (ps)')
plt.ylabel('Radius of Gyration (Å)')
plt.title('Protein Compactness')
plt.grid(True)
plt.savefig('radius_of_gyration.png', dpi=300)
print(f"Mean Rg: {np.mean(rg_values):.2f} Å")
print(f"Rg fluctuation: {np.std(rg_values):.2f} Å")
plt.figure(figsize=(10, 6))
plt.plot(time, rg_values)
plt.xlabel('时间 (ps)')
plt.ylabel('回转半径 (Å)')
plt.title('蛋白质紧凑性分析')
plt.grid(True)
plt.savefig('radius_of_gyration.png', dpi=300)
print(f"平均回转半径: {np.mean(rg_values):.2f} Å")
print(f"回转半径波动: {np.std(rg_values):.2f} Å")
Check if protein is folding/unfolding
检查蛋白质是否发生折叠/去折叠
if np.std(rg_values) > 2.0:
print("Warning: Large Rg fluctuations indicate conformational changes")
undefinedif np.std(rg_values) > 2.0:
print("警告:回转半径波动大,表明存在构象变化")
undefinedHydrogen Bond Analysis
氢键分析
Basic H-Bond Analysis
基础氢键分析
python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')Setup hydrogen bond analysis
设置氢键分析
hbonds = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel='protein',
d_h_cutoff=1.2, # Donor-H distance cutoff (Å)
d_a_cutoff=3.0, # Donor-Acceptor distance cutoff (Å)
d_h_a_angle_cutoff=150 # Angle cutoff (degrees)
)
hbonds = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel='protein',
d_h_cutoff=1.2, # 供体-氢距离阈值(单位:埃)
d_a_cutoff=3.0, # 供体-受体距离阈值(单位:埃)
d_h_a_angle_cutoff=150 # 供体-氢-受体角度阈值(单位:度)
)
Run analysis
运行分析
hbonds.run(verbose=True)
hbonds.run(verbose=True)
Access results
获取结果
hbond_results = hbonds.results.hbonds
print(f"Total hydrogen bonds detected: {len(hbond_results)}")
hbond_results = hbonds.results.hbonds
print(f"检测到的氢键总数: {len(hbond_results)}")
Count H-bonds per frame
统计每帧的氢键数量
frames = hbond_results[:, 0].astype(int)
unique_frames, counts = np.unique(frames, return_counts=True)
frames = hbond_results[:, 0].astype(int)
unique_frames, counts = np.unique(frames, return_counts=True)
Plot
绘图
time = unique_frames * u.trajectory.dt
plt.figure(figsize=(10, 6))
plt.plot(time, counts)
plt.xlabel('Time (ps)')
plt.ylabel('Number of H-bonds')
plt.title('Intramolecular Hydrogen Bonds')
plt.grid(True)
plt.savefig('hbonds_time.png', dpi=300)
print(f"Mean H-bonds per frame: {np.mean(counts):.1f}")
print(f"Std H-bonds per frame: {np.std(counts):.1f}")
undefinedtime = unique_frames * u.trajectory.dt
plt.figure(figsize=(10, 6))
plt.plot(time, counts)
plt.xlabel('时间 (ps)')
plt.ylabel('氢键数量')
plt.title('分子内氢键')
plt.grid(True)
plt.savefig('hbonds_time.png', dpi=300)
print(f"每帧平均氢键数量: {np.mean(counts):.1f}")
print(f"每帧氢键数量标准差: {np.std(counts):.1f}")
undefinedPersistent H-Bond Analysis
持续性氢键分析
python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
from collections import Counter
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
from collections import Counter
u = mda.Universe('topology.pdb', 'trajectory.dcd')Run H-bond analysis
运行氢键分析
hbonds = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel='protein'
)
hbonds.run()
hbonds = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel='protein'
)
hbonds.run()
Extract results
提取结果
hbond_data = hbonds.results.hbonds
hbond_data = hbonds.results.hbonds
Count H-bond occurrences
统计氢键出现次数
Each row: [frame, donor_idx, hydrogen_idx, acceptor_idx, distance, angle]
每行数据: [帧索引, 供体索引, 氢原子索引, 受体索引, 距离, 角度]
hbond_pairs = []
for row in hbond_data:
donor_idx = int(row[1])
acceptor_idx = int(row[3])
hbond_pairs.append((donor_idx, acceptor_idx))
hbond_pairs = []
for row in hbond_data:
donor_idx = int(row[1])
acceptor_idx = int(row[3])
hbond_pairs.append((donor_idx, acceptor_idx))
Count frequency
统计频率
hbond_counts = Counter(hbond_pairs)
total_frames = len(u.trajectory)
hbond_counts = Counter(hbond_pairs)
total_frames = len(u.trajectory)
Find persistent H-bonds (present in >50% of frames)
找出持续性氢键(在超过50%的帧中出现)
threshold = 0.5 * total_frames
persistent_hbonds = {pair: count for pair, count in hbond_counts.items()
if count > threshold}
print(f"Persistent H-bonds (>{threshold/total_frames:.0%} occupancy):")
for (donor_idx, acceptor_idx), count in sorted(persistent_hbonds.items(),
key=lambda x: x[1],
reverse=True):
donor_atom = u.atoms[donor_idx]
acceptor_atom = u.atoms[acceptor_idx]
occupancy = count / total_frames
print(f" {donor_atom.resname}{donor_atom.resid}:{donor_atom.name} -> "
f"{acceptor_atom.resname}{acceptor_atom.resid}:{acceptor_atom.name} "
f"({occupancy:.1%} occupancy)")undefinedthreshold = 0.5 * total_frames
persistent_hbonds = {pair: count for pair, count in hbond_counts.items()
if count > threshold}
print(f"持续性氢键(出现占比 >{threshold/total_frames:.0%}):")
for (donor_idx, acceptor_idx), count in sorted(persistent_hbonds.items(),
key=lambda x: x[1],
reverse=True):
donor_atom = u.atoms[donor_idx]
acceptor_atom = u.atoms[acceptor_idx]
occupancy = count / total_frames
print(f" {donor_atom.resname}{donor_atom.resid}:{donor_atom.name} -> "
f"{acceptor_atom.resname}{acceptor_atom.resid}:{acceptor_atom.name} "
f"(出现占比: {occupancy:.1%})")undefinedProtein-Ligand H-Bonds
蛋白质-配体氢键分析
python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
u = mda.Universe('complex.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
u = mda.Universe('complex.pdb', 'trajectory.dcd')Separate H-bond analyses
分别进行氢键分析
1. Ligand as donor, protein as acceptor
1. 配体作为供体,蛋白质作为受体
hbonds_lig_don = HydrogenBondAnalysis(
universe=u,
donors_sel='resname LIG',
hydrogens_sel='resname LIG',
acceptors_sel='protein'
)
hbonds_lig_don.run()
hbonds_lig_don = HydrogenBondAnalysis(
universe=u,
donors_sel='resname LIG',
hydrogens_sel='resname LIG',
acceptors_sel='protein'
)
hbonds_lig_don.run()
2. Protein as donor, ligand as acceptor
2. 蛋白质作为供体,配体作为受体
hbonds_prot_don = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel='resname LIG'
)
hbonds_prot_don.run()
hbonds_prot_don = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel='resname LIG'
)
hbonds_prot_don.run()
Combine results
合并结果
total_hbonds_per_frame = []
for frame in range(len(u.trajectory)):
n_lig_don = len(hbonds_lig_don.results.hbonds[
hbonds_lig_don.results.hbonds[:, 0] == frame
])
n_prot_don = len(hbonds_prot_don.results.hbonds[
hbonds_prot_don.results.hbonds[:, 0] == frame
])
total_hbonds_per_frame.append(n_lig_don + n_prot_don)
import matplotlib.pyplot as plt
import numpy as np
time = np.arange(len(total_hbonds_per_frame)) * u.trajectory.dt
plt.figure(figsize=(10, 6))
plt.plot(time, total_hbonds_per_frame)
plt.xlabel('Time (ps)')
plt.ylabel('Number of Protein-Ligand H-bonds')
plt.title('Protein-Ligand Hydrogen Bonding')
plt.grid(True)
plt.savefig('protein_ligand_hbonds.png', dpi=300)
print(f"Mean protein-ligand H-bonds: {np.mean(total_hbonds_per_frame):.1f}")
undefinedtotal_hbonds_per_frame = []
for frame in range(len(u.trajectory)):
n_lig_don = len(hbonds_lig_don.results.hbonds[
hbonds_lig_don.results.hbonds[:, 0] == frame
])
n_prot_don = len(hbonds_prot_don.results.hbonds[
hbonds_prot_don.results.hbonds[:, 0] == frame
])
total_hbonds_per_frame.append(n_lig_don + n_prot_don)
import matplotlib.pyplot as plt
import numpy as np
time = np.arange(len(total_hbonds_per_frame)) * u.trajectory.dt
plt.figure(figsize=(10, 6))
plt.plot(time, total_hbonds_per_frame)
plt.xlabel('时间 (ps)')
plt.ylabel('蛋白质-配体氢键数量')
plt.title('蛋白质-配体氢键')
plt.grid(True)
plt.savefig('protein_ligand_hbonds.png', dpi=300)
print(f"平均蛋白质-配体氢键数量: {np.mean(total_hbonds_per_frame):.1f}")
undefinedDihedral Angle Analysis
二面角分析
Backbone Dihedrals (Ramachandran)
主链二面角(Ramachandran图)
python
import MDAnalysis as mda
from MDAnalysis.analysis.dihedrals import Ramachandran
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis.dihedrals import Ramachandran
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')Select residues (exclude first and last)
选择残基(排除第一个和最后一个残基)
residues = u.select_atoms('protein').residues[1:-1]
residues = u.select_atoms('protein').residues[1:-1]
Calculate Ramachandran angles
计算Ramachandran角
rama = Ramachandran(residues)
rama.run()
rama = Ramachandran(residues)
rama.run()
Extract phi and psi angles
提取phi和psi角
angles = rama.results.angles
angles = rama.results.angles
Plot Ramachandran plot
绘制Ramachandran图
phi = angles[:, :, 0].flatten()
psi = angles[:, :, 1].flatten()
plt.figure(figsize=(8, 8))
plt.hexbin(phi, psi, gridsize=50, cmap='Blues', mincnt=1)
plt.colorbar(label='Counts')
plt.xlabel('Phi (degrees)')
plt.ylabel('Psi (degrees)')
plt.title('Ramachandran Plot')
plt.xlim(-180, 180)
plt.ylim(-180, 180)
plt.axhline(0, color='gray', linestyle='--', alpha=0.3)
plt.axvline(0, color='gray', linestyle='--', alpha=0.3)
plt.savefig('ramachandran.png', dpi=300)
phi = angles[:, :, 0].flatten()
psi = angles[:, :, 1].flatten()
plt.figure(figsize=(8, 8))
plt.hexbin(phi, psi, gridsize=50, cmap='Blues', mincnt=1)
plt.colorbar(label='计数')
plt.xlabel('Phi (度)')
plt.ylabel('Psi (度)')
plt.title('Ramachandran图')
plt.xlim(-180, 180)
plt.ylim(-180, 180)
plt.axhline(0, color='gray', linestyle='--', alpha=0.3)
plt.axvline(0, color='gray', linestyle='--', alpha=0.3)
plt.savefig('ramachandran.png', dpi=300)
Check for outliers (non-favored regions)
检查异常值(非优势区域)
Alpha-helix region: phi ~ -60, psi ~ -45
阿尔法螺旋区域: phi ~ -60, psi ~ -45
Beta-sheet region: phi ~ -120, psi ~ 120
贝塔折叠区域: phi ~ -120, psi ~ 120
alpha_region = (np.abs(phi + 60) < 30) & (np.abs(psi + 45) < 30)
beta_region = (np.abs(phi + 120) < 30) & (np.abs(psi - 120) < 30)
favored = alpha_region | beta_region
print(f"Percentage in favored regions: {np.sum(favored)/len(phi)*100:.1f}%")
undefinedalpha_region = (np.abs(phi + 60) < 30) & (np.abs(psi + 45) < 30)
beta_region = (np.abs(phi + 120) < 30) & (np.abs(psi - 120) < 30)
favored = alpha_region | beta_region
print(f"处于优势区域的比例: {np.sum(favored)/len(phi)*100:.1f}%")
undefinedCustom Dihedral Analysis
自定义二面角分析
python
import MDAnalysis as mda
from MDAnalysis.analysis.dihedrals import Dihedral
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis.dihedrals import Dihedral
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')Define atoms for dihedral (4 consecutive atoms)
定义用于计算二面角的4个连续原子
Example: Chi1 angle of a specific residue
示例:特定残基的Chi1角
residue = u.select_atoms('resid 42')
atoms = [
residue.select_atoms('name N')[0],
residue.select_atoms('name CA')[0],
residue.select_atoms('name CB')[0],
residue.select_atoms('name CG')[0]
]
residue = u.select_atoms('resid 42')
atoms = [
residue.select_atoms('name N')[0],
residue.select_atoms('name CA')[0],
residue.select_atoms('name CB')[0],
residue.select_atoms('name CG')[0]
]
Calculate dihedral over trajectory
计算随时间变化的二面角
dihedral_angles = []
for ts in u.trajectory:
# Get positions
positions = np.array([atom.position for atom in atoms])
# Calculate dihedral
from MDAnalysis.lib.distances import calc_dihedrals
angle = calc_dihedrals(
positions[0], positions[1], positions[2], positions[3],
box=u.dimensions
)
dihedral_angles.append(np.degrees(angle))dihedral_angles = np.array(dihedral_angles)
time = np.arange(len(dihedral_angles)) * u.trajectory.dt
dihedral_angles = []
for ts in u.trajectory:
# 获取原子位置
positions = np.array([atom.position for atom in atoms])
# 计算二面角
from MDAnalysis.lib.distances import calc_dihedrals
angle = calc_dihedrals(
positions[0], positions[1], positions[2], positions[3],
box=u.dimensions
)
dihedral_angles.append(np.degrees(angle))dihedral_angles = np.array(dihedral_angles)
time = np.arange(len(dihedral_angles)) * u.trajectory.dt
Plot
绘图
plt.figure(figsize=(10, 6))
plt.plot(time, dihedral_angles)
plt.xlabel('Time (ps)')
plt.ylabel('Dihedral Angle (degrees)')
plt.title('Chi1 Angle of Residue 42')
plt.ylim(-180, 180)
plt.grid(True)
plt.savefig('chi1_angle.png', dpi=300)
plt.figure(figsize=(10, 6))
plt.plot(time, dihedral_angles)
plt.xlabel('时间 (ps)')
plt.ylabel('二面角 (度)')
plt.title('残基42的Chi1角')
plt.ylim(-180, 180)
plt.grid(True)
plt.savefig('chi1_angle.png', dpi=300)
Histogram
绘制直方图
plt.figure(figsize=(8, 6))
plt.hist(dihedral_angles, bins=36, range=(-180, 180), edgecolor='black')
plt.xlabel('Dihedral Angle (degrees)')
plt.ylabel('Frequency')
plt.title('Chi1 Angle Distribution')
plt.savefig('chi1_histogram.png', dpi=300)
print(f"Mean dihedral angle: {np.mean(dihedral_angles):.1f}°")
print(f"Std dihedral angle: {np.std(dihedral_angles):.1f}°")
undefinedplt.figure(figsize=(8, 6))
plt.hist(dihedral_angles, bins=36, range=(-180, 180), edgecolor='black')
plt.xlabel('二面角 (度)')
plt.ylabel('频率')
plt.title('Chi1角分布')
plt.savefig('chi1_histogram.png', dpi=300)
print(f"平均二面角: {np.mean(dihedral_angles):.1f}°")
print(f"二面角标准差: {np.std(dihedral_angles):.1f}°")
undefinedSolvent Accessible Surface Area (SASA)
溶剂可及表面积(SASA)
Basic SASA Calculation
基础SASA计算
python
import MDAnalysis as mda
from MDAnalysis.analysis.sasa import SASAnalysis
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis.sasa import SASAnalysis
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')Calculate SASA for protein
计算蛋白质的SASA
sasa = SASAnalysis(u.select_atoms('protein'))
sasa.run(verbose=True)
sasa = SASAnalysis(u.select_atoms('protein'))
sasa.run(verbose=True)
Extract results
提取结果
sasa_values = sasa.results.sasa
time = np.arange(len(sasa_values)) * u.trajectory.dt
sasa_values = sasa.results.sasa
time = np.arange(len(sasa_values)) * u.trajectory.dt
Plot
绘图
plt.figure(figsize=(10, 6))
plt.plot(time, sasa_values)
plt.xlabel('Time (ps)')
plt.ylabel('SASA (Ų)')
plt.title('Protein Solvent Accessible Surface Area')
plt.grid(True)
plt.savefig('sasa.png', dpi=300)
print(f"Mean SASA: {np.mean(sasa_values):.1f} Ų")
print(f"SASA range: {np.min(sasa_values):.1f} - {np.max(sasa_values):.1f} Ų")
undefinedplt.figure(figsize=(10, 6))
plt.plot(time, sasa_values)
plt.xlabel('时间 (ps)')
plt.ylabel('SASA (Ų)')
plt.title('蛋白质溶剂可及表面积')
plt.grid(True)
plt.savefig('sasa.png', dpi=300)
print(f"平均SASA: {np.mean(sasa_values):.1f} Ų")
print(f"SASA范围: {np.min(sasa_values):.1f} - {np.max(sasa_values):.1f} Ų")
undefinedPer-Residue SASA
残基水平SASA
python
import MDAnalysis as mda
from MDAnalysis.analysis.sasa import SASAnalysis
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('protein.pdb')
protein = u.select_atoms('protein')python
import MDAnalysis as mda
from MDAnalysis.analysis.sasa import SASAnalysis
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('protein.pdb')
protein = u.select_atoms('protein')Calculate SASA for each residue
计算每个残基的SASA
residue_sasa = []
residue_ids = []
for residue in protein.residues:
sasa = SASAnalysis(residue.atoms)
sasa.run()
residue_sasa.append(sasa.results.sasa[0])
residue_ids.append(residue.resid)
residue_sasa = np.array(residue_sasa)
residue_sasa = []
residue_ids = []
for residue in protein.residues:
sasa = SASAnalysis(residue.atoms)
sasa.run()
residue_sasa.append(sasa.results.sasa[0])
residue_ids.append(residue.resid)
residue_sasa = np.array(residue_sasa)
Plot
绘图
plt.figure(figsize=(12, 6))
plt.bar(residue_ids, residue_sasa)
plt.xlabel('Residue Number')
plt.ylabel('SASA (Ų)')
plt.title('Per-Residue Solvent Accessible Surface Area')
plt.savefig('per_residue_sasa.png', dpi=300)
plt.figure(figsize=(12, 6))
plt.bar(residue_ids, residue_sasa)
plt.xlabel('残基编号')
plt.ylabel('SASA (Ų)')
plt.title('残基水平溶剂可及表面积')
plt.savefig('per_residue_sasa.png', dpi=300)
Identify buried vs exposed residues
识别埋藏与暴露残基
threshold = 40.0 # Ų
buried = [resid for resid, sasa in zip(residue_ids, residue_sasa) if sasa < threshold]
exposed = [resid for resid, sasa in zip(residue_ids, residue_sasa) if sasa >= threshold]
print(f"Buried residues (SASA < {threshold} Ų): {len(buried)}")
print(f"Exposed residues (SASA >= {threshold} Ų): {len(exposed)}")
print(f"Mean residue SASA: {np.mean(residue_sasa):.1f} Ų")
undefinedthreshold = 40.0 # Ų
buried = [resid for resid, sasa in zip(residue_ids, residue_sasa) if sasa < threshold]
exposed = [resid for resid, sasa in zip(residue_ids, residue_sasa) if sasa >= threshold]
print(f"埋藏残基(SASA < {threshold} Ų): {len(buried)}")
print(f"暴露残基(SASA >= {threshold} Ų): {len(exposed)}")
print(f"残基平均SASA: {np.mean(residue_sasa):.1f} Ų")
undefinedTrajectory Alignment and Transformation
轨迹对齐与变换
Align Trajectory to Reference
将轨迹对齐到参考结构
python
import MDAnalysis as mda
from MDAnalysis.analysis import align
from MDAnalysis.analysis.rms import rmsdpython
import MDAnalysis as mda
from MDAnalysis.analysis import align
from MDAnalysis.analysis.rms import rmsdLoad trajectory
加载轨迹
mobile = mda.Universe('topology.pdb', 'trajectory.dcd')
reference = mda.Universe('reference.pdb')
mobile = mda.Universe('topology.pdb', 'trajectory.dcd')
reference = mda.Universe('reference.pdb')
Align entire trajectory to reference
将整个轨迹对齐到参考结构
alignment = align.AlignTraj(
mobile,
reference,
select='backbone',
filename='aligned_trajectory.dcd',
weights='mass'
)
alignment.run()
print(f"Trajectory aligned and saved to aligned_trajectory.dcd")
alignment = align.AlignTraj(
mobile,
reference,
select='backbone',
filename='aligned_trajectory.dcd',
weights='mass'
)
alignment.run()
print(f"轨迹已对齐并保存到aligned_trajectory.dcd")
Verify alignment - RMSD should be lower
验证对齐效果 - RMSD应降低
mobile_original = mda.Universe('topology.pdb', 'trajectory.dcd')
mobile_aligned = mda.Universe('topology.pdb', 'aligned_trajectory.dcd')
ref_backbone = reference.select_atoms('backbone')
mobile_original = mda.Universe('topology.pdb', 'trajectory.dcd')
mobile_aligned = mda.Universe('topology.pdb', 'aligned_trajectory.dcd')
ref_backbone = reference.select_atoms('backbone')
Calculate RMSD before and after alignment
计算对齐前后的RMSD
rmsd_before = []
rmsd_after = []
for ts_orig, ts_align in zip(mobile_original.trajectory, mobile_aligned.trajectory):
orig_backbone = mobile_original.select_atoms('backbone')
align_backbone = mobile_aligned.select_atoms('backbone')
rmsd_before.append(rmsd(orig_backbone.positions, ref_backbone.positions,
center=True, superposition=True))
rmsd_after.append(rmsd(align_backbone.positions, ref_backbone.positions,
center=False, superposition=False))import numpy as np
print(f"Mean RMSD before alignment: {np.mean(rmsd_before):.2f} Å")
print(f"Mean RMSD after alignment: {np.mean(rmsd_after):.2f} Å")
undefinedrmsd_before = []
rmsd_after = []
for ts_orig, ts_align in zip(mobile_original.trajectory, mobile_aligned.trajectory):
orig_backbone = mobile_original.select_atoms('backbone')
align_backbone = mobile_aligned.select_atoms('backbone')
rmsd_before.append(rmsd(orig_backbone.positions, ref_backbone.positions,
center=True, superposition=True))
rmsd_after.append(rmsd(align_backbone.positions, ref_backbone.positions,
center=False, superposition=False))import numpy as np
print(f"对齐前平均RMSD: {np.mean(rmsd_before):.2f} Å")
print(f"对齐后平均RMSD: {np.mean(rmsd_after):.2f} Å")
undefinedPBC Unwrapping
PBC展开
python
import MDAnalysis as mda
from MDAnalysis.transformations import unwrap
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.transformations import unwrap
u = mda.Universe('topology.pdb', 'trajectory.dcd')Molecules can be split across periodic boundaries
分子可能被周期性边界分割
Unwrap to make them whole
展开使分子保持完整
protein = u.select_atoms('protein')
protein = u.select_atoms('protein')
Add unwrap transformation
添加展开变换
workflow = [unwrap(protein)]
u.trajectory.add_transformations(*workflow)
workflow = [unwrap(protein)]
u.trajectory.add_transformations(*workflow)
Now iterate - protein will be automatically unwrapped
现在遍历轨迹 - 蛋白质会自动展开
for ts in u.trajectory[:5]:
com = protein.center_of_mass()
print(f"Frame {ts.frame}: Protein COM = {com}")
for ts in u.trajectory[:5]:
com = protein.center_of_mass()
print(f"帧 {ts.frame}: 蛋白质质心 = {com}")
Save unwrapped trajectory
保存展开后的轨迹
with mda.Writer('unwrapped.dcd', protein.n_atoms) as W:
for ts in u.trajectory:
W.write(protein)
print("Unwrapped trajectory saved")
undefinedwith mda.Writer('unwrapped.dcd', protein.n_atoms) as W:
for ts in u.trajectory:
W.write(protein)
print("展开后的轨迹已保存")
undefinedOn-the-Fly Transformations
实时变换
python
import MDAnalysis as mda
from MDAnalysis.transformations import (
center_in_box,
wrap,
unwrap,
fit_rot_trans
)
u = mda.Universe('topology.pdb', 'trajectory.dcd')
reference = mda.Universe('reference.pdb')
protein = u.select_atoms('protein')
ref_protein = reference.select_atoms('protein')python
import MDAnalysis as mda
from MDAnalysis.transformations import (
center_in_box,
wrap,
unwrap,
fit_rot_trans
)
u = mda.Universe('topology.pdb', 'trajectory.dcd')
reference = mda.Universe('reference.pdb')
protein = u.select_atoms('protein')
ref_protein = reference.select_atoms('protein')Chain multiple transformations
链式多个变换
workflow = [
unwrap(protein), # 1. Unwrap protein
center_in_box(protein), # 2. Center protein in box
fit_rot_trans(protein, ref_protein) # 3. Align to reference
]
u.trajectory.add_transformations(*workflow)
workflow = [
unwrap(protein), # 1. 展开蛋白质
center_in_box(protein), # 2. 将蛋白质居中到盒子
fit_rot_trans(protein, ref_protein) # 3. 对齐到参考结构
]
u.trajectory.add_transformations(*workflow)
All transformations applied automatically when iterating
遍历轨迹时会自动应用所有变换
for ts in u.trajectory[:10]:
# Protein is now unwrapped, centered, and aligned
rmsd_value = mda.analysis.rms.rmsd(protein.positions, ref_protein.positions)
print(f"Frame {ts.frame}: RMSD = {rmsd_value:.2f} Å")
undefinedfor ts in u.trajectory[:10]:
# 蛋白质现在已展开、居中并对齐
rmsd_value = mda.analysis.rms.rmsd(protein.positions, ref_protein.positions)
print(f"帧 {ts.frame}: RMSD = {rmsd_value:.2f} Å")
undefinedAdvanced Analysis
高级分析
Principal Component Analysis (PCA)
主成分分析(PCA)
python
import MDAnalysis as mda
from MDAnalysis.analysis import pca
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis import pca
import numpy as np
import matplotlib.pyplot as plt
u = mda.Universe('topology.pdb', 'trajectory.dcd')Select atoms for PCA (usually CA atoms)
选择用于PCA的原子(通常是CA原子)
ca_atoms = u.select_atoms('protein and name CA')
ca_atoms = u.select_atoms('protein and name CA')
Perform PCA
执行PCA
pc = pca.PCA(u, select='protein and name CA', align=True)
pc.run()
pc = pca.PCA(u, select='protein and name CA', align=True)
pc.run()
Extract results
提取结果
n_pcs = min(10, pc.results.n_components)
variance = pc.results.variance[:n_pcs]
cumulative_variance = pc.results.cumulative_variance[:n_pcs]
n_pcs = min(10, pc.results.n_components)
variance = pc.results.variance[:n_pcs]
cumulative_variance = pc.results.cumulative_variance[:n_pcs]
Plot variance explained
绘制方差解释率
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
ax1.bar(range(1, n_pcs + 1), variance)
ax1.set_xlabel('Principal Component')
ax1.set_ylabel('Variance')
ax1.set_title('Variance per PC')
ax2.plot(range(1, n_pcs + 1), cumulative_variance, marker='o')
ax2.set_xlabel('Number of PCs')
ax2.set_ylabel('Cumulative Variance')
ax2.set_title('Cumulative Variance Explained')
ax2.grid(True)
plt.tight_layout()
plt.savefig('pca_variance.png', dpi=300)
print(f"Variance explained by first 3 PCs: {cumulative_variance[2]:.1%}")
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
ax1.bar(range(1, n_pcs + 1), variance)
ax1.set_xlabel('主成分')
ax1.set_ylabel('方差')
ax1.set_title('各主成分的方差')
ax2.plot(range(1, n_pcs + 1), cumulative_variance, marker='o')
ax2.set_xlabel('主成分数量')
ax2.set_ylabel('累计方差解释率')
ax2.set_title('累计方差解释率')
ax2.grid(True)
plt.tight_layout()
plt.savefig('pca_variance.png', dpi=300)
print(f"前3个主成分解释的方差: {cumulative_variance[2]:.1%}")
Project trajectory onto PC1 and PC2
将轨迹投影到PC1和PC2
projection = pc.transform(ca_atoms, n_components=2)
plt.figure(figsize=(8, 8))
plt.scatter(projection[:, 0], projection[:, 1], c=range(len(projection)),
cmap='viridis', alpha=0.6)
plt.colorbar(label='Frame')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('Trajectory in PC1-PC2 Space')
plt.savefig('pca_projection.png', dpi=300)
projection = pc.transform(ca_atoms, n_components=2)
plt.figure(figsize=(8, 8))
plt.scatter(projection[:, 0], projection[:, 1], c=range(len(projection)),
cmap='viridis', alpha=0.6)
plt.colorbar(label='帧')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('轨迹在PC1-PC2空间的投影')
plt.savefig('pca_projection.png', dpi=300)
Identify frames representing different conformational states
识别代表不同构象状态的帧
Cluster in PC space or select extremes
在PC空间聚类或选择极端值
pc1_min_frame = np.argmin(projection[:, 0])
pc1_max_frame = np.argmax(projection[:, 0])
print(f"PC1 minimum conformation: frame {pc1_min_frame}")
print(f"PC1 maximum conformation: frame {pc1_max_frame}")
undefinedpc1_min_frame = np.argmin(projection[:, 0])
pc1_max_frame = np.argmax(projection[:, 0])
print(f"PC1最小构象对应的帧: {pc1_min_frame}")
print(f"PC1最大构象对应的帧: {pc1_max_frame}")
undefinedDensity Analysis
密度分析
python
import MDAnalysis as mda
from MDAnalysis.analysis import density
import numpy as np
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
from MDAnalysis.analysis import density
import numpy as np
u = mda.Universe('topology.pdb', 'trajectory.dcd')Calculate water density around protein
计算蛋白质周围的水密度
water = u.select_atoms('resname WAT and name O')
water = u.select_atoms('resname WAT and name O')
Create density grid
创建密度网格
dens = density.DensityAnalysis(water, delta=1.0) # 1 Angstrom grid
dens.run()
dens = density.DensityAnalysis(water, delta=1.0) # 1埃网格
dens.run()
Access density grid
获取密度网格
density_grid = dens.results.density
print(f"Density grid shape: {density_grid.grid.shape}")
print(f"Mean density: {np.mean(density_grid.grid):.4f} atoms/ų")
print(f"Max density: {np.max(density_grid.grid):.4f} atoms/ų")
density_grid = dens.results.density
print(f"密度网格形状: {density_grid.grid.shape}")
print(f"平均密度: {np.mean(density_grid.grid):.4f} atoms/ų")
print(f"最大密度: {np.max(density_grid.grid):.4f} atoms/ų")
Export for visualization (can view in VMD, PyMOL, ChimeraX)
导出用于可视化(可在VMD、PyMOL、ChimeraX中查看)
density_grid.export('water_density.dx', type='double')
print("Density exported to water_density.dx")
density_grid.export('water_density.dx', type='double')
print("密度已导出到water_density.dx")
Find high-density regions (binding sites)
寻找高密度区域(结合位点)
threshold = np.mean(density_grid.grid) + 2 * np.std(density_grid.grid)
high_density_points = np.where(density_grid.grid > threshold)
print(f"Number of high-density voxels: {len(high_density_points[0])}")
undefinedthreshold = np.mean(density_grid.grid) + 2 * np.std(density_grid.grid)
high_density_points = np.where(density_grid.grid > threshold)
print(f"高密度体素数量: {len(high_density_points[0])}")
undefinedNative Contacts (Q Analysis)
天然接触(Q分析)
python
import MDAnalysis as mda
from MDAnalysis.analysis.contacts import Contacts
import numpy as np
import matplotlib.pyplot as pltpython
import MDAnalysis as mda
from MDAnalysis.analysis.contacts import Contacts
import numpy as np
import matplotlib.pyplot as pltLoad native structure and trajectory
加载天然结构和轨迹
native = mda.Universe('native.pdb')
u = mda.Universe('topology.pdb', 'trajectory.dcd')
native = mda.Universe('native.pdb')
u = mda.Universe('topology.pdb', 'trajectory.dcd')
Get native contacts (CA atoms within 8 Angstroms)
获取天然接触(距离小于8埃的CA原子对)
native_ca = native.select_atoms('protein and name CA')
native_distances = mda.analysis.distances.distance_array(
native_ca.positions,
native_ca.positions
)
native_ca = native.select_atoms('protein and name CA')
native_distances = mda.analysis.distances.distance_array(
native_ca.positions,
native_ca.positions
)
Define native contacts
定义天然接触
cutoff = 8.0
native_contacts = np.where((native_distances < cutoff) &
(native_distances > 0))
n_native_contacts = len(native_contacts[0]) // 2
print(f"Number of native contacts: {n_native_contacts}")
cutoff = 8.0
native_contacts = np.where((native_distances < cutoff) &
(native_distances > 0))
n_native_contacts = len(native_contacts[0]) // 2
print(f"天然接触对数量: {n_native_contacts}")
Calculate Q (fraction of native contacts) over trajectory
计算随时间变化的Q值(天然接触保留比例)
q_values = []
mobile_ca = u.select_atoms('protein and name CA')
for ts in u.trajectory:
# Calculate current distances
current_distances = mda.analysis.distances.distance_array(
mobile_ca.positions,
mobile_ca.positions
)
# Count how many native contacts are still present
maintained_contacts = current_distances[native_contacts] < cutoff
q = np.sum(maintained_contacts) / n_native_contacts
q_values.append(q)q_values = np.array(q_values)
time = np.arange(len(q_values)) * u.trajectory.dt
q_values = []
mobile_ca = u.select_atoms('protein and name CA')
for ts in u.trajectory:
# 计算当前距离
current_distances = mda.analysis.distances.distance_array(
mobile_ca.positions,
mobile_ca.positions
)
# 统计仍保留的天然接触对数量
maintained_contacts = current_distances[native_contacts] < cutoff
q = np.sum(maintained_contacts) / n_native_contacts
q_values.append(q)q_values = np.array(q_values)
time = np.arange(len(q_values)) * u.trajectory.dt
Plot
绘图
plt.figure(figsize=(10, 6))
plt.plot(time, q_values)
plt.xlabel('Time (ps)')
plt.ylabel('Q (Fraction of Native Contacts)')
plt.title('Native Contacts Conservation')
plt.ylim(0, 1)
plt.grid(True)
plt.savefig('native_contacts.png', dpi=300)
print(f"Mean Q: {np.mean(q_values):.2f}")
print(f"Q at final frame: {q_values[-1]:.2f}")
undefinedplt.figure(figsize=(10, 6))
plt.plot(time, q_values)
plt.xlabel('时间 (ps)')
plt.ylabel('Q值(天然接触保留比例)')
plt.title('天然接触保留情况')
plt.ylim(0, 1)
plt.grid(True)
plt.savefig('native_contacts.png', dpi=300)
print(f"平均Q值: {np.mean(q_values):.2f}")
print(f"最后一帧Q值: {q_values[-1]:.2f}")
undefinedIntegration with Other Tools
与其他工具集成
Converting to Biopython
转换为Biopython格式
python
import MDAnalysis as mda
from Bio.PDB import PDBIO
from Bio.PDB.Structure import Structure
from Bio.PDB.Model import Model
from Bio.PDB.Chain import Chain
from Bio.PDB.Residue import Residue
from Bio.PDB.Atom import Atom
def mda_to_biopython(universe, frame=0):
"""
Convert MDAnalysis Universe to Biopython Structure.
Args:
universe: MDAnalysis Universe
frame: Frame number to convert
"""
# Go to specific frame
universe.trajectory[frame]
# Create Biopython structure
structure = Structure('structure')
model = Model(0)
structure.add(model)
# Get current segment
for seg in universe.segments:
chain = Chain(seg.segid if seg.segid else 'A')
model.add(chain)
for residue in seg.residues:
# Create residue
resname = residue.resname
resid = residue.resid
bio_residue = Residue((' ', resid, ' '), resname, '')
chain.add(bio_residue)
# Add atoms
for atom in residue.atoms:
bio_atom = Atom(
name=atom.name,
coord=atom.position,
bfactor=0.0,
occupancy=1.0,
altloc=' ',
fullname=atom.name,
serial_number=atom.ix,
element=atom.element if hasattr(atom, 'element') else atom.name[0]
)
bio_residue.add(bio_atom)
return structurepython
import MDAnalysis as mda
from Bio.PDB import PDBIO
from Bio.PDB.Structure import Structure
from Bio.PDB.Model import Model
from Bio.PDB.Chain import Chain
from Bio.PDB.Residue import Residue
from Bio.PDB.Atom import Atom
def mda_to_biopython(universe, frame=0):
"""
将MDAnalysis Universe转换为Biopython Structure。
参数:
universe: MDAnalysis Universe对象
frame: 要转换的帧索引
"""
# 跳转到指定帧
universe.trajectory[frame]
# 创建Biopython结构
structure = Structure('structure')
model = Model(0)
structure.add(model)
# 处理每个片段
for seg in universe.segments:
chain = Chain(seg.segid if seg.segid else 'A')
model.add(chain)
for residue in seg.residues:
# 创建残基
resname = residue.resname
resid = residue.resid
bio_residue = Residue((' ', resid, ' '), resname, '')
chain.add(bio_residue)
# 添加原子
for atom in residue.atoms:
bio_atom = Atom(
name=atom.name,
coord=atom.position,
bfactor=0.0,
occupancy=1.0,
altloc=' ',
fullname=atom.name,
serial_number=atom.ix,
element=atom.element if hasattr(atom, 'element') else atom.name[0]
)
bio_residue.add(bio_atom)
return structureExample usage
示例用法
u = mda.Universe('protein.pdb', 'trajectory.dcd')
u = mda.Universe('protein.pdb', 'trajectory.dcd')
Convert frame 100 to Biopython
将第100帧转换为Biopython格式
bio_structure = mda_to_biopython(u, frame=100)
bio_structure = mda_to_biopython(u, frame=100)
Save using Biopython
使用Biopython保存
io = PDBIO()
io.set_structure(bio_structure)
io.save('frame_100_biopython.pdb')
print("Converted to Biopython and saved")
undefinedio = PDBIO()
io.set_structure(bio_structure)
io.save('frame_100_biopython.pdb')
print("已转换为Biopython格式并保存")
undefinedExporting for VMD/PyMOL
导出用于VMD/PyMOL
python
import MDAnalysis as mda
u = mda.Universe('topology.pdb', 'trajectory.dcd')python
import MDAnalysis as mda
u = mda.Universe('topology.pdb', 'trajectory.dcd')Select specific atoms
选择特定原子
protein = u.select_atoms('protein')
important_residues = u.select_atoms('resid 10 20 30 40 50')
protein = u.select_atoms('protein')
important_residues = u.select_atoms('resid 10 20 30 40 50')
Write selection to PDB
将选择结果写入PDB文件
protein.write('protein_only.pdb')
important_residues.write('important_residues.pdb')
protein.write('protein_only.pdb')
important_residues.write('important_residues.pdb')
Write trajectory (DCD format for VMD)
写入轨迹(VMD支持的DCD格式)
with mda.Writer('protein_trajectory.dcd', protein.n_atoms) as W:
for ts in u.trajectory:
W.write(protein)
with mda.Writer('protein_trajectory.dcd', protein.n_atoms) as W:
for ts in u.trajectory:
W.write(protein)
Write specific frames
写入特定帧
frames_to_save = [0, 50, 100, 150, 200]
with mda.Writer('key_frames.pdb', protein.n_atoms) as W:
for frame_idx in frames_to_save:
u.trajectory[frame_idx]
W.write(protein)
print("Exported files for visualization")
undefinedframes_to_save = [0, 50, 100, 150, 200]
with mda.Writer('key_frames.pdb', protein.n_atoms) as W:
for frame_idx in frames_to_save:
u.trajectory[frame_idx]
W.write(protein)
print("已导出用于可视化的文件")
undefinedParallel Processing with Dask
使用Dask并行处理
python
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSD
from dask import delayed, compute
import numpy as np
def calculate_rmsd_chunk(topology, trajectory_chunk, reference):
"""
Calculate RMSD for a chunk of trajectory.
Args:
topology: Topology file
trajectory_chunk: List of trajectory files
reference: Reference Universe
"""
u = mda.Universe(topology, *trajectory_chunk)
mobile = u.select_atoms('backbone')
ref = reference.select_atoms('backbone')
R = RMSD(mobile, ref, select='backbone')
R.run()
return R.results.rmsd[:, 2]python
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSD
from dask import delayed, compute
import numpy as np
def calculate_rmsd_chunk(topology, trajectory_chunk, reference):
"""
计算轨迹片段的RMSD。
参数:
topology: 拓扑文件路径
trajectory_chunk: 轨迹文件列表
reference: 参考Universe对象
"""
u = mda.Universe(topology, *trajectory_chunk)
mobile = u.select_atoms('backbone')
ref = reference.select_atoms('backbone')
R = RMSD(mobile, ref, select='backbone')
R.run()
return R.results.rmsd[:, 2]Split trajectory into chunks
将轨迹分割为片段
topology = 'topology.pdb'
trajectories = [f'traj_part{i}.dcd' for i in range(10)]
reference = mda.Universe('reference.pdb')
topology = 'topology.pdb'
trajectories = [f'traj_part{i}.dcd' for i in range(10)]
reference = mda.Universe('reference.pdb')
Create delayed tasks
分割为多个片段
chunk_size = 2
chunks = [trajectories[i:i+chunk_size] for i in range(0, len(trajectories), chunk_size)]
tasks = [delayed(calculate_rmsd_chunk)(topology, chunk, reference)
for chunk in chunks]
chunk_size = 2
chunks = [trajectories[i:i+chunk_size] for i in range(0, len(trajectories), chunk_size)]
tasks = [delayed(calculate_rmsd_chunk)(topology, chunk, reference)
for chunk in chunks]
Compute in parallel
并行计算
results = compute(*tasks)
results = compute(*tasks)
Combine results
合并结果
rmsd_all = np.concatenate(results)
print(f"Total frames processed: {len(rmsd_all)}")
print(f"Mean RMSD: {np.mean(rmsd_all):.2f} Å")
undefinedrmsd_all = np.concatenate(results)
print(f"处理的总帧数: {len(rmsd_all)}")
print(f"平均RMSD: {np.mean(rmsd_all):.2f} Å")
undefinedPractical Workflows
实用工作流
Complete Protein Stability Analysis
完整蛋白质稳定性分析
python
import MDAnalysis as mda
from MDAnalysis.analysis import rms, align
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt
def analyze_protein_stability(topology, trajectory, output_prefix='analysis'):
"""
Complete workflow for protein stability analysis.
Analyzes:
- RMSD (backbone, all-atom)
- RMSF per residue
- Radius of gyration
- Hydrogen bonds
- Secondary structure retention (if possible)
"""
print("Loading trajectory...")
u = mda.Universe(topology, trajectory)
# Create reference (first frame)
reference = u.copy()
reference.trajectory[0]
results = {}
# 1. RMSD Analysis
print("Calculating RMSD...")
protein = u.select_atoms('protein')
ref_protein = reference.select_atoms('protein')
# Backbone RMSD
rmsd_bb = rms.RMSD(protein, ref_protein, select='backbone')
rmsd_bb.run()
# All-atom RMSD
rmsd_all = rms.RMSD(protein, ref_protein, select='protein')
rmsd_all.run()
results['rmsd_backbone'] = rmsd_bb.results.rmsd[:, 2]
results['rmsd_allatom'] = rmsd_all.results.rmsd[:, 2]
# 2. RMSF Analysis
print("Calculating RMSF...")
ca_atoms = u.select_atoms('protein and name CA')
rmsf_analysis = rms.RMSF(ca_atoms)
rmsf_analysis.run()
results['rmsf'] = rmsf_analysis.results.rmsf
results['residue_ids'] = [atom.resid for atom in ca_atoms]
# 3. Radius of Gyration
print("Calculating Radius of Gyration...")
rg_values = []
for ts in u.trajectory:
rg_values.append(protein.radius_of_gyration())
results['radius_gyration'] = np.array(rg_values)
# 4. Hydrogen Bonds
print("Analyzing hydrogen bonds...")
hbonds = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel='protein'
)
hbonds.run()
# Count H-bonds per frame
hbond_counts = []
for frame in range(len(u.trajectory)):
n_hbonds = len(hbonds.results.hbonds[
hbonds.results.hbonds[:, 0] == frame
])
hbond_counts.append(n_hbonds)
results['hbond_counts'] = np.array(hbond_counts)
# Generate plots
print("Generating plots...")
time = np.arange(len(u.trajectory)) * u.trajectory.dt
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# RMSD
axes[0, 0].plot(time, results['rmsd_backbone'], label='Backbone')
axes[0, 0].plot(time, results['rmsd_allatom'], label='All-atom', alpha=0.7)
axes[0, 0].set_xlabel('Time (ps)')
axes[0, 0].set_ylabel('RMSD (Å)')
axes[0, 0].set_title('RMSD over time')
axes[0, 0].legend()
axes[0, 0].grid(True)
# RMSF
axes[0, 1].plot(results['residue_ids'], results['rmsf'])
axes[0, 1].set_xlabel('Residue Number')
axes[0, 1].set_ylabel('RMSF (Å)')
axes[0, 1].set_title('Per-residue Fluctuations')
axes[0, 1].grid(True)
# Radius of Gyration
axes[1, 0].plot(time, results['radius_gyration'])
axes[1, 0].set_xlabel('Time (ps)')
axes[1, 0].set_ylabel('Rg (Å)')
axes[1, 0].set_title('Radius of Gyration')
axes[1, 0].grid(True)
# Hydrogen Bonds
axes[1, 1].plot(time, results['hbond_counts'])
axes[1, 1].set_xlabel('Time (ps)')
axes[1, 1].set_ylabel('Number of H-bonds')
axes[1, 1].set_title('Intramolecular Hydrogen Bonds')
axes[1, 1].grid(True)
plt.tight_layout()
plt.savefig(f'{output_prefix}_stability.png', dpi=300)
print(f"Saved plot: {output_prefix}_stability.png")
# Print summary
print("\n" + "="*60)
print("STABILITY ANALYSIS SUMMARY")
print("="*60)
print(f"Trajectory length: {len(u.trajectory)} frames ({time[-1]:.1f} ps)")
print(f"\nRMSD (backbone):")
print(f" Mean: {np.mean(results['rmsd_backbone']):.2f} Å")
print(f" Max: {np.max(results['rmsd_backbone']):.2f} Å")
print(f" Final: {results['rmsd_backbone'][-1]:.2f} Å")
print(f"\nRMSF:")
print(f" Mean: {np.mean(results['rmsf']):.2f} Å")
print(f" Most flexible residue: {results['residue_ids'][np.argmax(results['rmsf'])]}")
print(f"\nRadius of Gyration:")
print(f" Mean: {np.mean(results['radius_gyration']):.2f} Å")
print(f" Std: {np.std(results['radius_gyration']):.2f} Å")
print(f"\nHydrogen Bonds:")
print(f" Mean: {np.mean(results['hbond_counts']):.1f}")
print(f" Range: {np.min(results['hbond_counts'])}-{np.max(results['hbond_counts'])}")
# Stability assessment
print("\n" + "="*60)
print("STABILITY ASSESSMENT")
print("="*60)
if np.mean(results['rmsd_backbone']) < 2.0:
print("✓ Structure is STABLE (low RMSD)")
elif np.mean(results['rmsd_backbone']) < 4.0:
print("⚠ Structure shows MODERATE fluctuations")
else:
print("✗ Structure is UNSTABLE (high RMSD)")
if np.std(results['radius_gyration']) < 1.0:
print("✓ Protein maintains COMPACT structure")
else:
print("⚠ Protein shows conformational EXPANSION/CONTRACTION")
return resultspython
import MDAnalysis as mda
from MDAnalysis.analysis import rms, align
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt
def analyze_protein_stability(topology, trajectory, output_prefix='analysis'):
"""
完整的蛋白质稳定性分析工作流。
分析内容:
- RMSD(主链、全原子)
- 残基水平RMSF
- 回转半径
- 氢键
- 二级结构保留(如果支持)
"""
print("加载轨迹...")
u = mda.Universe(topology, trajectory)
# 创建参考结构(第一帧)
reference = u.copy()
reference.trajectory[0]
results = {}
# 1. RMSD分析
print("计算RMSD...")
protein = u.select_atoms('protein')
ref_protein = reference.select_atoms('protein')
# 主链RMSD
rmsd_bb = rms.RMSD(protein, ref_protein, select='backbone')
rmsd_bb.run()
# 全原子RMSD
rmsd_all = rms.RMSD(protein, ref_protein, select='protein')
rmsd_all.run()
results['rmsd_backbone'] = rmsd_bb.results.rmsd[:, 2]
results['rmsd_allatom'] = rmsd_all.results.rmsd[:, 2]
# 2. RMSF分析
print("计算RMSF...")
ca_atoms = u.select_atoms('protein and name CA')
rmsf_analysis = rms.RMSF(ca_atoms)
rmsf_analysis.run()
results['rmsf'] = rmsf_analysis.results.rmsf
results['residue_ids'] = [atom.resid for atom in ca_atoms]
# 3. 回转半径
print("计算回转半径...")
rg_values = []
for ts in u.trajectory:
rg_values.append(protein.radius_of_gyration())
results['radius_gyration'] = np.array(rg_values)
# 4. 氢键分析
print("分析氢键...")
hbonds = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel='protein'
)
hbonds.run()
# 统计每帧的氢键数量
hbond_counts = []
for frame in range(len(u.trajectory)):
n_hbonds = len(hbonds.results.hbonds[
hbonds.results.hbonds[:, 0] == frame
])
hbond_counts.append(n_hbonds)
results['hbond_counts'] = np.array(hbond_counts)
# 生成绘图
print("生成绘图...")
time = np.arange(len(u.trajectory)) * u.trajectory.dt
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# RMSD图
axes[0, 0].plot(time, results['rmsd_backbone'], label='主链')
axes[0, 0].plot(time, results['rmsd_allatom'], label='全原子', alpha=0.7)
axes[0, 0].set_xlabel('时间 (ps)')
axes[0, 0].set_ylabel('RMSD (Å)')
axes[0, 0].set_title('RMSD随时间变化')
axes[0, 0].legend()
axes[0, 0].grid(True)
# RMSF图
axes[0, 1].plot(results['residue_ids'], results['rmsf'])
axes[0, 1].set_xlabel('残基编号')
axes[0, 1].set_ylabel('RMSF (Å)')
axes[0, 1].set_title('残基水平波动')
axes[0, 1].grid(True)
# 回转半径图
axes[1, 0].plot(time, results['radius_gyration'])
axes[1, 0].set_xlabel('时间 (ps)')
axes[1, 0].set_ylabel('Rg (Å)')
axes[1, 0].set_title('回转半径')
axes[1, 0].grid(True)
# 氢键图
axes[1, 1].plot(time, results['hbond_counts'])
axes[1, 1].set_xlabel('时间 (ps)')
axes[1, 1].set_ylabel('氢键数量')
axes[1, 1].set_title('分子内氢键')
axes[1, 1].grid(True)
plt.tight_layout()
plt.savefig(f'{output_prefix}_stability.png', dpi=300)
print(f"绘图已保存: {output_prefix}_stability.png")
# 打印总结
print("\n" + "="*60)
print("稳定性分析总结")
print("="*60)
print(f"轨迹长度: {len(u.trajectory)} 帧({time[-1]:.1f} ps)")
print(f"\nRMSD(主链):")
print(f" 平均值: {np.mean(results['rmsd_backbone']):.2f} Å")
print(f" 最大值: {np.max(results['rmsd_backbone']):.2f} Å")
print(f" 最后一帧值: {results['rmsd_backbone'][-1]:.2f} Å")
print(f"\nRMSF:")
print(f" 平均值: {np.mean(results['rmsf']):.2f} Å")
print(f" 最柔性残基: {results['residue_ids'][np.argmax(results['rmsf'])]}")
print(f"\n回转半径:")
print(f" 平均值: {np.mean(results['radius_gyration']):.2f} Å")
print(f" 标准差: {np.std(results['radius_gyration']):.2f} Å")
print(f"\n氢键:")
print(f" 平均值: {np.mean(results['hbond_counts']):.1f}")
print(f" 范围: {np.min(results['hbond_counts'])}-{np.max(results['hbond_counts'])}")
# 稳定性评估
print("\n" + "="*60)
print("稳定性评估")
print("="*60)
if np.mean(results['rmsd_backbone']) < 2.0:
print("✓ 结构稳定(RMSD低)")
elif np.mean(results['rmsd_backbone']) < 4.0:
print("⚠ 结构存在中等波动")
else:
print("✗ 结构不稳定(RMSD高)")
if np.std(results['radius_gyration']) < 1.0:
print("✓ 蛋白质保持紧凑结构")
else:
print("⚠ 蛋白质存在构象扩张/收缩")
return resultsExample usage
示例用法
results = analyze_protein_stability(
'topology.pdb',
'trajectory.dcd',
output_prefix='protein'
)
undefinedresults = analyze_protein_stability(
'topology.pdb',
'trajectory.dcd',
output_prefix='protein'
)
undefinedProtein-Ligand Binding Analysis
蛋白质-配体结合分析
python
import MDAnalysis as mda
from MDAnalysis.analysis import distances, contacts
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt
def analyze_protein_ligand_binding(topology, trajectory,
ligand_resname='LIG',
binding_site_resids=None):
"""
Comprehensive protein-ligand binding analysis.
Args:
topology: Topology file
trajectory: Trajectory file
ligand_resname: Residue name of ligand
binding_site_resids: List of binding site residue IDs
"""
print("Loading complex...")
u = mda.Universe(topology, trajectory)
protein = u.select_atoms('protein')
ligand = u.select_atoms(f'resname {ligand_resname}')
if len(ligand) == 0:
raise ValueError(f"No ligand found with resname {ligand_resname}")
print(f"Found ligand with {len(ligand)} atoms")
results = {}
# 1. Ligand RMSD
print("Calculating ligand RMSD...")
from MDAnalysis.analysis.rms import RMSD
ligand_rmsd = RMSD(ligand, ligand, select=f'resname {ligand_resname}',
ref_frame=0)
ligand_rmsd.run()
results['ligand_rmsd'] = ligand_rmsd.results.rmsd[:, 2]
# 2. Protein-Ligand Distance
print("Calculating protein-ligand distance...")
if binding_site_resids:
binding_site = u.select_atoms(f'protein and resid {" ".join(map(str, binding_site_resids))}')
else:
binding_site = protein
min_distances = []
for ts in u.trajectory:
dist_array = distances.distance_array(
ligand.center_of_mass()[np.newaxis, :],
binding_site.positions,
box=u.dimensions
)
min_distances.append(np.min(dist_array))
results['min_distance'] = np.array(min_distances)
# 3. Protein-Ligand Contacts
print("Analyzing contacts...")
ca = contacts.Contacts(
u,
selection=(ligand, protein),
refgroup=(ligand, protein),
radius=4.5
)
ca.run()
results['contacts'] = ca.results.timeseries[:, 1]
# 4. Protein-Ligand H-bonds
print("Analyzing hydrogen bonds...")
# Ligand as donor
hb_lig_don = HydrogenBondAnalysis(
universe=u,
donors_sel=f'resname {ligand_resname}',
hydrogens_sel=f'resname {ligand_resname}',
acceptors_sel='protein'
)
hb_lig_don.run()
# Protein as donor
hb_prot_don = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel=f'resname {ligand_resname}'
)
hb_prot_don.run()
# Combine
hbond_counts = []
for frame in range(len(u.trajectory)):
n1 = len(hb_lig_don.results.hbonds[hb_lig_don.results.hbonds[:, 0] == frame])
n2 = len(hb_prot_don.results.hbonds[hb_prot_don.results.hbonds[:, 0] == frame])
hbond_counts.append(n1 + n2)
results['hbonds'] = np.array(hbond_counts)
# Plot results
print("Generating plots...")
time = np.arange(len(u.trajectory)) * u.trajectory.dt
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Ligand RMSD
axes[0, 0].plot(time, results['ligand_rmsd'])
axes[0, 0].set_xlabel('Time (ps)')
axes[0, 0].set_ylabel('Ligand RMSD (Å)')
axes[0, 0].set_title('Ligand Stability')
axes[0, 0].grid(True)
# Distance to binding site
axes[0, 1].plot(time, results['min_distance'])
axes[0, 1].set_xlabel('Time (ps)')
axes[0, 1].set_ylabel('Min Distance (Å)')
axes[0, 1].set_title('Ligand-Protein Proximity')
axes[0, 1].grid(True)
# Contacts
axes[1, 0].plot(time, results['contacts'])
axes[1, 0].set_xlabel('Time (ps)')
axes[1, 0].set_ylabel('Number of Contacts')
axes[1, 0].set_title('Protein-Ligand Contacts (< 4.5 Å)')
axes[1, 0].grid(True)
# H-bonds
axes[1, 1].plot(time, results['hbonds'])
axes[1, 1].set_xlabel('Time (ps)')
axes[1, 1].set_ylabel('Number of H-bonds')
axes[1, 1].set_title('Protein-Ligand Hydrogen Bonds')
axes[1, 1].grid(True)
plt.tight_layout()
plt.savefig('protein_ligand_analysis.png', dpi=300)
# Summary
print("\n" + "="*60)
print("PROTEIN-LIGAND BINDING SUMMARY")
print("="*60)
print(f"\nLigand RMSD:")
print(f" Mean: {np.mean(results['ligand_rmsd']):.2f} Å")
print(f" Max: {np.max(results['ligand_rmsd']):.2f} Å")
print(f"\nLigand-Protein Distance:")
print(f" Mean: {np.mean(results['min_distance']):.2f} Å")
print(f" Min: {np.min(results['min_distance']):.2f} Å")
print(f"\nContacts:")
print(f" Mean: {np.mean(results['contacts']):.1f}")
print(f" Range: {np.min(results['contacts'])}-{np.max(results['contacts'])}")
print(f"\nHydrogen Bonds:")
print(f" Mean: {np.mean(results['hbonds']):.1f}")
print(f" Max: {np.max(results['hbonds'])}")
# Binding assessment
print("\n" + "="*60)
print("BINDING ASSESSMENT")
print("="*60)
if np.mean(results['ligand_rmsd']) < 2.0:
print("✓ Ligand remains STABLE in binding site")
else:
print("⚠ Ligand shows SIGNIFICANT movement")
if np.mean(results['contacts']) > 5:
print("✓ STRONG protein-ligand interactions")
elif np.mean(results['contacts']) > 2:
print("⚠ MODERATE protein-ligand interactions")
else:
print("✗ WEAK protein-ligand interactions")
if np.mean(results['hbonds']) >= 2:
print(f"✓ Ligand forms stable H-bonds ({np.mean(results['hbonds']):.1f} average)")
elif np.mean(results['hbonds']) >= 1:
print(f"⚠ Limited H-bonding ({np.mean(results['hbonds']):.1f} average)")
else:
print("✗ Minimal H-bonding interaction")
return resultspython
import MDAnalysis as mda
from MDAnalysis.analysis import distances, contacts
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
import numpy as np
import matplotlib.pyplot as plt
def analyze_protein_ligand_binding(topology, trajectory,
ligand_resname='LIG',
binding_site_resids=None):
"""
完整的蛋白质-配体结合分析工作流。
参数:
topology: 拓扑文件路径
trajectory: 轨迹文件路径
ligand_resname: 配体的残基名称
binding_site_resids: 结合位点残基ID列表
"""
print("加载复合物...")
u = mda.Universe(topology, trajectory)
protein = u.select_atoms('protein')
ligand = u.select_atoms(f'resname {ligand_resname}')
if len(ligand) == 0:
raise ValueError(f"未找到残基名称为 {ligand_resname} 的配体")
print(f"找到配体,包含 {len(ligand)} 个原子")
results = {}
# 1. 配体RMSD
print("计算配体RMSD...")
from MDAnalysis.analysis.rms import RMSD
ligand_rmsd = RMSD(ligand, ligand, select=f'resname {ligand_resname}',
ref_frame=0)
ligand_rmsd.run()
results['ligand_rmsd'] = ligand_rmsd.results.rmsd[:, 2]
# 2. 蛋白质-配体距离
print("计算蛋白质-配体距离...")
if binding_site_resids:
binding_site = u.select_atoms(f'protein and resid {" ".join(map(str, binding_site_resids))}')
else:
binding_site = protein
min_distances = []
for ts in u.trajectory:
dist_array = distances.distance_array(
ligand.center_of_mass()[np.newaxis, :],
binding_site.positions,
box=u.dimensions
)
min_distances.append(np.min(dist_array))
results['min_distance'] = np.array(min_distances)
# 3. 蛋白质-配体接触对
print("分析接触对...")
ca = contacts.Contacts(
u,
selection=(ligand, protein),
refgroup=(ligand, protein),
radius=4.5
)
ca.run()
results['contacts'] = ca.results.timeseries[:, 1]
# 4. 蛋白质-配体氢键
print("分析氢键...")
# 配体作为供体
hb_lig_don = HydrogenBondAnalysis(
universe=u,
donors_sel=f'resname {ligand_resname}',
hydrogens_sel=f'resname {ligand_resname}',
acceptors_sel='protein'
)
hb_lig_don.run()
# 蛋白质作为供体
hb_prot_don = HydrogenBondAnalysis(
universe=u,
donors_sel='protein',
hydrogens_sel='protein',
acceptors_sel=f'resname {ligand_resname}'
)
hb_prot_don.run()
# 合并结果
hbond_counts = []
for frame in range(len(u.trajectory)):
n1 = len(hb_lig_don.results.hbonds[hb_lig_don.results.hbonds[:, 0] == frame])
n2 = len(hb_prot_don.results.hbonds[hb_prot_don.results.hbonds[:, 0] == frame])
hbond_counts.append(n1 + n2)
results['hbonds'] = np.array(hbond_counts)
# 生成绘图
print("生成绘图...")
time = np.arange(len(u.trajectory)) * u.trajectory.dt
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# 配体RMSD图
axes[0, 0].plot(time, results['ligand_rmsd'])
axes[0, 0].set_xlabel('时间 (ps)')
axes[0, 0].set_ylabel('配体RMSD (Å)')
axes[0, 0].set_title('配体稳定性')
axes[0, 0].grid(True)
# 配体-蛋白质距离图
axes[0, 1].plot(time, results['min_distance'])
axes[0, 1].set_xlabel('时间 (ps)')
axes[0, 1].set_ylabel('最小距离 (Å)')
axes[0, 1].set_title('配体-蛋白质接近程度')
axes[0, 1].grid(True)
# 接触对图
axes[1, 0].plot(time, results['contacts'])
axes[1, 0].set_xlabel('时间 (ps)')
axes[1, 0].set_ylabel('接触对数量')
axes[1, 0].set_title('蛋白质-配体接触对(< 4.5 Å)')
axes[1, 0].grid(True)
# 氢键图
axes[1, 1].plot(time, results['hbonds'])
axes[1, 1].set_xlabel('时间 (ps)')
axes[1, 1].set_ylabel('氢键数量')
axes[1, 1].set_title('蛋白质-配体氢键')
axes[1, 1].grid(True)
plt.tight_layout()
plt.savefig('protein_ligand_analysis.png', dpi=300)
# 打印总结
print("\n" + "="*60)
print("蛋白质-配体结合分析总结")
print("="*60)
print(f"\n配体RMSD:")
print(f" 平均值: {np.mean(results['ligand_rmsd']):.2f} Å")
print(f" 最大值: {np.max(results['ligand_rmsd']):.2f} Å")
print(f"\n配体-蛋白质距离:")
print(f" 平均值: {np.mean(results['min_distance']):.2f} Å")
print(f" 最小值: {np.min(results['min_distance']):.2f} Å")
print(f"\n接触对:")
print(f" 平均值: {np.mean(results['contacts']):.1f}")
print(f" 范围: {np.min(results['contacts'])}-{np.max(results['contacts'])}")
print(f"\n氢键:")
print(f" 平均值: {np.mean(results['hbonds']):.1f}")
print(f" 最大值: {np.max(results['hbonds'])}")
# 结合评估
print("\n" + "="*60)
print("结合评估")
print("="*60)
if np.mean(results['ligand_rmsd']) < 2.0:
print("✓ 配体在结合位点保持稳定")
else:
print("⚠ 配体存在显著移动")
if np.mean(results['contacts']) > 5:
print("✓ 蛋白质-配体相互作用强")
elif np.mean(results['contacts']) > 2:
print("⚠ 蛋白质-配体相互作用中等")
else:
print("✗ 蛋白质-配体相互作用弱")
if np.mean(results['hbonds']) >= 2:
print(f"✓ 配体形成稳定氢键(平均 {np.mean(results['hbonds']):.1f} 个)")
elif np.mean(results['hbonds']) >= 1:
print(f"⚠ 氢键形成有限(平均 {np.mean(results['hbonds']):.1f} 个)")
else:
print("✗ 氢键相互作用极少")
return resultsExample usage
示例用法
results = analyze_protein_ligand_binding(
'complex.pdb',
'trajectory.dcd',
ligand_resname='LIG',
binding_site_resids=[45, 46, 89, 92, 115, 118]
)
This comprehensive MDAnalysis guide covers 50+ examples for molecular dynamics trajectory analysis!results = analyze_protein_ligand_binding(
'complex.pdb',
'trajectory.dcd',
ligand_resname='LIG',
binding_site_resids=[45, 46, 89, 92, 115, 118]
)
这份全面的MDAnalysis指南涵盖了50多个分子动力学轨迹分析示例!