Understanding Molecular Dynamics Simulations
The prediction of atomic and molecular properties via computational modelling is becoming increasingly prominent in chemistry. Molecular modelling methods offer an invaluable tool-set for investigating the physical basis of chemical phenomena and predicting the outcomes of chemical reactions, with applications ranging from drug discovery to protein folding, amongst several others.
Molecular modelling encapsulates everything from quantum mechanical methods to classical force field-based approaches [1]. The choice of modelling method often hinges on the balance between the accuracy requirements of the study and the computational resources available. This post provides a succinct overview of molecular dynamics simulations, focusing on their principles, applications, and limitations.
Molecular Dynamics
Molecular dynamics (MD) is a powerful computational tool used extensively to understand the dynamic behaviour of particles over time. MD simulations are grounded in the principles of Newtonian mechanics, using Newton's equations of motion to capture the temporal evolution of a system of particles [2]. Such simulations are fundamentally designed to demonstrate how atoms interact within a system, allowing the calculation of various properties intrinsic to their interactions. With the aid of suitable visualisation software, these simulations provide valuable insights into atomic trajectories during the simulation process, illuminating the dynamics of molecular systems [3,4].
The resolution of MD simulations spans atomistic to coarse-grained methods. The choice between the two often hinges on the specific properties one aims to extract from the simulations and the balance between computational cost and the level of detail required to answer the research question at hand.
Atomistic vs. Coarse-grained Modelling
Atomistic models typically require a timestep on the order of a femtosecond, necessitated by the detailed calculations for positioning each atom and determining their interactions. The timestep can, however, be extended by employing coarse-grained modelling techniques [5].
Coarse-grained models are an advantageous choice for simulating large systems, such as polymers, proteins and biomolecules [5,6,7,8]. By treating clusters of atoms as a single computational unit, characterised by average charges and collective mass, coarse-graining techniques sidestep the need for atom-by-atom modelling [9]. This approach significantly reduces the computational intensity of these simulations, as fewer calculations are required to construct a dynamic model of the system.
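To make the idea of grouping atoms into beads concrete, the sketch below shows a mass-weighted centre-of-mass mapping in Python. It is a minimal illustration rather than a production coarse-graining tool: the arrays, the grouping of atoms into beads, and the function name are all invented for this example.

```python
import numpy as np

def map_to_beads(positions, masses, bead_atom_indices):
    """Map atomistic coordinates onto coarse-grained beads.

    Each bead sits at the centre of mass of its constituent atoms
    and carries their combined mass.
    """
    bead_positions, bead_masses = [], []
    for atom_idx in bead_atom_indices:
        m = masses[atom_idx]               # masses of the atoms in this bead
        r = positions[atom_idx]            # their Cartesian coordinates
        bead_positions.append((m[:, None] * r).sum(axis=0) / m.sum())
        bead_masses.append(m.sum())
    return np.array(bead_positions), np.array(bead_masses)

# Illustrative example: four atoms grouped into two beads of two atoms each
positions = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                      [1.0, 0.0, 0.0], [1.1, 0.0, 0.0]])
masses = np.array([12.0, 1.0, 12.0, 1.0])
beads, bead_masses = map_to_beads(positions, masses, [[0, 1], [2, 3]])
```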
The increase in computational efficiency afforded by coarse-graining comes at the cost of some accuracy relative to analogous atomistic methods [10]. However, this does not mean that coarse-graining methods lack utility. Numerous properties, such as radius of gyration, viscosity, miscibility, end-to-end distance, and microphase separation, remain discernible through coarse-graining techniques [11,12,13]. Therefore, coarse-grained models continue to provide valuable chemical insights across a range of systems.
Statistical Mechanics Underpinning Molecular Modelling
To understand how MD simulations work and why they produce meaningful results, we must first explore the statistical mechanics framework that connects the microscopic behaviour of individual atoms to the macroscopic properties we observe. This connection is essential because MD simulations don't directly calculate bulk properties; instead, they generate trajectories of atomic motion that must be interpreted through statistical mechanics to extract meaningful observables.
The Hamiltonian in molecular dynamics serves as the cornerstone for linking microscale particle behaviour with macroscale system properties. It provides a complete description of the system’s energy, encoding both how fast particles are moving (kinetic energy) and how they interact with each other (potential energy). It is expressed as a sum of kinetic and potential energy terms:
\[H(\vec{r}, \vec{p}) = K(\vec{p}) + U(\vec{r})\]where $K(\vec{p})$ represents the kinetic energy, typically formulated as $K = \sum_i \frac{\vec{p}_i^{\,2}}{2m_i}$, summing over the momenta $\vec{p}_i$ of particles with masses $m_i$. The potential energy term, $U(\vec{r})$, is a function of the positional coordinates $\vec{r}$ and encapsulates the interparticle interactions.
Why does this matter? The Hamiltonian tells us the total energy of the system at any moment, but to understand what we'll actually observe in an experiment, we need to know which states the system is likely to visit. This is where statistical mechanics comes in: it provides the framework for connecting the energy landscape described by the Hamiltonian to the probabilities of finding the system in different configurations.
The probability of a system residing in a particular state within the canonical ensemble is dictated by the Boltzmann distribution:
\[p(H) = \frac{e^{-\beta H}}{Z}\]with $Z$ as the partition function and $\beta$ as the inverse temperature $(1/k_BT)$. This equation is fundamental because it tells us that states with lower energy are exponentially more probable, a principle that governs everything from chemical reactions to protein folding. The partition function $Z$ ensures that probabilities sum to one, acting as a normalization constant.
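As a toy illustration of this equation, the snippet below evaluates Boltzmann probabilities for a handful of discrete states with made-up energies, working in reduced units where $k_B = 1$; the energies and temperature are arbitrary choices for the example.

```python
import numpy as np

k_B = 1.0                                   # reduced units (assumption for the example)
T = 1.0
beta = 1.0 / (k_B * T)

energies = np.array([0.0, 0.5, 1.0, 2.0])   # illustrative state energies

weights = np.exp(-beta * energies)          # unnormalised Boltzmann factors
Z = weights.sum()                           # partition function over these states
probabilities = weights / Z

print(probabilities)        # lower-energy states are exponentially more probable
print(probabilities.sum())  # normalisation check: ~1.0
```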
To compute macroscopic observables, the properties we actually measure in experiments, statistical averaging is performed over the Boltzmann-weighted microstates:
\[\langle A \rangle = \int A(\vec{r},\vec{p}) \frac{e^{-\beta H(\vec{r},\vec{p})}}{Z} d\vec{r} d\vec{p}\]where $A(\vec{r},\vec{p})$ represents the microscopic value of an observable $A$. This equation shows that to calculate any macroscopic property, we need to average over all possible states, weighted by their Boltzmann probability. In practice, we can't evaluate this integral analytically for complex systems, which is why MD simulations are so valuable: they provide a way to sample these states numerically.
The decomposition of the Hamiltonian into kinetic and potential energy components enables a separation of the probability distribution into kinetic and configurational parts. This separation is crucial because it allows us to treat the motion of particles (kinetic energy) separately from their spatial arrangement (potential energy), simplifying both theoretical analysis and computational implementation:
\[p(\vec{r}, \vec{p}) = p_\textrm{config}(\vec{r}) \cdot p_\textrm{kinetic}(\vec{p}) = \frac{e^{-\beta U(\vec{r})} e^{-\beta K(\vec{p})}}{\int e^{-\beta U(\vec{r})} d\vec{r} \int e^{-\beta K(\vec{p})} d\vec{p}}\]The Maxwell-Boltzmann distribution for velocities is derived from the kinetic energy term:
\[p_\textrm{kinetic}(\vec{p}) = (2\pi m k_B T)^{-3/2} e^{-\beta \vec{p}^2/(2m)}\]The configurational probability, dependent on the potential energy, is expressed as:
\[p_\textrm{config}(\vec{r}) = \frac{e^{-\beta U(\vec{r})}}{\int e^{-\beta U(\vec{r})} d\vec{r}}\]Ensemble averages of observables are obtained by integrating over this configurational probability distribution:
\[\langle A \rangle = \int A(\vec{r})\, p_\textrm{config}(\vec{r})\, d\vec{r} = \frac{\int A(\vec{r})\, e^{-\beta U(\vec{r})}\, d\vec{r}}{\int e^{-\beta U(\vec{r})}\, d\vec{r}}\]In practice, the integral is approximated by a sum over sampled configurations; if the configurations are drawn uniformly, each must be weighted by its Boltzmann factor:
\[\langle A \rangle \approx \frac{\sum_{i=1}^{N} A(\vec{r}_i) e^{-\beta U(\vec{r}_i)}}{\sum_{i=1}^{N} e^{-\beta U(\vec{r}_i)}}\]where $\vec{r}_i$ represents the $i$-th sampled configuration, and $N$ is the total number of samples.
Considering the high-dimensional nature of molecular systems, uniform sampling of phase space is computationally prohibitive. Therefore, importance sampling is employed, where system configurations are sampled according to their Boltzmann probability, providing an efficient estimation of ensemble averages:
\[\langle A \rangle \approx \frac{1}{N} \sum_{i=1}^N A_i\]where $A_i$ are values of $A$ in the sampled states. This is the key insight: if we can generate a trajectory that samples states according to their Boltzmann probability, we can estimate ensemble averages simply by averaging over the trajectory. Techniques such as molecular dynamics simulations, especially with enhanced sampling algorithms, are pivotal for effectively exploring the phase space of complex biomolecules following the Boltzmann distribution.
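The following sketch illustrates that insight for a one-dimensional harmonic potential $U(x) = \frac{1}{2} k x^2$, for which the configurational Boltzmann distribution is a Gaussian with variance $k_B T / k$. Because the samples are already Boltzmann-distributed, a plain average of $x^2$ recovers the ensemble average $\langle x^2 \rangle = k_B T / k$; the parameter values and reduced units are arbitrary choices for the example.

```python
import numpy as np

k_B, T, k_spring = 1.0, 1.0, 2.0     # reduced units, illustrative values

# For U(x) = 0.5 * k * x^2 the configurational Boltzmann distribution is
# Gaussian with variance k_B * T / k_spring, so we can draw samples from it
# directly instead of running a simulation.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, np.sqrt(k_B * T / k_spring), size=100_000)

estimate = np.mean(samples**2)       # plain average over Boltzmann-distributed samples
exact = k_B * T / k_spring           # analytic ensemble average of x^2

print(estimate, exact)               # agreement up to sampling error
```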
This connection between statistical mechanics and MD simulations is what makes the approach so powerful. By running a simulation that follows Newton’s laws, we naturally generate a trajectory that samples the Boltzmann distribution (under appropriate conditions), allowing us to extract macroscopic properties from microscopic simulations.
The Verlet Algorithm
Having established the statistical mechanics framework, we now need a practical method to actually evolve the system forward in time. The Verlet algorithm is often applied in MD simulations for numerically integrating Newton’s equations of motion. This algorithm is essential because while we know the forces acting on each atom (from the potential energy function), we need a reliable way to compute how positions and velocities change over time. Unlike analytical solutions, which are often impractical or impossible for complex systems with many interacting particles, numerical methods like the Verlet algorithm provide a way to simulate the dynamic behaviour of these systems over time.
Newton’s second law for atom $i$ is given by:
\[\mathbf{F}(\mathbf{r_{i}}) = m_{i}\mathbf{a_{i}} = m_{i}\frac{d^{2}\mathbf{r_{i}}(t)}{dt^{2}}\]where $\mathbf{a_i}$ is the acceleration and $m_i$ represents its mass [9,2]. The force on each atom is related to the potential energy, $U(\mathbf{r_i})$ [14], by:
\[\mathbf{F}(\mathbf{r_{i}}) = -\nabla U(\mathbf{r_{i}})\]To predict the future positions and velocities of atoms, the Verlet algorithm uses information about the current positions, velocities, and the forces acting on them. The Verlet integrator is symplectic: it preserves the phase-space structure of Hamiltonian dynamics, so the total energy shows bounded fluctuations rather than a systematic drift over long simulations. This property is crucial for accurately simulating physical systems over time and maintains the connection to the statistical mechanics framework we established earlier.
The Verlet algorithm is based on Taylor expansions for the position of atom $i$, which allows us to express future and past positions in terms of the current position and its derivatives:
\[\mathbf{r_{i}}(t + \delta t) = \mathbf{r_{i}}(t) + \frac{d\mathbf{r_{i}}(t)}{dt}\delta t + \frac{1}{2}\frac{d^{2}\mathbf{r_{i}}(t)}{dt^{2}}\delta t^{2} + \frac{1}{6}\frac{d^{3}\mathbf{r_{i}}(t)}{dt^{3}}\delta t^{3} + O(\delta t^{4})\] \[\mathbf{r_{i}}(t - \delta t) = \mathbf{r_{i}}(t) - \frac{d\mathbf{r_{i}}(t)}{dt}\delta t + \frac{1}{2}\frac{d^{2}\mathbf{r_{i}}(t)}{dt^{2}}\delta t^{2} - \frac{1}{6}\frac{d^{3}\mathbf{r_{i}}(t)}{dt^{3}}\delta t^{3} + O(\delta t^{4})\]Adding these two expansions cancels the odd-order terms, giving the Verlet algorithm:
\[\mathbf{r_{i}}(t + \delta t) = 2\mathbf{r_{i}}(t) - \mathbf{r_{i}}(t - \delta t) + \frac{d^{2}\mathbf{r_{i}}(t)}{dt^{2}}\delta t^{2} + O(\delta t^{4})\]The acceleration $\mathbf{a_{i}}(t)$ of each particle $i$ is computed from the forces derived from the potential energy function:
\[\mathbf{a_{i}}(t) = \frac{d^{2}\mathbf{r_{i}}(t)}{dt^{2}} = -\frac{1}{m_{i}}\nabla U(\mathbf{r_{i}})\]Velocity $\mathbf{v_i}(t)$ is calculated by:
\[\mathbf{v_i}(t) = \frac{\mathbf{r_{i}}(t + \delta t) - \mathbf{r_{i}}(t - \delta t)}{2\delta t}\]The beauty of the Verlet algorithm lies in its simplicity and stability. By using the second derivative (acceleration) and avoiding explicit velocity calculations in the position update, it maintains excellent energy conservation properties. This makes it ideal for long MD simulations where we need to accurately sample the Boltzmann distribution over millions of timesteps.
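As a concrete illustration, here is a minimal sketch of the scheme above applied to a one-dimensional harmonic oscillator. The force function, parameter values, and reduced units are chosen purely for this example; a real MD code would evaluate forces from a full forcefield.

```python
import numpy as np

def force(x, k_spring=1.0):
    """Force from a harmonic potential U(x) = 0.5 * k * x^2, i.e. F = -dU/dx."""
    return -k_spring * x

def verlet(x0, v0, mass=1.0, dt=0.01, n_steps=1000):
    """Integrate Newton's equations with the basic Verlet scheme.

    Positions follow r(t + dt) = 2 r(t) - r(t - dt) + a(t) dt^2;
    velocities are recovered from the central difference of positions.
    """
    x = np.empty(n_steps + 1)
    x[0] = x0
    # The scheme needs two previous positions, so the first step is
    # bootstrapped with a Taylor expansion using the initial velocity.
    x[1] = x0 + v0 * dt + 0.5 * (force(x0) / mass) * dt**2
    for n in range(1, n_steps):
        a = force(x[n]) / mass
        x[n + 1] = 2.0 * x[n] - x[n - 1] + a * dt**2
    # Central-difference velocities at the interior time points
    v = (x[2:] - x[:-2]) / (2.0 * dt)
    return x, v

positions, velocities = verlet(x0=1.0, v0=0.0)
```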
Running MD Simulations
Now that we understand the theoretical foundations, how statistical mechanics connects microscopic motion to macroscopic properties, and how the Verlet algorithm evolves the system forward in time, we can explore the practical aspects of actually running an MD simulation. In MD simulations, the time evolution of particle positions and velocities is crucial. Initially, the atomic positions and velocities are set, often based on experimental data. The forces acting on each atom are calculated using a chosen forcefield [9]. From these forces, the accelerations and subsequently the new positions and velocities of the atoms are computed using the Verlet algorithm. This process repeats, treating the new positions and velocities as the starting point for the next iteration.
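For the velocity part of that initialisation, velocities are commonly drawn from the Maxwell-Boltzmann distribution introduced earlier. The sketch below, in reduced units and with all parameter values chosen only for illustration, draws each Cartesian component from a Gaussian with variance $k_B T / m$ and removes the centre-of-mass drift:

```python
import numpy as np

def sample_velocities(n_atoms, mass, k_B=1.0, T=1.0, rng=None):
    """Draw initial velocities from the Maxwell-Boltzmann distribution.

    Each Cartesian component is Gaussian with zero mean and variance
    k_B * T / mass (reduced units assumed for this example).
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(k_B * T / mass)
    v = rng.normal(0.0, sigma, size=(n_atoms, 3))
    v -= v.mean(axis=0)          # remove centre-of-mass motion
    return v

velocities = sample_velocities(n_atoms=100, mass=1.0, T=1.5)
```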
The typical workflow for running an MD simulation involves:
- System Preparation: Setting up the initial coordinates of all atoms in the system, often from experimental structures or previous simulations.
- Forcefield Selection: Choosing an appropriate forcefield that describes the interactions between atoms. Common forcefields include AMBER, CHARMM, and GROMOS for biomolecular systems.
- Energy Minimization: Relaxing the initial structure to remove any steric clashes or unrealistic geometries before starting the dynamics.
- Equilibration: Running the simulation under controlled conditions (e.g., constant temperature and pressure) to allow the system to reach equilibrium.
- Production Run: The actual MD simulation where data is collected for analysis.
- Analysis: Extracting properties of interest from the trajectory, such as root mean square deviation (RMSD), radius of gyration, or interaction energies (a minimal example follows this list).
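To give a flavour of the analysis step, the sketch below computes two of the properties mentioned above, RMSD from a reference frame and the mass-weighted radius of gyration, assuming the trajectory is already available as a NumPy array of shape (frames, atoms, 3). The random trajectory, array shapes, and function names are invented for the example, and no structural alignment is performed before the RMSD calculation.

```python
import numpy as np

def rmsd(traj, reference):
    """RMSD of every frame from a reference structure (no superposition)."""
    diff = traj - reference
    return np.sqrt((diff**2).sum(axis=(1, 2)) / traj.shape[1])

def radius_of_gyration(traj, masses):
    """Mass-weighted radius of gyration for every frame."""
    com = (masses[None, :, None] * traj).sum(axis=1) / masses.sum()
    sq_dist = ((traj - com[:, None, :])**2).sum(axis=2)
    return np.sqrt((masses[None, :] * sq_dist).sum(axis=1) / masses.sum())

# Illustrative random "trajectory": 50 frames of 20 atoms in 3D
rng = np.random.default_rng(1)
traj = rng.normal(size=(50, 20, 3))
masses = np.ones(20)

print(rmsd(traj, traj[0])[:5])
print(radius_of_gyration(traj, masses)[:5])
```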
The choice of timestep is critical in MD simulations. Too large a timestep can lead to numerical instabilities and energy drift, which breaks the connection to the statistical mechanics framework we established. If energy isn't conserved, the simulation won't properly sample the Boltzmann distribution, and our calculated properties will be incorrect. Conversely, too small a timestep makes simulations computationally expensive without providing additional accuracy. For atomistic simulations, timesteps are typically on the order of 1-2 femtoseconds, while coarse-grained simulations can use larger timesteps (often tens of femtoseconds) because the effective interactions are smoother.
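A simple diagnostic for an overly large timestep is to monitor the total energy saved during the run. The sketch below, which assumes the energies are already available as a NumPy array (the synthetic data here stand in for real simulation output), reports the relative drift from the initial value:

```python
import numpy as np

def relative_energy_drift(total_energies):
    """Relative deviation of the total energy from its initial value.

    A well-chosen timestep gives small, bounded fluctuations; systematic
    growth of this quantity suggests the timestep is too large.
    """
    e0 = total_energies[0]
    return (total_energies - e0) / abs(e0)

# Synthetic example standing in for energies logged during a production run
energies = 10.0 + 0.001 * np.random.default_rng(2).normal(size=1000)
print(np.abs(relative_energy_drift(energies)).max())
```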
Conclusion
Molecular dynamics simulations provide a powerful framework for understanding the dynamic behaviour of molecular systems. By combining statistical mechanics principles with numerical integration methods like the Verlet algorithm, MD simulations enable researchers to explore the temporal evolution of complex systems that would be difficult or impossible to study experimentally. The choice between atomistic and coarse-grained approaches allows researchers to balance computational efficiency with the level of detail required for their specific research questions.
References
1. Dorsett, D. D. (2000). An overview of molecular modeling. Journal of Chemical Education, 77(11), 1401-1405.
2. Hollingsworth, S. A., & Dror, R. O. (2018). Molecular dynamics simulation for all. Neuron, 99(6), 1129-1143.
3. Plimpton, S. (1995). Fast parallel algorithms for short-range molecular dynamics. Journal of Computational Physics, 117(1), 1-19.
4. Stukowski, A. (2010). Visualization and analysis of atomistic simulation data with OVITO, the Open Visualization Tool. Modelling and Simulation in Materials Science and Engineering, 18(1), 015012.
5. Curtin, W. A., & Miller, R. E. (2003). Atomistic/continuum coupling in computational materials science. Modelling and Simulation in Materials Science and Engineering, 11(3), R33.
6. Sridhar, A., et al. (2020). Emergence of chromatin hierarchical loops from protein disorder and nucleosome asymmetry. Proceedings of the National Academy of Sciences, 117(13), 7216-7224.
7. Farr, S. E., et al. (2021). Nucleosome plasticity is a critical element of chromatin liquid-liquid phase separation and multivalent nucleosome interactions. Nature Communications, 12(1), 2883.
8. Collepardo-Guevara, R., et al. (2015). Chromatin unfolding by epigenetic modifications explained by dynamical impairment of histone-DNA interactions. Biophysical Journal, 108(9), 2241-2249.
9. Frenkel, D., & Smit, B. (2001). Understanding Molecular Simulation: From Algorithms to Applications. Academic Press.
10. Wang, S., et al. (2009). Comparative atomistic and coarse-grained study of water: what do we lose by coarse-graining? The European Physical Journal E, 28(2), 221-229.
11. Alarcón, F., et al. (2013). Coarse-grained model for the mechanical properties of multi-block copolymer melts. Macromolecules, 46(8), 3374-3384.
12. Ruiz, L., et al. (2018). Coarse-grained molecular dynamics simulations of phase transitions in symmetric diblock copolymer melts. Macromolecules, 51(4), 1408-1418.
13. Weyman, A., et al. (2018). Microphase separation in block copolymer thin films. Macromolecules, 51(15), 5886-5896.
14. Swope, W. C., et al. (1982). A computer simulation method for the calculation of equilibrium constants for the formation of physical clusters of molecules: Application to small water clusters. The Journal of Chemical Physics, 76(1), 637-649.