# New pre-print: Reaction dynamics of cyanohydrins

Recently posted our paper on the Reaction dynamics of cyanohydrins with hydrosulfide in water on the ArXiv : arXiv:1806.08841 (2018)

# Checking sampling distribution of velocities: the KS test

When generating multiple initial geometries or velocities, it is good practice to check that we are sampling in the correct way, i.e that we are uniformily sampling from the distribution we expect or desire to sample from.

In this post I will show an example where the Kolmogorov-Smirnov test (KS-test) is used to check that velocities are sampled uniformily from the Maxwell-Boltzmann distribution function.

From a practical point of view, the test consists in computing the maximum difference $\sf D^{max}_n$ between the cumulative of the probability distribution function of your n-point dataset $\sf C^{data}_n$ and the cumulative of the continuous distribution you believe you are sampling from $\sf C^{distr}$:

$\sf D^{max}_{n}=sup_{x}(|C^{data}_n(x)-C^{distr}(x)|)$

When the distributions match, as n tends to infinity, the distance should tend to zero.

Depending on how large this difference is respect to a critical value, one can establish whether their data was sampled from the continuous distribution.

Lets assume we have a set of 1000 values of velocities (norm of vectors) $\sf [ v_0,v_1,...,v_i,...,v_n ]$ we can compute the probability and cumulative distribution of the velocities $\sf C^{data}(v_i)$. Here below I show a histogram plot of the probability distribution function at T=100K for a set of 1000 sampled velocities:

We then also compute the cumulative distribution of the distribution we expect to match, i.e the Maxwell-Boltzmann distribution,

$\sf p(v) = \sqrt{(\frac{m}{ 2 \pi k_B T})^3} \,\,4 \pi v^2 e^{-\frac{mv^2}{2 k_B T}}$

and here is a plot showing both cumulatives:

We see that the two cumulatives are very similar. The maximum distance in this case is D = 0.0238454225145 and the critical probability value for this number of points is $\sf p_{critical} = 0.0430069761783$ (computed as $\sf 1.36/\sqrt{n}$ see e.g. http://www.cas.usf.edu/~cconnor/colima/Kolmogorov_Smirnov.htm for a 0.05 level of significance)

We have found D < $\sf p_{critical}$ therefore the test confirms the fact that our data’s distribution is the Maxwell-Boltzmann distribution.

# Randomly rotated molecules and bimolecular collision

In this post I will show an example where initial molecules are rotated to generate initial conditions for bimolecular collisions. The molecules which are considered are sulfanide and acetone cyanohydrin. These molecules most likely played an important role in the production of origins of life compounds (see B.H. Patel, C. Percivalle, D. J. Ritson, C. D. Duffy & J. D. Sutherland, Nature Chemistry 7, 301–307 (2015))

In Fig. 1, the minimized structure of each molecule is shown.

Fig 1: Geometry of two molecules, on the left acetone cyanohydrin, on the right sulfanide

In Fig. 2, a set of randomly rotated structures of the molecules of Fig. 1 is shown.

Fig 2: Two sets of rotated molecules. The molecules from Fig 1 were all randomly rotated and are shown here in the same plot.

With these initial positions and by defining some initial atomic velocities we can run dynamics for bimolecular collisions. Here this was done using TeraChem (http://www.petachem.com/). In the first video here below, I show a set of multiple trajectories (each of these was run separately) where the molecules collide at 300K.

In the second video a single trajectory is shown. From this we can see that the angle at which reactants approach is an important variable in determining whether a reaction will occur. In fact for this trajectory no reaction occurs.

# Initial conditions – uniform random rotations

In theoretical chemistry we often simulate dynamics by propagating trajectories on potential energy surfaces. From these trajectories one can derive, e.g. a reaction rate constant. For each trajectories, initial conditions, i.e velocities and coordinates, need to be defined for the molecules. One interesting part of the process is defining uniformly distributed random rotations in three dimensions (uniform distribution on SO(3)).  In fact usually we want to give the molecules a random rotation. For a detailed theoretical description of this, please see e.g. Ref [1].

At first, one may be tempted to extract three random Euler angles $\alpha, \beta$ and $\gamma$, each from a uniform distribution and to compute the rotation matrix R as $\sf{R=R_x(\alpha)\cdot R_y(\beta) \cdot R_z(\gamma)}$. If this procedure is repeated, a set of random matrices will be obtained. However, this set will not be uniformly distributed. In fact, if we take a vector $v=(1,0,0)$ and rotate it 1000 times using the above procedure, where angles are extracted from uniform distributions $\alpha,\gamma\in[-\pi,\pi]$ and $\beta\in[-\pi/2,\pi/2]$ we get

We can see a higher density of points around the north and south poles. This happens because in three dimensions, the Haar measure of a matrix R in SO(3) can be written as $[dR]=cos\beta d\beta d\alpha d\gamma$. The cosine term is fundamental. If we do not take it into account when sampling  $\beta$, as we get closer to the poles, circles have a smaller circumference than near the equator thus if the $\beta$ angle is sampled uniformly one gets more rotations in those locations (see e.g. Ref. [2]).

One of the solutions was proposed by Arvos (Ref. [3]): “To generate uniformly distributed random rotations of a unit sphere, first perform a random rotation about the vertical axis, then rotate the north pole to a random position.”   This is straightforward to implement and we can see (image below) that indeed we get a uniform distribution of rotations

Also, we can extract the angles from the random matrices and see that the distributions are as expected

Now we can apply this to a molecule and generate uniformly distributed rotations. I will show an image of this in the next post!

[1] David Kirk, Graphic Gems III, San Diego: Academic Press Professional (1992).

[2] John T. Kent and Asaad M. Ganeiber: Simulation of random rotation matrices, Events and Meetings of Italian Statistical Society, 46th scientific meeting of the Italian statistical society (2012).

[3] James Arvo, Fast random rotation matrices, Graphics Gems III, p117-120 (2012).

# NEB: Computing a minimum energy path

Nudged Elastic Band (NEB) is a method which can be used to find the minimum energy path (MEP) and the transition state which connects reactants to products. Reactants and products are considered in their local minima and their states need to be provided by the user.

The algorithm works as following. A set of images (i.e a specific geometric configuration of the atoms) connecting the initial (reactant) and final (product) states are linearly interpolated.

Each image is connected to the other by a spring. Then, the energy of this string of images is minimized. The perpendicular component of the spring force is removed to avoid interference with relaxation to the MEP when minimizing. If the minimization converges, one obtains a guess of the minimum energy path.

For example, for the $\sf{H+H_2}\rightarrow\sf{H_2+H}$ reaction, the blue images (see below) would be interpolations between start and end geometries. During minimization the intermediate configurations will change until the MEP is found.

NEB path for a reaction

Here I will describe how to write input files to submit an NEB calculation using TeraChem (http://www.petachem.com/products.html).

To begin one minimizes the reactant and products to get a converged geometry in their local minima.

Here is an example of the input file for TeraChem minimization:

coordinates     reactants.xyz
charge          0
method          rhf
basis           6-31g*
run             minimize
min_coordinates cartesian
end

When the simulation converges, the output file in scr/optim.xyz will contain the geometry of reactants at each optimization step. The last frame is the optimized frame.The same type of minimization is repeated for the products. The two optimized geometries are inserted into a coordinates.xyz file, first the reactant geometry then the product geometry. This file is used as input for the NEB calculation, here is the NEB input file:

coordinates     coordinates.xyz
charge          0
method          rhf
basis           6-31g*
run             ts
ts_method       neb_frozen
maxit           200
nstep           200
min_image       10
end

The main difference is the keyword ts_method used to select the NEB method with frozen end points. When the algorithm converges, the TS optimized geometry can be found in the optim.xyz file of the scr/ directory.

# Finding the minimum energy path

To run reaction dynamics or compute reaction rate constants ab-initio, one needs a potential energy surface. In the case of transition state theory the minimum energy path needs to be identified and computed and subsequently the transition state needs to be determined. If the transition state is known then the full path need not be determined to implement TST.

The computation of a full potential energy surface is generally time-consuming and complex. Therefore, methods have been developed to try to interpolate between known fixed geometries to compute an energy pathway while avoiding to compute the full potential energy surface. These include the Nudge Elastic Band method (NEB) [1], the String Method [2],  the Growing String Method (GSM), the Freezing String Method (FSM) [4].

In the next post I will describe how to implement the NEB method for a reaction.

[1] G. Henkelman and H. Jónsson,  J. Chem. Phys. 113, 9901-9904 (2000).

[2] E. Weinan, W. Ren, and E. Vanden-Eijnden, Phys. Rev. B 66, 052301 (2002).

[3] B. Peters, A. Heyden, A. T. Bell, and A. Chakraborty, J. Chem. Phys. 120, 7877 (2004).

[4] A. Behn, P. M. Zimmerman, A. T. Bell, and M. Head-Gordon, J. Chem. Phys. 135, 224108 (2011)

# Part 2 – Reaction rate constants with TST: rates

We can now insert our partition functions values in the expression for the rate constant at T=300K:

$\sf k(T)=$ $\sf \frac{k_BT}{h}\frac{Q^{\ddagger}(T)}{Q_R(T)}e^{-\beta E_a}$

To find,

$\sf k(T=300K)=$  $\sf \text{\sf 8.229957e+03 } cm^3 / molecule / s$

Note that the overall reaction rate reactant partition function is taken as the product of each reactant partition function. This procedure can be repeated at each temperature to get a profile of the reaction rate constant as a function of temperature. Here is a plot in logarithmic scale where we compare our results (blue curve) to published results (red triangles)

The main role of the temperature dependence is in the exponential, this is why we see a linear dependence of the logarithm of the rate. Further, as expected the rate increases as temperature rises (i.e in the limit of the x axis going to zero) and decreases at lower temperatures. When quantum effects are included, one might see a large change at low temperatures which is due to the presence of tunneling. Other differences arise from the fact that the rate as is comes from a static picture and no dynamics has been carried out.