# Checking sampling distribution of velocities: the KS test

When generating multiple initial geometries or velocities, it is good practice to check that we are sampling in the correct way, i.e. that the samples follow, without bias, the distribution we expect or desire to sample from.

In this post I will show an example where the Kolmogorov-Smirnov test (KS test) is used to check that velocities are correctly sampled from the Maxwell-Boltzmann distribution function.

From a practical point of view, the test consists of computing the maximum difference $\sf D^{max}_n$ between the empirical cumulative distribution function of your n-point dataset, $\sf C^{data}_n$, and the cumulative distribution function of the continuous distribution you believe you are sampling from, $\sf C^{distr}$:

$\sf D^{max}_{n}=\sup_{x}|C^{data}_n(x)-C^{distr}(x)|$
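The definition above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the code used for this post: the function name `ks_statistic` and the uniform test case are my own assumptions. Because the empirical cumulative is a step function, the supremum is reached either at a data point or just before it, so it is enough to check both sides of every step.

```python
import numpy as np

def ks_statistic(samples, cdf):
    """Return sup_x |C_data(x) - C_distr(x)| for a 1-D sample.

    `cdf` is a callable giving the reference cumulative distribution.
    (Hypothetical helper, named here for illustration only.)
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    c = cdf(x)
    # The empirical cumulative jumps from (i-1)/n to i/n at the i-th sorted point.
    d_plus = np.max(np.arange(1, n + 1) / n - c)   # top of each step vs. reference
    d_minus = np.max(c - np.arange(0, n) / n)      # reference vs. bottom of each step
    return max(d_plus, d_minus)

# Toy check against the uniform distribution on [0, 1], whose cumulative is C(x) = x.
rng = np.random.default_rng(0)
uniform_samples = rng.random(1000)
d_uniform = ks_statistic(uniform_samples, lambda x: np.clip(x, 0.0, 1.0))
print(f"D for 1000 uniform samples: {d_uniform:.4f}")
```

Any callable cumulative can be passed as `cdf`, so the same helper works for the Maxwell-Boltzmann case discussed below.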

When the distributions match, as n tends to infinity, the distance should tend to zero.

Depending on how large this difference is with respect to a critical value, one can decide whether the data are consistent with having been sampled from the continuous distribution.

Let's assume we have a set of n = 1000 values of velocities (norms of the velocity vectors), $\sf [ v_1,...,v_i,...,v_n ]$. From these we can compute the probability distribution and the cumulative distribution of the velocities, $\sf C^{data}(v_i)$.

Here below I show a histogram plot of the probability distribution function at T=100K for a set of 1000 sampled velocities:

We then also compute the cumulative distribution of the distribution we expect to match, i.e. the Maxwell-Boltzmann distribution,

$\sf p(v) = \sqrt{(\frac{m}{ 2 \pi k_B T})^3} \,\,4 \pi v^2 e^{-\frac{mv^2}{2 k_B T}}$

where m is the particle mass, $\sf k_B$ is the Boltzmann constant and T is the temperature.
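As a rough sketch of this step, the snippet below draws 1000 speeds at T = 100 K and evaluates both cumulatives. Several things here are assumptions of mine rather than details of the original calculation: the particle mass (argon is used purely as an example), the sampling method (three independent Gaussian velocity components with variance $\sf k_B T/m$), and the helper names `sample_speeds` and `maxwell_cdf`. The closed form used for the cumulative follows from integrating p(v): with $\sf a=\sqrt{k_B T/m}$, $\sf C^{distr}(v) = erf(\frac{v}{\sqrt{2}a}) - \sqrt{\frac{2}{\pi}}\,\frac{v}{a}\,e^{-\frac{v^2}{2a^2}}$.

```python
import numpy as np
from scipy.special import erf

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg

def sample_speeds(n, mass, temperature, rng):
    """Draw n speeds by sampling Gaussian velocity components (assumed method)."""
    a = np.sqrt(K_B * temperature / mass)
    velocities = rng.normal(0.0, a, size=(n, 3))
    return np.linalg.norm(velocities, axis=1)

def maxwell_cdf(v, mass, temperature):
    """Analytic cumulative of the Maxwell-Boltzmann speed distribution."""
    a = np.sqrt(K_B * temperature / mass)
    x = np.asarray(v, dtype=float) / a
    return erf(x / np.sqrt(2.0)) - np.sqrt(2.0 / np.pi) * x * np.exp(-0.5 * x**2)

rng = np.random.default_rng(42)
mass = 39.948 * AMU       # assumption: argon, chosen only for illustration
temperature = 100.0       # K, as in the text

speeds = np.sort(sample_speeds(1000, mass, temperature, rng))
c_data = np.arange(1, speeds.size + 1) / speeds.size   # empirical cumulative at each v_i
c_distr = maxwell_cdf(speeds, mass, temperature)       # expected cumulative at each v_i
print(f"largest gap at the data points: {np.max(np.abs(c_data - c_distr)):.4f}")
```

Plotting `c_data` and `c_distr` against `speeds` gives the kind of comparison of cumulatives described in this post.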

And here is a plot showing both cumulatives:

We see that the two cumulatives are very similar. The maximum distance in this case is D = 0.0238454225145, and the critical value for this number of points is $\sf p_{critical} = 0.0430069761783$ (computed as $\sf 1.36/\sqrt{n}$ for a 0.05 level of significance; see e.g. http://www.cas.usf.edu/~cconnor/colima/Kolmogorov_Smirnov.htm).
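The critical value quoted above is easy to reproduce. In the sketch below the function names are my own, and the general coefficient $\sf \sqrt{-\ln(\alpha/2)/2}$ is the usual large-n approximation, which gives roughly 1.36 at $\sf \alpha = 0.05$; for small n, tabulated values should be preferred.

```python
import math

def ks_coefficient(alpha):
    """Large-n approximation of the coefficient c(alpha) in c(alpha)/sqrt(n)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0))

def ks_critical_value(n, coefficient=1.36):
    """Critical distance for an n-point dataset (1.36 corresponds to alpha = 0.05)."""
    return coefficient / math.sqrt(n)

d_max = 0.0238454225145            # maximum distance reported in the text
d_crit = ks_critical_value(1000)   # 1.36 / sqrt(1000)

print(f"coefficient at alpha = 0.05: {ks_coefficient(0.05):.3f}")
print(f"critical value for n = 1000: {d_crit:.6f}")
print("D below critical value:", d_max < d_crit)
```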

We have found D < $\sf p_{critical}$, therefore at the 0.05 level of significance the test does not reject the hypothesis that our data were sampled from the Maxwell-Boltzmann distribution: the data are consistent with it.
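For comparison, the whole check can also be done with SciPy's built-in `scipy.stats.kstest`, using `scipy.stats.maxwell` as the reference distribution with scale $\sf \sqrt{k_B T/m}$. This is a sketch under assumptions, not the calculation behind the numbers above: the mass (argon), the random seed and the Gaussian-component sampling are illustrative choices of mine, so the value of D will differ from the one reported in this post. The second call tests the same data against a deliberately wrong temperature, to show what a rejection looks like.

```python
import numpy as np
from scipy import stats

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg

temperature = 100.0       # K, as in the text
mass = 39.948 * AMU       # assumption: argon, for illustration only
a = np.sqrt(K_B * temperature / mass)   # scale parameter of the speed distribution

# Assumed sampling method: Gaussian velocity components, then take the norm.
rng = np.random.default_rng(1)
speeds = np.linalg.norm(rng.normal(0.0, a, size=(1000, 3)), axis=1)

d_critical = 1.36 / np.sqrt(speeds.size)   # 0.05 level of significance

# Test against the Maxwell-Boltzmann distribution at the correct temperature.
result = stats.kstest(speeds, stats.maxwell(scale=a).cdf)
print(f"D = {result.statistic:.4f}, critical = {d_critical:.4f}, p-value = {result.pvalue:.3f}")

# Same data against T = 400 K (scale doubled): the distance should be far too large.
wrong = stats.kstest(speeds, stats.maxwell(scale=2.0 * a).cdf)
print(f"wrong temperature: D = {wrong.statistic:.4f}, p-value = {wrong.pvalue:.3g}")
```

SciPy also returns a p-value, so one can equivalently compare it with 0.05 instead of comparing D with the critical distance.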