Background
I am trying to fit a set of energy vs. dihedral angle data obtained from the quantum mechanical software Gaussian. The data was obtained by rotating a certain group of atoms 10 degrees at a time relative to another certain group of atoms and calculating the energy at each rotation. The result is two peaks (corresponding to the symmetric, highest energy conformations) separated by a minimum (corresponding to the lowest energy conformation).
Data
The data are given below:
| Angle | Energy |
|---|---|
| -45.29984 | -0.50819 |
| -35.29987 | -0.51017 |
| -25.29983 | -0.51172 |
| -15.29985 | -0.51276 |
| -5.29986 | -0.5133 |
| 4.70011 | -0.51335 |
| 14.7001 | -0.51291 |
| 24.70008 | -0.51196 |
| 34.70007 | -0.51049 |
| 44.69999 | -0.50855 |
| 54.70017 | -0.50628 |
| 64.70004 | -0.50395 |
| 74.70002 | -0.50194 |
| 84.70007 | -0.50069 |
| 94.70016 | -0.50054 |
| 104.70017 | -0.50151 |
| 114.70015 | -0.50332 |
| 124.70016 | -0.50558 |
| 134.70014 | -0.50789 |
| 144.70018 | -0.50995 |
| 154.70013 | -0.51159 |
| 164.70015 | -0.51271 |
| 174.70014 | -0.51329 |
| -175.29989 | -0.51334 |
| -165.29991 | -0.51286 |
| -155.29992 | -0.51184 |
| -145.29994 | -0.51028 |
| -135.29997 | -0.50826 |
| -125.29997 | -0.50596 |
| -115.29995 | -0.50365 |
| -105.29993 | -0.50173 |
| -95.29982 | -0.50061 |
| -85.29985 | -0.5006 |
| -75.29985 | -0.50171 |
| -65.29985 | -0.50362 |
| -55.29986 | -0.50591 |
| -45.29987 | -0.50819 |
Note: By symmetry, in this case, I could just as easily take only the positive angles.
Fitting
The molecular modeling software LAMMPS uses the following expression to model dihedral energies for the OPLS forcefield (Optimized Potentials for Liquid Simulations; see Watkins and Jorgensen, J Phys Chem A, 105, 4118-4125, 2001) (https://docs.lammps.org/dihedral_opls.html):
$$E = \frac 12 K_1[1+cos(\phi)] +\frac 12 K_2[1-cos(2\phi)] + \frac 12 K_3[1+cos(3\phi)] + \frac 12 K_4[1-cos(4\phi)]$$
The expression requires the 4 K-values to be identified. Some may be zero.
I have tried using the Matlab curve fitting toolbox, but the straightforward approach of using sum of sines has not worked.
What is the best way to fit this data to this expression in order to identify the K-values?
Please note: This is my first time posting on math stackexchange. If this question is not considered sufficiently math-related, just let me know.
This problem is just a multivariable linear regression.
Being myself a quantum mechanicist, I can tell you that, in the model, the term $K_0$ is missing. It should be $$E =K_0+ \frac 12 K_1[1+\cos(\phi)] +\frac 12 K_2[1-\cos(2\phi)] + \frac 12 K_3[1+\cos(3\phi)] + \frac 12 K_4[1-\cos(4\phi)]\tag 1$$ If you look at my papers in the mid $70$'s, you will see that I proposed and used $$E=\sum_{n=0}^p a_n\,\cos(n \phi)$$ $p$ being determined by stepwise regression procedures.
What I suspect is that, as written, the model is for $\Delta E$ and not for $E$.
Anyway, using your data and $(1)$ we have $$\begin{array}{clclclclc} \text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\ K_0 & -0.513347 & 0.000051 & \{-0.513451,-0.513243\} \\ K_1 & -0.000198 & 0.000048 & \{-0.000295,-0.000101\} \\ K_2 & +0.012812 & 0.000048 & \{+0.012714,+0.012910\} \\ K_3 & +0.000203 & 0.000048 & \{+0.000106,+0.000300\} \\ K_4 & -0.001285 & 0.000047 & \{-0.001381,-0.001190\} \\ \end{array}$$ all coefficients being highly significant.
The fit is almost perfect : maximum absolute error $0.0001715$ and mean absolute error $0.000083$.
Edit
If I may suggest, you can improve your model from a physical point of view. After this analysis (which must be done), you find a maximum at $\phi=90.6443^{\circ}$; mother nature does not like such kind of numbers except in some sterically hindered molecules.
So for your case, there two parameters which must be non fitted : $K_0$ which is $E$ for $\phi=0^{\circ}$ and $K_3=\frac 2 3 K_1$. This means three parameters to be adjusted instead of five.
Ignoring the value of $K_0$ ($\phi=0^{\circ}$ is absent from the table), I refitted the model with $K_3=\frac 2 3 K_1$. This gives $$\begin{array}{clclclclc} \text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\ K_0 & -0.513303 & 0.0000734 & \{-0.513453,-0.513152\} \\ K_1 & -0.000045 & 0.0000585 & \{-0.000165,+0.000074\} \\ K_2 & +0.012812 & 0.0000704 & \{+0.012669,+0.012956\} \\ K_3 & -0.000030 & & \\ K_4 & -0.001300 & 0.0000685 & \{-0.001439,-0.001160\} \\ \end{array}$$ which are quite different from the previous ones for almost the same quality of fit.