Derivative of a KernelRidge regression model based on Coulomb Matrix descriptor


I am trying to take analytical derivatives of a KernelRidge regression model whose input is a Coulomb Matrix descriptor. A Coulomb Matrix represents a molecular structure essentially as the inverse Euclidean pairwise distance matrix of its atoms; the off-diagonal entries are also multiplied by the atomic numbers, but those are constants here and thus not particularly relevant. I would like to derive analytically the Cartesian gradient of the KRR model, which in this context is a scalar function of this inverse 3D Euclidean distance matrix. My math is really rusty at this point, and I have tried to work out the derivative as reported below; however, I am stuck in my derivation. I hope someone can help me figure it out. Any help is appreciated.

Derivatives of distance between two points with respect to cartesian displacements:

\begin{gather} r_{ij} = ||{r_j - r_i}|| \\ \frac{\partial r_{ij}}{\partial k} = \frac{k_j - k_i}{r_{ij}}, k = x, y, z \end{gather}

where $\frac{\partial r_{ij}}{\partial k}$ is taken with respect to the coordinate $k$ of atom $j$; stacked over $k = x, y, z$ it is a vector of dimension 3.

Derivatives of inverse distance between two points with respect to cartesian displacements:

\begin{gather} f(r) = \frac{1}{r_{ij}} \\ \frac{\partial f(r)}{\partial k} = - \frac{k_j - k_i}{r_{ij}^3} \end{gather}
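These two derivatives are easy to sanity-check numerically. A minimal NumPy sketch (the helper names `dist_grad` and `inv_dist_grad` are my own, not from any library), comparing the analytical gradient of $1/r_{ij}$ against central finite differences while moving atom $j$:

```python
import numpy as np

def dist_grad(ri, rj):
    """d r_ij / d r_j = (r_j - r_i) / r_ij, stacked over k = x, y, z."""
    d = rj - ri
    return d / np.linalg.norm(d)

def inv_dist_grad(ri, rj):
    """d (1/r_ij) / d r_j = -(r_j - r_i) / r_ij^3."""
    d = rj - ri
    return -d / np.linalg.norm(d) ** 3

# central finite differences on 1/r_ij, displacing atom j along each axis
rng = np.random.default_rng(0)
ri, rj = rng.normal(size=3), rng.normal(size=3)
h = 1e-6
num = np.array([
    (1 / np.linalg.norm(rj + h * e - ri) - 1 / np.linalg.norm(rj - h * e - ri)) / (2 * h)
    for e in np.eye(3)
])
assert np.allclose(num, inv_dist_grad(ri, rj))
```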

Derivatives of Coulomb Matrix with respect to cartesian displacements: \begin{equation} C_{ij} = \begin{cases} 0.5 Z_i^{2.4}, & i = j \\ \frac{Z_i Z_j}{r_{ij}}, & i \neq j \end{cases} \end{equation}

\begin{equation} \frac{\partial C_{ij}}{\partial k} = \begin{cases} 0, & i = j \\ -\frac{Z_i Z_j (k_j - k_i)}{r_{ij}^3}, & i \neq j \end{cases} \end{equation} where $C$ has dimension $N_{atoms} \times N_{atoms}$, while $\frac{\partial C_{ij}}{\partial k}$ has dimensions $N_{atoms} \times N_{atoms} \times 3$. To use $C$ in KRR we unroll it to a vector of dimension $N_{atoms}^2$, and thus its derivative has dimensions $N_{atoms}^2 \times 3$.
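For concreteness, here is a NumPy sketch of $C$ and of $\partial C_{ij} / \partial k$ taken with respect to the position of atom $j$, verified against finite differences (function names are my own; the coordinates and atomic numbers are made up):

```python
import numpy as np

def coulomb_matrix(R, Z):
    """C_ij = 0.5 * Z_i^2.4 on the diagonal, Z_i * Z_j / r_ij off it."""
    n = len(Z)
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                C[i, j] = 0.5 * Z[i] ** 2.4
            else:
                C[i, j] = Z[i] * Z[j] / np.linalg.norm(R[j] - R[i])
    return C

def coulomb_matrix_grad(R, Z):
    """dC[i, j] = d C_ij / d r_j (zero on the diagonal), shape (n, n, 3)."""
    n = len(Z)
    dC = np.zeros((n, n, 3))
    for i in range(n):
        for j in range(n):
            if i != j:
                d = R[j] - R[i]
                dC[i, j] = -Z[i] * Z[j] * d / np.linalg.norm(d) ** 3
    return dC

# finite-difference check: move atom 1 along x and watch C[0, 1]
rng = np.random.default_rng(1)
R = rng.normal(size=(3, 3))       # toy coordinates for 3 atoms
Z = np.array([1.0, 6.0, 8.0])     # e.g. H, C, O
h = 1e-6
Rp, Rm = R.copy(), R.copy()
Rp[1, 0] += h
Rm[1, 0] -= h
num = (coulomb_matrix(Rp, Z)[0, 1] - coulomb_matrix(Rm, Z)[0, 1]) / (2 * h)
assert np.isclose(num, coulomb_matrix_grad(R, Z)[0, 1, 0], rtol=1e-4, atol=1e-6)
```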

The KRR model is defined as:

\begin{gather} y^{pred} = \sum_{m=1}^{M^{train}} \alpha_m \cdot k(C^{pred}, C_m) \end{gather}

where the $\alpha_m$ are the coefficients obtained during training, and $k(C^{pred}, C_m)$ is a function mapping the two vectors $C^{pred}, C_m$ to a scalar. It can be interpreted as a similarity function, and its inputs are unrolled Coulomb matrices. The set of Coulomb matrices $C_m$ is fixed: it is the set used during training. We only want gradients for new predictions $y^{pred}$.
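As a sketch, the prediction is just this weighted sum (assuming a Gaussian kernel and made-up training data; in practice $\alpha$ would come from fitting, e.g. scikit-learn's `KernelRidge.dual_coef_`):

```python
import numpy as np

def gaussian_kernel(u, v, gamma=0.5):
    """k(u, v) = exp(-gamma * ||u - v||^2) for unrolled Coulomb matrices."""
    return np.exp(-gamma * np.sum((u - v) ** 2))

def krr_predict(c_pred, C_train, alpha, gamma=0.5):
    """y_pred = sum_m alpha_m * k(c_pred, C_m) over the fixed training set."""
    return sum(a * gaussian_kernel(c_pred, c_m, gamma)
               for a, c_m in zip(alpha, C_train))

# toy usage: 4 "training" molecules of 3 atoms -> 9-dim unrolled descriptors
rng = np.random.default_rng(2)
C_train = rng.normal(size=(4, 9))
alpha = rng.normal(size=4)      # would normally come from training
c_pred = rng.normal(size=9)
y = krr_predict(c_pred, C_train, alpha)
```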

In my case, I am using a Gaussian kernel:

\begin{gather} k(C^{pred}, C_m) = e^{-\gamma \|C^{pred} - C_m\|^2} \\ \frac{\partial k(C^{pred}, C_m)}{\partial C^{pred}} = -2 \gamma \cdot (C^{pred} - C_m) \cdot k(C^{pred}, C_m) \end{gather} This derivative has size $N_{atoms}^2$.
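This kernel gradient is also straightforward to verify numerically; a small sketch with an arbitrary $\gamma = 0.5$ and random 9-dimensional vectors standing in for unrolled Coulomb matrices:

```python
import numpy as np

gamma = 0.5

def kernel(u, v):
    """Gaussian kernel k(u, v) = exp(-gamma * ||u - v||^2)."""
    return np.exp(-gamma * np.sum((u - v) ** 2))

def kernel_grad(u, v):
    """dk/du = -2 * gamma * (u - v) * k(u, v); same size as the unrolled C."""
    return -2.0 * gamma * (u - v) * kernel(u, v)

rng = np.random.default_rng(3)
u, v = rng.normal(size=9), rng.normal(size=9)
h = 1e-6
num = np.array([
    (kernel(u + h * e, v) - kernel(u - h * e, v)) / (2 * h)
    for e in np.eye(9)
])
assert np.allclose(num, kernel_grad(u, v))
```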

Now if I want the derivative with respect to the Cartesian displacements $x_i$ (where $i = 1, 2, 3$), I can use the chain rule and get \begin{gather} \frac{\partial k(C^{pred}, C_m)}{\partial x_{ia}} = \sum_b \frac{\partial k}{\partial C^{pred}_{ab}} \frac{\partial C^{pred}_{ab}}{\partial x_{ia}} = \\ \sum_b -2 \gamma \, (C^{pred} - C_m)_{ab} \, k(C^{pred}, C_m) \cdot \left( -\frac{Z_a Z_b (x_{ib} - x_{ia})}{r_{ab}^3} \right) \end{gather}

where $a, b$ run over the atoms of the molecule and $i$ runs over the Cartesian coordinates. This should be the derivative of a single element of the kernel vector and, for a fixed atom $a$, should have size $3$. However, I want something of size $N_{atoms} \times 3$, so I probably did something wrong.
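One way to track down where a derivation like this goes wrong is to compare any analytical expression against a brute-force finite-difference gradient of the kernel with respect to all Cartesian coordinates, which by construction has the desired $N_{atoms} \times 3$ shape. A sketch of such a reference gradient (function names are my own; it assumes the Gaussian kernel above with an arbitrary $\gamma = 0.5$ and made-up coordinates):

```python
import numpy as np

def coulomb_matrix(R, Z):
    """Symmetric Coulomb matrix for coordinates R (n, 3) and charges Z (n,)."""
    n = len(Z)
    C = 0.5 * np.diag(Z.astype(float) ** 2.4)
    for i in range(n):
        for j in range(i + 1, n):
            C[i, j] = C[j, i] = Z[i] * Z[j] / np.linalg.norm(R[j] - R[i])
    return C

def kernel_from_coords(R, Z, c_m, gamma=0.5):
    """k(C(R), C_m), with C unrolled to a length-n^2 vector."""
    c = coulomb_matrix(R, Z).ravel()
    return np.exp(-gamma * np.sum((c - c_m) ** 2))

def numerical_gradient(R, Z, c_m, gamma=0.5, h=1e-5):
    """Finite-difference dk/dR, shape (n_atoms, 3): the reference that any
    analytical formula should reproduce."""
    g = np.zeros_like(R)
    for a in range(R.shape[0]):
        for i in range(3):
            Rp, Rm = R.copy(), R.copy()
            Rp[a, i] += h
            Rm[a, i] -= h
            g[a, i] = (kernel_from_coords(Rp, Z, c_m, gamma)
                       - kernel_from_coords(Rm, Z, c_m, gamma)) / (2 * h)
    return g

rng = np.random.default_rng(4)
R = rng.normal(size=(3, 3))                              # 3 atoms
Z = np.array([1, 6, 8])
c_m = coulomb_matrix(rng.normal(size=(3, 3)), Z).ravel()  # a fake training descriptor
g = numerical_gradient(R, Z, c_m)                         # shape (3, 3)
```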