Partial derivatives of the marginal likelihood of a Gaussian Process


In Chapter 5 of "Gaussian Processes for Machine Learning" by Rasmussen and Williams, on page 114 (p. 10 in the PDF), equation (5.9) gives the partial derivatives of the log marginal likelihood with respect to the hyperparameters:

$$ \frac{\partial \log p(\mathbf{y} \mid X, \theta)}{\partial \theta_j} = \frac{1}{2} \mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} K^{-1} \mathbf{y} - \frac{1}{2} \text{tr} \left(K^{-1} \frac{\partial K}{\partial \theta_j} \right) \\ = \frac{1}{2} \text{tr} \left( (\alpha \alpha^\top - K^{-1})\frac{\partial K}{\partial \theta_j} \right) $$

With $\alpha = K^{-1}\mathbf{y}$.

How can you derive the second expression ($\frac{1}{2} \text{tr} \left( (\alpha \alpha^\top - K^{-1})\frac{\partial K}{\partial \theta_j} \right)$) from the first ($\frac{1}{2} \mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} K^{-1} \mathbf{y} - \frac{1}{2} \text{tr} \left(K^{-1} \frac{\partial K}{\partial \theta_j} \right)$)?

I understand where the first expression comes from and how to obtain this derivative of the marginal likelihood; I just don't understand how they get the term with the $\alpha$'s. My linear algebra is a bit rusty. I tried to do the derivation myself, but I couldn't find a solution.

Ok, I think I have something:

Since the expression is a scalar, and the trace of a scalar (a $1 \times 1$ matrix) is just the scalar itself, I can write:

$$\mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} K^{-1} \mathbf{y} = \text{tr}\left(\mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} K^{-1} \mathbf{y}\right)$$

And since the trace is invariant under cyclic permutations, $\text{tr}(ABC) = \text{tr}(BCA) = \text{tr}(CAB)$, and $(K^{-1})^\top = K^{-1}$ because $K$ (and therefore $K^{-1}$) is symmetric, and $(AB)^\top = B^\top A^\top$, I can rewrite this term as:

$$\text{tr}\left(\mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} K^{-1} \mathbf{y}\right) = \text{tr}\left(K^{-1}\mathbf{y} \mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} \right) = \text{tr}\left((K^{-1}\mathbf{y}) ( K^{-1} \mathbf{y})^\top \frac{\partial K}{\partial \theta_j} \right)$$
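This cyclic-permutation step is easy to verify numerically. Below is a minimal sketch with a random symmetric positive-definite $K$ and a random symmetric matrix standing in for $\partial K / \partial \theta_j$ (both hypothetical, chosen only to exercise the identity):

```python
# Numerical check of the cyclic-permutation step:
# y^T K^{-1} dK K^{-1} y  ==  tr(alpha alpha^T dK)  with alpha = K^{-1} y.
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)   # random symmetric positive-definite K
B = rng.normal(size=(n, n))
dK = B + B.T                  # symmetric stand-in for dK/dtheta_j
y = rng.normal(size=n)

Kinv = np.linalg.inv(K)
alpha = Kinv @ y

scalar = y @ Kinv @ dK @ Kinv @ y               # the original quadratic form
cycled = np.trace(np.outer(alpha, alpha) @ dK)  # after the cyclic shift
print(abs(scalar - cycled))  # agrees to floating-point precision
```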

If we substitute this into our formula, and take into account that $\text{tr}(A) + \text{tr}(B) = \text{tr}(A + B)$, we can write: $$ \frac{\partial \log p(\mathbf{y} \mid X, \theta)}{\partial \theta_j} = \frac{1}{2} \mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} K^{-1} \mathbf{y} - \frac{1}{2} \text{tr} \left(K^{-1} \frac{\partial K}{\partial \theta_j} \right) \\ = \frac{1}{2} \left( \text{tr} \left((K^{-1}\mathbf{y}) ( K^{-1} \mathbf{y})^\top \frac{\partial K}{\partial \theta_j} \right) - \text{tr} \left(K^{-1} \frac{\partial K}{\partial \theta_j} \right) \right) \\ = \frac{1}{2} \text{tr} \left( (\alpha \alpha^\top - K^{-1})\frac{\partial K}{\partial \theta_j} \right) $$

With $\alpha = K^{-1}\mathbf{y}$.
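The full equivalence of the two forms can also be checked numerically. This is a minimal sketch using a random symmetric positive-definite $K$ and a random symmetric placeholder for $\partial K / \partial \theta_j$ (hypothetical matrices, not tied to any particular kernel):

```python
# Numerical check that both sides of the derived identity agree:
# 0.5 y^T K^{-1} dK K^{-1} y - 0.5 tr(K^{-1} dK) == 0.5 tr((alpha alpha^T - K^{-1}) dK)
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)   # random symmetric positive-definite K
B = rng.normal(size=(n, n))
dK = B + B.T                  # symmetric placeholder for dK/dtheta_j
y = rng.normal(size=n)

Kinv = np.linalg.inv(K)
alpha = Kinv @ y

lhs = 0.5 * y @ Kinv @ dK @ Kinv @ y - 0.5 * np.trace(Kinv @ dK)
rhs = 0.5 * np.trace((np.outer(alpha, alpha) - Kinv) @ dK)
print(abs(lhs - rhs))  # agrees to floating-point precision
```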