Finding the scalar derivative of a matrix product


I'm trying to find $$\frac{\partial}{\partial \lambda}y^T \left(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1}\right)^{-1}y$$ where $y \in \mathbb{R}^n$ is fixed, $\lambda \in \mathbb{R}$ and $K_{\theta}^{-1}$ is a known symmetric, positive definite matrix. Here's what I have done so far:

$$\frac{\partial}{\partial \lambda}y^T (\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}y = \frac{\partial}{\partial \lambda}\text{tr}\left(y^T (\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}y\right)$$ where tr denotes the trace. By the cyclic property of the trace, we can write $$\frac{\partial}{\partial \lambda}\text{tr}\left(y^T (\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1} y\right) = \frac{\partial}{\partial \lambda}\text{tr}\left(y^T y(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1} \right)$$ $$ = \frac{\partial}{\partial \lambda}\sum y_i ^2\text{tr}\left( \sigma^2 I + \lambda^{-1}K_{\theta}^{-1}\right)^{-1} = \sum y_i ^2\text{tr}\left(\frac{\partial}{\partial \lambda}(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}\right)$$

Since for any invertible matrix $M(\alpha)$ whose entries are differentiable in $\alpha \in \mathbb{R}$ it holds that $$\frac{d}{d\alpha} M(\alpha)^{-1} = -M(\alpha)^{-1}\left(\frac{d}{d\alpha} M(\alpha)\right) M(\alpha)^{-1}$$ we have $$\sum y_i ^2\text{tr}\left(\frac{\partial}{\partial \lambda}(\sigma^2 I + \lambda^{-1}K_{\theta}^{-1})^{-1}\right) = \sum y_i^2 \text{tr}\left[ (\sigma^2 I + \lambda^{-1} K_{\theta}^{-1})^{-1}(\lambda^{-2} K_{\theta}^{-1}) (\sigma^2 I + \lambda^{-1} K_{\theta}^{-1})^{-1}\right]$$

I can simplify this to $$\sum y_i^2 \text{tr}\left[(\lambda\sigma^2 K_{\theta} + I)^{-1}(\lambda\sigma^2 I + K_\theta^{-1})^{-1}\right]$$ $$= \sum y_i^2 \text{tr}\left[(\lambda^2\sigma^4 K_{\theta} + 2\lambda\sigma^2 I + K_{\theta}^{-1})^{-1}\right]$$ but this is where I'm stuck, as I can't analyse this expression analytically (or can I?). Is there any way to simplify this expression? I tried to use the Woodbury matrix identity on the latter matrix, but with no success yet. Any help would be greatly appreciated.

Best answer

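Since $K = K_\theta$ is symmetric positive definite, it is orthogonally diagonalizable: $K = ADA^T$ with $A$ orthogonal and $D = \operatorname{diag}(d_1, \ldots, d_n)$, $d_i > 0$. (Below, $\delta$ stands for the question's $\sigma$.)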

So $(\delta^2 I + \lambda^{-1} K^{-1})^{-1} = A (\delta^2 I + \lambda^{-1} D^{-1})^{-1} A^T = A \operatorname{diag}\left( \ldots, \frac{\lambda d_i}{\lambda d_i\delta^2 + 1}, \ldots \right) A^T$.

Differentiating each diagonal entry with the quotient rule gives $\frac{\partial}{\partial \lambda} \frac{\lambda d_i}{\lambda d_i\delta^2 + 1} = \frac{d_i}{(\lambda d_i\delta^2 + 1)^2}$.
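The entrywise derivative is easy to sanity-check numerically. The snippet below is only a sketch: the function names and the test values for $\lambda$, $d_i$ and $\delta^2$ are arbitrary choices for illustration, not anything from the question. It compares the closed form with a central finite difference.

```python
def entry(lam, d, delta2):
    """Diagonal entry lam*d / (lam*d*delta^2 + 1); delta2 stands for delta^2."""
    return lam * d / (lam * d * delta2 + 1.0)


def entry_derivative(lam, d, delta2):
    """Claimed derivative with respect to lam: d / (lam*d*delta^2 + 1)^2."""
    return d / (lam * d * delta2 + 1.0) ** 2


# Arbitrary illustrative values (assumptions, not from the question).
lam, d, delta2, h = 0.7, 2.5, 0.5, 1e-6

# Central finite difference approximation of the derivative in lam.
numeric = (entry(lam + h, d, delta2) - entry(lam - h, d, delta2)) / (2 * h)
analytic = entry_derivative(lam, d, delta2)

print(numeric, analytic)
```

The two printed numbers should agree to many digits; any other positive $d_i$ and $\lambda$ can be substituted.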

$\frac{\partial }{\partial \lambda} y^T (\delta^2 I + \lambda^{-1} K^{-1})^{-1} y = y^T A (\lambda^2\delta^2 D + D^{-1} +2\lambda \delta^2 I) A^T y = y^T( \lambda^2\delta^2 K + K^{-1} +2\lambda \delta^2 I) y$

Correction: the last display above drops the inverse and has $\delta^2$ where $\delta^4$ should be in the coefficient of $K$. The corrected computation follows.

So $\frac{\partial}{\partial \lambda} y^T (\delta^2 I + \lambda^{-1} K^{-1})^{-1} y = y^T A \operatorname{diag}\left( \ldots, \frac{d_i}{(\lambda d_i\delta^2 + 1)^2}, \ldots \right) A^T y$.

Let $T = A \operatorname{diag}\left( \ldots, \frac{d_i}{(\lambda d_i\delta^2 + 1)^2}, \ldots \right) A^T$. Then $T^{-1} = A \operatorname{diag}\left( \ldots, \frac{\lambda^2 d_i^2 \delta^4 + 2\lambda d_i\delta^2 + 1}{d_i}, \ldots \right) A^T = 2\lambda\delta^2 I + K^{-1} + \lambda^2 \delta^4 K$.

So $\frac{\partial}{\partial \lambda} y^T (\delta^2 I + \lambda^{-1} K^{-1})^{-1} y = y^T \left( 2\lambda\delta^2 I + K^{-1} + \lambda^2 \delta^4 K \right)^{-1} y$.
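For anyone who wants to check the final formula numerically, here is a minimal NumPy sketch. The helper names, the random test matrix and the parameter values are all assumptions made up for the check. It builds a random symmetric positive definite $K$ and compares the closed form with a central finite difference of the original quadratic form.

```python
import numpy as np


def quad_form(lam, y, K, delta2):
    """f(lam) = y^T (delta^2 I + K^{-1} / lam)^{-1} y, with delta2 = delta^2."""
    n = len(y)
    M = delta2 * np.eye(n) + np.linalg.inv(K) / lam
    return y @ np.linalg.solve(M, y)


def quad_form_derivative(lam, y, K, delta2):
    """Closed form: y^T (2 lam delta^2 I + K^{-1} + lam^2 delta^4 K)^{-1} y."""
    n = len(y)
    B = 2 * lam * delta2 * np.eye(n) + np.linalg.inv(K) + lam**2 * delta2**2 * K
    return y @ np.linalg.solve(B, y)


# Random test problem (illustrative values only).
rng = np.random.default_rng(0)
n = 4
G = rng.standard_normal((n, n))
K = G @ G.T + n * np.eye(n)  # symmetric positive definite by construction
y = rng.standard_normal(n)
lam, delta2, h = 0.7, 0.5, 1e-6

# Central finite difference of f in lam versus the closed-form derivative.
numeric = (quad_form(lam + h, y, K, delta2) - quad_form(lam - h, y, K, delta2)) / (2 * h)
analytic = quad_form_derivative(lam, y, K, delta2)

print(numeric, analytic)
```

The two values should match up to finite-difference error. Note that the closed form is a quadratic form in $y$, so it depends on the direction of $y$ relative to the eigenvectors of $K$, not only on $\sum y_i^2$.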