I have a function $f : \Bbb R \to \Bbb R$ defined by
$$ f(z) := z \, \beta^T \left({\Sigma}+zI\right)^{-1} \beta $$
and I want to find its derivative with respect to $z$. I tried making use of the product rule and the identity
$$ \partial_z A(z)^{-1} = -A^{-1}\,(\partial_z A)\,A^{-1} $$
in a chain rule, but it doesn't seem correct.
Let $u(z) = z\beta^\intercal$ and $v(z) = (\Sigma + zI)^{-1} \beta.$ Then, by the product rule,
$$ \partial_z (uv) = (\partial_z u)\, v(z) + u\, \partial_z v(z). $$
Here $u$ is a row vector and $v$ a column vector, and each derivative is taken entrywise, so $\partial_z u = \beta^\intercal.$ Using $\partial_x (A(x) \beta) = (\partial_x A) \beta$ and your formula for the derivative of the inverse, we see
$$ \partial_z v = -(\Sigma + zI)^{-1}\, \partial_z (\Sigma + zI)\, (\Sigma + zI)^{-1} \beta = -(\Sigma + zI)^{-2} \beta. $$
Therefore,
$$ \partial_z(uv) = \beta^\intercal(\Sigma + zI)^{-1}\beta - z\beta^\intercal(\Sigma + zI)^{-2} \beta. $$
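As a quick numerical sanity check, the closed form above can be compared against a central finite difference. This is a minimal NumPy sketch; the particular $\Sigma$ (a random symmetric positive-definite matrix), $\beta$, and evaluation point $z$ are illustrative choices, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
Sigma = M @ M.T                      # random symmetric positive-definite matrix
beta = rng.standard_normal(n)
z = 0.7

def f(z):
    # f(z) = z * beta^T (Sigma + z I)^{-1} beta
    return z * beta @ np.linalg.solve(Sigma + z * np.eye(n), beta)

inv = np.linalg.inv(Sigma + z * np.eye(n))
closed_form = beta @ inv @ beta - z * beta @ inv @ inv @ beta

h = 1e-6
finite_diff = (f(z + h) - f(z - h)) / (2 * h)

print(abs(closed_form - finite_diff))   # should be small
```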
Note. If you write $\Sigma = P \Lambda P^{-1}$ as the other answer does, you will find that this result and the other one match. When the matrix is non-diagonalisable, apparently one can approximate it by diagonalisable matrices, but even then one still needs to justify interchanging the limit and the derivative, since one is writing $f(z) = \lim\limits_n f_n(z)$ with the matrices in $f_n$ diagonalisable.
$ \def\c#1{\color{red}{#1}} \def\l{\big(} \def\r{\big)} \def\p{{\partial}} \def\grad#1#2{\frac{\p #1}{\p #2}} $For typing convenience, define the matrix
$$A = (\Sigma+zI) \quad\implies\quad \c{dA = I\,dz}$$
and use a colon to denote the trace/Frobenius product
$$\eqalign{
X:Y &= \sum_{i=1}^m \sum_{j=1}^n X_{ij} Y_{ij} \;=\; {\rm Tr}(X^TY) \\
X:X &= \big\|X\big\|_F^2 \\
}$$
Write the function using the above notation, then calculate its differential and gradient:
$$\eqalign{
f &= \beta\beta^T:A^{-1}z \\
df &= \beta\beta^T:\l A^{-1}\,dz + z\;\c{dA^{-1}}\r \\
&= \beta\beta^T:\l A^{-1}\,dz \c{-} z\;\c{A^{-1}\,dA\,A^{-1}}\r \\
&= \beta\beta^T:\l A - zI\r A^{-2}\,dz \\
&= \beta\beta^T:\Sigma A^{-2}\,dz \\
\grad{f}{z} &= \beta\beta^T:\Sigma A^{-2} \quad=\; \beta^T\Sigma A^{-2}\beta \\
}$$
where the part in $\c{\rm red}$ makes use of the second of your identities.
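The form $\beta^T\Sigma(\Sigma+zI)^{-2}\beta$ agrees with the product-rule answer above, since $(\Sigma+zI)^{-1} - z(\Sigma+zI)^{-2} = \big((\Sigma+zI) - zI\big)(\Sigma+zI)^{-2} = \Sigma(\Sigma+zI)^{-2}$. A small NumPy check of this equality, with an arbitrary symmetric $\Sigma$ and $\beta$ chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
Sigma = M @ M.T                      # random symmetric positive-definite matrix
beta = rng.standard_normal(n)
z = 1.3

inv = np.linalg.inv(Sigma + z * np.eye(n))
form1 = beta @ inv @ beta - z * beta @ inv @ inv @ beta   # product-rule answer
form2 = beta @ Sigma @ inv @ inv @ beta                   # trace/Frobenius answer

print(np.isclose(form1, form2))
```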
It is instructive to come back to the entries themselves. Here is how.
Let us assume that $\Sigma$ is diagonalizable under the form:
$$\Sigma = P \Lambda P^{-1}$$
with $\Lambda=\operatorname{diag}(\lambda_k)$. We can transform:
$$f(z)=z\beta^T\left({\Sigma}+zI\right)^{-1}\beta$$
into:
$$f(z)=z\beta^T\left(P \Lambda P^{-1}+zPP^{-1}\right)^{-1}\beta$$
$$f(z)=z\beta^T\left(P(\Lambda +zI)P^{-1}\right)^{-1}\beta$$
$$f(z)=z\underbrace{\beta^TP}_{U^T}(\Lambda +zI)^{-1}\underbrace{P^{-1}\beta}_V$$
As $\Lambda +zI$ is a diagonal matrix with entries $\lambda_k+z$, if we denote by $u_k$ and $v_k$ the entries of $U$ and $V$, we get:
$$f(z)=\sum_{k=1}^{n}\dfrac{u_kv_kz}{\lambda_k+z}$$
whose derivative, using $\dfrac{d}{dz}\dfrac{z}{\lambda_k+z}=\dfrac{\lambda_k}{(\lambda_k+z)^2}$, is:
$$f'(z)=\sum_{k=1}^{n}\dfrac{u_kv_k\lambda_k}{(\lambda_k+z)^2}=\sum_{k=1}^{n}u_k \lambda_k \dfrac{1}{(\lambda_k+z)^2} v_k$$
Now it remains to make the journey back to a matrix expression...
$$f'(z)=\underbrace{(\beta^TP)}_{U^T} \Lambda (\Lambda +zI)^{-2}\underbrace{(P^{-1}\beta)}_V$$
$$f'(z)=\beta^T\underbrace{(P \Lambda P^{-1})}_{\Sigma} \underbrace{(P(\Lambda +zI)^{-2}P^{-1})}_{(\Sigma + zI)^{-2} }\beta$$
i.e.,
$$f'(z)=\beta^T\Sigma \left({\Sigma}+zI\right)^{-2}\beta$$
or equivalently:
$$f'(z)=\beta^T \left({\Sigma}+zI\right)^{-2} \Sigma\beta$$
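The journey back can be verified numerically: computing $f'(z)$ from the scalar sum over eigenvalues and from the final matrix expression should give the same value. A minimal NumPy sketch, with a random symmetric $\Sigma$ and $\beta$ as illustrative inputs (symmetric, so `eigh` applies and $U=V$ as in the remark below):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
Sigma = M @ M.T                      # symmetric, hence diagonalizable
beta = rng.standard_normal(n)
z = 0.5

# Eigendecomposition Sigma = P Lambda P^{-1}; P is orthogonal here,
# so P^{-1} = P^T and the vectors U and V coincide.
lam, P = np.linalg.eigh(Sigma)
u = P.T @ beta                       # entries u_k = v_k

# Scalar form: f'(z) = sum_k u_k v_k lambda_k / (lambda_k + z)^2
scalar_sum = np.sum(u * u * lam / (lam + z) ** 2)

# Matrix form: f'(z) = beta^T Sigma (Sigma + z I)^{-2} beta
inv2 = np.linalg.matrix_power(np.linalg.inv(Sigma + z * np.eye(n)), 2)
matrix_form = beta @ Sigma @ inv2 @ beta

print(np.isclose(scalar_sum, matrix_form))
```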
Remarks:
If $\Sigma$ is not diagonalizable, the reasoning is still valid because a non-diagonalizable matrix can always be written as the limit of a sequence of diagonalizable matrices (though, as the other answer notes, one then has to justify interchanging the limit and the derivative).
If $\Sigma$ is symmetric (which is likely), then $P^{-1}=P^T$, implying that $U=V$.