Derivative of matrix function with respect to scalar variable


Given: $f : \mathbb{R} \to \mathbb{R}$,

$$f(z) = \operatorname{tr}(PXX^{\top}) - y^{\top}P X X^{\top}P y = \operatorname{tr}(PXX^{\top}) - (y^{\top}P X) (y^{\top}P X)^{\top}$$ and

$$P = B-v(1_n^{\top}v)^{-1}v^{\top}$$

where

  • Only the elements of $v$ depend on the scalar variable $z \in \mathbb{R}$, i.e. $v = (v_1(z), \ldots,v_n(z))^{\top}$.

  • $y,v,1_n \in \mathbb{R}^n$ with $1_n^{\top} = (1,\ldots,1)$.

  • $B,P \in \mathbb{R}^{n \times n}$ and $X \in \mathbb{R}^{n \times r}$.

  • $P = P^{\top}$.

I need $f'(z) = \frac{\mathrm{d} f}{\mathrm{d} z}$. From reading related questions I was hoping there's a nice trick using matrix differentials to get a closed form. Unfortunately, I'm really new to it. Any hints are really appreciated!


There are 2 best solutions below

0
On BEST ANSWER

Let's denote derivatives with respect to $z$ by putting a dot over the variable name.

Then the derivative of the function is simply

$$\eqalign{ \dot f &= \operatorname{Tr}\left(\dot PXX^T\right)-y^T\dot PXX^TPy-y^TPXX^T\dot Py }$$

Now we just need the derivative of $P$. The following scalar product will be useful.

$$\eqalign{ \mu &= {\tt1}^Tv \\ \dot\mu &= {\tt1}^T\dot v }$$

Since only $v$ depends on $z$ (so $\dot B = 0$), the derivative of $P$ is

$$\eqalign{ P &= B-\mu^{-1}vv^T \\ \dot P &= \mu^{-2}\dot\mu vv^T - \mu^{-1}(\dot vv^T+v\dot v^T) \\ &= \mu^{-2}\Big({\tt1}^T\dot vvv^T - {\tt1}^Tv\dot vv^T - {\tt1}^Tvv\dot v^T\Big) }$$

The final result is just a matter of back-substitution, which I leave to you.
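A numerical sanity check can help when doing the back-substitution by hand. The sketch below is only an illustration: it assumes a hypothetical parametrization $v_i(z) = e^{c_i z}$ (the question does not specify $v(z)$), random test data, and a symmetric $B$, and it compares the dot-formula for $\dot f$ with a central finite difference. All function names are made up for this example.

```python
import numpy as np

def make_problem(n=5, r=3, seed=0):
    """Random test data; B is symmetrized so that P = P^T holds."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, n))
    B = (B + B.T) / 2
    X = rng.standard_normal((n, r))
    y = rng.standard_normal(n)
    c = rng.uniform(0.5, 1.5, size=n)  # hypothetical rates: v_i(z) = exp(c_i * z)
    return B, X, y, c

def v_and_vdot(z, c):
    """Assumed example v(z) and its derivative dv/dz."""
    v = np.exp(c * z)
    return v, c * v

def f(z, B, X, y, c):
    """f(z) = tr(P X X^T) - y^T P X X^T P y with P = B - v v^T / (1^T v)."""
    v, _ = v_and_vdot(z, c)
    P = B - np.outer(v, v) / v.sum()
    A = X @ X.T
    return np.trace(P @ A) - y @ P @ A @ P @ y

def f_dot(z, B, X, y, c):
    """Analytic derivative using the formulas for mu, mu-dot and P-dot."""
    v, vd = v_and_vdot(z, c)
    mu, mud = v.sum(), vd.sum()
    P = B - np.outer(v, v) / mu
    Pd = mud / mu**2 * np.outer(v, v) - (np.outer(vd, v) + np.outer(v, vd)) / mu
    A = X @ X.T
    return np.trace(Pd @ A) - y @ Pd @ A @ P @ y - y @ P @ A @ Pd @ y

def check(z=0.3, h=1e-6):
    """Return (analytic, central finite difference) values of f'(z)."""
    args = make_problem()
    numeric = (f(z + h, *args) - f(z - h, *args)) / (2 * h)
    return f_dot(z, *args), numeric

analytic, numeric = check()
print(analytic, numeric)
```

The two printed numbers should agree to several digits; a mismatch usually points to a sign or transpose slip in $\dot P$.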

0
On
  • Apply the chain rule (see Section 2.8.1 of the Matrix Cookbook), treating $f$ as a function of $v$: $$\frac{\mathrm{d}}{\mathrm{d} z}f(z) = \sum_{i=1}^n \frac{\partial f}{\partial v_i}\frac{\mathrm{d} v_i}{\mathrm{d} z} = \left< \frac{\partial f}{\partial v},\frac{\mathrm{d} v}{\mathrm{d} z}\right>. $$
    • Substitute $P$ into $f$ and separate the terms that depend on $v$ from those that do not; the latter vanish under $\partial/\partial v$. The two cross terms combine into one because $B = B^{\top}$, which follows from $P = P^{\top}$:

\begin{align}
\frac{\partial f}{\partial v} &= \frac{\partial }{\partial v}\left[\text{tr}((B\!-\!v(1_n^{\top}v)^{-1}v^{\top})XX^{\top}) \!-\! y^{\top}(B\!-\!v(1_n^{\top}v)^{-1}v^{\top}) X X^{\top}(B\!-\!v(1_n^{\top}v)^{-1}v^{\top}) y \right]\\
&=\frac{\partial }{\partial v}\left[-\text{tr}(v(1_n^{\top}v)^{-1}v^{\top}XX^{\top}) +2 y^{\top}B X X^{\top}v(1_n^{\top}v)^{-1}v^{\top} y \right.\\
&\hspace{8cm}\left.-y^{\top}v(1_n^{\top}v)^{-1}v^{\top} X X^{\top}v(1_n^{\top}v)^{-1}v^{\top} y\right]\\
&=\frac{\partial }{\partial v}\left[-\frac{v^{\top}XX^{\top}v}{1_n^{\top}v}+\frac{2 y^{\top}B X X^{\top}vv^{\top} y}{1_n^{\top}v} -\frac{(y^{\top}v)^2}{(1_n^{\top}v)^2}v^{\top} X X^{\top}v\right]\\
&=-\frac{\partial }{\partial v}\left[\frac{v^{\top}(XX^{\top}-2 yy^{\top}B X X^{\top})v}{1_n^{\top}v} +\frac{(y^{\top}v)^2}{(1_n^{\top}v)^2}v^{\top} X X^{\top}v\right]\\
&=-\frac{[1_n^{\top}v] [2XX^{\top}v-2(yy^{\top}B X X^{\top}+XX^{\top}Byy^{\top})v]-[v^{\top}(XX^{\top}-2 yy^{\top}B X X^{\top})v]\,1_n}{(1_n^{\top}v)^2}\\
&\hspace{2cm}-2\frac{(y^{\top}v)^2}{(1_n^{\top}v)^2} X X^{\top}v-[v^{\top} X X^{\top}v]\left[2\frac{y^{\top}v}{1_n^{\top}v}\frac{(1_n^{\top}v)y-(y^{\top}v)1_n}{(1_n^{\top}v)^2}\right].
\end{align}

In the last step, the matrix $M = XX^{\top}-2 yy^{\top}B X X^{\top}$ is not symmetric, so the gradient of $v^{\top}Mv$ is $(M+M^{\top})v$ rather than $2Mv$, and the gradient of the denominator $1_n^{\top}v$ is $1_n$.

Can you complete it now?
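To check a hand-completed result, one option is to compare the gradient with finite differences and then apply the chain rule. The sketch below is an assumption-laden illustration, not part of the answer: it uses random test data, a symmetric $B$, and a random stand-in vector for $\mathrm{d}v/\mathrm{d}z$. It writes the gradient of $f$ with respect to $v$ via the quotient rule, using $(M+M^{\top})v$ for the quadratic form with $M = XX^{\top}-2yy^{\top}BXX^{\top}$ and $1_n$ for the gradient of $1_n^{\top}v$. The helper names are hypothetical.

```python
import numpy as np

def f_of_v(v, B, X, y):
    """f as a function of v: tr(P A) - y^T P A P y, A = X X^T, P = B - v v^T / (1^T v)."""
    P = B - np.outer(v, v) / v.sum()
    A = X @ X.T
    return np.trace(P @ A) - y @ P @ A @ P @ y

def grad_f(v, B, X, y):
    """Analytic gradient of f with respect to v (assumes B is symmetric)."""
    A = X @ X.T
    one = np.ones_like(v)
    mu = v.sum()                            # 1^T v
    M = A - 2.0 * np.outer(y, y) @ B @ A    # non-symmetric in general
    q = v @ M @ v                           # v^T M v
    s = v @ A @ v                           # v^T A v
    t = y @ v                               # y^T v
    term1 = (mu * (M + M.T) @ v - q * one) / mu**2       # d/dv [q / mu]
    term2 = 2.0 * (t / mu) ** 2 * (A @ v)                # (t/mu)^2 * d/dv [s]
    term3 = s * 2.0 * (t / mu) * (mu * y - t * one) / mu**2  # s * d/dv [(t/mu)^2]
    return -(term1 + term2 + term3)

def numerical_grad(v, B, X, y, h=1e-6):
    """Central finite-difference gradient, one coordinate at a time."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f_of_v(v + e, B, X, y) - f_of_v(v - e, B, X, y)) / (2 * h)
    return g

rng = np.random.default_rng(1)
n, r = 5, 3
B = rng.standard_normal((n, n))
B = (B + B.T) / 2                      # symmetric, consistent with P = P^T
X = rng.standard_normal((n, r))
y = rng.standard_normal(n)
v = rng.uniform(1.0, 2.0, size=n)      # keeps 1^T v away from zero
vdot = rng.standard_normal(n)          # stand-in for dv/dz at some z

g_analytic = grad_f(v, B, X, y)
g_numeric = numerical_grad(v, B, X, y)
df_dz = g_analytic @ vdot              # chain rule: <df/dv, dv/dz>
print(np.max(np.abs(g_analytic - g_numeric)), df_dz)
```

The first printed number is the largest componentwise gap between the analytic and numerical gradients and should be close to zero; the second is $f'(z)$ for the stand-in $\mathrm{d}v/\mathrm{d}z$.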