Matrix Calculus question: Taking the derivative of the following equation?

I am encountering matrix calculus for the first time, and I'm completely lost on the following problem:

I'm trying to differentiate, with respect to $\lambda$, the left-hand side of $$ (x-\mu)^T F (x - \mu) -1 =0 $$ where $x = (I+2\lambda F )^{-1} (y + 2 \lambda F \mu)$, which turns the equation into the following:

$$((I+2\lambda F )^{-1} (y + 2 \lambda F \mu) - \mu ) ^T F ((I+2\lambda F )^{-1} (y + 2 \lambda F \mu) - \mu ) -1 = 0$$

My intuition tells me that the gradient is just the following (extrapolating from the derivative with respect to $x$):

$$ \nabla f(\lambda) = 2 ((I+2\lambda F )^{-1} (y + 2 \lambda F \mu) - \mu ) ^T F $$

Here $\lambda$ is a scalar, $\mu$ and $y$ are vectors, and $F \succcurlyeq 0$.

However, this guess seems to be incorrect. Can someone shed light on how to approach this problem?

There are 2 best solutions below

Best answer:

$ \def\x{(x-\mu)} \def\o{{\tt1}} \def\d{\dot} \def\A{A^{-\o}} \def\AD{{\d A}^{-\o}} \def\a{\alpha}\def\b{\beta}\def\l{\lambda} \def\qiq{\quad\implies\quad} \def\g#1#2{\frac{d #1}{d #2}} $Use a dot to denote derivatives with respect to $\l$ and note the following rules:

$$\eqalign{ \g{(Ab)}{\l} &= \d Ab + A\d b \qquad&\big({\rm derivative\:of\:a\:product}\big) \\ \d c &= 0 \qquad&\big({\rm derivative\:of\:a\:constant}\big) \\ \d\l &= \o \\ }$$

The derivative of a matrix inverse is tricky, but follows directly from these rules:

$$\eqalign{ I &= A\A \qquad&\big({\rm a\:matrix\:product}\big) \\ 0 &= \d A\A + A\,\AD \qquad&\big(I\:{\rm is\:a\:constant}\big) \\ \AD &= -\A\d A\A \qquad&\big({\rm solve\:for\:}\AD\big) \\ }$$

Here $\AD$ denotes the derivative of $\A$, not the inverse of $\d A$.

For typing convenience, define the variables

$$\eqalign{ A &= I+2\l F,\qquad &\d A = 2F \\ b &= y+2\l F\mu,\qquad &\d b = 2F\mu \qquad\qquad\quad \\ }$$

Now we can differentiate $x$:

$$\eqalign{ x &= {\A b} \\ \d x &= \A\d b - \A\d A\A b \\ &= \A(2F\mu) - \A(2F){\A b} \qquad\qquad \\ &= 2\A F(\mu-x) \\ }$$

Differentiating the main function yields

$$\eqalign{ f &= \x^T F\x - \o \\ \d f &= \d x^T F\x + \x^T F\d x \qquad\qquad\quad \\ }$$

Assuming that $F$ is symmetric, this can be simplified to

$$\eqalign{ \d f &= 2\,\x^T F\d x \\ &= 4\,\x^T F\A F(\mu-x) \qquad\qquad\quad \\ }$$
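One way to sanity-check the closed-form derivative is to compare it with a central finite difference on random data. The sketch below assumes NumPy and a symmetric positive semidefinite $F$ built as $MM^T$. The function names (`x_of_lambda`, `f`, `df_closed_form`) and the test values are illustrative choices, not part of the original problem.

```python
import numpy as np

def x_of_lambda(lam, F, y, mu):
    """x(lambda) = (I + 2*lambda*F)^{-1} (y + 2*lambda*F mu)."""
    A = np.eye(F.shape[0]) + 2.0 * lam * F
    # Solve A x = b instead of forming the inverse explicitly.
    return np.linalg.solve(A, y + 2.0 * lam * (F @ mu))

def f(lam, F, y, mu):
    """f(lambda) = (x - mu)^T F (x - mu) - 1."""
    r = x_of_lambda(lam, F, y, mu) - mu
    return r @ F @ r - 1.0

def df_closed_form(lam, F, y, mu):
    """Closed form 4 (x - mu)^T F A^{-1} F (mu - x), assuming F is symmetric."""
    A = np.eye(F.shape[0]) + 2.0 * lam * F
    r = x_of_lambda(lam, F, y, mu) - mu
    return 4.0 * (r @ F) @ np.linalg.solve(A, F @ (-r))

# Illustrative random data (assumption: any symmetric PSD F would do).
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
F = M @ M.T                      # symmetric positive semidefinite
y = rng.standard_normal(n)
mu = rng.standard_normal(n)

lam, h = 0.3, 1e-6
fd = (f(lam + h, F, y, mu) - f(lam - h, F, y, mu)) / (2.0 * h)  # central difference
cf = df_closed_form(lam, F, y, mu)
print("finite difference:", fd)
print("closed form:      ", cf)
```

The two printed values should agree to several digits. Since $\mu - x = -(x-\mu)$ and $F A^{-1} F$ is positive semidefinite when $A$ is positive definite, the derivative should also come out non-positive for $\lambda \ge 0$.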

Second answer:
  • From $$ G(x):=(x-\mu)^\top F(x-\mu)=\sum_{ij}(x_i-\mu_i)F_{ij}(x_j-\mu_j) $$ we get, by the product rule, $$\tag{1} \frac{\partial}{\partial x_i}G(x)=\sum_{j\not=i}(F_{ij}+F_{ji})(x_j-\mu_j)+2F_{ii}(x_i-\mu_i)=2\sum_{j}F_{ij}(x_j-\mu_j)\,, $$ where the last equality uses the symmetry of $F$.

  • Writing $$ (I+2\lambda F)\,x(\lambda)=y+2\lambda F\mu $$ we get, by differentiating w.r.t. $\lambda$, $$ 2F\,x(\lambda)+(I+2\lambda F)\,\dot{x}(\lambda)=2F\mu\,. $$ Solving for $\dot{x}(\lambda)$ gives $$\tag{2} \dot{x}(\lambda)=(I+2\lambda F)^{-1}2F(\mu-x(\lambda))\,. $$

  • The function you want to differentiate is $f(\lambda)=G(x(\lambda))$ which maps a scalar to a scalar.

  • By the chain rule we finally get $$ \dot{f}(\lambda)=\sum_i\frac{\partial}{\partial x_i}G(x(\lambda))\,\dot{x}_i(\lambda) $$ for the derivative you are seeking. Use (1) and (2) to write this out.
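As a sketch of how this writes out, assume $F$ is symmetric, so that the partial derivatives of $G$ collect into $\nabla G(x)=2F(x-\mu)$, and solve the differentiated linear equation for $\dot{x}(\lambda)=(I+2\lambda F)^{-1}2F(\mu-x(\lambda))$. The chain rule then gives

$$ \dot{f}(\lambda)=\nabla G(x)^\top\dot{x}(\lambda)=4\,(x-\mu)^\top F\,(I+2\lambda F)^{-1}F\,(\mu-x)=-4\,(x-\mu)^\top F\,(I+2\lambda F)^{-1}F\,(x-\mu)\,, $$

with $x=x(\lambda)$. This agrees with the result of the other answer. For $\lambda\ge 0$ and $F\succcurlyeq 0$ the matrix $I+2\lambda F$ is positive definite, so under these assumptions $\dot{f}(\lambda)\le 0$.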