Jacobian and Hessian of inverse norm of linear map

Let $f(x) = \frac{1}{\lVert Cx \rVert_2^2}$, so that $\nabla f(x) = \frac{-2 C^T C x}{\lVert C x \rVert_2^4}$, if I am not mistaken. But if I now apply the quotient rule or the product rule to $\nabla f(x)$ to obtain the Hessian, I get a sum of a matrix and a vector. Can somebody tell me what I did wrong? Many thanks!

There are 2 best solutions below

2
On BEST ANSWER

$ \newcommand\trans[1]{#1^{\mathrm T}} $

It is not a good idea to assume that the "quotient rule" and "product rule" carry over unchanged; as usually stated, they are rules from single-variable calculus.

There is, however, a sort of "generalized product rule" we can use. I assume that what you want is the Hessian matrix; the operator that takes a scalar function to its Hessian is $\nabla\trans\nabla$, so we need to take the transpose of your gradient $$ \trans\nabla f(x) = \frac{-2\trans x\trans CC}{||Cx||_2^4} $$ and apply $\nabla$. This looks like $$\begin{aligned} \nabla\frac{-2\trans x\trans CC}{||Cx||_2^4} &= \dot\nabla\frac{-2\trans{\dot x}\trans CC}{||Cx||_2^4} + \dot\nabla\frac{-2\trans x\trans CC}{||C\dot x||_2^4} \\ &= (\nabla\trans x)\frac{-2\trans CC}{||Cx||_2^4} + \left(\nabla\frac1{||Cx||_2^4}\right)(-2\trans x\trans CC). \end{aligned}$$

The dots here indicate what $\dot\nabla$ is differentiating, and the undotted $x$ in each expression should be thought of as being held constant. Note that $\nabla\trans x = I$, the identity matrix. What is very important is that the order of multiplication matters, so we cannot just move $\nabla$ however we please; that is why this sort of dot notation is necessary. What we can do is move around scalars and take advantage of associativity, which is how we get the last equality.

Continuing with the second term, using the chain rule together with $\nabla||Cx||_2 = \trans CCx/||Cx||_2$, we get $$ \nabla\frac1{||Cx||_2^4} = \frac{-4\nabla||Cx||_2}{||Cx||_2^5} = \frac{-4\trans CCx}{||Cx||_2^6}. $$ Putting it all together and simplifying, $$ \nabla\trans\nabla\frac1{||Cx||_2^2} = \frac{-2}{||Cx||_2^4}\trans CC + \frac8{||Cx||_2^6}\trans CCx\trans x\trans CC. $$
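The closed-form gradient and Hessian above can be checked numerically with central finite differences. The following is only a minimal sketch in plain Python: the matrix `C`, the point `x`, the step size `eps`, and all helper names (`matvec`, `gram`, `shifted`, and so on) are illustrative assumptions and do not come from the question.

```python
# Finite-difference check of the gradient and Hessian of f(x) = 1 / ||Cx||_2^2.
# C, x and eps are arbitrary illustrative choices, not values from the question.

def matvec(A, v):
    """Matrix-vector product A v for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def gram(C):
    """Return C^T C as a list of rows (entry i,j is column_i . column_j)."""
    cols = list(zip(*C))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def sq_norm(C, x):
    """h = ||Cx||_2^2."""
    return sum(w * w for w in matvec(C, x))

def f(C, x):
    return 1.0 / sq_norm(C, x)

def gradient(C, x):
    """Closed form: -2 C^T C x / ||Cx||^4."""
    h = sq_norm(C, x)
    u = matvec(gram(C), x)
    return [-2.0 * ui / h**2 for ui in u]

def hessian(C, x):
    """Closed form: -2 C^T C / ||Cx||^4 + 8 (C^T C x)(C^T C x)^T / ||Cx||^6."""
    h = sq_norm(C, x)
    M = gram(C)
    u = matvec(M, x)
    n = len(x)
    return [[-2.0 * M[i][j] / h**2 + 8.0 * u[i] * u[j] / h**3
             for j in range(n)] for i in range(n)]

def shifted(x, j, delta):
    """Copy of x with delta added to coordinate j."""
    y = list(x)
    y[j] += delta
    return y

def fd_gradient(C, x, eps=1e-5):
    """Central differences of f, one coordinate at a time."""
    return [(f(C, shifted(x, j, eps)) - f(C, shifted(x, j, -eps))) / (2 * eps)
            for j in range(len(x))]

def fd_hessian(C, x, eps=1e-5):
    """Central differences of the closed-form gradient; column j is d(grad)/dx_j."""
    n = len(x)
    cols = []
    for j in range(n):
        gp = gradient(C, shifted(x, j, eps))
        gm = gradient(C, shifted(x, j, -eps))
        cols.append([(p - m) / (2 * eps) for p, m in zip(gp, gm)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

C = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]
x = [0.5, -1.0, 2.0]   # chosen so that Cx != 0

grad_err = max(abs(a - b) for a, b in zip(gradient(C, x), fd_gradient(C, x)))
hess_err = max(abs(a - b)
               for ra, rb in zip(hessian(C, x), fd_hessian(C, x))
               for a, b in zip(ra, rb))
print("max gradient error:", grad_err)
print("max Hessian error:", hess_err)
```

Both reported errors should be tiny compared with the entries themselves if the formulas are right. Note that the second term of the Hessian is the outer product of the column vector $\trans CCx$ with its transpose, so both terms are matrices of the same size.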

3
On

Try using differentials to approach the problem:
$$\eqalign{
\def\qiq{\quad\implies\quad} \def\cred#1{\color{red}{#1}} \def\cblu#1{\color{blue}{#1}} \def\cgrn#1{\color{green}{#1}} \def\p{\partial}
w &= Cx &\qiq \cred{dw} &= C\,dx \\
h &= w^Tw &\qiq \cblu{dh} &= 2\,\cred{dw}^Tw &= 2\,dx^TC^TCx \\
f &= h^{-1} &\qiq df &= -h^{-2}\cblu{dh} &= -2f^2\,dx^TC^TCx \\
&&&&= -2f^2x^TC^TC\,dx \\
}$$
$$\eqalign{
g &= \frac{\p f}{\p x} = -2f^{2}C^TCx \qquad\qquad\qquad\cred{\big\{{\rm gradient}\big\}}\qquad\quad \\
dg &= -(C^TCx)\,(4f\,df) \;-\; 2f^{2}C^TC\,dx \\
&= \Big(8f^{3}C^TCxx^TC^TC \;-\; 2f^{2}C^TC\Big)\,dx \\
\frac{\p g}{\p x} &= 8f^{3}C^TC\cblu{xx^T}C^TC \;-\; 2f^{2}C^TC \quad\cred{\big\{{\rm Hessian}\big\}} \\
&= 2hgg^T \;-\; 2f^{2}C^TC \\
}$$
The last line uses $gg^T = 4f^{4}C^TCxx^TC^TC$ together with $hf = 1$. So that is one way to calculate the Hessian.
Note that your gradient is missing a factor of $\;-2$.
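
As a quick sanity check of these expressions, here is a sketch for the scalar case, assuming $n = 1$ and $C = c \neq 0$ (a special case chosen for illustration, not part of the original question). Then $f(x) = \frac{1}{c^2x^2}$, and direct differentiation gives $f'(x) = -\frac{2}{c^2x^3}$ and $f''(x) = \frac{6}{c^2x^4}$. The general formulas reduce to the same values:
$$\eqalign{
g &= -2f^{2}c^2x = -\frac{2c^2x}{c^4x^4} = -\frac{2}{c^2x^3} \\
\frac{\partial g}{\partial x} &= 8f^{3}c^4x^2 \;-\; 2f^{2}c^2 = \frac{8}{c^2x^4} - \frac{2}{c^2x^4} = \frac{6}{c^2x^4} \\
}$$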