Gradient and Hessian of a matrix


Let $g(y) = f(D^{\frac{1}{2}}y)$, where $D^{\frac{1}{2}}$ is the (symmetric) square root of a matrix $D$, and write $x = D^{\frac{1}{2}}y$.

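Then, by the chain rule, $\nabla g(y) = \nabla_y\left[f(D^{\frac{1}{2}}y)\right] = D^{\frac{1}{2}} \nabla f(D^{\frac{1}{2}}y) = D^{\frac{1}{2}} \nabla f(x)$, where $\nabla f(\cdot)$ denotes the gradient of $f$ evaluated at its argument.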

What I struggle with is the form of the Hessian given in Amir Beck's Introduction to Nonlinear Optimization on page 63. Why is the Hessian of $f$ also multiplied by $D^{\frac{1}{2}}$ on the right?

$\nabla^2 g(y) = D^{\frac{1}{2}} \nabla^2 f(D^{\frac{1}{2}}y) D^{\frac{1}{2}} = D^{\frac{1}{2}} \nabla^2 f(x) D^{\frac{1}{2}}$


There are 2 best solutions below

On BEST ANSWER

Suppose you have a matrix $A\in\mathbb R^{m\times n}$, a vector $y\in\mathbb R^{n\times 1}$, and a function $f:\mathbb R^m\rightarrow\mathbb R$. Let $g(y)=f(Ay)=f(x)$ with $x=Ay\in\mathbb R^{m\times 1}$.

The gradient of $g$ at $y$ is given by

$$\begin{split}\underbrace{\nabla g(y)}_{\in\mathbb R^n}&=A^T\nabla f(Ay)\\ &=\underbrace{A^T}_{\in\mathbb R^{n\times m}}\underbrace{\nabla f(x)}_{\in\mathbb R^{m\times 1}}\end{split}$$

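Differentiating the gradient once more with respect to $y$ (the chain rule again brings out a factor of $A$, this time on the right), the Hessian is given by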

$$\begin{split}\underbrace{\nabla^2 g(y)}_{\in\mathbb R^{n\times n}}&=\nabla\left[A^T\nabla f(Ay)\right]\\ &=A^T\nabla^2f(Ay)A\\ &=\underbrace{A^T}_{\in\mathbb R^{n\times m}}\underbrace{\nabla^2f(x)}_{\in\mathbb R^{m\times m}}\underbrace{A}_{\in\mathbb R^{m\times n}}\end{split}$$

In your case $A=D^{\frac 1 2}$ is symmetric, so $A^T=(D^{\frac 1 2})^T=D^{\frac 1 2}$ is multiplied on the left and $A=D^{\frac 1 2}$ is multiplied on the right.
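If you want to convince yourself numerically, here is a minimal sketch in plain Python that compares $A^T\nabla f(Ay)$ and $A^T\nabla^2 f(Ay)A$ against finite differences of $g$ itself. The test function `f`, the rectangular matrix `A`, and the point `y0` are illustrative choices of mine, not taken from the book.

```python
import math

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Matrix-matrix product for lists of lists."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

# Illustrative f: R^2 -> R with known gradient and Hessian (an assumption).
def f(x):
    return math.exp(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    return [math.exp(x[0]) + x[1] ** 2, 2.0 * x[0] * x[1]]

def hess_f(x):
    return [[math.exp(x[0]), 2.0 * x[1]],
            [2.0 * x[1], 2.0 * x[0]]]

# Rectangular A (m = 2, n = 3), chosen arbitrarily for the check.
A = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0]]

def g(y):
    return f(matvec(A, y))

def grad_g(y):
    # Claimed formula: A^T grad f(Ay)
    return matvec(transpose(A), grad_f(matvec(A, y)))

def hess_g(y):
    # Claimed formula: A^T hess f(Ay) A
    return matmul(matmul(transpose(A), hess_f(matvec(A, y))), A)

def shifted(y, i, di, j=None, dj=0.0):
    """Copy of y with y[i] += di and (optionally) y[j] += dj."""
    z = list(y)
    z[i] += di
    if j is not None:
        z[j] += dj
    return z

def fd_grad(func, y, h=1e-5):
    """Central-difference gradient using only function values."""
    return [(func(shifted(y, i, h)) - func(shifted(y, i, -h))) / (2 * h)
            for i in range(len(y))]

def fd_hess(func, y, h=1e-4):
    """Central-difference Hessian using only function values."""
    n = len(y)
    return [[(func(shifted(y, i, h, j, h)) - func(shifted(y, i, h, j, -h))
              - func(shifted(y, i, -h, j, h)) + func(shifted(y, i, -h, j, -h)))
             / (4 * h * h) for j in range(n)] for i in range(n)]

y0 = [0.3, -0.2, 0.1]
err_grad = max(abs(a - b) for a, b in zip(grad_g(y0), fd_grad(g, y0)))
err_hess = max(abs(a - b)
               for ra, rb in zip(hess_g(y0), fd_hess(g, y0))
               for a, b in zip(ra, rb))
print("max gradient error:", err_grad)
print("max Hessian error:", err_hess)
```

Both errors should come out tiny (limited only by the finite-difference step). Using a non-square $A$ also shows why the transpose has to sit on the left and the plain matrix on the right: that is the only ordering in which the dimensions match.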

On

Suppose you've calculated the gradient and Hessian of a function in terms of the variable $x$
$$\eqalign{ \phi = f(x),\,\,\,\,\, p = \frac{\partial f}{\partial x},\,\,\,\,\,\, A = \frac{\partial p}{\partial x} }$$ Then you learn that $x$ is not independent, but actually depends on another variable $(x = Sy).\,\,$ Note that the matrix $S$ does not need to be invertible. It might even be rectangular.

Recalculate everything in terms of the new variable $$\eqalign{ \phi = g(y),\,\,\,\,\, q = \frac{\partial g}{\partial y},\,\,\,\,\,\, B = \frac{\partial q}{\partial y} }$$ using differentials $$\eqalign{ &d\phi = p^Tdx = p^T(S\,dy) = (S^Tp)^Tdy = q^Tdy \quad &\therefore\;\; &q = \frac{\partial g}{\partial y} = S^Tp \\ &dq = S^T\,dp = S^T(A\,dx) = S^TA(S\,dy) = B\,dy \quad &\therefore\;\; &B =\frac{\partial q}{\partial y} = S^TAS \\ }$$ Obviously, in this case we have $\,S=S^T=D^{\frac 12}$
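As a sanity check, here is a worked example on a quadratic, where everything can be computed directly. The function below (with a symmetric matrix $Q$ and a vector $b$) is my own illustrative choice, not something from the book. Take
$$\eqalign{ f(x) = \tfrac 12 x^TQx + b^Tx \quad\implies\quad p = Qx + b,\,\,\,\,\, A = Q }$$
Substituting $x = Sy$ gives
$$\eqalign{ g(y) &= \tfrac 12 y^T(S^TQS)\,y + (S^Tb)^Ty \\ q &= \frac{\partial g}{\partial y} = S^TQS\,y + S^Tb = S^T(Qx + b) = S^Tp \\ B &= \frac{\partial q}{\partial y} = S^TQS = S^TAS \\ }$$
so the second factor of $S$ on the right comes from the $x = Sy$ hidden inside $p$.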