$\bar f(y) = f(Ty)$, how to compute the Hessian of $\bar f(y) $?

69 Views Asked by At

From Convex Optimization by Boyd and Vandenberghe: Let $T \in \Bbb R^{n \times n}$ be nonsingular. Let $f: \Bbb R^n \rightarrow \Bbb R$ convex and twice continuously differentiable. Define $\bar f(y) = f(Ty)$, and $x=Ty$. Then $\nabla \bar f(y) = T^T \nabla f(x)$.

$T^T$ is the transpose of $T$

My calculation process is as follows: $\nabla \bar f(y) = (\bar f'(y))^T=(f'(x)*T)^T=T^T \nabla f(x)$, since the gradient is the transpose of the derivative.

But I do not know how does $\nabla^2\bar f(y)=T^T\nabla^2 f(x)T$ come from.

3

There are 3 best solutions below

0
On BEST ANSWER

Suppose you know the gradient $(g)$ and Hessian $(H)$ of a function in terms of the variable $x$
$$\eqalign{ f = f(x),\,\,\,\,\, g = \frac{\partial f}{\partial x},\,\,\,\,\,\, H = \frac{\partial g}{\partial x} }$$ You are then told that $x$ is not independent, but actually depends on another variable $(x = Sy).\,\,$ Note that the matrix $S$ does not need to be invertible. It might even be rectangular.

Let's find the gradient $(p)$ and Hessian $(Q)$ with respect to this new variable, by way of differentials. $$\eqalign{ df &= g^Tdx = g^T(S\,dy) = (S^Tg)^Tdy = p^Tdy \cr p &= \frac{\partial f}{\partial y} = S^Tg \cr \cr dp &= S^T\,dg = S^T(H\,dx) = S^TH(S\,dy) = Q\,dy \cr Q &=\frac{\partial p}{\partial y} = S^THS \cr\cr }$$

0
On

It comes about because we often represent scalar valued $2$-linear operators as in the form $(u,v) \mapsto u^TAv$ for some square matrix.

Very informally, pick some $u$ and form the scalar valued operator $x \mapsto u^T T^T \nabla f(Tx)$. Then (informally) we have $u^T T^T \nabla f(T(x+h))-u^T T^T \nabla f(Tx) = u^T T^T (\nabla f(Tx+Th) -\nabla f(Tx)) \approx u^TT^T \nabla^2f(Tx) Th$.

Note that the 'A' matrix here is $A=T^T \nabla^2f(Tx) T$.

0
On

If $\bar{f}(y) = f(Ty)$, the chain rule gives $D\bar{f}(y)(v) = Df(Ty)(Tv)$, since $T$ being linear means that $DT(y) = T$ for all $y$. Now, gradients are the vectors corresponding to these total derivatives (which are linear functionals) under the usual inner product in $\Bbb R^n$. So $$ \nabla \bar{f}(y)^\top v = D\bar{f}(y)(v) = Df(Ty)(Tv) = \nabla f(Ty)^\top Tv = (T^\top \nabla f(Ty))^\top v$$for all $v \in \Bbb R^n$, due to the general properties $A^{\top \top} = A$ and $(AB)^\top = B^\top A^\top$. This means that $\nabla \bar{f}(y) = T^\top \nabla f(Ty)$. For Hessians, we take total derivatives on both sides again, to get $$D(\nabla \bar{f})(y) = T^\top \circ D(\nabla f)(Ty) \circ T$$by the chain rule, using that $T^\top$ is itself linear. Converting the above expression (which is at the endomorphism level) to the matrix level gives $D^2\bar{f}(y) = T^\top D^2f(Ty)T$, as wanted.