Find the gradient and Hessian of $g(x)=f(Ax)$


Let $f:\mathbb R^m\rightarrow \mathbb R$ be a real-valued function, let $A$ be an $m\times n$ matrix and $x$ an $n\times 1$ vector, and let $g(x)=f(Ax)$.

Find the gradient and Hessian of $g(x)$ in terms of $A$, the gradient $\nabla f$, and the Hessian $H(f)$ of $f$.

I tried using the chain rule $\underbrace{\nabla g(x)}_{n\times 1}=\underbrace{A}_{m\times n} \underbrace{\nabla f(Ax)}_{n\times 1}$

But the dimensions don't seem to work out. Is $\nabla g(x)$ actually $m\times 1$? I thought it should be $n\times 1$ because $x$ has $n$ elements. If so, my application of the chain rule is probably wrong.


Best answer

Let $h : \mathbb{R}^n \to \mathbb{R}^m$ be the map $h(x)=Ax$. Then $h$ is linear and $\mathrm{d}h(x) = A$. Note that $g = f\circ h$, so the chain rule gives
$$ \forall x \in \mathbb{R}^n,~ \mathrm{d}g(x) = \mathrm{d}f(h(x))\circ \mathrm{d}h(x). $$
The gradients are defined through the Euclidean inner product:
\begin{align} \forall x\in \mathbb{R}^n, \forall v \in \mathbb{R}^n,~ \langle\nabla g(x),v \rangle_{\mathbb{R}^n} &=\mathrm{d}g(x)v \\ &= \mathrm{d}f(h(x))\circ \mathrm{d}h(x)v \\ &= \mathrm{d}f(Ax) Av \\ &= \langle \nabla f(Ax),Av\rangle_{\mathbb{R}^m} \\ &= \langle A^T\nabla f(Ax),v\rangle_{\mathbb{R}^n} \end{align}
where the last equality follows from the fact that $\langle Au,v \rangle_{\mathbb{R}^m} = \langle u,A^Tv\rangle_{\mathbb{R}^n}$ for all $u\in \mathbb{R}^n$ and $v\in \mathbb{R}^m$. Since this holds for every $v\in\mathbb{R}^n$, it follows that $\nabla g(x) = A^T\nabla f(Ax) \in \mathbb{R}^n$. In terms of dimensions: $\nabla f(Ax)$ is $m\times 1$ (not $n\times 1$), and it is multiplied by the $n\times m$ matrix $A^T$, not by $A$, which gives an $n\times 1$ result.
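A finite-difference check is a quick way to confirm both the formula $\nabla g(x) = A^T\nabla f(Ax)$ and its $n\times 1$ shape. This is only an illustrative sketch: the test function $f(z)=\sum_i e^{z_i}+\tfrac12\|z\|^2$ (whose gradient is $e^{z_i}+z_i$ componentwise), the matrix `A`, the point `x`, and the helper names are hypothetical choices, and plain Python lists are used so nothing beyond the standard library is needed.

```python
import math

# Hypothetical test function f: R^m -> R and its exact gradient.
def f(z):
    return sum(math.exp(zi) for zi in z) + 0.5 * sum(zi * zi for zi in z)

def grad_f(z):
    return [math.exp(zi) + zi for zi in z]

def matvec(A, x):
    """Compute A x for A stored as a list of m rows of length n."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def rmatvec(A, v):
    """Compute A^T v: entry j is sum_i A[i][j] * v[i]."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def grad_g(A, x):
    """Gradient of g(x) = f(Ax) using the formula A^T grad f(Ax)."""
    return rmatvec(A, grad_f(matvec(A, x)))

def fd_grad_g(A, x, h=1e-6):
    """Central finite differences of g(x) = f(Ax), one coordinate of x at a time."""
    g = lambda y: f(matvec(A, y))
    out = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        out.append((g(xp) - g(xm)) / (2 * h))
    return out

A = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0]]  # m = 2 rows, n = 3 columns
x = [0.1, -0.2, 0.3]    # x in R^n

analytic = grad_g(A, x)
numeric = fd_grad_g(A, x)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print(len(analytic), max_err < 1e-6)  # 3 True  (the gradient has n entries)
```

Note that `grad_f` returns $m=2$ entries while `grad_g` returns $n=3$; the transpose in `rmatvec` is what maps one to the other.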

Another answer

$ \def\a{\alpha}\def\b{\beta}\def\g{\gamma} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\CLR#1{\c{\LR{#1}}} $
Let's use a colon to denote the Frobenius product, which is a concise notation for the trace
$$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$
and define the variables
$$\eqalign{ w &= Ax,\qquad \b = f(w) = g(x) \qquad\qquad\qquad \\ }$$
Given the gradient of $\b$ with respect to $w$ $\Big({\rm i.e.}\;\grad{\b}{w}\Big),\:$ let's find the gradient with respect to $x$ by expanding the differential $d\b$ and then changing the independent variable from $w\to x$:
$$\eqalign{ d\b &= \gradLR{\b}{w}:\c{dw} \\ &= \gradLR{\b}{w}:\CLR{A\:dx} \\ &= A^T\gradLR{\b}{w}:{dx} \\ \grad{\b}{x} &= A^T\gradLR{\b}{w} \;=\; A^T\nabla f(Ax) \\ }$$
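The question also asks for the Hessian, and the same differential approach extends by one more step. A sketch, assuming $f$ is twice differentiable, with $w=Ax$ and $\beta=f(w)$ as above, and writing $p=\nabla f(w)$ and $H=\frac{\partial p}{\partial w}=H_f(w)$ for the symmetric $m\times m$ Hessian of $f$:
$$\eqalign{ q &= \frac{\partial\beta}{\partial x} \;=\; A^Tp \\ dq &= A^T\,dp \;=\; A^TH\,dw \;=\; A^THA\,dx \\ \frac{\partial q}{\partial x} &= A^THA \\ }$$
Hence the Hessian of $g$ is $\nabla^2 g(x) = A^T H_f(Ax)\,A$, an $n\times n$ matrix, consistent with $\nabla g(x)=A^T\nabla f(Ax)$ being $n\times 1$.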


From the definition above, it is straightforward to derive rules for manipulating a Frobenius product. Here are a few of the most useful:
$$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:\LR{AB} &= \LR{CB^T}:A \\&= \LR{A^TC}:B \\ }$$
When applied to vectors $({\rm i.e.}\;n=1),\;$ the Frobenius product is identical to the ordinary dot product:
$$a:b \;=\; a\cdot b \;=\; a^Tb$$
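The product rules for $C:(AB)$ can be spot-checked on small integer matrices, where the arithmetic is exact. In this hypothetical example, `A`, `B` and `C` are arbitrary choices (with `C` the same shape as `AB`), and the helpers `frob`, `transpose`, `matmul` and `trace` are plain-Python illustrations, not any library's API.

```python
def frob(X, Y):
    """Frobenius product X:Y = sum_ij X_ij * Y_ij (X and Y must have the same shape)."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

A = [[1, 2], [3, 4], [5, 6]]              # 3 x 2
B = [[1, 0, -1], [2, 1, 0]]                # 2 x 3
C = [[2, -1, 0], [1, 3, 1], [0, 2, -2]]    # 3 x 3, same shape as AB

lhs = frob(C, matmul(A, B))                # C:(AB)
via_A = frob(matmul(C, transpose(B)), A)   # (C B^T):A
via_B = frob(matmul(transpose(A), C), B)   # (A^T C):B
as_trace = trace(matmul(transpose(C), matmul(A, B)))  # Tr(C^T (AB))
print(lhs, via_A, via_B, as_trace)  # 50 50 50 50
```

Moving $B$ or $A$ across the colon is exactly the step used above to turn $\gradLR{\b}{w}:\LR{A\,dx}$ into $A^T\gradLR{\b}{w}:dx$.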