If $f(x)=g(a^Tx)$, where $g:\mathbb{R}\to\mathbb{R}$ and $a, x\in\mathbb{R}^n$, why is the Hessian equal to $g''(a^Tx)aa^T$?

I am trying to understand the solution to the problem below:

Let $f(x)=g(a^Tx)$, where $g:\mathbb{R}\to\mathbb{R}$ is continuously differentiable and $a\in\mathbb{R}^n$ is a vector. What are $\nabla f(x)$ and $\nabla^2f(x)$?

The answer is

$\nabla f(x)=g'(a^Tx)a$

$\nabla^2 f(x)=g''(a^Tx)aa^T$

I understand the answer for the gradient, but why does the answer for the Hessian contain $a^T$ at the end?

Best answer

Since $\nabla f(x)=g'(a^Tx)a$, the $i$-th component of $\nabla f(x)$, call it $h_i(x)$, is $g'(a^Tx)a_i$.

By definition, the $i$-th row of $\nabla^2f(x)$ is the transpose of the gradient of the $i$-th component of $\nabla f(x)$. In other words, the $i$-th row of $\nabla^2f(x)$ is the transpose of $\nabla h_i(x)$.

Well, by the chain rule, $\nabla h_i(x)=g''(a^Tx)a_ia$. The transpose of this is $g''(a^Tx)a_ia^T$ (because the first two factors are scalars, so transposition only affects the vector $a$).

So, $\nabla^2f(x)$ has the form

$$\begin{pmatrix}g''(a^Tx)a_1a^T\\\vdots\\g''(a^Tx)a_na^T\end{pmatrix}=g''(a^Tx)\underbrace{\begin{pmatrix}a_1a^T\\\vdots\\a_na^T\end{pmatrix}}=g''(a^Tx)\underbrace{aa^T}$$

The equality of the two underbraced expressions is a simple consequence of the definition of matrix multiplication: the $(i,j)$ entry of $aa^T$ is $a_ia_j$, so the $i$-th row of $aa^T$ is $a_ia^T$.
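As a sanity check of the row-by-row argument, here is a minimal numerical sketch. The choice $g=\sin$, the vectors `a` and `x`, and the helper `hessian_fd` are all my own illustrative assumptions, not part of the original problem; the code compares a central finite-difference Hessian of $f$ with the formula $g''(a^Tx)aa^T$.

```python
import numpy as np

# Illustrative assumption: g = sin, so g'' = -sin.
def g(t):
    return np.sin(t)

def g2(t):
    return -np.sin(t)

def f(x, a):
    return g(a @ x)

def hessian_fd(func, x, eps=1e-4):
    """Central finite-difference approximation of the Hessian of func at x."""
    n = x.size
    H = np.zeros((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            H[i, j] = (
                func(x + eps * I[i] + eps * I[j])
                - func(x + eps * I[i] - eps * I[j])
                - func(x - eps * I[i] + eps * I[j])
                + func(x - eps * I[i] - eps * I[j])
            ) / (4 * eps**2)
    return H

# Arbitrary test data (hypothetical values).
a = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, -0.7])

H_numeric = hessian_fd(lambda z: f(z, a), x)
H_formula = g2(a @ x) * np.outer(a, a)   # g''(a^T x) * a a^T

max_err = np.max(np.abs(H_numeric - H_formula))
print(max_err)
```

Note that `np.outer(a, a)` is exactly the matrix $aa^T$: its $i$-th row is $a_ia^T$, which matches the stacked rows in the display above.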

Another answer

Each successive application of the $\nabla$ operator increases the order of the derivative of $g(\lambda)$ by one. It also increases the tensorial order of the result by one by liberating another $a$-vector from the function argument $\lambda=a^Tx$, i.e.

$$\begin{aligned} \nabla f(x) &= g'(\lambda)\,a \\ \nabla^2f(x) &= g''(\lambda)\,a^2 \\ \nabla^3f(x) &= g'''(\lambda)\,a^3 \\ &\;\;\vdots \\ \nabla^kf(x) &= g^{(k)}(\lambda)\,a^k \end{aligned}$$

where $a^k$ and $\nabla^k$ denote $k$-th order tensor (aka dyadic) products, and $g$ is assumed to be $k$ times differentiable. The presence of the term $aa^T$ in your formula is merely the way that $a^2$ is expressed in matrix notation.
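To see why each differentiation frees one more copy of $a$, the same pattern can be sketched in components. The index names $i,j,k$ are my own, and the last line assumes $g$ is three times differentiable. Since $\partial(a^Tx)/\partial x_i=a_i$, the chain rule gives

$$\begin{aligned} \frac{\partial f}{\partial x_i} &= g'(a^Tx)\,a_i \\ \frac{\partial^2 f}{\partial x_j\,\partial x_i} &= g''(a^Tx)\,a_i a_j \\ \frac{\partial^3 f}{\partial x_k\,\partial x_j\,\partial x_i} &= g'''(a^Tx)\,a_i a_j a_k \end{aligned}$$

The second line is the $(i,j)$ entry of $g''(a^Tx)\,aa^T$, because $(aa^T)_{ij}=a_ia_j$.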

Another answer

Writing $g(t) = g(t_0) + (t-t_0)g'(t_0) + g''(t_0)(t-t_0)^2/2 + o(|t-t_0|^2)$ with $t = a^T(x+h)$ and $t_0=a^Tx$, so that $t-t_0=a^Th=h^Ta$ and $(t-t_0)^2=h^Taa^Th$, gives $$f(x+h) = f(x) + h^T [a g'(t_0)] + h^T[a a^T g''(t_0)] h /2 + o(\|h\|^2),$$ where the remainder is $o(\|h\|^2)$ because $|a^Th|\le\|a\|\,\|h\|$. The gradient $\nabla f(x)$ and the (symmetric) Hessian $\nabla^2 f(x)$ are uniquely defined by $$f(x+h) = f(x) + h^T \nabla f(x) + h^T[\nabla^2 f(x) ] h /2 + o(\|h\|^2),$$ so matching terms gives the required answer.
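The expansion can also be checked numerically. The following is a rough sketch under my own assumptions ($g=\sin$, arbitrary `a`, `x`, and direction `d`, none of which come from the question): if the gradient and Hessian above are right, the gap between $f(x+h)$ and the quadratic model, divided by $\|h\|^2$, should shrink as $\|h\|\to0$.

```python
import numpy as np

# Arbitrary test data (hypothetical values).
a = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, -0.7])
t0 = a @ x

# Illustrative assumption: g = sin, so g' = cos and g'' = -sin.
def f(z):
    return np.sin(a @ z)

grad = np.cos(t0) * a                  # g'(a^T x) a
hess = -np.sin(t0) * np.outer(a, a)    # g''(a^T x) a a^T

d = np.ones(3) / np.sqrt(3.0)          # fixed unit direction
ratios = []
for s in [1e-1, 1e-2, 1e-3]:
    h = s * d                          # so ||h|| = s
    model = f(x) + h @ grad + 0.5 * h @ hess @ h
    ratios.append(abs(f(x + h) - model) / s**2)

print(ratios)  # the entries should decrease toward 0
```

A wrong Hessian (for example, dropping the $a^T$ factor or the $g''$ factor) would leave a remainder of order $\|h\|^2$, so the ratios would level off at a nonzero value instead of decreasing.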