I am trying to understand the solution to the problem below:
Let $f(x)=g(a^Tx)$, where $g:\mathbb{R}\to\mathbb{R}$ is twice continuously differentiable and $a\in\mathbb{R}^n$ is a vector. What are $\nabla f(x)$ and $\nabla^2f(x)$?
The answer is
$\nabla f(x)=g'(a^Tx)a$
$\nabla^2 f(x)=g''(a^Tx)aa^T$
I understand the answer for the gradient, but why does the answer for the Hessian contain $a^T$ at the end?
Since $\nabla f(x)=g'(a^Tx)a$, the $i$-th component of $\nabla f(x)$, call it $h_i(x)$, is $g'(a^Tx)a_i$.
By definition, the $i$-th row of $\nabla^2f(x)$ is the transpose of the gradient of the $i$-th component of $\nabla f(x)$. In other words, the $i$-th row of $\nabla^2f(x)$ is the transpose of $\nabla h_i(x)$.
Well, by the chain rule, $\nabla h_i(x)=g''(a^Tx)a_ia$. The transpose of this is $g''(a^Tx)a_ia^T$ (because the first two factors are scalars, which transposition leaves unchanged).
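If it helps, the same step can be written entrywise. This is only the chain rule applied to each partial derivative; the column index $j$ is introduced here for illustration:
$$\frac{\partial h_i}{\partial x_j}(x)=\frac{\partial}{\partial x_j}\Big[g'(a^Tx)\,a_i\Big]=g''(a^Tx)\,a_i\,\frac{\partial (a^Tx)}{\partial x_j}=g''(a^Tx)\,a_ia_j,\qquad j=1,\dots,n.$$
Collecting these $n$ partial derivatives into a column gives $\nabla h_i(x)=g''(a^Tx)a_ia$.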
So, $\nabla^2f(x)$ has the form
$$\begin{pmatrix}g''(a^Tx)a_1a^T\\\vdots\\g''(a^Tx)a_na^T\end{pmatrix}=g''(a^Tx)\underbrace{\begin{pmatrix}a_1a^T\\\vdots\\a_na^T\end{pmatrix}}=g''(a^Tx)\underbrace{aa^T}$$
The equality of the two underbraced matrices is a simple consequence of the definition of matrix multiplication: the $(i,j)$ entry of $aa^T$ is $a_ia_j$, so its $i$-th row is $a_ia^T$.
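As a sanity check, the formula can be compared against a finite-difference Hessian. The sketch below is only an illustration under assumptions that are not part of the problem: it picks $g=\sin$ (so $g''=-\sin$), $n=3$, random $a$ and $x$, and a step size of `1e-4`; all function names are made up for this example.

```python
import numpy as np

# Assumed example choice (not from the problem): g = sin, so g'' = -sin.
def g(t):
    return np.sin(t)

def g_second(t):
    return -np.sin(t)

def f(x, a):
    # f(x) = g(a^T x)
    return g(a @ x)

def hessian_formula(x, a):
    # Closed form from the answer: g''(a^T x) * a a^T (outer product of a with itself)
    return g_second(a @ x) * np.outer(a, a)

def hessian_finite_difference(x, a, h=1e-4):
    # Central-difference approximation of every second partial derivative
    n = x.size
    e = np.eye(n)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (
                f(x + h * e[i] + h * e[j], a)
                - f(x + h * e[i] - h * e[j], a)
                - f(x - h * e[i] + h * e[j], a)
                + f(x - h * e[i] - h * e[j], a)
            ) / (4 * h * h)
    return H

rng = np.random.default_rng(0)
a = rng.standard_normal(3)
x = rng.standard_normal(3)

H_formula = hessian_formula(x, a)
H_fd = hessian_finite_difference(x, a)
max_err = np.max(np.abs(H_formula - H_fd))
print("largest entrywise difference:", max_err)
```

The two matrices should agree up to finite-difference error. Note that `np.outer(a, a)` is exactly the $n\times n$ matrix $aa^T$, which is why the Hessian comes out as a matrix rather than a vector.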