I'm reading Numerical Optimization by Nocedal and Wright and was playing around with the matrix notation of the Taylor expansion, about which I have two questions.
We have a 3-times differentiable function $f\,:\,\mathbb{R}^n\rightarrow \mathbb{R}$. Its Taylor expansion around the point $a$, up to the quadratic term, is the following:
$$ f\left( x \right) =f\left( a+p \right) \approx f\left( a \right) +\nabla f\left( a \right) ^Tp+\frac{1}{2}p^T\nabla ^2f\left( a \right) p $$
where $x,\,\,a,\,\,p\,\in\,\mathbb{R}^n$, and $p = x - a$ is the displacement from $a$ to $x$ (so $x = a + p$).
- My first question is: Can the quadratic term be expressed the following way?
$$ \frac{1}{2}p^T\nabla ^2f\left( a \right) p\,\,\overset{?}{=}\,\,\frac{1}{2}\nabla \left( \nabla f\left( a \right) ^Tp \right) ^Tp $$
(I tried to prove it by matching up the matrix indices on both sides, but always got lost somewhere.)
- My second question stands only if the answer to the first is yes. Is it possible to write the third-order term in this manner?
$$ \frac{1}{6}\nabla \left( p^T\nabla ^2f\left( a \right) p \right) ^Tp $$
I know that these forms would be of little practical use; I'm just asking out of curiosity.
EDIT:
I've made some calculations, and it worked out in the case I tried, but I still can't prove it:
$$ f\left( a \right) =2a_{1}^{3}a_{2}^{4}+a_{1}^{2} $$
$$ a=\left[ \begin{array}{c} a_1\\ a_2\\ \end{array} \right] =\left[ \begin{array}{c} 1\\ 2\\ \end{array} \right] \,\,\,\,\,\,\,\,\,\,\,\,p=\left[ \begin{array}{c} p_1\\ p_2\\ \end{array} \right] =\left[ \begin{array}{c} 2\\ 3\\ \end{array} \right] $$
$$ \nabla f\left( a \right) =\left[ \begin{array}{c} 6a_{1}^{2}a_{2}^{4}+2a_1\\ 8a_{1}^{3}a_{2}^{3}\\ \end{array} \right] $$
$$ \nabla ^2f\left( a \right) =\left[ \begin{matrix} 12a_1a_{2}^{4}+2& 24a_{1}^{2}a_{2}^{3}\\ 24a_{1}^{2}a_{2}^{3}& 24a_{1}^{3}a_{2}^{2}\\ \end{matrix} \right] $$
$$ p^T\nabla ^2f\left( a \right) p=\left[ \begin{matrix} 2& 3\\ \end{matrix} \right] \left[ \begin{matrix} 194& 192\\ 192& 96\\ \end{matrix} \right] \left[ \begin{array}{c} 2\\ 3\\ \end{array} \right] =3944 $$
$$ \begin{aligned}
\nabla \left( \nabla f\left( a \right) ^Tp \right) ^Tp &=\nabla \left( \left[ \begin{matrix} 6a_{1}^{2}a_{2}^{4}+2a_1& 8a_{1}^{3}a_{2}^{3}\\ \end{matrix} \right] \left[ \begin{array}{c} p_1\\ p_2\\ \end{array} \right] \right) ^Tp \\
&=\nabla \left( 6a_{1}^{2}a_{2}^{4}p_1+2a_1p_1+8a_{1}^{3}a_{2}^{3}p_2 \right) ^Tp \\
&=\left[ \begin{array}{c} 12a_1a_{2}^{4}p_1+2p_1+24a_{1}^{2}a_{2}^{3}p_2\\ 24a_{1}^{2}a_{2}^{3}p_1+24a_{1}^{3}a_{2}^{2}p_2\\ \end{array} \right] ^T\left[ \begin{array}{c} p_1\\ p_2\\ \end{array} \right] \\
&=\left[ \begin{matrix} 964& 672\\ \end{matrix} \right] \left[ \begin{array}{c} 2\\ 3\\ \end{array} \right] =1928+2016=3944
\end{aligned} $$
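The hand computation above can also be cross-checked numerically. Below is a minimal sketch in plain Python, assuming the same $f$, $a$ and $p$ as in the example. The helper names (`grad_f`, `hess_f`, `quadratic_form`, `g`, `directional_derivative`) are made up for illustration, and the outer derivative is approximated by a central finite difference rather than computed symbolically, so the second number only agrees up to a small discretization error.

```python
# Example from the question: f(x) = 2*x1^3*x2^4 + x1^2, a = (1, 2), p = (2, 3).

def grad_f(x):
    """Analytic gradient of f, as written out in the question."""
    x1, x2 = x
    return (6 * x1**2 * x2**4 + 2 * x1, 8 * x1**3 * x2**3)

def hess_f(x):
    """Analytic Hessian of f, as written out in the question."""
    x1, x2 = x
    return ((12 * x1 * x2**4 + 2, 24 * x1**2 * x2**3),
            (24 * x1**2 * x2**3, 24 * x1**3 * x2**2))

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def quadratic_form(x, p):
    """p^T (Hessian of f at x) p."""
    return dot(p, [dot(row, p) for row in hess_f(x)])

def g(x, p):
    """g(x) = grad f(x)^T p, treating p as a constant vector."""
    return dot(grad_f(x), p)

def directional_derivative(func, x, p, h=1e-4):
    """Central-difference approximation of d/dt func(x + t*p) at t = 0,
    which equals grad func(x)^T p for a differentiable func."""
    forward = [xi + h * pi for xi, pi in zip(x, p)]
    backward = [xi - h * pi for xi, pi in zip(x, p)]
    return (func(forward) - func(backward)) / (2 * h)

a = (1.0, 2.0)
p = (2.0, 3.0)

lhs = quadratic_form(a, p)                              # p^T Hf(a) p
rhs = directional_derivative(lambda x: g(x, p), a, p)   # grad(grad f(x)^T p) at a, dotted with p

print(lhs)  # 3944.0
print(rhs)  # close to 3944, up to finite-difference error
```

The point of writing `g` as a function of `x` is that the inner expression has to be differentiated before it is evaluated at $a$; plugging in the numbers for $a$ first would leave nothing to differentiate.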
You can do something like that, though you need to be a bit careful with your notation. $\nabla f(a)^Tp$ is evaluated at the fixed point $a$ and no longer depends on $x$, so if you differentiate it again, you will just get zero. But I know what you mean: let's define $Jf(x)$ to be the Jacobian of $f$ at $x$ (a row vector, for $f:\mathbb{R}^n\to \mathbb{R}$). Then indeed you can write the second-order term as $$\frac{1}{2}J\left[Jf(x)p\right](a)p.$$ Here the inner expression $Jf(x)p$ is again a function of $x$, and so can be differentiated again before being evaluated at $a$.
The proof is straightforward, in coordinates:
$$Jf(x)p = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x) p_i$$ $$J[Jf(x)p](a)p = \sum_{j=1}^n \left[\frac{\partial}{\partial x_j}\left(\sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)p_i\right)\right]_{x=a}p_j = \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i\partial x_j}(a)p_ip_j = p^T Hf(a)p$$ where $Hf(a)=\nabla^2 f(a)$ is the Hessian of $f$ at $a$, and we have used equality of mixed partials and the fact that $p$ is a constant that does not depend on the $x_i$.
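The same coordinate argument appears to settle the second question as well, provided the Hessian inside the brackets is read as a function of $x$ and only evaluated at $a$ after the final differentiation. As a sketch under that reading (and assuming the third partial derivatives of $f$ are continuous, so that their order does not matter):

$$J\left[p^T Hf(x)p\right](a)p = \sum_{k=1}^n \left[\frac{\partial}{\partial x_k}\left(\sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i\partial x_j}(x)p_ip_j\right)\right]_{x=a}p_k = \sum_{i,j,k=1}^n \frac{\partial^3 f}{\partial x_i\partial x_j\partial x_k}(a)p_ip_jp_k$$

The right-hand side is the third directional derivative of $f$ at $a$ along $p$, so $\frac{1}{6}J\left[p^T Hf(x)p\right](a)p$ would be the usual third-order Taylor term.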