Is there a $\nabla^3 f(x)$?

In multivariable calculus we have so far seen the gradient and the Hessian.

So it is natural to ask whether $\nabla^3 f(x)$ exists.

Can anyone let me know what comes after the Hessian?


Consider a ($k$ times) differentiable function $f: \Bbb R^n \to \Bbb R$. The derivative of this function $Df(\mathbf x)$ is also called the gradient and is given by $$Df(\mathbf x) = \pmatrix{\partial_1 f(\mathbf x) & \cdots & \partial_n f(\mathbf x)}$$

This is a $1\times n$ row matrix. However, once you fix the vector $\mathbf x\in \Bbb R^n$, $Df(\mathbf x)$ can also be considered a function from $\Bbb R^n\to \Bbb R$. In particular it is the linear function $$Df(\mathbf x)(\mathbf h) = \pmatrix{\partial_1 f(\mathbf x) & \cdots & \partial_n f(\mathbf x)}\pmatrix{h_1 \\ \vdots \\ h_n} = \sum_{i=1}^n h_i\partial_i f(\mathbf x)$$
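To make the "fix $\mathbf x$, then feed in $\mathbf h$" idea concrete, here is a minimal Python sketch. The function $f(x_1, x_2) = x_1^2x_2 + x_2^3$ and the names `grad_f`, `Df` and `Df_x` are assumptions chosen purely for illustration; the partial derivatives are worked out by hand.

```python
# Hypothetical example (not from the text above): f(x1, x2) = x1^2 * x2 + x2^3

def grad_f(x):
    """Hand-computed partial derivatives (d_1 f, d_2 f) at x."""
    x1, x2 = x
    return [2 * x1 * x2, x1**2 + 3 * x2**2]


def Df(x):
    """Fix x and return the linear function h -> sum_i h_i * d_i f(x)."""
    g = grad_f(x)
    return lambda h: sum(h_i * g_i for h_i, g_i in zip(h, g))


x = [1.0, 2.0]
Df_x = Df(x)               # a function from R^2 to R
print(Df_x([1.0, 0.0]))    # picks out d_1 f(x)
print(Df_x([0.0, 1.0]))    # picks out d_2 f(x)
print(Df_x([3.0, -1.0]))   # equals 3 * d_1 f(x) - d_2 f(x), by linearity
```

Note that `Df(x)` returns a function rather than a number, which mirrors the two-step notation $Df(\mathbf x)(\mathbf h)$.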

What about the second derivative? Fix a vector $\mathbf y$ and let $g(\mathbf x) = Df(\mathbf x)(\mathbf y) = \sum_{j=1}^n y_j\partial_j f(\mathbf x)$. As a function of $\mathbf x$, $g$ is again a function from $\Bbb R^n\to \Bbb R$, so we can take its derivative exactly as before: $Dg(\mathbf x)(\mathbf h) = \sum_{i=1}^n h_i\partial_i g(\mathbf x)$. Then plugging the definition of $g$ back in, we get $$Dg(\mathbf x)(\mathbf h) = D^2f(\mathbf x)(\mathbf y)(\mathbf h) = \sum_{i=1}^n h_i\partial_i\sum_{j=1}^n y_j\partial_j f(\mathbf x) \stackrel{(*)}= \sum_{i=1}^n\sum_{j=1}^n h_iy_j\partial_i\partial_j f(\mathbf x)$$

where $(*)$ is possible because $\mathbf y$ was fixed. In matrix notation notice that this is just $$D^2 f(\mathbf x)(\mathbf y)(\mathbf h) = \pmatrix{h_1 & \cdots & h_n}\pmatrix{\partial_1\partial_1 f(\mathbf x) & \cdots & \partial_1\partial_n f(\mathbf x) \\ \partial_2\partial_1 f(\mathbf x) & \cdots & \partial_2\partial_n f(\mathbf x) \\ \vdots & \ddots & \vdots \\ \partial_n\partial_1 f(\mathbf x) & \cdots & \partial_n\partial_n f(\mathbf x)}\pmatrix{y_1 \\ y_2 \\ \vdots \\ y_n} = \mathbf h^T[Hf(\mathbf x)]\mathbf y$$ where $Hf(\mathbf x)$ is the Hessian matrix of $f$ at $\mathbf x$.
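As a quick sanity check that the double sum and the matrix expression $\mathbf h^T[Hf(\mathbf x)]\mathbf y$ agree, here is a small Python sketch. The example function $f(x_1, x_2) = x_1^2x_2 + x_2^3$, its hand-computed Hessian, and the helper names are all assumptions made for illustration.

```python
# Hypothetical example (not from the text above): f(x1, x2) = x1^2 * x2 + x2^3

def hessian_f(x):
    """Hand-computed matrix of second partials d_i d_j f at x."""
    x1, x2 = x
    return [[2 * x2, 2 * x1],
            [2 * x1, 6 * x2]]


def D2f_sum(x, y, h):
    """Double sum: sum_i sum_j h_i * y_j * d_i d_j f(x)."""
    H = hessian_f(x)
    n = len(x)
    return sum(h[i] * y[j] * H[i][j] for i in range(n) for j in range(n))


def D2f_matrix(x, y, h):
    """Matrix form h^T [Hf(x)] y: first compute Hf(x) y, then dot with h."""
    H = hessian_f(x)
    Hy = [sum(H_ij * y_j for H_ij, y_j in zip(row, y)) for row in H]
    return sum(h_i * Hy_i for h_i, Hy_i in zip(h, Hy))


x, y, h = [1.0, 2.0], [1.0, -1.0], [0.5, 3.0]
print(D2f_sum(x, y, h), D2f_matrix(x, y, h))  # the two forms agree
```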

Using the summation notation there is a clear way to continue to the third (and even the $k$th) derivative. For instance we can see that $$D^3f(\mathbf x)(\mathbf y)(\mathbf z)(\mathbf h) = \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n h_iz_jy_k\partial_i\partial_j \partial_kf(\mathbf x)$$ However there isn't a way to represent this summation using matrices. What we would need is a way to get a scalar (or equivalently a $1\times 1$ matrix) out of three column matrices and some other type of matrix, but there's no way to do this that produces the correct result. What you need is the concept of a tensor. But this is usually not covered in multivariable calculus courses, so it's unlikely that you'll see the $k$th derivative of a function from $\Bbb R^n\to \Bbb R$.

What you should be able to do now though is to evaluate the third derivative of a (at least $3$ times differentiable) function $f:\Bbb R^n \to \Bbb R$ at the ordered $4$-tuple of points $(\mathbf x, \mathbf y, \mathbf z, \mathbf h)$.
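For instance, the triple sum can be evaluated directly with three nested indices. The sketch below again assumes the illustrative function $f(x_1, x_2) = x_1^2x_2 + x_2^3$, whose third partials are constant and were computed by hand; `third_partials_f` and `D3f` are hypothetical names.

```python
from itertools import product

# Hypothetical example (not from the text above): f(x1, x2) = x1^2 * x2 + x2^3


def third_partials_f(x):
    """Hand-computed array T[i][j][k] = d_i d_j d_k f(x).

    For this cubic f the third partials do not depend on x: the value is 2
    when exactly two of the indices refer to x1, 6 when all three refer to
    x2, and 0 otherwise.
    """
    T = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for i, j, k in product(range(2), repeat=3):
        x1_count = (i, j, k).count(0)  # index 0 stands for x1
        if x1_count == 2:
            T[i][j][k] = 2.0
        elif x1_count == 0:
            T[i][j][k] = 6.0
    return T


def D3f(x, y, z, h):
    """Triple sum: sum_{i,j,k} h_i * z_j * y_k * d_i d_j d_k f(x)."""
    T = third_partials_f(x)
    n = len(x)
    return sum(h[i] * z[j] * y[k] * T[i][j][k]
               for i, j, k in product(range(n), repeat=3))


x = [1.0, 2.0]
print(D3f(x, [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
```

The three-index array `T` is exactly the object that does not fit into a single matrix, which is why the summation form is used here.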

A little exposition on tensors

A $k$-tensor is a multilinear function from $k$ copies of a vector space to scalars. Thus $T: \underbrace{V\times V \times \cdots \times V}_{k\text{ times}} \to \Bbb R$, where $V$ is a vector space, is a $k$-tensor. (One little note: this isn't the full definition of a tensor, but it'll work for what we're doing).

From this we see that $Df(\mathbf x)$ defined by $[Df(\mathbf x)](\mathbf h) = \nabla f(\mathbf x)\cdot \mathbf h$ is a $1$-tensor and $D^2f(\mathbf x)$ defined by $[D^2f(\mathbf x)](\mathbf h_1,\mathbf h_2) = {\mathbf h_2}^T[Hf(\mathbf x)]\mathbf h_1$ is a $2$-tensor. Note that the matrix expressions make it clear that $Df(\mathbf x)$ is a linear function from $\Bbb R^n\to \Bbb R$ and $D^2f(\mathbf x)$ is a bilinear function from $\Bbb R^n\times\Bbb R^n\to \Bbb R$.

Then we know that the third derivative $D^3f(\mathbf x)$ should be defined by $$[D^3f(\mathbf x)](\mathbf h_1, \mathbf h_2, \mathbf h_3) = \sum_{i,j,k} (\mathbf h_3)_i(\mathbf h_2)_j(\mathbf h_1)_k\partial_i\partial_j\partial_k f(\mathbf x)$$

Using this and continuing in the obvious way, we can see that the $k$th order Taylor polynomial of a $k$-times differentiable function $f:\Bbb R^n\to \Bbb R$ about the point $\mathbf x$, evaluated at $\mathbf x+\mathbf h$, is given by $$P_k(\mathbf x + \mathbf h) = f(\mathbf x) + [Df(\mathbf x)](\mathbf h) + \frac{1}{2!}[D^2f(\mathbf x)](\mathbf h,\mathbf h) + \cdots + \frac{1}{k!}[D^kf(\mathbf x)](\underbrace{\mathbf h,\cdots, \mathbf h}_{k \text{ arguments}})$$ where $D^mf(\mathbf x)$ with $m\in\{1,2,\dots, k\}$ is defined as above.

Compare this with the scalar version of Taylor's theorem: Let $f:\Bbb R\to \Bbb R$ be a $k$-times differentiable function. Then the $k$th order Taylor polynomial of $f$ about $x$, evaluated at $x+h$, is given by $$P_k(x+h) = f(x) + f'(x)h + \frac1{2!}f''(x)h^2 + \cdots + \frac{1}{k!}f^{(k)}(x)h^k$$
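The multivariable formula can be checked numerically. The sketch below assumes the illustrative cubic $f(x_1, x_2) = x_1^2x_2 + x_2^3$ (not taken from the discussion above), with all partial derivatives computed by hand. Because $f$ is a polynomial of degree $3$, its third-order Taylor polynomial should reproduce $f(\mathbf x + \mathbf h)$ up to rounding; the names `D1`, `D2`, `D3` and `taylor3` are hypothetical.

```python
from itertools import product
from math import factorial

# Hypothetical example (not from the text above): f(x1, x2) = x1^2 * x2 + x2^3


def f(x):
    x1, x2 = x
    return x1**2 * x2 + x2**3


def D1(x, h):
    """[Df(x)](h) = sum_i h_i * d_i f(x)."""
    x1, x2 = x
    grad = [2 * x1 * x2, x1**2 + 3 * x2**2]
    return sum(h[i] * grad[i] for i in range(2))


def D2(x, h):
    """[D^2 f(x)](h, h) = sum_{i,j} h_i * h_j * d_i d_j f(x)."""
    x1, x2 = x
    H = [[2 * x2, 2 * x1], [2 * x1, 6 * x2]]
    return sum(h[i] * h[j] * H[i][j] for i, j in product(range(2), repeat=2))


def D3(x, h):
    """[D^3 f(x)](h, h, h) = sum_{i,j,k} h_i * h_j * h_k * d_i d_j d_k f(x)."""
    def third_partial(i, j, k):
        # 2 if exactly two indices refer to x1, 6 if none do, else 0.
        return {2: 2.0, 0: 6.0}.get((i, j, k).count(0), 0.0)
    return sum(h[i] * h[j] * h[k] * third_partial(i, j, k)
               for i, j, k in product(range(2), repeat=3))


def taylor3(x, h):
    """Third-order Taylor polynomial of f about x, evaluated at x + h."""
    return (f(x) + D1(x, h)
            + D2(x, h) / factorial(2)
            + D3(x, h) / factorial(3))


x, h = [1.0, 2.0], [0.5, -1.0]
x_plus_h = [a + b for a, b in zip(x, h)]
print(taylor3(x, h), f(x_plus_h))  # the two values should agree
```

For a function that is not a polynomial the two values would differ by the remainder term, which shrinks as $\mathbf h \to \mathbf 0$.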