In multivariable calculus we have so far seen the gradient and the Hessian, so it is natural to ask whether $\nabla^3 f(x)$ exists.
Can anyone tell me what comes after the Hessian?
Consider a ($k$-times) differentiable function $f: \Bbb R^n \to \Bbb R$. The derivative of this function, $Df(\mathbf x)$, is also called the gradient and is given by $$Df(\mathbf x) = \pmatrix{\partial_1 f(\mathbf x) & \cdots & \partial_n f(\mathbf x)}$$
This is a $1\times n$ row matrix. However, once you fix the vector $\mathbf x\in \Bbb R^n$, $Df(\mathbf x)$ can also be considered a function from $\Bbb R^n\to \Bbb R$. In particular, it is the linear function $$Df(\mathbf x)(\mathbf h) = \pmatrix{\partial_1 f(\mathbf x) & \cdots & \partial_n f(\mathbf x)}\pmatrix{h_1 \\ \vdots \\ h_n} = \sum_{i=1}^n h_i\partial_i f(\mathbf x)$$
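As a minimal numerical sketch of this, take the illustrative function $f(x,y) = x^2y + \sin x$ (my choice, not anything special), compute the sum $\sum_i h_i\partial_i f(\mathbf x)$ from hand-computed partials, and check it against a finite-difference directional derivative:

```python
import math

# Illustrative choice (an assumption for this sketch): f(x, y) = x^2*y + sin(x)
def f(v):
    x, y = v
    return x**2 * y + math.sin(x)

# Hand-computed partials: d1 f = 2xy + cos(x), d2 f = x^2
def grad(v):
    x, y = v
    return [2 * x * y + math.cos(x), x**2]

def Df(v, h):
    # Df(x)(h) = sum_i h_i * d_i f(x): the row matrix acting on the column h
    return sum(hi * gi for hi, gi in zip(h, grad(v)))

x = [1.0, 2.0]
h = [0.3, -0.5]

# Independent check: central finite difference of t -> f(x + t*h) at t = 0
t = 1e-5
fd = (f([x[0] + t * h[0], x[1] + t * h[1]])
      - f([x[0] - t * h[0], x[1] - t * h[1]])) / (2 * t)

print(Df(x, h), fd)  # the two values should agree closely
```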
What about the second derivative? Well, if $Df(\mathbf x)$ is a function from $\Bbb R^n\to \Bbb R$, then we could just call this function $g$ and take the derivative of it. We know that once we fix a vector $\mathbf y$, the derivative of $g$ is given by $Dg(\mathbf y)(\mathbf h) = \sum_{i=1}^n h_i\partial_i g(\mathbf y)$. Then substituting $g = Df(\mathbf x)$ back in, we get $$Dg(\mathbf y)(\mathbf h) = D^2f(\mathbf x)(\mathbf y)(\mathbf h) = \sum_{i=1}^n h_i\partial_i\sum_{j=1}^n y_j\partial_j f(\mathbf x) \stackrel{(*)}= \sum_{i=1}^n\sum_{j=1}^n h_iy_j\partial_i\partial_j f(\mathbf x)$$
where $(*)$ is possible because $\mathbf y$ was fixed. In matrix notation notice that this is just $$D^2 f(\mathbf x)(\mathbf y)(\mathbf h) = \pmatrix{h_1 & \cdots & h_n}\pmatrix{\partial_1\partial_1 f(\mathbf x) & \cdots & \partial_1\partial_n f(\mathbf x) \\ \partial_2\partial_1 f(\mathbf x) & \cdots & \partial_2\partial_n f(\mathbf x) \\ \vdots & \ddots & \vdots \\ \partial_n\partial_1 f(\mathbf x) & \cdots & \partial_n\partial_n f(\mathbf x)}\pmatrix{y_1 \\ y_2 \\ \vdots \\ y_n} = \mathbf h^T[Hf(\mathbf x)]\mathbf y$$ where $Hf(\mathbf x)$ is the Hessian matrix of $f$ at $\mathbf x$.
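Here is a quick numerical sketch of the bilinear form $\mathbf h^T[Hf(\mathbf x)]\mathbf y$, again assuming the illustrative function $f(x,y) = x^2y + \sin x$ (my choice), checked against a symmetric mixed finite difference:

```python
import math

# Illustrative choice (an assumption for this sketch): f(x, y) = x^2*y + sin(x)
def f(v):
    x, y = v
    return x**2 * y + math.sin(x)

# Hand-computed Hessian: [[2y - sin(x), 2x], [2x, 0]]
def hessian(v):
    x, y = v
    return [[2 * y - math.sin(x), 2 * x],
            [2 * x, 0.0]]

def D2f(v, y_vec, h):
    # D^2 f(x)(y)(h) = h^T [Hf(x)] y = sum_{i,j} h_i y_j d_i d_j f(x)
    H = hessian(v)
    return sum(h[i] * H[i][j] * y_vec[j]
               for i in range(2) for j in range(2))

x = [1.0, 2.0]
yv = [0.2, 0.4]
h = [0.3, -0.5]

# Independent check: symmetric mixed finite difference in directions h and yv
t = 1e-4
def shift(v, a, u, b, w):
    return [v[k] + a * u[k] + b * w[k] for k in range(2)]

fd = (f(shift(x, t, h, t, yv)) - f(shift(x, t, h, -t, yv))
      - f(shift(x, -t, h, t, yv)) + f(shift(x, -t, h, -t, yv))) / (4 * t * t)

print(D2f(x, yv, h), fd)  # the two values should agree closely
```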
Using the summation notation, there is a clear way to continue to the third (and even the $k$th) derivative. For instance, we can see that $$D^3f(\mathbf x)(\mathbf y)(\mathbf z)(\mathbf h) = \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n h_iz_jy_k\partial_i\partial_j \partial_kf(\mathbf x)$$ However, there isn't a way to represent this summation using matrices. What we would need is a way to get a scalar (or equivalently a $1\times 1$ matrix) out of three column matrices and some other type of matrix, but there's no way to do this that produces the correct result. What you need is the concept of a tensor. But this is usually not covered in multivariable calculus courses, so it's unlikely that you'll see the $k$th derivative of a function from $\Bbb R^n\to \Bbb R$.
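The triple sum is easy to evaluate directly. As an illustration (my choice of function, not from the question), take $f(x,y) = x^2y$: its only nonzero third partials are those whose indices are a permutation of $(1,1,2)$, each equal to $2$. The sketch below also checks that, by equality of mixed partials, the trilinear form is symmetric in its three arguments:

```python
# Illustrative cubic (an assumption for this sketch): f(x, y) = x^2 * y.
# Its third partials d_i d_j d_k f are constant: 2 whenever (i, j, k)
# is a permutation of (0, 0, 1) in 0-based indices, and 0 otherwise.
def third_partial(i, j, k):
    return 2.0 if sorted((i, j, k)) == [0, 0, 1] else 0.0

def D3f(y, z, h):
    # D^3 f(x)(y)(z)(h) = sum_{i,j,k} h_i z_j y_k d_i d_j d_k f(x)
    return sum(h[i] * z[j] * y[k] * third_partial(i, j, k)
               for i in range(2) for j in range(2) for k in range(2))

y = [1.0, 2.0]
z = [0.5, -1.0]
h = [3.0, 0.25]

# By Clairaut's theorem the mixed partials agree, so permuting the
# arguments should leave the value unchanged:
print(D3f(y, z, h), D3f(h, y, z), D3f(z, h, y))
```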
What you should be able to do now though is to evaluate the third derivative of a (at least $3$ times differentiable) function $f:\Bbb R^n \to \Bbb R$ at the ordered $4$-tuple of points $(\mathbf x, \mathbf y, \mathbf z, \mathbf h)$.
A little exposition on tensors
A $k$-tensor is a multilinear function from $k$ copies of a vector space to scalars. Thus $T: \underbrace{V\times V \times \cdots \times V}_{k\text{ times}} \to \Bbb R$, where $V$ is a vector space, is a $k$-tensor. (One little note: this isn't the full definition of a tensor, but it'll work for what we're doing).
From this we see that $Df(\mathbf x)$ defined by $[Df(\mathbf x)](\mathbf h) = \nabla f(\mathbf x)\cdot \mathbf h$ is a $1$-tensor and $D^2f(\mathbf x)$ defined by $[D^2f(\mathbf x)](\mathbf h_1,\mathbf h_2) = {\mathbf h_2}^T[Hf(\mathbf x)]\mathbf h_1$ is a $2$-tensor. Note that the matrix expressions make it clear that $Df(\mathbf x)$ is a linear function from $\Bbb R^n\to \Bbb R$ and $D^2f(\mathbf x)$ is a bilinear function from $\Bbb R^n\times\Bbb R^n\to \Bbb R$.
Then we know that the third derivative $D^3f(\mathbf x)$ should be defined by $$[D^3f(\mathbf x)](\mathbf h_1, \mathbf h_2, \mathbf h_3) = \sum_{i,j,k} (\mathbf h_3)_i(\mathbf h_2)_j(\mathbf h_1)_k\partial_i\partial_j\partial_k f(\mathbf x)$$
Using this and continuing in the obvious way, we can see that the $k$th order Taylor polynomial of a $k$-times differentiable function $f:\Bbb R^n\to \Bbb R$ at the point $\mathbf x+\mathbf h$ is given by $$P_k(\mathbf x + \mathbf h) = f(\mathbf x) + [Df(\mathbf x)](\mathbf h) + \frac{1}{2!}[D^2f(\mathbf x)](\mathbf h,\mathbf h) + \cdots + \frac{1}{k!}[D^kf(\mathbf x)](\underbrace{\mathbf h,\cdots, \mathbf h}_{k \text{ arguments}})$$ where $D^nf(\mathbf x)$ with $n\in\{1,2,\dots, k\}$ is defined as above.
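A nice way to sanity-check this formula: for a cubic polynomial the third-order Taylor polynomial must reproduce the function exactly. The sketch below does this for the illustrative choice $f(x,y) = x^2y$ (my assumption, with its gradient, Hessian, and third-derivative tensor computed by hand):

```python
from math import factorial

# Illustrative cubic (an assumption for this sketch): f(x, y) = x^2 * y.
# Since f is a degree-3 polynomial, P_3(x + h) should equal f(x + h) exactly.
def f(v):
    return v[0]**2 * v[1]

def grad(v):
    return [2 * v[0] * v[1], v[0]**2]

def hessian(v):
    return [[2 * v[1], 2 * v[0]],
            [2 * v[0], 0.0]]

def third_partial(i, j, k):
    # constant third-derivative tensor of x^2*y (0-based indices)
    return 2.0 if sorted((i, j, k)) == [0, 0, 1] else 0.0

def P3(x, h):
    # P_3(x + h) = f(x) + Df(x)(h) + (1/2!) D^2f(x)(h,h) + (1/3!) D^3f(x)(h,h,h)
    d1 = sum(h[i] * grad(x)[i] for i in range(2))
    H = hessian(x)
    d2 = sum(h[i] * h[j] * H[i][j] for i in range(2) for j in range(2))
    d3 = sum(h[i] * h[j] * h[k] * third_partial(i, j, k)
             for i in range(2) for j in range(2) for k in range(2))
    return f(x) + d1 + d2 / factorial(2) + d3 / factorial(3)

x = [1.0, 2.0]
h = [0.3, -0.5]
print(P3(x, h), f([x[0] + h[0], x[1] + h[1]]))  # should match up to rounding
```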
Compare this with the scalar version of Taylor's theorem: Let $f:\Bbb R\to \Bbb R$ be a $k$-times differentiable function. Then the $k$th order Taylor polynomial of $f$ at $x+h$ is given by $$P_k(x+h) = f(x) + f'(x)h + \frac1{2!}f''(x)h^2 + \cdots + \frac{1}{k!}f^{(k)}(x)h^k$$
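The scalar version can be sketched in one line. For the illustrative choice $f(x) = e^x$ (every derivative is $e^x$), the $k$th order polynomial at $x + h$ should miss $f(x+h)$ by roughly $h^{k+1}/(k+1)!$:

```python
import math

# Scalar sanity check (illustrative choice): f(x) = e^x, so f^(n)(x) = e^x for all n
x, h, k = 0.0, 0.1, 4
P_k = sum(math.exp(x) * h**n / math.factorial(n) for n in range(k + 1))
print(P_k, math.exp(x + h))  # should agree to about h^(k+1)/(k+1)!
```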