I have $y:\mathbb{R}\to \mathbb{R}^n$ and $f:\mathbb{R}^n\to\mathbb{R}^n$ with $\dot y = f(y)$. Differentiating repeatedly with respect to the argument of $y$ gives: \begin{align*} \dot y &= f(y) \\ \ddot y &= f'(y)\dot y \quad\quad& \text{step 2}\\ y^{(3)} &= f''(y)(\dot y, \dot y) + f'(y)\ddot y \quad\quad& \text{step 3}\\ y^{(4)} &= f'''(y)(\dot y, \dot y,\dot y) + 3f''(y)(\ddot y, \dot y) + f'(y)y^{(3)}\quad\quad& \text{step 4} \end{align*}
My questions are:
- What is $f'''(y)(\dot y, \dot y,\dot y)$ in matrix form? I don't understand the notation or how it works.
- What are the rules governing this differentiation? How does one go from step 2 to step 3 to step 4?
The chain rule for vector-valued functions is a bit more complicated than for scalar functions, since you need to ensure that the dimensions in each product are compatible. For instance, suppose $\dot{y}(t) = f(y(t))$ for $t\in\mathbb{R}$. Then \begin{equation*} \frac{d}{dt}\dot{y}_i(t) = \frac{d}{dt} f_i(y(t)) = \nabla_y f_i(y(t))^\top \dot{y}(t) ~ \text{for $i\in\{1,2,\dots,n\}$}. \end{equation*} Compactly, we have that \begin{equation*} \ddot{y}(t) = \begin{bmatrix} \nabla_y f_1(y(t))^\top \\ \nabla_y f_2(y(t))^\top \\ \vdots \\ \nabla_y f_n(y(t))^\top \end{bmatrix} \dot{y}(t) = Df(y(t)) \dot{y}(t), \end{equation*} where $Df$ is the Jacobian of $f$. Taking another derivative and applying the product rule, \begin{gather*} \frac{d}{dt}\ddot{y}_i(t) = \frac{d}{dt}\left(\nabla_y f_i(y(t))^\top \dot{y}(t)\right) = \left(\frac{d}{dt} \nabla_y f_i(y(t))\right)^\top\dot{y}(t) + \nabla_{y} f_i(y(t))^\top \ddot{y}(t) \\ = \left( \nabla^2_y f_i(y(t)) \dot{y}(t) \right)^\top \dot{y}(t) + \nabla_y f_i(y(t))^\top \ddot{y}(t) \\ = \dot{y}(t)^\top \nabla^2_y f_i(y(t)) \dot{y}(t) + \nabla_y f_i(y(t))^\top \ddot{y}(t), \end{gather*} for $i\in\{1,2,\dots,n\}$, where the last step uses the symmetry of the Hessian (valid when $f_i$ is twice continuously differentiable). Now, since each Hessian $\nabla_y^2 f_i(y(t))$ is an $n\times n$ matrix, there is no simple way to combine the above equations into a single vector equation for $\frac{d}{dt}\ddot{y}(t)$ without introducing tensors or bilinear forms.
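To make the multilinear notation from the question concrete: in components, $[f''(y)(u,v)]_i = \sum_{j,k} \frac{\partial^2 f_i}{\partial y_j\,\partial y_k}(y)\,u_j v_k$, which is $u^\top \nabla_y^2 f_i(y)\, v$ for each $i$, and likewise $[f'''(y)(u,v,w)]_i = \sum_{j,k,l} \frac{\partial^3 f_i}{\partial y_j\,\partial y_k\,\partial y_l}(y)\,u_j v_k w_l$. So $f''(y)$ can be stored as an $n\times n\times n$ array and $f'''(y)$ as an $n\times n\times n\times n$ array, with one vector plugged into each index after the first. The following is a minimal Python sketch of that idea, not a definitive implementation: the example $f$ (with $n=2$), its hand-coded derivative arrays, and the helper names `apply_multilinear` and `derivatives` are all assumptions made up for illustration.

```python
def f(y):
    # Hypothetical example vector field on R^2 (chosen only for illustration).
    return [y[0] ** 2 * y[1], y[1] ** 3]

def d1(y):
    # Jacobian: d1(y)[i][j] = d f_i / d y_j
    return [[2 * y[0] * y[1], y[0] ** 2],
            [0.0, 3 * y[1] ** 2]]

def d2(y):
    # Second derivative: d2(y)[i][j][k] = d^2 f_i / (d y_j d y_k),
    # so d2(y)[i] is the Hessian of f_i.
    return [[[2 * y[1], 2 * y[0]], [2 * y[0], 0.0]],
            [[0.0, 0.0], [0.0, 6 * y[1]]]]

def d3(y):
    # Third derivative: d3(y)[i][j][k][l] = d^3 f_i / (d y_j d y_k d y_l).
    # For this particular f it does not depend on y.
    T = [[[[0.0, 0.0] for _ in range(2)] for _ in range(2)] for _ in range(2)]
    T[0][0][0][1] = T[0][0][1][0] = T[0][1][0][0] = 2.0
    T[1][1][1][1] = 6.0
    return T

def _contract(S, vecs):
    # Sum S[j][k]... * u_j * v_k * ... over all remaining indices.
    if not vecs:
        return S
    u, rest = vecs[0], vecs[1:]
    return sum(u[j] * _contract(S[j], rest) for j in range(len(u)))

def apply_multilinear(T, *vecs):
    # Component i of T(u, v, ...): contract every index after the first.
    return [_contract(Ti, vecs) for Ti in T]

def derivatives(y):
    # Evaluate steps 1 to 4 from the question at the point y.
    J, H, T = d1(y), d2(y), d3(y)
    y1 = f(y)                                   # y'   = f(y)
    y2 = apply_multilinear(J, y1)               # y''  = f'(y) y'
    y3 = [a + b for a, b in zip(                # y''' = f''(y)(y', y') + f'(y) y''
        apply_multilinear(H, y1, y1),
        apply_multilinear(J, y2))]
    y4 = [a + 3 * b + c for a, b, c in zip(     # step 4
        apply_multilinear(T, y1, y1, y1),
        apply_multilinear(H, y2, y1),
        apply_multilinear(J, y3))]
    return y1, y2, y3, y4

print(derivatives([0.7, -0.4]))
```

Each step follows from the product rule combined with the chain rule: a time derivative either hits $f^{(k)}(y)$, which raises its order and adds one more $\dot y$ slot, or hits one of the vectors already sitting in a slot. With NumPy arrays, the same contraction for the second derivative can be written as `np.einsum('ijk,j,k->i', H, u, v)`.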