"Differential of" as an operator

I am reading the paper Learning Dynamical Systems from Partial Observations and I am having a hard time following the calculations in Appendix B. Let $F: \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a multivariate function. The paper defines the differential $\partial_u F(u_0)$ of $F$ with respect to $u$ as the operator such that $$F(u_0 + \delta u) = F(u_0) + \partial_u F(u_0) \delta u + o(\delta u).$$

Now, let $F_{\theta}$ mean that $F$ is a function of $\theta$. Likewise, let $X^{\theta}$ mean that $X$ is a function of $\theta$.

The paper states that

$$\partial_{\theta} F_{\theta} (X_t^{\theta} + \partial_{\theta} X_t^{\theta} \cdot \delta \theta + o(\delta \theta)) = \partial_{\theta} F_{\theta}(X_t^{\theta}) + \partial_{X} \partial_{\theta} F_{\theta}(X_t^{\theta}) \cdot \partial_{\theta} X_t^{\theta} \cdot \delta \theta + o(\delta \theta).$$

Here are my questions:

  1. Is the definition of the differential accurate? Shouldn't it be $$F(u_0 + \delta u) = F(u_0) + \partial_u F(u_0) \delta u + o(\|\delta u\|^2)$$ instead?

  2. Can anyone guide me through the calculation of $$\partial_{\theta} F_{\theta} (X_t^{\theta} + \partial_{\theta} X_t^{\theta} \cdot \delta \theta + o(\delta \theta))?$$ I can't figure it out. Looking at the resulting expression above, I wonder why the argument $(X_t^{\theta} + \partial_{\theta} X_t^{\theta} \cdot \delta \theta + o(\delta \theta))$ does not appear at all in the $F_{\theta}$'s on the right-hand side if the chain rule was indeed used.

Suggestions appreciated. Thanks!

Best answer

Question 1

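No, it should not be $o\left(\Vert\delta u\Vert^2\right)$, but some mathematicians would likely prefer $o\left(\Vert\delta u\Vert\right)$ over $o(\delta u)$. Differentiability only requires the remainder to be small relative to $\Vert\delta u\Vert$; demanding an $o\left(\Vert\delta u\Vert^2\right)$ remainder is a strictly stronger condition that even smooth functions generally fail. See English Wikipedia on "the total derivative as a linear map" for details, or search the internet for phrases like "little o definition of derivative".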
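As a rough numerical illustration of the difference, here is a minimal sketch using the hypothetical example $F(u)=u^2$ at $u_0=1$ (my choice, not something from the paper). The first-order remainder divided by $|\delta u|$ shrinks to zero, while the remainder divided by $|\delta u|^2$ stays near 1, so the remainder is $o(|\delta u|)$ but not $o(|\delta u|^2)$.

```python
# Hypothetical example (not from the paper): F(u) = u^2, expanded around u0 = 1.

def F(u):
    return u ** 2


def dF(u):
    # Derivative of F, playing the role of the differential in one dimension.
    return 2 * u


def remainder(u0, du):
    """First-order remainder F(u0 + du) - F(u0) - dF(u0) * du."""
    return F(u0 + du) - F(u0) - dF(u0) * du


u0 = 1.0
steps = [10.0 ** -k for k in range(1, 6)]  # du = 1e-1, ..., 1e-5

# Remainder relative to |du|: should tend to 0 if the remainder is o(|du|).
ratios_first = [abs(remainder(u0, du)) / abs(du) for du in steps]

# Remainder relative to |du|^2: would tend to 0 only if it were o(|du|^2).
ratios_second = [abs(remainder(u0, du)) / du ** 2 for du in steps]

for du, r1, r2 in zip(steps, ratios_first, ratios_second):
    print(f"du={du:.0e}  remainder/|du|={r1:.3e}  remainder/|du|^2={r2:.6f}")
```

For this particular $F$ the remainder is exactly $\delta u^2$, which is why the second ratio does not decay.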

Question 2

The paper writes $\partial_\theta F_\theta\left(X_t^\theta+\partial_\theta X_t^\theta\cdot\delta\theta+o(\delta\theta)\right)=\partial_\theta F_\theta\left(X_t^\theta\right)+\partial_X\partial_\theta F_\theta\left(X_t^\theta\right)\cdot\partial_\theta X_t^\theta\cdot\delta\theta+o(\delta\theta)$, and we must justify this.

For simplicity, use "$\delta X$" to denote $\partial_\theta X_t^\theta\cdot\delta\theta+o(\delta\theta)$. Then we have $\partial_\theta F_\theta\left(X_t^\theta+\delta X\right)$ on the left. Assuming $F$ is twice differentiable, $\partial_\theta F_\theta$ is itself differentiable in $X$, so we can apply the defining equation of the differential to it (with $u=X$, $u_0=X_t^\theta$, and $\delta u=\delta X$) to write \begin{align}\partial_\theta F_\theta\left(X_t^\theta+\delta X\right)&=\partial_\theta F_\theta\left(X_t^\theta\right)+\partial_X\partial_\theta F_\theta\left(X_t^\theta\right)\cdot\delta X+o(\delta X)\\ &=\partial_\theta F_\theta\left(X_t^\theta\right)+\partial_X\partial_\theta F_\theta\left(X_t^\theta\right)\cdot\left(\partial_\theta X_t^\theta\cdot\delta\theta+o(\delta\theta)\right)+o\left(\partial_\theta X_t^\theta\cdot\delta\theta+o(\delta\theta)\right)\\&=\partial_{\theta}F_{\theta}\left(X_{t}^{\theta}\right)+\partial_{X}\partial_{\theta}F_{\theta}\left(X_{t}^{\theta}\right)\cdot\partial_{\theta}X_{t}^{\theta}\cdot\delta\theta+\partial_{X}\partial_{\theta}F_{\theta}\left(X_{t}^{\theta}\right)\cdot o(\delta\theta)\\&\phantom{=}+o\left(\partial_{\theta}X_{t}^{\theta}\cdot\delta\theta+o(\delta\theta)\right)\text{ by linearity of }\partial_{X}\partial_{\theta}F_{\theta}\left(X_{t}^{\theta}\right)\\ &=\partial_{\theta}F_{\theta}\left(X_{t}^{\theta}\right)+\partial_{X}\partial_{\theta}F_{\theta}\left(X_{t}^{\theta}\right)\cdot\partial_{\theta}X_{t}^{\theta}\cdot\delta\theta+o(\delta\theta)\checkmark\end{align}

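The last step in the above uses two facts. First, $\partial_{X}\partial_{\theta}F_{\theta}\left(X_{t}^{\theta}\right)$ is a continuous/bounded linear operator, so applying it to an $o(\delta\theta)$ term gives another $o(\delta\theta)$ term. Second, $\partial_{\theta}X_{t}^{\theta}$ is also bounded, so $\Vert\partial_{\theta}X_{t}^{\theta}\cdot\delta\theta+o(\delta\theta)\Vert\le C\Vert\delta\theta\Vert$ for some constant $C$ and all sufficiently small $\delta\theta$, which makes $o\left(\partial_{\theta}X_{t}^{\theta}\cdot\delta\theta+o(\delta\theta)\right)$ an $o(\delta\theta)$ term as well.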
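If it helps, the expansion can be sanity-checked numerically in one dimension. The sketch below assumes the hypothetical choices $F_\theta(X)=\sin(\theta X)$ and $X^\theta=\theta^2$ (neither is from the paper), and reads the left-hand side as $\partial_\theta F_\theta$ evaluated at the perturbed state $X^{\theta+\delta\theta}=X^\theta+\partial_\theta X^\theta\cdot\delta\theta+o(\delta\theta)$, with the $\theta$ in the subscript held fixed. The difference between the two sides, divided by $\delta\theta$, should then tend to zero.

```python
import math

# Hypothetical scalar example (not from the paper):
#   F_theta(x) = sin(theta * x),  X^theta = theta^2.


def X(theta):
    return theta ** 2


def dX_dtheta(theta):
    return 2 * theta


def dF_dtheta(theta, x):
    # d/dtheta of sin(theta * x) at fixed x.
    return x * math.cos(theta * x)


def dX_dF_dtheta(theta, x):
    # d/dx of dF_dtheta(theta, x) at fixed theta.
    return math.cos(theta * x) - theta * x * math.sin(theta * x)


theta = 0.7
steps = [10.0 ** -k for k in range(1, 6)]  # d_theta = 1e-1, ..., 1e-5

ratios = []
for d_theta in steps:
    # Left-hand side: same theta in the subscript, perturbed state argument.
    lhs = dF_dtheta(theta, X(theta + d_theta))
    # Right-hand side without the o(d_theta) term.
    rhs = (dF_dtheta(theta, X(theta))
           + dX_dF_dtheta(theta, X(theta)) * dX_dtheta(theta) * d_theta)
    ratios.append(abs(lhs - rhs) / d_theta)

for d_theta, r in zip(steps, ratios):
    print(f"d_theta={d_theta:.0e}  |lhs - rhs| / d_theta = {r:.3e}")
```

A check like this does not prove anything, but it does show why the perturbed argument disappears from the right-hand side: everything there is evaluated at the base point $X_t^\theta$, and the perturbation only enters through the factor $\partial_\theta X_t^\theta\cdot\delta\theta$.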