Truncation Error in Mixed Derivative


In Pattern Recognition and Machine Learning, Section 5.4.4 (Finite Differences), equation (5.90) gives a finite-difference approximation to a mixed partial derivative:

$$\frac{\partial^2E}{\partial w_{ji}\partial w_{lk}}=\frac{1}{4\epsilon^2}[E(w_{ji}+\epsilon,w_{lk}+\epsilon)-E(w_{ji}+\epsilon,w_{lk}-\epsilon)-E(w_{ji}-\epsilon,w_{lk}+\epsilon)+E(w_{ji}-\epsilon,w_{lk}-\epsilon)] + O(\epsilon^2)$$
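As a quick sanity check (my own, not from the book), the claimed $O(\epsilon^2)$ behaviour can be verified numerically on a function with a known mixed derivative, e.g. $E(x,y)=\sin(x)\cos(y)$, whose mixed derivative is $-\cos(x)\sin(y)$:

```python
import math

def mixed_fd(E, x, y, eps):
    """Four-point central-difference estimate of d^2 E / (dx dy)."""
    return (E(x + eps, y + eps) - E(x + eps, y - eps)
            - E(x - eps, y + eps) + E(x - eps, y - eps)) / (4 * eps**2)

E = lambda x, y: math.sin(x) * math.cos(y)
exact = -math.cos(0.7) * math.sin(0.3)   # exact mixed derivative at (0.7, 0.3)

err1 = abs(mixed_fd(E, 0.7, 0.3, 1e-2) - exact)
err2 = abs(mixed_fd(E, 0.7, 0.3, 1e-3) - exact)

# O(eps^2): shrinking eps by a factor of 10 should shrink the error by ~100.
print(err1 / err2)
```

The point and the function are arbitrary choices; any smooth $E$ should show the same error ratio of roughly 100.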

I tried to derive this myself, and it seems that in the general case you need to treat the perturbations $\epsilon_1$ and $\epsilon_2$ of $w_{ji}$ and $w_{lk}$ independently. You then use second-order Taylor approximations, and each expansion carries a remainder term $O(\epsilon_1^3,\epsilon_2^3)$, since they are second-order approximations. To solve for the mixed partial derivative above, I believe (based on my own calculations and this answer here) that you need to divide the remainder term $O(\epsilon_1^3,\epsilon_2^3)$ by $\epsilon_1\epsilon_2$, and according to Pattern Recognition and Machine Learning the result should be $O(\epsilon^2)$ where $\epsilon_1=\epsilon_2=\epsilon$. I'm somewhat familiar with big-O notation but don't know much about truncation error, so I'm wondering how you justify

$$O(\epsilon_1^3,\epsilon_2^3)/(\epsilon_1\epsilon_2)=O(\epsilon^2)$$

when $\epsilon_1=\epsilon_2=\epsilon$. It seems that when dividing by the product $\epsilon_1\epsilon_2$, you first divide the $\epsilon_1^3$ by $\epsilon_1$ to get $O(\epsilon_1^2,\epsilon_2^3)$, and then divide by $\epsilon_2$ to get $O(\epsilon_1^2,\epsilon_2^2)$, which for $\epsilon_1=\epsilon_2=\epsilon$ equals $O(\epsilon^2)$.
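For what it's worth, this scaling can be checked exactly on a concrete example (my own choice, using sympy with $E(x,y)=x^3y^3$ and independent perturbations $\epsilon_1$, $\epsilon_2$), since for a polynomial the four-point combination can be expanded with no remainder at all:

```python
import sympy as sp

x, y, e1, e2 = sp.symbols('x y e1 e2')
E = x**3 * y**3  # example function; its mixed derivative is 9*x**2*y**2

# Four-point combination with independent perturbations, divided by 4*e1*e2
fd = (E.subs({x: x + e1, y: y + e2}) - E.subs({x: x + e1, y: y - e2})
      - E.subs({x: x - e1, y: y + e2}) + E.subs({x: x - e1, y: y - e2})) / (4 * e1 * e2)

remainder = sp.expand(fd) - sp.diff(E, x, y)
print(remainder)  # every surviving term is quadratic in e1 or e2
```

The remainder comes out as $3x^2\epsilon_2^2 + 3y^2\epsilon_1^2 + \epsilon_1^2\epsilon_2^2$, i.e. $O(\epsilon_1^2 + \epsilon_2^2)$, which reduces to $O(\epsilon^2)$ when $\epsilon_1=\epsilon_2=\epsilon$.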

Is my intuition about this remainder correct, and if so, what justifies splitting the division component-wise?

EDIT: It looks like the text here does not use an $O(\epsilon_1^3,\epsilon_2^3)$ term but instead two separate terms, $O(\epsilon_1^2)$ and $O(\epsilon_2^2)$. I haven't had time to read that text in detail yet; I suspect its method of derivation differs from mine.


Let's consider the special case $\epsilon_1 = \epsilon_2 = \epsilon$. Spell out the Taylor expansions of the four terms around $(w_{ji}, w_{lk})$ up to third order, forgetting about the prefactor $1 / (4 \epsilon^2)$ for the moment. Write $E_x = \partial_x E$ for the derivative with respect to the first argument of $E$ and $E_y = \partial_y E$ for the derivative with respect to the second argument. The zeroth-order terms $E(w_{ji}, w_{lk})$ cancel immediately ($1 - 1 - 1 + 1 = 0$) and are omitted, so the bracketed combination becomes: \begin{align} \epsilon \big( E_x + E_y\big) + \frac{\epsilon^2}{2} \big( E_{xx} + E_{yy} + 2 E_{xy}\big) + \frac{\epsilon^3}{6} \big(E_{xxx} + E_{yyy} + 3 E_{xxy} + 3 E_{xyy}\big) \\ -\Big[\epsilon \big( E_x - E_y\big) + \frac{\epsilon^2}{2} \big( E_{xx} + E_{yy} - 2 E_{xy}\big) + \frac{\epsilon^3}{6} \big(E_{xxx} - E_{yyy} - 3 E_{xxy} + 3 E_{xyy} \big)\Big] \\ -\Big[\epsilon \big( -E_x + E_y\big) + \frac{\epsilon^2}{2} \big( E_{xx} + E_{yy} - 2 E_{xy}\big) + \frac{\epsilon^3}{6} \big(-E_{xxx} + E_{yyy} + 3 E_{xxy} - 3 E_{xyy} \big)\Big] \\ +\Big[\epsilon \big( -E_x - E_y\big) + \frac{\epsilon^2}{2} \big( E_{xx} + E_{yy} + 2 E_{xy}\big) + \frac{\epsilon^3}{6} \big(-E_{xxx} - E_{yyy} - 3 E_{xxy} - 3 E_{xyy} \big)\Big] \\ + \mathcal{O}(\epsilon^4) \end{align} Now compare the coefficients of every derivative:

$E_x:\epsilon \big( 1 -1 -(-1) -1 \big) = 0 \: \checkmark$

$E_y:\epsilon \big( 1 -(-1) - 1 -1 \big) = 0 \: \checkmark$

$E_{xx}: \frac{\epsilon^2}{2} \big( 1 - 1 - 1 + 1 \big) = 0 \: \checkmark$

$E_{yy}: \frac{\epsilon^2}{2} \big( 1 - 1 - 1 + 1 \big) = 0 \: \checkmark$

$E_{xy}: \frac{\epsilon^2}{2} \big( 2 -(-2) -(-2) + 2 \big) = 4 \epsilon^2 \: \checkmark$ (multiplying by the prefactor $1/(4\epsilon^2)$ leaves exactly $E_{xy}$)

$E_{xxx} : \frac{\epsilon^3}{6} \big( 1 - 1 - (-1) - 1 \big) = 0 \: \checkmark$

$E_{yyy} : \frac{\epsilon^3}{6} \big( 1 - (-1) - 1 - 1 \big) = 0 \: \checkmark$

$E_{xxy} : \frac{\epsilon^3}{6} \big( 3 - (-3) - 3 - 3 \big) = 0 \: \checkmark$

$E_{xyy} : \frac{\epsilon^3}{6} \big( 3 -3 -(- 3) - 3 \big) = 0 \: \checkmark$

As you can see, every term of order $\epsilon^3$ drops out, so the error of the bracketed combination is of order $\mathcal{O}(\epsilon^4)$. Dividing by the prefactor $4\epsilon^2$ then leaves $E_{xy} + \mathcal{O}(\epsilon^2)$, the desired result.
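This cancellation is easy to observe numerically as well (a sketch with an example function of my choosing, $E(x,y)=\sin(x)\cos(y)$): before dividing by $4\epsilon^2$, the residual of the bracketed combination against $4\epsilon^2 E_{xy}$ should shrink by a factor of about $2^4 = 16$ when $\epsilon$ is halved.

```python
import math

def bracket(E, x, y, eps):
    # The four-point combination, *without* the 1/(4 eps^2) prefactor.
    return (E(x + eps, y + eps) - E(x + eps, y - eps)
            - E(x - eps, y + eps) + E(x - eps, y - eps))

E = lambda x, y: math.sin(x) * math.cos(y)
Exy = -math.cos(0.7) * math.sin(0.3)  # exact mixed derivative at (0.7, 0.3)

r1 = abs(bracket(E, 0.7, 0.3, 0.1) - 4 * 0.1**2 * Exy)
r2 = abs(bracket(E, 0.7, 0.3, 0.05) - 4 * 0.05**2 * Exy)
print(r1 / r2)  # ~16, consistent with an O(eps^4) residual
```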