I'm reading Bishop's "Pattern Recognition and Machine Learning", the section on the Calculus of Variations (Appendix D), and he defines the functional derivative $\frac{\delta F}{\delta y(x)}$ via:
$$ F[y(x) + \epsilon \eta(x)] = F[y(x)] + \epsilon \int\frac{\delta F}{\delta y(x)}\eta(x)dx + O(\epsilon^2) $$
Then, for $F[y] = \int G(y(x), y'(x), x)dx$ we get $$ F[y(x) + \epsilon \eta(x)] = F[y(x)] + \epsilon \int \left[ \frac{\partial G}{\partial y} \eta(x) + \frac{\partial G}{\partial y'} \eta'(x)\right ]dx + O(\epsilon^2) $$
How does one arrive at this expression? Most derivations of the Euler-Lagrange equation I've seen instead work with the total derivative $\frac{dF}{d\epsilon}$. I'm unsure of the connection between these two notations.
A multivariable function $z=f(x,y)$ is differentiable if
$\Delta z = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y} \Delta y + \epsilon_1 \Delta x + \epsilon_2 \Delta y \quad \quad \text{Equation 1}$
where $\epsilon_1 \rightarrow 0$, $\epsilon_2 \rightarrow 0$ as $\Delta x \rightarrow 0$ and $\Delta y \rightarrow 0$.
To understand the definition of differentiability for multivariable functions it helps to have in mind the definition of the tangent plane to a function $z=f(x,y)$:
$\Delta z = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y} \Delta y$
For small $\Delta x$ and $\Delta y$, $\epsilon_1$ and $\epsilon_2$ are small, and the function is well approximated by its tangent plane.
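A quick numerical sketch of Equation 1, using an illustrative function of my own choosing, $f(x,y)=x^2 y$: the residual $\epsilon_1 \Delta x + \epsilon_2 \Delta y$ vanishes faster than the step size itself, which is exactly what differentiability demands.

```python
# Illustrative check of Equation 1 with f(x, y) = x**2 * y (my own choice).
# The gap between the true change Delta z and the tangent-plane prediction
# is eps1*dx + eps2*dy, which should vanish faster than the step size h.

def f(x, y):
    return x**2 * y

x0, y0 = 1.0, 2.0
fx, fy = 2 * x0 * y0, x0**2   # exact partials: f_x = 2xy, f_y = x^2

ratios = []
for h in (1e-1, 1e-2, 1e-3):
    dx, dy = h, -h                              # step both coordinates
    dz = f(x0 + dx, y0 + dy) - f(x0, y0)        # true change Delta z
    tangent = fx * dx + fy * dy                 # tangent-plane prediction
    ratios.append(abs(dz - tangent) / h)        # residual per unit step

print(ratios)   # each entry is roughly h**2, shrinking to zero
```

Each ratio is the residual divided by the step size, and it keeps shrinking as the step shrinks; that is the content of the $\epsilon_1, \epsilon_2 \rightarrow 0$ condition.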
If you use the definition of the functional given by (D.5):
$F[y(x)]=\int G(y(x),y'(x),x) dx$
and replace $y(x)$ by $y(x)+\epsilon \eta (x)$ you get:
$F[y(x)+\epsilon \eta(x)]=\int G(y(x)+\epsilon \eta(x), y'(x)+\epsilon \eta'(x), x) dx$
$=F[y(x)]+\int \left[ G(y(x)+\epsilon \eta(x), y'(x)+\epsilon \eta'(x), x) - G(y(x),y'(x),x) \right] dx \quad \quad \text{Equation 2}$
If you apply the definition of differentiability to $G$, which is a function of the three variables $y$, $y'$ and $x$, you get:
$\Delta G=\frac{\partial G}{\partial y} \Delta y+\frac{\partial G}{\partial y'} \Delta y' + \frac{\partial G}{\partial x}\Delta x + \epsilon_1 \Delta y+\epsilon_2 \Delta y' + \epsilon_3 \Delta x \quad \quad \text{Equation 3}$
Now consider the change in $G$ inside the integral in Equation 2.
This corresponds to a change in G with $\Delta y=\epsilon \eta (x)$, $\Delta y' = \epsilon \eta'(x)$ and $\Delta x = 0$.
Therefore using equation 3 we have:
$\Delta G = \frac{\partial G}{\partial y} \epsilon \eta(x)+\frac{\partial G}{\partial y'} \epsilon \eta' (x) + \epsilon_1 \epsilon \eta (x) + \epsilon_2 \epsilon \eta' (x)$
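To make the size of the remainder terms concrete, here is a small worked instance (my own example, not Bishop's): take $G(y, y', x) = y\,y'$. Then

$$G(y+\epsilon\eta,\, y'+\epsilon\eta',\, x) = (y+\epsilon\eta)(y'+\epsilon\eta') = y y' + \epsilon\,(y'\eta + y\eta') + \epsilon^2 \eta\eta'$$

so $\Delta G = \frac{\partial G}{\partial y}\epsilon\eta + \frac{\partial G}{\partial y'}\epsilon\eta' + \epsilon^2\eta\eta'$, and the leftover term plays the role of $\epsilon_1 \epsilon \eta + \epsilon_2 \epsilon \eta'$ with $\epsilon_1$, $\epsilon_2$ themselves of order $\epsilon$. This is why those terms end up at order $\epsilon^2$.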
and substituting this back into Equation 2 we get:
$F[y(x)+\epsilon \eta(x)]=F[y(x)]+\epsilon\int \left[ \frac{\partial G}{\partial y} \eta(x)+\frac{\partial G}{\partial y'} \eta' (x)\right]dx + \epsilon \int \left[ \epsilon_1 \eta (x) + \epsilon_2 \eta' (x)\right] dx$

(Note that $\epsilon_1$ and $\epsilon_2$ generally depend on $x$, so they must stay inside the integral.) Since $\epsilon_1$ and $\epsilon_2$ tend to zero as $\epsilon \rightarrow 0$ (for smooth $G$ they are themselves $O(\epsilon)$), the last term is $O(\epsilon^2)$ and you arrive at (D.6):

$F[y(x)+\epsilon \eta(x)]=F[y(x)]+\epsilon\int \left[ \frac{\partial G}{\partial y} \eta(x)+\frac{\partial G}{\partial y'} \eta' (x)\right]dx + O( \epsilon^2)$
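To sanity-check this expansion numerically, here is a sketch with illustrative choices of my own (not Bishop's): $G(y,y',x) = y^2 + y'^2$, $y(x)=\sin x$, $\eta(x)=x(1-x)$ on $[0,1]$. The gap between $F[y+\epsilon\eta]-F[y]$ and the first-order term should scale as $\epsilon^2$.

```python
import math

# Sketch check of (D.6) with illustrative choices (mine, not Bishop's):
# G(y, y', x) = y^2 + y'^2, y(x) = sin(x), eta(x) = x*(1 - x) on [0, 1].

N = 10_000
xs = [i / N for i in range(N + 1)]

def trapezoid(vals):
    # trapezoidal rule on the uniform grid xs
    return sum(vals[i] + vals[i + 1] for i in range(N)) / (2 * N)

y    = [math.sin(x) for x in xs]
yp   = [math.cos(x) for x in xs]        # y'
eta  = [x * (1 - x) for x in xs]
etap = [1 - 2 * x for x in xs]          # eta'

def F(yv, ypv):
    # F[y] = integral of G(y, y', x) dx with G = y^2 + y'^2
    return trapezoid([a * a + b * b for a, b in zip(yv, ypv)])

# first-order term: integral of [dG/dy * eta + dG/dy' * eta'] dx
first = trapezoid([2 * a * e + 2 * b * ep
                   for a, b, e, ep in zip(y, yp, eta, etap)])

ratios = []
for eps in (1e-1, 1e-2, 1e-3):
    y_eps  = [a + eps * e for a, e in zip(y, eta)]
    yp_eps = [b + eps * ep for b, ep in zip(yp, etap)]
    remainder = F(y_eps, yp_eps) - F(y, yp) - eps * first
    ratios.append(remainder / eps**2)   # roughly constant => remainder is O(eps^2)
```

The quantity `first` is also exactly what $\frac{d}{d\epsilon}F[y+\epsilon\eta]\big|_{\epsilon=0}$ evaluates to, which is the bridge between Bishop's notation and the $\frac{d}{d\epsilon}$-style derivations mentioned in the question.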