The textbook on the calculus of variations by Liberzon gives the following definition of "first variation":
It also gives the definition of the "Gateaux derivative":
I want to prove that if $G$ is the first variation of $J$, it is also the Gateaux derivative of $J$. This seems very simple, but I'm not sure about one particular step.
My derivation:
Define $A(\alpha, \eta)=\frac{J(y+\alpha \eta)-J(y)}{\alpha}+\frac {o(\alpha)} \alpha$
From the definition of the first variation, it follows that for all $\alpha \neq 0$ and all $\eta$: $\delta J|_y(\eta)=A(\alpha, \eta)$.
$(1)$ Here is the step I'm not sure about: since $\delta J|_y(\eta)=A(\alpha, \eta)$ holds for every $\alpha \neq 0$, it must in particular hold in the limit as $\alpha$ tends to $0$. Therefore:
$$\delta J|_y(\eta)=\lim_{\alpha \to 0}A(\alpha, \eta)$$
$(2)$ By the definition of little-oh, this is equal to $\lim_{\alpha \to 0}\frac{J(y+\alpha \eta)-J(y)}{\alpha}$, which is the definition of the Gateaux derivative.
Is step $(1)$ in particular, and my derivation in general, rigorous?
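As a numerical sanity check of the limit in step $(1)$ (my own illustration, not from the book), take the concrete functional $J(y)=\int_0^1 y(x)^2\,dx$, whose first variation is $\delta J|_y(\eta)=\int_0^1 2y\eta\,dx$. The difference quotient should converge to it as $\alpha\to 0$:

```python
import numpy as np

# Illustrative functional (my choice, not from the book):
# J(y) = ∫₀¹ y(x)² dx, discretized on a uniform grid.
x = np.linspace(0.0, 1.0, 10001)
dx = x[1] - x[0]

def integrate(f):
    # composite trapezoidal rule
    return np.sum((f[:-1] + f[1:]) / 2.0) * dx

def J(y):
    return integrate(y**2)

y = np.sin(np.pi * x)      # base point
eta = x * (1.0 - x)        # perturbation direction

# First variation of this J: δJ|_y(η) = ∫ 2yη dx
first_variation = integrate(2.0 * y * eta)

# The Gateaux difference quotient approaches δJ|_y(η) as α → 0
for alpha in [1e-1, 1e-3, 1e-5]:
    quotient = (J(y + alpha * eta) - J(y)) / alpha
    print(alpha, quotient, first_variation)
```

Since this $J$ is quadratic, the gap between the quotient and $\delta J|_y(\eta)$ is exactly $\alpha\int_0^1\eta^2\,dx$, i.e. $O(\alpha)$, consistent with the $o(\alpha)$ remainder in the definition.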
EDIT: I have now tried to apply the same principle to the second variation, which the book defines as:
(just a small point: shouldn't there be a $\frac 1 2 $ before the second variation there?)
However, this gives me the following nonsensical result:
Define $A(\alpha, \eta)=\frac {J(y+\alpha \eta)-J(y)-\alpha \cdot\delta J|_y(\eta)} {\alpha^2}+\frac {o(\alpha^2)} {\alpha^2}$
From the definition of the second variation, it follows that for all $\alpha \neq 0$ and all $\eta$: $\delta^2 J|_y(\eta)=A(\alpha, \eta)$.
$(1)$ Hence by the same step I used before:
$$\delta^2 J|_y(\eta)=\lim_{\alpha\to 0} A(\alpha, \eta)= \lim_{\alpha\to 0} \left( \frac{J(y+\alpha \eta)-J(y)-\alpha \lim_{\alpha\to 0} \left ( \frac{J(y+\alpha \eta)-J(y)}{\alpha}\right)}{\alpha^2}\right)= \lim_{\alpha\to 0} \left( \frac{J(y+\alpha \eta)-J(y)-J(y+\alpha \eta)+J(y)}{\alpha^2}\right)=0$$
What am I doing wrong here? Am I not allowed to substitute the inner limit by its value like that?
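A concrete scalar check (my own example, not from the book): for $J(x)=x^2$ one has $\delta J|_y(\eta)=2y\eta$ exactly, and the second-order quotient tends to $\eta^2$, not $0$. The substitution fails because $J(y+\alpha\eta)-J(y)$ differs from $\alpha\,\delta J|_y(\eta)$ by an $O(\alpha^2)$ term, which survives the division by $\alpha^2$:

```python
# Scalar example (my choice): J(x) = x², so δJ|_y(η) = 2yη exactly.
def J(x):
    return x * x

y, eta = 1.5, 0.7
first_variation = 2.0 * y * eta   # exact value of the inner limit

for alpha in [1e-1, 1e-2, 1e-3]:
    # second-order difference quotient from the definition of δ²J
    quotient = (J(y + alpha * eta) - J(y) - alpha * first_variation) / alpha**2
    # it equals η², not 0: the O(α²) remainder survives division by α²
    print(alpha, quotient, eta**2)
```

The cancellation in the displayed computation above discards exactly this $\eta^2$ term.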



Okay, I am giving an example of what my concern is about. Since this wouldn't fit into a comment, I am writing it as an answer.
$\newcommand{\R}{\mathbb{R}}$ Let us consider $J:\R \to \R, x\mapsto x^2$. Then for a fixed $\eta$ we know \begin{align*} J(x+\alpha \eta) = J(x) + J'(x)\cdot\eta\cdot\alpha + \underbrace{o(\alpha)}_{\alpha^2\eta^2} . \end{align*} This means $\delta J|_x(\eta)=2x\eta$, clearly independent of $\alpha$. Now \begin{align} A(\alpha, \eta) =\frac{J(x+\alpha \eta)-J(x)}{\alpha}+\frac {o(\alpha)} \alpha = \frac{2x\alpha\eta + \alpha^2\eta^2}{\alpha} + \frac{o(\alpha)}{\alpha} = 2x\eta + \alpha\eta^2 + \frac{o(\alpha)}{\alpha}. \end{align}

Now you are saying $\delta J|_x(\eta)= A(\alpha,\eta)$, which is a little confusing. This is true if you choose the right function in $o(\alpha)$, but it isn't clear that such a function exists in $o(\alpha)$. Your claim amounts to $\delta J|_x(\eta)-\frac{J(x+\alpha \eta)-J(x)}{\alpha} = \frac{o(\alpha)}{\alpha}$, which is true, but you didn't say why.
Edit: the definition of $A$ is somewhat problematic, because $o(\alpha)$ isn't a function but a set. You could say you choose the same function for $o(\alpha)$ as in the definition of the first variation, but in that case your claim doesn't hold, as my example demonstrates.
In my first equation the right choice of $o(\alpha)$ is $\alpha^2\eta^2$, but in order to obtain the equalities for the function $A$, you have to choose $o(\alpha)$ to be $-\alpha^2\eta^2$.
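This sign issue can be checked numerically (my own sketch, for the same $J(x)=x^2$ as above): with the "natural" remainder $o(\alpha)=\alpha^2\eta^2$, the quantity $A(\alpha,\eta)$ misses $\delta J|_x(\eta)$ by $2\alpha\eta^2$ for every $\alpha \neq 0$; only the choice $o(\alpha)=-\alpha^2\eta^2$ gives exact equality:

```python
# Numerical illustration for J(x) = x²: which representative of o(α)
# makes δJ|_x(η) = A(α,η) hold exactly?
def J(x):
    return x * x

x0, eta = 2.0, 0.5
delta_J = 2.0 * x0 * eta  # first variation: 2xη

for alpha in [0.1, 0.01]:
    quotient = (J(x0 + alpha * eta) - J(x0)) / alpha
    A_plus  = quotient + (alpha**2 * eta**2) / alpha   # o(α) = +α²η²
    A_minus = quotient + (-alpha**2 * eta**2) / alpha  # o(α) = -α²η²
    print(alpha, A_plus, A_minus, delta_J)
```

Only `A_minus` reproduces $\delta J|_x(\eta)=2x\eta$ for nonzero $\alpha$, matching the sign flip described above.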
Of course everything works out in the end, because the result is right, but it seems you are making a circular argument. I just wanted to warn you about a bad way of proving things. It isn't clear from the beginning that there is a function in $o(\alpha)$ such that $\delta J|_y(\eta) = A(\alpha,\eta)$ holds. It is easy to show, but you can't start with it, because if you assume it you are already done.
Maybe this seems like bean-counting, but this is what mathematics is all about ;)
In order to give a rigorous proof, I would do the following: \begin{align*} \lim_{\alpha\to 0} \frac{J(y+\alpha\eta)-J(y)}{\alpha} = \lim_{\alpha\to 0} \frac{J(y)+\delta J|_y(\eta)\,\alpha+o(\alpha)-J(y)}{\alpha}, \end{align*} where $\delta J|_y(\eta)$ is the first variation; I have just used its definition. Simplifying, \begin{align*} = \lim_{\alpha\to 0}\left(\delta J|_y(\eta)+\frac{o(\alpha)}{\alpha}\right) =\delta J|_y(\eta)+\lim_{\alpha\to 0}\frac{o(\alpha)}{\alpha} =\delta J|_y(\eta). \end{align*} This justifies using the same symbol for both definitions.