I just began studying variational calculus, and I'm having some issues getting a conceptual grasp on functional differentiability.
Let $J[y]$ be a functional defined on some normed linear space, and let $$\Delta J [h] = J [y+h]-J [y]. $$ Suppose that $$\Delta J[h] = \phi [h] + \epsilon ||h||,$$ where $\phi [h] $ is a linear functional and $\epsilon \to 0$ as $||h|| \to 0$. Then $J$ is said to be differentiable at $y$.
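To make the definition concrete, here is a worked example (my own choice, purely for illustration): take $J[y] = \int_a^b y(x)^2 \, dx$ on $C[a,b]$ with the max norm. Then
$$\Delta J[h] = \int_a^b \left( (y+h)^2 - y^2 \right) dx = \underbrace{\int_a^b 2\,y(x)\,h(x)\, dx}_{\phi[h]} + \int_a^b h(x)^2 \, dx.$$
The first term is linear in $h$, and the second satisfies $$\left| \int_a^b h(x)^2\, dx \right| \leq (b-a)\,||h||^2,$$ so it can be written as $\epsilon\, ||h||$ with $|\epsilon| \leq (b-a)\,||h|| \to 0$ as $||h|| \to 0$. Hence $J$ is differentiable with differential $\phi[h] = \int_a^b 2yh\, dx$.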
From the definition of $\Delta J [h] $, it is clear that we're concerned with how $J $ varies as its argument changes (very analogous to ordinary differentiation). But I don't think we could graph this, since infinitely many functions have the same norm (and in real analysis, graphing the tangent line is generally the easiest way to understand a derivative), so I'm struggling to see the full meaning of this definition.
For our norm, we are using $||h|| = \max_{a \le x \le b} |h(x)|$. If $||h|| \to 0$, then $h$ approaches the zero function uniformly, so that $y+h \to y$, which is consistent with ordinary derivatives.
But why would the fact that we can write $\Delta J[h] = \phi [h] + \epsilon ||h||$ correspond to differentiability (perhaps I'm trying to relate it too closely to real analysis here)? It seems to say that if a (generally) nonlinear functional is differentiable, we can relate 'small' changes in $J$ to some linear functional (sort of similar to a linear approximation, I suppose), since:
$$\frac {\Delta J [h]-\phi [h]}{||h||} = \epsilon. $$ If $\epsilon \to 0$ as $||h|| \to 0$, this demands that $\Delta J [h] - \phi [h] \to 0$ faster than $||h||$ does.
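One can check this behavior numerically. Below is a small sketch (the functional $J[y] = \int_0^1 y^2\,dx$ and the direction $h_0$ are my own illustrative choices, discretized with a simple Riemann sum): as $||h||$ shrinks, the quotient $\epsilon = (\Delta J[h] - \phi[h])/||h||$ shrinks with it.

```python
import numpy as np

# Illustrative functional J[y] = integral of y(x)^2 over [0, 1], discretized
# on a uniform grid; its linear part at y is phi[h] = integral of 2*y*h.
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]

def J(y):
    return np.sum(y**2) * dx          # crude Riemann sum, fine for illustration

def phi(y, h):
    return np.sum(2.0 * y * h) * dx   # the candidate linear functional

y = np.sin(np.pi * x)                 # the "point" at which we differentiate
h0 = np.cos(3.0 * np.pi * x)          # a fixed direction, scaled below

for t in [1.0, 0.1, 0.01, 0.001]:
    h = t * h0
    norm_h = np.max(np.abs(h))        # the sup norm used above
    eps = (J(y + h) - J(y) - phi(y, h)) / norm_h
    print(f"||h|| = {norm_h:.4g},  eps = {eps:.4g}")
```

Here the remainder is exactly $\int h^2\,dx \approx \tfrac12 t^2$, so `eps` decays linearly with `||h||`, as the definition requires.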
What is the purpose of having this class of functionals and how does it relate to how a general functional changes with respect to its variable?
This notion of differentiability is called Fréchet differentiability and is very useful in many contexts. It is commonly applied when $F:U_1 \rightarrow B_2$ is a map between two Banach spaces $B_1,B_2$, where $U_1\subseteq B_1$ denotes an open subset. One reason it is useful is that many properties of differentiable functions of several variables in $\mathbb{R}^n$ carry over to continuously Fréchet differentiable functions between (open subsets of) Banach spaces.
Note that if $B_1=B_2=\mathbb{R}$ and $U\subseteq\mathbb{R}$ is open, then $F$ is Fréchet differentiable if and only if it is differentiable in the ordinary sense. In this case, in your notation, $\phi(h)=F'(y)h$. That is, $\mathbb{R}$ is a Banach space, and if we have (for fixed $y\in U$) existence of $$ F'(y)=\lim_{h\rightarrow 0}\frac{F(y+h)-F(y)}{h}, $$ then there is a number $c\in\mathbb{R}$ (namely $c=F'(y)$) such that, for some $\delta>0$, $$ F(y+h)-F(y)-c h = \epsilon(y,h) h, \quad \mbox{ if } \lvert h \rvert \leqslant \delta, $$ where $\epsilon(y,h)\rightarrow 0$ as $h\rightarrow 0$. In this case, we can define a linear functional $\phi:\mathbb{R}\rightarrow\mathbb{R}$ by $\phi(h)=ch$. On the other hand, if there is a linear functional $\phi:\mathbb{R}\rightarrow\mathbb{R}$ such that, for some $\delta>0$, $$ F(y+h)-F(y)-\phi(h) = \epsilon(y,h) h, \quad \mbox{ if } \lvert h \rvert \leqslant \delta, $$ where $\epsilon(y,h)\rightarrow 0$ as $h\rightarrow 0$, then we have existence of $$ F'(y)=\lim_{h\rightarrow 0}\frac{F(y+h)-F(y)}{h}=\phi(1). $$
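A quick one-dimensional sanity check of this equivalence (the map $F(y)=y^2$ is my own illustrative choice): since $F(y+h)-F(y) = 2yh + h^2$, the linear functional is $\phi(h)=2yh$ and $\epsilon(y,h)=h\to 0$.

```python
# F(y) = y^2, so F'(y) = 2y and phi(h) = 2*y*h in the notation above.
def F(y):
    return y * y

def phi(y, h):
    return 2.0 * y * h   # F'(y) * h

y = 3.0
for h in [0.1, 0.01, 0.001]:
    eps = (F(y + h) - F(y) - phi(y, h)) / h
    print(h, eps)        # eps equals h, up to floating-point rounding
```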
If $B_1=\mathbb{R}^{n_1},B_2=\mathbb{R}^{n_2}$, then $F$ is continuously Fréchet differentiable if and only if all partial derivatives of $F$ exist and are continuous; see for instance 'Principles of Mathematical Analysis' by Rudin, or any other text which treats functions of several variables in $\mathbb{R}^n$. In this case we have $\phi(h)_j=\sum_{i=1}^{n_1} h_i(\partial_i F_j)(y)$, i.e. $\phi$ is given by the Jacobian matrix of $F$ at $y$. This result carries over (more or less) to the Banach space setting, see for instance Theorem 1.1.6 in 'The Analysis of Linear Partial Differential Operators I' by Hörmander.
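For a finite-dimensional illustration (the specific map $F$ below is my own example, with $n_1=n_2=2$), one can verify numerically that the Jacobian supplies the linear functional $\phi(h)$: the remainder $F(y+h)-F(y)-\phi(h)$ is of smaller order than $\lVert h\rVert$.

```python
import numpy as np

# Example map F: R^2 -> R^2, F(y) = (y0*y1, y0 + y1^2).
def F(y):
    return np.array([y[0] * y[1], y[0] + y[1]**2])

# Its Jacobian; phi(h) = jacobian(y) @ h, matching phi(h)_j = sum_i h_i dF_j/dy_i.
def jacobian(y):
    return np.array([[y[1], y[0]],
                     [1.0,  2.0 * y[1]]])

y = np.array([1.0, 2.0])
h = 1e-4 * np.array([0.3, -0.7])

remainder = F(y + h) - F(y) - jacobian(y) @ h
print(np.linalg.norm(remainder) / np.linalg.norm(h))  # of order ||h||, i.e. tiny
```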
One of the main theorems on differentiable functions of several variables is the inverse function theorem. This also carries over to continuously Fréchet differentiable mappings between Banach spaces, with a proof that is also standard for functions of several variables in $\mathbb{R}^n$, see for instance Theorem 1.1.7 in 'The Analysis of Linear Partial Differential Operators I' by Hörmander.
The list goes on, but the main point is simply that many of the results that one is accustomed to from the theory of functions of several variables in $\mathbb{R}^n$ carry over more or less directly to continuously Fréchet differentiable mappings in Banach spaces. Furthermore, the proofs that work in $\mathbb{R}^n$ often carry over to the Banach space setting as well. In this sense, it is a natural generalization of differentiability. Of course, it is then also a natural generalization in the case $B_2=\mathbb{R}$!