Functional derivatives are a cornerstone in many engineering and physics concepts. For instance, engineering students typically encounter this concept in analytical mechanics and optimization theory.
As an engineering student, I've found the concept of functional derivatives somewhat confusing. This confusion mainly stems from the fact that derivatives were defined on $\mathbb{R}^n$ in my calculus textbook. Almost all the physics/engineering textbooks I've come across don't formally "define" functional derivatives; they introduce the concept rather abruptly. I suspect this is because a formal definition requires a grounding in functional analysis.
However, I'm eager to explore this further and would like to ask: Is it possible to define functional derivatives without delving into the depths of functional analysis?
Short answer: yes.
The process of defining and proving facts about a functional derivative in a rigorous context is very difficult, particularly because the various functions often lie in different spaces, operators involved in the computation may move things between spaces, and so on. Especially for constrained optimization problems, it is a truly technical endeavor. This business of pinning down exactly which spaces everything lies in, exactly how operators should be restricted to different domains, etc., is sometimes known as the exact Lagrange method, and it is how one must prove optimality conditions for these types of problems. If all you wish to do is derive them, we can do something easier.
The alternative is the formal Lagrange method, which is very helpful for deriving expressions for derivatives, gradients, and optimality conditions for variational problems in function spaces. The key idea is to manipulate operators like $\Delta$, $\nabla$, etc. only formally, and to assume that every function and all of its derivatives are sufficiently integrable for everything to remain defined. We then essentially use the basic definitions of directional derivatives and gradients adapted from finite dimensions. In particular, we usually use something like $$ \delta F(u)h := \frac{d}{d \epsilon}F(u+\epsilon h)\Big|_{\epsilon=0} $$ for the directional derivative, and $$ \langle g(u), v\rangle = \delta F(u)v, \,\forall v $$ for the gradient. These are the exact same definitions of the directional derivative and gradient in general inner product spaces, and we can see how they match up with our finite-dimensional intuition.
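To see that these two definitions really do reduce to the familiar formulas, here is a small numerical sketch in $\mathbb{R}^n$ (the quadratic $F$ and all names are illustrative, not part of any standard API): for $F(u) = \tfrac{1}{2}u^TAu - b^Tu$ the gradient is $g(u) = Au - b$, and a finite difference in $\epsilon$ recovers $\delta F(u)h = \langle g(u), h\rangle$.

```python
import numpy as np

# Finite-dimensional analogue: F(u) = 0.5 u^T A u - b^T u on R^n,
# whose gradient is g(u) = A u - b.  The same two definitions apply:
# delta F(u)h = d/deps F(u + eps*h) at eps = 0, and g(u) is the vector
# satisfying <g(u), h> = delta F(u)h for every direction h.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)       # symmetric positive definite
b = rng.standard_normal(n)
u = rng.standard_normal(n)
h = rng.standard_normal(n)        # an arbitrary direction

def F(u):
    return 0.5 * u @ A @ u - b @ u

# Directional derivative via a centered finite difference in eps.
eps = 1e-6
dF_u_h = (F(u + eps * h) - F(u - eps * h)) / (2 * eps)

# Gradient read off from <g(u), h> = delta F(u)h.
g = A @ u - b
print(abs(dF_u_h - g @ h))        # small: the two definitions agree
```

The same pattern carries over verbatim to function spaces; only the inner product changes from $u^Tv$ to $\int_\Omega uv~\mathrm{d}x$.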
As an example, say we want to minimize the functional $$ F(u) = \frac{1}{2}\int_{\Omega}\|\nabla u\|^2~\mathrm{d}x + \frac{1}{2}\int_\Omega(u - f)^2~\mathrm{d}x. $$ We will perturb this functional by $\epsilon h$ and collect everything of order $\epsilon$, but we will not pay any mind to the smoothness or integrability of $u$, $h$, or $f$. We assume they are nice enough to do whatever we want and proceed formally. Skipping some steps, you should obtain $$ F(u+\epsilon h) = F(u) + \epsilon\int_\Omega \nabla u\cdot\nabla h + (u-f)h~\mathrm{d}x + \mathcal{O}(\epsilon^2), $$ so our directional derivative is given by $$ \delta F(u)h = \int_\Omega \nabla u\cdot\nabla h + (u-f)h~\mathrm{d}x. $$ If we wish to obtain the gradient, we need to find some expression $g(u)$ satisfying $$ \int_\Omega g(u)v~\mathrm{d}x = \delta F(u)v,~\forall v. $$ This is the actual definition of the gradient that you may not have learned in calculus, but it explicitly relates the directional derivative to the inner product to obtain a vector $g(u)$ that we can, e.g., set equal to zero to obtain optimality conditions. Using Green's identities, we obtain $$ \delta F(u)v = \int_\Omega \nabla u\cdot\nabla v + (u-f)v~\mathrm{d}x = \int_\Omega(-\Delta u + u - f)v~\mathrm{d}x + \int_{\partial\Omega} v \nabla u \cdot \hat{n}~\mathrm{d}\Gamma. $$ We see now that we have obtained an expression $g(u) = -\Delta u + u - f$ that integrates against $v$ to give the directional derivative $\delta F(u)v$. There is a little business about the boundary, but if we want optimality conditions, obtained by setting the above expression to zero for every $v$, we see that we must have $\nabla u\cdot \hat{n}=0$ identically on the boundary. The optimality conditions can then be expressed as $$ g(u) = 0 \implies -\Delta u = f-u,\,x\in\Omega,\quad \nabla u\cdot\hat{n} = 0,\, x\in\partial\Omega. 
$$ This expression is correct and is the same one we would obtain from the technical exact Lagrange method; all that remains is to properly assign function spaces, which is easier now that we know which operators need to apply to which functions. However, if all you need is to derive or solve an Euler-Lagrange equation, none of the technical machinery was necessary. Even for implementing these problems numerically, we don't need much more than a sense of how smooth the solution should be.
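To illustrate that last point, here is a sketch of the example on $\Omega = (0,1)$ in one dimension, where the optimality conditions read $-u'' + u = f$ with $u'(0) = u'(1) = 0$. The grid size, rectangle-rule quadrature, and choice of $f$ are all illustrative assumptions; the point is that the discrete analogue of $g(u) = 0$ is just a linear solve, and its solution really does minimize the discretized $F$.

```python
import numpy as np

# 1D version of the example on Omega = (0, 1):
#   F(u) = 0.5 * int (u')^2 dx + 0.5 * int (u - f)^2 dx,
# with formal optimality conditions -u'' + u = f, u'(0) = u'(1) = 0.
N = 200
x = np.linspace(0.0, 1.0, N)
dx = x[1] - x[0]
f = np.cos(np.pi * x)   # illustrative data

def F(u):
    # Rectangle-rule discretization of the functional.
    return 0.5 * np.sum(np.diff(u) ** 2) / dx + 0.5 * dx * np.sum((u - f) ** 2)

# Discrete F is quadratic: F(u) = 0.5 u^T K u + 0.5 dx ||u - f||^2, so its
# exact gradient is g(u) = K u + dx (u - f).  Setting g(u) = 0 gives the
# discrete Euler-Lagrange system (K + dx I) u = dx f.  Note the Neumann
# condition appears "naturally" in the boundary rows of K, just as the
# boundary term appeared on its own in the integration by parts.
i = np.arange(N)
K = np.zeros((N, N))
K[i, i] = 2.0 / dx
K[0, 0] = K[-1, -1] = 1.0 / dx       # natural (Neumann) boundary rows
K[i[:-1], i[:-1] + 1] = -1.0 / dx
K[i[1:], i[1:] - 1] = -1.0 / dx
u_star = np.linalg.solve(K + dx * np.eye(N), dx * f)

# Sanity checks: u_star minimizes the discrete F, and the boundary slope
# of the solution vanishes, as the optimality conditions demand.
rng = np.random.default_rng(1)
h = rng.standard_normal(N)
print(F(u_star) <= F(u_star + 1e-3 * h))         # u_star is a minimizer
print(abs((u_star[1] - u_star[0]) / dx) < 1e-2)  # u'(0) is (approximately) 0
```

Notice that nowhere did we decide which Sobolev space $u$ lives in; knowing the operator $-\Delta + I$ and the Neumann boundary condition was enough to set up the computation.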
This is discussed extensively in, e.g., Optimal Control of Partial Differential Equations: Theory, Methods and Applications by Tröltzsch.