I am working through an introduction to the calculus of variations. This field has many connections with functional analysis, in which I do not have much experience. I recently learned about function spaces, for example the space $C[a,b]$ of all continuous functions on the interval $[a,b]$ and the space $L^2[a,b]$ of square-integrable functions on $[a,b]$. In many resources on the calculus of variations, the derivative of a functional in the direction of a function $v$ is written as the inner product $\langle \nabla J[u],v\rangle$, where $\nabla J[u]$ is the functional gradient. The inner product used is the $L^2$ inner product, $\langle f, g \rangle = \int_{a}^{b}f(x)g(x)\,dx$. From there the Euler-Lagrange equation is derived, etc.
My question is: why does one choose the $L^2$ inner product here specifically, in the context of the calculus of variations? There are many different norms associated with different function spaces in functional analysis. Doesn't this limit the candidate functions in an optimization problem to those in the $L^2$ space? Maybe the reason is that the $L^2$ inner product can be seen as a natural extension of the dot product on finite-dimensional Euclidean spaces to infinite-dimensional function spaces? I have not understood this point well; maybe I am asking something too obvious — excuse me if so, since I am a beginner in this field.
Recall the definition of the derivative of a scalar-valued function $G : \mathbb{R}^n \to \mathbb{R}$. We say that $G$ is differentiable at $x$ with derivative $G'_x$ if there is a linear map $G'_x : \mathbb{R}^n \to \mathbb{R}$ such that
$$G(y) = G(x) + G'_x(y-x) + o(\| y - x \|).$$
The definition of a functional derivative is exactly the same, except that we replace $\mathbb{R}^n$ with a normed function space $F$. Note that the norm appears explicitly in this equation. In finite dimensions this does not matter, because all norms are equivalent, so all norms induce the same derivatives. But in infinite dimensions there are non-equivalent norms, so our choice of norm does affect the derivative. Hence we should be careful to ensure that a "good linear approximation" does what we want.
If $F$ is a Hilbert space, then we have the Riesz representation theorem. This tells us that $G'_x(z)=\langle g_x,z \rangle$ for some $g_x \in F$. In light of this, we can view the functional derivative as a functional gradient $\nabla G : F \to F$ if and only if $F$ is a Hilbert space. I think it is clear that this is a nice property. Requiring it rules out a lot of candidates for our choice of function space. In particular it rules out $L^p[0,1]$ for $p \neq 2$ and it rules out all of the $C^k[0,1]$ spaces.
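To make the Riesz picture concrete, here is a quick numerical sanity check (my own illustration, not part of the argument above). For $J[u] = \int_0^1 u(x)^2\,dx$ we have $J[u+tv] - J[u] = 2t\int_0^1 u v\,dx + O(t^2)$, so the Riesz representative of the derivative is $\nabla J[u] = 2u$, and the finite-difference directional derivative should match $\langle 2u, v\rangle_{L^2}$:

```python
import numpy as np

# Illustration: for J[u] = ∫_0^1 u(x)^2 dx, the functional gradient is
# ∇J[u] = 2u, i.e. the Gateaux derivative in direction v equals <2u, v>_{L^2}.

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]

def J(u):
    return np.sum(u**2) * dx          # J[u] = ∫_0^1 u(x)^2 dx (Riemann sum)

def inner(f, g):
    return np.sum(f * g) * dx         # L^2 inner product <f, g>

u = np.sin(np.pi * x)                 # arbitrary test functions
v = x * (1 - x)

t = 1e-6
directional = (J(u + t * v) - J(u)) / t   # finite-difference directional derivative
via_gradient = inner(2 * u, v)            # <∇J[u], v> with ∇J[u] = 2u

print(abs(directional - via_gradient))    # small, of order t
```

The two numbers agree up to an $O(t)$ error, as the expansion predicts.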
On $L^2$ and the related $L^2$-based spaces, we have the Fourier transform. This is extremely powerful for any problems involving derivatives, and we lose it if we go away from $L^2$-based spaces.
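As a small illustration of that power (again my own sketch): in Fourier space, differentiation becomes multiplication by $ik$, which is easy to check numerically on a smooth periodic function:

```python
import numpy as np

# Sketch: the Fourier transform turns d/dx into multiplication by ik.
# Check on u(x) = sin(3x) on [0, 2π), whose derivative is 3 cos(3x).

n = 256
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
u = np.sin(3 * x)

# Angular wavenumbers: fftfreq gives cycles per unit length, so scale by 2π.
k = np.fft.fftfreq(n, d=2 * np.pi / n) * 2 * np.pi

du_spectral = np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

err = np.max(np.abs(du_spectral - 3 * np.cos(3 * x)))
print(err)   # spectrally accurate: error near machine precision
```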
That said, I think the more important thing is that the action functionals should be continuous. For example, in one of the very basic problems you need to minimize
$$L[y] = \int_0^1 y'(s)^2 ds$$
subject to $y(0)=a,y(1)=b$. Assuming you agree with me that you should not assume more derivatives or integrability than you are actually using, making $L$ continuous leaves two choices: either you work in the Sobolev space $H^1$ or you work in the space $C^1$ of continuously differentiable functions. Since $H^1$ is a Hilbert space and contains $C^1$, it seems natural that it is the better bet.
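For this particular problem you can see the answer numerically (the discretization here is my own sketch, not part of the argument). Minimizing the discretized energy $\sum_i (y_{i+1}-y_i)^2/h$ and setting its gradient to zero gives the discrete Laplace equation $y_{i-1} - 2y_i + y_{i+1} = 0$ at interior points, whose solution with the given boundary data is the straight line from $a$ to $b$:

```python
import numpy as np

# Minimize the discretized Dirichlet energy L[y] ≈ Σ (y_{i+1} - y_i)^2 / h
# subject to y(0)=a, y(1)=b. The stationarity condition is the discrete
# Laplace equation, a tridiagonal linear system on the interior points.

a, b, n = 1.0, 3.0, 101
x = np.linspace(0.0, 1.0, n)

m = n - 2                                   # number of interior points
A = (2.0 * np.eye(m)
     - np.eye(m, k=1)
     - np.eye(m, k=-1))                     # discrete -Δ with Dirichlet data
rhs = np.zeros(m)
rhs[0], rhs[-1] = a, b                      # boundary values enter the RHS

y = np.concatenate(([a], np.linalg.solve(A, rhs), [b]))

err = np.max(np.abs(y - (a + (b - a) * x)))
print(err)   # ≈ 0: the minimizer is the straight line
```

In 1D a discrete harmonic function is exactly linear, so the computed minimizer matches the line $a + (b-a)x$ to machine precision.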
But with a different form of action functional, you might need to deal with a different space in order to ensure that the functional is continuous. So it depends rather strongly on the problem.