Since the calculus of variations is about finding extrema, and training neural networks is about finding a set of weights that minimizes the total error, is it possible to draw an analogy between the two and treat them as the same discipline?
For the sake of simplicity, let's train a regression model on a huge data set: we are trying to determine a polynomial that minimizes the total error. In this scenario, the input to the functional is the set of coefficients (weights) that uniquely determines a function, and evaluating the functional gives us an error value.
To determine whether that is the lowest error, we nudge the coefficients a bit and re-evaluate the functional; the new coefficients are chosen using gradient descent.
In this way, we progress towards the lowest error rate.
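Concretely, the procedure I have in mind looks something like this (a minimal sketch using NumPy and a made-up quadratic data set; the library and the data are just assumptions for illustration):

```python
# Fit a small polynomial by gradient descent on its coefficients,
# treating the squared error as the "functional" evaluated at each
# coefficient vector.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x**2 - 0.5 * x + 0.3 + 0.05 * rng.standard_normal(200)  # synthetic data

degree = 2
X = np.vander(x, degree + 1)              # columns: x^2, x^1, x^0
theta = np.zeros(degree + 1)              # the coefficients ("weights")
lr = 0.1

for step in range(2000):
    residual = X @ theta - y              # error of the current coefficients
    grad = 2.0 * X.T @ residual / len(y)  # gradient of the mean squared error
    theta -= lr * grad                    # gradient-descent update

print(theta)  # should end up close to [2.0, -0.5, 0.3]
```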
I recognize that there are methods like the Euler-Lagrange equation, but as far as I know none of them relate to deep learning.
As the comment above succinctly says, they are rather different, because neural networks are large parametric models, i.e. $f(x;\theta)$ for some parameters $\theta$, the network weights. We can use classical (non-variational) calculus to train such a model by simply iterating $\theta \leftarrow \theta - \eta\nabla_\theta E$ for some error function $E$. In other words, we choose a function $f$, and then use it to solve the problem $\theta^* = \arg\min_\theta E(\theta)$. Note that we do not find $f$; it is fixed in advance by our choice of network structure.
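To make that concrete, here is a minimal sketch of the update rule above (the one-hidden-layer architecture and the NumPy implementation are just assumptions for illustration). The loop only ever changes $\theta$; the form of $f$ is fixed by the architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 128).reshape(-1, 1)
y = np.sin(x)                                 # target function to fit

hidden = 16
W1 = rng.standard_normal((1, hidden)) * 0.5   # theta = (W1, b1, W2, b2)
b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, 1)) * 0.5
b2 = np.zeros(1)
eta = 0.05

for step in range(5000):
    # forward pass through the fixed architecture f(x; theta)
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y

    # backward pass: gradient of the mean squared error E w.r.t. each parameter
    dpred = 2 * err / len(x)
    dW2 = h.T @ dpred
    db2 = dpred.sum(axis=0)
    dh = dpred @ W2.T * (1 - h**2)
    dW1 = x.T @ dh
    db1 = dh.sum(axis=0)

    # theta <- theta - eta * grad_theta E
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2

h = np.tanh(x @ W1 + b1)
print(np.mean((h @ W2 + b2 - y) ** 2))  # final mean squared error
```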
Variational calculus is almost the opposite. One instead starts with a problem (e.g. find $\gamma(t)$ such that $\int_0^T J[\gamma(t),\gamma'(t)]\,dt$ is minimal) and finds an $f$ that optimally solves it directly. It is not really clear how to do this numerically on a computer without parametrizing $f$, because the resulting search is over an infinite-dimensional function space. In other words, one does not wish, or have, to assume the form of $f$ in advance, but rather to find the right form for $f$. (However, this may not matter much in practice, since a sufficiently deep neural network probably has enough representational power to learn whatever you want.)
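For a concrete instance of such a problem (the classic textbook case, not anything specific to neural networks), take $J[\gamma,\gamma'] = \sqrt{1+\gamma'(t)^2}$, so the functional is the arc length of the curve between two fixed endpoints. The Euler-Lagrange equation
$$ \frac{d}{dt}\frac{\partial J}{\partial \gamma'} - \frac{\partial J}{\partial \gamma} = 0 $$
reduces, since $J$ has no explicit dependence on $\gamma$, to
$$ \frac{d}{dt}\left(\frac{\gamma'}{\sqrt{1+\gamma'^2}}\right) = 0 \quad\Rightarrow\quad \gamma'(t) = \text{const}, $$
so the minimizer is a straight line. The answer is a function, obtained without ever committing to a parametric family for $\gamma$.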
There are some applications to ML, e.g. look up variational inference algorithms, but they require clever parametrizations and usually ultimately reduce to non-variational numerical optimization.