This is probably a pretty basic question, but I can't figure out what people mean when they say "variational forms" in optimization. For example, in this paper I'm reading, the variational form of a norm is just the dual norm. For that matter, what do people mean when they say variational analysis, and what is so "variational" about it? I figure it probably originates from PDEs but I don't quite see the connection.
Can anyone provide some context / insight? Thanks!
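The norm example in the question probably refers to the standard identity $\|x\| = \max_{\|y\|_* \le 1} \langle x, y\rangle$. The paper isn't named, so this is an assumption. The identity writes a norm as the maximum of a linear function over the unit ball of the dual norm. Here is a minimal Python sketch that checks it for the $\ell_1$ norm, whose dual is the $\ell_\infty$ norm. The function names are hypothetical and chosen for illustration.

```python
from itertools import product

def l1_norm(x):
    """Direct definition: sum of absolute values."""
    return sum(abs(t) for t in x)

def l1_norm_variational(x):
    """Variational form: max of <x, y> over the unit ball of the dual (l-infinity) norm.

    A linear function attains its maximum over the cube [-1, 1]^n at a vertex,
    so it suffices to check the 2^n sign vectors y in {-1, +1}^n.
    """
    return max(sum(xi * yi for xi, yi in zip(x, y))
               for y in product((-1.0, 1.0), repeat=len(x)))

x = [3.0, -1.5, 0.25]
print(l1_norm(x), l1_norm_variational(x))  # 4.75 4.75
```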
As far as I know, "variational form" is a lot like "dual form": it has a precise technical meaning in some specific situations ("form of the problem to which the Calculus of Variations is amenable") but has elastically expanded to encompass a general idea: writing the problem in a form where the solution is the maximizer/minimizer or maximum/minimum of some real-valued function.
Some examples:
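The median. Instead of defining it as the middle entry of a list of numbers $\{x_i\}$ written in sorted order, you can define it as $$\arg \min_{y\in\mathbb{R}} \sum_i |x_i-y|.$$ (If the list has an even number of entries, every point between the two middle entries attains the minimum.)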
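A quick numerical check of this equivalence, as a sketch that assumes an odd-length list so that the sorted-order definition is unambiguous. The helper names are hypothetical.

```python
def abs_deviation(xs, y):
    """Objective of the variational form: sum of |x_i - y|."""
    return sum(abs(x - y) for x in xs)

def median_sorted(xs):
    """Classical definition for an odd-length list: the middle entry after sorting."""
    s = sorted(xs)
    return s[len(s) // 2]

def median_variational(xs):
    """Minimize the objective directly. It is convex and piecewise linear with
    breakpoints at the data, so a minimizer can be found among the data points."""
    return min(xs, key=lambda y: abs_deviation(xs, y))

xs = [7.0, 1.0, 4.0, 10.0, 2.0]
print(median_sorted(xs), median_variational(xs))  # 4.0 4.0
```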
The heat equation. A steady-state solution to the heat equation is one that is harmonic: $\Delta f=0$. It's also the one that minimizes the Dirichlet energy among all functions with the prescribed boundary values: $$\arg\min_f \int_{\Omega} \|\nabla f\|^2\,dA.$$
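The sketch below makes both characterizations concrete on a small grid. It makes several assumptions of its own: a unit-spaced square grid, the 5-point discrete Laplacian, and Jacobi iteration as the solver (a simple choice, not the only one). It also uses boundary data taken from $x^2 - y^2$, which happens to be exactly harmonic for this stencil. The function solves the discrete harmonic condition ("each value equals the mean of its neighbours") and then checks that nudging any interior value raises the discrete Dirichlet energy.

```python
def dirichlet_energy(f):
    """Discrete Dirichlet energy: sum of squared differences over grid edges."""
    n = len(f)
    e = 0.0
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                e += (f[i + 1][j] - f[i][j]) ** 2
            if j + 1 < n:
                e += (f[i][j + 1] - f[i][j]) ** 2
    return e

def solve_harmonic(boundary, n, iters=2000):
    """Jacobi iteration: repeatedly replace each interior value by the mean of
    its four neighbours (the discrete form of Laplace's equation)."""
    f = [[boundary(i, j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
          for j in range(n)] for i in range(n)]
    for _ in range(iters):
        g = [row[:] for row in f]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                g[i][j] = 0.25 * (f[i - 1][j] + f[i + 1][j] + f[i][j - 1] + f[i][j + 1])
        f = g
    return f

def perturbations_raise_energy(f, eps=1e-3):
    """Check that nudging any interior value up or down increases the energy."""
    n = len(f)
    base = dirichlet_energy(f)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            old = f[i][j]
            for d in (eps, -eps):
                f[i][j] = old + d
                if dirichlet_energy(f) <= base:
                    f[i][j] = old
                    return False
            f[i][j] = old
    return True

def exact(i, j):
    """x^2 - y^2 is exactly harmonic for the 5-point stencil on a unit grid."""
    return float(i * i - j * j)

n = 6
f = solve_harmonic(exact, n)
print(perturbations_raise_energy(f))  # True
```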
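Smallest eigenvalue. For a symmetric matrix $A$, the smallest eigenvalue is the minimum of the Rayleigh quotient, $$\lambda_{\min}(A) = \min_{v\neq 0} \frac{v^\top A v}{v^\top v},$$ and the smallest eigenvalue magnitude $|\lambda|$ is $$\min_{v\neq 0} \frac{\|Av\|}{\|v\|}.$$ (For a general, non-symmetric matrix, the second quantity is the smallest singular value instead.)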
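A brute-force sketch comparing both variational characterizations with the closed-form eigenvalues of a symmetric $2\times 2$ matrix. The matrix is an arbitrary example of my own choosing. The sketch scans unit vectors on a fine angular grid instead of using a real optimizer, which is enough for a sanity check at this size.

```python
import math

A = [[2.0, 1.0],
     [1.0, -3.0]]  # arbitrary symmetric example matrix

def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

def rayleigh(A, v):
    """Rayleigh quotient v^T A v / v^T v."""
    Av = matvec(A, v)
    return (v[0] * Av[0] + v[1] * Av[1]) / (v[0] ** 2 + v[1] ** 2)

def gain(A, v):
    """||Av|| / ||v||."""
    return math.hypot(*matvec(A, v)) / math.hypot(*v)

# Both quotients are unchanged by rescaling v (including v -> -v), so it
# suffices to scan unit vectors v = (cos t, sin t) with t in [0, pi).
N = 20000
vs = [(math.cos(math.pi * k / N), math.sin(math.pi * k / N)) for k in range(N)]
min_rayleigh = min(rayleigh(A, v) for v in vs)
min_gain = min(gain(A, v) for v in vs)

# Closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]].
a, b, c = A[0][0], A[0][1], A[1][1]
r = math.hypot((a - c) / 2, b)
eigs = [(a + c) / 2 - r, (a + c) / 2 + r]

print(min_rayleigh, min(eigs))              # smallest eigenvalue, about -3.193
print(min_gain, min(abs(e) for e in eigs))  # smallest |eigenvalue|, about 2.193
```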
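Minimal surfaces. A minimal surface ("soap film") spanning a set of wires is a surface with zero mean curvature. Zero mean curvature is exactly the condition for a surface to be a critical point of area among surfaces bounded by the wires, so one can look for it as a surface of minimal area: $$\arg \min_{S} \int_{S} 1\,dA.$$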
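A minimal sketch of why the area functional is easy to discretize, under simplifying assumptions of my own. The hypothetical "wire" is a closed polygon through four corners of a saddle. The surface is a fan of four triangles that meet at one interior vertex, whose $x, y$ coordinates are fixed at $(0.5, 0.5)$. Only the vertex height $h$ is optimized. Ternary search is used here only because the discrete area is convex in $h$. By the symmetry of this wire the optimum is $h = 0.5$.

```python
import math

def triangle_area(p, q, r):
    """Half the norm of the cross product of two edge vectors."""
    u = [q[k] - p[k] for k in range(3)]
    v = [r[k] - p[k] for k in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

# Hypothetical wire: the corners of a non-planar (saddle-shaped) quadrilateral.
wire = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.0), (0.0, 1.0, 1.0)]

def fan_area(h):
    """Area of the polyhedral surface made of four triangles joining each
    wire edge to one interior vertex at (0.5, 0.5, h)."""
    c = (0.5, 0.5, h)
    return sum(triangle_area(wire[k], wire[(k + 1) % 4], c) for k in range(4))

def minimize_convex_1d(f, lo, hi, iters=200):
    """Ternary search; valid here because fan_area is convex in h
    (each triangle area is the norm of an affine function of h)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

h_best = minimize_convex_1d(fan_area, -1.0, 2.0)
print(h_best, fan_area(h_best))
```

For a real mesh, the same idea applies with every interior vertex free and a gradient-based optimizer in place of the 1D search.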
Now why would one care about the variational form of a problem? Several reasons:
1) The variational form is often more amenable to computation, because minimizing a real-valued function can be attacked locally: one can begin with a guess at the solution and iteratively improve it using tools such as gradient descent or Newton's method with line search. Often, this local problem is much easier (computationally) than the original global problem (and moreover, when one has a powerful collection of hammers, it's convenient to try to turn every problem into a nail).
2) It can be easier to reason and prove theorems about the solution to the variational form than about the original problem. For example, it's very common to prove correctness of a proposed iterative scheme for solving a problem by showing that each iteration decreases the energy in the corresponding variational form.
3) It can be easier to extend and generalize the variational form. For example, the variational form of the median should immediately suggest a reasonable extension to medians of points in the plane. As another example, consider the problem of finding a minimal surface computationally: you might want to approximate $S$ as some discrete polyhedron. How do you even define the mean curvature at a point of a polyhedron? Hmm. On the other hand, discretizing the area-minimizing definition of a minimal surface is straightforward.
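All three points can be seen in the planar median mentioned above. Replace $|x_i - y|$ with the Euclidean distance $\|x_i - y\|$ and minimize the sum. The sketch below uses Weiszfeld's algorithm, a standard method for this "geometric median" that I am swapping in because the answer doesn't name one. It is a local iterative scheme, and its iterates never increase the energy, which is the kind of property used in correctness proofs. The sketch assumes no iterate lands exactly on a data point, where the weights would be undefined. The example triangle was chosen so that its median lies strictly inside it.

```python
import math

def energy(points, y):
    """Sum of Euclidean distances from y to the data points
    (the planar analogue of sum |x_i - y|)."""
    return sum(math.dist(p, y) for p in points)

def geometric_median(points, iters=500):
    """Weiszfeld iteration: each step is a distance-weighted average of the data.
    Returns the final estimate and the energy after every step."""
    n = len(points)
    y = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)  # centroid start
    history = [energy(points, y)]
    for _ in range(iters):
        # Assumes y never coincides with a data point (the weight would be 1/0).
        weights = [1.0 / math.dist(p, y) for p in points]
        total = sum(weights)
        y = (sum(w * p[0] for w, p in zip(weights, points)) / total,
             sum(w * p[1] for w, p in zip(weights, points)) / total)
        history.append(energy(points, y))
    return y, history

pts = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]
y, hist = geometric_median(pts)
print(y, hist[0], hist[-1])  # energy at the centroid vs. at the final iterate
```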