For some problems, in the discrete case for example, I have seen people take a min-max problem and replace the inner maximization with an auxiliary variable plus a constraint, like:
$$\min_{x\in X} \max_{y \in Y} f(x, y)$$
if $X, Y$ are finite sets, then you can replace it with
$$\min_{x\in X,\, V} V$$ $$ \text{s.t.} \ V \ge f(x, y) \ \forall y\in Y$$
which gives a finite number of constraints when $Y$ is finite. If $f(x, y)$ is linear in $x$, these even become linear constraints, which are easy: it is just an LP.
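As a concrete sanity check (the numbers below are my own toy example, not from any particular application), the epigraph trick for finite $Y$ can be handed directly to an LP solver such as `scipy.optimize.linprog`, with the stacked variable $z = (x, V)$:

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: f(x, y_j) = a_j . x for three choices of y,
# with x restricted to the probability simplex in R^2.
A = np.array([[1.0, -2.0],
              [-1.0, 3.0],
              [0.5, 0.5]])
n = A.shape[1]

# Decision vector z = (x_1, x_2, V); objective: minimize V.
c = np.array([0.0, 0.0, 1.0])

# Epigraph constraints a_j . x - V <= 0  for every y_j in Y.
A_ub = np.hstack([A, -np.ones((A.shape[0], 1))])
b_ub = np.zeros(A.shape[0])

# Simplex constraint: x_1 + x_2 = 1, x >= 0; V is free.
A_eq = np.array([[1.0, 1.0, 0.0]])
b_eq = np.array([1.0])
bounds = [(0, None), (0, None), (None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x_opt, V_opt = res.x[:n], res.x[n]
print(x_opt, V_opt)  # V_opt = min_x max_j a_j . x
```

At the optimum $V$ equals $\max_j a_j \cdot x$, i.e. the min-max value, which is exactly the point of the reformulation.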
I wonder whether something similar can be done for continuous functions. If I just introduce Lagrange multipliers, the min-max structure reappears, with even more issues, as something like (I am not very experienced with primal-dual conversions, unfortunately):
$$\min_{x\in X,\, V} \max_{\lambda \ge 0,\, y} V + \lambda\big(f(x, y) - V\big)$$
Reformulating the constraint as $V \ge \max_y f(x, y)$ and then using something like a barrier function also does not seem to be a good idea: you end up with an even more complicated min-max problem.
People somehow manage to do this kind of conversion, for example in the Monge–Kantorovich duality theorem, as far as I understand, though I cannot see how to generalize it to an arbitrary problem of the form above.
My primary motivation is the hope that the resulting "min-min" problem might be solved nicely with first-order methods, whereas gradient descent for min-max problems might not even converge. A basic example is the bilinear function $f(x, y) = xy$ (a hyperbolic paraboloid) with a single saddle point at the origin: ordinary simultaneous gradient descent-ascent cannot handle it. Let me know if this seems wrong.
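To illustrate the non-convergence I mean (a minimal sketch; the step size and iteration count are arbitrary choices of mine), simultaneous gradient descent-ascent on $f(x, y) = xy$ spirals away from the saddle at the origin instead of approaching it:

```python
# Simultaneous gradient descent-ascent on f(x, y) = x*y.
# The saddle point is at (0, 0), but the iterates move away from it:
# each step multiplies the distance to the origin by sqrt(1 + eta^2).
eta = 0.1
x, y = 1.0, 1.0
r0 = (x**2 + y**2) ** 0.5  # initial distance to the saddle

for _ in range(100):
    gx, gy = y, x                       # df/dx = y, df/dy = x
    x, y = x - eta * gx, y + eta * gy   # descend in x, ascend in y

r1 = (x**2 + y**2) ** 0.5  # final distance: strictly larger than r0
print(r0, r1)
```

A short calculation confirms the divergence: the update is a rotation composed with a scaling by $\sqrt{1+\eta^2} > 1$, so the iterates spiral outward for any positive step size.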
Thank you!