How to interpret functions that are defined in terms of $\text{argmax}$ in optimization?


In optimization, you sometimes encounter functions or difference equations defined in terms of $\text{argmin}$ or $\text{argmax}$, for example

$p: X \to X, x \mapsto \text{argmin}_{y \in X} \phi(y) + \|x-y\|^2$

I am puzzled by the fact that we need to solve an optimization problem in order to evaluate these functions.

For example, consider $p: X \to X, x \mapsto \text{argmin}_{y \in X} \phi(y) + \|x-y\|^2$.

On the one hand, the interpretation of this function is clear. Given a vector $x \in X$, we map it to a vector $y \in X$ that minimizes $\phi(y) + \|x-y\|^2$.

But on the other hand, it is not clear whether this map is even well defined, since it is well known in optimization that an $\text{argmin}$ can have one, multiple, or no solutions. So given a sequence $x_k$, $p(x_k)$ could yield something like $y_1, \{y_{21},y_{22}\}, \text{no solution}, y_4, \dots$
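To make the concern concrete, here is a small illustration with $X = \mathbb{R}$ and two specific choices of $\phi$ (my own examples, not from the post):

```latex
\text{Take } X = \mathbb{R} \text{ and write } f_x(y) := \phi(y) + (x-y)^2.
\\[4pt]
\phi(y) = -2y^2: \quad f_x(y) = -y^2 - 2xy + x^2 \to -\infty
  \ \text{as } |y| \to \infty,
  \ \text{so } \operatorname{argmin}_y f_x(y) = \emptyset.
\\[4pt]
\phi(y) = -2|y|,\ x = 0: \quad f_0(y) = y^2 - 2|y|
  \ \text{is minimized at both } y = 1 \text{ and } y = -1,
  \ \text{so } \operatorname{argmin}_y f_0(y) = \{-1, +1\}.
```

Both pathologies disappear once $\phi$ is restricted to be convex, which is the point of the accepted answer.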

Further, it is not at all clear how this minimizer $y$ is found; the algorithm is usually not stated as part of the function. I would be much happier if the function were defined as "$p: X \to X, x \mapsto \text{argmin}_{y \in X} \phi(y) + \|x-y\|^2$, where we solve for $y$ using ... algorithm, which guarantees a unique minimizer."

How do people in optimization deal with this ambiguity?

Best answer:

Regarding your first question: indeed, one has to take care when defining a mapping like this to ensure that the solution to the minimization/maximization is unique.

Fortunately, in this case, we likely have that. The specific mapping you have chosen here is (up to a scaling convention) the proximal mapping, and in the convex optimization settings where it is often found, the function $\phi$ is convex (although not necessarily differentiable). If that is the case, then the squared term $\|x-y\|^2$ makes the objective strongly convex, which guarantees that the minimization yields a single, unique solution.
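A sketch of the standard argument (my paraphrase, not part of the original answer): for convex $\phi$, the quadratic term makes $f_x(y) := \phi(y) + \|x-y\|^2$ 2-strongly convex, since $\|x-y\|^2$ satisfies the parallelogram-type identity exactly and $\phi$ is convex:

```latex
f_x\big(t y_1 + (1-t) y_2\big)
  \;\le\; t\, f_x(y_1) + (1-t)\, f_x(y_2) - t(1-t)\,\|y_1 - y_2\|^2,
\qquad t \in [0,1].
```

The strict inequality for $y_1 \neq y_2$ and $t \in (0,1)$ rules out two distinct minimizers, and strong convexity also makes $f_x$ coercive, so a minimizer exists (e.g. when $\phi$ is lower semicontinuous and $X$ is closed).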

Regarding your second question: the practice of mathematics does not require that the algorithm for solving an optimization problem be provided before that problem can be studied, or before anything useful can be said about it. That's a good thing, because there simply is no way one can define a single, universal, practical algorithm for solving optimization problems, even convex ones.

Take the specific example of the proximal map: the solution method depends critically on the choice of $\phi$. For some choices, like $\phi(y)=\|y\|_1$, the solution can be obtained analytically (soft thresholding). For others, an iterative numerical approach (like gradient descent or Newton's method) is required. Thus, while there are general principles and patterns that can guide you, the solution method must be determined on a case-by-case basis.
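As a one-dimensional sketch of both situations (the names and the parameter $\lambda$ are my own; note that with the post's un-scaled quadratic $(x-y)^2$ the soft-threshold level is $\lambda/2$, whereas the more common convention $\tfrac{1}{2}(x-y)^2$ gives threshold $\lambda$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox_l1(x, lam=1.0):
    """Closed-form argmin_y lam*|y| + (x - y)**2: soft thresholding.

    With the un-scaled quadratic, setting the (sub)derivative to zero
    gives y = sign(x) * max(|x| - lam/2, 0).
    """
    return np.sign(x) * np.maximum(np.abs(x) - lam / 2.0, 0.0)

def prox_numeric(x, phi, bounds=(-10.0, 10.0)):
    """Generic 1-D prox, evaluated by an iterative solver (illustration only)."""
    res = minimize_scalar(lambda y: phi(y) + (x - y) ** 2,
                          bounds=bounds, method="bounded")
    return res.x

# For phi = |.| the analytic and numeric evaluations agree:
for x in [-2.0, -0.3, 0.0, 0.7, 3.0]:
    assert abs(prox_l1(x) - prox_numeric(x, abs)) < 1e-3
```

The point of the two functions is exactly the case distinction in the answer: `prox_l1` needs no algorithm at all, while `prox_numeric` hides an iterative solver behind the same $\text{argmin}$ notation.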

But again, there is quite a bit we can say about the proximal mapping before we ever get around to solving it.