Intuition behind the max-min inequality

998 Views Asked by At

Here is the statement of the max-min inequality:

enter image description here

I understand the proof, but I'm struggling to visualize this inequality.

Here is my attempt at intuition. Let $f: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. The first slot of $f$ is the $z$-coordinate and the second is the $w$-coordinate. Then to find the value of $\sup_{z \in Z} \inf_{w \in W} f(z, w)$, we first walk along the $z$-axis. At each point along the $z$-axis, we stop and look out over the line that is parallel to the $w$-axis and intersects the point where we are currently standing. We record the elevation of the lowest valley created by $f$ along this line. We continue this process until we've walked along the entirety of the $z$-axis. At the end, we scour our notebook and return the largest number we recorded. In other words, $\sup_{z \in Z} \inf_{w \in W} f(z, w)$ is the highest, lowest valley found through this process.

$\inf_{w \in W} \sup_{z \in Z} f(z, w)$ can be found through an analogous process. This time we walk along the $w$-axis and look out over the lines parallel to the $z$-axis. We note the highest peak along each line parallel to the $z$-axis. At the end, we return the lowest of the numbers we recorded, i.e., the lowest, highest peak.

I am struggling to intuitively see why the highest, lowest valley will always be lower than the lowest, highest peak.

Is there a better interpretation of this inequality? Thanks!

3

There are 3 best solutions below

0
On

You can translate each step of the proof into your analogy. For intuition's sake, I've assumed in some places that the infima/suprema are actually attained, but the argument holds even if the they are not attained.


If you stand at a fixed $z=z_0$ and look at the line parallel to the $w$-axis, by definition, the lowest valley $\inf_w f(z_0, w)$ will be less than or equal to any other point on this line $f(z_0, w_0)$ (for any $w_0$). So far, this leads to $$\inf_w f(z_0, w) \le f(z_0, w_0).$$


This is true for any $z_0$, so we can take a supremum on both sides to obtain $$\sup_z \inf_w f(z, w) \le \sup_z f(z, w_0).$$ The intuition is that the left-hand side is the "highest lowest valley" in your words. If we fix $z$ at the maximizer $z^*$, the lowest valley is still lower than any other point on that line where $z=z^*$, i.e. $\sup_z \inf_w f(z,w) = \inf_w f(z^*, w) \le f(z^*, w_0)$. But this last value is smaller than the highest peak when fixing $w=w_0$, i.e. $f(z^*, w_0) \le \sup_z f(z, w_0)$.


The above is true for any $w_0$, so we can take an infimum over the right-hand side to obtain $$\sup_z \inf_w f(z, w) \le \inf_w \sup_z f(z, w).$$ If you like, you can let $w^*$ be the minimizer on the right-hand side, and then step through the intuition above with $w^*$ instead of $w_0$.

0
On

I think it makes sense when you step through the proof heuristically. So using your analogy, imagine first walking across the $z$ axis and for every $z$ finding the value of $w$ that minimizes $f(z,w)$. Now clearly this is going to be smaller or equal to any $f(z,w)$ for all $z$ and $w$ because we essentially just did that operation and picked the smallest value. Now suppose we take that list of $w$ that gives us the minimum value for any $z$ and we now only consider the value of $z$ that gives us the largest value of $f(z,w)$ for that list of minimizing $w$ values. This is the max-min value. And note that because we already knew that no matter the choice of $z$ or $w$ this would be less than or equal to the corresponding $f(z,w)$ we have the inequality,

$$\sup_z\inf_w f(z,w)\leq \sup_z f(z,w)$$

Now we are almost done. All that is left is to realize is that this inequality holds for any choice of $w$ on the right-hand side and so it must also hold for the minimizing $w$ of $\sup_z f(z,w)$. And so,

$$\sup_z\inf_w f(z,w)\leq \inf_w\sup_z f(z,w)$$

So what this is saying is that when we found the lowest valley for every $z$, even when we then found the $z$ so that this valley is the "highest lowest valley". It has to be as low or lower than the "lowest highest peak" because we know the highest lowest valley is lower than every peak from the first expression. But then it must also be lower or at least as low as the lowest peak.

0
On

Another path towards understanding the intuition of this statement is the game theory interpretation. Let's say we have players $Z$ and $W$ and a function $f(z, w)$ which is the result/score of the game. Player $Z$ wants a high score while $W$ wants a low score.

Say one player chooses a value first, then the other player chooses their value, then you get the resulting score $f(z, w)$.

The intuition is that going second is always better for the player. if $W$ can choose their move based off $Z$'s move, then this always gives a better (smaller) outcome compared with $Z$ choosing their move based off $W$'s move. Case in point, winning in rock-paper-scissors is much easier if you know your opponent's move ahead of time rather than vice versa.

$\sup_z \inf_w f(z, w)$ translates into "first $z$ is chosen, then $w$ is chosen." ($Z$ goes first)

$\inf_w \sup_z f(z, w)$ translates into "first $w$ is chosen, then $z$ is chosen." ($W$ goes first)

To go along with the proof, we could say:

  • $g(z)$ chooses the best strategy for $W$ given $z$ will be played. For any choice of $z$ or $w$, $g(z)$ will be better: $$ \forall z, w: g(z) \leq f(z, w)$$
  • So if player $Z$ chooses something, knowing that player $W$ will choose after seeing this (i.e. $\sup_z g(z)$) then this is going to be worse for $Z$ than if $Z$ could choose given any alternative $w$ (i.e $\sup_z f(z, w)$): $$ \forall w: \sup_z g(z) \leq \sup_z f(z, w)$$
  • This is also still worse no matter what $w$ is played, since $Z$ chooses based on the $w$ chosen. So we can turn the "for all" into an $\inf$ over $w$: $$ \sup_z g(z) \leq \inf_w \sup_z f(z, w)$$

It seems like the crux of the intuition here is being comfortable converting the universal quantifiers into $\inf$'s and $\sup$'s. In the visual analogy with the grid, this represents transferring between thinking about all possible points $(z, w)$ in the grid to a given row/column.