Geometric Intuition for Caratheodory's Theorem (for Convex Sets)


Consider the Wikipedia proof for Caratheodory's Theorem, the statement of which I have reproduced below. In short, I am looking for some geometric intuition about the modified coefficients in the proof, something that I may have been able to "see" for myself if I were asked to prove the theorem without looking it up.

Theorem (Caratheodory). Let $X \subset \mathbb{R}^d$. Then each point of $\mathrm{conv}(X)$ can be written as a convex combination of at most $d+1$ points in $X$.

From the proof, each $y \in \mathrm{conv}(X)$ can be written as the following convex combination, where we assume $k \geq d+2$:

$$ y = \sum_{j=1}^k \lambda_j x_j \text{ with } \sum_{j=1}^k \lambda_j = 1 \text{ and } \lambda_j > 0 \quad \forall\, j=1,\dots,k $$

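Since $k \geq d+2$, the points $x_1,\dots,x_k \in \mathbb{R}^d$ are affinely dependent, so there exist real scalars $\mu_1,\dots,\mu_k$, not all zero (and hence with at least one $\mu_j > 0$), such that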

$$ \sum_{j=1}^k \mu_j x_j = 0 \text{ with } \sum_{j=1}^k \mu_j = 0 $$

The remainder of the proof uses some funky manipulations of the coefficients for $y$ to show that one of the points in the convex combination for $y$ is really unnecessary. The new coefficients are:

$$ y = \sum_{j=1}^k \left(\lambda_j - \frac{\lambda_i}{\mu_i} \mu_j \right) x_j $$

where $i = \arg\min_{j \;:\; \mu_j > 0} \frac{\lambda_j}{\mu_j}$. The $i$th coefficient turns out to be zero, completing the proof. I understand why this choice of coefficients is desirable, but I do not understand why it's the "right" or "obvious" choice. My own drawings do not make the situation any clearer to me.

What do the new coefficients mean geometrically, and in particular, how can I interpret the ratio $\lambda_i/\mu_i$ geometrically? What does the $\min$ correspond to?


There are 2 best solutions below

3
On BEST ANSWER

You basically add

$$ y = \sum_{j = 1}^k \lambda_j \, x_j $$

and

$$ 0 = \sum_{j = 1}^k \alpha \, \mu_j \, x_j $$

for some $\alpha \in \mathbb{R}$. This yields

$$ y = \sum_{j = 1}^k \underbrace{(\lambda_j + \alpha \, \mu_j)}_{=:\Lambda_j} \, x_j. $$

Since $\sum_{j=1}^k \mu_j = 0$, this directly yields

$$ \sum_{j=1}^k \Lambda_j = 1. $$

However, you additionally need

$$ \Lambda_j \ge 0 \;\forall j \qquad\text{and}\qquad \Lambda_i = 0 \text{ for some } i, $$

so that you obtain a convex combination in which one coefficient is zero.

Now, try to figure out how to choose $\alpha$ and $i$.
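For readers who want to check their choice numerically, here is a minimal sketch of one reduction step in Python. It is not taken from the Wikipedia proof. The function names `affine_dependence` and `caratheodory_step` and the sample points are assumptions made for illustration. It uses exact `Fraction` arithmetic and picks $\alpha = -\lambda_i/\mu_i$ with $i$ minimizing $\lambda_j/\mu_j$ over $\mu_j > 0$, which is the largest step along $\mu$ that keeps every $\Lambda_j \ge 0$.

```python
from fractions import Fraction


def affine_dependence(points):
    """Return mu, not all zero, with sum(mu_j * x_j) = 0 and sum(mu_j) = 0.

    Assumes k >= d + 2 points in R^d, so the (d+1) x k system has a free column.
    """
    k, d = len(points), len(points[0])
    # One row per coordinate, plus a row of ones for the constraint sum(mu) = 0.
    rows = [[Fraction(p[c]) for p in points] for c in range(d)]
    rows.append([Fraction(1)] * k)
    pivots, r = [], 0
    for c in range(k):  # Gauss-Jordan elimination to reduced row echelon form
        pr = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # Set one free variable to 1 and solve for the pivot variables.
    free = next(c for c in range(k) if c not in pivots)
    mu = [Fraction(0)] * k
    mu[free] = Fraction(1)
    for row, c in zip(rows, pivots):
        mu[c] = -row[free]
    return mu


def caratheodory_step(points, lam):
    """Return weights Lambda_j = lam_j + alpha * mu_j with at least one zero."""
    mu = affine_dependence(points)
    # mu sums to zero and is not all zero, so some mu_j is positive.
    i = min((j for j in range(len(mu)) if mu[j] > 0),
            key=lambda j: lam[j] / mu[j])
    alpha = -lam[i] / mu[i]
    return [l + alpha * m for l, m in zip(lam, mu)]


# Hypothetical example: four points in the plane, equal weights 1/4.
pts = [(0, 0), (1, 0), (0, 1), (Fraction(1, 4), Fraction(1, 4))]
lam = [Fraction(1, 4)] * 4
print(caratheodory_step(pts, lam))  # weights 3/8, 5/16, 5/16, 0
```

Geometrically, $\lambda + \alpha\mu$ traces a line inside the hyperplane $\sum_j \Lambda_j = 1$ along which the represented point $y$ does not change. The ratio test stops at the first moment this line hits the boundary of the simplex $\{\Lambda \ge 0\}$, where some coefficient vanishes.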

5
On

Beyond the geometric picture, you can also see this physically.

Take three points $A,B,C$ in 2D, i.e. three vectors. Assign them non-negative weights, as if they were point masses.
Their weighted average, the barycenter, lies inside the triangle, which is their convex hull.
For the weighted average to lie outside the triangle, the weights must have different signs. You can "see" this by imagining two of the masses subject to gravity while the third is being pulled up.

Now take a fourth point $Q$ inside the triangle: it is a weighted average of $A,B,C$ with certain non-negative weights.
A further point $P$ which is a non-negative weighted average of $A,B,C,Q$ will reduce to a non-negative weighted average of just $A,B,C$, but not in general of e.g. $A,B,Q$, because $C$, expressed in terms of $A,B,Q$, has some negative coefficients.
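As a concrete check, here is a small worked instance. The coordinates are assumptions chosen for illustration: $A=(0,0)$, $B=(1,0)$, $C=(0,1)$, $Q=(\tfrac14,\tfrac14)$. Then

$$ Q = \tfrac12 A + \tfrac14 B + \tfrac14 C, $$

and substituting this into an equal-weight average of all four points gives non-negative weights on $A,B,C$ alone:

$$ P = \tfrac14 (A + B + C + Q) = \tfrac38 A + \tfrac{5}{16} B + \tfrac{5}{16} C. $$

Going the other way, solving for $C$ gives $C = 4Q - 2A - B$, which has negative coefficients. A point that leans heavily on $C$ therefore picks up negative weights when $C$ is eliminated:

$$ P' = \tfrac{1}{10} A + \tfrac{1}{10} B + \tfrac{7}{10} C + \tfrac{1}{10} Q = -\tfrac{13}{10} A - \tfrac{6}{10} B + \tfrac{29}{10} Q. $$

In both cases the weights still sum to $1$. Only the non-negativity depends on which point is dropped.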

--- in reply to your comment ---

If $Q$ is instead taken outside of $\triangle ABC$, then its expression in terms of $A,B,C$ will contain some negative weights.
Now a non-negative combination of $A,B,C,Q$, i.e. a point $P$ inside that quadrilateral, will not always reduce to a non-negative combination of $A,B,C$.

We conclude that, given $n$ points, their weighted average with non-negative weights, i.e. their linear combination with coefficients in $[0,1]$ summing to $1$ (a convex combination), always reduces to a convex combination of the $m \le n$ points which define the convex hull of the $n$ points.