Variance computation in B. Maurey's 'empirical method' proof of the approximate form of Caratheodory's theorem


I am currently studying the textbook High-Dimensional Probability by Roman Vershynin. Theorem 0.0.2 of the opening chapter Using Probability to Cover a Geometric Set presents the Approximate form of Caratheodory's theorem as follows:

Theorem 0.0.2 (Approximate form of Caratheodory's theorem) Consider a set $T \subset \mathbb{R}^n$ whose diameter$^1$ is bounded by $1$. Then, for every point $x \in \text{conv}(T)$ and every integer $k$, one can find points $x_1, \dots, x_k \in T$ such that $$\left\vert \left\vert x - \dfrac{1}{k} \sum_{j = 1}^k x_j \right\vert \right\vert_2 \le \dfrac{1}{\sqrt{k}}.$$ There are two reasons why this result is surprising. First, the number of points $k$ in convex combinations does not depend on the dimension $n$. Second, the coefficients of convex combinations can be made all equal. (Note, however, that repetitions among the points $x_i$ are allowed.)

$^1$ The diameter of $T$ is defined as $\text{diam}(T) = \text{sup} \{ \vert\vert s - t \vert\vert_2 : s, t \in T \}$. We have assumed that $\text{diam}(T) = 1$ for simplicity. For a general set $T$, the bound in the theorem changes to $\text{diam}(T)/\sqrt{k}$. Check this!

Proof Our argument is known as the empirical method of B. Maurey.
Translating $T$ if necessary, we may assume that not only the diameter but also the radius of $T$ is bounded by $1$, i.e., $$\vert\vert t \vert\vert_2 \le 1 \ \ \ \text{for all $t \in T$}. \tag{0.2}$$ Fix a point $x \in \text{conv}(T)$ and express it as a convex combination of some vectors $z_1, \dots, z_m \in T$ as in (0.1). Now, interpret the definition of the convex combination (0.1) probabilistically, with the $\lambda_i$ taking the roles of probabilities. Specifically, we can define a random vector $Z$ that takes the value $z_i$ with probability $\lambda_i$: $$\mathbb{P} \{ Z = z_i \} = \lambda_i, \ \ \ i = 1, \dots, m.$$ (This is possible by the fact that the weights $\lambda_i$ are non-negative and sum to $1$.) Then $$\mathbb{E} Z = \sum_{i = 1}^m \lambda_i z_i = x .$$ Consider independent copies $Z_1, Z_2, \dots$ of $Z$. by the strong law of large numbers, $$\dfrac{1}{k} \sum_{j = 1}^k Z_j \to x \ \ \ \text{almost surely as $k \to \infty$} .$$ To get a quantitative form of this result, let us compute the variance of $\dfrac{1}{k} \sum_{j = 1}^k Z_j$. (Incidentally, this computation is at the heart of the proof of the weak law of large numbers.) We obtain $$\begin{align} \mathbb{E} \left\vert \left\vert x - \dfrac{1}{k} \sum_{j = 1}^k Z_j \right\vert \right\vert_2^2 &= \dfrac{1}{k^2} \mathbb{E} \left\vert \left\vert \sum_{j = 1}^k (Z_j - x) \right\vert \right\vert_2^2 \ \ \ \text{(since $\mathbb{E} (Z_i - x) = 0)$} \\ &= \dfrac{1}{k^2} \sum_{j = 1}^k \mathbb{E} \vert\vert Z_j - x \vert\vert_2^2 . \end{align}$$

I am unsure about this part:

$$\begin{align} \mathbb{E} \left\vert \left\vert x - \dfrac{1}{k} \sum_{j = 1}^k Z_j \right\vert \right\vert_2^2 &= \dfrac{1}{k^2} \mathbb{E} \left\vert \left\vert \sum_{j = 1}^k (Z_j - x) \right\vert \right\vert_2^2 \ \ \ \text{(since $\mathbb{E} (Z_i - x) = 0)$} \\ &= \dfrac{1}{k^2} \sum_{j = 1}^k \mathbb{E} \vert\vert Z_j - x \vert\vert_2^2 . \end{align}$$

The Wikipedia article for variance derives variance using expected values as follows:

$${\displaystyle {\begin{aligned}\operatorname {Var} (X)&=\operatorname {E} \left[(X-\operatorname {E} [X])^{2}\right]\\[4pt]&=\operatorname {E} \left[X^{2}-2X\operatorname {E} [X]+\operatorname {E} [X]^{2}\right]\\[4pt]&=\operatorname {E} \left[X^{2}\right]-2\operatorname {E} [X]\operatorname {E} [X]+\operatorname {E} [X]^{2}\\[4pt]&=\operatorname {E} \left[X^{2}\right]-\operatorname {E} [X]^{2}\end{aligned}}}$$

Comparing this Wikipedia proof to the textbook proof, I don't see the correspondence. In particular, I don't see how the expressions $\mathbb{E} \left\vert \left\vert x - \dfrac{1}{k} \sum_{j = 1}^k Z_j \right\vert \right\vert_2^2$ and $\dfrac{1}{k^2} \sum_{j = 1}^k \mathbb{E} \vert\vert Z_j - x \vert\vert_2^2$ are analogous to the expressions in the Wikipedia proof of variance. It seems to me that the presence of the $L^2$-norm is the odd one out. So how exactly is this textbook proof analogous to a variance computation of $\dfrac{1}{k} \sum_{j = 1}^k Z_j$?

Answer

By linearity of expectation we have $$\Bbb E\left[\sum_{j=1}^kZ_j\right]=\sum_{j=1}^k\Bbb E[Z_j]=\sum_{j=1}^kx=kx\implies \Bbb E\left[\frac1k\sum_{j=1}^kZ_j\right]=x.$$ The Wikipedia definition of variance generalizes to random vectors by replacing the square with the squared $\ell^2$-norm, since the natural analogue of squaring a vector is taking the dot product of the vector with itself: $$\Bbb V\left[\frac1k\sum_{j=1}^kZ_j\right]=\Bbb E\left[\left\|\frac1k\sum_{j=1}^kZ_j-\Bbb E\left[\frac1k\sum_{j=1}^kZ_j\right]\right\|_2^2\right].$$ Substituting the first equation, $$\Bbb V\left[\frac1k\sum_{j=1}^kZ_j\right]=\Bbb E\left[\left\|x-\frac1k\sum_{j=1}^kZ_j\right\|_2^2\right]=\Bbb E\left[\left\|\frac1k\left(kx-\sum_{j=1}^kZ_j\right)\right\|_2^2\right],$$ and the rest is straightforward: pull the factor $1/k^2$ out of the norm and expand the squared norm of the sum, where the cross terms $\Bbb E\langle Z_i-x,Z_j-x\rangle$ for $i\ne j$ vanish by independence together with $\Bbb E(Z_j-x)=0$.
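To make "the rest is straightforward" concrete, one can also verify numerically that the cross terms really do vanish, i.e. that $\Bbb E\left\|\sum_{j=1}^k(Z_j-x)\right\|_2^2 = \sum_{j=1}^k \Bbb E\|Z_j-x\|_2^2$ for i.i.d. copies with mean $x$. Here is a small Monte Carlo sketch (toy data of my own, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(1)

n, m, k, trials = 5, 4, 3, 200000

# Discrete random vector Z: values z_i in R^n with probabilities lam_i
Z_vals = rng.standard_normal((m, n))
lam = rng.random(m)
lam /= lam.sum()
x = lam @ Z_vals  # x = E[Z]

# Monte Carlo estimate of the left side E||sum_j (Z_j - x)||_2^2
idx = rng.choice(m, size=(trials, k), p=lam)
samples = Z_vals[idx]                       # shape (trials, k, n)
diff = samples.sum(axis=1) - k * x          # sum_j (Z_j - x) per trial
lhs = np.mean(np.sum(diff ** 2, axis=1))

# Exact right side: sum_j E||Z_j - x||_2^2 = k * E||Z - x||_2^2
rhs = k * np.sum(lam * np.sum((Z_vals - x) ** 2, axis=1))

print(lhs, rhs)  # the two sides agree up to Monte Carlo error
```

The agreement of the two sides is exactly the statement that the cross terms contribute nothing in expectation.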