Application of Pinsker's inequality


I am not very experienced with Information Theory and I am trying to solve the following exercise from Rao & Yehudayoff's book: Let $n$ be odd and let $x \in \{0,1\}^n$ be sampled uniformly among all $x$ with $\sum_{i=1}^n x_i \geq n/2$. Use Pinsker's inequality to show that the expected number of $1$'s is at most $n/2 + O(\sqrt{n})$. Here is my attempt:

I define the random variables $A_1, \ldots, A_n$ to be uniform on $\{0,1\}$ and independent, and set $B = 1$ if and only if $\sum_{i=1}^n A_i \geq n/2$, and $B = 0$ otherwise.

Thus $I(A_1, \cdots, A_n : B) \leq 1$ (the information $B$ conveys about $A_1, \cdots, A_n$ is at most one bit).
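As a quick numerical sanity check of this step (not part of the proof): since $B$ is a deterministic function of $A_1, \ldots, A_n$, we have $I(A_1, \cdots, A_n : B) = H(B)$, and for odd $n$ the bit-flip symmetry $x \mapsto 1 - x$ gives $\Pr[B = 1] = 1/2$ exactly, so $H(B) = 1$. The script below verifies $\Pr[B = 1] = 1/2$ for a few odd $n$:

```python
from math import comb

def p_b1(n):
    # P(sum of n fair bits >= n/2); for odd n this is P(sum >= (n+1)/2),
    # which equals 1/2 exactly by the symmetry x <-> complement of x.
    return sum(comb(n, k) for k in range((n + 1) // 2, n + 1)) / 2**n

for n in [3, 5, 7, 21]:
    print(n, p_b1(n))
```

(For even $n$ the probability is strictly larger than $1/2$ because of the tie at $\sum_i A_i = n/2$, so $H(B) < 1$ but $I(A:B) \leq 1$ still holds.)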

By Pinsker's inequality (or rather a consequence of it, averaging over $i$, $b$ and $a_{<i}$), we have $$|p(a_i\mid i, b, a_{<i}) - p(a_i)| \leq O(\sqrt{H(B)/n}) = O(1/\sqrt{n})$$ in expectation. Here $p(a_i)$ just means the probability of the event $A_i = a_i$, so the above denotes the fact that the expected total variation per coordinate is bounded by $O(1/\sqrt{n})$: $$\mathbb{E}_{i, a_{<i}, b}\left[|p(a_i\mid i, b, a_{<i}) - p(a_i)|\right] \leq \epsilon := O(\sqrt{H(B)/n})$$ Since $b = 1$ with probability $1/2$ (this is where $n$ odd enters: by the symmetry $x \mapsto 1-x$, the complementary events $\sum_i A_i \geq n/2$ and $\sum_i A_i < n/2$ are equiprobable), conditioning on $b=1$ and applying Markov's inequality: $$\mathbb{E}_{i, a_{<i}, b=1}\left[|p(a_i\mid i, b, a_{<i}) - p(a_i)|\right] \leq 2\epsilon$$ And from this I would like to conclude (*): $$\mathbb{E}_i\left[|p(a_i \mid b = 1) - p(a_i)|\right] \leq 2\epsilon$$ With this, I would argue that $$\mathbb{E}[\text{number of 1's in }x] = \sum_{i=1}^n p(a_i = 1\mid b=1) = n\,\mathbb{E}_{i}[p(a_i = 1\mid b=1)] \leq n\,\mathbb{E}_{i}[|p(a_i = 1\mid b=1) - p(a_i = 1)|] + n\,\mathbb{E}_{i}[p(a_i = 1)] \leq 2\epsilon n + n/2 = O(\sqrt{n}) + n/2$$
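To check that the target bound $n/2 + O(\sqrt{n})$ is what actually happens, here is a brute-force computation of the exact conditional expectation $\mathbb{E}[\sum_i A_i \mid B = 1]$ for odd $n$ (again just a numerical check, not part of the argument): the excess over $n/2$, divided by $\sqrt{n}$, stays bounded (it should tend to $1/\sqrt{2\pi} \approx 0.399$, the half-normal mean of the scaled excess).

```python
from math import comb, sqrt

def cond_expectation(n):
    # Exact E[# of 1's | B = 1] for odd n: condition on sum >= (n+1)/2.
    ks = range((n + 1) // 2, n + 1)
    total = sum(comb(n, k) for k in ks)        # = 2^(n-1) for odd n
    return sum(k * comb(n, k) for k in ks) / total

for n in [3, 5, 21, 101, 1001]:
    e = cond_expectation(n)
    # scaled excess over n/2; should remain O(1)
    print(n, e, (e - n / 2) / sqrt(n))
```

For $n = 3$ this gives $\mathbb{E}[\cdot \mid B=1] = 9/4$, i.e. an excess of $0.75 \approx 0.43\sqrt{3}$ over $n/2$.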

Already, is this more or less correct? (Is the only place where $n$ odd is used the fact that $\Pr[b=1] = 1/2$ exactly?) I would really like to see a super clean argument using Pinsker's inequality. My main issue is (*): I have trouble (very formally) justifying the removal of the averaging over $a_{<i}$, both in the outer expectation $\mathbb{E}$ and inside $p(a_i\mid i, b, a_{<i})$. In general, is it true for jointly distributed random variables $A, B, C$ that $$|p(a \mid B, C) - p(a)| < \epsilon \Rightarrow |p(a \mid B) - p(a)| < \epsilon,$$ i.e. that conditioning on less does not increase the total variation? Can one also do this when these types of statements hold only on average (over, say, $i$ and $a_{<i}$)?

Formally (following the comment of stochastic boy), can I proceed like this to obtain (*)? $$2\epsilon \geq \mathbb{E}_{i, a_{<i}, b=1}\left[|p(a_i\mid i, b, a_{<i}) - p(a_i)|\right] \geq \left|\mathbb{E}_{i, b=1}\, \mathbb{E}_{a_{<i}}\left[p(a_i\mid i, b, a_{<i}) - p(a_i)\right]\right| = \left|\mathbb{E}_{i, b=1}\left[p(a_i\mid i, b) - p(a_i)\right]\right|$$ The second inequality is $|\mathbb{E}[X]| \leq \mathbb{E}|X|$, and in the last step I use $\sum_{a_{<i}} p(a_{<i} \mid b) = 1$ together with the tower property $\sum_{a_{<i}} p(a_i\mid i, b, a_{<i})\, p(a_{<i} \mid b) = p(a_i\mid i, b)$. Is the rest correct?
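The tower-property step I am relying on, $\sum_{a_{<i}} p(a_i\mid b, a_{<i})\, p(a_{<i} \mid b) = p(a_i\mid b)$, can at least be verified by brute force on a small instance (a numerical check of the identity, not a proof; $n = 5$ here is an arbitrary small odd choice):

```python
from itertools import product

n = 5  # small odd n; enumerate all 2^n strings
strings = list(product([0, 1], repeat=n))
b1 = [x for x in strings if sum(x) >= n / 2]   # the event B = 1

def p_ai_given_b1(i):
    # direct conditional probability p(A_i = 1 | B = 1)
    return sum(x[i] for x in b1) / len(b1)

def p_ai_given_b1_averaged(i):
    # average p(A_i = 1 | B = 1, a_{<i}) over prefixes a_{<i},
    # weighted by p(a_{<i} | B = 1); the tower property says this
    # collapses to p(A_i = 1 | B = 1)
    total = 0.0
    for prefix in product([0, 1], repeat=i):
        block = [x for x in b1 if x[:i] == prefix]
        if not block:
            continue
        w = len(block) / len(b1)               # p(a_{<i} | B = 1)
        total += w * sum(x[i] for x in block) / len(block)
    return total

for i in range(n):
    print(i, p_ai_given_b1(i), p_ai_given_b1_averaged(i))
```

Note that the prefix weights here are $p(a_{<i} \mid b)$, not the unconditional $p(a_{<i})$; the identity fails in general if one averages against the unconditional prefix distribution.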