Variance of X vs Variance of a binary function of X


Let $X$ be a random variable in $[0, 1]$ and $m$ its median such that $P(X \le m) = P(X \ge m)$.

Define $\beta(X)$ as $$\beta(X) = \begin{cases} 1, & X \ge m; \\ 0, & \text{otherwise.} \end{cases}$$

(a) Is it true that $Var(X) \le Var(\beta(X))$?

(b) What if $X$ is continuous?

Where I got stuck: If $X$ is a discrete random variable, $\beta(X)$ is just Bernoulli with $p = 0.5$ and $Var(\beta(X)) = 0.25$. I couldn't come up with any discrete $X$ whose variance is bigger than that. I tried a simple Bernoulli, and also $X$ taking the values $x_i = 0.5^{i-1}$ with $p(x_{i})=0.5^i$. All the variances are smaller than or equal to 0.25. However, I couldn't come up with a formal proof either.
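For reference, the two attempts described above can be checked numerically. This is only a sketch: the helper names `variance` and `beta_variance` are made up for illustration, the infinite support of the second example is truncated at 60 terms (an assumption that leaves out a mass of about $2^{-60}$), and the medians $m = 0.5$ and $m = 0.75$ are one valid choice each under the definition $P(X \le m) = P(X \ge m)$.

```python
def variance(values, probs):
    """Variance of a discrete random variable from its support and probabilities."""
    mean = sum(v * p for v, p in zip(values, probs))
    second_moment = sum(v * v * p for v, p in zip(values, probs))
    return second_moment - mean ** 2


def beta_variance(values, probs, m):
    """Variance of beta(X) = 1 if X >= m else 0, i.e. P(X >= m) * P(X < m).

    Assumes the probabilities sum to 1, so P(X < m) = 1 - P(X >= m).
    """
    p_upper = sum(p for v, p in zip(values, probs) if v >= m)
    return p_upper * (1.0 - p_upper)


# Attempt 1: X ~ Bernoulli(0.5); m = 0.5 satisfies P(X <= m) = P(X >= m) = 0.5.
bern_values, bern_probs = [0.0, 1.0], [0.5, 0.5]

# Attempt 2: x_i = 0.5**(i-1) with probability 0.5**i, truncated at 60 terms.
# Any m strictly between 0.5 and 1 is a median here; m = 0.75 is used below.
geo_values = [0.5 ** (i - 1) for i in range(1, 61)]
geo_probs = [0.5 ** i for i in range(1, 61)]

print("Bernoulli:", variance(bern_values, bern_probs),
      beta_variance(bern_values, bern_probs, 0.5))
print("Geometric-like:", variance(geo_values, geo_probs),
      beta_variance(geo_values, geo_probs, 0.75))
```

In the second example the exact variance works out to $4/7 - (2/3)^2 = 8/63$, which is below $0.25$.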

Reasoning for (b) will depend on the proof/counterexample with (a) I guess.

Please help!

P.S. First-time poster here. Apologies if something's wrong with my post.


There are 2 best solutions below

0
On BEST ANSWER

Incomplete answer / Possible approach:

First of all, if $P(X=m) = p = 0$, then (regardless of whether $X$ is otherwise continuous) the answer by @GNUSupporter suffices to show that $Var(X) \le {1 \over 4} = Var(\beta(X))$.

So for the rest assume $P(X=m) = p > 0$. I think (a) is true even for this case. (Further I think this is due to the (IMHO) quite restrictive definition of median in the OP - see my comments at the end.)

Define a new random variable $Y$ s.t.

  • $X < m \implies Y = 0$ (this happens with probability $q = {1-p \over 2}$)

  • $X = m \implies Y = m$ (this happens with probability $p$)

  • $X > m \implies Y = 1$ (this happens with probability $q = {1-p \over 2}$)

The following chain shows that $Var(Y) \le Var(\beta(X))$:

$Var(Y) = E[Y^2] - E[Y]^2 = (p m^2 + q) - (pm + q)^2 $

$\quad = p m^2 + q - (p^2 m^2 + 2pqm + q^2)$

$\quad = m^2 p (1-p) - 2pqm + q(1-q) $

$\quad = 2pq m^2 - 2pq m + q (1-q) \quad \quad \text{...because $(1-p)= 2q$}$

$\quad = 2pq m(m-1) + q(1-q) $

$\quad \le q(1-q) \quad \quad \text{...because $m \ge 0$ and $(m-1)\le 0$}$

$\quad = P(X < m)P(X \ge m) = Var(\beta(X))$
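As a sanity check on the algebra, the chain can be evaluated on a grid of $(p, m)$ values. This is a sketch, not part of the proof; the function names and the grid spacing are assumptions chosen for illustration.

```python
def var_y_direct(p, m):
    """Var(Y) computed as E[Y^2] - E[Y]^2 for Y in {0, m, 1} with masses (q, p, q)."""
    q = (1 - p) / 2
    mean = p * m + q
    second_moment = p * m * m + q
    return second_moment - mean ** 2


def var_y_closed(p, m):
    """Closed form from the chain: 2pq m(m-1) + q(1-q)."""
    q = (1 - p) / 2
    return 2 * p * q * m * (m - 1) + q * (1 - q)


def var_beta(p):
    """Var(beta(X)) = P(X < m) P(X >= m) = q(1-q)."""
    q = (1 - p) / 2
    return q * (1 - q)


grid = [i / 20 for i in range(21)]  # p and m each range over 0, 0.05, ..., 1

# Largest disagreement between the direct and the closed-form expression.
max_gap = max(abs(var_y_direct(p, m) - var_y_closed(p, m))
              for p in grid for m in grid)

# Whether Var(Y) <= Var(beta(X)) holds everywhere on the grid (small float slack).
all_bounded = all(var_y_closed(p, m) <= var_beta(p) + 1e-12
                  for p in grid for m in grid)

print(max_gap, all_bounded)
```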

Therefore, all that remains is to show that $Var(X) \le Var(Y)$. This ought to be true from a "moment of inertia" perspective, as we are simply "pushing" the mass on both sides of $m$ to the extreme values of $0, 1$. However, I was not able to prove this. :(
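Short of a proof, the remaining inequality $Var(X) \le Var(Y)$ can at least be probed numerically. The sketch below is a random search, not a proof: it assumes $X$ is discrete with an atom of mass $p$ at $m$ and a few random atoms on each side carrying mass $q = (1-p)/2$ each, and all helper names (`random_group`, `count_violations`) are hypothetical.

```python
import random


def variance(values, probs):
    """Variance of a discrete random variable from its support and probabilities."""
    mean = sum(v * p for v, p in zip(values, probs))
    second_moment = sum(v * v * p for v, p in zip(values, probs))
    return second_moment - mean ** 2


def random_group(rng, low, high, total_mass, max_atoms=4):
    """Random atoms in [low, high] whose probabilities sum to total_mass."""
    n = rng.randint(1, max_atoms)
    points = [rng.uniform(low, high) for _ in range(n)]
    weights = [rng.random() + 1e-9 for _ in range(n)]
    scale = total_mass / sum(weights)
    return points, [w * scale for w in weights]


def count_violations(trials=5000, seed=0):
    """Count random instances where Var(X) exceeds Var(Y) beyond float tolerance."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        m = rng.random()
        p = rng.random()
        q = (1 - p) / 2
        low_points, low_probs = random_group(rng, 0.0, m, q)
        high_points, high_probs = random_group(rng, m, 1.0, q)
        probs = low_probs + [p] + high_probs
        x_values = low_points + [m] + high_points
        # Y pushes the mass below m to 0 and the mass above m to 1.
        y_values = [0.0] * len(low_points) + [m] + [1.0] * len(high_points)
        if variance(x_values, probs) > variance(y_values, probs) + 1e-12:
            violations += 1
    return violations


print(count_violations())
```

A count of zero is only evidence for the claim on the sampled instances; it does not replace the missing argument.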


Further comment: I was very surprised by the above result (if indeed the proof can be completed, i.e. $Var(X) \le Var(Y)$). I now think the reason is that the definition of median in the OP is quite restrictive.

E.g., if $X \in \{0, \epsilon, 1\}$ with probabilities $(0.02, 0.49, 0.49)$ then the median as defined in the OP does not exist.

Instead, let's consider a less restrictive definition: $m'$ is a newmedian if $P(X \ge m') \ge {1\over 2}$ and $P(X \le m') \ge {1 \over 2}$. Then in the example above the newmedian is $\epsilon$. And this finally is a valid "counterexample" when $\epsilon \rightarrow 0^+$:

  • $X \rightarrow Bern(0.49)$

  • $Var(X) \rightarrow (0.51)(0.49) \approx {1 \over 4}$

  • $\beta(X)$ (redefined using newmedian) $\rightarrow Bern(0.98)$

  • $Var(\beta(X)) \rightarrow (0.98)(0.02) \approx 0.02 < Var(X)$
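The limiting values in the list above can be reproduced for a small concrete $\epsilon$. This is a sketch under the assumption $\epsilon = 10^{-3}$; the helper names are invented for illustration.

```python
def variance(values, probs):
    """Variance of a discrete random variable from its support and probabilities."""
    mean = sum(v * p for v, p in zip(values, probs))
    second_moment = sum(v * v * p for v, p in zip(values, probs))
    return second_moment - mean ** 2


def is_newmedian(values, probs, m):
    """Check P(X >= m) >= 1/2 and P(X <= m) >= 1/2."""
    p_ge = sum(p for v, p in zip(values, probs) if v >= m)
    p_le = sum(p for v, p in zip(values, probs) if v <= m)
    return p_ge >= 0.5 and p_le >= 0.5


def newmedian_example(eps):
    """Return (Var(X), Var(beta(X))) for X in {0, eps, 1}, with beta built on m' = eps."""
    values = [0.0, eps, 1.0]
    probs = [0.02, 0.49, 0.49]
    assert is_newmedian(values, probs, eps)
    p_upper = sum(p for v, p in zip(values, probs) if v >= eps)  # P(X >= eps)
    return variance(values, probs), p_upper * (1 - p_upper)


var_x, var_beta = newmedian_example(1e-3)
print(var_x, var_beta)
```

With these numbers $Var(X)$ stays close to $0.25$ while $Var(\beta(X))$ is $(0.98)(0.02)$, so the inequality in (a) fails under the relaxed definition.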

7
On

$$Var(\beta(X)) = E[1_{\{X \ge m\}}^2] - E[1_{\{X \ge m\}}]^2 = \cdots = P(X \ge m) P(X<m)$$

If $X$ is continuous, then $P(X = x) = 0$ for all $x$, so $P(X \ge m) = P(X < m) = \frac12$ and $$Var(\beta(X)) = P(X \ge m) P(X<m) = \frac14.$$

Note that $0 \le X \le 1$, so $0 \le X^2 \le X \le 1$. Hence $$Var(X) = E[X^2] - E[X]^2 \le E[X] - E[X]^2 = E[X] (1 - E[X]) \le \frac14.$$

The last inequality holds because $t(1-t) = \frac14 - \left(t - \frac12\right)^2 \le \frac14$ for every real $t$.
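The bound can be illustrated on simulated data. The sketch below assumes $X$ follows a Beta distribution (any continuous distribution on $[0, 1]$ would do) and uses sample moments with divisor $n$; the function name `sample_check` and the chosen shape parameters are illustrative only.

```python
import random


def sample_check(alpha, beta_param, n=10000, seed=0):
    """Return (sample Var(X), sample E[X](1 - E[X])) for Beta(alpha, beta_param) draws."""
    rng = random.Random(seed)
    xs = [rng.betavariate(alpha, beta_param) for _ in range(n)]
    mean = sum(xs) / n
    second_moment = sum(x * x for x in xs) / n  # at most the mean, since x*x <= x on [0, 1]
    var_x = second_moment - mean ** 2
    bound = mean * (1 - mean)
    return var_x, bound


# A few shapes: U-shaped, uniform, and skewed.
for alpha, beta_param in [(0.2, 0.2), (1.0, 1.0), (2.0, 5.0)]:
    var_x, bound = sample_check(alpha, beta_param)
    print(alpha, beta_param, var_x, bound, var_x <= bound <= 0.25)
```

The U-shaped case is the interesting one: as both shape parameters shrink toward zero the mass concentrates near $0$ and $1$, and the variance approaches the bound $\frac14$.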