Variance of a censored distribution


Let's consider a discrete random variable $\xi$ with probability mass function $p_{\xi}(x), \ x \in A \subset \mathbb{N}$.

Let $p^{u}_{\xi}(x)$ denote the (upper) censored distribution, defined by

$p_{\xi}^u(x) = \left\{ \begin{array}{ll} p_{\xi}(x), & x \in A \cap \{x \in \mathbb{N} \mid x < x_{censored\_value}\} \\ 1 - \sum\limits_{t \in A,\ t < x_{censored\_value}} p_{\xi}(t), & x = x_{censored\_value} \end{array}\right.$

For example, for the Poisson distribution: $p_{\xi}(x) = \frac{\lambda^{x}}{x!} \cdot e^{-\lambda}, \ x \in \{0, 1, 2, \dots \}$

and $p_{\xi}^u(x) = \left\{ \begin{array}{ll} \frac{\lambda^{x}}{x!} \cdot e^{-\lambda}, & x \in \{0, 1, \dots, x_{censored\_value}-1 \} \\ 1 - \sum\limits_{t < x_{censored\_value}} \frac{\lambda^{t}}{t!} \cdot e^{-\lambda}, & x = x_{censored\_value} \end{array}\right.$
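As a quick numerical sanity check of the Poisson example, here is a minimal Python sketch that builds $p_{\xi}^u$ from the formula above and compares its variance with the uncensored Poisson variance $\lambda$. The helper names and the values $\lambda = 3$, $x_{censored\_value} = 4$ are illustrative assumptions, not part of the question.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(xi = x) for xi ~ Poisson(lam)."""
    return lam ** x / math.factorial(x) * math.exp(-lam)


def censored_poisson_pmf(lam: float, x_c: int) -> list[float]:
    """Upper-censored Poisson pmf on {0, 1, ..., x_c}.

    Mass below x_c is unchanged; the whole tail mass P(xi >= x_c) is put on x_c.
    """
    below = [poisson_pmf(x, lam) for x in range(x_c)]
    return below + [1.0 - sum(below)]


def mean_var(pmf: list[float]) -> tuple[float, float]:
    """Mean and variance of a pmf supported on {0, 1, ..., len(pmf) - 1}."""
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var


lam, x_c = 3.0, 4  # illustrative parameters (assumed)
pmf_u = censored_poisson_pmf(lam, x_c)
mean_u, var_u = mean_var(pmf_u)
print(f"censored: mean={mean_u:.4f}, var={var_u:.4f}; uncensored var={lam}")
```

Here the censored variance comes out around 1.52, well below the uncensored value $\lambda = 3$; this is one data point, not a proof.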

Question: Is it true in general that a censored distribution always has lower variance than the original one?


Answer 1

It is true in general, in the weak sense $Var(Y) \le Var(X)$. Suppose you have a random variable $X$ and an upper censoring point $x_c$, and let $Y = \min\{X, x_c\}$ be the censored random variable. Let:

  • $p=\mathbb P( X \le x_c)$ with $1-p=\mathbb P( X \gt x_c)$
  • $q_\le=\mathbb E[ X -x_c\mid X \le x_c]$ and $q_>=\mathbb E[ X-x_c \mid X \gt x_c]$
  • $r_\le=\mathbb E[ (X-x_c)^2 \mid X \le x_c]$ and $r_>=\mathbb E[ (X-x_c)^2 \mid X \gt x_c]$

so $q_\le \le 0$ and $q_> \ge 0$, and by Jensen's inequality $r_\le \ge q_\le^2$ and $r_> \ge q_>^2$.

We can then say

  • $\mathbb E[ X-x_c ] = pq_\le+(1-p)q_>$
  • $\mathbb E[ Y-x_c ] = pq_\le$
  • $\mathbb E[ (X-x_c)^2 ] = pr_\le+(1-p)r_>$
  • $\mathbb E[ (Y-x_c)^2 ] = pr_\le$
  • $Var(X)=Var(X-x_c)= pr_\le+(1-p)r_> -(pq_\le+(1-p)q_>)^2$
  • $Var(Y)=Var(Y-x_c)= pr_\le -(pq_\le)^2$

so $Var(X)-Var(Y)= (1-p)r_> - 2pq_\le(1-p)q_> -((1-p)q_>)^2$ which is non-negative because

  • $-2pq_\le(1-p)q_> \ge 0$, since $q_\le \le 0$ and $q_> \ge 0$, and
  • $(1-p)r_> -((1-p)q_>)^2 \ge (1-p)r_> -(1-p)q_>^2 = (1-p)(r_> -q_>^2) \ge 0$, since $0 \le 1-p \le 1$ and $r_> \ge q_>^2$,

making $Var(X) \ge Var(Y)$ as intuitively expected.
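The decomposition above can be checked exactly on a small example. This is a minimal sketch, assuming a hypothetical finite pmf on $\{0,\dots,5\}$ with $x_c = 2$; it uses exact rational arithmetic (Python's `fractions.Fraction`) so the identity for $Var(X)-Var(Y)$ can be compared with `==` rather than a floating-point tolerance.

```python
from fractions import Fraction


def variance(pmf: dict[int, Fraction]) -> Fraction:
    """Exact variance of a finite pmf given as {value: probability}."""
    mean = sum(x * w for x, w in pmf.items())
    return sum((x - mean) ** 2 * w for x, w in pmf.items())


def censor(pmf: dict[int, Fraction], x_c: int) -> dict[int, Fraction]:
    """Pmf of Y = min(X, x_c): all mass above x_c is moved onto x_c."""
    out: dict[int, Fraction] = {}
    for x, w in pmf.items():
        y = min(x, x_c)
        out[y] = out.get(y, Fraction(0)) + w
    return out


def variance_gap_formula(pmf: dict[int, Fraction], x_c: int) -> Fraction:
    """(1-p) r_> - 2 p q_<= (1-p) q_> - ((1-p) q_>)^2 in the answer's notation.

    Assumes 0 < p < 1 so both conditional expectations are defined.
    """
    p = sum(w for x, w in pmf.items() if x <= x_c)
    q_le = sum((x - x_c) * w for x, w in pmf.items() if x <= x_c) / p
    q_gt = sum((x - x_c) * w for x, w in pmf.items() if x > x_c) / (1 - p)
    r_gt = sum((x - x_c) ** 2 * w for x, w in pmf.items() if x > x_c) / (1 - p)
    return (1 - p) * r_gt - 2 * p * q_le * (1 - p) * q_gt - ((1 - p) * q_gt) ** 2


# Hypothetical pmf with mass on both sides of x_c (any such pmf would do).
pmf = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10),
       3: Fraction(1, 10), 4: Fraction(2, 10), 5: Fraction(1, 10)}
x_c = 2
gap = variance(pmf) - variance(censor(pmf, x_c))
assert gap == variance_gap_formula(pmf, x_c)  # the identity holds exactly
assert gap >= 0                               # Var(X) >= Var(Y)
print(gap)  # 9/5
```

For this pmf, $p = 3/5$, $q_\le = -2/3$, $q_> = 2$ and $r_> = 9/2$, so the cross term $-2pq_\le(1-p)q_> = 16/25$ exactly cancels $((1-p)q_>)^2 = 16/25$ and the gap equals $(1-p)r_> = 9/5$.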

Answer 2

Writing $x_* = x_{\text{censored\_value}}$ for simplicity, we find that the censored distribution $p_{\xi}^u$ is the distribution of the random variable $X$ capped at $x_*$:

$$X \sim p_{\xi} \qquad\implies\qquad \min\{X,x_*\} \sim p_{\xi}^u $$

Moreover, the function $t \mapsto \min\{t, x_*\}$ is $1$-Lipschitz. So, it suffices to prove the following more general statement:

Theorem. Let $X$ be a random variable having finite variance, and let $f : \mathbb{R} \to \mathbb{R}$ be $1$-Lipschitz. Then we have $$ \mathbf{Var}(f(X)) \leq \mathbf{Var}(X). $$

Proof. Let $X'$ be an i.i.d. copy of $X$, meaning that $X$ and $X'$ are independent and identically distributed. Then

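\begin{align*} \mathbf{Var}(X) &= \frac{1}{2}\mathbf{E}[(X - X')^2] \geq \frac{1}{2}\mathbf{E}[(f(X) - f(X'))^2] = \mathbf{Var}(f(X)). \end{align*}

Both equalities use the identity $\mathbf{Var}(Z) = \frac{1}{2}\mathbf{E}[(Z - Z')^2]$ for i.i.d. $Z, Z'$, applied to $(X, X')$ and to $(f(X), f(X'))$, which are again i.i.d.; the inequality holds pointwise because $|f(X) - f(X')| \le |X - X'|$.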
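A small exact check of the Theorem, as a sketch under illustrative assumptions: a hypothetical pmf on $\{0,\dots,5\}$ and three 1-Lipschitz maps chosen for the example (the identity, the censoring map $t \mapsto \min\{t, x_*\}$ with $x_* = 2$, and $t \mapsto |t - 3|$). It evaluates $\frac12\mathbf{E}[(f(X)-f(X'))^2]$ by summing over all pairs $(x, x')$ with weight $p(x)p(x')$, which is exactly the i.i.d.-copy device used in the proof.

```python
from fractions import Fraction
from itertools import product
from typing import Callable

Pmf = dict[int, Fraction]


def variance(pmf: Pmf) -> Fraction:
    """Exact variance of a finite pmf given as {value: probability}."""
    mean = sum(x * w for x, w in pmf.items())
    return sum((x - mean) ** 2 * w for x, w in pmf.items())


def pushforward(pmf: Pmf, f: Callable[[int], int]) -> Pmf:
    """Pmf of f(X) when X ~ pmf."""
    out: Pmf = {}
    for x, w in pmf.items():
        out[f(x)] = out.get(f(x), Fraction(0)) + w
    return out


def half_mean_sq_diff(pmf: Pmf, f: Callable[[int], int]) -> Fraction:
    """(1/2) E[(f(X) - f(X'))^2] for i.i.d. X, X' ~ pmf, summed over all pairs."""
    return Fraction(1, 2) * sum(
        (f(x) - f(y)) ** 2 * wx * wy
        for (x, wx), (y, wy) in product(pmf.items(), repeat=2)
    )


# Hypothetical pmf and 1-Lipschitz maps, chosen only for illustration.
pmf: Pmf = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10),
            3: Fraction(1, 10), 4: Fraction(2, 10), 5: Fraction(1, 10)}
functions: dict[str, Callable[[int], int]] = {
    "identity": lambda t: t,
    "min(t, 2)": lambda t: min(t, 2),  # the censoring map with x_* = 2
    "abs(t - 3)": lambda t: abs(t - 3),
}
for name, f in functions.items():
    lhs = half_mean_sq_diff(pmf, f)
    assert lhs == variance(pushforward(pmf, f))  # Var(Z) = E[(Z - Z')^2] / 2
    assert lhs <= variance(pmf)                  # the Theorem's inequality
    print(name, lhs)
```

This prints `identity 56/25`, `min(t, 2) 11/25` and `abs(t - 3) 16/25`; both non-identity maps give a variance no larger than $\mathbf{Var}(X) = 56/25$, as the Theorem predicts.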