Let $Y$ be a random variable and let $R \in \{0,1\}$ indicate whether $Y$ is observed ($R=1$) or missing ($R=0$), so that
\begin{equation} Y \cdot R = \begin{cases} Y & R=1, \\ 0 & R=0. \end{cases} \end{equation}
Let $(Y_{i},R_{i})_{i=1}^{n}$ denote a sample of size $n$.
In complete-case analysis (CC), also called listwise deletion, we discard all cases with missing values and work only with the observed data.
In the course "Statistical Methods for Analysis With Missing Data", the complete-case (listwise-deletion) mean estimator is defined as
\begin{equation} \widehat{\mu}^c=\frac{\sum_{i=1}^N R_i Y_i}{\sum_{i=1}^N R_i} \end{equation}
It is shown that when $R_{i}$ is independent of $Y_{i}$, i.e. when $Y$ is missing completely at random (MCAR), the estimator $\widehat{\mu}^c$ is consistent.
Quoting the proof [from page 22]:
"It follows that, as $N \rightarrow \infty$, by the weak law of large numbers, \begin{equation} \widehat{\mu}^c=\frac{N^{-1} \sum_{i=1}^N R_i Y_i}{N^{-1} \sum_{i=1}^N R_i} \stackrel{p}{\longrightarrow} \frac{E(R Y)}{E(R)}=\frac{E(R) E(Y)}{E(R)}=\mu. \ \text{"} \end{equation}
It is evident that \begin{equation} Z_{n} := \frac{1}{n} \sum_{i=1}^{n} R_{i} \overset{p}{\longrightarrow} E \left( R \right), \end{equation} where $E(R) \neq 0$. However, to apply the continuous mapping theorem and conclude $\frac{1}{Z_{n}} \overset{p}{\longrightarrow} \frac{1}{E(R)}$, the following needs to be satisfied:
$$Z_{n} \neq 0, \ \forall n, \ \text{a.s.} $$
Questions
- Why should consistency hold? Precisely: why is $$Z_{n} \neq 0, \ \forall n, \ \text{a.s.?} $$
- If $R_{i} \sim \text{Bernoulli}(p)$ and consistency is satisfied, how do I compute the bias
\begin{equation} E \left( \widehat{\mu}^{c} \right) - \mu ? \end{equation}
Suppose $\{R_i\}_{i=1}^{\infty}$ are i.i.d. Bernoulli$(p)$ with $p\in (0,1]$. Suppose $\{Y_i\}_{i=1}^{\infty}$ are i.i.d. with mean $E[Y]$. Moreover, suppose $\{R_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ are independent. Define: $$\hat{\mu}_n= \begin{cases} \frac{\sum_{i=1}^n R_iY_i}{\sum_{i=1}^nR_i} & \text{if } \sum_{i=1}^n R_i\neq 0, \\ 0 & \text{else.} \end{cases}$$ Define $W_n=\sum_{i=1}^nR_i$.
Then \begin{align} E[\hat{\mu}_n] &= \sum_{k=0}^{n} E[\hat{\mu}_n|W_n=k]P[W_n=k]\\ &=0 + \sum_{k=1}^{n} E[\hat{\mu}_n|W_n=k]P[W_n=k]\\ &=\sum_{k=1}^n\left(\frac{\sum_{i=1}^nE[R_iY_i|W_n=k]}{k}\right)P[W_n=k]\\ &=\sum_{k=1}^n\left(\frac{E[Y]}{k}\sum_{i=1}^nE[R_i|W_n=k]\right)P[W_n=k]\\ &=E[Y]\sum_{k=1}^n\frac{1}{k}E\left[\sum_{i=1}^n R_i|W_n=k\right]P[W_n=k]\\ &=E[Y]\sum_{k=1}^{n} P[W_n=k]\\ &=E[Y]P[W_n\neq 0]\\ &=E[Y](1-(1-p)^n) \end{align} where we have used:
$E[R_iY_i|W_n=k]=E[Y]E[R_i|W_n=k]\quad \forall i \in \{1, \ldots, n\}$
$E[\sum_{i=1}^n R_i|W_n=k]=k$
Thus $$\boxed{E[\hat{\mu}_n]=E[Y] - (1-p)^nE[Y] \quad \forall n \in \{1, 2, 3, ...\}}$$ and this is asymptotically unbiased: $$ \lim_{n\rightarrow\infty} E[\hat{\mu}_n]=E[Y]$$
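The closed-form expectation above can be checked by Monte Carlo. A minimal sketch in Python, where the choice of $Y_i \sim \text{Exponential}$ with mean $2$, the values $p=0.3$, $n=5$, and the helper names `mu_hat` and `mc_mean` are all my own illustrative assumptions, not from the course:

```python
import random

def mu_hat(r, y):
    """Complete-case mean, defined as 0 when W_n = sum(r) = 0."""
    w = sum(r)
    return sum(ri * yi for ri, yi in zip(r, y)) / w if w > 0 else 0.0

def mc_mean(n, p, mean_y, trials=200_000, seed=0):
    """Monte Carlo estimate of E[mu_hat_n] with R_i ~ Bernoulli(p),
    Y_i ~ Exponential with mean mean_y, R and Y independent."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        r = [1 if rng.random() < p else 0 for _ in range(n)]
        y = [rng.expovariate(1.0 / mean_y) for _ in range(n)]
        total += mu_hat(r, y)
    return total / trials

n, p, mean_y = 5, 0.3, 2.0
closed_form = mean_y * (1 - (1 - p) ** n)  # E[Y](1 - (1-p)^n)
print(mc_mean(n, p, mean_y), closed_form)
```

The two printed values should agree up to Monte Carlo error, illustrating both the finite-sample bias $-(1-p)^n E[Y]$ and its vanishing as $n$ grows.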
The above addresses the bias issue, which was the main concern as revealed in the comments. Regarding the original question, I think a lot is resolved just by defining $\hat{\mu}_n=0$ in the previously undefined case $\sum_{i=1}^nR_i=0$ (which, for each $n$, has positive probability).

For what it is worth, the strong law of large numbers (SLLN) seems easier to use here: Define events $A, B, C$ as follows: \begin{align} A &= \left\{\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^nR_i = p\right\}\\ B &= \left\{\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^nR_iY_i = pE[Y] \right\}\\ C &= \left\{\lim_{n\rightarrow\infty}\hat{\mu}_n=E[Y]\right\} \end{align} We know by the SLLN that $P[A]=1$ and $P[B]=1$. We also see that if $A$ and $B$ are both true then $C$ must be true: on $A$ we have $\frac{1}{n}\sum_{i=1}^n R_i \rightarrow p > 0$, so $\sum_{i=1}^n R_i > 0$ for all sufficiently large $n$, and hence $\hat{\mu}_n$ eventually equals the ratio $\frac{\frac{1}{n}\sum_{i=1}^n R_iY_i}{\frac{1}{n}\sum_{i=1}^n R_i}$, which on $A\cap B$ converges to $\frac{pE[Y]}{p}=E[Y]$. So $$A\cap B \subseteq C,$$ which implies $$P[A\cap B] \leq P[C].$$ On the other hand, since $P[A]=1$ and $P[B]=1$, we know $P[A\cap B]=1$. Thus $1\leq P[C]$, so $P[C]=1$. Thus, $\hat{\mu}_n\rightarrow E[Y]$ almost surely.
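The almost-sure convergence can also be illustrated numerically along a single sample path. A minimal sketch, assuming for illustration $Y_i \sim \mathcal{N}(2,1)$ and $p=0.3$ (these parameters and the helper name `running_mu_hat` are my own choices):

```python
import random

def running_mu_hat(r_seq, y_seq):
    """Running complete-case mean hat{mu}_n along one realization,
    set to 0 while no value has been observed yet (sum R_i = 0)."""
    num = den = 0.0
    path = []
    for ri, yi in zip(r_seq, y_seq):
        num += ri * yi
        den += ri
        path.append(num / den if den > 0 else 0.0)
    return path

rng = random.Random(42)
p, mean_y, n = 0.3, 2.0, 100_000
r = [1 if rng.random() < p else 0 for _ in range(n)]
y = [rng.gauss(mean_y, 1.0) for _ in range(n)]
path = running_mu_hat(r, y)
for m in (100, 1_000, 100_000):
    print(m, path[m - 1])  # drifts toward E[Y] = 2 as m grows
```

On any fixed seed, the path settles near $E[Y]$, matching the event $C$ above; different seeds give different early fluctuations but the same limit.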