Let $Y$ be a random variable and let $R \in \{0,1\}$ indicate whether $Y$ is observed ($R=1$) or missing ($R=0$), so that
\begin{equation} Y \cdot R = \begin{cases} Y & R=1, \\ 0 & R=0. \end{cases} \end{equation}
Let $(Y_{i},R_{i})_{i=1}^{n}$ denote a sample of size $n$.
In complete-case analysis (CC), also called listwise deletion, we discard all cases with missing values and work only with the observed data.
In the course "Statistical Methods for Analysis With Missing Data", the complete-case (listwise-deletion) mean estimator is defined as
\begin{equation} \widehat{\mu}^c=\frac{\sum_{i=1}^N R_i Y_i}{\sum_{i=1}^N R_i} \end{equation}
It is shown that when $R_{i}$ is independent of $Y_{i}$, i.e. when $Y$ is missing completely at random (MCAR), the estimator $\widehat{\mu}^c$ is consistent.
Quoting the proof [from page 22]:
"It follows that, as $N \rightarrow \infty$, by the weak law of large numbers, \begin{equation} \widehat{\mu}^c=\frac{N^{-1} \sum_{i=1}^N R_i Y_i}{N^{-1} \sum_{i=1}^N R_i} \stackrel{p}{\longrightarrow} \frac{E(R Y)}{E(R)}=\frac{E(R) E(Y)}{E(R)}=\mu. \ \text{"} \end{equation}
It is evident that \begin{equation} Z_{n} := \frac{1}{n} \sum_{i=1}^{n} R_{i} \overset{p}{\longrightarrow} E \left( R \right), \end{equation} where $E(R) \neq 0$. However, to apply the continuous mapping theorem and conclude $\frac{1}{Z_{n}} \overset{p}{\longrightarrow} \frac{1}{E(R)}$, the following needs to be satisfied:
$$Z_{n} \neq 0, \ \forall n, \ \text{a.s.} $$
Questions
- Why should consistency hold? Precisely: why is $$Z_{n} \neq 0, \ \forall n, \ \text{a.s.?} $$
- If $R_{i} \sim \text{Bernoulli}(p)$ and consistency is satisfied, how do I compute the bias
\begin{equation} E \left( \widehat{\mu}^{c} \right) - \mu ? \end{equation}
Suppose $\{R_i\}_{i=1}^{\infty}$ are i.i.d. Bernoulli$(p)$ with $p\in (0,1]$. Suppose $\{Y_i\}_{i=1}^{\infty}$ are i.i.d. with mean $E[Y]$. Moreover, suppose $\{R_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ are independent. Define: $$\hat{\mu}_n= \begin{cases} \frac{\sum_{i=1}^n R_iY_i}{\sum_{i=1}^nR_i} & \text{if } \sum_{i=1}^n R_i\neq 0, \\ 0 & \text{else.} \end{cases}$$ Define $W_n=\sum_{i=1}^nR_i$.
Then \begin{align} E[\hat{\mu}_n] &= \sum_{k=0}^{n} E[\hat{\mu}_n|W_n=k]P[W_n=k]\\ &=0 + \sum_{k=1}^{n} E[\hat{\mu}_n|W_n=k]P[W_n=k]\\ &=\sum_{k=1}^n\left(\frac{\sum_{i=1}^nE[R_iY_i|W_n=k]}{k}\right)P[W_n=k]\\ &=\sum_{k=1}^n\left(\frac{E[Y]}{k}\sum_{i=1}^nE[R_i|W_n=k]\right)P[W_n=k]\\ &=E[Y]\sum_{k=1}^n\frac{1}{k}E\left[\sum_{i=1}^n R_i|W_n=k\right]P[W_n=k]\\ &=E[Y]\sum_{k=1}^{n} P[W_n=k]\\ &=E[Y]P[W_n\neq 0]\\ &=E[Y](1-(1-p)^n) \end{align} where we have used:
$E[R_iY_i|W_n=k]=E[Y]E[R_i|W_n=k]\quad \forall i \in \{1, \ldots, n\}$
$E[\sum_{i=1}^n R_i|W_n=k]=k$
Thus $$\boxed{E[\hat{\mu}_n]=E[Y] - (1-p)^nE[Y] \quad \forall n \in \{1, 2, 3, ...\}}$$ and this is asymptotically unbiased: $$ \lim_{n\rightarrow\infty} E[\hat{\mu}_n]=E[Y]$$
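The closed-form expectation above can be checked by Monte Carlo. A minimal sketch in Python, where the choice of $Y_i \sim \text{Exponential}$ with mean $2$, the values $p=0.3$, $n=5$, and the helper names `mu_hat` and `mc_mean` are all my own illustrative assumptions, not from the course:

```python
import random

def mu_hat(r, y):
    """Complete-case mean, defined as 0 when W_n = sum(r) = 0."""
    w = sum(r)
    return sum(ri * yi for ri, yi in zip(r, y)) / w if w > 0 else 0.0

def mc_mean(n, p, mean_y, trials=200_000, seed=0):
    """Monte Carlo estimate of E[mu_hat_n] with R_i ~ Bernoulli(p),
    Y_i ~ Exponential with mean mean_y, R and Y independent."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        r = [1 if rng.random() < p else 0 for _ in range(n)]
        y = [rng.expovariate(1.0 / mean_y) for _ in range(n)]
        total += mu_hat(r, y)
    return total / trials

n, p, mean_y = 5, 0.3, 2.0
closed_form = mean_y * (1 - (1 - p) ** n)  # E[Y](1 - (1-p)^n)
print(mc_mean(n, p, mean_y), closed_form)
```

The two printed values should agree up to Monte Carlo error, illustrating both the finite-sample bias $-(1-p)^n E[Y]$ and its vanishing as $n$ grows.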
The above addresses the bias issue, which was the main concern as revealed in the comments. Regarding the original question, I think a lot is resolved just by defining $\hat{\mu}_n=0$ in the previously undefined case $\sum_{i=1}^nR_i=0$ (which, for each $n$, has positive probability).

For what it is worth, the strong law of large numbers (SLLN) seems easier to use here: Define events $A, B, C$ as follows: \begin{align} A &= \left\{\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^nR_i = p\right\}\\ B &= \left\{\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^nR_iY_i = pE[Y] \right\}\\ C &= \left\{\lim_{n\rightarrow\infty}\hat{\mu}_n=E[Y]\right\} \end{align} We know by the SLLN that $P[A]=1$ and $P[B]=1$. We also see that if $A$ and $B$ are both true then $C$ must be true: on $A$ we have $\frac{1}{n}\sum_{i=1}^n R_i \rightarrow p > 0$, so $\sum_{i=1}^n R_i > 0$ for all sufficiently large $n$, and hence $\hat{\mu}_n$ eventually equals the ratio $\frac{\frac{1}{n}\sum_{i=1}^n R_iY_i}{\frac{1}{n}\sum_{i=1}^n R_i}$, which on $A\cap B$ converges to $\frac{pE[Y]}{p}=E[Y]$. So $$A\cap B \subseteq C,$$ which implies $$P[A\cap B] \leq P[C].$$ On the other hand, since $P[A]=1$ and $P[B]=1$, we know $P[A\cap B]=1$. Thus $1\leq P[C]$, so $P[C]=1$. Thus, $\hat{\mu}_n\rightarrow E[Y]$ almost surely.
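The almost-sure convergence can also be illustrated numerically along a single sample path. A minimal sketch, assuming for illustration $Y_i \sim \mathcal{N}(2,1)$ and $p=0.3$ (these parameters and the helper name `running_mu_hat` are my own choices):

```python
import random

def running_mu_hat(r_seq, y_seq):
    """Running complete-case mean hat{mu}_n along one realization,
    set to 0 while no value has been observed yet (sum R_i = 0)."""
    num = den = 0.0
    path = []
    for ri, yi in zip(r_seq, y_seq):
        num += ri * yi
        den += ri
        path.append(num / den if den > 0 else 0.0)
    return path

rng = random.Random(42)
p, mean_y, n = 0.3, 2.0, 100_000
r = [1 if rng.random() < p else 0 for _ in range(n)]
y = [rng.gauss(mean_y, 1.0) for _ in range(n)]
path = running_mu_hat(r, y)
for m in (100, 1_000, 100_000):
    print(m, path[m - 1])  # drifts toward E[Y] = 2 as m grows
```

On any fixed seed, the path settles near $E[Y]$, matching the event $C$ above; different seeds give different early fluctuations but the same limit.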