I was reading Little's (1988) paper entitled A Test of Missing Completely at Random for Multivariate Data With Missing Values and Li's (2013) paper entitled Little's test of missing completely at random. Both are incredible papers, but I am simply confused about the difference between missing completely at random (MCAR) and missing at random (MAR).
Let $\mathbf{R}$ be the matrix formed by stacking the row vectors $\mathbf{r}_i$, where $\mathbf{r}_i$ indicates which components of the data vector $\mathbf{y}_i$ are observed: $r_{ij} = 1$ means element $y_{ij}$ is observed, and $r_{ij} = 0$ means it is missing. Also, let $\mathbf{Y}_{obs}$ and $\mathbf{Y}_{miss}$ denote the observed and missing parts of the data matrix $\mathbf{Y}$. Lastly, let $\mathbf{X}$ be a matrix of covariate values.
Li's (2013) paper defines missing at random to be $$\Pr[\mathbf{R}|\mathbf{Y}_{miss},\mathbf{Y}_{obs}, \mathbf{X}] = \Pr[\mathbf{R}|\mathbf{Y}_{obs}, \mathbf{X}],$$ which means that $\mathbf{R}$ is independent of $\mathbf{Y}_{miss}$ but can still depend on $\mathbf{Y}_{obs}$. In other words, the distribution of the missingness indicators depends only on the observed data. She also defines missing completely at random to be $$\Pr[\mathbf{R}|\mathbf{Y}_{miss},\mathbf{Y}_{obs}, \mathbf{X}] = \Pr[\mathbf{R}],$$ which means $\mathbf{R}$ is independent of $\mathbf{Y}_{miss}$, $\mathbf{Y}_{obs}$, and $\mathbf{X}$. In other words, the distribution of the missingness indicators depends on nothing at all: the missingness is completely random.
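To make the two mechanisms concrete, here is a minimal simulation sketch (my own illustration, not from either paper; all variable names are hypothetical) of MCAR and MAR missingness for a variable `y2` whose companion `y1` is always observed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Complete data: y1 is always observed; y2 is subject to missingness.
y1 = rng.normal(size=n)
y2 = 0.5 * y1 + rng.normal(size=n)

# MCAR: each y2 value is observed with a fixed probability,
# independently of y1, y2, and everything else.
r_mcar = rng.random(n) < 0.8          # True = observed

# MAR: the probability of observing y2 depends only on the
# always-observed y1 (logistic in y1), never on y2 itself.
p_obs = 1.0 / (1.0 + np.exp(-y1))
r_mar = rng.random(n) < p_obs         # True = observed
```

Under the MAR mechanism the rows with missing `y2` systematically have lower `y1`, even though the mechanism never looks at the missing values themselves; under MCAR the missing rows look just like the observed ones.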
I just have a series of questions about this that I can't seem to find answers to.
- What's the difference between $\mathbf{R}$ and $\mathbf{Y}_{obs}, \mathbf{Y}_{miss}$?
- Is missing data the data that's literally missing (non-responses) or is it data that we don't have access to?
- The assumption of MCAR is clearly stronger than MAR. How is it testable but MAR isn't?
I am only learning about this from reading Li's 2013 paper, so take this with a grain of salt. But my understanding is:
$\mathbf Y$ is an $n \times p$ matrix of random variables. Then $\mathbf R$ is an $n\times p$ matrix of $\{0,1\}$-valued random variables, which may or may not depend on $\mathbf Y$. Finally, $\mathbf Y_o$ and $\mathbf Y_m$ are random variables entirely determined by $\mathbf Y$ and $\mathbf R$: $\mathbf Y_o$ is all the entries of $\mathbf Y$ for which the corresponding entry of $\mathbf R$ is $1$, and $\mathbf Y_m$ is all the entries of $\mathbf Y$ for which the corresponding entry of $\mathbf R$ is $0$.
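Under these definitions, $\mathbf Y_o$ and $\mathbf Y_m$ are just the entries of $\mathbf Y$ selected by $\mathbf R$. A small NumPy sketch (illustrative only; the names are my own):

```python
import numpy as np

rng = np.random.default_rng(1)

Y = rng.normal(size=(5, 3))       # the complete data matrix (never fully seen)
R = rng.random((5, 3)) < 0.7      # True (=1) where the entry is observed

Y_o = Y[R]                        # observed entries, as a flat array
Y_m = Y[~R]                       # missing entries: real values, hidden from us

# What an analyst actually works with: R, plus Y with hidden entries blanked.
Y_seen = np.where(R, Y, np.nan)
```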
As a result, $\mathbf Y_m$ consists of actual data values that we never get to see. I think it's helpful to think of our probability space as including all the data above, but to restrict our algorithms to work only with $\mathbf R$ and $\mathbf Y_o$.
As for why it's possible to test MCAR but not MAR: because MCAR is a stronger assumption, it's possible to observe data that is unlikely under MCAR (and therefore lets us reject MCAR) but not at all unlikely under MAR. A concrete example: if the rate of missingness in one variable differs sharply across levels of another, fully observed variable, that pattern is inconsistent with MCAR but perfectly consistent with MAR.
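To illustrate, here is a simulation sketch using a simple two-sample $z$-test (my own illustration in the same spirit as, but much cruder than, Little's actual test statistic): when missingness in `y2` depends on the observed `y1` (MAR but not MCAR), comparing the mean of `y1` between missing and observed rows produces overwhelming evidence against MCAR; under genuine MCAR it does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
y1 = rng.normal(size=n)                       # always observed
y2 = 0.5 * y1 + rng.normal(size=n)            # subject to missingness

def missing_mean_gap(r):
    """z-statistic comparing the mean of y1 between rows where y2 is
    observed (r == True) and rows where it is missing (r == False)."""
    a, b = y1[r], y1[~r]
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

r_mcar = rng.random(n) < 0.7                  # missingness ignores the data
r_mar = rng.random(n) < 1.0 / (1.0 + np.exp(-y1))  # tracks observed y1

z_mcar = missing_mean_gap(r_mcar)   # typically |z| small: no evidence vs MCAR
z_mar = missing_mean_gap(r_mar)     # typically |z| large: MCAR rejected
```

The same MAR mechanism passes the test's null behavior in no way: the test has power against it precisely because the pattern it induces is visible in $(\mathbf R, \mathbf Y_o)$ alone.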
Note that "testing for MCAR" does not mean that we'll be able to detect all possible scenarios that violate MCAR. If it did, it would indeed be weird that a stronger assumption is testable when a weaker assumption is not. Instead, "testing for MCAR" means we can compute a $p$-value which:

- is small with probability at most $\alpha$ when MCAR actually holds, and
- tends to be small under (at least some) violations of MCAR.
The second bullet point is necessarily vague, and this is where the power of different tests in different situations comes into play.
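The guarantee half of this (that under a true MCAR mechanism a valid test rejects only at the nominal level) can be checked by simulation. A sketch using a simple two-sample $z$-statistic (my own illustration, not Little's actual test):

```python
import numpy as np

rng = np.random.default_rng(3)

def reject_mcar(n=500, cutoff=1.96):
    """Simulate one dataset with MCAR missingness and run a two-sided
    z-test of whether the mean of the always-observed y1 differs between
    rows where y2 is missing and rows where it is observed."""
    y1 = rng.normal(size=n)
    r = rng.random(n) < 0.7            # MCAR: ignores the data entirely
    a, b = y1[r], y1[~r]
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return abs((a.mean() - b.mean()) / se) > cutoff

# Under a true MCAR null, rejections are false positives and should
# occur at roughly the nominal 5% rate.
rate = np.mean([reject_mcar() for _ in range(2000)])
```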