Self-contained proof of $\chi^2$ test for independence in two-way tables?


How can I derive the asymptotic distribution of the $\chi^2$ statistic for independence in two-way tables?

For those who are unfamiliar, here is the setup of the problem. We classify $n$ individuals according to two categorical variables $A$ and $B$, with levels $\{a_1,\dotsc, a_I\}$ and $\{b_1,\dotsc,b_J\}$. We can arrange the data in a table \begin{array}{c|c|c|c|c|c|} & b_1 & b_2 & \cdots & b_J &\text{total}\\ \hline a_1 & n_{11}& n_{12}&\cdots &n_{1J}&R_1\\ \hline \vdots & \vdots & \vdots & &\vdots&\vdots\\ \hline a_I & n_{I1} & n_{I2}& \cdots & n_{IJ}&R_I\\ \hline \text{total} & C_1 & C_2& \cdots & C_J & n\\ \hline \end{array} where $n_{ij}$ counts the number of individuals classified as $a_i$ and $b_j$, $R_i$ and $C_j$ are the row $i$ and column $j$ totals, respectively, and $n=\sum_{i,j}n_{ij}$. If $A$ and $B$ are independent, then the estimated expected count in cell $(i,j)$ is $E_{ij} = R_i C_j/n$. Pearson's $\chi^2$ statistic is defined as \begin{align*} X^2=\sum_{i,j}\frac{(n_{ij}-E_{ij})^2}{E_{ij}}. \end{align*} If $A$ and $B$ are independent, then $X^2\overset{d}{\to}\chi^2_{(I-1)(J-1)}$ as $n\to\infty$.
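To make the setup concrete, here is a minimal Python sketch (standard library only) that computes $X^2$ from a table of counts and runs a small simulation under independence. It is a sanity check of the claimed limit, not a proof. The function names `pearson_chi2` and `simulate_mean_x2`, the example probabilities, and the convention of skipping cells with $E_{ij}=0$ are my own assumptions for illustration. The simulation draws $A$ and $B$ independently for each of $n$ individuals (so only $n$ is fixed, not the margins) and averages $X^2$ over many replications; if the limit is $\chi^2_{(I-1)(J-1)}$, the average should be close to $(I-1)(J-1)$.

```python
import random


def pearson_chi2(table):
    """Return (X2, dof) for a two-way table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]        # R_i
    col_totals = [sum(col) for col in zip(*table)]  # C_j
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # E_ij = R_i C_j / n
            if e_ij == 0:
                # Assumed convention: an empty row or column contributes nothing.
                continue
            x2 += (n_ij - e_ij) ** 2 / e_ij
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, dof


def simulate_mean_x2(p_rows, p_cols, n, reps, seed=0):
    """Average X2 over `reps` tables of size n drawn with A and B independent."""
    rng = random.Random(seed)
    n_rows, n_cols = len(p_rows), len(p_cols)
    total = 0.0
    for _ in range(reps):
        a = rng.choices(range(n_rows), weights=p_rows, k=n)
        b = rng.choices(range(n_cols), weights=p_cols, k=n)
        table = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(a, b):
            table[i][j] += 1
        x2, _dof = pearson_chi2(table)
        total += x2
    return total / reps


if __name__ == "__main__":
    # Rows exactly proportional: every n_ij equals E_ij, so X2 is 0.
    print(pearson_chi2([[10, 20, 30], [20, 40, 60]]))

    # I = 3, J = 4, so the limiting mean should be near (3 - 1) * (4 - 1) = 6.
    mean_x2 = simulate_mean_x2([0.2, 0.3, 0.5], [0.25] * 4, n=300, reps=1000)
    print(mean_x2)
```

Comparing the simulated mean with $(I-1)(J-1)$ only checks the first moment; comparing empirical quantiles of the simulated $X^2$ values with $\chi^2_{(I-1)(J-1)}$ quantiles would be a sharper check.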

I would like to know how to explicitly calculate this asymptotic distribution. I understand how to derive the asymptotic distribution of Pearson's $\chi^2$ statistic for goodness-of-fit, but this appears significantly more complicated because of the multiple factors and the constant row/column sums, and I'm not quite sure where to begin.