If the hypothesis that the classifications of elements by the characteristics $A$ and $B$ are independent holds, the contingency table is given by $$\begin{array}{|c|c|c|c|} \hline & B1 & B2 & B3 \\ \hline A1 & 7 & 50 & 5 \\ \hline A2 & 52 & 399 & 39 \\ \hline A3 & 5 & 39 & 4 \\ \hline \end{array}$$ The actual number of elements that have both characteristics $A1$ and $B1$ is 6, $A1$ and $B2$ is 48, $A2$ and $B2$ is 402, and $A2$ and $B3$ is 36. Test whether the characteristics are independent at the $5\%$ significance level. $\square$
I know that I have to calculate the observed and expected values to determine the $\chi^2$ value, but what confuses me is the numbers under the table, that is, the counts of elements that have both characteristics. Can anyone give me directions on how to approach this problem?
Here is an outline for working this problem:
1) The table you give contains the nine expected cell counts $E_{ij}$. Start by finding the row, column, and grand totals for this table. For example, the total for the first row is $7 + 50 + 5 = 62$.
2) Now make a separate table for the observed counts $X_{ij}$ in each cell. Start with a 3-by-3 table and write the same row, column, and grand totals as for the table of expected counts above. Then use the numbers below the table to fill in the observed counts. For example, you are given that two of the observed counts in the first row are 6 and 48; then you can find the remaining count because the total must be 62. From the observed counts provided and the row and column totals you computed above, you can fill in all nine cells.
3) Then compute the chi-squared goodness-of-fit (GOF) statistic. It is $$Q = \sum \frac{(X_{ij}-E_{ij})^2}{E_{ij}} \stackrel{aprx}{\sim} \mathsf{Chisq(4)}.$$
The sum is taken over all nine cells. Each of the nine quantities added is called a 'contribution' to the GOF statistic.
4) At the 5% level, the critical value is 9.49 (from tables or from software). If $Q > 9.49,$ then you reject the null hypothesis of independence; otherwise do not reject.
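Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration under the reading of the problem given above (the printed table holds expected counts, and the observed table shares its row and column totals); the helper `fill_from_margins` is a made-up name, not something from your book or from a library.

```python
# Expected counts E_ij from the table in the question (rows A1..A3, columns B1..B3).
expected = [
    [7, 50, 5],
    [52, 399, 39],
    [5, 39, 4],
]

# Step 1: row, column, and grand totals (assumed to be shared by both tables).
row_totals = [sum(row) for row in expected]
col_totals = [sum(col) for col in zip(*expected)]
grand_total = sum(row_totals)

# Step 2: the four observed counts stated in the question; None marks an unknown cell.
observed = [
    [6, 48, None],
    [None, 402, 36],
    [None, None, None],
]

def fill_from_margins(table, row_totals, col_totals):
    """Hypothetical helper: repeatedly fill any row or column that has exactly
    one unknown cell, using its total. Cells it cannot determine stay None."""
    table = [row[:] for row in table]
    progress = True
    while progress:
        progress = False
        for i, row in enumerate(table):
            missing = [j for j, x in enumerate(row) if x is None]
            if len(missing) == 1:
                known = sum(x for x in row if x is not None)
                table[i][missing[0]] = row_totals[i] - known
                progress = True
        for j in range(len(col_totals)):
            col = [table[i][j] for i in range(len(table))]
            missing = [i for i, x in enumerate(col) if x is None]
            if len(missing) == 1:
                known = sum(x for x in col if x is not None)
                table[missing[0]][j] = col_totals[j] - known
                progress = True
    return table

observed = fill_from_margins(observed, row_totals, col_totals)

# Step 3: contribution of each cell, then the statistic Q as their sum.
contributions = [
    [(x - e) ** 2 / e for x, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
Q = sum(sum(row) for row in contributions)

print(row_totals, col_totals, grand_total)
print(observed)
print(round(Q, 3))
```

With these inputs the observed table fills in as 6, 48, 8 / 52, 402, 36 / 6, 38, 4, and $Q$ comes out at about 2.50, which is the number to compare with the critical value in step 4.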
Your book should have a similar formula for the GOF statistic, even if the notation is a little different. The degrees of freedom are given by $\nu = (r-1)(c-1) = (3-1)(3-1) = 4,$ where $r$ is the number of rows and $c$ is the number of columns in the table. (The row and column of totals are not counted.)
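If you have no table at hand, the critical value can be checked numerically. The sketch below relies on the fact that for exactly 4 degrees of freedom the upper-tail probability has the closed form $P(Q > x) = e^{-x/2}(1 + x/2)$, and finds the 5% point by bisection. The function names are made up for this illustration; in practice one would more likely call a library quantile function such as `scipy.stats.chi2.ppf(0.95, 4)`.

```python
import math

def chisq4_sf(x):
    """Upper-tail probability P(Q > x) for a chi-squared variable with
    4 degrees of freedom (closed form valid for this case only)."""
    return math.exp(-x / 2) * (1 + x / 2)

def critical_value(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Bisection for the x with chisq4_sf(x) = alpha.
    Works because the tail probability is decreasing in x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chisq4_sf(mid) > alpha:
            lo = mid   # tail still too heavy: critical value lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

r, c = 3, 3                  # rows and columns, totals excluded
nu = (r - 1) * (c - 1)       # degrees of freedom
crit = critical_value(0.05)

print(nu, round(crit, 2))    # prints: 4 9.49
```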
Note: This problem is unusual because one is usually given the table of observed counts, and then the expected counts need to be found from them, using the row and column totals for the various cells.
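For comparison, here is a sketch of that more usual direction: start from observed counts and compute each expected count as (row total)(column total)/(grand total). The observed table used here is the one that step 2 yields for this problem (6, 48, 8 / 52, 402, 36 / 6, 38, 4); the variable names are only for illustration.

```python
# Observed counts for this problem, as obtained in step 2 of the outline.
observed = [
    [6, 48, 8],
    [52, 402, 36],
    [6, 38, 4],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, E_ij = (row total i) * (column total j) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Rounded to whole numbers for comparison with the table in the question.
rounded = [[round(e) for e in row] for row in expected]

for row in expected:
    print([round(e, 2) for e in row])
print(rounded)
```

Rounded to whole numbers, these expected counts reproduce the table given in the question, which suggests the question's table was obtained this way and then rounded.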