Unclear how the expected values are arrived at in a chi-squared test calculation

Here is a question I'm working on while studying for my introductory statistics final exam, which is coming up in a few weeks. Specifically, I'm attempting part (c):

A factory manufactures widgets. To successfully manufacture widgets, two machines must both be working. The production supervisor has noticed that in a $28$-day period, machine A was broken on $5$ days, and machine B was broken on $8$ days. Assume that whether a machine is broken on a given day is independent of whether it is broken on any other day.

(a) Calculate a $95$% confidence interval for the percentage of days that machine A is broken.

(b) Is there evidence in the data collected that machine B is less reliable than machine A? What additional assumptions are you making, if any?

(c) The production supervisor's intuition suggests that the machines break down together more than should be expected. Over a $60$ day period, she observes that there are $14$ days when only machine A breaks down, $13$ days when only machine B breaks down, and $2$ days when both break down. Does this data support her intuition?

Here's the beginning of the model solutions provided:

[Image: beginning of the model solution to part (c)]

However, it's unclear how they come up with the numbers $33$, $12$, $11$, and $4$ for the expected values. It seems like they're pulling those numbers out of thin air. How do I get them?

I understand that the observed values are $31$, $14$, $13$, and $2$, but I don't understand why the expected values are $33$, $12$, $11$, and $4$. For instance, why couldn't they be $32$, $13$, $12$, and $3$ instead?

For reference, here are the complete solutions, where the solution to (c) starts on page 3: https://ams005-winter16-02.courses.soe.ucsc.edu/system/files/attachments/practice_final_1_solutions.pdf

Best answer

There is nothing magical here. Let $E_{ij}$ be the expected count in cell $(i, j)$ (that is, row $i$, column $j$), and let $X_{ij}$ be the observed count in that cell. For any row $i$, the row total is

$$S_{i, \cdot} = \sum_{k = 1}^n X_{ik}$$

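where $n$ is the number of columns, and similarly the column total is $S_{\cdot, j} = \sum_{k = 1}^m X_{kj}$, with $m$ the number of rows. Then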

$$E_{ij} = \frac{S_{i, \cdot} \, S_{\cdot, j}}{N}, \qquad N = \sum_{k, l} X_{kl}$$

is the expected count in cell $(i, j)$. That is, as I said in my comment, it is the product of the row total and the column total, divided by the grand total.
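The formula follows from the null hypothesis of the chi-squared test of independence: if the row classification and the column classification are independent, the probability of landing in cell $(i, j)$ is the product of the two marginal probabilities. Estimating each marginal probability by its observed proportion, with $N$ denoting the grand total, gives the following short derivation:

$$\begin{align*} N &= \sum_{k, l} X_{kl}, \qquad \hat{p}_{i, \cdot} = \frac{S_{i, \cdot}}{N}, \qquad \hat{p}_{\cdot, j} = \frac{S_{\cdot, j}}{N} \\ E_{ij} &= N \, \hat{p}_{i, \cdot} \, \hat{p}_{\cdot, j} = N \cdot \frac{S_{i, \cdot}}{N} \cdot \frac{S_{\cdot, j}}{N} = \frac{S_{i, \cdot} \, S_{\cdot, j}}{N} \end{align*}$$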

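For your example, arrange the $60$ days in a $2 \times 2$ table with rows for machine B (working, broken) and columns for machine A (working, broken). The observed counts are $X_{11} = 60 - 14 - 13 - 2 = 31$ (neither broken), $X_{12} = 14$ (only A broken), $X_{21} = 13$ (only B broken), and $X_{22} = 2$ (both broken), so the row totals are $45$ and $15$, the column totals are $44$ and $16$, and the grand total is $60$. The expected counts are then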

$$\begin{align*} E_{11} &= \frac{45 \cdot 44}{60} = 33 \\ E_{12} &= \frac{45 \cdot 16}{60} = 12 \\ E_{21} &= \frac{15 \cdot 44}{60} = 11 \\ E_{22} &= \frac{15 \cdot 16}{60} = 4 \end{align*}$$
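If you want to check the arithmetic, here is a small Python sketch (not part of the original answer). It rebuilds the $2 \times 2$ table from the counts given in part (c), computes the row and column totals and the expected counts with the row-times-column-over-total rule, and then the Pearson chi-squared statistic $\sum (O - E)^2 / E$. The layout (rows for machine B, columns for machine A) is an assumption chosen so the cells line up with $E_{11}, \dots, E_{22}$ above, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Counts from part (c): 60 days in total.
only_a, only_b, both, days = 14, 13, 2, 60
neither = days - only_a - only_b - both  # days with no breakdown: 31

# Assumed layout (matches E_11..E_22 in the answer):
# rows = machine B (working, broken), columns = machine A (working, broken).
observed = [
    [neither, only_a],  # B working: A working, A broken
    [only_b, both],     # B broken:  A working, A broken
]

row_totals = [sum(row) for row in observed]        # S_{i,.}
col_totals = [sum(col) for col in zip(*observed)]  # S_{.,j}
grand_total = sum(row_totals)                      # N, the sum of all X_ij

# E_ij = S_{i,.} * S_{.,j} / N, kept exact with Fraction.
expected = [[Fraction(r * c, grand_total) for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic: sum over all cells of (O - E)^2 / E.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print("row totals:", row_totals)
print("column totals:", col_totals)
print("expected counts:")
for row in expected:
    print(" ".join(str(e) for e in row))
print("chi-squared =", chi_sq, "approx", round(float(chi_sq), 3))
```

Running it prints the row totals `[45, 15]`, the column totals `[44, 16]`, the expected table `33 12` / `11 4`, and `chi-squared = 20/11 approx 1.818`. For a $2 \times 2$ table, this statistic is compared against a chi-squared distribution with $(2 - 1)(2 - 1) = 1$ degree of freedom.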

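I hope this answers your question. The short version: there is no guesswork, just these computations. As for $32$, $13$, $12$, and $3$: those numbers happen to have the same row totals ($45$, $15$) and column totals ($44$, $16$), but they are not of the form row total times column total divided by the grand total (for example, $45 \cdot 44 / 60 = 33$, not $32$), so they are not the counts you would expect if the two machines break down independently.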