How do I know if a Binomial model is appropriate?


I have a question about the number of weeks out of 5 in which an event occurs. I have a frequency table for a sample of 40, with x = 0, 1, 2, 3, 4, 5 and frequencies 2, 7, 11, 12, 6, 2.

I have worked out the unbiased population mean and estimate, but then I'm not sure whether a binomial model is what I need or not. I have to decide if a binomial model is appropriate.

I can see that the data is discrete, but it's not binary like "event happens" or "event does not happen". It seems relatively symmetrical, and almost normally distributed? I'm not really sure how to work this out. Is a binomial model right or not?


There is 1 best solution below


If this is your first chi-squared test, the clues in the comments may be a bit too sparse. Without working the problem for you, I offer the following more complete outline: (Use it along with whatever examples your text or class notes may have to offer.)

It is appropriate to try a binomial model, and obviously $n = 5.$ From the given data you can find the sample mean of the 40 observations, $\bar x = 99/40 = 2.475.$ Setting that equal to the binomial mean $np$ you can get @Henry's estimate $\hat p = 2.475/5 = 0.495.$
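As a rough check of that estimate, here is a minimal R sketch. The variable names (`x`, `f`, `n_obs`, `xbar`, `p_hat`) are my own choices for illustration; only the data come from the question.

```r
# Data from the question: x = weeks (out of 5) with the event, f = frequency
x <- 0:5
f <- c(2, 7, 11, 12, 6, 2)

n_obs <- sum(f)               # number of observations: 40
xbar  <- sum(x * f) / n_obs   # sample mean of the grouped data: 99/40
p_hat <- xbar / 5             # solve n * p = xbar with n = 5

xbar    # 2.475
p_hat   # 0.495
```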

By looking at the PMF of $\mathsf{Binom}(5, 0.495),$ you can find the expected counts $E_i.$ (Multiply the probabilities by 40.) Your observed counts are $F = (2,7,11,12,6,2).$
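One way to get the expected counts in R is with `dbinom`, which returns binomial probabilities. This is only a sketch; `f`, `p_hat`, and `E` are names I have picked for illustration.

```r
f     <- c(2, 7, 11, 12, 6, 2)   # observed counts from the question
p_hat <- 0.495                   # estimate of p found from the sample mean

# Binomial probabilities for x = 0, ..., 5, scaled by the sample size (40)
E <- sum(f) * dbinom(0:5, size = 5, prob = p_hat)

round(E, 2)   # 1.31  6.44 12.62 12.37  6.06  1.19
sum(E)        # 40, because the six probabilities sum to 1
```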

Next, you can find the chi-squared statistic $Q =\sum_{i=0}^5 \frac{(F_i - E_i)^2}{E_i},$ which is approximately distributed as $\mathsf{Chisq}(\nu=4).$ [Ordinarily, a chi-squared test with 6 categories would have $\nu = 6-1 = 5,$ but you have used the data to estimate parameter $p,$ so you 'lose' a degree of freedom for that and $\nu = 4.$]
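The statistic can be computed directly from its definition. The sketch below assumes the six uncombined categories and uses my own variable names (`f`, `E`, `Q`, `nu`).

```r
f <- c(2, 7, 11, 12, 6, 2)                        # observed counts F_i
E <- sum(f) * dbinom(0:5, size = 5, prob = 0.495) # expected counts E_i

# Chi-squared goodness-of-fit statistic, summed over the six categories
Q <- sum((f - E)^2 / E)

# Degrees of freedom: categories - 1, minus 1 more for the estimated p
nu <- length(f) - 1 - 1

Q    # roughly 1.18
nu   # 4
```

I compute $Q$ by hand here rather than calling `chisq.test`, because that function does not know a parameter was estimated and would use $\nu = 5.$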

I got $Q = 1.1815.$ The critical value for a chi-squared test with $\nu = 4$ at the 5% level is the 95th percentile $c = 9.487$ of $\mathsf{Chisq}(\nu=4).$ You can find this number in printed tables of the chi-squared distribution or using software (as with R below).

qchisq(.95, 4)
[1] 9.487729

This means that you would reject the null hypothesis that the data are consistent with $\mathsf{Binom}(n=5, p=0.495)$ only if $Q \ge c = 9.487.$ Because $Q = 1.1815$ is far smaller than $c,$ you do not reject.
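If you prefer a P-value to a critical value, the comparison might look like the following sketch. It takes $Q = 1.1815$ and $\nu = 4$ as given above; the names `crit`, `reject`, and `p_value` are hypothetical.

```r
Q  <- 1.1815   # chi-squared statistic quoted above
nu <- 4        # degrees of freedom

crit   <- qchisq(0.95, df = nu)   # 5% critical value
reject <- Q >= crit               # decision rule: reject H0 if Q >= crit

# Upper-tail probability of Chisq(4) beyond the observed Q
p_value <- pchisq(Q, df = nu, lower.tail = FALSE)

reject    # FALSE
p_value   # roughly 0.88, far above 0.05
```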

There is one remaining difficulty. The chi-squared test is usually deemed to be accurate only if all expected counts exceed 5. Your first and last $E_i$s are too small. One cure for this is to combine 'categories' $0$ and $1$, and 'categories' $4$ and $5.$ In each tail, combine categories by adding the two observed frequencies and adding the two expected frequencies.
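A quick way to see which expected counts fall short of 5 is sketched below. The threshold of 5 is the rule of thumb mentioned above, and `E` and `too_small` are names of my own choosing.

```r
# Expected counts under Binom(5, 0.495) for a sample of 40
E <- 40 * dbinom(0:5, size = 5, prob = 0.495)
names(E) <- 0:5

# Flag categories whose expected count is below the rule-of-thumb value 5
too_small <- E < 5
too_small   # TRUE for categories 0 and 5 only
```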

You will now have four categories and $\nu = 4-1 -1=2$ degrees of freedom. Re-compute $Q$ and find the new $c$ (as below). [According to my computations, you will still not reject $H_0.$]

qchisq(.95, 2)
[1] 5.991465
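For checking your own arithmetic, the re-computation with combined tail categories might be sketched as follows. It assumes the same $\hat p = 0.495$ as before; the names `f_comb`, `E_comb`, `Q_comb`, and `nu_comb` are only illustrative.

```r
f <- c(2, 7, 11, 12, 6, 2)                        # observed counts
E <- sum(f) * dbinom(0:5, size = 5, prob = 0.495) # expected counts

# Combine categories 0 and 1, and categories 4 and 5, by adding counts
f_comb <- c(f[1] + f[2], f[3], f[4], f[5] + f[6])
E_comb <- c(E[1] + E[2], E[3], E[4], E[5] + E[6])

Q_comb  <- sum((f_comb - E_comb)^2 / E_comb)  # statistic on 4 categories
nu_comb <- length(f_comb) - 1 - 1             # 4 - 1 - 1 = 2

Q_comb                                # roughly 0.50
Q_comb >= qchisq(0.95, df = nu_comb)  # FALSE, so H0 is still not rejected
```

All four combined expected counts now exceed 5, so the chi-squared approximation is on firmer ground.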