Comparing expected counts to observed counts

The question is as follows: A restaurant offers 7 different dishes and predicts the dishes will be ordered in the following proportions: 1 (25%), 2 (20%), 3 (10%), 4 (15%), 5 (5%), 6 (6%), 7 (15%).

I am then given the actual proportions in which the dishes are ordered and asked to conduct the appropriate hypothesis test to determine whether the owners' belief regarding the order proportions is correct, then find the p-value (or bounds for the p-value) associated with this test statistic.

I am not asking someone to do this for me, simply to give me the steps. I initially thought to create an ANOVA table but then was unsure what I would use for the grand mean to then compute MSTR and MSE. Additionally, if I do need to put bounds on the p-value, I'm unsure of how I would go about doing that.

Please help direct me! Thank you so much.

There is 1 best solution below

Chi-squared GOF test. You don't say how many dishes $n$ are involved in the percentage data provided. I suppose you intend to do a chi-squared goodness-of-fit (GOF) test. If so, the 'expected' counts for the various dishes would be $E_1 = .25n,\; E_2 = .20n,$ and so on. (Note: Do not round expected counts to integers.)

If $n$ is large enough that all seven of the $E_i > 5,$ then you can use each dish as a category, and the degrees of freedom for the approximating chi-squared distribution would be $k - 1 = 7 - 1 = 6.$ (If some of the $E_i$ are too small, combine dishes into 'categories' of two or more until all $E_i$ are large enough. Then $k$ is reduced accordingly, and so are the degrees of freedom. For example, with $n = 55,$ you would have to combine dishes 5 and 6 into a single category with 11%.)
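The check on expected counts can be sketched in R. This is only an illustration for the $n = 55$ case mentioned above; the variable names are made up for the example, and the proportions are typed exactly as listed in the question (as listed, they add up to 96%, not 100%).

```r
# Predicted proportions for dishes 1-7, exactly as listed in the question.
# Note: as listed, these sum to 0.96 rather than 1.
p <- c(0.25, 0.20, 0.10, 0.15, 0.05, 0.06, 0.15)
n <- 55                      # total number of dishes ordered (example value)

E <- n * p                   # expected counts; do not round to integers
names(E) <- paste0("dish", 1:7)
E

small <- E <= 5              # categories that fail the E_i > 5 rule of thumb
names(E)[small]              # dishes 5 and 6

# Merge dishes 5 and 6 into one category (5% + 6% = 11%)
p_merged <- c(p[1:4], p[5] + p[6], p[7])
E_merged <- n * p_merged
all(E_merged > 5)            # TRUE: every category is now large enough

df <- length(E_merged) - 1   # k - 1 with k = 6 categories
df
```

With the merged categories, $k = 6$ and the test would use 5 degrees of freedom instead of 6.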

GOF test statistic. These expected counts are to be compared with the observed counts $X_i,$ which are the numbers of each type of dish, not the proportions. Then the chi-squared GOF statistic is

$$Q = \sum_{i=1}^k \frac{(X_i - E_i)^2}{E_i},$$

which is approximately distributed as $Chisq(df = k - 1).$

Larger values of this computed value of $Q$ indicate poorer fit to your probability model upon which the $E_i$ are based. The question is how large $Q$ can be before the fit is so bad that you 'reject' the model.
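As a rough sketch, the statistic can be computed directly from the formula and checked against R's built-in `chisq.test`. The observed counts below are hypothetical (the question does not give them), and because the listed percentages add up to 96%, the sketch assumes they should be rescaled to sum to 1.

```r
# Proportions as listed in the question, rescaled to sum to 1.
# The rescaling is an assumption of this sketch, not part of the question.
p_listed <- c(0.25, 0.20, 0.10, 0.15, 0.05, 0.06, 0.15)
p <- p_listed / sum(p_listed)

# HYPOTHETICAL observed counts for dishes 1-7 (invented for illustration)
X <- c(55, 38, 17, 35, 8, 15, 32)
n <- sum(X)                      # 200 dishes in total

E <- n * p                       # expected counts under the owners' model
stopifnot(all(E > 5))            # rule of thumb for the chi-squared approximation

Q <- sum((X - E)^2 / E)          # chi-squared GOF statistic
Q

# Cross-check against the built-in test, which uses the same statistic
fit <- chisq.test(X, p = p)
fit$statistic
fit$parameter                    # degrees of freedom, k - 1 = 6
```

Doing the sum by hand first makes it easier to see which dishes contribute most to $Q$; the individual terms are `(X - E)^2 / E`.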

P-values. Now, for the p-value. Suppose you have $df = 6$ and $Q = 10.7.$ The p-value is the area under the density curve of $Chisq(6)$ to the right of 10.7.

You can use a printed chi-squared table to 'bracket' the p-value within 'bounds'. In the table I'm looking at, the row for $df = 6$ has entries 10.64 and 12.59. The headers of the corresponding columns show right-tail probabilities .10 and .05, respectively. Because 10.7 lies between these two entries, the p-value is between .05 and .10; it is a lot closer to .10 than to .05, but the table does not give it exactly. Thus, you would not reject (at the 5% level) the model consisting of the percentages given at the start of your Question.
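If no printed table is at hand, the same two table entries can be reproduced with R's quantile function `qchisq`. This is only a sketch of the bracketing argument, using the example values $df = 6$ and $Q = 10.7$ from above.

```r
df <- 6
Q  <- 10.7

# Critical values that cut off right-tail areas 0.10 and 0.05
tail_probs <- c(0.10, 0.05)
crit <- qchisq(tail_probs, df = df, lower.tail = FALSE)
round(crit, 2)               # 10.64 12.59, the table entries quoted above

# Q lies between the two critical values, so 0.05 < p-value < 0.10
crit[1] < Q && Q < crit[2]   # TRUE
```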

If you have software available, you can use the CDF to find the exact probability below 10.7 (and subtract it from 1). In R statistical software the CDF for chi-squared distributions is called pchisq, so the following R code gives the desired result: the p-value is about .098 (which is indeed between .05 and .10).

    # Right-tail area of Chisq(6) beyond Q = 10.7
    1 - pchisq(10.7, 6)
    ## 0.09810273

Illustrating tail areas under PDF curve. The figure below shows the PDF of $Chisq(6)$. Vertical dotted red lines are at 10.64 and 12.59 (the values found in row $df = 6$ of the table); areas under the curve to the right of these lines are .10 and .05, respectively. The vertical blue line is at $Q = 10.7$, and the area to the right of this line is the p-value .098.

[Figure: PDF of $Chisq(6)$ with dotted red vertical lines at 10.64 and 12.59 and a blue vertical line at $Q = 10.7$]
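A plot along these lines can be drawn in base R. The code is a sketch rather than the code used for the original figure; the axis range, title, and shading are choices made for the example.

```r
df <- 6
Q  <- 10.7

# Density curve of Chisq(6)
curve(dchisq(x, df = df), from = 0, to = 25,
      xlab = "x", ylab = "Density", main = "PDF of Chisq(6)")

# Table values for right-tail areas 0.10 and 0.05
abline(v = c(10.64, 12.59), col = "red", lty = "dotted")

# Observed statistic
abline(v = Q, col = "blue")

# Lightly shade the right tail beyond Q; its area is the p-value
xs <- seq(Q, 25, length.out = 200)
polygon(c(Q, xs, 25), c(0, dchisq(xs, df = df), 0),
        col = adjustcolor("blue", alpha.f = 0.2), border = NA)

p_value <- pchisq(Q, df = df, lower.tail = FALSE)
p_value
```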

Note: Analysis of variance (ANOVA) does not apply to this problem. ANOVA compares group means of a numerical response, whereas the data here are counts in categories.