I am not good at maths, so please don't mind if this is silly. Suppose we are given the mode choice of transport for a population, that is, the share of people who take each mode:
- bike - 1%
- car - 45%
- walk - 54%

This represents a population of 20,000 people.
Now suppose I want to translate this choice to 20 people. Will the shares be the same (1%, 45%, 54%)? And how do I check whether that is right if there are 20 new people every time, for 100 iterations?
Sampling. Suppose that the true proportions in categories B, C, W of a population are $0.01, 0.45, 0.54,$ respectively.
Then if you take a huge random sample of size $n = 20,000$ from the population, you might get the counts below. (Sampling and computations in R.)
Then corresponding proportions would be $0.0099, 0.4533, 0.5368,$ which are very close to the population proportions $0.01, 0.45, 0.54.$ (Discrepancies seem like rounding errors.)
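A large sample of this kind could be drawn along the following lines. This is only a sketch: the seed and the variable names are my own choices, and because the draw is random the counts will differ from run to run and from the proportions quoted above.

```r
set.seed(2024)                           # arbitrary seed, only for reproducibility
p <- c(B = 0.01, C = 0.45, W = 0.54)     # population proportions from the question

# One multinomial sample of 20,000 people; [, 1] turns the 3 x 1 matrix into a vector
counts <- rmultinom(1, size = 20000, prob = p)[, 1]

counts                                   # counts in categories B, C, W
round(counts / 20000, 4)                 # sample proportions, close to p
```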
However, if I take a tiny sample of only size 20, then I will not get proportions so close to the true population proportions.
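To see this for the setting in the question (20 new people each time, 100 iterations), one possible simulation is sketched below. The seed and names are assumptions of mine. Note that with 20 people every sample proportion is a multiple of 0.05, so no single sample can show exactly 1% bikers, and the chance that a sample of 20 contains no bikers at all is $0.99^{20} \approx 0.82.$

```r
set.seed(2024)                           # arbitrary seed, only for reproducibility
p <- c(B = 0.01, C = 0.45, W = 0.54)     # population proportions

# 100 independent samples of 20 people: a 3 x 100 matrix, one column per sample
samples <- rmultinom(100, size = 20, prob = p)
props <- samples / 20                    # proportions within each sample

props[, 1:5]                             # first five samples: far from (0.01, 0.45, 0.54)
rowMeans(props)                          # averaged over 100 samples: close to p
mean(samples[1, ] == 0)                  # fraction of samples with no bikers at all
```

Individual samples of 20 vary a lot, but the average over the 100 iterations should settle near the population proportions.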
Testing. By contrast, a question can arise in research about the validity of a hypothetical population proportion, perhaps arising from theory about human behavior or from a supposition that behavior has not changed since the last large survey was done ten years ago.
From whatever source, suppose our null hypothesis is that the population proportions are $0.01, 0.45, 0.54.$ I take a moderate-sized random sample of size $n = 200,$ and I get counts $5, 100, 95.$
The proportions don't agree exactly with the hypothesis. The question is whether the disagreement is sufficiently large to reject the null hypothesis as untrue, or whether random sampling error can account for the discrepancy.
Observed and expected counts. I will compare my observed counts with the expected counts according to the null hypothesis. I get the expected counts by multiplying the sample size 200 by the hypothetical population proportions. (I happen to get integers here, but expected counts should not be rounded to integers if they're not integers.)
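As a small sketch of that multiplication (the variable names are my own):

```r
n <- 200                                 # sample size
p0 <- c(B = 0.01, C = 0.45, W = 0.54)    # proportions under the null hypothesis

expected <- n * p0                       # expected counts; do not round these
expected                                 # 2, 90, 108
```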
Test statistic. In a chi-squared test, the chi-squared statistic is $$Q = \sum_{i=1}^K \frac{(X_i - E_i)^2}{E_i},$$ where $K$ is the number of categories, $X_i$ are the observed counts and $E_i$ are the corresponding expected counts. For our data $Q = 7.18.$
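The statistic can be computed directly from the formula. The sketch below assumes observed counts $5, 100, 95$ (they sum to 200 and reproduce the reported value of $Q$); the variable names are my own.

```r
observed <- c(5, 100, 95)                # assumed observed counts for B, C, W
expected <- 200 * c(0.01, 0.45, 0.54)    # expected counts under the null: 2, 90, 108

# Sum over categories of (observed - expected)^2 / expected
Q <- sum((observed - expected)^2 / expected)
Q                                        # about 7.176
```

The three terms are $9/2,$ $100/90,$ and $169/108,$ so the bike category alone contributes $4.5$ of the total.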
Distribution of test statistic. Provided that all of the $E_i > 5,$ we have, to a good approximation, $Q \sim \mathsf{Chisq}(2),$ the chi-squared distribution with $K-1 = 2$ degrees of freedom.
Critical value. The critical value $c = 5.991$ for a test at the 5% level is the value that cuts 5% of the probability from the upper tail of this distribution. [You can find this value in printed tables of chi-squared distributions, or by using software, as below.]
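In R, the critical value might be obtained as follows (a minimal sketch; `crit` is just my name for it):

```r
# 95th percentile of Chisq(2): cuts 5% of the probability from the upper tail
crit <- qchisq(0.95, df = 2)
crit                                     # about 5.991
```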
Because we have $Q = 7.18 > 5.99,$ we reject the null hypothesis. We say that the counts we observed are not consistent with the null hypothesis.
P-value. Another way to test the null hypothesis is to get the P-value. It is the probability, under the null hypothesis, of a result at least as extreme as the one observed. Specifically, it is $P(Q \ge 7.18),$ computed using $Q \sim \mathsf{Chisq}(2).$ For our test, the P-value is $0.0276 < 0.05 = 5\%,$ so we can use the P-value to reject the null hypothesis.
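A sketch of the P-value computation in R, using the rounded statistic $7.18$ (the name `p_value` is my own):

```r
# Upper-tail probability P(Q >= 7.18) for Q ~ Chisq(2)
p_value <- pchisq(7.18, df = 2, lower.tail = FALSE)
p_value                                  # about 0.0276
```

For 2 degrees of freedom the upper-tail probability has the closed form $e^{-q/2},$ so the same number is $e^{-3.59}.$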
You usually can't get exact P-values from printed chi-squared tables. But you may be able to see from tables that the P-value is between 0.01 and 0.05. Statistical software usually gives a P-value as part of the output from the test procedure.
The plot below shows the density function of $\mathsf{Chisq}(2).$ The vertical red dotted line shows the critical value, and the vertical black line shows the value of the test statistic. The area under the density curve to the right of the red line is 5%; the area to the right of the black line is the P-value.
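A plot of this kind could be drawn with base R graphics roughly as follows. The axis range, labels, and line widths are my own choices, not necessarily those of the original figure.

```r
q_obs <- 7.18                            # observed test statistic
crit <- qchisq(0.95, df = 2)             # 5% critical value

# Density of Chisq(2) on [0, 12]
curve(dchisq(x, df = 2), from = 0, to = 12, lwd = 2,
      xlab = "Q", ylab = "Density", main = "Density of Chisq(2)")
abline(h = 0, col = "gray")              # horizontal axis for reference
abline(v = crit, col = "red", lty = "dotted", lwd = 2)  # critical value
abline(v = q_obs, lwd = 2)               # test statistic
```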
Chi-squared test in R. Below is output from the procedure chisq.test in R. (It differs slightly from results above because of differences in rounding.)

X-squared = 7.1759, df = 2, p-value = 0.02765
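A call that should produce output of this form is sketched below, again assuming observed counts $5, 100, 95.$ The object names are my own.

```r
observed <- c(5, 100, 95)                # assumed observed counts for B, C, W
p0 <- c(0.01, 0.45, 0.54)                # proportions under the null hypothesis

# Goodness-of-fit test; R warns because one expected count is small
res <- chisq.test(observed, p = p0)
res                                      # prints X-squared, df, and p-value
res$expected                             # expected counts used by the test
```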
There is a warning message that the P-value may not be exactly correct. One of our expected counts is $2,$ not $> 5,$ so $Q$ might not have exactly the distribution $\mathsf{Chisq}(2).$ Many textbooks say it is OK if most of the $E_i > 5$ and all $E_i > 3.$ So we should have used a slightly larger sample.
In such cases, chisq.test in R can simulate the P-value. Based on the simulated P-value, there seems no doubt we can reject at the 5% level.
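A sketch of the simulated version is below. The counts $5, 100, 95,$ the seed, and the number of replicates `B = 10000` are assumptions of mine; the simulated P-value changes a little from run to run, so no specific value is quoted here.

```r
set.seed(2024)                           # arbitrary seed, only for reproducibility
observed <- c(5, 100, 95)                # assumed observed counts for B, C, W
p0 <- c(0.01, 0.45, 0.54)                # proportions under the null hypothesis

# Monte Carlo P-value: simulate samples under the null and compare their Q to ours
sim <- chisq.test(observed, p = p0, simulate.p.value = TRUE, B = 10000)
sim$statistic                            # same statistic as before, about 7.176
sim$p.value                              # simulated P-value
```

Because the simulated P-value does not rely on the chi-squared approximation, the small expected count for bikers is no longer a concern.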