I have spent a couple of hours reading about random sampling and distributions, and I think I have figured it out, but I am not 100% sure. So maybe someone could cross-check my claims :-)
Assume we have a Population $P$ where we are interested in e.g. the body weight per individual.
- We assume that the distribution of the body weight of the population is $W_0$.
- Let $W$ denote the body weight of an individual within this population (so, $W$ and $W_0$ should be identically distributed, right?)
- Now, we assume that the observed body weight for the $i$th individual is $W_i$. Hence, $W_i$ is also a random variable and $W_i$ has the same distribution as $W$ since $W$ is the distribution of an individual of the population, correct?
The conclusions I draw, and about which I am unsure, are in bold. Thank you very much in advance!
You might believe that weights in a certain population are normally distributed (weights in pounds), with mean $\mu = 150$ and some unknown standard deviation $\sigma.$ To check this belief you might test $H_0: \mu = 150$ against $H_a:\mu \ne 150.$
You randomly sample $n = 100$ subjects from some normal population, obtaining weights $W_1, W_2, \dots, W_{100}$ with sample mean $\bar W = \frac 1n \sum_i W_i = 149.65$ and sample standard deviation $S = \sqrt{\frac{1}{n-1}\sum_i (W_i - \bar W)^2} = 20.05.$
So you didn't get exactly $\bar W = \mu = 150.$ The question is whether, considering the variability of weights in this population, the difference between $\bar W$ and $\mu = 150$ is due to chance or whether the difference is 'statistically significant' at the 5% level.
Then you could use a one-sample t test to decide the issue of statistical significance. In this case, the test statistic is $T = \frac{\bar W - 150}{S/\sqrt{n}} = -0.17681.$
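As a check on the arithmetic, the statistic can be recomputed in R from the rounded summary values quoted above. This is only a sketch: the variable names are illustrative, and because the mean and SD are rounded to two places the result (about $-0.175$) differs slightly from the $-0.17681$ obtained from the raw data.

```r
# Summary statistics as quoted in the text (rounded to two decimals)
n     <- 100
w_bar <- 149.65   # sample mean
s     <- 20.05    # sample standard deviation
mu0   <- 150      # hypothesized population mean under H0

se     <- s / sqrt(n)          # estimated standard error of the mean
t_stat <- (w_bar - mu0) / se   # one-sample t statistic
t_stat
```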
You would reject $H_0$ at the 5% level of significance if $|T| \ge c = 1.984,$ where, under $H_0,$ $T \sim \mathsf{T}(\nu = n-1 = 99),$ Student's t distribution with 99 degrees of freedom. The critical value $c=1.984$ cuts probability $0.025$ from the upper tail of the t distribution. [In R, `qt` is the quantile function (inverse CDF) of a t distribution.] We do not reject $H_0$ because $|T| = 0.177$ is far below the critical value $c.$
Also, the P-value of this test is the probability that a value of $T$ would be as extreme as or more extreme than (in either the positive or negative direction) the observed value $-0.177.$ [In R, `pt` is the CDF of a t distribution.] We do not reject $H_0$ at the 5% level because the P-value exceeds $0.05 = 5\%.$
All of this (except for the critical value $c$ of a test at the 5% level) is shown in the output of the procedure `t.test` in R.
In addition, the output of `t.test` shows a 95% confidence interval $(145.67,\, 153.62).$ This indicates that $\mu = 150$ (inside the interval) is a believable value of the population mean based on what we see in the sample.
Below is a plot of the density function of the distribution $\mathsf{T}(99).$ The observed value of $T = -0.1768$ is shown as a solid vertical line. The P-value is twice the area under the curve to the left of this line. (The dotted vertical line is as far from 0 as the solid one.)
The critical values, $\pm c = \pm 1.984$ are shown as vertical red dashed lines.
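The critical value, P-value, and confidence interval discussed above can be recomputed by hand with `qt` and `pt`. The following is a sketch from the rounded summary statistics only (the raw data are not used, and the variable names are illustrative), so the last digits may differ slightly from the `t.test` results quoted above.

```r
# Rounded summary statistics quoted in the text
n <- 100; w_bar <- 149.65; s <- 20.05; mu0 <- 150
df <- n - 1
se <- s / sqrt(n)

t_stat  <- (w_bar - mu0) / se
crit    <- qt(0.975, df)                 # upper 2.5% point of T(99), about 1.984
p_value <- 2 * pt(-abs(t_stat), df)      # two-sided P-value, roughly 0.86
ci      <- w_bar + c(-1, 1) * crit * se  # 95% confidence interval for mu

reject <- abs(t_stat) >= crit            # FALSE: do not reject H0 at the 5% level
list(crit = crit, p_value = p_value, ci = ci, reject = reject)
```

With the raw data in a numeric vector (say `w`, a hypothetical name), `t.test(w, mu = 150)` reports the statistic, P-value, and interval in one call.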
R code for figure:
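A minimal base-R sketch of how a figure like the one described could be drawn. The axis limits, labels, title, and line widths are assumptions, not necessarily the settings used for the original plot.

```r
df    <- 99
t_obs <- -0.1768            # observed value of T from the text
crit  <- qt(0.975, df)      # critical value, about 1.984

# Density of Student's t distribution with 99 degrees of freedom
curve(dt(x, df), from = -4, to = 4, lwd = 2,
      xlab = "t", ylab = "Density", main = "Density of T(99)")
abline(h = 0)                                             # horizontal axis
abline(v = t_obs, lwd = 2)                                # observed T (solid)
abline(v = -t_obs, lty = "dotted")                        # mirror image of observed T
abline(v = c(-crit, crit), col = "red", lty = "dashed")   # critical values
```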
Notes: (1) In case it is of interest, here is R code used to sample the fictitious data used above:
(2) If you had advance information that the standard deviation of weights in this population is $\sigma = 20,$ then you could do a z test instead of a t test.
Some people seem to think it is OK to do a z test anytime $n > 30,$ using the sample SD $S$ as if it were the same as $\sigma$ (rarely exactly true). That's a somewhat risky approximate procedure.
For my particular fictitious data, it happens that an approximate z test would not have led to rejecting $H_0.$ R does not have a named procedure for z tests. In case it is of interest, I'll show results from a z test from Minitab statistical software below.
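Although base R has no named z-test procedure, the computation is short enough to do by hand with `pnorm` and `qnorm`. The sketch below assumes $\sigma = 20$ is known and uses the rounded sample mean from the text; the variable names are illustrative.

```r
n <- 100; w_bar <- 149.65; mu0 <- 150
sigma <- 20                            # population SD, assumed known

z       <- (w_bar - mu0) / (sigma / sqrt(n))   # z statistic, -0.175
p_value <- 2 * pnorm(-abs(z))                  # two-sided P-value, about 0.86
crit    <- qnorm(0.975)                        # about 1.96

reject <- abs(z) >= crit                       # FALSE: do not reject H0
c(z = z, p_value = p_value, crit = crit)
```

Compared with the t test, the only changes are that $\sigma$ replaces $S$ and the standard normal distribution replaces $\mathsf{T}(99).$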