Definition of sampling distribution and its role


Define the sampling distribution of a statistic $T$ as follows: Assume $T = T(X^n)$, a function of the sample $X^n = (X_1, \dots, X_n)$.

The probability distribution of a statistic $T$ is called the sampling distribution of $T$.

I am trying to understand this definition. We have $n$ random variables. Say we toss a coin $n$ times and store the outcome of toss $i$ in $X_i$, where 0 means heads and 1 means tails.

Suppose the statistic of interest is the sample mean. Then is the sampling distribution basically the distribution of the values of $\bar X$ that I get from my $K$ experiments?

Is this understanding correct? And what is the value of looking at statistics such as the sample mean or variance? Is it because, in statistical inference and under certain assumptions, they are enough to estimate the population mean and variance?

I am also confused by the statement: "If assumed a normal population, we can derive the exact sampling distribution for sample mean and variance for any given finite sample size n."

Any clarification of these concepts would be appreciated.

Best answer

Although you mention a binomial example, you seem to be mainly interested in sampling from normal populations. So I begin there.

Suppose the population distribution is $Norm(\mu=100, \sigma=15),$ so that any one random observation has $E(X_i) = \mu = 100$ and $V(X_i) = \sigma^2 = 15^2 = 225.$

Now suppose you want to make a confidence interval (CI) for $\mu$ or to test a hypothesis involving $\mu.$ Then we use the statistic $T_1 = \bar X$ to estimate $\mu$. One can show that $E(\bar X) = \mu_{T_1} = \mu = 100$ and that $V(\bar X) = \sigma_{T_1}^2 = \sigma^2/n = 225/n.$

Furthermore, because the population is normal, we have $\bar X \sim Norm(\mu_{T_1} = 100, \sigma_{T_1} = 15/\sqrt{n}).$ If $\sigma$ is known, then we have a 95% CI for $\mu$ of the form $\bar X \pm 1.96\sigma/\sqrt{n}.$

Notice that the sampling distribution of $T_1 = \bar X$ is used to find the CI: $$0.95 = P\left(-1.96 \le Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = \dots\\ = P(\bar X - 1.96\sigma/\sqrt{n} \le \mu \le \bar X + 1.96\sigma/\sqrt{n}),$$ where $Z \sim Norm(0,1).$ In this situation $\sigma/\sqrt{n}$ is often called the standard error of $\bar X.$
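To connect this with the idea of repeating an experiment many times, here is a minimal simulation sketch using only Python's standard library. The sample size $n = 25$, the number of repetitions `k`, the seed, and the function names are my own choices for illustration. It draws many samples from $Norm(100, 15)$, records $\bar X$ for each, and checks that the collection of means behaves like $Norm(100, 15/\sqrt{n})$ and that the interval $\bar X \pm 1.96\sigma/\sqrt{n}$ covers $\mu$ about 95% of the time.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(mu=100.0, sigma=15.0, n=25, k=20000, seed=1):
    """Draw k independent samples of size n from Norm(mu, sigma);
    return the list of the k sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(k)]

def z_interval_coverage(means, mu=100.0, sigma=15.0, n=25, z=1.96):
    """Fraction of the intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    half_width = z * sigma / sqrt(n)
    return sum(abs(m - mu) <= half_width for m in means) / len(means)

means = simulate_sample_means()
print("mean of the sample means:", round(statistics.fmean(means), 2))  # should be near 100
print("SD of the sample means:  ", round(statistics.stdev(means), 2))  # should be near 15/sqrt(25) = 3
print("coverage of the 95% CI:  ", round(z_interval_coverage(means), 3))  # should be near 0.95
```

A histogram of `means` is an empirical picture of the sampling distribution of $\bar X$; the theory above gives its exact form, so no simulation is needed in practice.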

Now consider a somewhat different situation: If $\sigma^2$ is unknown and estimated by the sample variance $S_X^2,$ then the statistic $$T_2 = \frac{\bar X - \mu}{S_X/\sqrt{n}}$$ has Student's t distribution with $df = \nu = n-1.$ A resulting 95% CI for $\mu$ in this case is of the form $\bar X \pm t^*S_X/\sqrt{n},$ where $\pm t^*$ cut 2.5% from the upper and lower tails of $T(n-1),$ respectively. Also, if we wish to test $H_0: \mu = \mu_0$ against $H_a: \mu \ne \mu_0$ at the 5% level, we reject if $|T| = \frac{|\bar X - \mu_0|}{S_X/\sqrt{n}} > t^*.$ In these situations, $S_X/\sqrt{n}$ is called the (estimated) standard error of the sample mean.
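The effect of replacing $\sigma$ by $S_X$ can also be seen by simulation. The sketch below is only illustrative: the small sample size $n = 5$, the number of repetitions, the seed, and the function name are assumptions of mine. It computes $T_2$ for many normal samples and estimates the cutoff $t^*$ as the empirical 95th percentile of $|T_2|$. For $df = 4$, printed tables give $t^* \approx 2.776$, noticeably larger than 1.96.

```python
import random
import statistics
from math import sqrt

def simulate_t_statistics(mu=100.0, sigma=15.0, n=5, k=40000, seed=2):
    """Return k simulated values of T2 = (xbar - mu) / (S / sqrt(n))
    for samples of size n from Norm(mu, sigma)."""
    rng = random.Random(seed)
    values = []
    for _ in range(k):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(x)
        # Sample standard deviation with divisor n - 1.
        s = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        values.append((xbar - mu) / (s / sqrt(n)))
    return values

t_values = simulate_t_statistics()

# By symmetry, the 95th percentile of |T2| estimates the two-sided cutoff t*.
abs_sorted = sorted(abs(t) for t in t_values)
t_star_hat = abs_sorted[int(0.95 * len(abs_sorted))]

# Share of |T2| values beyond the normal cutoff 1.96: more than 5%,
# because T2 has heavier tails than Norm(0, 1).
beyond_normal_cutoff = sum(abs(t) > 1.96 for t in t_values) / len(t_values)

print("estimated t* for df = 4:", round(t_star_hat, 2))
print("P(|T2| > 1.96), simulated:", round(beyond_normal_cutoff, 3))
```

Using 1.96 in place of $t^*$ with such a small sample would therefore give intervals that cover $\mu$ less often than 95% of the time.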

Again, if the data are normal and both $\mu$ and $\sigma^2$ are unknown, then the statistic $$T_3 = \frac{(n-1)S_X^2}{\sigma^2} \sim Chisq(df = \nu = n-1).$$ Then $T_3$ can be used to make a 95% CI for $\sigma^2$ and to test hypotheses about $\sigma^2.$ Specifically, one begins the derivation of the CI with $P(L < (n-1)S_X^2/\sigma^2 < U) = 0.95,$ where $L$ cuts 2.5% of the area from the lower tail of $Chisq(n-1)$ and $U$ cuts 2.5% from its upper tail. Then one 'pivots' to isolate $\sigma^2$ between limits that can be computed from the data and known properties of the distribution of $T_3,$ which gives the interval $\left(\frac{(n-1)S_X^2}{U},\ \frac{(n-1)S_X^2}{L}\right)$ for $\sigma^2.$
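As a rough check on the chi-squared claim, the following sketch simulates $T_3$ for $n = 10$ (so $df = 9$). The sample size, repetition count, seed, and names are assumptions made for illustration, and the cutoffs 2.700 and 19.023 are the usual printed-table values for $Chisq(9)$. A $Chisq(\nu)$ variable has mean $\nu$ and variance $2\nu$, and the event $L < T_3 < U$ is exactly the event that the pivoted interval contains $\sigma^2$.

```python
import random
import statistics

# Table cutoffs for Chisq(9): 2.5% of the area lies below L and 2.5% above U.
CHI2_LOWER, CHI2_UPPER = 2.700, 19.023

def simulate_t3(mu=100.0, sigma=15.0, n=10, k=20000, seed=3):
    """Return k simulated values of T3 = (n - 1) * S^2 / sigma^2
    for samples of size n from Norm(mu, sigma)."""
    rng = random.Random(seed)
    values = []
    for _ in range(k):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(x)
        # The sum of squared deviations equals (n - 1) * S^2.
        sum_sq = sum((xi - xbar) ** 2 for xi in x)
        values.append(sum_sq / sigma ** 2)
    return values

t3_values = simulate_t3()
t3_mean = statistics.fmean(t3_values)
t3_var = statistics.variance(t3_values)

# L < T3 < U holds exactly when (n-1)S^2/U < sigma^2 < (n-1)S^2/L,
# so this fraction is the coverage of the interval for sigma^2.
coverage = sum(CHI2_LOWER < t < CHI2_UPPER for t in t3_values) / len(t3_values)

print("mean of T3:", round(t3_mean, 2), "(theory: 9)")
print("variance of T3:", round(t3_var, 2), "(theory: 18)")
print("coverage of the CI for sigma^2:", round(coverage, 3), "(target: 0.95)")
```

Note that the resulting interval for $\sigma^2$ is not symmetric about $S_X^2$, because the chi-squared distribution is skewed.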

In your binomial example, you have observed $X$ successes, where $n$ is known and the success probability $\theta$ is not. Then either $X$ or the estimator $\hat \theta = X/n$ of $\theta$ could be called a statistic. Then $X \sim Binom(n, \theta)$ and $\hat \theta$ has an appropriately scaled variant of a binomial distribution. If $n\hat \theta$ and $n(1-\hat \theta)$ are sufficiently large, then $X$ and $\hat \theta$ may be approximately normally distributed.
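For the binomial case the sampling distribution can be computed exactly, with no simulation. The sketch below uses illustrative values of my own ($n = 100$, $\theta = 0.3$, and the event $0.25 \le \hat\theta \le 0.35$). It compares the exact binomial probability with the normal approximation, applying a continuity correction on the count scale.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(x, n, theta):
    """Exact P(X = x) for X ~ Binom(n, theta)."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

n, theta = 100, 0.3

# Exact sampling distribution of X; theta_hat = X/n takes the value x/n
# with the same probability pmf[x].
pmf = [binom_pmf(x, n, theta) for x in range(n + 1)]

# P(0.25 <= theta_hat <= 0.35) is the same event as 25 <= X <= 35.
exact = sum(pmf[x] for x in range(25, 36))

# Normal approximation to X with matching mean and SD, continuity-corrected.
approx_dist = NormalDist(mu=n * theta, sigma=sqrt(n * theta * (1 - theta)))
approx = approx_dist.cdf(35.5) - approx_dist.cdf(24.5)

# Mean of theta_hat computed from its exact distribution; equals theta.
mean_theta_hat = sum((x / n) * p for x, p in enumerate(pmf))

print("exact probability:   ", round(exact, 4))
print("normal approximation:", round(approx, 4))
print("E(theta_hat):        ", round(mean_theta_hat, 4))
```

With smaller $n$ or $\theta$ closer to 0 or 1, the two probabilities can disagree more, which is why the rule of thumb about $n\hat\theta$ and $n(1-\hat\theta)$ matters.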

Various authors have somewhat more specific definitions of 'statistic' than either you or I have used. You should check your textbook and class notes to be sure of the technical definition you are using in your course. The Wikipedia article on 'statistic' seems unfinished at this moment, but even in its current state it may also be of some use.