Can a Random Variable be a Sample or just a Single Value?


In my introductory statistics course, I've seen statements like '$Y_1, \dots, Y_n$ is a simple random sample without replacement, of size $n$, from the population' and that the CLT applies if '$Y_1, \dots, Y_n$ are independent and identically distributed random quantities'.

The first suggests that when you say you've 'drawn a random variable from a population', your variable represents a single member of that population, randomly chosen. The second, and other definitions (of the CLT or $E[X]$, for example), suggest that a random variable drawn from a population is a random *sample* whose members are randomly chosen.

When someone talks of taking a 'random variable' from a population, is the exact meaning of the r.v. flexible, depending on the context (e.g. on the particular formula) the variable is used in? Or is it fixed? This has been confusing me because, as I try to understand the meaning of $E[X]$, the CLT, etc., I keep getting confused about *what* from a statistical population each formula applies to.

Many thanks indeed, really appreciate your help.


Answer:


Suppose you have a random sample $X_1, X_2, \dots, X_n$ from a population with the distribution $\mathsf{Norm}(\mu, \sigma).$ Then each of the $n$ values $X_i$ is a random variable distributed $X_i \sim \mathsf{Norm}(\mu, \sigma).$ So each random variable $X_i$ is a single value; the sample is the whole collection $X_1, \dots, X_n$.

Sometimes the terminology 'independent and identically distributed' is used. The 'independent' part means that none of the $X_i$ values depends on any of the others. (For example, you wouldn't notice that $X_1$ is 'big' and then deliberately look for an $X_2$ that is a little smaller.) The 'identically distributed' part means that each of the $X_i$, viewed separately, has the same distribution as the population.
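To make the 'single value vs. sample' distinction concrete, here is a small Python simulation. It is only a sketch: the helper `draw_iid_sample` is a hypothetical name, and the seed and number of repetitions are arbitrary choices. One call produces one sample of size $n$, and each entry is one realized value of one $X_i$. Repeating the sampling many times and looking only at $X_1$ (or only at $X_9$) shows that each position, viewed separately, behaves like the population $\mathsf{Norm}(70, 7)$.

```python
import random
import statistics

def draw_iid_sample(mu, sigma, n, rng):
    """Hypothetical helper: one sample X_1, ..., X_n of Norm(mu, sigma) draws."""
    # Each rng.gauss call is a separate draw from the same distribution,
    # so the n values are independent and identically distributed.
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(42)  # arbitrary seed, for reproducibility only
mu, sigma, n = 70, 7, 9

one_sample = draw_iid_sample(mu, sigma, n, rng)
print(one_sample[0])  # one realized value of the random variable X_1

# Repeat the whole sampling process many times, then look at X_1 and X_9
# on their own: each should have mean near 70 and SD near 7.
reps = 20_000
samples = [draw_iid_sample(mu, sigma, n, rng) for _ in range(reps)]
x1 = [s[0] for s in samples]
x9 = [s[-1] for s in samples]
print(statistics.mean(x1), statistics.stdev(x1))
print(statistics.mean(x9), statistics.stdev(x9))
```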

Once you have this sample of size $n$, you can find the sample mean $\bar X = \frac{\sum_{i=1}^n X_i}{n}.$ This is a different random variable, and (because the population is normal) it has the distribution $\bar X \sim \mathsf{Norm}(\mu, \sigma/\sqrt{n}).$ (The Central Limit Theorem says that the distribution of $\bar X$ tends to be nearly normal for large $n$ even if the original population is not normal.)
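A rough simulation check of both claims, again a sketch with arbitrary settings. `sample_mean` is a hypothetical helper, and the exponential population (mean 1, SD 1) with $n = 50$ is an assumed example of a skewed, non-normal population, not something from the question. For the normal population the simulated $\bar X$ values should have SD close to $\sigma/\sqrt{n}$; for the exponential one, if the normal approximation from the CLT is good, about 95% of the simulated means should fall within $1.96\,\sigma/\sqrt{n}$ of the population mean.

```python
import random
import statistics

def sample_mean(draw, n):
    """Hypothetical helper: average of n independent draws from `draw`."""
    return sum(draw() for _ in range(n)) / n

rng = random.Random(7)  # arbitrary seed
reps = 20_000

# Normal population Norm(70, 7), samples of size n = 9.
mu, sigma, n = 70, 7, 9
xbars = [sample_mean(lambda: rng.gauss(mu, sigma), n) for _ in range(reps)]
print(statistics.mean(xbars), statistics.stdev(xbars), sigma / n ** 0.5)

# Assumed skewed population: exponential with rate 1 (mean 1, SD 1).
m = 50
exp_bars = [sample_mean(lambda: rng.expovariate(1.0), m) for _ in range(reps)]
se = 1 / m ** 0.5  # sigma / sqrt(m) with sigma = 1
inside = sum(abs(x - 1) < 1.96 * se for x in exp_bars) / reps
print(inside)  # fraction of sample means within 1.96 standard errors of 1
```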

Example: Suppose your population is $\mathsf{Norm}(\mu = 70, \sigma = 7)$ (maybe weights of male college swimmers in kg). Suppose you sample $n = 9$ of them at random and find the sample mean weight $\bar X$ of the nine observations. Then $\bar X \sim \mathsf{Norm}(70, 7/3).$ The figure below shows the density function of the population (blue dotted) and the density function of the sampling distribution of $\bar X.$
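Where the $7/3$ comes from: the variance of a sum of independent random variables is the sum of their variances, so (this step uses only independence, not normality)

$$
\begin{aligned}
\operatorname{Var}(\bar X)
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^n X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i)
   = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \\
\operatorname{SD}(\bar X)
  &= \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{9}} = \frac{7}{3} \approx 2.33 \text{ kg}.
\end{aligned}
$$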

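[Figure: density of the population $\mathsf{Norm}(70, 7)$ (blue dotted) and density of the sampling distribution $\bar X \sim \mathsf{Norm}(70, 7/3)$.]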

Roughly speaking, the density function for $\bar X$ is 'three times as tall' as the population density, and so it must be 'a third as wide'; both curves enclose an area of 1.
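The 'three times as tall, a third as wide' remark can be checked numerically. This sketch writes out the normal density by hand (the helper names `normal_pdf` and `area` are illustrative, not from the original answer), compares the two peak heights, and approximates each area with the trapezoid rule over $[0, 140]$ kg, a range assumed wide enough that the omitted tails are negligible.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of Norm(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area(mu, sigma, lo, hi, steps=10_000):
    """Trapezoid-rule approximation of the area under the density on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    total += sum(normal_pdf(lo + k * h, mu, sigma) for k in range(1, steps))
    return total * h

mu, sigma, n = 70, 7, 9
se = sigma / math.sqrt(n)  # 7/3, the SD of the sample mean

peak_pop = normal_pdf(mu, mu, sigma)  # height of the population curve at 70
peak_mean = normal_pdf(mu, mu, se)    # height of the X-bar curve at 70
ratio = peak_mean / peak_pop
area_pop = area(mu, sigma, 0, 140)
area_mean = area(mu, se, 0, 140)
print(ratio, area_pop, area_mean)
```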

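About 68% of the individual swimmers in the population weigh between 63 and 77 kg, that is, within one standard deviation $\sigma = 7$ of $\mu = 70$. But if you take the average of nine swimmers, then it is very likely (probability about 99.7%) that the average will lie between 63 and 77 kg, because those limits are three standard errors $\sigma/\sqrt{n} = 7/3$ away from $\mu$.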
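Both probabilities can be computed with `statistics.NormalDist` from Python's standard library (available since Python 3.8); this is just a numerical check of the two figures, using the population and sampling distributions from the example.

```python
from statistics import NormalDist

pop = NormalDist(mu=70, sigma=7)            # one swimmer's weight
xbar = NormalDist(mu=70, sigma=7 / 9**0.5)  # mean weight of nine swimmers

p_one = pop.cdf(77) - pop.cdf(63)    # P(63 < X < 77)
p_mean = xbar.cdf(77) - xbar.cdf(63)  # P(63 < X-bar < 77)
print(f"{p_one:.4f} {p_mean:.4f}")   # 0.6827 0.9973
```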