Variability of sum of independent random variables


I am trying to understand the Central Limit Theorem, especially the ${1\over \sqrt{n}}$ factor in the random variable $S_n = {1\over \sqrt{n}}\sum_{i=1}^n {{X_i - \mu} \over {\sigma}}$.
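As a side calculation, assume (as in the Central Limit Theorem) that the $X_i$ are independent with mean $\mu$ and variance $\sigma^2$. Then the $1/\sqrt{n}$ factor is exactly what keeps the variance of $S_n$ equal to $1$ for every $n$. Each standardized term $Z_i = (X_i - \mu)/\sigma$ has variance $1$, and $\operatorname{var}(cZ) = c^2 \operatorname{var} Z$, so

$$\operatorname{var} S_n = \frac{1}{n}\operatorname{var}\Big(\sum_{i=1}^n Z_i\Big) = \frac{1}{n}\sum_{i=1}^n \operatorname{var} Z_i = \frac{1}{n}\cdot n = 1.$$

The middle step uses independence. Scaling by $1/n$ instead would give variance $1/n$, which shrinks to $0$, and no scaling at all would give variance $n$, which grows without bound.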

Let's assume we have $n$ independent random variables $X_i$, all with the same probability distribution, $EX_i = 0$ and $var X_i = 1$; in other words, they represent $n$ repetitions of the same experiment. If we define the random variable $Y = {1 \over n}\sum_{i=1}^nX_i$, its expected value stays the same, but $var Y = {n \over n^2} = {1 \over n}$. I don't think I understand this variance computation very well.

Surely $var Y = {1 \over n^2} var (X_1 + ... + X_n) = {1 \over n^2}(var X_1 + ... + var X_n) = {n \over n^2}$, but shouldn't this be the same as ${1 \over n^2} var(n X_1) = {n^2 \over n^2} var X_1 = 1$? What is the difference?
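The two expressions can be checked exactly on a small hypothetical example. The sketch below assumes each $X_i$ is $-1$ or $+1$ with probability $1/2$ (so $EX_i = 0$ and $var X_i = 1$, as in the question) and enumerates every equally likely outcome; the distribution and the helper names `variance`, `var_sum_independent` and `var_n_times_x1` are illustrative choices, not part of the original post.

```python
from fractions import Fraction
from itertools import product

# Illustrative assumption: each X_i is -1 or +1 with probability 1/2,
# so E X_i = 0 and var X_i = 1, matching the question.
VALUES = (-1, 1)

def variance(outcomes):
    """Exact (population) variance of equally likely outcomes."""
    outcomes = list(outcomes)
    mean = Fraction(sum(outcomes), len(outcomes))
    return sum((x - mean) ** 2 for x in outcomes) / len(outcomes)

def var_sum_independent(n):
    # X_1 + ... + X_n: all 2**n combinations of independent draws are equally likely.
    return variance(sum(xs) for xs in product(VALUES, repeat=n))

def var_n_times_x1(n):
    # n * X_1: a single draw, multiplied by n.
    return variance(n * x for x in VALUES)

for n in (1, 2, 5, 10):
    # Columns: n, var(X_1+...+X_n), var(n X_1), var Y, var(n X_1) / n**2
    print(n, var_sum_independent(n), var_n_times_x1(n),
          var_sum_independent(n) / n**2, var_n_times_x1(n) / n**2)

# Output:
# 1 1 1 1 1
# 2 2 4 1/2 1
# 5 5 25 1/5 1
# 10 10 100 1/10 1
```

The fourth column is $var Y = 1/n$; the fifth is the question's ${1 \over n^2} var(nX_1) = 1$, and the two agree only at $n = 1$.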

Thank you very much.


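To understand why they are not the same, note first that having the same distribution does not make $X_1, \ldots, X_n$ the same random variable: $X_1 + \cdots + X_n$ adds up $n$ independent measurements, whereas $nX_1$ takes a single measurement and multiplies it by $n$. Now suppose I conduct two experiments, $A$ and $B$. In Experiment $A$, I measure the heights of a random sample of $n = 100$ people. In Experiment $B$, I measure the height of a single person selected at random.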

Do you expect the variability of the sample mean in Experiment $A$ to be the same as the variability of the sample mean (the single measurement) in Experiment $B$? Of course not. Intuitively, the variability of an experiment is how much we expect its sample mean to change each time we repeat it. In Experiment $A$, the sample will generally include people who are both shorter and taller than average; their deviations partly cancel, so the sample mean tends to be close to the mean of the population from which the sample was drawn. If we repeat Experiment $A$ many times, we expect the sample mean to be much less influenced by unusually short or tall individuals. If we repeat Experiment $B$, however, the variability of the sample mean is exactly the variability of a single randomly selected individual's height.
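Experiments $A$ and $B$ can be simulated as a rough sketch. The normal height distribution with mean 170 cm and standard deviation 10 cm is a hypothetical assumption, not data, and the helper names `experiment` and `spread_of_means` are illustrative. Repeating each experiment many times and taking the variance of the resulting sample means should give roughly $\sigma^2/100 = 1$ for $A$ and $\sigma^2 = 100$ for $B$.

```python
import random
import statistics

# Hypothetical population: heights ~ Normal(mean=170 cm, sd=10 cm).
# These numbers are illustrative assumptions, not measured data.
MU, SIGMA = 170.0, 10.0

def experiment(rng, n):
    """Measure n randomly chosen people and return their mean height."""
    return statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))

def spread_of_means(n, repeats=2000, seed=0):
    """Repeat the experiment and return the sample variance of the means."""
    rng = random.Random(seed)
    means = [experiment(rng, n) for _ in range(repeats)]
    return statistics.variance(means)

var_a = spread_of_means(100)  # Experiment A: sample of n = 100 people
var_b = spread_of_means(1)    # Experiment B: a single person
print(f"Experiment A: {var_a:.2f}  (theory: {SIGMA**2 / 100:.2f})")
print(f"Experiment B: {var_b:.2f}  (theory: {SIGMA**2:.2f})")
```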

Here is an experiment for you to try. Take a (fair) coin, toss it twice, and calculate the proportion of heads you obtained, which will be $0$, $1/2$, or $1$. Repeat this experiment $10$ times, and record the resulting proportions in a list.

Now take the same coin and toss it $20$ times. Calculate the proportion of heads you obtained, which will be one of $\{0, 0.05, 0.10, 0.15, \ldots, 0.95, 1\}$. Repeat this experiment $10$ times as well, and record the resulting proportions in a second list.

Next, compare the variability of the two lists. Even though both experiments use the same fair coin, the list from the second experiment is extremely unlikely to contain a $0$ or a $1$, and most of its proportions will lie between about $0.40$ and $0.60$; in the first list, by contrast, on average half of the proportions will be $0$ or $1$. You can also calculate the sample variances of the two lists: the second will typically be about ten times smaller than the first, since the variance of the proportion of heads in $n$ tosses of a fair coin is $1/(4n)$, which is $1/8$ for $n = 2$ and $1/80$ for $n = 20$.
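A quick simulation of the coin experiment, as a sketch that assumes Python's `random` module as a stand-in for a physical fair coin; the seed and the helper names `proportions` and `exact_variance` are illustrative. Alongside the two lists of $10$ proportions, it computes the exact variance of the proportion directly from the binomial probabilities for comparison.

```python
import random
import statistics
from fractions import Fraction
from math import comb

def proportions(rng, tosses, repeats=10):
    """Toss a fair coin `tosses` times and record the proportion of heads;
    repeat the whole experiment `repeats` times."""
    return [sum(rng.random() < 0.5 for _ in range(tosses)) / tosses
            for _ in range(repeats)]

def exact_variance(tosses):
    """Exact variance of the proportion of heads, from the binomial pmf."""
    total = 2 ** tosses
    return sum(Fraction(comb(tosses, k), total)
               * (Fraction(k, tosses) - Fraction(1, 2)) ** 2
               for k in range(tosses + 1))

rng = random.Random(2024)  # fixed seed so the run is reproducible
for tosses in (2, 20):
    props = proportions(rng, tosses)
    print(f"{tosses:2d} tosses: {props}")
    print(f"   sample variance {statistics.variance(props):.4f}, "
          f"exact variance {float(exact_variance(tosses)):.4f}")
```

With only $10$ repetitions the sample variances will scatter around the exact values, but the gap between the two lists is usually obvious.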