Scaling a normally distributed variable does not seem to be a problem. The variance of Y = sX can be derived as follows: VAR[Y] = VAR[sX] = s^2 * VAR[X] = s^2 * sigma_sq
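This scaling rule is easy to check empirically. Below is a minimal Monte Carlo sketch (the scale factor s = 3, standard deviation sigma = 2, sample size, and seed are illustrative choices, not from the question):

```python
import random
import statistics

random.seed(0)

s = 3.0      # scale factor (illustrative choice)
sigma = 2.0  # standard deviation of X, so VAR[X] = 4.0

# Draw samples of X ~ Normal(0, sigma^2) and scale each one by s.
x = [random.gauss(0.0, sigma) for _ in range(200_000)]
y = [s * xi for xi in x]

# The empirical variance of Y should be close to s^2 * VAR[X] = 9 * 4 = 36.
print(statistics.variance(y))
```

The sample variance will not be exactly 36, but with this many draws it lands close.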
I understand this, but the implications seem counterintuitive to me. Suppose we have a normally distributed variable X with variance sigma_sq. Next, I halve it and sum two independent copies of the halved variable. Doing this, I expect to arrive back at the original distribution, but I do not:
- VAR[Y] = VAR[0.5 X] = 0.25 * sigma_sq
- VAR[Y] + VAR[Y] = 0.25 * sigma_sq + 0.25 * sigma_sq = 0.5 * sigma_sq
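The derivation above can be simulated directly. A minimal sketch (sigma = 2 and the seed are illustrative assumptions), adding two independent halved variables:

```python
import random
import statistics

random.seed(1)

sigma = 2.0  # VAR[X] = 4.0
n = 200_000

# Two *independent* normal variables, each scaled by 0.5.
a = [0.5 * random.gauss(0.0, sigma) for _ in range(n)]
b = [0.5 * random.gauss(0.0, sigma) for _ in range(n)]
y = [ai + bi for ai, bi in zip(a, b)]

# VAR[Y] = 0.25 * 4.0 + 0.25 * 4.0 = 2.0, i.e. half the original variance of 4.0.
print(statistics.variance(y))
```

The empirical variance comes out near 2.0, not 4.0, confirming the calculation in the bullets.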
A concrete example that I would like to apply this to is the following. I have the distribution of the number of people admitted to an amusement park per weekend. It is normally distributed with mean AVG and variance sigma_sq. Assuming that the admission distributions for both days are the same, I derive that they follow these distributions:
- AVG_SATURDAY = AVG_SUNDAY = 0.5 * AVG
- VAR_SATURDAY = VAR_SUNDAY = 0.25 * sigma_sq
Now, suppose that I had different measurements. Instead of measuring the total number of people admitted per weekend, I measured the numbers admitted on Saturdays and on Sundays separately. If I wanted to know the distribution of weekend admissions, I would sum the two distributions:
- AVG_WEEKEND = AVG_SATURDAY + AVG_SUNDAY
- VAR_WEEKEND = VAR_SATURDAY + VAR_SUNDAY
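The conflict between the two measurement routes can be made concrete with a simulation. A minimal sketch, assuming hypothetical figures AVG = 1000 and sigma_sq = 400 (these numbers, the sample size, and the seed are illustrative, not from the question):

```python
import random
import statistics

random.seed(2)

# Hypothetical figures: weekend admissions ~ Normal(AVG, sigma_sq).
AVG = 1000.0
SIGMA = 20.0  # so sigma_sq = 400
n = 200_000

# Route 1: measure weekend totals directly.
weekend_direct = [random.gauss(AVG, SIGMA) for _ in range(n)]

# Route 2: model Saturday and Sunday as in the question
# (mean 0.5 * AVG, variance 0.25 * sigma_sq each, i.e. standard
# deviation 0.5 * SIGMA, drawn independently) and add them together.
weekend_summed = [
    random.gauss(0.5 * AVG, 0.5 * SIGMA) + random.gauss(0.5 * AVG, 0.5 * SIGMA)
    for _ in range(n)
]

# Same mean, but route 2 has variance 0.5 * sigma_sq = 200, not 400.
print(statistics.variance(weekend_direct))
print(statistics.variance(weekend_summed))
```

Both routes agree on the mean, but the first prints a variance near 400 and the second near 200, which is exactly the discrepancy described above.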
As a result, I have conflicting variances for the same distribution which I measured in two different ways. What am I doing that is not allowed?
This should not be surprising. When you divide a random variable by $2$ you reduce the variance by a factor of $4$, as you say. When you add two *independent* copies of the divided variable together you recover the original mean, but the variance is smaller than the variance of the original. This reflects the fact that the more independent variables you add, the more the sum tends to cluster around its mean, because some are above it and some below. The step that is "not allowed" is applying VAR[A + B] = VAR[A] + VAR[B] without independence: the two halves of your weekend total are not independent copies but the same variable, so VAR[0.5 X + 0.5 X] = 0.25 * sigma_sq + 0.25 * sigma_sq + 2 * COV[0.5 X, 0.5 X] = sigma_sq once the covariance term is included.
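The contrast between the correlated and independent cases can be sketched as follows (sigma = 2, the sample size, and the seed are illustrative assumptions):

```python
import random
import statistics

random.seed(3)

sigma = 2.0  # VAR[X] = 4.0
n = 200_000
x = [random.gauss(0.0, sigma) for _ in range(n)]

# Perfectly correlated halves: 0.5*X + 0.5*X is just X again,
# so the covariance term restores the full variance of 4.0.
same = [0.5 * xi + 0.5 * xi for xi in x]

# Independent halves: the covariance term is zero,
# so the variance drops to 0.5 * sigma_sq = 2.0.
other = [random.gauss(0.0, sigma) for _ in range(n)]
indep = [0.5 * xi + 0.5 * oi for xi, oi in zip(x, other)]

print(statistics.variance(same))
print(statistics.variance(indep))
```

The first variance comes out near 4.0 and the second near 2.0: splitting and recombining the *same* measurements recovers the original distribution, while adding *independent* per-day measurements does not.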