Can the standard deviation change if the entire dataset is replicated?

Let's say I have a dataset:

{ a, b, c, d, e }.

I calculate the standard deviation of the set and name it S.

If I replicate my dataset and use this one instead: { a, b, c, d, e, a', b', c', d', e' }, with a' = a, b' = b, and so on, which is basically:

{ a, a, b, b, c, c, d, d, e, e }

Will the standard deviation of this new dataset always be the same as S, the standard deviation of the original dataset?

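If you mean the population standard deviation (dividing by $n$), it stays exactly $S$: the mean is unchanged, and both the sum of squared deviations and $n$ double. If you mean the sample standard deviation (and sample variance), it will actually decrease a bit. That's because the sample formula treats the duplicated values as additional independent observations, which make us a bit more confident that the sample mean $\hat \mu$ is close to the true mean $\mu$.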

Since the sample standard deviation measures deviations from the sample mean, there is a slight bias here: by construction, the sample mean is the value that minimizes the sum of squared deviations of our data, so it fits our data better than the true mean does, and the deviations come out systematically a bit too small. The doubled observations increase our confidence in the accuracy of the sample mean, and therefore they decrease this bias.

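Specifically, in formulas, the sample standard deviation is given by $$ \hat\sigma = \sqrt{\frac{\sum_{i = 1}^n(x_i - \hat \mu)^2}{n-1}} $$ See that $n-1$ in the denominator? That's how we correct for the bias mentioned in the paragraphs above. When we double the observations, the sample mean $\hat\mu$ does not change, so the sum in the numerator exactly doubles, while the denominator goes from $n-1$ to $2n-1$, which is slightly more than double. The net effect on the sample variance is $$ \hat\sigma_{\text{doubled}}^2 = \frac{2\sum_{i = 1}^n(x_i - \hat \mu)^2}{2n-1} = \frac{2(n-1)}{2n-1}\,\hat\sigma^2, $$ which is strictly smaller than $\hat\sigma^2$ unless all the values are equal (in which case both are zero). So the sample standard deviation decreases by the factor $\sqrt{2(n-1)/(2n-1)}$.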
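A quick numerical check of the argument above, as a minimal sketch using Python's standard `statistics` module, where `pstdev` divides by $n$ and `stdev` divides by $n-1$. The values `[1, 2, 3, 4, 5]` are a hypothetical stand-in for { a, b, c, d, e }, and the helpers `duplicate` and `predicted_sample_sd` are illustrative names, not part of any library.

```python
import math
import statistics


def duplicate(data):
    """Return each value repeated twice, e.g. [a, a, b, b, ...]."""
    return [x for x in data for _ in range(2)]


def predicted_sample_sd(sd, n):
    """Predict the sample SD of the duplicated data (hypothetical helper).

    The mean is unchanged, the sum of squared deviations doubles, and the
    denominator goes from n - 1 to 2n - 1, so the SD scales by
    sqrt(2(n - 1) / (2n - 1)).
    """
    return sd * math.sqrt(2 * (n - 1) / (2 * n - 1))


data = [1, 2, 3, 4, 5]  # hypothetical values standing in for a, b, c, d, e
doubled = duplicate(data)

# Population SD (divides by n): unchanged by duplication.
print(statistics.pstdev(data), statistics.pstdev(doubled))

# Sample SD (divides by n - 1): shrinks after duplication.
print(statistics.stdev(data), statistics.stdev(doubled))

# The closed-form prediction matches the sample SD of the doubled data.
print(predicted_sample_sd(statistics.stdev(data), len(data)))
```

For these values the population SD is about 1.414 in both cases, while the sample SD drops from about 1.581 to about 1.491, matching the factor $\sqrt{8/9}$ for $n = 5$.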