Standard deviation of sum of random variables vs. standard deviation of linear transformation of single random variable


The standard deviation of a linear transformation $Y=a+bX$ of a random variable $X$ is $\sigma_Y= |b| \cdot \sigma_X$, so, for example, if $Y=4X$, then $$ \sigma_{4X} = 4\cdot \sigma_X \tag{1} $$
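Here is a quick exact check of the rule $\sigma_{a+bX}=|b|\,\sigma_X$, in the equivalent form $\operatorname{Var}(a+bX)=b^2\operatorname{Var}(X)$. This minimal Python sketch assumes a discrete $X$ stored as a `{value: probability}` dictionary. The fair-coin distribution and the helper names `variance` and `linear_transform` are illustrative choices, not part of the question. Comparing variances with `Fraction` avoids floating-point square roots.

```python
from fractions import Fraction

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean) ** 2 for x, p in dist.items())

def linear_transform(dist, a, b):
    """Distribution of Y = a + b*X, where X has distribution `dist`."""
    out = {}
    for x, p in dist.items():
        y = a + b * x
        out[y] = out.get(y, 0) + p
    return out

# Illustrative assumption: X is a fair 0/1 coin.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

for a, b in [(0, 4), (3, -2), (-1, 5)]:
    y = linear_transform(coin, a, b)
    # Var(a + bX) = b^2 Var(X), i.e. sigma_Y = |b| * sigma_X
    assert variance(y) == b ** 2 * variance(coin)

print(variance(coin), variance(linear_transform(coin, 0, 4)))  # 1/4 4
```

The shift $a$ drops out and the sign of $b$ is lost when squaring, which is why the standard deviation picks up $|b|$ rather than $b$.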

However, the standard deviation of a sum of random variables $X_1$ and $X_2$ is $\sigma_{X_1+X_2} = \sqrt{\sigma_{X_1}^2+\sigma_{X_2}^2}$. It seems to me that if you take a sum of four random variables $X$, which are all the same, this formula would lead to \begin{align*} \sigma_{X+X+X+X} &= \sqrt{\sigma_{X}^2+\sigma_{X}^2+\sigma_{X}^2+\sigma_{X}^2} \\ &=\sqrt{4 \cdot \sigma_{X}^2} \\ &=2 \cdot \sigma_{X} \tag{2} \end{align*} but summing four random variables $X+X+X+X$ is simply $4X$, and we showed in equation (1) that the standard deviation of $Y=4X$ is actually $4 \sigma_X$, not $2 \ \sigma_X$. Is this a contradiction? Where did our calculations go wrong?
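A small simulation shows that the two formulas describe different random variables. This is a sketch under stated assumptions: $X$ is taken to be standard normal ($\sigma_X=1$), and the sample size and seed are arbitrary. The square-root formula is applied to four *independent* draws, while $4X$ reuses a single draw four times.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible
n = 50_000

# Assumption for illustration: X is standard normal, so sigma_X = 1.
draws = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]

# Four *independent* copies X1 + X2 + X3 + X4 (what the sqrt formula assumes).
sum_of_copies = [sum(row) for row in draws]
# One draw of X, multiplied by 4 (the same X used four times).
four_x = [4 * row[0] for row in draws]

sd_sum = statistics.stdev(sum_of_copies)   # should be close to 2 * sigma_X
sd_scaled = statistics.stdev(four_x)       # should be close to 4 * sigma_X
print(round(sd_sum, 2), round(sd_scaled, 2))
```

With these settings the first number should land near 2, as in (2), and the second near 4, as in (1).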


You have forgotten the covariance terms:

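\begin{align*} \sigma_{X_1+X_2+X_3+X_4} = \Big(&\sigma_{X_1}^2+\sigma_{X_2}^2+\sigma_{X_3}^2+\sigma_{X_4}^2 \\ &+2\operatorname{Cov}(X_1,X_2)+2\operatorname{Cov}(X_1,X_3)+2\operatorname{Cov}(X_1,X_4) \\ &+2\operatorname{Cov}(X_2,X_3)+2\operatorname{Cov}(X_2,X_4)+2\operatorname{Cov}(X_3,X_4)\Big)^{1/2} \end{align*}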

Setting $X=X_1=X_2=X_3=X_4$, every covariance becomes $\operatorname{Cov}(X,X)=\sigma_X^2$, so we get

\begin{align*} \sigma_{X+X+X+X} = \sigma_{4X} &= \Big(\sigma_{X}^2+\sigma_{X}^2+\sigma_{X}^2+\sigma_{X}^2 \\ &\qquad +2\operatorname{Cov}(X,X)+2\operatorname{Cov}(X,X)+2\operatorname{Cov}(X,X)+2\operatorname{Cov}(X,X)+2\operatorname{Cov}(X,X)+2\operatorname{Cov}(X,X)\Big)^{1/2} \\ &= \sqrt{4\sigma_X^2 + 6\cdot 2\sigma_X^2} \\ &= \sqrt{16\sigma_X^2} = 4\sigma_X \end{align*}
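The covariance expansion says that $\operatorname{Var}(X_1+\dots+X_n)$ equals the sum of all entries of the covariance matrix. Each off-diagonal pair appears twice, which is where the factor 2 comes from. The minimal Python sketch below does that bookkeeping, assuming a hypothetical $\operatorname{Var}(X)=1/4$ (any value gives the same ratios). It contrasts four identical terms with four independent copies.

```python
from fractions import Fraction

def var_of_sum(cov):
    """Var(X1 + ... + Xn) = sum over all i, j of Cov(Xi, Xj).

    `cov` is the n-by-n covariance matrix as a list of lists; the diagonal
    holds the variances and each off-diagonal pair appears twice, giving
    the 2*Cov(Xi, Xj) terms.
    """
    return sum(sum(row) for row in cov)

sigma2 = Fraction(1, 4)  # hypothetical Var(X); any positive value works
n = 4

# All four terms are the same variable X: every Cov(Xi, Xj) = Var(X).
identical = [[sigma2] * n for _ in range(n)]
# Four independent copies: the off-diagonal covariances are 0.
independent = [[sigma2 if i == j else 0 for j in range(n)] for i in range(n)]

print(var_of_sum(identical) / sigma2, var_of_sum(independent) / sigma2)  # 16 4
```

Identical terms give $16\sigma_X^2$ (standard deviation $4\sigma_X$), while independent copies give $4\sigma_X^2$ (standard deviation $2\sigma_X$).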


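Whoa! Even in the simplest case of adding two independent, identically distributed random variables, the sum is typically not distributed like $2X$. If $X_1$ and $X_2$ are independent copies of $X$, then $X_1+X_2$ and $2X$ generally have different distributions. Below, "$X+X$" is shorthand for such a sum of two independent copies.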

Consider, for example, flipping a fair coin. The outcome can be modeled by a random variable $X$ that equals 1 if the flip is heads and 0 otherwise, with both outcomes equally likely. For $X+X$, take two independent flips.

(i) $2X$ takes the values 0 and 2, each with probability 1/2.

(ii) $X+X$ (two independent flips added) takes the values 0, 1, 2 with probabilities 1/4, 2/4, 1/4 respectively.

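Since $2X$ and $X+X$ have different distributions, it should not be surprising that usually $\operatorname{Var}(X+X)\ne \operatorname{Var}(2X)$. Here $\operatorname{Var}(2X)=1$ while $\operatorname{Var}(X+X)=1/2$.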
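The coin example can be enumerated exactly. This minimal Python sketch builds the distribution of $2X$ from one flip and of $X_1+X_2$ from two independent flips, then computes both variances with exact fractions. The dictionary representation and the helper name `variance` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # X: 1 for heads, 0 for tails

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean) ** 2 for x, p in dist.items())

# (i) 2X: one flip, doubled.
two_x = {2 * x: p for x, p in coin.items()}

# (ii) X1 + X2: two independent flips, added (probabilities multiply).
sum_two = {}
for (x1, p1), (x2, p2) in product(coin.items(), repeat=2):
    sum_two[x1 + x2] = sum_two.get(x1 + x2, 0) + p1 * p2

print(variance(two_x), variance(sum_two))  # 1 1/2
```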

In general, $\operatorname{Var}(kX)=k^2\operatorname{Var}(X)$ for any constant $k$. When $X$ and $Y$ are independent, $\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)$, and likewise for more independent terms.

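Thus, for four independent copies, $\operatorname{Var}(X+X+X+X)=4\operatorname{Var}(X)$, but $\operatorname{Var}(4X)=16\operatorname{Var}(X)$. Taking square roots gives standard deviations $2\sigma_X$ and $4\sigma_X$, matching (2) and (1) respectively.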
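The same exact enumeration extends to four flips. In this sketch, the hypothetical helper `add_independent` convolves two independent distributions, so applying it three times gives $X_1+X_2+X_3+X_4$. The coin is again an illustrative assumption. The ratios to $\operatorname{Var}(X)$ come out as 4 and 16.

```python
from fractions import Fraction

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # illustrative fair 0/1 coin

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean) ** 2 for x, p in dist.items())

def add_independent(d1, d2):
    """Distribution of X + Y for independent X ~ d1, Y ~ d2 (convolution)."""
    out = {}
    for x, px in d1.items():
        for y, py in d2.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

# X1 + X2 + X3 + X4 with four independent copies of the coin
sum_four = coin
for _ in range(3):
    sum_four = add_independent(sum_four, coin)

# 4X: a single flip, multiplied by 4
four_x = {4 * x: p for x, p in coin.items()}

v = variance(coin)
print(variance(sum_four) / v, variance(four_x) / v)  # 4 16
```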