Why are correlation and standard deviation unitless?


I am sorry if this is too complicated a question to answer simply.

I am interested in understanding the mathematical intuition behind why standard deviation and correlation are unitless when the quantities from which they are directly calculated (variance and covariance) both have units attached to them.

It's not really obvious to me why taking the square root of the variance produces the standard deviation and why that is unitless. Nor is it obvious why dividing the covariance by the product of the standard deviations always gives a number between $-1$ and $+1$, why that number is the correlation, and why it is unitless.

I am unable to find any texts that offer any simple explanation for this. Hence I need some help understanding it as simply as possible.


There are 1 best solutions below


Standard deviation isn't unitless. If I have some random variable measured in meters, and its standard deviation is $1$, then the same variable converted to feet will have standard deviation $3.28$. The variance is measured in squared units (square meters here), and taking the square root brings it back to the unit of the variable. So the standard deviation has the same unit as the variable, and will scale with it when you change units.
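To make the scaling explicit, here is a short derivation, assuming a unit change amounts to multiplying by a constant $c > 0$ (for instance $c \approx 3.28$ for meters to feet) and writing $\mu_X$ for the mean of $X$; both symbols are introduced only for this illustration:

$$\operatorname{Var}(cX) = E\big[(cX - c\mu_X)^2\big] = c^2\,E\big[(X - \mu_X)^2\big] = c^2\operatorname{Var}(X), \qquad \sigma_{cX} = \sqrt{c^2\operatorname{Var}(X)} = c\,\sigma_X.$$

The variance picks up the factor $c^2$ (squared units), and the square root reduces that to a single factor of $c$, the same factor the variable itself picks up.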

The correlation coefficient, on the other hand, is unitless. If you have two random variables measured in meters, and the correlation is $0.7$, then the correlation is still $0.7$ if you convert the samples to feet. This holds even if you convert only one of the variables. It is unitless because you take the covariance and divide by the product of the standard deviations. Scaling the values of the samples (by changing units) will change the covariance in exactly the same way that it changes the product of the standard deviations, and the division makes the changes cancel out.
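If you want to check this numerically, here is a minimal sketch in plain Python. The sample values, the helper names (`covariance`, `std_dev`, `correlation`) and the use of the sample ($n-1$) formulas are my own choices for illustration, not anything from the question:

```python
import math

M_TO_FT = 3.28084  # meters-to-feet conversion factor


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    # Sample covariance (divides by n - 1); its unit is unit(x) * unit(y).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def std_dev(xs):
    # Square root of the variance, so it carries the unit of x itself.
    return math.sqrt(covariance(xs, xs))


def correlation(xs, ys):
    # Covariance divided by the product of standard deviations: units cancel.
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Made-up measurements in meters, purely for illustration.
heights_m = [1.60, 1.72, 1.68, 1.81, 1.75]
spans_m = [1.58, 1.75, 1.66, 1.85, 1.71]

heights_ft = [h * M_TO_FT for h in heights_m]
spans_ft = [s * M_TO_FT for s in spans_m]

# Standard deviation scales by the conversion factor.
print(std_dev(heights_ft) / std_dev(heights_m))

# Covariance scales by the factor squared when both variables are converted.
print(covariance(heights_ft, spans_ft) / covariance(heights_m, spans_m))

# Correlation is unchanged, whether one or both variables are converted.
print(correlation(heights_m, spans_m))
print(correlation(heights_ft, spans_m))
print(correlation(heights_ft, spans_ft))
```

The first ratio should come out as the conversion factor, the second as its square, and the three correlations should agree up to floating-point rounding.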

The correlation of two variables $X$ and $Y$ lies between $-1$ and $1$ because the covariance necessarily lies between $-\sigma_X\sigma_Y$ and $\sigma_X\sigma_Y$ (this is the Cauchy-Schwarz inequality): If $Y = aX + b$ for $a, b\in \Bbb R, a\neq 0$, then $\operatorname{cov}(X, Y) = \pm \sigma_X\sigma_Y$ (depending on the sign of $a$), and any deviation from this exact linear relationship will cause a covariance closer to $0$. However, the covariance isn't unitless, meaning a rescaling of one or both of the variables will result in a different covariance.
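One standard way to see the bound, sketched here assuming both variances are finite and nonzero: standardize the variables as $\tilde X = (X - \mu_X)/\sigma_X$ and $\tilde Y = (Y - \mu_Y)/\sigma_Y$ (the tildes, $\mu_X$, $\mu_Y$ and $\rho$ are notation introduced for this illustration), so that each has variance $1$ and $\operatorname{cov}(\tilde X, \tilde Y) = \operatorname{cov}(X, Y)/(\sigma_X\sigma_Y) = \rho$. Since a variance can never be negative,

$$0 \le \operatorname{Var}(\tilde X \pm \tilde Y) = \operatorname{Var}(\tilde X) + \operatorname{Var}(\tilde Y) \pm 2\operatorname{cov}(\tilde X, \tilde Y) = 2 \pm 2\rho,$$

which gives $-1 \le \rho \le 1$. Equality holds only when $\tilde X + \tilde Y$ or $\tilde X - \tilde Y$ has zero variance, that is, when $Y$ is (almost surely) an exact linear function of $X$.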