Why does the correlation coefficient work?


I understand that one can calculate the correlation coefficient $r_{xy}$ between observations $x_i$ and $y_i$ with

$$ r_{xy} = \frac{\sum_{i=1}^n (x_i - x_m)(y_i - y_m)}{\sqrt{(\sum_{i=1}^n (x_i - x_m)^2)(\sum_{i=1}^n (y_i - y_m)^2)}}$$

where $x_m$ and $y_m$ are the means of the $x_i$ and $y_i$. I have read that if the absolute value of $r_{xy}$ is close to 1, the observations are probably correlated, and if it is close to 0, they are not correlated.
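A minimal Python sketch of the formula above, assuming the observations are plain lists of numbers. The helper name `pearson_r` and the simulated data are illustrative assumptions, not part of the question. It reproduces both regimes: exactly linear data gives $r_{xy} = \pm 1$, and independently generated data gives a value near 0.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient r_xy, computed term by term from the formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    x_m = sum(xs) / n  # mean of the x observations
    y_m = sum(ys) / n  # mean of the y observations
    # Numerator: sum of products of the deviations from the means
    num = sum((x - x_m) * (y - y_m) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((x - x_m) ** 2 for x in xs)
    syy = sum((y - y_m) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return num / math.sqrt(sxx * syy)

# Exactly linear data: r is +1 (increasing) or -1 (decreasing)
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0

# Simulated data (illustrative only; fixed seed so the run is reproducible)
rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(1000)]
ys_linear = [2 * x + rng.gauss(0, 0.1) for x in xs]  # linear trend plus small noise
ys_indep = [rng.uniform(0, 1) for _ in range(1000)]  # generated independently of xs
print(pearson_r(xs, ys_linear))  # close to 1
print(pearson_r(xs, ys_indep))   # close to 0
```

On Python 3.10 and later, the standard library's `statistics.correlation(xs, ys)` computes the same Pearson coefficient, so the hand-written version is only for seeing the formula at work.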


However, this is not obvious to me from the formula. Can someone explain why it follows from the formula that $|r_{xy}| \approx 1$ means correlation and $|r_{xy}| \approx 0$ means no correlation?

Best answer

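You're computing the cosine of the angle between two vectors: the centered vectors $u = (x_1 - x_m, \dots, x_n - x_m)$ and $v = (y_1 - y_m, \dots, y_n - y_m)$. The numerator of $r_{xy}$ is the dot product $u \cdot v$, and the denominator is the product of the lengths $\|u\|\,\|v\|$, so $r_{xy} = \frac{u \cdot v}{\|u\|\,\|v\|} = \cos\theta$. By the Cauchy-Schwarz inequality this always lies in $[-1, 1]$. A correlation near $+1$ means the vectors are nearly parallel: the deviations of $y$ from its mean are nearly a positive multiple of those of $x$, so the points lie close to an increasing line. A correlation near $-1$ means they are nearly antiparallel (close to a decreasing line). A correlation near $0$ means they are nearly orthogonal, so the deviations show no linear relationship.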
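To see the geometric reading concretely, here is a small Python sketch; the helper names `centered` and `cosine` and the sample lists are hypothetical, chosen only for illustration. `cosine(centered(x), centered(y))` is, term for term, the $r_{xy}$ formula from the question: the dot product of the deviation vectors divided by the square root of the product of their squared lengths. The three hand-picked `y` lists give centered vectors that are parallel, antiparallel, and orthogonal to the centered `x`.

```python
import math

def centered(xs):
    """Subtract the mean from every entry, giving the vector (x_i - x_m)."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def cosine(u, v):
    """Cosine of the angle between u and v: dot(u, v) / sqrt(|u|^2 |v|^2)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

x = [1.0, 2.0, 3.0]                # centered: (-1, 0, 1)
y_parallel = [10.0, 20.0, 30.0]    # centered: (-10, 0, 10), a positive multiple
y_antiparallel = [3.0, 2.0, 1.0]   # centered: (1, 0, -1), a negative multiple
y_orthogonal = [1.0, 0.0, 1.0]     # centered: (1/3, -2/3, 1/3), dot product 0

print(cosine(centered(x), centered(y_parallel)))      # 1.0
print(cosine(centered(x), centered(y_antiparallel)))  # -1.0
print(cosine(centered(x), centered(y_orthogonal)))    # 0.0
```

Note that `y_orthogonal` is not unrelated to `x` in a general sense; it simply has no linear trend against it, which is exactly what $r_{xy} \approx 0$ measures.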