I can't understand the intuition behind the formula for the Pearson product-moment correlation coefficient
for bivariate data. The formula is:
$$\rho = \frac{\mathrm{cov}(X,Y)}{S_x \cdot S_y} $$
where $\mathrm{cov}$ is the covariance, and $S_x$ and $S_y$ are the standard deviations of $X$ and $Y$.
I want to know where that formula comes from. I searched the net but couldn't find a derivation.
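First, the denominator can be removed if the values in $X$ and $Y$ are standardized: subtract the mean and divide by the standard deviation, $Z_x = (X-\bar X)/S_x$ and $Z_y = (Y-\bar Y)/S_y$, so that both have mean $0$ and standard deviations $S_{Z_x}=1$ and $S_{Z_y}=1$. (If $X$ and $Y$ already have mean zero, this is just $Z_x = X/S_x$ and $Z_y = Y/S_y$.)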
Then the correlation is simply the sum of the products of the individual values divided by $n$, $$ \rho = \frac{1}{n}\sum_{k=1}^n Z_{x,k}\, Z_{y,k} $$ that is, the average of something like the common excess from the mean, where we understand the $Z_{x,k}$ and $Z_{y,k}$ as such excesses. (Dividing by $n$ here matches standard deviations that are themselves computed with divisor $n$.)
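As a quick numerical check of this "average of products" reading, here is a minimal sketch in Python with NumPy. The function name `pearson_from_z_scores` and the sample data are made up for illustration, and it assumes that standardizing means subtracting the mean and dividing by the standard deviation computed with divisor $n$ (NumPy's default `ddof=0`).

```python
import numpy as np

def pearson_from_z_scores(x, y):
    """Average of the products of z-scores (illustrative helper, not a library function)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize: subtract the mean, divide by the standard deviation.
    # np.std uses divisor n by default (ddof=0), matching the division by n below.
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    # Mean of the elementwise products = (1/n) * sum_k zx[k] * zy[k]
    return float(np.mean(zx * zy))

# Made-up sample data, purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_from_z_scores(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])  # NumPy's built-in Pearson correlation

print(r, r_numpy)  # both are approximately 0.8
```

For this data the deviations from the mean are $(-2,-1,0,1,2)$ and $(-1,-2,1,0,2)$, so the sum of products is $8$ and both sums of squares are $10$, which gives $8/10 = 0.8$ by hand as well.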
I like the model of $\rho$ as the cosine of the angle between an $X$-vector and a $Y$-vector in $n$-dimensional Euclidean space. Each observation (the $k$'th case) defines its own dimension/axis, and each vector runs from the origin to the point whose coordinates are the $Z_x$-values (resp. the $Z_y$-values). Both vectors then have length $\sqrt{n}$, so their dot product divided by $n$ is exactly the cosine of the angle between them. (This also indicates why the individual observations/measurements should be conceptually independent of each other, so that the axes in that $n$-dimensional space are orthogonal to each other.) It is then also immediately obvious that the two vectors, taken together as a rigid wire model, can be rotated in this space so that only two dimensions are needed, because two vectors from the origin span only a plane. (The generalization to more vectors follows immediately.)
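The cosine picture can be checked numerically too. The following is a small sketch, again in Python with NumPy and with made-up data; the helper name `cosine_of_centered_vectors` is hypothetical. It assumes the vectors are measured from the mean (centered), and it uses the fact that dividing by the standard deviation only rescales a vector's length, so the angle is the same whether or not the data are fully standardized.

```python
import numpy as np

def cosine_of_centered_vectors(x, y):
    """Cosine of the angle between the mean-centered data vectors (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centering puts the mean at the origin; each observation is one axis.
    xc = x - x.mean()
    yc = y - y.mean()
    # cos(angle) = dot product / (product of the vector lengths)
    return float(np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Made-up sample data, purely for illustration.
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [2.0, 1.0, 4.0, 3.0, 5.0]

cos_angle = cosine_of_centered_vectors(x_obs, y_obs)
# Clip guards against tiny floating-point overshoot outside [-1, 1].
angle_deg = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

print(cos_angle)                            # approximately 0.8
print(float(np.corrcoef(x_obs, y_obs)[0, 1]))  # same value from NumPy
print(angle_deg)                            # approximately 36.87 degrees
```

Read this way, $\rho = 1$ means the two vectors point in the same direction, $\rho = 0$ means they are at right angles, and $\rho = -1$ means they point in opposite directions.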