Motivation for the definition of the Pearson correlation coefficient


Let $X$ and $Y$ be two random variables with joint distribution $P_{X,Y}$ and marginal distributions $P_X$ and $P_Y$. The Pearson correlation coefficient is defined to be $$\rho_{X,Y}=\dfrac{\mathbb{E}(XY)-\mathbb{E}(X)\mathbb{E}(Y)}{\sigma_X\sigma_Y}\tag{1}$$ where $\mathbb{E}$ denotes expectation and $\sigma_X,\sigma_Y$ are the respective standard deviations.
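To make definition (1) concrete, here is a minimal sketch that estimates $\rho_{X,Y}$ from samples by plugging sample means and standard deviations directly into the formula. The linear relationship `y = 2x + noise` is an illustrative choice of mine, not something from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: Y = 2X + noise, so X and Y are linearly related.
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(size=10_000)

# Sample version of definition (1): (E[XY] - E[X]E[Y]) / (sigma_X * sigma_Y).
rho = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.std(x) * np.std(y))

print(rho)                      # strongly positive; the true value is 2/sqrt(5)
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in estimate agrees
```

Note that the biased/unbiased normalization choice cancels between numerator and denominator, which is why this plug-in estimate matches `np.corrcoef` exactly.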

This is meant to be a quantifier of correlation. As Wikipedia's article puts it:

In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data.

My question is: given this intuitive idea about correlation, what is the motivation to define (1) as a quantifier of correlation? How do we motivate definition (1)?

The linked page also hints that, mathematically, $\rho_{X,Y}$ "is defined as the quality of least squares fitting to the original data". But I still fail to see why this would be a good quantifier of correlation.


Accepted answer:

It helps to instead write an equivalent definition,$$\rho_{X,\,Y}=\frac{\Bbb E((X-\Bbb EX)(Y-\Bbb EY))}{\sigma_X\sigma_Y}.$$This is a covariance divided by a product of standard deviations. I've explained before that covariance is an inner product (with some qualifying statements you'll find at that link). Standard deviation is then like a length (variance being the squared length), so the above formula is like$$\cos\theta=\frac{a\cdot b}{|a||b|}.$$In particular, perfectly correlated variables are "parallel" in a vector space of random variables, whereas uncorrelated ones are orthogonal.
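The geometric reading above can be checked numerically: center the sample vectors (subtract the means), compute the cosine of the angle between them, and compare with the correlation coefficient. The data here is again an illustrative choice, not from the answer:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
y = x + rng.normal(size=5_000)

# Center the samples: subtract the sample means (the analogue of X - E[X]).
xc = x - x.mean()
yc = y - y.mean()

# cos(theta) = a.b / (|a||b|) on the centered vectors ...
cos_theta = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# ... is exactly the Pearson correlation coefficient of the samples.
print(cos_theta)
print(np.corrcoef(x, y)[0, 1])
```

So the sample correlation literally is the cosine of the angle between the two centered data vectors in $\mathbb{R}^n$.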

Second answer:

Initially, Pearson's correlation coefficient was introduced in the context of linear regression (e.g. Pearson, 1896): $$ Y=\alpha+\beta X+\epsilon, $$ where $\mathsf{E}[\epsilon\mid X]=0$. In this case $$ \beta=\rho_{X,Y}\times\frac{\sigma_Y}{\sigma_X}. $$ So $\rho_{X,Y}$ is a measure of linear dependence between $X$ and $Y$ and typically fails to account for nonlinear dependence. For example, when $Y=X^2$ and the distribution of $X$ is symmetric about the origin, $\rho_{X,Y}=0$, since then $\mathbb{E}(XY)=\mathbb{E}(X^3)=0=\mathbb{E}(X)\mathbb{E}(Y)$.
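Both claims in this answer can be illustrated numerically: the $Y=X^2$ example gives (near-)zero sample correlation despite perfect dependence, while for a genuinely linear model the least-squares slope recovers $\rho_{X,Y}\,\sigma_Y/\sigma_X$. The particular coefficients below are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# X symmetric about the origin, Y = X^2: a deterministic but nonlinear dependence.
x = rng.normal(size=100_000)
y = x ** 2

# Pearson correlation is near zero even though Y is a function of X.
print(np.corrcoef(x, y)[0, 1])

# By contrast, for the linear model Y = alpha + beta*X + eps with beta = 3,
# the least-squares slope equals rho * sigma_Y / sigma_X.
y_lin = 1.0 + 3.0 * x + rng.normal(size=100_000)
rho = np.corrcoef(x, y_lin)[0, 1]
slope = np.polyfit(x, y_lin, 1)[0]   # degree-1 least-squares fit; [0] is the slope
print(slope, rho * y_lin.std() / x.std())  # both close to 3
```

The two printed slope values agree because the ordinary least-squares slope is $\operatorname{cov}(X,Y)/\operatorname{var}(X)$, which is algebraically the same as $\rho_{X,Y}\,\sigma_Y/\sigma_X$.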