Using correlation as a distance metric

I was trying to come up with a distance measure $d(X;Y)$ based on correlation $\rho_{XY}\in[-1;1]$ satisfying the following conditions:

  1. $d(X;Y)\ge0$ $\forall X,Y$
  2. $d(X;X)=0$
  3. $d(X;Y)=d(Y;X)$
  4. $d(X;Z)\le d(X;Y)+d(Y;Z)$
  5. When $\rho_{XY}\rightarrow0$, $d(X;Y)\rightarrow+\infty$
  6. $\rho_{XY}=\pm1\Leftrightarrow d(X;Y)=0$

I quickly constructed the function $d(X;Y)=\frac{1}{|\rho_{XY}|}-1$, which, as far as I know, satisfies every criterion above except the triangle inequality, which I am struggling to prove. Could anyone point out a way to prove the triangle inequality for this function, or show that it does not always hold and that my idea isn't appropriate?

My question essentially is $$\frac{1}{|\rho_{XZ}|}+1 \overset{?}{\le} \frac{1}{|\rho_{XY}|} + \frac{1}{|\rho_{YZ}|}$$
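Substituting $d=\frac{1}{|\rho|}-1$ into property 4 and adding $2$ to both sides gives exactly this inequality:

$$
\begin{aligned}
d(X;Z)\le d(X;Y)+d(Y;Z)
&\iff \frac{1}{|\rho_{XZ}|}-1 \le \left(\frac{1}{|\rho_{XY}|}-1\right)+\left(\frac{1}{|\rho_{YZ}|}-1\right)\\
&\iff \frac{1}{|\rho_{XZ}|}+1 \le \frac{1}{|\rho_{XY}|}+\frac{1}{|\rho_{YZ}|}.
\end{aligned}
$$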

Best answer

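That's not true: it's possible that $\rho_{XZ} = 0$ while $\rho_{XY} \neq 0$ and $\rho_{YZ} \neq 0$. For example, if $X$ and $Z$ are independent normally distributed variables and $Y = X + Z$, then $\rho_{XZ} = 0$ while $\rho_{XY}$ and $\rho_{YZ}$ are nonzero (both equal $1/\sqrt{2}$ when $X$ and $Z$ have equal variances), so the left-hand side is infinite and the right-hand side is finite.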
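A quick numerical check of this counterexample, as a sketch that assumes unit variances for $X$ and $Z$ (the answer only says they are independent normals). Instead of sampling, it reads the correlations off the exact covariance matrix of $(X, Y, Z)$. The helper names `corr_from_cov` and `d` are illustrative, not from any library.

```python
import math

def corr_from_cov(cov, i, j):
    """Correlation of variables i and j, given a covariance matrix as nested lists."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])

def d(rho):
    """Proposed distance d = 1/|rho| - 1, taken as infinite when rho == 0."""
    return math.inf if rho == 0 else 1.0 / abs(rho) - 1.0

# Assumption: X, Z independent with unit variance, Y = X + Z.
# Exact covariances: Var(Y) = 2, Cov(X, Y) = Cov(Z, Y) = 1, Cov(X, Z) = 0.
cov = [
    [1.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
    [0.0, 1.0, 1.0],
]
X, Y, Z = 0, 1, 2

rho_xy = corr_from_cov(cov, X, Y)  # 1/sqrt(2)
rho_yz = corr_from_cov(cov, Y, Z)  # 1/sqrt(2)
rho_xz = corr_from_cov(cov, X, Z)  # 0

lhs = d(rho_xz)                # d(X;Z)
rhs = d(rho_xy) + d(rho_yz)    # d(X;Y) + d(Y;Z) = 2*(sqrt(2) - 1)
print(f"d(X;Z) = {lhs}, d(X;Y) + d(Y;Z) = {rhs:.4f}")
print("triangle inequality holds:", lhs <= rhs)
```

Because the correlations come from the exact covariance matrix, the result is deterministic; with simulated data the sample correlation of $X$ and $Z$ would only be close to $0$, making $d(X;Z)$ large but finite.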

The standard way to define the distance between random variables is $d(X, Y) = \sigma_{X - Y}$, the norm induced by using covariance as the inner product. This, however, doesn't satisfy your properties 5 and 6.
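As an illustrative sketch (the helper name `sigma_dist` is hypothetical), $\sigma_{X-Y}=\sqrt{\operatorname{Var}X+\operatorname{Var}Y-2\operatorname{Cov}(X,Y)}$ can be computed directly from a covariance matrix. The two cases below, both assuming $\operatorname{Var}X = 1$, show where properties 6 and 5 break.

```python
import math

def sigma_dist(cov, i, j):
    """sigma_{X_i - X_j} = sqrt(Var X_i + Var X_j - 2 Cov(X_i, X_j)) from a covariance matrix."""
    var = cov[i][i] + cov[j][j] - 2.0 * cov[i][j]
    return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error

# Property 6 fails: X and 2X have correlation 1 but positive distance.
# Covariance of (X, 2X): Var(2X) = 4, Cov(X, 2X) = 2.
cov_scaled = [[1.0, 2.0], [2.0, 4.0]]
d_scaled = sigma_dist(cov_scaled, 0, 1)  # sqrt(1 + 4 - 4) = 1

# Property 5 fails: independent X and Y (correlation 0) have a finite distance.
cov_indep = [[1.0, 0.0], [0.0, 1.0]]
d_indep = sigma_dist(cov_indep, 0, 1)    # sqrt(2)

print(d_scaled, d_indep)
```

Since $\sigma_{X-Y}$ is the norm of $X-Y$ under the covariance inner product, it does satisfy the triangle inequality; strictly it is a pseudometric, because $\sigma_{X-Y}=0$ whenever $X-Y$ is constant.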

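Moreover, your properties 4 and 5 can't be satisfied simultaneously by any $d$ that depends only on the correlation: we can find variables $X$, $Y$ and $Z$ such that $\rho_{XY} = \rho_{YZ} = \frac{1}{3}$ while $\rho_{XZ}$ is arbitrarily close to $0$. Then $d(X;Y)+d(Y;Z)$ stays fixed, whereas by property 5 $d(X;Z)\to+\infty$, which violates property 4.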
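A sketch of this family for the proposed $d=\frac{1}{|\rho|}-1$, assuming jointly Gaussian variables so that every positive semidefinite correlation matrix is attainable. With $\rho_{XY}=\rho_{YZ}=\frac13$ fixed, $d(X;Y)+d(Y;Z)=2+2=4$, so the triangle inequality fails as soon as $\frac{1}{\rho_{XZ}}-1>4$, i.e. $0<\rho_{XZ}<\frac15$. The helper names `d` and `is_valid_corr3` are illustrative.

```python
import math

def d(rho):
    """Proposed distance d = 1/|rho| - 1, taken as infinite when rho == 0."""
    return math.inf if rho == 0 else 1.0 / abs(rho) - 1.0

def is_valid_corr3(a, b, c):
    """True if [[1, a, c], [a, 1, b], [c, b, 1]] is a valid correlation matrix.

    a = rho_XY, b = rho_YZ, c = rho_XZ. With unit diagonal and |a|, |b|, |c| <= 1,
    all 1x1 and 2x2 principal minors are non-negative, so positive
    semidefiniteness reduces to a non-negative determinant.
    """
    if max(abs(a), abs(b), abs(c)) > 1:
        return False
    det = 1 - a * a - b * b - c * c + 2 * a * b * c
    return det >= 0

rho_xy = rho_yz = 1 / 3  # held fixed: d(X;Y) + d(Y;Z) = 4

results = []
for eps in [0.5, 0.25, 0.1, 0.01, 0.001]:
    if not is_valid_corr3(rho_xy, rho_yz, eps):
        raise ValueError(f"rho_XZ = {eps} is not attainable")
    lhs = d(eps)                   # d(X;Z)
    rhs = d(rho_xy) + d(rho_yz)    # d(X;Y) + d(Y;Z)
    results.append((eps, lhs, rhs, lhs <= rhs))
    print(f"rho_XZ={eps}: d(X;Z)={lhs:.2f}, d(X;Y)+d(Y;Z)={rhs:.2f}, holds={lhs <= rhs}")
```

The determinant test also accepts $\rho_{XZ}=0$ itself (the determinant is $\frac79$), so the same family contains an exact counterexample in addition to the ones approaching $0$.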