I was trying to come up with a distance measure $d(X;Y)$ based on correlation $\rho_{XY}\in[-1;1]$ satisfying the following conditions:
- $d(X;Y)\ge0$ $\forall X,Y$
- $d(X;X)=0$
- $d(X;Y)=d(Y;X)$
- $d(X;Z)\le d(X;Y)+d(Y;Z)$
- When $\rho_{XY}\rightarrow0$, $d(X;Y)\rightarrow+\infty$
- $\rho_{XY}=\pm1\Leftrightarrow d(X;Y)=0$
I quickly constructed the function $d(X;Y)=\frac{1}{|\rho_{XY}|}-1$, which, as far as I know, satisfies every criterion above except the triangle inequality, which I am struggling to prove. Could anyone point out a way to prove the triangle inequality for this function, or show that it does not always hold and that my idea isn't appropriate?
My question essentially is $$\frac{1}{|\rho_{XZ}|}+1 \overset{?}{\le} \frac{1}{|\rho_{XY}|} + \frac{1}{|\rho_{YZ}|}$$
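Spelling out the algebra: substituting $d=\frac{1}{|\rho|}-1$ into the triangle inequality and collecting the constants gives exactly the inequality above.

$$
\begin{aligned}
d(X;Z) &\le d(X;Y) + d(Y;Z) \\
\iff \frac{1}{|\rho_{XZ}|} - 1 &\le \left(\frac{1}{|\rho_{XY}|} - 1\right) + \left(\frac{1}{|\rho_{YZ}|} - 1\right) \\
\iff \frac{1}{|\rho_{XZ}|} + 1 &\le \frac{1}{|\rho_{XY}|} + \frac{1}{|\rho_{YZ}|}
\end{aligned}
$$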
That's not true: it's possible that $\rho_{XZ} = 0$ while $\rho_{XY} \neq 0$ and $\rho_{YZ} \neq 0$. For example, if $X$ and $Z$ are independent standard normal variables and $Y = X + Z$, then $\rho_{XY} = \rho_{YZ} = \frac{1}{\sqrt{2}}$ and $\rho_{XZ} = 0$, so the left-hand side is infinite while the right-hand side equals $2\sqrt{2}$.
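A minimal exact check of this counterexample, assuming each variable is represented by its coefficients on independent unit-variance sources (so covariances are plain dot products and no sampling is needed). The helper names `cov`, `corr` and `d` are illustrative and not from the thread.

```python
import math

# Each random variable is a coefficient vector over independent,
# unit-variance sources, so Cov(U, V) is the dot product of the vectors.
def cov(u, v):
    return sum(a * b for a, b in zip(u, v))

def corr(u, v):
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

def d(rho):
    """Proposed distance 1/|rho| - 1, taken as infinite when rho == 0."""
    return math.inf if rho == 0 else 1 / abs(rho) - 1

# X and Z are independent sources; Y = X + Z.
X = (1.0, 0.0)
Z = (0.0, 1.0)
Y = (1.0, 1.0)

# Analytically: corr(X, Y) = corr(Y, Z) = 1/sqrt(2), corr(X, Z) = 0.
lhs = d(corr(X, Z))                  # d(X;Z), infinite here
rhs = d(corr(X, Y)) + d(corr(Y, Z))  # d(X;Y) + d(Y;Z) = 2*sqrt(2) - 2
print(lhs, rhs, lhs <= rhs)
```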
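The standard way to define a distance between random variables is $d(X, Y) = \sigma_{X - Y} = \sqrt{\operatorname{Var}(X - Y)}$, the (semi)norm obtained by using covariance as an inner product. This, however, satisfies neither your condition that $d(X;Y)\rightarrow+\infty$ when $\rho_{XY}\rightarrow0$ nor the condition $\rho_{XY}=\pm1\Leftrightarrow d(X;Y)=0$.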
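A sketch under the same assumption (variables as coefficient vectors over independent unit-variance sources; helper names are illustrative) showing where $\sigma_{X-Y}$ departs from the wish list: it stays finite at $\rho = 0$, and it is nonzero at $\rho = \pm1$ unless $X - Y$ is constant. Being a seminorm of the difference, it does satisfy the triangle inequality.

```python
import math

# Coefficient-vector model: Cov(U, V) is the dot product.
def cov(u, v):
    return sum(a * b for a, b in zip(u, v))

def corr(u, v):
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

def sd_dist(u, v):
    """sigma_{X-Y}: standard deviation of the difference X - Y."""
    diff = [a - b for a, b in zip(u, v)]
    return math.sqrt(cov(diff, diff))

X = (1.0, 0.0)
Z = (0.0, 1.0)       # independent of X: rho = 0
neg_X = (-1.0, 0.0)  # rho = -1 with X
two_X = (2.0, 0.0)   # rho = +1 with X, but a different scale

print(corr(X, Z), sd_dist(X, Z))          # rho = 0, distance sqrt(2) (finite)
print(corr(X, neg_X), sd_dist(X, neg_X))  # rho = -1, distance 2 (nonzero)
print(corr(X, two_X), sd_dist(X, two_X))  # rho = +1, distance 1 (nonzero)
```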
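Your triangle inequality and the condition $d(X;Y)\rightarrow+\infty$ when $\rho_{XY}\rightarrow0$ can't be satisfied simultaneously by any $d$ that depends only on the correlation: we can find variables $X$, $Y$ and $Z$ such that $\rho_{XY} = \rho_{YZ} = \frac{1}{3}$, while $\rho_{XZ}$ is arbitrarily close to $0$. Then $d(X;Y)+d(Y;Z)$ stays fixed while $d(X;Z)\rightarrow+\infty$.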
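The answer doesn't say how to build such variables. One possible construction (an assumption of this sketch, not necessarily the answerer's) fixes $Y$ as one source and gives $X$ and $Z$ extra independent components, tuned so that $\rho_{XZ} = r$; this is feasible for $-\frac{7}{9} \le r \le 1$. With $d = \frac{1}{|\rho|} - 1$ the right-hand side is always $2 + 2 = 4$, while $d(X;Z) = \frac{1}{|r|} - 1$ exceeds it once $|r| < \frac{1}{5}$. The function `triple` is hypothetical.

```python
import math

# Coefficient-vector model: Cov(U, V) is the dot product.
def cov(u, v):
    return sum(a * b for a, b in zip(u, v))

def corr(u, v):
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

def d(rho):
    """Proposed distance 1/|rho| - 1, taken as infinite when rho == 0."""
    return math.inf if rho == 0 else 1 / abs(rho) - 1

def triple(r):
    """Unit-variance X, Y, Z with corr(X,Y) = corr(Y,Z) = 1/3, corr(X,Z) = r.

    Requires -7/9 <= r <= 1 so that a**2 <= 8/9 below.
    """
    s = math.sqrt(8 / 9)
    Y = (1.0, 0.0, 0.0)
    X = (1 / 3, s, 0.0)            # Var(X) = 1/9 + 8/9 = 1
    a = (r - 1 / 9) / s            # makes Cov(X, Z) = 1/9 + s*a = r
    b = math.sqrt(8 / 9 - a * a)   # keeps Var(Z) = 1
    Z = (1 / 3, a, b)
    return X, Y, Z

for r in (0.5, 0.25, 0.1, 0.01):
    X, Y, Z = triple(r)
    lhs = d(corr(X, Z))                  # 1/|r| - 1
    rhs = d(corr(X, Y)) + d(corr(Y, Z))  # 2 + 2 = 4 for every r
    print(r, lhs, rhs, lhs <= rhs)       # fails once |r| < 1/5
```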