Show a closer correlation between random variables


Thank you for your attention to my post!


The Original Question (informal):

I have $(2n+1)$ random variables $X, Y_1, \cdots, Y_n, Z_1,\cdots,Z_n$ satisfying

$\forall i,j,Cov(X,Y_i)=\sigma^2\rho(X,Y_i)>Cov(X,Z_j)=\sigma^2\rho(X,Z_j)\ge 0$,

$\forall i,Var(X)=Var(Y_i)=Var(Z_i)=\sigma^2$,

and $\forall i\ne j, Cov(Y_i,Y_j)=Cov(Z_i,Z_j)$ (this condition can be removed in some situations),

where $\sigma\in\mathbb{R}^+$ is a constant and $\rho$ denotes Pearson's correlation coefficient.

In addition, in some situations the $\rho(X,Z_i)$s are positive numbers converging to $0$. (The case $\forall i,\rho(X,Z_i)\equiv0$ may also be considered.) The $\rho(X,Y_i)$s are positive constants no larger than $1$.

Is there a way to show that for any weighting factors $w_i$s, the weighted variable $Y=\sum_{i}w_iY_i$ is more closely correlated to $X$ than $Z=\sum_{i}w_iZ_i$?


My Efforts and Supplemental Information:

My intuition is as follows:

$(A1)$ All the $Y_i$s are more closely related to $X$, compared to $Z_i$s;

$(A2)$ $Y=\sum_{i}w_iY_i$ has a closer relation to $X$, compared to $Z=\sum_{i}w_iZ_i$.

(For example, $Y_i$s are actually scaled versions of $X$, and $Z_i$s are some values of random noise.)

However, since the $w_i$s can be either positive or negative, it is difficult to show this with metrics such as Pearson's correlation coefficient.

I do know that if all the $w_i$s are non-negative (and not all zero), $(A2)$ follows from the covariance or Pearson's correlation coefficient:

\begin{align*} Cov(X,Y)&=\sum_i{w_iCov(X,Y_i)}>\sum_i{w_iCov(X,Z_i)}=Cov(X,Z),\\ \rho(X,Y)&=\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}=\frac{\sum_i{w_iCov(X,Y_i)}}{\sqrt{Var(X)Var(Y)}}\\ &>\frac{\sum_i{w_iCov(X,Z_i)}}{\sqrt{Var(X)Var(Z)}}=\frac{Cov(X,Z)}{\sqrt{Var(X)Var(Z)}}=\rho(X,Z). \end{align*}
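The second line above compares $\frac{\cdot}{\sqrt{Var(X)Var(Y)}}$ with $\frac{\cdot}{\sqrt{Var(X)Var(Z)}}$ as if the denominators were equal. A short expansion, using only the assumptions stated in the question (equal variances $\sigma^2$ and $Cov(Y_i,Y_j)=Cov(Z_i,Z_j)$ for $i\ne j$), shows why this step is justified:

\begin{align*} Var(Y)&=\sum_i w_i^2Var(Y_i)+\sum_{i\ne j}w_iw_jCov(Y_i,Y_j)=\sigma^2\sum_i w_i^2+\sum_{i\ne j}w_iw_jCov(Y_i,Y_j),\\ Var(Z)&=\sum_i w_i^2Var(Z_i)+\sum_{i\ne j}w_iw_jCov(Z_i,Z_j)=\sigma^2\sum_i w_i^2+\sum_{i\ne j}w_iw_jCov(Z_i,Z_j), \end{align*}

so $Var(Y)=Var(Z)$, and the comparison of the numerators carries over to the correlation coefficients.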

But the $w_i$s are not guaranteed to be non-negative. :(

There are some things that should not be changed:

$(B1)$ The weighting factors $w_i$s are randomly generated and can be positive or negative.

$(B2)$ The weighted summation form of $Y=\sum_{i}w_iY_i$ and $Z=\sum_{i}w_iZ_i$ should not be changed or modified.

$(B3)$ I would like to show $(A2)$ through some metric/index/measure (not limited to Pearson's correlation coefficient or the covariance).

Therefore, I am exploring the following directions:

$(C1)$ (Change the goal) I am looking for other variables $Y^\prime$ and $Z^\prime$ constructed from $Y$ and $Z$ (for example, $Y^\prime=Y^2$ and $Z^\prime=Z^2$), and trying to show that $Y^\prime$ is more closely related to $X$ than $Z^\prime$ is.

$(C2)$ (Change the metrics/indices/measures) Beyond Pearson's correlation coefficient and the covariance, I am trying to find some other metrics/indices/measures which can reflect that $Y$ is more closely related to $X$, compared to $Z$.

But after a long struggle (about a week), I still cannot find a way to show something like $(A2)$.

Is there a way to modify something or perform operations like $(C1)$ and $(C2)$ to reach a conclusion like $(A2)$?

I am still thinking about it ...


Why do I try to find a solution toward a conclusion like $(A2)$?

This is actually an intermediate step in my work.

I have been stuck on it for a long time.

Now I feel that my thoughts may be muddled.

I am refining and reorganizing the text above to make it clearer.

Any ideas/suggestions about "explicitly showing a closer relation" are very welcome.

Thank you very much for reading such a long informal post!

Best Answer:

Under your constraints, the claim that for any weighting factors $w_1, \dots, w_n \in \mathbb{R}$ the weighted variable $Y = \sum_i w_iY_i$ is more closely related to $X$ than $Z = \sum_i w_iZ_i$ is not correct. Here is a counterexample using covariance.

Suppose $n = 2$, and that $Y_1, Y_2$ are independent $N(0, \sigma^2)$ random variables. Let $$X = \frac{\sqrt{2}}{2}Y_1 + \frac{\sqrt{2}}{2}Y_2.$$ Also define $$Z_1 = \frac{99}{100}Y_1 + \frac{\sqrt{199}}{100}V_1, \quad Z_2 = \frac{1}{100}Y_2 + \frac{\sqrt{9999}}{100}V_2,$$ where $$V_1 \sim N\left(0, \sigma^2\right), \quad V_2 \sim N\left(0, \sigma^2\right)$$ are independent of each other and of $Y_1, Y_2$.

We have the following quantities: $$Var(Y_1) = Var(Y_2) = \sigma^2.$$ $$Var(X) = Var\left(\frac{\sqrt{2}}{2}Y_1 + \frac{\sqrt{2}}{2}Y_2\right) = \frac{1}{2}\sigma^2 + \frac{1}{2}\sigma^2 = \sigma^2.$$ $$Var(Z_1) = Var\left(\frac{99}{100}Y_1 + \frac{\sqrt{199}}{100}V_1\right) = \frac{9{,}801}{10{,}000}\sigma^2 + \frac{199}{10{,}000}\sigma^2 = \sigma^2.$$ $$Var(Z_2) = Var\left(\frac{1}{100}Y_2 + \frac{\sqrt{9{,}999}}{100}V_2\right) = \frac{1}{10{,}000}\sigma^2 + \frac{9{,}999}{10{,}000}\sigma^2 = \sigma^2.$$ We also have $$Cov(Y_1, X) = Cov\left(Y_1, \frac{\sqrt{2}}{2}Y_1 + \frac{\sqrt{2}}{2}Y_2\right) = \frac{\sqrt{2}}{2}Cov(Y_1, Y_1) = \frac{\sqrt{2}}{2}\sigma^2.$$ $$Cov(Y_2, X) = Cov\left(Y_2, \frac{\sqrt{2}}{2}Y_1 + \frac{\sqrt{2}}{2}Y_2\right) = \frac{\sqrt{2}}{2}Cov(Y_2, Y_2) = \frac{\sqrt{2}}{2}\sigma^2.$$ $$Cov(Z_1, X) = Cov\left(\frac{99}{100}Y_1 + \frac{\sqrt{199}}{100}V_1, \frac{\sqrt{2}}{2}Y_1 + \frac{\sqrt{2}}{2}Y_2\right) = \frac{99\sqrt{2}}{200}Cov(Y_1, Y_1) = \frac{99\sqrt{2}}{200}\sigma^2.$$ $$Cov(Z_2, X) = Cov\left(\frac{1}{100}Y_2 + \frac{\sqrt{9{,}999}}{100}V_2, \frac{\sqrt{2}}{2}Y_1 + \frac{\sqrt{2}}{2}Y_2\right) = \frac{\sqrt{2}}{200}Cov(Y_2, Y_2) = \frac{\sqrt{2}}{200}\sigma^2.$$ Therefore $$Cov(Y_1, X) = Cov(Y_2, X) > Cov(Z_1, X) > Cov(Z_2, X) > 0.$$ And finally $$Cov(Y_1, Y_2) = 0 = Cov(Z_1, Z_2).$$ So all the constraints are met.
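The constraint checks above can be reproduced numerically with a small sketch like the following (not part of the original answer). It assumes $\sigma = 1$, which loses nothing because every variance and covariance here scales by $\sigma^2$, and it represents each variable by a hypothetical coefficient tuple over the independent sources $(Y_1, Y_2, V_1, V_2)$, so that a covariance is simply a dot product.

```python
import math

# Illustrative representation (an assumption, not from the answer): each variable
# is a coefficient tuple over the independent N(0, sigma^2) sources (Y1, Y2, V1, V2).
# In units of sigma^2 = 1, Cov(A, B) is the dot product of the two tuples.
h = math.sqrt(2) / 2
Y1 = (1.0, 0.0, 0.0, 0.0)
Y2 = (0.0, 1.0, 0.0, 0.0)
X = (h, h, 0.0, 0.0)
Z1 = (99 / 100, 0.0, math.sqrt(199) / 100, 0.0)
Z2 = (0.0, 1 / 100, 0.0, math.sqrt(9999) / 100)


def cov(a, b):
    """Cov(a, b) / sigma^2 for two linear combinations of the independent sources."""
    return sum(x * y for x, y in zip(a, b))


# Unit variances: Var(X) = Var(Y_i) = Var(Z_i) = sigma^2.
unit_variances = all(math.isclose(cov(v, v), 1.0) for v in (X, Y1, Y2, Z1, Z2))

# Ordering: Cov(Y_1, X) = Cov(Y_2, X) > Cov(Z_1, X) > Cov(Z_2, X) > 0.
ordered = (
    math.isclose(cov(Y1, X), cov(Y2, X))
    and cov(Y2, X) > cov(Z1, X) > cov(Z2, X) > 0
)

# Equal cross-covariances: Cov(Y_1, Y_2) = Cov(Z_1, Z_2).
equal_cross = math.isclose(cov(Y1, Y2), cov(Z1, Z2), abs_tol=1e-12)

print("constraints hold:", unit_variances and ordered and equal_cross)
```

Because every variable is Gaussian and linear in the same independent sources, this exact dot-product check is enough; no sampling is needed to confirm the constraints.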

Now let $w_1 = 1, w_2 = -1$. We have $$Cov(Y, X) = Cov(Y_1 - Y_2, X) = Cov(Y_1, X) - Cov(Y_2, X) = 0,$$ $$Cov(Z, X) = Cov(Z_1 - Z_2, X) = Cov(Z_1, X) - Cov(Z_2, X) = \frac{98\sqrt{2}}{200}\sigma^2.$$ So $X$ is uncorrelated with $Y$ but $Z$ is correlated with $X$.
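The same example can also be expressed through Pearson's correlation coefficient. Using $Var(Y_i) = Var(Z_i) = \sigma^2$ and $Cov(Y_1, Y_2) = Cov(Z_1, Z_2) = 0$ from above:
$$Var(Y) = Var(Y_1) + Var(Y_2) - 2Cov(Y_1, Y_2) = 2\sigma^2, \qquad Var(Z) = Var(Z_1) + Var(Z_2) - 2Cov(Z_1, Z_2) = 2\sigma^2,$$
$$\rho(X, Y) = \frac{0}{\sqrt{\sigma^2 \cdot 2\sigma^2}} = 0, \qquad \rho(X, Z) = \frac{\frac{98\sqrt{2}}{200}\sigma^2}{\sqrt{\sigma^2 \cdot 2\sigma^2}} = \frac{98}{200} = 0.49.$$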

I've generated a thousand $(X, Y, Z)$ triplets, and the relationship is clearly much stronger for $Z$ than for $Y$.

[Scatter plots: $Y$ vs $X$; $Z$ vs $X$]
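A simulation along those lines might look like the sketch below. It is a hypothetical reconstruction, not the code behind the plots: it assumes $\sigma = 1$, a fixed seed, $w_1 = 1$, $w_2 = -1$, and uses only the Python standard library (`random.Random.gauss`) plus a hand-written sample correlation `pearson`.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=1000, sigma=1.0, seed=0):
    """Draw n (X, Y, Z) triplets from the counterexample with w1 = 1, w2 = -1."""
    rng = random.Random(seed)
    h = math.sqrt(2) / 2
    xs, ys, zs = [], [], []
    for _ in range(n):
        # Independent sources Y1, Y2, V1, V2 ~ N(0, sigma^2).
        y1, y2, v1, v2 = (rng.gauss(0.0, sigma) for _ in range(4))
        z1 = 0.99 * y1 + math.sqrt(199) / 100 * v1
        z2 = 0.01 * y2 + math.sqrt(9999) / 100 * v2
        xs.append(h * y1 + h * y2)
        ys.append(y1 - y2)  # Y = w1*Y1 + w2*Y2
        zs.append(z1 - z2)  # Z = w1*Z1 + w2*Z2
    return xs, ys, zs


xs, ys, zs = simulate()
print(f"sample rho(X, Y) = {pearson(xs, ys):.3f}")
print(f"sample rho(X, Z) = {pearson(xs, zs):.3f}")
```

With a thousand draws the sample correlations should land near the theoretical values $\rho(X, Y) = 0$ and $\rho(X, Z) = 0.49$, up to sampling noise of a few hundredths.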