Correlation maximization (Canonical correlation analysis)


A question concerning part b):

I substituted $Xw_x$ for $X$ and $Yw_y$ for $Y$ and took the derivative with respect to $w_x$ and $w_y$. Unfortunately, I wasn't able to finish: after applying the quotient rule, my dimensions no longer match. I then looked at the solution and realized it was completely different from mine.

My questions are:

1) Why doesn't the denominator matter? It contains the variables we're optimizing...

2) Why are we setting values for the variances (value 1 in this example)? If they don't matter, shouldn't we just maximize the numerator?

3) Why can't I simply take the derivative of the whole Corr expression and optimize everything together, without any special treatment of the numerator and denominator?

NB: I know how Lagrange multipliers work, I just don't understand how/why we get the constraints in this example.

Best answer:

This problem corresponds to the well-known Canonical Correlation Analysis (CCA).

Assuming two zero-mean random vectors ${\bf x}$ and ${\bf y}$, we define $x$ and $y$ as linear combinations of the elements of ${\bf x}$ and ${\bf y}$ (i.e. $x = {\bf x}^T{\bf w}_x$ and $y = {\bf y}^T{\bf w}_y$). What we want in CCA is to find ${\bf w}_x$ and ${\bf w}_y$ that maximize the correlation between $x$ and $y$.

We define the correlation coefficient as: $$ \rho = \frac{E[xy]}{\sqrt{E[x^2]E[y^2]}} =\frac{E[{\bf w}_x^T {\bf x} {\bf y}^T {\bf w}_y]} {\sqrt{E[{\bf w}_x^T {\bf x} {\bf x}^T {\bf w}_x]E[{\bf w}_y^T {\bf y} {\bf y}^T {\bf w}_y]}} =\frac{{\bf w}_x^T E[{\bf x} {\bf y}^T] {\bf w}_y} {\sqrt{{\bf w}_x^T E[{\bf x} {\bf x}^T] {\bf w}_x {\bf w}_y^T E[{\bf y} {\bf y}^T] {\bf w}_y}} =\frac{{\bf w}_x^T {\bf C}_{xy} {\bf w}_y} {\sqrt{{\bf w}_x^T {\bf C}_{xx} {\bf w}_x {\bf w}_y^T {\bf C}_{yy} {\bf w}_y}} $$
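As a numerical sanity check, the definition of $\rho$ can be estimated from sample data with NumPy. The data, dimensions, and weight vectors below are illustrative assumptions, not part of the original problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative zero-mean samples: n draws of x (dim p) and y (dim q).
n, p, q = 2000, 3, 2
X = rng.standard_normal((n, p))
Y = X[:, :q] + 0.5 * rng.standard_normal((n, q))  # y correlated with x
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Sample estimates of C_xx, C_yy, C_xy.
Cxx = X.T @ X / n
Cyy = Y.T @ Y / n
Cxy = X.T @ Y / n

def rho(wx, wy):
    """Correlation of x = x^T wx and y = y^T wy under the sample covariances."""
    num = wx @ Cxy @ wy
    den = np.sqrt((wx @ Cxx @ wx) * (wy @ Cyy @ wy))
    return num / den

# Arbitrary (non-optimal) weight vectors.
wx = rng.standard_normal(p)
wy = rng.standard_normal(q)
r = rho(wx, wy)
print(r)  # some value in [-1, 1]
```

By Cauchy-Schwarz the printed value always lies in $[-1, 1]$, whatever weights are chosen.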

Note that $\rho$ is the same regardless of the scaling of ${\bf w}_x$ and ${\bf w}_y$. If, for example, we multiply ${\bf w}_x$ by $\alpha$ and ${\bf w}_y$ by $\beta$ (taking $\alpha, \beta > 0$, so that $\sqrt{\alpha^2\beta^2} = \alpha\beta$), we recover the original $\rho$:

$$ \rho=\frac{\alpha{\bf w}_x^T {\bf C}_{xy} \beta{\bf w}_y} {\sqrt{\alpha{\bf w}_x^T {\bf C}_{xx} \alpha{\bf w}_x \beta{\bf w}_y^T {\bf C}_{yy} \beta{\bf w}_y}} =\frac{\alpha \beta {\bf w}_x^T {\bf C}_{xy} {\bf w}_y} {\sqrt{\alpha^2\beta^2}\sqrt{{\bf w}_x^T {\bf C}_{xx} {\bf w}_x {\bf w}_y^T {\bf C}_{yy} {\bf w}_y}} $$

This means that we can always re-scale ${\bf w}_x$ and ${\bf w}_y$ to modify the denominator without affecting $\rho$. In particular, we use this freedom to simplify the problem by restricting attention to solutions that satisfy: $$ {\bf w}_x^T {\bf C}_{xx} {\bf w}_x = 1\\ {\bf w}_y^T {\bf C}_{yy} {\bf w}_y = 1 $$
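The rescaling argument can be verified numerically. The sketch below (same illustrative sample-covariance setup as assumed throughout, not from the original problem) normalizes arbitrary weights so the unit-variance constraints hold and confirms $\rho$ is unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative zero-mean data and sample covariance estimates.
n, p, q = 2000, 3, 2
X = rng.standard_normal((n, p))
Y = X[:, :q] + 0.5 * rng.standard_normal((n, q))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)
Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

def rho(wx, wy):
    return (wx @ Cxy @ wy) / np.sqrt((wx @ Cxx @ wx) * (wy @ Cyy @ wy))

wx = rng.standard_normal(p)
wy = rng.standard_normal(q)
r_before = rho(wx, wy)

# Rescale each weight vector so that w^T C w = 1 (the constraints above).
wx_n = wx / np.sqrt(wx @ Cxx @ wx)
wy_n = wy / np.sqrt(wy @ Cyy @ wy)
r_after = rho(wx_n, wy_n)

print(np.isclose(r_before, r_after))       # True: rho is scale-invariant
print(np.isclose(wx_n @ Cxx @ wx_n, 1.0))  # True: constraint satisfied
```

Under the constraints the denominator of $\rho$ equals 1, which is exactly why only the numerator remains to be maximized.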

Hence, the problem becomes maximizing the numerator of $\rho$, namely ${\bf w}_x^T {\bf C}_{xy} {\bf w}_y$, subject to the previous constraints.
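For completeness: applying Lagrange multipliers to this constrained problem yields the standard CCA stationarity conditions ${\bf C}_{xy}{\bf w}_y = \lambda {\bf C}_{xx}{\bf w}_x$ and ${\bf C}_{yx}{\bf w}_x = \lambda {\bf C}_{yy}{\bf w}_y$, which reduce to an eigenvalue problem. A sketch under the same illustrative data assumed above (this is the textbook CCA route, not spelled out in the answer itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative zero-mean data and sample covariance estimates.
n, p, q = 2000, 3, 2
X = rng.standard_normal((n, p))
Y = X[:, :q] + 0.5 * rng.standard_normal((n, q))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)
Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

# Eliminating wy from the stationarity conditions gives
#   Cxx^{-1} Cxy Cyy^{-1} Cyx wx = lambda^2 wx
M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
eigvals, eigvecs = np.linalg.eig(M)

i = np.argmax(eigvals.real)
wx = eigvecs[:, i].real
wy = np.linalg.solve(Cyy, Cxy.T @ wx)  # wy proportional to Cyy^{-1} Cyx wx

r_max = (wx @ Cxy @ wy) / np.sqrt((wx @ Cxx @ wx) * (wy @ Cyy @ wy))
print(r_max, np.sqrt(eigvals.real[i]))  # the two values should agree
```

The maximal correlation equals the square root of the largest eigenvalue; the corresponding eigenvector gives ${\bf w}_x$, and ${\bf w}_y$ follows from it up to the scaling fixed by the constraints.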