Why does the solution for CCA always lie in the span of the data?

Question

Why does the solution for CCA always lie in the span of the data?

159 Views Asked by Bumbble Comm At 27 Mar 2026 - 8:19

Assume you have the following optimization problem, which is exactly the optimization problem for canocial correlation analysis:

$$arg \max_{w_x, w_y} w_x^T C_{xy} w_y$$ s.t. $$w_x^T C_{xx} w_x = 1$$ $$w_y^T C_{yy} w_y = 1$$ where $C_{xx} = XX^T$, $C_{yy} = YY^T$, $C_{xy} = XY^T$, and $X \in \mathbb{R}^{d_1 \times N}$, $Y \in \mathbb{R}^{d_2 \times N}$.

I want to show that there is always an optimal solution that lies in the span of the data, i.e. $w_x = X \alpha_x, w_y = Y \alpha_y$, $\alpha_x,\alpha_y \in \mathbb{R}^{N}$.

To do this, I tried to show that any optimal solution $w_x^T$ can be replaced by a vector $w_x' = X \alpha_x$. However, I failed to find out a formula for the coefficients, so I am starting to doubt this is the right approach.

How can I show that the solution always lies in the span of the data?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Answer 1 · 2020-05-14 08:26:41

Assume $w_x$ and $w_y$ can be rewritten as the sum of two vectors. One lies in the span of the data and the other is orthogonal to the data, i.e.,

$$ w_x = X \alpha_x + n_x \\ w_x = X \alpha_y + n_y \, ,$$

where $n_x, n_y \in \mathbb{R}^n$. Let us use the following notation $s_x := X \alpha_x$ and $s_y := X \alpha_y$. The maximization problem now reads

$$ \text{argmax} \ (s_x + n_x)^T C_{xy} (s_y + n_y) \\ \text{s.t} \ (s_x + n_x)^T C_{xx} (s_x + n_x) = 1 \\ (s_y + n_y)^T C_{yy} (s_y + n_y) = 1 \, .$$

If we expand the objective function, we obtain

$$ (s_x + n_x)^T C_{xy} (s_y + n_y) \\ = s_x^T C_{xy} s_y + s_x^T C_{xy} n_y \\ + n_x^T C_{xy} s_y + n_x^T C_{xy} n_y \, .$$

Since $n_x$ and $n_y$ are orthogonal to $X$ and $Y$ respectively, the above equation becomes

$$ (s_x + n_x)^T C_{xy} (s_y + n_y) = s_x^T C_{xy} s_y \, .$$

Thus, the solution $w_x$,$w_y$ of your shown optimization problem lies in the span of the data.

Why does the solution for CCA always lie in the span of the data?

There are 1 best solutions below

Related Questions in OPTIMIZATION

Related Questions in MACHINE-LEARNING

Trending Questions

Popular # Hahtags

Popular Questions