maximize $\|X^TA\|$ for $X$

117 Views Asked by At

Given $A\in\mathbb{R}^{m\times n}$. Find a matrix $X\in\mathbb{R}^{m\times r}$ with orthonormal columns, s.t. $\|X^TA\|$ is maximized, where $\|\cdot\|$ denotes the Frobenius norm.

The answer is given by the $r$ leading left singular vectors of $A$, could anyone help to explain why is that?


My thoughts: with the (reduced) SVD $A=U\Sigma V^T$, we have $$\|X^TA\|=\|X^TU\Sigma\|.$$ If $X=U$ then the norm is $\|U^TU\Sigma\|=\|\Sigma\|$, but how can I show this is the maximum?

2

There are 2 best solutions below

2
On BEST ANSWER

I will assume that $r \leq n$. Otherwise, there's some extra "bookkeeping" to be done regarding the singular values of $A$.

It's easy to show that taking $X$ to be the leading left singular vectors of $A$ (i.e. the leading columns of $U$) yields $\|X^TA\| = \sigma_1^2 + \cdots + \sigma_r^2$, where $\sigma_1 \geq \cdots \geq \sigma_n$ denote the singular values of $A$. To show that this is the maximal possible value of $\|X^TA\|$, it suffices to show that $\sigma_i(A) \geq \sigma_i(X^TA)$ for all $i = 1,\dots,r$.

One strategy is to use the Courant-Fischer (min-max) theorem, noting that $\|X^Tv\| \leq \|v\|$ holds for all vectors $v$. Another approach is to note that for any such $X$, there exists a $Y$ such that $U = [X\ \ Y]$ is square with orthonormal columns and hence unitary. Note that $A$ has the same singular values as the matrix $$ U^TA = \pmatrix{X^TA\\ Y^TA}. $$ Now, the singular values of $U^TA$ are the positive eigenvalues of the block-matrix $$ \pmatrix{0 & (U^TA)^T\\ U^TA & 0} = \pmatrix{0 & A^TX & A^TY\\X^TA & 0 & 0\\Y^TA & 0 & 0}. $$ By Cauchy's interlacing theorem, these eigenvalues interlace the positive eigenvalues of the principal submatrix $$ \pmatrix{0 & (X^TA)^T\\ X^TA & 0}, $$ whose positive eigenvalues are the singular values of $X^TA$.

0
On

I can propose an alternative solution. The objective is to find a rank $r$ matrix $X\in\mathbb{R}^{m\times r}$ that mimimizes $||X^TA||_F$ and that satisfies $X^TX=I$.

It is known that $$||X^TA||_F^2=\mathrm{trace}(A^TXX^TA)=\mathrm{trace}(AA^TXX^T).$$

If we let $X=(x_1,\ldots,x_r)$, we obtain that

$$\mathrm{trace}(AA^TXX^T)=\sum_{i=1}^r\mathrm{trace}(A^TAx_ix_i^T)=\sum_{i=1}^rx_i^TA^TAx_i$$ where $x_i^Tx_i=1$ and $x_i^Tx_j=0$, $i\ne j$.

Now, using the fact that the matrix $A^TA$ is positive semidefinite, we can order its (real and nonnegative) eigenvalues as $\lambda_1\ge\lambda_2\ge\ldots\lambda_n$. We also note that eigenvectors of symmetric matrices are orthogonal.

Using this information, we can can see that to maximize the above sum, it is enough to pick the $x_i$'s as the eigenvectors associated with the $r$ largest eigenvalues of $A^TA$.

This can be done recursively. First, we choose $x_1$ to the eigenvector associated with $\lambda_1$, as this is the best we can do. Now, we need to choose $x_2$. Since, it has to be orthogonal to $x_1$, the best choice is to pick $x_2$ to be eigenvector associated with $\lambda_2$. And so on..

This leads to the result that the maximum of $||X^TA||_F^2$ over all rank-$r$ matrices $X$ such that $X^TX=I$ is given by $\sum_{i=1}^r\lambda_i$ or, equivalently, to the sum of the $r$ largest singular values of $A$.

This leads to the final result that the maximum of $||X^TA||_F$ over all rank-$r$ matrices $X$ such that $X^TX=I$ is equal to the square root of the sum of the $r$ largest singular values of $A$.