Hi , i want to prove this result: In order to maximize the variance, we can maximize the trace of the matrix.

35 Views Asked by At

$$ \arg \min _{Q \in \mathbf{R}^{d \times k}} \frac{1}{n} \sum_{i=1}^{n}\left\|\vec{x}_{i}-Q Q^{\top} \vec{x}_{i}\right\|^{2}=\arg \max _{Q \in \mathbf{R}^{d \times k}} \operatorname{tr}\left(Q^{\top}\left(\frac{1}{n} X X^{\top}\right) Q\right) $$

1

There are 1 best solutions below

1
On BEST ANSWER

I assume the following:

  • $k \leq d$,
  • $Q$ is constrained to have orthonormal columns,
  • $X$ is the matrix whose columns are $x_i$.

Because $Q$ has orthonormal columns, we have $(QQ^T)^2 = QQ^T$. With that established, note that $$ \begin{align} \sum_{i=1}^n \|x_i - QQ^T x_i\|^2 &= \sum_{i=1}^n (x_i - QQ^Tx_i)^T(x_i - QQ^Tx_i) \\ & = \sum_{i=1}^n (x_i^Tx_i - 2x_i^TQQ^Tx_i + x_i^T[QQ^T]^2x_i) \\ & = \sum_{i=1}^n (x_i^Tx_i - x_i^TQQ^Tx_i) \\ & = \sum_{i=1}^n (x_i^Tx_i - (Q^Tx_i)^TQ^Tx_i) \\ & = \left(\sum_{i=1}^n x_i^Tx_i\right) - \operatorname{tr}[(Q^TX)^T(Q^TX)] \\ & = \left(\sum_{i=1}^n x_i^Tx_i\right) - \operatorname{tr}[(Q^TX)(Q^TX)^T] \\ & = \left(\sum_{i=1}^n x_i^Tx_i\right) - \operatorname{tr}[Q^TXX^TQ]. \end{align} $$ The expression $\left(\sum_{i=1}^n x_i^Tx_i\right)$ does not depend on $Q$, so minimizing the above quantity equivalent to maximizing $\operatorname{tr}[Q^TXX^TQ]$.