How to prove PCA using induction?


In Deep Learning (Goodfellow, et al), the optimization objective of PCA is formulated as

$$D^* = \arg\min_D ||X - XDD^T||_F^2 \quad \text{s.t.} \quad D^T D=I$$

The book gives the proof of the $1$-dimension case, i.e.

$$\arg\min_{d} || X - X dd^T||_F^2 \quad \text{s.t.} \quad d^T d = 1$$

equals the eigenvector of $X^TX$ with the largest eigenvalue. And the author says the general case (when $D$ is an $m \times l$ matrix, where $l>1$) can be easily proved by induction.

Could anyone please show me how I can prove that using induction?

I know that when $D^T D = I$:

$$D^* = \arg\min_D ||X - XDD^T||_F^2 = \arg\max_D \operatorname{tr}\left(D^T X^T X D\right)$$

and

$$\operatorname{tr}\left(D^T X^T X D\right) = \left(\sum_{i=1}^{l-1} \left(d^{(i)}\right)^T X^TX d^{(i)}\right) + \left(d^{(l)}\right)^T X^TX d^{(l)}$$

where, by the induction hypothesis, the first sum is maximized when each $d^{(i)}$ is the eigenvector of $X^T X$ corresponding to the $i$th largest eigenvalue. But how can I be sure that the sum as a whole is also maximal?
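As a sanity check on this objective (not a proof), the sketch below verifies numerically that for orthonormal $D$ the reconstruction error decomposes as $||X - XDD^T||_F^2 = ||X||_F^2 - \operatorname{tr}(D^T X^T X D)$, and that the trace is largest at the top-$l$ eigenvectors of $X^T X$ among many random orthonormal choices of $D$; the dimensions and random data here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, l = 50, 6, 2                      # arbitrary sizes for the check
X = rng.standard_normal((n, m))

# Orthonormal D via QR of a random matrix
D, _ = np.linalg.qr(rng.standard_normal((m, l)))

# Identity: ||X - X D D^T||_F^2 = ||X||_F^2 - tr(D^T X^T X D),
# so minimizing the reconstruction error maximizes the trace.
lhs = np.linalg.norm(X - X @ D @ D.T, "fro") ** 2
rhs = np.linalg.norm(X, "fro") ** 2 - np.trace(D.T @ X.T @ X @ D)
assert np.isclose(lhs, rhs)

# The top-l eigenvectors of X^T X beat many random orthonormal D's
eigvals, eigvecs = np.linalg.eigh(X.T @ X)  # eigenvalues in ascending order
D_star = eigvecs[:, -l:]                    # eigenvectors of the l largest eigenvalues
best = np.trace(D_star.T @ X.T @ X @ D_star)
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((m, l)))
    assert np.trace(Q.T @ X.T @ X @ Q) <= best + 1e-9
```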

I am at the same point actually, and wondering if you found an answer.

My idea was to use Ky Fan's theorem, which says:

Let $X \in S^n$ be a symmetric matrix with eigenvalues $\lambda_1 \geqslant \cdots \geqslant \lambda_n$; then for any $k \leqslant n$,

$$\lambda_1+ \cdots + \lambda_k = \max \{\operatorname{Tr}(XDD^T): D \in \mathbb{R}^{n \times k},\ D^TD=I_k\}$$
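The statement can be checked numerically; the sketch below (with an arbitrary random symmetric matrix, written `S` here to avoid clashing with the data matrix $X$ above) confirms that the trace at the top-$k$ eigenvectors equals the sum of the $k$ largest eigenvalues, and that no random orthonormal $D$ exceeds it:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
A = rng.standard_normal((n, n))
S = (A + A.T) / 2                           # symmetric matrix playing the role of X in the theorem
lam = np.sort(np.linalg.eigvalsh(S))[::-1]  # eigenvalues, descending
top_k = lam[:k].sum()                       # claimed maximum of Tr(S D D^T)

# The maximum is attained at the eigenvectors of the k largest eigenvalues...
vals, vecs = np.linalg.eigh(S)              # ascending order
D = vecs[:, -k:]
assert np.isclose(np.trace(S @ D @ D.T), top_k)

# ...and no random orthonormal D exceeds it
for _ in range(500):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    assert np.trace(S @ Q @ Q.T) <= top_k + 1e-9
```

Applied to the question above with $S = X^T X$ (which is symmetric and positive semidefinite), the theorem gives the maximum of $\operatorname{tr}(D^T X^T X D)$ directly, without induction.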