How to derive Lagrangian associated with Projected Variance Maximisation approach to PCA with orthogonal projection


In the Projected Variance Maximisation approach to PCA, the orthogonal projection $P_U$ of unlabelled data onto the space spanned by the columns of $U$ is such that the projection of a data point $x^{(i)}$ is given by $P_Ux^{(i)}=U(U^TU)^{-1}U^Tx^{(i)}$.
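As a quick numerical sanity check (a minimal sketch with made-up data, not part of the question itself), the matrix $P_U = U(U^TU)^{-1}U^T$ can be verified to behave as an orthogonal projector, i.e. it is symmetric and idempotent:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((5, 2))          # columns need not be orthonormal
P = U @ np.linalg.inv(U.T @ U) @ U.T     # orthogonal projector onto col(U)

x = rng.standard_normal(5)
assert np.allclose(P, P.T)               # symmetric
assert np.allclose(P @ P, P)             # idempotent
assert np.allclose(P @ (P @ x), P @ x)   # projecting twice changes nothing
```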

How could we prove that PCA leads to an optimisation problem which can be solved by finding the stationary points associated with the following Lagrangian:

$tr(U^TSU) + tr(H(I-U^TU))$.

And in so doing, how can we show that $S = \frac{1}{n}X^TX$ and $[H]_{ij} = \begin{cases} \lambda^{[i,i]}, & i=j, \\ \frac{\lambda^{[i,j]}}{2}, & i \not= j? \end{cases}$
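A hedged numerical check of the claimed result (a sketch with synthetic data, not a derivation): if $U$ holds the top $d$ eigenvectors of $S$ and $H$ is the diagonal matrix of the corresponding eigenvalues, the gradient of $tr(U^TSU) + tr(H(I-U^TU))$ with respect to $U$, namely $2SU - U(H+H^T)$, should vanish:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)            # centre the data
S = X.T @ X / X.shape[0]          # sample covariance, S = (1/n) X^T X

evals, evecs = np.linalg.eigh(S)  # eigenvalues in ascending order
d = 2
U = evecs[:, -d:]                 # columns: the d leading eigenvectors
H = np.diag(evals[-d:])           # H diagonal with the matching eigenvalues

# Gradient of tr(U^T S U) + tr(H (I - U^T U)) with respect to U:
grad = 2 * S @ U - U @ (H + H.T)
assert np.allclose(grad, 0)       # eigenvector/eigenvalue pairs are stationary
```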


I understand that in the ordinary PCA setting, where we only require orthonormality, $u^{[i]} \cdot u^{[j]} = \delta_{ij}$, we compute the sample variance of the projected data using:

$\displaystyle \frac{1}{n} \sum^n_{i=1} \left( u^{[j]} \cdot x^{(i)} - u^{[j]} \cdot \hat{x} \right)^2 = u^{[j]T}Su^{[j]}$

where $\hat x$ is the sample mean and $S$ is the sample covariance matrix $\frac{1}{n}X^TX$ (with the rows of $X$ centred by $\hat x$).
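This identity is easy to check numerically (a sketch with synthetic data; the unit vector `u` stands in for one of the $u^{[j]}$): the mean squared projection of the centred data onto $u$ equals $u^TSu$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 4))
Xc = X - X.mean(axis=0)            # rows are the centred points x^(i) - x_hat
S = Xc.T @ Xc / Xc.shape[0]        # S = (1/n) X^T X on centred data

u = rng.standard_normal(4)
u = u / np.linalg.norm(u)          # a unit direction, playing the role of u^[j]

proj = Xc @ u                      # u . (x^(i) - x_hat) for every i
var_proj = np.mean(proj ** 2)      # (1/n) sum of squared centred projections
assert np.allclose(var_proj, u @ S @ u)
```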

With this we form the optimisation problem $\displaystyle \operatorname*{argmax}_{\{ u^{[j]} \}^d_{j=1}} L$, where $\displaystyle L = \sum^d_{j=1} u^{[j]T}Su^{[j]}$, subject to $\{ u^{[j]} \}^d_{j=1}$ being orthonormal.

Therefore obtaining the first part of the expression, $tr(U^TSU)$, makes sense, and showing that $S = \frac{1}{n}X^TX$ is not a problem either. I am confused about the second part, $tr(H(I-U^TU))$.

Could anyone clarify this please?