I am stuck on the following step which is part of the derivation of Principal Component Analysis in the book "Foundations of Machine Learning" by Mohri, Rostamizadeh, page 283
Here is the context and the matrices involved:
- Let $P = P^T$ be an orthogonal projection matrix.
- By definition of orthogonal projections, $P = UU^T$ for some $U \in \mathbb{R}^{N \times k}$ with orthonormal columns.
- Using the invariance of the trace operator under cyclic permutations and the orthonormality of the columns of $U$, we have
$$\underbrace{\operatorname{Tr}[X^TPX]}_{(1)} = \underbrace{\operatorname{Tr}[U^TXX^TU]}_{(2)} = \underbrace{\sum_{i=1}^k u_i^TXX^Tu_i}_{(3)}$$
I struggle to see how to go through these steps. Here are my thoughts:
$Tr[X^TPX] = Tr[X^TUU^TX] = Tr[U^TXX^TU]$ = ??
$XX^T$ is symmetric positive semidefinite (it is the covariance matrix up to normalization of the data) and $U$ has orthonormal columns, but how do we drop the trace in the last step?
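For what it's worth, the cyclic-trace step does check out numerically (a small NumPy sketch; the matrix sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))  # any rectangular matrix

# Invariance of the trace under cyclic permutation: Tr(A A^T) = Tr(A^T A).
print(np.allclose(np.trace(A @ A.T), np.trace(A.T @ A)))  # True
```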
Thanks
For the first step you are right: since $P = UU^T$, we have $$X^TPX = X^TUU^TX.$$ Now denote $A = X^TU$, so $A^T = U^TX$. Using $\operatorname{tr}(AA^T) = \operatorname{tr}(A^TA)$, we get $$\operatorname{tr}(X^TPX) = \operatorname{tr}(U^TXX^TU).$$

For the last step, recall that the trace only involves the diagonal entries of this $k \times k$ matrix, i.e. the $ii$-th entries for $i = 1, \dots, k$. Writing $u_i$ for the $i$-th column of $U$, the $ii$-th entry is $$(U^TXX^TU)_{ii} = u_i^TXX^Tu_i,$$ and therefore $$\operatorname{tr}(U^TXX^TU) = \sum_{i=1}^k u_i^TXX^Tu_i.$$ For $i \neq j$, the product $u_i^TXX^Tu_j$ is the $ij$-th entry of $U^TXX^TU$, which is irrelevant for the trace operator.
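As a sanity check, the whole chain of equalities can be verified numerically (a NumPy sketch; the dimensions `N`, `m`, `k` are arbitrary choices, and `U` is built with orthonormal columns via a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, k = 6, 5, 3

# Data matrix X (N x m) and a matrix U (N x k) with orthonormal columns.
X = rng.standard_normal((N, m))
U, _ = np.linalg.qr(rng.standard_normal((N, k)))  # U^T U = I_k

P = U @ U.T  # orthogonal projection onto the column space of U

t1 = np.trace(X.T @ P @ X)                                # (1)
t2 = np.trace(U.T @ X @ X.T @ U)                          # (2)
t3 = sum(U[:, i] @ X @ X.T @ U[:, i] for i in range(k))   # (3)

print(np.allclose(t1, t2), np.allclose(t2, t3))  # True True
```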