I'm doing the Mathematics for Machine Learning course on Coursera (Course 3, Week 4). I am trying to understand the derivation of PCA.
Specifically from:
$J =\frac{1}{N} \sum_{n=1}^{N}\Vert \sum_{j=M+1}^{D}(b_j^TX_n)b_j\Vert^2$ to
$J =\frac{1}{N} \sum_{n=1}^{N} \sum_{j=M+1}^{D}(b_j^TX_n)^2$
Why does the trailing $b_j$ disappear?
Apparently it's because the $b_j$ are orthonormal basis vectors, but can anyone help me understand why that makes the norm collapse like this? TIA.
Additional info:
$b_j$ - vectors forming an orthonormal basis.
$J$ - average reconstruction error.
$x_n$ - a data point (written $X_n$ in the formulas above).
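In case it's useful, here is a quick numerical sanity check I put together (not from the course; it just builds an arbitrary orthonormal basis via a QR decomposition with numpy and compares the two expressions for a single data point):

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 5, 2  # ambient dimension and number of principal directions kept

# Columns of B form an orthonormal basis of R^D (Q factor of a QR decomposition)
B, _ = np.linalg.qr(rng.standard_normal((D, D)))

x = rng.standard_normal(D)  # one data point x_n

# Left-hand side: squared norm of the reconstruction-error vector
# sum_{j=M+1}^{D} (b_j^T x) b_j
err_vec = sum((B[:, j] @ x) * B[:, j] for j in range(M, D))
lhs = np.linalg.norm(err_vec) ** 2

# Right-hand side: sum of squared coefficients, sum_{j=M+1}^{D} (b_j^T x)^2
rhs = sum((B[:, j] @ x) ** 2 for j in range(M, D))

print(np.isclose(lhs, rhs))  # True
```

The two values agree, which at least confirms the identity numerically; I'm still after the algebraic reason.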
