I am auditing an online course titled "Mathematics for Machine Learning: PCA". The associated book is available here (page 88, eqn. 3.65). The discussion is limited to $\mathbb{R}^n$. The author claims, both in the course and in the book, that the projection of a vector $x$ onto a subspace $U$ with basis matrix $\mathbb{B}$, $$\pi_u(x) = \mathbb{B}(\mathbb{B}^T\mathbb{B})^{-1}\mathbb{B}^Tx,$$
can be simplified to $$\pi_u(x) = \mathbb{B}\mathbb{B}^Tx$$
when the columns of $\mathbb{B}$ form an orthonormal basis ($\mathbb{B}^T\mathbb{B} = \mathbb{I}$).
But if that is the case, wouldn't $\mathbb{B}\mathbb{B}^T$ also reduce to the identity, which would result in $$\pi_u(x) = x?$$ What am I missing here?
The main reason is that $B$ is not a square matrix. If $U$ is a $k$-dimensional subspace of $\mathbb{R}^n$ with an orthonormal basis of $U$ arranged in the columns of $B$, then $B$ is an $n\times k$ matrix, so the product $B^TB$ is a $k\times k$ identity matrix. To illustrate this, write $B = [u_1 \;\; u_2 \;\; \ldots \;\; u_k]$ where each $u_i$ is a column vector; then the computation
$$ B^TB \;\; =\;\; \left [ \begin{array}{c} u_1^T \\\vdots \\ u_k^T \\ \end{array} \right ]\left [ \begin{array}{ccc} u_1 & \ldots & u_k \\ \end{array} \right ] \;\; =\;\; \left [ \begin{array}{cccc} \langle u_1, u_1\rangle & \langle u_1, u_2 \rangle & \ldots & \langle u_1, u_k\rangle \\ \langle u_2, u_1\rangle & \langle u_2, u_2 \rangle & \ldots & \langle u_2, u_k\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle u_k, u_1\rangle & \langle u_k, u_2 \rangle & \ldots & \langle u_k, u_k\rangle \\ \end{array} \right ] $$
reduces to the $k\times k$ identity matrix, since orthonormality means $\langle u_i, u_j\rangle = 1$ when $i = j$ and $0$ otherwise. Notice that the other product, $BB^T$, is an $n\times n$ matrix and does not reduce in the same way: it is the orthogonal projector onto $U$, and it equals the identity only in the degenerate case $k = n$, where $U$ is all of $\mathbb{R}^n$.
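A quick numerical sketch makes the asymmetry concrete. The matrix $B$ below is a made-up example (not from the book): the first two standard basis vectors of $\mathbb{R}^3$, giving an orthonormal basis of a $2$-dimensional subspace $U$.

```python
import numpy as np

# Hypothetical example: an orthonormal basis of a 2-dimensional
# subspace U of R^3, arranged as the columns of B (so B is 3x2).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# B^T B is the 2x2 identity, as in the derivation above...
print(B.T @ B)

# ...but B B^T is a 3x3 matrix: the orthogonal projector onto U,
# not the identity on R^3.
P = B @ B.T
print(P)

# Projecting a generic vector changes it: the component of x
# orthogonal to U (here, the third coordinate) is discarded.
x = np.array([1.0, 2.0, 3.0])
print(P @ x)   # -> [1. 2. 0.], not x itself
```

So $B^TB = \mathbb{I}_k$ holds, but $BB^T$ acts as the identity only on vectors that already lie in $U$.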