I am reading this paper, and in equation (7) on page 9, it's written that
$$E(E(Y-\gamma^TX|A)^2) = E[P_A(Y-\gamma^TX)^2]$$
where $Y, X, A$ are random, $\gamma$ is fixed, and $P_A$ is the linear $L_2$ projection onto the column space of $A$.
I think $P_A = A(A^TA)^{-1}A^T$. However, I'm still not quite sure why
$$E(E(Y-\gamma^TX|A)^2) = E[A(A^TA)^{-1}A^T(Y-\gamma^TX)^2].$$
Our probability space is $(\Omega,\mathcal{A},\mu)$, and we consider the projection $P_A:\mathcal{L}^2(\mathcal{A}) \to \mathcal{L}^2(\sigma(A))$, where $\sigma(A)\subset \mathcal{A}$ and $\mathcal{L}^2(\sigma(A))$ is a closed linear subspace of $\mathcal{L}^2(\mathcal{A})$. By definition,
$$E[Y-\gamma^TX|A]=P_A(Y-\gamma^TX).$$
Recall that $u:=P_A(Y-\gamma^TX)$ is itself a random variable. So when one writes $P_A(Y-\gamma^TX)^2$, it means $u^2$, not $P_A((Y-\gamma^T X)^2)$. Indeed, with data,
$$\begin{aligned}
\sum_j \bigl(P_\mathbf{A}(\mathbf{Y}-\mathbf{X}\gamma)\bigr)^2_j
&=(\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T(\mathbf{Y}-\mathbf{X}\gamma))^T(\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T(\mathbf{Y}-\mathbf{X}\gamma)) \\
&= (\mathbf{A}^T(\mathbf{Y}-\mathbf{X}\gamma))^T(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}(\mathbf{A}^T(\mathbf{Y}-\mathbf{X}\gamma)) \\
&= (\mathbf{A}^T(\mathbf{Y}-\mathbf{X}\gamma))^T(\mathbf{A}^T\mathbf{A})^{-1}(\mathbf{A}^T(\mathbf{Y}-\mathbf{X}\gamma)) \\
&=(\mathbf{Y}-\mathbf{X}\gamma)^T\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T(\mathbf{Y}-\mathbf{X}\gamma),
\end{aligned}$$
as shown in $(9)$.
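
The finite-sample identity can also be checked numerically. The following is only a sketch on made-up data, not anything taken from the paper: it assumes a single anchor column `a`, so that $(\mathbf{A}^T\mathbf{A})^{-1}$ reduces to the scalar $1/(a^Ta)$, and the names `a`, `x`, `y`, `gamma` and the simulated model are hypothetical. It compares the sum of squares of the projected residual with the quadratic form, and also computes the sum of the components of the projection of the *squared* residual, to illustrate that this is a different quantity.

```python
import math
import random

random.seed(0)
n = 200

# Hypothetical toy data: one anchor column a, one covariate x, a response y.
a = [random.gauss(0, 1) for _ in range(n)]
x = [ai + random.gauss(0, 1) for ai in a]
y = [2 * xi + ai + random.gauss(0, 1) for xi, ai in zip(x, a)]
gamma = 0.5  # arbitrary fixed coefficient, chosen only for illustration


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Residual vector r = Y - X * gamma.
r = [yi - gamma * xi for yi, xi in zip(y, x)]

# With a single column, P_A r = a * (a . r) / (a . a).
coef = dot(a, r) / dot(a, a)
u = [ai * coef for ai in a]

sum_sq_projection = dot(u, u)  # sum_j (P_A r)_j^2
quadratic_form = dot(r, u)     # r^T P_A r

# For contrast: project the squared residuals instead, then sum the components.
r_squared = [ri ** 2 for ri in r]
coef_sq = dot(a, r_squared) / dot(a, a)
sum_projection_of_square = sum(ai * coef_sq for ai in a)

print("sum of squares of P_A r :", sum_sq_projection)
print("r^T P_A r               :", quadratic_form)
print("sum of P_A (r^2)        :", sum_projection_of_square)
print("identity holds          :", math.isclose(sum_sq_projection, quadratic_form, rel_tol=1e-9))
```

The first two printed values agree up to floating-point error because $P_\mathbf{A}$ is symmetric and idempotent, which is exactly what the chain of equalities uses. The third value is generally different, which is the distinction between $u^2$ and $P_A((Y-\gamma^TX)^2)$.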