I'm struggling with the following problem from Gilbert Strang's book on linear algebra:
First assumption behind least squares: $Ax=b-e$, where $e$ is noise with mean zero. Multiply the error vectors $e=b-Ax$ by $(A^TA)^{-1}A^T$ to get $\hat x-x$. The estimation errors $\hat x-x$ also average to zero. The estimate $\hat x$ is unbiased.
OK, so I multiply as requested and do get $\hat x-x$. The problem is, how do I know that if $e=b-Ax$ averages to zero, then $(A^TA)^{-1}A^T(b-Ax)$, which is the expression after multiplication, also averages to zero? The textbook answer only states this fact without any explanation.
$$(A^TA)^{-1}A^Te=(A^TA)^{-1}A^T(b-Ax)$$
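Take the expectation of both sides. Each entry of the left-hand side is a linear combination of the entries $e_i$ of $e$, with coefficients taken from the fixed (non-random) matrix $(A^TA)^{-1}A^T$. Each $e_i$ has expectation $0$, so by linearity of expectation every entry of the left-hand side has expectation $0$ as well. The right-hand side equals $\hat x-x$, so the estimation error averages to zero.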
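Written out entry by entry, the argument might look like the following sketch. The shorthand $M=(A^TA)^{-1}A^T$ is introduced here only for convenience and is not in the book. The sketch assumes that $A$ is a fixed, non-random matrix with independent columns (so that $A^TA$ is invertible) and that the only randomness is in $e$.

$$
\begin{aligned}
\hat x - x &= Me, \qquad M=(A^TA)^{-1}A^T,\\
E\big[(\hat x - x)_i\big] &= E\Big[\sum_j M_{ij}\,e_j\Big] = \sum_j M_{ij}\,E[e_j] = \sum_j M_{ij}\cdot 0 = 0,\\
E[\hat x - x] &= M\,E[e] = 0 \quad\Longrightarrow\quad E[\hat x] = x.
\end{aligned}
$$

The step that moves $E$ inside the sum is linearity of expectation. It works only because the coefficients $M_{ij}$ are constants; no independence between the $e_j$ is needed, just $E[e_j]=0$ for each $j$.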