I am reading Optimization by Vector Space Methods by David Luenberger, and I came across a problem that I cannot figure out:
Let $\hat{\beta} = Ky$ be the minimum-variance linear estimate of a random vector $\beta$ based on the random vector $y$. Show that $E[(\beta-\hat{\beta})(\beta-\hat{\beta})^T] = E[\beta \beta^T] - E[\hat{\beta}\hat{\beta}^T]$.
I tried expanding the left-hand side using $y = M\beta + \epsilon$, but some of the cross terms do not seem to cancel.
Source: this is problem 7 of chapter 4.
Edit:
After some digging, I at first thought this was a typo in the book and that the right-hand side should be $E[\beta \beta^T] - E[\hat{\beta}\beta^T]$, which follows from the orthogonality condition $E[(\hat{\beta}-\beta)y^T] = 0$. But the two forms are in fact equal: since $\hat{\beta} = Ky$, orthogonality gives $E[(\hat{\beta}-\beta)\hat{\beta}^T] = E[(\hat{\beta}-\beta)y^T]K^T = 0$, i.e. $E[\hat{\beta}\hat{\beta}^T] = E[\beta\hat{\beta}^T] = E[\hat{\beta}\beta^T]$ (the last equality by transposing the symmetric matrix $E[\hat{\beta}\hat{\beta}^T]$). Expanding the error covariance then gives $E[(\beta-\hat{\beta})(\beta-\hat{\beta})^T] = E[\beta\beta^T] - E[\beta\hat{\beta}^T] - E[\hat{\beta}\beta^T] + E[\hat{\beta}\hat{\beta}^T] = E[\beta\beta^T] - E[\hat{\beta}\hat{\beta}^T]$, which is the identity as stated.
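For what it's worth, the identity can be sanity-checked numerically with exact second moments (no Monte Carlo needed). This is a minimal sketch assuming the model $y = M\beta + \epsilon$ with zero-mean $\beta$ and $\epsilon$, independent of each other; the matrices $M$ and the covariances below are arbitrary made-up data, not from the book.

```python
# Numerical sanity check of the error-covariance identity for the
# minimum-variance linear estimator beta_hat = K y, using exact
# population second moments. Model (assumed): y = M beta + eps,
# with zero-mean beta and eps, independent of each other.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                       # dim(beta), dim(y)

# Arbitrary symmetric positive-definite covariances (hypothetical data).
A = rng.standard_normal((n, n)); Sb = A @ A.T + n * np.eye(n)  # E[beta beta^T]
B = rng.standard_normal((m, m)); Se = B @ B.T + m * np.eye(m)  # E[eps eps^T]
M = rng.standard_normal((m, n))

Syy = M @ Sb @ M.T + Se           # E[y y^T]
Sby = Sb @ M.T                    # E[beta y^T]
K = Sby @ np.linalg.inv(Syy)      # minimum-variance linear estimator gain

# E[(beta - beta_hat)(beta - beta_hat)^T], expanded term by term.
err_cov = Sb - Sby @ K.T - K @ Sby.T + K @ Syy @ K.T
bh_bh = K @ Syy @ K.T             # E[beta_hat beta_hat^T]
bh_b = K @ Sby.T                  # E[beta_hat beta^T]

print(np.allclose(err_cov, Sb - bh_bh))  # book's form        -> True
print(np.allclose(err_cov, Sb - bh_b))   # "corrected" form   -> True
print(np.allclose(Sby @ K.T, bh_bh))     # orthogonality: E[beta beta_hat^T] = E[beta_hat beta_hat^T] -> True
```

All three checks pass, confirming that the book's form and the "corrected" form coincide for the optimal $K$.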