Let $y = X\theta_* + z$, where $X \in \mathbb{R}^{n \times d}$, $\theta_* \in \mathbb{R}^{d}$, $y \in \mathbb{R}^n$, and $z \sim \mathcal{N}(0,I_n)$, and suppose $\operatorname{rank}(X) = d$. Prove that if $\widehat{\theta} = \arg\min_{\theta} \|X\theta - y\|_2^2$, then \begin{align*} \mathbb{E}[\|\theta_* - \widehat{\theta}\|_2^2] = \operatorname{tr}\big((X^{\top}X)^{-1}\big) \end{align*}
I don't know where to start with this problem. I can compute $\operatorname{tr}((X^{\top}X)^{-1})$ and write down $\|\theta_* - \widehat{\theta}\|_2^2$, but I'm at a loss for how to connect the two.
So you're doing linear regression with additive white Gaussian noise,
\begin{equation} y = X\theta_{*} + z. \end{equation}

Since $\operatorname{rank}(X) = d$, the matrix $X^T X$ is invertible and the least-squares problem has the unique closed-form (OLS) solution
\begin{equation} \hat{\theta} = (X^T X)^{-1} X^T y. \end{equation}

So your error becomes
\begin{equation} \Vert \theta_* - \hat{\theta} \Vert^2 = \Vert \theta_* - (X^T X)^{-1} X^T y \Vert^2. \end{equation}

But $y = X\theta_{*} + z$, so
\begin{equation} \Vert \theta_* - \hat{\theta} \Vert^2 = \Vert \theta_* - (X^T X)^{-1} X^T ( X\theta_{*} + z) \Vert^2 = \Vert \theta_* - \theta_{*} - (X^T X)^{-1} X^T z \Vert^2 = \Vert (X^T X)^{-1} X^T z \Vert^2, \end{equation}
where the last step uses $\Vert -\alpha \Vert = \Vert \alpha \Vert$.

Since $\Vert \alpha \Vert^2 = \alpha^T \alpha$, and $(X^T X)^{-1}$ is symmetric,
\begin{equation} \Vert \theta_* - \hat{\theta} \Vert^2 = \big( (X^T X)^{-1} X^T z \big)^T \big( (X^T X)^{-1} X^T z \big) = z^T X (X^T X)^{-1}(X^T X)^{-1} X^T z. \end{equation}

But since $\Vert \theta_* - \hat{\theta} \Vert^2$ is a scalar, it equals its own trace:
\begin{equation} \Vert \theta_* - \hat{\theta} \Vert^2 = \operatorname{tr}\big( z^T X (X^T X)^{-1}(X^T X)^{-1} X^T z \big). \end{equation}

Now use the cyclic property $\operatorname{tr}(AB) = \operatorname{tr}(BA)$,
\begin{equation} \Vert \theta_* - \hat{\theta} \Vert^2 = \operatorname{tr}\big( X (X^T X)^{-1}(X^T X)^{-1} X^T zz^T \big), \end{equation}
and one more time,
\begin{equation} \Vert \theta_* - \hat{\theta} \Vert^2 = \operatorname{tr}\big( (X^T X)^{-1}(X^T X)^{-1} X^T zz^T X \big). \end{equation}

Now the expectation comes in. By linearity, expectation and trace commute, and everything inside is constant except $zz^T$:
\begin{equation} \mathbb{E}\,\Vert \theta_* - \hat{\theta} \Vert^2 = \operatorname{tr}\big( (X^T X)^{-1}(X^T X)^{-1} X^T\, \mathbb{E}[zz^T]\, X \big). \end{equation}

But $\mathbb{E}[zz^T] = I$ since $z \sim \mathcal{N}(0, I_n)$, so
\begin{equation} \mathbb{E}\,\Vert \theta_* - \hat{\theta} \Vert^2 = \operatorname{tr}\big( (X^T X)^{-1}(X^T X)^{-1} X^T X \big) = \operatorname{tr}\big( (X^T X)^{-1} \big). \end{equation}
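If you want to sanity-check the identity numerically, here is a small Monte Carlo sketch (assuming NumPy; the dimensions, seed, and trial count are arbitrary choices, not from the problem): fix $X$ and $\theta_*$, redraw $z \sim \mathcal{N}(0, I)$ many times, refit OLS each time, and compare the average squared error against $\operatorname{tr}((X^T X)^{-1})$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.standard_normal((n, d))       # fixed design; full column rank w.h.p.
theta_star = rng.standard_normal(d)

# Closed-form prediction: E ||theta_hat - theta*||^2 = tr((X^T X)^{-1})
predicted = np.trace(np.linalg.inv(X.T @ X))

# Monte Carlo: redraw z, refit OLS, average the squared estimation error
trials = 20000
errs = np.empty(trials)
for t in range(trials):
    y = X @ theta_star + rng.standard_normal(n)   # z ~ N(0, I_n)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    errs[t] = np.sum((theta_hat - theta_star) ** 2)

print(predicted, errs.mean())   # the two values should agree closely
```

Note that the answer does not depend on $\theta_*$ at all, which matches the derivation: the error $\hat{\theta} - \theta_*$ reduces to $(X^T X)^{-1} X^T z$, a function of the noise alone.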