I am exploring the expected error of two least-squares methods under a specific data distribution. Let $X=[Z,\mathbb{1}_{n,1}]$ where $Z\sim \texttt{N}(0,1)^{n\times d}$, and let each response be drawn as $y_i\sim \texttt{N}\left(\vec{\beta}\cdot \vec{x}_i,\; \sigma^2 \sum_j z_{ij}^2 + 1\right)$, a heteroskedastic noise model. The noise covariance matrix $\Sigma$ is then the $n\times n$ diagonal matrix $\Sigma=\texttt{Diag}\left[[\sigma Z,\mathbb{1}][\sigma Z,\mathbb{1}]^T\right]$, whose $i$-th diagonal entry is $\sigma^2\sum_j z_{ij}^2+1$.
I want to compute the expected error for two methods:
- Unweighted Least Squares: $(X^TX)^{-1}X^Ty$
- Weighted Least Squares with inverse squared row norms as weights: $(X^TN^{-1}X)^{-1}X^TN^{-1}y$, where $N=\texttt{Diag}[XX^T]$
I believe these are the minimum-variance linear unbiased estimators for $\sigma=0$ and $\sigma=1$, respectively.
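To make the setup concrete, here is a minimal NumPy sketch of the two estimators on data simulated from this model. Variable names are my own, and I take $N$ to be the diagonal of $XX^T$ (the squared row norms), which is the only dimensionally consistent reading:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 3, 1.0
beta = rng.standard_normal(d + 1)

Z = rng.standard_normal((n, d))
X = np.hstack([Z, np.ones((n, 1))])              # X = [Z, 1]
noise_sd = np.sqrt(sigma**2 * (Z**2).sum(axis=1) + 1)  # heteroskedastic sd per row
y = X @ beta + noise_sd * rng.standard_normal(n)

# Unweighted least squares: (X^T X)^{-1} X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Weighted least squares with weights N^{-1}, N = Diag[X X^T]
w = 1.0 / (X**2).sum(axis=1)                     # diagonal of N^{-1}
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```

Both solves apply $N^{-1}$ via row-wise broadcasting rather than forming the $n\times n$ diagonal matrix explicitly.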
Core Question: The expected error of each method is $\mathbb{E}\left[\texttt{Trace}[(X^TX)^{-1}X^T\Sigma X(X^TX)^{-1}]\right]$ and $\mathbb{E}\left[\texttt{Trace}[(X^TN^{-1}X)^{-1}X^TN^{-1}\Sigma N^{-1}X(X^TN^{-1}X)^{-1}]\right]$, respectively. How can I express each expected error as a function of $\sigma$, $n$, and $d$?
I suspect this involves the Wishart distribution. Can anyone guide me through the process of going from these expectations to a simplified formula or approximation?
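For what it's worth, here is the kind of reduction I would hope generalizes. When $\sigma=0$ we have $\Sigma=I_n$, so the first expectation collapses to $\mathbb{E}\left[\texttt{Trace}[(X^TX)^{-1}]\right]$. If one ignores the intercept column and treats $X^TX$ as a Wishart matrix $W\sim W_{d+1}(n,I)$ (only approximately true here, since $\mathbb{1}^T\mathbb{1}=n$ is deterministic), the inverse-Wishart mean $\mathbb{E}[W^{-1}]=I/(n-d-2)$ gives

$$\mathbb{E}\left[\texttt{Trace}\left[(X^TX)^{-1}\right]\right]\approx \frac{d+1}{n-d-2}.$$

What I am missing is how (or whether) this style of argument survives once $\sigma>0$, where $\Sigma$ is correlated with $X$.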
For context, plugging in the definitions of $X$ and $\Sigma$, I get:
$\mathbb{E}\left[\texttt{Trace}[([Z,\mathbb{1}]^T[Z,\mathbb{1}])^{-1}[Z,\mathbb{1}]^T\texttt{Diag}\left([\sigma Z,\mathbb{1}][\sigma Z,\mathbb{1}]^T\right) [Z,\mathbb{1}]([Z,\mathbb{1}]^T[Z,\mathbb{1}])^{-1}]\right]$
and
$\mathbb{E}\left[\texttt{Trace}[([Z,\mathbb{1}]^TN^{-1}[Z,\mathbb{1}])^{-1}[Z,\mathbb{1}]^TN^{-1}\texttt{Diag}\left([\sigma Z,\mathbb{1}][\sigma Z,\mathbb{1}]^T\right) N^{-1}[Z,\mathbb{1}]([Z,\mathbb{1}]^TN^{-1}[Z,\mathbb{1}])^{-1}]\right]$, with $N=\texttt{Diag}\left([Z,\mathbb{1}][Z,\mathbb{1}]^T\right)$.
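In case it helps anyone answering, the two trace expectations can be estimated by Monte Carlo. This sketch (my own code, not derived from any closed form) averages both traces over draws of $Z$; $\Sigma$ enters only through its diagonal $s_i=\sigma^2\sum_j z_{ij}^2+1$:

```python
import numpy as np

def expected_errors(sigma, n, d, trials=500, seed=0):
    """Monte Carlo estimate of the two trace expectations."""
    rng = np.random.default_rng(seed)
    err_ols = err_wls = 0.0
    for _ in range(trials):
        Z = rng.standard_normal((n, d))
        X = np.hstack([Z, np.ones((n, 1))])
        # Diagonal of Sigma: per-row noise variances sigma^2 * sum_j z_ij^2 + 1
        s = sigma**2 * (Z**2).sum(axis=1) + 1
        # Unweighted: Trace[(X^T X)^{-1} X^T Sigma X (X^T X)^{-1}] = Trace[A Sigma A^T]
        A = np.linalg.inv(X.T @ X) @ X.T
        err_ols += np.trace((A * s) @ A.T)        # A * s == A @ Diag(s)
        # Weighted: same with A replaced by (X^T N^{-1} X)^{-1} X^T N^{-1}
        w = 1.0 / (X**2).sum(axis=1)              # diagonal of N^{-1}, N = Diag[X X^T]
        B = np.linalg.inv(X.T @ (w[:, None] * X)) @ (X.T * w)
        err_wls += np.trace((B * s) @ B.T)
    return err_ols / trials, err_wls / trials
```

As a sanity check, by Gauss–Markov the unweighted error should be the smaller of the two at $\sigma=0$ (where $\Sigma=I$) and the weighted one smaller at $\sigma=1$ (where $\Sigma=N$ exactly), and the simulation reproduces that ordering.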