I am working through some problems about BLUE (Best Linear Unbiased) estimators. What I have discovered recently is that the error covariance of an unbiased estimator is not the same thing as the covariance of its measurement noise.
In the Stanford online class EE263 (https://see.stanford.edu/Course/EE263), the setup for the BLUE estimator is $y=Cx+\eta$, where $y$ is the vector of measurements, $x$ is the vector to be estimated, and $\eta$ is the "measurement noise".
It also mentions that the noise satisfies $E(\eta)=0$ (which is what makes the estimator unbiased) and $E(\eta\eta^T)=I$. Finally, it says that "the error covariance of an unbiased estimator is $E(\hat{x}-x)(\hat{x}-x)^T$".
I always learned that covariance is $\operatorname{cov}(x,y)=E\big[(x-E(x))(y-E(y))\big]$, and here $\hat{x}$ is obviously our estimate of $x$.
So, my first question is: how is the error covariance of an unbiased estimator $E(\hat{x}-x)(\hat{x}-x)^T$, when the covariance of two random variables $x$ and $y$ is what I have shown above?
The second thing I do not understand is how the measurement noise can satisfy $E(\eta\eta^T)=I$, when covariance is defined as I have shown above.
I do not understand how the covariance formula relates to the measurement noise $\eta$ and to the error covariance of $x$.
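One way to see the connection numerically: when $E(\eta)=0$, the terms $E(\eta_j)$ and $E(\eta_k)$ in the covariance formula vanish, so $\operatorname{cov}(\eta_j,\eta_k)=E(\eta_j\eta_k)$, and the matrix $E(\eta\eta^T)$ is exactly the covariance matrix of $\eta$. A minimal NumPy sketch (the 3-dimensional standard-normal noise is just an illustrative choice, not from the course):

```python
import numpy as np

rng = np.random.default_rng(1)

# Zero-mean noise with E(eta eta^T) = I: i.i.d. standard normal components.
eta = rng.standard_normal((100_000, 3))

# Since E(eta) = 0, cov(eta_j, eta_k) = E[(eta_j - 0)(eta_k - 0)] = E[eta_j eta_k],
# so averaging the outer products eta eta^T estimates the covariance matrix.
outer_avg = eta.T @ eta / eta.shape[0]   # empirical E(eta eta^T)
cov_mat = np.cov(eta.T, bias=True)       # empirical covariance matrix

print(np.round(outer_avg, 2))            # approximately the identity
print(np.round(cov_mat, 2))              # approximately the same matrix
```

Both printed matrices agree (up to sampling error) and are close to $I$, which is all that $E(\eta\eta^T)=I$ asserts: unit-variance, mutually uncorrelated noise components.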
Suppose $x$ is a $p\times 1$ column vector. Then so is its least-squares estimator $\hat x,$ and so is $\hat x -x.$ Because the estimator is unbiased, you have $\operatorname{E}\hat x = x.$
And then: \begin{align} & (\hat x-x)(\hat x - x)^T = \begin{bmatrix} \hat x_1 -x_1 \\ \vdots \\ \hat x_p - x_p \end{bmatrix} \begin{bmatrix} \hat x_1 - x_1 & \cdots & \hat x_p - x_p \end{bmatrix} \\[12pt] = {} & \begin{bmatrix} (\hat x_1 - x_1)(\hat x_1-x_1) & \cdots & (\hat x_1-x_1)(\hat x_k-x_k) & \cdots & (\hat x_1-x_1)(\hat x_p - x_p) \\ \vdots & & \vdots & & \vdots \\ (\hat x_j -x_j)(\hat x_1-x_1) & \cdots & (\hat x_j-x_j)(\hat x_k - x_k) & \cdots & (\hat x_j-x_j)(\hat x_p - x_p) \\ \vdots & & \vdots & & \vdots \\ (\hat x_p-x_p)(\hat x_1-x_1) & \cdots & (\hat x_p - x_p)(\hat x_k-x_k) & \cdots & (\hat x_p-x_p)(\hat x_p-x_p) \end{bmatrix} \end{align} Now observe that
$$ \operatorname{E}((\hat x_j-x_j)(\hat x_k-x_k)) = \operatorname{cov}(\hat x_j, \hat x_k). $$ So the entries in the expected value $\operatorname{E}((\hat x-x)(\hat x-x)^T)$ are the covariances between the components of that vector. Some people therefore call this matrix the covariance matrix or just the covariance, and that appears to be what is happening here. It's really the higher-dimensional analog of the variance of a random variable, and for that reason some others call it the variance (in particular, William Feller's famous two-volume book on probability theory does that).
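You can check this numerically. The sketch below uses a made-up measurement matrix $C$ and true vector $x$ (they are illustrative assumptions, not from the course): with $E(\eta\eta^T)=I$, the least-squares estimator $\hat x=(C^TC)^{-1}C^Ty$ is unbiased, and averaging the outer products $(\hat x-x)(\hat x-x)^T$ over many noise draws reproduces the theoretical error covariance $(C^TC)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurement model y = C x + eta, with E(eta) = 0, E(eta eta^T) = I.
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])           # 3 measurements of a 2-vector x
x = np.array([2.0, -1.0])            # true parameter (fixed; unknown to the estimator)

# Least-squares / BLUE estimator for unit noise covariance: x_hat = (C^T C)^{-1} C^T y
A = np.linalg.inv(C.T @ C) @ C.T

# Monte Carlo: average the outer product (x_hat - x)(x_hat - x)^T over many trials.
n_trials = 200_000
errs = np.empty((n_trials, 2))
for i in range(n_trials):
    y = C @ x + rng.standard_normal(3)   # eta ~ N(0, I)
    errs[i] = A @ y - x                  # x_hat - x for this trial

emp_cov = errs.T @ errs / n_trials       # empirical E[(x_hat - x)(x_hat - x)^T]
theory = np.linalg.inv(C.T @ C)          # theoretical error covariance for this model

print(np.round(emp_cov, 3))
print(np.round(theory, 3))
```

The two printed matrices agree up to sampling error, and each entry of the empirical matrix is (by the identity above) an estimate of $\operatorname{cov}(\hat x_j,\hat x_k)$.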