I'm trying to reconcile the notion of a "least square estimator" with the definition of an estimator I have in my head.
An estimator for me assumes a set $\Omega$ and a family of probability measures $$\{\mu_\theta: \theta \in \Theta\}$$ on $\Omega$, and a function on the index set $g:\Theta \rightarrow \mathbb{R}$. An unbiased estimator for $g$ is then a function $\tilde{g}: \Omega \rightarrow \mathbb{R}$ such that $$\int_\Omega \tilde{g} \, d\mu_\theta = g(\theta)$$ for all $\theta\in\Theta$.
A "least squares estimator" on the other hand assumes we have some uncorrelated random variables $X_1,\ldots,X_n: \mathscr{X} \rightarrow \mathbb{R}$ on a space $\mathscr{X}$ (I think with a fixed probability measure?) with common variance and expectations $\theta_1,\ldots,\theta_n$ assumed to lie in some linear subspace $V\subset \mathbb{R}^n$ (as something varies, but I don't know what specifically; maybe it varies with a family of probability measures on $\mathscr{X}$?). Then, given an "observation" $x=(x_1,\ldots,x_n)\in \mathbb{R}^n$, we "estimate" $(\theta_1,\ldots,\theta_n)$ by orthogonally projecting $x$ onto the subspace $V$, obtaining say $x'$. Then for any linear $L: \mathbb{R}^n\rightarrow \mathbb{R}$, $L(x')$ is supposed to be an estimator for $L(\theta_1,\ldots,\theta_n)$. (I hope that's a fair treatment; I've tried to simplify it by not including the matrix $A$ that parametrizes the subspace $V$.)
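To make the projection step concrete, here is a minimal numerical sketch. It assumes $V$ is given as the column space of a matrix $A$ (the parametrization the question set aside), and the particular $A$ and observation $x$ below are just made-up illustrative values:

```python
import numpy as np

# Hypothetical basis for V: the columns of A span the subspace V of R^3.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, 2.0, 2.0])  # the "observation" x in R^3

# Least-squares coefficients: solve min_c || A c - x ||^2.
coef, *_ = np.linalg.lstsq(A, x, rcond=None)

# x' is the orthogonal projection of x onto V = col(A).
x_proj = A @ coef

# The residual x - x' is orthogonal to V, which characterizes the projection.
print(np.allclose(A.T @ (x - x_proj), 0.0))  # → True
```

Any linear functional $L$ applied to `x_proj` then gives the least-squares estimate of $L(\theta_1,\ldots,\theta_n)$.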
Now my question is: how should I define $\Omega$, $\Theta$, $\mu_\theta$, $g$, and $\tilde{g}$ to show that the latter is an instance of the former? I'd be inclined to just make $\Omega = \mathbb{R}^n$ and $\mu_\theta$ some kind of product measure if the $X_1,\ldots,X_n$ were independent, but they are only assumed uncorrelated.
One simple sort of least-squares estimator often considered is as follows. You have $$ \left[\begin{array}{c} Y_1 \\ \vdots \\ Y_n \end{array}\right] = Y \sim \operatorname N_n(X\theta, \sigma^2 I_n) = \operatorname N\left( \left[\begin{array}{cc} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right]\left[ \begin{array}{c} \theta_0 \\ \theta_1 \end{array} \right], \sigma^2 \left[ \begin{array}{cccc} 1 \\ & 1 \\ & & \ddots \\ & & & 1 \end{array} \right] \right) \tag 1 $$ where $x_1,\ldots,x_n$ are known non-random numbers and $Y_1,\ldots,Y_n$ are observed, so that the observed data are $(x_i,Y_i),\,\, i=1,\ldots,n.$ The thing to be estimated is $\theta.$ The variance $\sigma^2$ will also be estimated, but not by least squares. The least-squares estimate of $\theta$ is $$ \widehat\theta = (X^T X)^{-1} X^T Y. $$ In this case, $\Theta=\mathbb R^2$ and $\Omega$ can be taken to be $\mathbb R^n$ with the measure given on line $(1)$ above.
If $X_1,\ldots,X_n$ are only assumed uncorrelated and not independent, then you're still dealing with a measure on a product space, albeit not a product measure.