So in the case where data points have the same variance $\sigma^2$, the estimator (in normal equation form) can be written as
$$\theta=(X^TX)^{-1}X^TY$$
I'm not sure how to derive a similar formula when the data points have different variances, and thus the covariance matrix would be
$$\Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)$$
If you have any system of linear equations $$ X \theta = Y $$ one can show that the normal equations $$ X^T X \theta^* = X^T Y $$ have a solution $$ \theta^* = (X^T X)^{-1} X^T Y $$ which minimizes the residual $$ e = X\theta - Y $$ in the Euclidean norm ("least squares"): $$ \lVert X\theta^* - Y \rVert_2 \le \lVert X\theta - Y \rVert_2 $$ This is independent of the variance.
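As a quick numerical check of this claim, here is a minimal NumPy sketch (the design matrix and observations are made-up illustrative data): solving the normal equations gives the same $\theta^*$ as the library's least-squares solver, and perturbing $\theta^*$ can only increase the residual norm.

```python
import numpy as np

# Hypothetical data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
theta_true = np.array([1.0, -2.0, 0.5])
Y = X @ theta_true + rng.normal(scale=0.1, size=20)

# Normal-equation solution: solve X^T X theta* = X^T Y
# (numerically preferable to forming the inverse explicitly)
theta_star = np.linalg.solve(X.T @ X, X.T @ Y)

# Compare with the library least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta_star, theta_lstsq))

# theta* minimizes ||X theta - Y||_2: any other theta does no better
r_star = np.linalg.norm(X @ theta_star - Y)
r_other = np.linalg.norm(X @ (theta_star + 0.1) - Y)
print(r_star <= r_other)
```

Note that `np.linalg.solve` on $X^T X$ is used instead of computing $(X^T X)^{-1}$ directly, which is the standard numerically safer route.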