First, I will show what I currently do in a (non-restricted) weighted multivariate regression model; then I will pose my question, since it needs the notation introduced along the way. Suppose we have a weighted multivariate least squares problem of the form
$$ \vec{y}_i = F\vec{x}_i + \vec{\epsilon}_i, \qquad \vec{\epsilon}_i\sim N(\vec{0}, \Omega) $$
where $\vec{y}_i,\vec{\epsilon}_i \in \mathbb{R}^p$, $\vec{x}_i \in \mathbb{R}^q$, the $\vec{\epsilon}_i$ are iid, and each observation $i=1,\dots,n$ has a known weight $w_i$. Then the MLE of $F$ is
$$ \hat{F} = \underset{F}{\text{argmin}} \lVert W^{1/2} (Y - XF^\top) \rVert^2 $$
where
$$ Y = (\vec{y}_1,\dots,\vec{y}_n)^\top, \quad X=(\vec{x}_1,\dots,\vec{x}_n)^\top $$
and $W$ is the diagonal matrix with the weights $w_1,\dots,w_n$ on the diagonal. Let $\tilde Y = W^{1/2}Y$ and $\tilde X = W^{1/2} X$, and let
$$ \tilde X = Q\begin{pmatrix} R \\ 0 \end{pmatrix} $$
be a QR decomposition of $\tilde X$, where $Q \in \mathbb{R}^{n\times n}$ is orthogonal and $R \in \mathbb{R}^{q\times q}$ is upper triangular.
Lastly, let
$$ Q^\top \tilde Y = \begin{pmatrix} C_1 \\ C_2 \end{pmatrix} $$
where $C_1$ are the first $q$ rows of $Q^\top \tilde Y$. I solve the above problem by computing $R$ and $C_1$ in parallel with a low memory footprint, as in the bam function in the R package mgcv (see Wood, 2015), since the matrices I work with are rather large. Then I use that
$$ \hat{F}^\top = (R^\top R)^{-1}R^\top C_1 = R^{-1}C_1 $$
and the MLE of $\Omega$ is
$$ \hat \Omega = (\sum w_i)^{-1}(\tilde Y^\top\tilde Y - C_1^\top C_1) $$
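For concreteness, the computation just described can be sketched as follows (Python/NumPy rather than R, purely to illustrate the algebra; the simulated data and all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 2, 3

# Simulated data for y_i = F x_i + eps_i, eps_i ~ N(0, Omega), with weights w_i.
X = rng.standard_normal((n, q))
F = rng.standard_normal((p, q))
A = rng.standard_normal((p, p))
Omega = A @ A.T + np.eye(p)                 # some SPD covariance
Y = X @ F.T + rng.multivariate_normal(np.zeros(p), Omega, size=n)
w = rng.uniform(0.5, 2.0, n)

# Whiten by the weights: Y~ = W^{1/2} Y, X~ = W^{1/2} X.
sw = np.sqrt(w)
Xt, Yt = X * sw[:, None], Y * sw[:, None]

# Thin QR of X~; C1 equals the first q rows of Q^T Y~ from the full QR.
Q, R = np.linalg.qr(Xt)                     # Q: n x q, R: q x q upper triangular
C1 = Q.T @ Yt

F_hat = np.linalg.solve(R, C1).T            # F-hat^T = R^{-1} C1
Omega_hat = (Yt.T @ Yt - C1.T @ C1) / w.sum()
```

In real use one would not form $\tilde X$ and $\tilde Y$ in memory at once; the point is only that $R$, $C_1$, $\tilde Y^\top\tilde Y$, and $\sum w_i$ are sufficient for both $\hat F$ and $\hat\Omega$.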
Now suppose that I want to restrict some of the parameters in $F$ such that
$$ \text{vec}(F) = K \vec{\theta} \Leftrightarrow \text{vec}(F^\top) = \underbrace{H_{p,q}K}_{D}\vec{\theta} $$
where $K \in \mathbb{R}^{pq \times k}$ is a known matrix with full column rank, $k < pq$, $H_{p,q}$ is the $(p,q)$ commutation matrix, and $\text{vec}(\cdot)$ is the vectorization operator. Given an estimate $\hat \Omega$ of $\Omega$, it is clear that we can use the GLS estimator. That is, we denote
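The commutation matrix is easy to build explicitly. A small NumPy sketch (the function name and the random stand-in for $K$ are mine):

```python
import numpy as np

def commutation_matrix(p, q):
    """Return H with H @ vec(F) == vec(F.T) for any p-by-q matrix F,
    using column-major (Fortran) vectorization."""
    H = np.zeros((p * q, p * q))
    for i in range(p):
        for j in range(q):
            # vec(F)[i + j*p] = F[i, j] must land at vec(F.T)[j + i*q].
            H[j + i * q, i + j * p] = 1.0
    return H

# Sanity check on a random F, and D = H K for a random stand-in K.
rng = np.random.default_rng(0)
p, q, k = 3, 4, 5
F = rng.standard_normal((p, q))
H = commutation_matrix(p, q)
K = rng.standard_normal((p * q, k))
D = H @ K
```

Since $H_{p,q}$ is a permutation matrix, $D$ inherits full column rank from $K$.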
$$ \mathrm{y} = \underbrace{\text{vec}(Y)}_{pn\times 1}, \quad \mathrm{X} = \underbrace{I_p \otimes X}_{pn\times pq}, \quad \mathrm{O} = \underbrace{\Omega \otimes I_n}_{np \times np} $$
where $\otimes$ is the Kronecker product and $I_l$ is the $l$ dimensional identity matrix. It follows that
$$ \mathrm{y} = \mathrm{X}D\vec{\theta} + \mathrm{e}, \qquad \mathrm{e}\sim N(\vec{0}, \mathrm{O}) $$
Let
$$ \tilde{\mathrm{y}} = \text{vec}(\tilde Y), \quad \tilde{\mathrm{X}} = I_p \otimes \tilde X, \quad \widehat{\mathrm{O}} = \hat\Omega \otimes I_n $$
Then we can find that the MLE is
$$ \widehat{\vec{\theta}} = (D^\top \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{X}} D)^{-1} D^\top \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{y}} $$
We can use that
$$ \begin{align*} \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{X}} &= (I_p \otimes \tilde X)^\top(\hat\Omega^{-1} \otimes I_n)(I_p \otimes \tilde X) \\ &= (I_p \otimes \tilde X^\top)(\hat\Omega^{-1}\otimes\tilde X) \\ &= \hat\Omega^{-1}\otimes (\tilde X^\top\tilde X) \\ \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{y}} &= (I_p \otimes \tilde X)^\top(\hat\Omega^{-1} \otimes I_n)\text{vec}(\tilde Y) \\ &= (I_p \otimes \tilde X^\top)\text{vec}(\tilde Y\hat\Omega^{-1}) \\ &= \text{vec}(\tilde X^\top\tilde Y\hat\Omega^{-1}) \end{align*} $$
Thus,
$$ \widehat{\vec{\theta}} = \big(D^\top (\hat \Omega^{-1} \otimes \tilde X^\top \tilde X) D\big)^{-1} D^\top\text{vec}(\tilde X^\top\tilde Y\hat\Omega^{-1}) $$
and, since $\tilde X^\top\tilde X = R^\top R$ and $\tilde X^\top\tilde Y = R^\top C_1$, presumably
$$ \widehat{\vec{\theta}} = \big(D^\top (\hat \Omega^{-1} \otimes R^\top R) D\big)^{-1} D^\top\text{vec}(R^\top C_1\hat\Omega^{-1}) $$
which is one of my questions: is this true? If it is, then I can reuse the code I have written for the non-restricted model, that is, compute $R$ and $C_1$ in parallel with a low memory footprint (never storing the entire matrices at any one point). Are there other, smarter (faster, more stable, parallelizable, and low-memory) ways to do this?
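The two Kronecker-product identities used above can at least be checked numerically. A NumPy sketch with random stand-ins for $\tilde X$, $\tilde Y$, and $\hat\Omega$ (all names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 30, 3, 4
Xt = rng.standard_normal((n, q))                 # stand-in for X~
Yt = rng.standard_normal((n, p))                 # stand-in for Y~
A = rng.standard_normal((p, p))
Omega = A @ A.T + np.eye(p)                      # SPD stand-in for Omega-hat
Oi = np.linalg.inv(Omega)

bigX = np.kron(np.eye(p), Xt)                    # I_p (x) X~
bigOi = np.kron(Oi, np.eye(n))                   # (Omega (x) I_n)^{-1}
y = Yt.reshape(-1, order="F")                    # vec(Y~), column-major

lhs1 = bigX.T @ bigOi @ bigX                     # X~^T O^{-1} X~
rhs1 = np.kron(Oi, Xt.T @ Xt)                    # Omega^{-1} (x) (X~^T X~)
lhs2 = bigX.T @ bigOi @ y                        # X~^T O^{-1} y~
rhs2 = (Xt.T @ Yt @ Oi).reshape(-1, order="F")   # vec(X~^T Y~ Omega^{-1})
```

The identities are exact, so `lhs1 == rhs1` and `lhs2 == rhs2` up to floating-point error.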
I have seen the GLS estimator stated in multiple places, but I have never come across an actual way to implement the estimation in the context of multivariate regression models.
To answer one of my own questions: the substitution is indeed valid, which a short simulated example confirms.
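A version of that check in Python/NumPy (rather than R; the restriction matrix is a random stand-in and all names are mine) simulates restricted data and compares the GLS estimate computed from the full matrices with the one computed from $R$ and $C_1$ alone, using the normal equations $(D^\top \mathrm{A} D)\vec\theta = D^\top b$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, k = 100, 2, 3, 4

# Hypothetical restriction: vec(F) = K theta for a random full-column-rank K.
# D's j-th column is vec(F_j^T) where vec(F_j) is K's j-th column, i.e. D = H_{p,q} K.
K = rng.standard_normal((p * q, k))
D = np.stack(
    [c.reshape(p, q, order="F").T.reshape(-1, order="F") for c in K.T], axis=1
)

theta = rng.standard_normal(k)
F = (K @ theta).reshape(p, q, order="F")         # vec(F) = K theta

X = rng.standard_normal((n, q))
A = rng.standard_normal((p, p))
Omega = A @ A.T + np.eye(p)
Y = X @ F.T + rng.multivariate_normal(np.zeros(p), Omega, size=n)
w = rng.uniform(0.5, 2.0, n)

sw = np.sqrt(w)
Xt, Yt = X * sw[:, None], Y * sw[:, None]
Oi = np.linalg.inv(Omega)                        # pretend Omega-hat = Omega here

# Restricted GLS from the full matrices:
A_full = np.kron(Oi, Xt.T @ Xt)
b_full = (Xt.T @ Yt @ Oi).reshape(-1, order="F")
th_full = np.linalg.solve(D.T @ A_full @ D, D.T @ b_full)

# The same estimate from the QR pieces R and C1 only:
Q, R = np.linalg.qr(Xt)                          # thin QR: R is q x q
C1 = Q.T @ Yt
A_qr = np.kron(Oi, R.T @ R)
b_qr = (R.T @ C1 @ Oi).reshape(-1, order="F")
th_qr = np.linalg.solve(D.T @ A_qr @ D, D.T @ b_qr)
```

The two estimates agree to machine precision, since $\tilde X^\top\tilde X = R^\top R$ and $\tilde X^\top\tilde Y = R^\top C_1$ hold exactly.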