First, I will show what I currently do in a (non-restricted) weighted multivariate regression model; then I will pose my question, since it needs the notation introduced along the way. Suppose we have a weighted multivariate least squares problem of the form
$$ \vec{y}_i = F\vec{x}_i + \vec{\epsilon}_i, \qquad \vec{\epsilon}_i\sim N(\vec{0}, \Omega) $$
where $\vec{y}_i,\vec{\epsilon}_i \in \mathbb{R}^p$, $\vec{x}_i \in \mathbb{R}^q$, the $\vec{\epsilon}_i$ are iid, and each observation $i=1,\dots,n$ has a known weight $w_i$. Then the MLE of $F$ is
$$ \hat{F} = \underset{F}{\text{argmin}} \lVert W^{1/2} (Y - XF^\top) \rVert^2 $$
where
$$ Y = (\vec{y}_1,\dots,\vec{y}_n)^\top, \quad X=(\vec{x}_1,\dots,\vec{x}_n)^\top $$
and $W$ is the diagonal matrix with the weights $w_1,\dots,w_n$ on the diagonal. Let $\tilde Y = W^{1/2}Y$ and $\tilde X = W^{1/2} X$, and let
$$ \tilde X = Q\begin{pmatrix} R \\ 0 \end{pmatrix} $$
be a QR decomposition of $\tilde X$, where $Q \in \mathbb{R}^{n\times n}$ is orthogonal and $R \in \mathbb{R}^{q\times q}$ is upper triangular.
Lastly, let
$$ Q^\top \tilde Y = \begin{pmatrix} C_1 \\ C_2 \end{pmatrix} $$
where $C_1$ are the first $q$ rows of $Q^\top \tilde Y$. I solve the above problem by computing $R$ and $C_1$ in parallel with a low memory footprint, as in the bam function in the R package mgcv (see Wood, 2015), since the matrices I work with are rather large. Then I use that
$$ \hat{F}^\top = (R^\top R)^{-1}R^\top C_1 = R^{-1}C_1 $$
and the MLE of $\Omega$ is
$$ \hat \Omega = (\sum w_i)^{-1}(\tilde Y^\top\tilde Y - C_1^\top C_1) $$
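For concreteness, the computation just described can be sketched as follows (Python/NumPy rather than R, purely to illustrate the algebra; the simulated data and all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 2, 3

# Simulated data for y_i = F x_i + eps_i, eps_i ~ N(0, Omega), with weights w_i.
X = rng.standard_normal((n, q))
F = rng.standard_normal((p, q))
A = rng.standard_normal((p, p))
Omega = A @ A.T + np.eye(p)                 # some SPD covariance
Y = X @ F.T + rng.multivariate_normal(np.zeros(p), Omega, size=n)
w = rng.uniform(0.5, 2.0, n)

# Whiten by the weights: Y~ = W^{1/2} Y, X~ = W^{1/2} X.
sw = np.sqrt(w)
Xt, Yt = X * sw[:, None], Y * sw[:, None]

# Thin QR of X~; C1 equals the first q rows of Q^T Y~ from the full QR.
Q, R = np.linalg.qr(Xt)                     # Q: n x q, R: q x q upper triangular
C1 = Q.T @ Yt

F_hat = np.linalg.solve(R, C1).T            # F-hat^T = R^{-1} C1
Omega_hat = (Yt.T @ Yt - C1.T @ C1) / w.sum()
```

In real use one would not form $\tilde X$ and $\tilde Y$ in memory at once; the point is only that $R$, $C_1$, $\tilde Y^\top\tilde Y$, and $\sum w_i$ are sufficient for both $\hat F$ and $\hat\Omega$.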
Now suppose that I want to restrict some of the parameters in $F$ such that
$$ \text{vec}(F) = K \vec{\theta} \Leftrightarrow \text{vec}(F^\top) = \underbrace{H_{p,q}K}_{D}\vec{\theta} $$
where $K \in \mathbb{R}^{pq \times k}$ is a known matrix with full column rank, $k < pq$, $H_{p,q}$ is the $(p,q)$ commutation matrix, and $\text{vec}(\cdot)$ is the vectorization operator. Given an estimate $\hat \Omega$ of $\Omega$, it is clear that we can use the GLS estimator. That is, we denote
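The commutation matrix is easy to build explicitly. A small NumPy sketch (the function name and the random stand-in for $K$ are mine):

```python
import numpy as np

def commutation_matrix(p, q):
    """Return H with H @ vec(F) == vec(F.T) for any p-by-q matrix F,
    using column-major (Fortran) vectorization."""
    H = np.zeros((p * q, p * q))
    for i in range(p):
        for j in range(q):
            # vec(F)[i + j*p] = F[i, j] must land at vec(F.T)[j + i*q].
            H[j + i * q, i + j * p] = 1.0
    return H

# Sanity check on a random F, and D = H K for a random stand-in K.
rng = np.random.default_rng(0)
p, q, k = 3, 4, 5
F = rng.standard_normal((p, q))
H = commutation_matrix(p, q)
K = rng.standard_normal((p * q, k))
D = H @ K
```

Since $H_{p,q}$ is a permutation matrix, $D$ inherits full column rank from $K$.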
$$ \mathrm{y} = \underbrace{\text{vec}(Y)}_{pn\times 1}, \quad \mathrm{X} = \underbrace{I_p \otimes X}_{pn\times pq}, \quad \mathrm{O} = \underbrace{\Omega \otimes I_n}_{np \times np} $$
where $\otimes$ is the Kronecker product and $I_l$ is the $l$ dimensional identity matrix. It follows that
$$ \mathrm{y} = \mathrm{X}D\vec{\theta} + \mathrm{e}, \qquad \mathrm{e}\sim N(\vec{0}, \mathrm{O}) $$
Let
$$ \tilde{\mathrm{y}} = \text{vec}(\tilde Y), \quad \tilde{\mathrm{X}} = I_p \otimes \tilde X, \quad \widehat{\mathrm{O}} = \hat\Omega \otimes I_n $$
Then we can find that the MLE is
$$ \widehat{\vec{\theta}} = (D^\top \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{X}} D)^{-1} D^\top \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{y}} $$
We can use that
$$ \begin{align*} \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{X}} &= (I_p \otimes \tilde X)^\top(\hat\Omega^{-1} \otimes I_n)(I_p \otimes \tilde X) \\ &= (I_p \otimes \tilde X^\top)(\hat\Omega^{-1}\otimes\tilde X) \\ &= \hat\Omega^{-1}\otimes (\tilde X^\top\tilde X) \\ \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{y}} &= (I_p \otimes \tilde X)^\top(\hat\Omega^{-1} \otimes I_n)\text{vec}(\tilde Y) \\ &= (I_p \otimes \tilde X^\top)\text{vec}(\tilde Y\hat\Omega^{-1}) \\ &= \text{vec}(\tilde X^\top\tilde Y\hat\Omega^{-1}) \end{align*} $$
Thus,
$$ \widehat{\vec{\theta}} = \big(D^\top (\hat \Omega^{-1} \otimes \tilde X^\top \tilde X) D\big)^{-1} D^\top\text{vec}(\tilde X^\top\tilde Y\hat\Omega^{-1}) $$
and, since $\tilde X^\top\tilde X = R^\top R$ and $\tilde X^\top\tilde Y = R^\top C_1$, presumably
$$ \widehat{\vec{\theta}} = \big(D^\top (\hat \Omega^{-1} \otimes R^\top R) D\big)^{-1} D^\top\text{vec}(R^\top C_1\hat\Omega^{-1}) $$
which is one of my questions: is this true? If it is, then I can reuse the code I have written for the non-restricted model, that is, compute $R$ and $C_1$ in parallel with a low memory footprint (never storing the entire matrices at any one point). Are there other, smarter (faster, more stable, parallelizable, and low-memory) ways to do this?
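The two Kronecker-product identities used above can at least be checked numerically. A NumPy sketch with random stand-ins for $\tilde X$, $\tilde Y$, and $\hat\Omega$ (all names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 30, 3, 4
Xt = rng.standard_normal((n, q))                 # stand-in for X~
Yt = rng.standard_normal((n, p))                 # stand-in for Y~
A = rng.standard_normal((p, p))
Omega = A @ A.T + np.eye(p)                      # SPD stand-in for Omega-hat
Oi = np.linalg.inv(Omega)

bigX = np.kron(np.eye(p), Xt)                    # I_p (x) X~
bigOi = np.kron(Oi, np.eye(n))                   # (Omega (x) I_n)^{-1}
y = Yt.reshape(-1, order="F")                    # vec(Y~), column-major

lhs1 = bigX.T @ bigOi @ bigX                     # X~^T O^{-1} X~
rhs1 = np.kron(Oi, Xt.T @ Xt)                    # Omega^{-1} (x) (X~^T X~)
lhs2 = bigX.T @ bigOi @ y                        # X~^T O^{-1} y~
rhs2 = (Xt.T @ Yt @ Oi).reshape(-1, order="F")   # vec(X~^T Y~ Omega^{-1})
```

The identities are exact, so `lhs1 == rhs1` and `lhs2 == rhs2` up to floating-point error.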
I have seen the GLS estimator stated in multiple places, but I have never come across an actual way to implement the estimation in the context of multivariate regression models.
To answer one of my own questions: the substitution is indeed valid, which a short simulated example confirms.
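A version of that check in Python/NumPy (rather than R; the restriction matrix is a random stand-in and all names are mine) simulates restricted data and compares the GLS estimate computed from the full matrices with the one computed from $R$ and $C_1$ alone, using the normal equations $(D^\top \mathrm{A} D)\vec\theta = D^\top b$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, k = 100, 2, 3, 4

# Hypothetical restriction: vec(F) = K theta for a random full-column-rank K.
# D's j-th column is vec(F_j^T) where vec(F_j) is K's j-th column, i.e. D = H_{p,q} K.
K = rng.standard_normal((p * q, k))
D = np.stack(
    [c.reshape(p, q, order="F").T.reshape(-1, order="F") for c in K.T], axis=1
)

theta = rng.standard_normal(k)
F = (K @ theta).reshape(p, q, order="F")         # vec(F) = K theta

X = rng.standard_normal((n, q))
A = rng.standard_normal((p, p))
Omega = A @ A.T + np.eye(p)
Y = X @ F.T + rng.multivariate_normal(np.zeros(p), Omega, size=n)
w = rng.uniform(0.5, 2.0, n)

sw = np.sqrt(w)
Xt, Yt = X * sw[:, None], Y * sw[:, None]
Oi = np.linalg.inv(Omega)                        # pretend Omega-hat = Omega here

# Restricted GLS from the full matrices:
A_full = np.kron(Oi, Xt.T @ Xt)
b_full = (Xt.T @ Yt @ Oi).reshape(-1, order="F")
th_full = np.linalg.solve(D.T @ A_full @ D, D.T @ b_full)

# The same estimate from the QR pieces R and C1 only:
Q, R = np.linalg.qr(Xt)                          # thin QR: R is q x q
C1 = Q.T @ Yt
A_qr = np.kron(Oi, R.T @ R)
b_qr = (R.T @ C1 @ Oi).reshape(-1, order="F")
th_qr = np.linalg.solve(D.T @ A_qr @ D, D.T @ b_qr)
```

The two estimates agree to machine precision, since $\tilde X^\top\tilde X = R^\top R$ and $\tilde X^\top\tilde Y = R^\top C_1$ hold exactly.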