Efficient and stable estimation of restricted weighted multivariate regression model


First, I will show what I currently do in a (non-restricted) weighted multivariate regression model. Then I will pose my question, as it requires the notation I introduce. Suppose we have a weighted multivariate least squares problem of the form

$$ \vec{y}_i = F\vec{x}_i + \vec{\epsilon}_i, \qquad \vec{\epsilon}_i\sim N(\vec{0}, \Omega) $$

where $\vec{y}_i,\vec{\epsilon}_i \in \mathbb{R}^p$, $\vec{x}_i \in \mathbb{R}^q$, the $\vec{\epsilon}_i$ are iid, and each observation $i=1,\dots,n$ has a weight $w_i$. Then the MLE of $F$ is

$$ \hat{F} = \underset{F}{\text{argmin}} \lVert W^{1/2} (Y - XF^\top) \rVert^2 $$

where

$$ Y = (\vec{y}_1,\dots,\vec{y}_n)^\top, \quad X=(\vec{x}_1,\dots,\vec{x}_n)^\top $$

and $W$ is a diagonal matrix with the weights $w_1,\dots,w_n$ as the diagonal entries. Let $\tilde Y = W^{1/2}Y$, $\tilde X = W^{1/2} X$, and let the QR decomposition of $\tilde X$ be

$$ \tilde X = Q\begin{pmatrix} R \\ 0 \end{pmatrix} $$

Lastly, let

$$ Q^\top \tilde Y = \begin{pmatrix} C_1 \\ C_2 \end{pmatrix} $$

where $C_1$ denotes the first $q$ rows of $Q^\top \tilde Y$. I have solved the above problem by computing $R$ and $C_1$ in parallel with a low memory footprint, as in the bam function in the R package mgcv (see Wood, 2015), since the matrices I work with are rather large. Then I use that

$$ \hat{F}^\top = (R^\top R)^{-1}R^\top C_1 = R^{-1}C_1 $$

and the MLE of $\Omega$ is

$$ \hat \Omega = (\sum w_i)^{-1}(\tilde Y^\top\tilde Y - C_1^\top C_1) $$
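As a concrete sanity check of the two displays above, here is a toy base-R sketch: `chunked_RC` accumulates $R$ and $C_1$ over row blocks (loosely in the spirit of mgcv::bam's blockwise QR; the helper and all sizes below are my own illustration, not the actual bam code), and then $\hat F$ and $\hat\Omega$ are formed exactly as above.

```r
# Toy illustration (all names and sizes are my own, not from the post).
# chunked_RC accumulates R and C_1 over row blocks so that only one
# block plus a small q x q / q x p state is held in memory at a time.
chunked_RC <- function(chunks) {
  Rc <- NULL; Cc <- NULL
  q <- ncol(chunks[[1]]$X)
  for (ch in chunks) {
    Xs <- sqrt(ch$w) * ch$X          # W^{1/2} X rows of this block
    Ys <- sqrt(ch$w) * ch$Y          # W^{1/2} Y rows of this block
    qo <- qr(rbind(Rc, Xs))          # re-triangularize [R; X-block]
    Rc <- qr.R(qo)
    Cc <- qr.qty(qo, rbind(Cc, Ys))[1:q, , drop = FALSE]
  }
  list(R = Rc, C1 = Cc)
}

set.seed(1)
n <- 40; p <- 2; q <- 3
X <- matrix(rnorm(n * q), n, q)
Y <- matrix(rnorm(n * p), n, p)
w <- runif(n, 0.5, 2)
chunks <- lapply(split(1:n, rep(1:4, each = 10)),
                 function(i) list(X = X[i, ], Y = Y[i, ], w = w[i]))
fit <- chunked_RC(chunks)

F_hat     <- t(solve(fit$R, fit$C1))                    # F-hat' = R^{-1} C_1
Xt <- sqrt(w) * X; Yt <- sqrt(w) * Y
Omega_hat <- (crossprod(Yt) - crossprod(fit$C1)) / sum(w)

# agrees with the one-shot weighted least squares solution
F_ref <- t(solve(crossprod(Xt), crossprod(Xt, Yt)))
stopifnot(all.equal(F_hat, F_ref))
```

Note that the chunked $R$ agrees with the full-data $R$ only up to row signs, but $\hat F^\top = R^{-1}C_1$ and $C_1^\top C_1$ (hence $\hat\Omega$) are invariant to those signs.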


Now suppose that I want to restrict some of the parameters in $F$ such that

$$ \text{vec}(F) = K \vec{\theta} \Leftrightarrow \text{vec}(F^\top) = \underbrace{H_{p,q}K}_{D}\vec{\theta} $$

where $K \in \mathbb{R}^{pq \times k}$ is a known matrix with full column rank, $k < pq$, $H_{p,q}$ is the $(p,q)$ commutation matrix, and $\text{vec}(\cdot)$ is the vectorization operator. Suppose that we have an estimate of $\Omega$, denoted $\hat \Omega$; then it is clear that we can use the GLS estimator. That is, we denote

$$ \mathrm{y} = \underbrace{\text{vec}(Y)}_{pn\times 1}, \quad \mathrm{X} = \underbrace{I_p \otimes X}_{pn\times pq}, \quad \mathrm{O} = \underbrace{\Omega \otimes I_n}_{np \times np} $$

where $\otimes$ is the Kronecker product and $I_l$ is the $l \times l$ identity matrix. It follows that

$$ \mathrm{y} = \mathrm{X}D\vec{\theta} + \mathrm{e}, \qquad \mathrm{e}\sim N(\vec{0}, \mathrm{O}) $$
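The definition of $D$ above can be checked numerically. A base-R sketch, in which the helper `commutation_matrix` and the particular $K$ are my own illustrative choices:

```r
# Illustrative check that vec(t(F)) = D theta with D = H_{p,q} K.
# commutation_matrix() is my own helper, not from the post or a package.
commutation_matrix <- function(p, q) {
  # H satisfies H %*% vec(F) == vec(t(F)) for any p x q matrix F
  H <- matrix(0, p * q, p * q)
  for (i in 1:p) for (j in 1:q)
    H[(i - 1) * q + j, (j - 1) * p + i] <- 1
  H
}

p <- 2; q <- 3; k <- 2
K <- matrix(0, p * q, k)      # example restriction with full column rank
K[c(1, 3, 5), 1] <- 1         # row 1 of F shares theta_1 across columns
K[c(2, 6), 2]   <- 1          # theta_2 in columns 1 and 3 of row 2
theta <- c(2, -0.5)
F_mat <- matrix(K %*% theta, p, q)       # vec(F) = K theta
D     <- commutation_matrix(p, q) %*% K  # vec(t(F)) = D theta
stopifnot(all.equal(as.vector(t(F_mat)), as.vector(D %*% theta)))
```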

Let

$$ \tilde{\mathrm{y}} = \text{vec}(\tilde Y), \quad \tilde{\mathrm{X}} = I_p \otimes \tilde X, \quad \widehat{\mathrm{O}} = \hat\Omega \otimes I_n $$

Then we can find that the MLE is

$$ \widehat{\vec{\theta}} = D^+ (\tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{X}})^{-1} D^{\top+} D^\top \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{y}} $$

We can use that

$$ \begin{align*} \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{X}} &= (I_p \otimes \tilde X)^\top(\hat\Omega^{-1} \otimes I_n)(I_p \otimes \tilde X) \\ &= (I_p \otimes \tilde X^\top)(\hat\Omega^{-1}\otimes\tilde X) \\ &= \hat\Omega^{-1}\otimes (\tilde X^\top\tilde X) \\ \tilde{\mathrm{X}}^\top\widehat{\mathrm{O}}^{-1}\tilde{\mathrm{y}} &= (I_p \otimes \tilde X)^\top(\hat\Omega^{-1} \otimes I_n)\text{vec}(\tilde Y) \\ &= (I_p \otimes \tilde X^\top)\text{vec}(\tilde Y\hat\Omega^{-1}) \\ &= \text{vec}(\tilde X^\top\tilde Y\hat\Omega^{-1}) \\ \end{align*} $$
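As a sanity check, these two Kronecker identities can be verified numerically; a small base-R sketch with illustrative dimensions of my own choosing:

```r
# Numerical check of the two Kronecker identities above (base R only;
# all matrices here are small illustrative stand-ins).
set.seed(2)
n <- 6; p <- 2; q <- 3
Xt <- matrix(rnorm(n * q), n, q)          # plays the role of X-tilde
Yt <- matrix(rnorm(n * p), n, p)          # plays the role of Y-tilde
Oi <- solve(crossprod(matrix(rnorm(p * p), p)) + diag(p))  # SPD Omega-hat^{-1}

Xb <- diag(p) %x% Xt                      # I_p kron X-tilde
Ob <- Oi %x% diag(n)                      # Omega^{-1} kron I_n

lhs1 <- t(Xb) %*% Ob %*% Xb
rhs1 <- Oi %x% crossprod(Xt)              # Omega^{-1} kron (X'X)
lhs2 <- t(Xb) %*% Ob %*% as.vector(Yt)
rhs2 <- as.vector(crossprod(Xt, Yt %*% Oi))  # vec(X' Y Omega^{-1})
stopifnot(all.equal(lhs1, rhs1), all.equal(as.vector(lhs2), rhs2))
```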

Thus,

$$ \widehat{\vec{\theta}} = D^+ (\hat \Omega \otimes (\tilde X^\top \tilde X)^{-1}) D^{\top+}D^\top\text{vec}(\tilde X^\top\tilde Y\hat\Omega^{-1}) $$

and maybe

$$ \widehat{\vec{\theta}} = D^+ (\hat \Omega \otimes (R^\top R)^{-1}) D^{\top+}D^\top\text{vec}(R^\top C_1\hat\Omega^{-1}) $$

which is one of my questions: is this true? If it is, then I can re-use the code I have written for the non-restricted model; that is, do the computation in parallel with a low memory footprint (without having to store the entire matrices at any one point). Are there other smarter (faster, stable, parallelizable, and low-memory) ways to do this?

I have seen the GLS estimator posted in multiple places but have never come across an actual way to implement the estimation in the context of multivariate regression models.


To answer the quoted question

> and maybe ... which is one of my questions: is this true?

yes, it is true, as this example in R shows:

#####
# simulate data
n <- 1000
p <- 2
q <- 4
k <- 2

library(Matrix)
K <- Matrix(0., p * q, k)
K[1, 1] <- K[3, 1] <- K[5, 1] <- K[7, 1] <-   1
K[2, 2] <- K[6, 2] <- 1
theta <- c(2, -.5)
(F. <- Matrix(as.vector(K %*% theta), p, q))
#R 2 x 4 Matrix of class "dgeMatrix"
#R      [,1] [,2] [,3] [,4]
#R [1,]  2.0    2  2.0    2
#R [2,] -0.5    0 -0.5    0

Omega <- Matrix(c(2, -1, -1, 1), 2)

set.seed(63098548)
X <- Matrix(rnorm(q * n), ncol = q)
Y <- X %*% t(F.) + Matrix(rnorm(n * p), n) %*% chol(Omega)
ws <- sample(1:5, n, replace = TRUE) # integer weights in 1,...,5
W_sqrt <- Diagonal(n, sqrt(ws))

#####
# Fit model as in the dense case
qr_o <- qr(W_sqrt %*% X)
R <- qr.R(qr_o)
C_1 <- qr.qty(qr_o, as.matrix(W_sqrt %*% Y))[1:q, ]

t(solve(R, C_1))
#R        [,1]   [,2]   [,3]    [,4]
#R [1,]  1.912 1.9934  2.019  2.0322
#R [2,] -0.456 0.0471 -0.516 -0.0273

# and we get the same w/ the GLS estimator
solve(
  kronecker(solve(Omega), crossprod(R)),
  as.vector(crossprod(R, t(solve(Omega, t(C_1))))))
#R 8 x 1 Matrix of class "dgeMatrix"
#R         [,1]
#R [1,]  1.9125
#R [2,]  1.9934
#R [3,]  2.0189
#R [4,]  2.0322
#R [5,] -0.4561
#R [6,]  0.0471
#R [7,] -0.5155
#R [8,] -0.0273

#####
# fit model in restricted case. Somehow we know Omega
D <- matrixcalc::commutation.matrix(p, q) %*% K
t1 <- crossprod(D, as.vector(crossprod(R, t(solve(Omega, t(C_1))))))
t2 <- crossprod(D, kronecker(solve(Omega), crossprod(R)) %*% D)
solve(t2, t1)
#R 2 x 1 Matrix of class "dgeMatrix"
#R        [,1]
#R [1,]  2.003
#R [2,] -0.505
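As a further self-contained cross-check (my own addition, in base R with generic random inputs rather than the simulation above): the reduced restricted GLS built from $(R, C_1)$, in the same $(D^\top(\hat\Omega^{-1}\otimes R^\top R)D)^{-1}D^\top\text{vec}(R^\top C_1\hat\Omega^{-1})$ form used in the code above, matches a brute-force GLS fit on the stacked $np$-dimensional system.

```r
# Cross-check: reduced restricted GLS from (R, C_1) equals brute-force
# GLS on the vectorized system (all inputs here are generic stand-ins).
set.seed(4)
n <- 30; p <- 2; q <- 3; k <- 2
X <- matrix(rnorm(n * q), n, q)
Y <- matrix(rnorm(n * p), n, p)
w <- runif(n, 0.5, 2)
Omega <- crossprod(matrix(rnorm(p * p), p)) + diag(p)  # SPD stand-in

H <- matrix(0, p * q, p * q)            # (p, q) commutation matrix
for (i in 1:p) for (j in 1:q) H[(i - 1) * q + j, (j - 1) * p + i] <- 1
K <- matrix(rnorm(p * q * k), p * q, k) # generic full-rank restriction
D <- H %*% K

Xt <- sqrt(w) * X; Yt <- sqrt(w) * Y
qo <- qr(Xt); R <- qr.R(qo); C1 <- qr.qty(qo, Yt)[1:q, , drop = FALSE]

# reduced form: (D' (Omega^{-1} kron R'R) D) theta = D' vec(R' C1 Omega^{-1})
A <- solve(Omega) %x% crossprod(R)
b <- as.vector(crossprod(R, C1 %*% solve(Omega)))
th_reduced <- solve(crossprod(D, A %*% D), crossprod(D, b))

# brute force on the stacked np-dimensional system
Xb <- (diag(p) %x% Xt) %*% D
Ob <- solve(Omega) %x% diag(n)
th_brute <- solve(t(Xb) %*% Ob %*% Xb, t(Xb) %*% Ob %*% as.vector(Yt))
stopifnot(all.equal(th_reduced, th_brute))
```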