Woodbury Matrix Inversion

665 Views Asked by At

I am trying to invert a matrix using Woodbury identity. The inversion using Cholesky decomposition has the following pseudo-code:

For $t=1,2,...$

$(1)\;\; \text{Read}\;x_t\in\mathbb{R}^n$

$(2)\;\;D_{t-1}=diag(|\theta_t^1|,...,|\theta_t^n|)$

$(3)\;\;A_t=A_t+x_tx_t'$

$(4)\;\;A_{t}^{-1}=\sqrt{D_{{t-1}}}\left(a\mathbf{I}+\sqrt{D_{{t-1}}}A_t\sqrt{D_{{t-1}}}\right)^{-1}\sqrt{D_{{t-1}}}$

$(5)\;\;\text{Read}\;y_t\in\mathbb{R}$

$(6)\;\;b=b+y_tx_t$

$(7)\;\;\theta_t=A_{t}^{-1}b$

End For

The well known application of Sherman-Morrison on Recursive Least Squares is as follows:

$$A_t^{-1}=(aI+x_tx_t')^{-1}=A_{t-1}^{-1} - \frac{(A_{t-1}^{-1}x_t)(A_{t-1}^{-1}x_t)'}{1+x_t'A_{t-1}^{-1}x_t}$$

where $A_0^{-1}=\frac{1}{a}I$ and we can set $A_t = A_{t-1}+\sum_{t=1}^Tx_tx_t'$, which will lead to time complexity of $O(n^2)$.

The above technique is mentioned here. The two implementation in $\texttt{R}$ are as follows:

X <-matrix(runif(1000),20,10)
Y<-rnorm(20)
a<- 0.1

Cholsky<-function(X,Y,a){
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  T <- nrow(X)
  N <- ncol(X)
  aI<- diag(a,N)
  bt<- matrix(0,ncol=1,nrow=N)
  for (t in 1:T){
    xt<-X[t,]
    At <- aI + (xt %*% t(xt))
    InvA<-chol2inv(chol(At))
    bt <- bt + (Y[t] * xt)
    theta<- InvA %*% bt
  }
  return(theta)
}
Cholsky(X,Y,a)


Morrison<-function(X,Y,a){
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  T <- nrow(X)
  N <- ncol(X)
  At<-diag(1/a,N)
  bt<- matrix(0,ncol=1,nrow=N)
  for (t in 1:T){
    xt<-X[t,]
    At <- At + (xt %*% t(xt))
    InvA <- At - ((t(xt%*%At)%*%(as.matrix(xt%*%At)))
                    /as.numeric(xt%*%At%*%xt+1))
    bt <- bt + (Y[t] * xt)
    theta<- InvA %*% bt
  }
  return(theta)
}
Morrison(X,Y,a)

They don't give the same result. So, perhaps I should not expect the implementations to be equivalent.

I was wondering if I could invert the following (for the above case) more efficiently:

$$A_t^{-1}=\left(D_{t-1}^{\frac{1}{2}}A_tD_{t-1}^{\frac{1}{2}}+aI\right)^{-1}$$

where $A_t=A_{t-1}+x_tx_t'$ and $A_0=\mathbf{0}$.

Essentially, I want to invert:

$$M_t=\left(a\mathbf{I}+\sqrt{D_{{t-1}}}x_tx_t'\sqrt{D_{{t-1}}}\right)$$

I say:

$$M_{t}^{-1}=M_{t-1}^{-1}-\frac{(M_{t-1}D_{t-1}^{\frac{1}{2}}x_t)(M_{t-1}D_{t-1}^{\frac{1}{2}}x_t)'}{1+x_t'D_{t-1}^{\frac{1}{2}}M_{t-1}D_{t-1}^{\frac{1}{2}}x_t}$$

where $D_0=diag(\mathbf{1}$). So, $M^{-1}_0=\frac{1}{a}I$

RLS identity given above $(aI+u_tv_t)^{-1}$ uses $u=A_{t-1}^{-1}x_t,v_t=u_t'$, I am using $u=M_{t-1}^{-1}D_{t-1}^{\frac{1}{2}}x_t,v_t=u_t'$

One may write the implementations in $\texttt{R}$ as follows:

X <-matrix(runif(1000),20,10)
Y<-rnorm(20)
a<- 0.1

#Cholesky implementation
X <- as.matrix(X)
Y <- as.matrix(Y)
T <- nrow(X)
N <- ncol(X)
bt<- matrix(0,ncol=1,nrow=N)
At<- diag(0,N)
I<- diag(a,N);Mt<-diag(1/a,N)
theta0<- rep(1,N)
for (t in 1:2){
  xt<-X[t,]
  Dt <- diag(sqrt(abs(as.numeric(theta0))))
  At <- At + (xt %*% t(xt))
  Mt <-  I + (Dt%*%At%*%Dt)
  InvA <- chol2inv(chol( Mt )) 
  AAt<- Dt %*%InvA%*% Dt
  bt <- bt + (Y[t] * xt)
  theta0 <- AAt %*% bt
  print(theta0)
}

Above is the correct implementation of the pseudo code. If I swap the following lines. I don't get the same answer.

Mt <-  Mt + (Dt%*%At%*%Dt)
InvA<- Mt - ((t(xt%*%Dt%*%Mt)%*%(as.matrix(xt%*%Dt%*%Mt)))
                   /as.numeric(xt%*%Dt%*%Mt%*%t(as.matrix(xt%*%Dt))+1))

Why is that?