I am trying to invert a matrix using the Woodbury (Sherman–Morrison) identity. The baseline inversion using a Cholesky decomposition has the following pseudo-code:
For $t=1,2,...$
$(1)\;\; \text{Read}\;x_t\in\mathbb{R}^n$
$(2)\;\;D_{t-1}=diag(|\theta_{t-1}^1|,...,|\theta_{t-1}^n|)$
$(3)\;\;A_t=A_{t-1}+x_tx_t'$
$(4)\;\;A_{t}^{-1}=\sqrt{D_{{t-1}}}\left(a\mathbf{I}+\sqrt{D_{{t-1}}}A_t\sqrt{D_{{t-1}}}\right)^{-1}\sqrt{D_{{t-1}}}$
$(5)\;\;\text{Read}\;y_t\in\mathbb{R}$
$(6)\;\;b=b+y_tx_t$
$(7)\;\;\theta_t=A_{t}^{-1}b$
End For
The well known application of Sherman-Morrison on Recursive Least Squares is as follows:
$$A_t^{-1}=(A_{t-1}+x_tx_t')^{-1}=A_{t-1}^{-1} - \frac{(A_{t-1}^{-1}x_t)(A_{t-1}^{-1}x_t)'}{1+x_t'A_{t-1}^{-1}x_t}$$
where $A_0^{-1}=\frac{1}{a}I$ (i.e. $A_0=aI$) and $A_t=A_{t-1}+x_tx_t'$, so that $A_T=aI+\sum_{t=1}^Tx_tx_t'$. This reduces each update to $O(n^2)$ instead of the $O(n^3)$ of a full inversion.
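As a sanity check on this rank-one update, here is a minimal sketch (the matrix size and the number of updates are arbitrary choices) comparing the Sherman–Morrison recursion against direct inversion with `solve`:

```r
set.seed(1)
a <- 0.1
n <- 5
A    <- diag(a, n)    # A_0 = a I, accumulated directly
Ainv <- diag(1/a, n)  # A_0^{-1} = (1/a) I, maintained by Sherman-Morrison
for (t in 1:10) {
  x  <- matrix(runif(n), ncol = 1)
  A  <- A + x %*% t(x)          # direct rank-one accumulation
  Ax <- Ainv %*% x              # O(n^2) update of the inverse
  Ainv <- Ainv - (Ax %*% t(Ax)) / as.numeric(1 + t(x) %*% Ainv %*% x)
}
max(abs(Ainv - solve(A)))  # agreement up to floating-point error
```

The right-hand side of the `Ainv` assignment is evaluated before the assignment, so both `Ax` and the denominator use $A_{t-1}^{-1}$, as the identity requires.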
The above technique is mentioned here. The two implementations in $\texttt{R}$ are as follows:
X <- matrix(runif(1000), 20, 10)
Y <- rnorm(20)
a <- 0.1

Cholsky <- function(X, Y, a) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  T <- nrow(X)
  N <- ncol(X)
  aI <- diag(a, N)
  bt <- matrix(0, ncol = 1, nrow = N)
  for (t in 1:T) {
    xt <- X[t, ]
    At <- aI + (xt %*% t(xt))
    InvA <- chol2inv(chol(At))
    bt <- bt + (Y[t] * xt)
    theta <- InvA %*% bt
  }
  return(theta)
}
Cholsky(X, Y, a)

Morrison <- function(X, Y, a) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  T <- nrow(X)
  N <- ncol(X)
  At <- diag(1/a, N)
  bt <- matrix(0, ncol = 1, nrow = N)
  for (t in 1:T) {
    xt <- X[t, ]
    At <- At + (xt %*% t(xt))
    InvA <- At - ((t(xt %*% At) %*% (as.matrix(xt %*% At))) /
                    as.numeric(xt %*% At %*% xt + 1))
    bt <- bt + (Y[t] * xt)
    theta <- InvA %*% bt
  }
  return(theta)
}
Morrison(X, Y, a)

They don't give the same result, so perhaps I should not expect the implementations to be equivalent.
I was wondering if I could invert the following (for the above case) more efficiently:
$$A_t^{-1}=D_{t-1}^{\frac{1}{2}}\left(a\mathbf{I}+D_{t-1}^{\frac{1}{2}}A_tD_{t-1}^{\frac{1}{2}}\right)^{-1}D_{t-1}^{\frac{1}{2}}$$
where $A_t=A_{t-1}+x_tx_t'$ and $A_0=\mathbf{0}$.
Essentially, I want to invert:
$$M_t=a\mathbf{I}+\sqrt{D_{{t-1}}}A_t\sqrt{D_{{t-1}}}$$
I claim:
$$M_{t}^{-1}=M_{t-1}^{-1}-\frac{(M_{t-1}^{-1}D_{t-1}^{\frac{1}{2}}x_t)(M_{t-1}^{-1}D_{t-1}^{\frac{1}{2}}x_t)'}{1+x_t'D_{t-1}^{\frac{1}{2}}M_{t-1}^{-1}D_{t-1}^{\frac{1}{2}}x_t}$$
where $D_0=diag(\mathbf{1})$, so $M^{-1}_0=\frac{1}{a}I$.
In the RLS identity given above, the rank-one correction to $(A_{t-1}+u_tv_t')^{-1}$ is built from $u_t=A_{t-1}^{-1}x_t$ (with $v_t=u_t$); here I instead use $u_t=M_{t-1}^{-1}D_{t-1}^{\frac{1}{2}}x_t$ (again with $v_t=u_t$).
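Note that if $D$ were held fixed across all iterations, this update would indeed track the direct inverse, since then $M_t=M_{t-1}+(D^{\frac{1}{2}}x_t)(D^{\frac{1}{2}}x_t)'$ is an exact rank-one update. A quick sketch of that special case (with an arbitrary fixed diagonal $D$, unlike the pseudo-code where $D_{t-1}$ depends on $\theta$):

```r
set.seed(2)
a <- 0.1
n <- 5
Ds   <- diag(sqrt(runif(n)))  # a FIXED D^{1/2}; in the pseudo-code D changes every step
M    <- diag(a, n)            # M_0 = a I
Minv <- diag(1/a, n)          # M_0^{-1} = (1/a) I
for (t in 1:10) {
  x  <- matrix(runif(n), ncol = 1)
  z  <- Ds %*% x              # z_t = D^{1/2} x_t
  M  <- M + z %*% t(z)        # M_t = M_{t-1} + z_t z_t'
  Mz <- Minv %*% z
  Minv <- Minv - (Mz %*% t(Mz)) / as.numeric(1 + t(z) %*% Minv %*% z)
}
max(abs(Minv - solve(M)))  # small, but only because D never changed
```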
One may write the implementations in $\texttt{R}$ as follows:
X <-matrix(runif(1000),20,10)
Y<-rnorm(20)
a<- 0.1
#Cholesky implementation
X <- as.matrix(X)
Y <- as.matrix(Y)
T <- nrow(X)
N <- ncol(X)
bt <- matrix(0, ncol = 1, nrow = N)
At <- diag(0, N)                  # A_0 = 0
I  <- diag(a, N)                  # a I
Mt <- diag(1/a, N)                # M_0^{-1} = (1/a) I
theta0 <- rep(1, N)               # so that D_0 = I
for (t in 1:2) {                  # first two updates suffice for the comparison
  xt <- X[t, ]
  Dt <- diag(sqrt(abs(as.numeric(theta0))))   # D_{t-1}^{1/2}
  At <- At + (xt %*% t(xt))                   # A_t = A_{t-1} + x_t x_t'
  Mt <- I + (Dt %*% At %*% Dt)                # M_t = a I + D^{1/2} A_t D^{1/2}
  InvA <- chol2inv(chol(Mt))                  # M_t^{-1} via Cholesky
  AAt <- Dt %*% InvA %*% Dt                   # A_t^{-1}, as in step (4)
  bt <- bt + (Y[t] * xt)
  theta0 <- AAt %*% bt
  print(theta0)
}
Above is the correct implementation of the pseudo-code. If I swap in the following lines in place of the `Mt` and `InvA` updates, I don't get the same answer:
Mt <- Mt + (Dt %*% At %*% Dt)
InvA <- Mt - ((t(xt %*% Dt %*% Mt) %*% (as.matrix(xt %*% Dt %*% Mt))) /
                as.numeric(xt %*% Dt %*% Mt %*% t(as.matrix(xt %*% Dt)) + 1))
Why is that?