Deriving an identity using the Woodbury matrix identity

1.5k Views Asked by At

I am working through an algorithm derivation in Kernel Adaptive Filtering: A Comprehensive Introduction by Liu, Principe and Haykin. The part I'm having trouble with is on page 104 if you have the book.

They give the Woodbury matrix identity (referred to in the book as the matrix inversion lemma) as: $$(\mathbf{A}+\mathbf{BCD})^{-1}=\mathbf{A}^{-1}-\mathbf{A}^{-1}\mathbf{B}(\mathbf{C}^{-1}+\mathbf{DA}^{-1}\mathbf{B})^{-1}\mathbf{DA}^{-1}$$ They then make these substitutions: $$\lambda\beta^i\mathbf{I}\to\mathbf{A}$$ $$\mathbf{\Phi}\to\mathbf{B}$$ $$\mathbf{B}\to\mathbf{C}$$ $$\mathbf{\Phi}^T\to\mathbf{D}$$ And claim that they get: $$[\lambda\beta^i\mathbf{I}+\mathbf{\Phi}\mathbf{B}\mathbf{\Phi}^T]^{-1}\mathbf{\Phi}\mathbf{B}=\mathbf{\Phi}[\lambda\beta^i\mathbf{B}^{-1}+\mathbf{\Phi}^T\mathbf{\Phi}]^{-1}$$ But I simply can't figure out what manipulations they are doing to reach that conclusion. I'd show my work, but it hasn't really gone anywhere - I feel like I'm missing an obvious bit of algebra. The biggest problem that stands out to me is the $\mathbf{A}^{-1}$ term on the right that becomes $\lambda^{-1}\beta^{-i}\mathbf{I}$. I have no idea how they got rid of it.

1

There are 1 best solutions below

2
On BEST ANSWER

Assume for a while that $A=I$ (no scaling, just for clarity). You want to show that $$\tag{1} (I+BCD)^{-1}BC=B(C^{-1}+DB)^{-1}. $$ Using the Woodbury formula, we have $$\tag{2} (I+BCD)^{-1}=I-B(C^{-1}+DB)^{-1}D. $$ Therefore, $$ \begin{align} (I+BCD)^{-1}BC &=[I-B(C^{-1}+DB)^{-1}D]BC && \text{substitute from (2)}\\ &=BC-B(C^{-1}+DB)^{-1}DBC && \text{expand}\\ &=B(C^{-1}\!+\!DB)^{-1}[(C^{-1}\!+\!DB)\!-\!DB]C && \text{get out $B(C^{-1}+DB)^{-1}$ and $C$} \\ &=B(C^{-1}+DB)^{-1}. && \text{simplify the parentheses}\\ \end{align} $$ This proves (1).

You can also prove it without the Woodbury formula. By multiplying (1) with $C^{-1}+DB$ from right, (1) is equivalent to $$ (I+BCD)^{-1}BC(C^{-1}+DB)=B. $$ Indeed, by a small manipulation on the left-hand side, $$ \begin{split} (I+BCD)^{-1}BC(C^{-1}+DB) &= (I+BCD)^{-1}(B+BCDB)\\ &= (I+BCD)^{-1}(I+BCD)B\\ &= B \end{split} $$

By introducing some scaling with a nonzero scalar $\alpha$, (1) becomes $$ [\alpha I+B(\alpha C)D]^{-1}B(\alpha C)=B[\alpha(\alpha C)^{-1}+DB]^{-1}. $$ Replacing each occurence of $\alpha C$ with just $C$, we get $$ [\alpha I+BCD]^{-1}BC=B[\alpha C^{-1}+DB]^{-1}. $$ Now you can do your substitutions: $\alpha \equiv\lambda\beta^i$, $\Phi\equiv B$, $B\equiv C$, and $\Phi^T\equiv D$.