How to show that $(\frac{1}{x^Tx} x^TA(\sigma^2A + xx^T)^{-1}x)^{-1} = \sigma^2 + x^TA^{-1}x$?

278 Views Asked by At

I want to show that $$\left(\frac{1}{x^Tx} x^TA(\sigma^2A + xx^T)^{-1}x\right)^{-1} = \sigma^2 + x^TA^{-1}x$$ where $A$ is symmetric and positive definite.

The following python program indicates that it is true because the difference between the left- and the right-hand side is always relatively small ($< 10^{-4}$). However, I don't know how to proceed.

To speak informally, I think I need to somehow move the inverse from $(\sigma^2A + xx^T)^{-1}$ to the $A$ because then everything falls into place but I don't see how.

import numpy as np
from numpy.linalg import *

# Sample matrices and data
d = 10
sigma2 = np.random.rand() + 0.0001
A = np.random.randn(d, d)
x = (np.random.rand(d) - 0.5) * 1000

# Ensure invertibility
assert det(A) != 0.0

left = 1/(1 / (x @ x) * x.T @ A @ inv(sigma2 * A + np.outer(x, x)) @ x)
right = sigma2 + x.T @ inv(A) @ x

abs(left - right)
# usually on the order of 10^-5 because of instability in the matrix
# inversion I guess
3

There are 3 best solutions below

1
On BEST ANSWER

By the Sherman Morrison formula, we have $$ \begin{align} (\sigma^2 A + xx^T)^{-1} &= \sigma^{-2} A^{-1} - \sigma^{-4}\frac {A^{-1}xx^T A^{-1}}{1 + \sigma^{-2} x^\textsf{T}A^{-1}x} \\ & = \sigma^{-2} A^{-1} - \sigma^{-2}\frac {A^{-1}xx^T A^{-1}}{\sigma^2 + x^\textsf{T}A^{-1}x} \\ & = \sigma^{-2} \left[A^{-1} - \frac {A^{-1}xx^T A^{-1}}{\sigma^2 + x^\textsf{T}A^{-1}x}\right]. \end{align} $$ It follows that $$ \begin{align} x^TA(\sigma^2 A + xx^T)^{-1}x &= \sigma^{-2} x^TA\left[A^{-1} - \frac {A^{-1}xx^T A^{-1}}{\sigma^2 + x^\textsf{T}A^{-1}x}\right]x \\ & = \sigma^{-2} x^TAA^{-1}x - \sigma^{-2}x^TA\frac {A^{-1}xx^T A^{-1}}{\sigma^2 + x^\textsf{T}A^{-1}x}x \\ & = \sigma^{-2} x^Tx - \sigma^{-2}\frac {x^Tx}{\sigma^2 + x^\textsf{T}A^{-1}x}x^T A^{-1}x \\ & = \sigma^{-2} x^Tx - \sigma^{-2}\frac {x^Tx}{\sigma^2 + x^\textsf{T}A^{-1}x}[\sigma^2 + x^T A^{-1}x] + \frac {x^Tx}{\sigma^2 + x^\textsf{T}A^{-1}x} \\ & = \sigma^{-2} x^Tx - \sigma^{-2}{x^Tx} + \frac {x^Tx}{\sigma^2 + x^\textsf{T}A^{-1}x} \\ & = \frac {x^Tx}{\sigma^2 + x^\textsf{T}A^{-1}x}. \end{align} $$ Your observation follows.

1
On

An intuitive way to understand the identity is to expand $A(\sigma^2A+xx^T)^{-1}=\frac{1}{\sigma^2}\left(I+\frac{xx^TA^{-1}}{\sigma^2}\right)^{-1}$ as a power series in $\frac{xx^TA^{-1}}{\sigma^2}$. Unfortunately, the series does not always converge. But the idea can be salvaged. Let $B=\frac{xx^TA^{-1}}{\sigma^2}$. Then $B$ has rank one. Hence its trace $$ \lambda=\frac{x^TA^{-1}x}{\sigma^2} $$ is an eigenvalue. By Cayley-Hamilton theorem, there is a polynomial $p$ such that $$ p(B)=(I+B)^{-1};\ \text{ hence }\ p(\lambda)=(1+\lambda)^{-1}. $$ Now, for every integer $k\ge0$, we have $$ x^TB^kx =x^T\left(\frac{xx^TA^{-1}}{\sigma^2}\right)^kx =x^Tx\left(\frac{x^TA^{-1}x}{\sigma^2}\right)^k =\lambda^kx^Tx. $$ Therefore $x^Tp(B)x=p(\lambda)\,x^Tx$, i.e. $$ x^T(I+B)^{-1}x=(1+\lambda)^{-1}x^Tx. $$ It follows that \begin{aligned} \left(\frac{1}{x^Tx}x^TA(\sigma^2A+xx^T)^{-1}x\right)^{-1} &=\left(\frac{1}{\sigma^2x^Tx}x^T(I+B)^{-1}x\right)^{-1} =\sigma^2(1+\lambda) =\sigma^2+x^TA^{-1}x. \end{aligned}


Update. As Omnomnomnom suspected, the identity arises naturally from manipulations of block matrices, but we need to rewrite the identity first. Let $A_0=\sigma^2A$. Then \begin{align} &\left(\frac{1}{x^Tx}x^TA(\sigma^2A+xx^T)^{-1}x\right)^{-1}=\sigma^2+x^TA^{-1}x\\ &\Leftrightarrow\left(\frac{1}{x^Tx}x^TA_0(A_0+xx^T)^{-1}x\right)^{-1}=1+x^TA_0^{-1}x\\ &\Leftrightarrow\left(\frac{1}{x^Tx}x^T\left((A_0+xx^T)-xx^T\right)(A_0+xx^T)^{-1}x\right)^{-1}=1+x^TA_0^{-1}x\\ &\Leftrightarrow\left(1-x^T(A_0+xx^T)^{-1}x\right)^{-1}=1+x^TA_0^{-1}x\\ &\Leftrightarrow1-x^T(A_0+xx^T)^{-1}x=\left(1+x^TA_0^{-1}x\right)^{-1}.\tag{1} \end{align} How last identity arises is now evident: it originates from Schur complement identity, which arises from writing the inverse of a block matrix in two equivalent ways: $$ \pmatrix{A&B\\ C&D}^{-1} =\pmatrix{(A-BD^{-1}C)^{-1}&\ast\\ \ast&\ast} =\pmatrix{A^{-1}+A^{-1}B(D-CA^{-1}B)^{-1}CA^{-1}&\ast\\ \ast&\ast} $$ The identity $(1)$ is just the case when $$ \pmatrix{A&B\\ C&D}=\pmatrix{1&-x^T\\ x&A_0}. $$

2
On

From user1551's second proof using the Schur complement identity I learned about the Woodbury matrix identity (or Sherman–Morrison–Woodbury formula) $$(A + UCV)^{-1} = A^{-1} - A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1}.$$ This shows the identity in an even more direct way when $A := \sigma^2A$, $U := x$, $C := I$ and $V := x^T$ (though it is essentially the same computation as given by Omnomnomnom).

$$ \begin{align} & \left(\frac{1}{x^Tx} x^TA(\sigma^2A + xx^T)^{-1}x\right)^{-1}\\ = &\, \left(\frac{1}{x^Tx} x^TA(\sigma^{-2}A^{-1} - \sigma^{-4}A^{-1}x(I + \sigma^{-2}x^T A^{-1}x)^{-1}x^T A^{-1})x\right)^{-1}\\ = &\, \sigma^{2}\left(\frac{1}{x^Tx} x^TAA^{-1}x - \frac{\sigma^{-2}}{x^Tx} x^TAA^{-1}x(I + \sigma^{-2}x^T A^{-1}x)^{-1}x^T A^{-1}x\right)^{-1}\\ = &\, \sigma^{2}\left(1 - (I + \sigma^{-2}x^T A^{-1}x)^{-1}\sigma^{-2}x^T A^{-1}x\right)^{-1}\\ = &\, \sigma^{2}\left(1 - (I + \sigma^{-2}x^T A^{-1}x)^{-1}((I + \sigma^{-2}x^T A^{-1}x) - I)\right)^{-1}\\ = &\, \sigma^{2}\left(1 - (I + \sigma^{-2}x^T A^{-1}x)^{-1}(I + \sigma^{-2}x^T A^{-1}x) + (I + \sigma^{-2}x^T A^{-1}x)^{-1}\right)^{-1}\\ = &\, \sigma^{2} + x^T A^{-1} x \end{align} $$