Find diagonal elements of the inverse for large matrices


Consider the set of invertible matrices $\{\mathbf{C}_{n}\}_{n=1}^N$ defined as $$\mathbf{C}_n = \begin{bmatrix} 1 & \mathbf{0}^\top_{k_1} & \mathbf{0}^\top_{k_2} &\cdots & \mathbf{0}^\top_{k_n} \\ \mathbf{0}_{k_1} & \mathbf{X}^\top_{1}\mathbf{X}_{1} + \boldsymbol{\Gamma}_1 & \mathbf{X}^\top_{1}\mathbf{X}_{2} & \cdots & \mathbf{X}^\top_{1}\mathbf{X}_{n} \\ \mathbf{0}_{k_2} & \mathbf{X}^\top_{2}\mathbf{X}_{1} & \mathbf{X}^\top_{2}\mathbf{X}_{2} + \boldsymbol{\Gamma}_2 &\cdots & \mathbf{X}^\top_{2}\mathbf{X}_{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}_{k_n} & \mathbf{X}^\top_{n}\mathbf{X}_{1} & \mathbf{X}^\top_{n}\mathbf{X}_{2} &\cdots & \mathbf{X}^\top_{n}\mathbf{X}_{n} + \boldsymbol{\Gamma}_n\end{bmatrix}, \quad n=1, \ldots, N,$$

where $\mathbf{0}_{k_1}$ denotes a vector of $k_1$ zeros, $\mathbf{X}_{n}\in \mathbb{R}^{m \times k_n}$, the $\boldsymbol{\Gamma}_{n}\in \mathbb{R}^{k_n \times k_n}$ are singular but positive semidefinite matrices, and we assume that $\mathbf{X}^\top_{n}\mathbf{X}_{n} + \boldsymbol{\Gamma}_n$ is invertible for each $n=1, \ldots, N$.

$\textbf{Objective}$: Find the diagonal elements of $\mathbf{C}_n^{-1}$, the inverse of $\mathbf{C}_n$ (assume that, for large $n$, the size of the matrix $\mathbf{C}_n$ makes computing the entire inverse infeasible due to numerical issues).

I managed to compute iteratively the set of matrices $$\mathbf{M}_n = \mathbf{B}_n \mathbf{C}_n^{-1} \mathbf{B}_{n}^\top, \qquad \text{with} \quad \mathbf{B}_n = \begin{bmatrix} \mathbf{0}_{m} & \mathbf{X}_{1} & \mathbf{X}_{2} & \cdots & \mathbf{X}_{n} \end{bmatrix}$$ for each $n=1, \ldots, N$; these are $m \times m$ matrices, hence much smaller than $\mathbf{C}_n$.

With them, I was able to compute the elements of the bottom-right block of the inverse (and hence its diagonal elements). Indeed, denoting by $\mathbf{F}_{n}$ the bottom-right block of the inverse of $\mathbf{C}_n$, the block matrix inverse formula gives \begin{equation*} \mathbf{F}_n = (\mathbf{X}_n^\top \mathbf{X}_n + \boldsymbol{\Gamma}_n - \mathbf{X}_n^\top \mathbf{B}_{n-1}\mathbf{C}^{-1}_{n-1}\mathbf{B}^\top_{n-1} \mathbf{X}_{n})^{-1} = (\mathbf{X}_n^\top \mathbf{X}_n + \boldsymbol{\Gamma}_n - \mathbf{X}_n^\top \mathbf{M}_{n-1}\mathbf{X}_{n})^{-1}. \end{equation*} In this computation, only matrices of size at most $\max\{m, k_{n}\}$ are involved, so we assume they can be calculated accurately. I tried to compute the other blocks similarly, but I could not find a pattern, and larger matrices appeared in the computation. Is there a way to find these diagonal elements accurately?
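For concreteness, here is a small numerical sketch of this iteration (the sizes and random data are illustrative assumptions; the $\mathbf M_n$ update used below is the rank-$k_n$ one implied by the block-inverse formula):

```python
import numpy as np

rng = np.random.default_rng(0)
m, ks = 5, [3, 2, 4]                      # illustrative sizes (assumptions)
Xs = [rng.standard_normal((m, k)) for k in ks]
Gammas = [np.eye(k) for k in ks]          # any PSD choice making X^T X + Gamma invertible

M = np.zeros((m, m))                      # M_0 = 0, since B_0 = 0_m and C_0 = [1]
F_blocks = []                             # F_n = bottom-right block of C_n^{-1}
for X, G in zip(Xs, Gammas):
    # F_n = (X_n^T X_n + Gamma_n - X_n^T M_{n-1} X_n)^{-1}
    F = np.linalg.inv(X.T @ X + G - X.T @ M @ X)
    F_blocks.append(F)
    # M_n = M_{n-1} + (X_n - M_{n-1} X_n) F_n (X_n - M_{n-1} X_n)^T,
    # which follows from the block-inverse formula applied to C_n
    W = X - M @ X
    M = M + W @ F @ W.T
```

Each step only touches matrices of size at most $\max\{m, k_n\}$; the final `F_blocks[-1]` agrees with the bottom-right block of a directly computed $\mathbf C_N^{-1}$.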

Any help would be appreciated.


$\textbf{Best answer}$

In case it's helpful (e.g. perhaps it can be modified to work using pseudoinverses), here's an approach that works if $\Gamma_j$ for $j = 1,\dots,n$ are invertible.

First of all, note that the block matrix $$ A = \pmatrix{1 & 0_k^T\\0_k & B} $$ is invertible iff $B$ is invertible, with inverse given by $$ A^{-1} = \pmatrix{1&0_k^T\\0_k&B^{-1}}. $$ So, we can safely restrict our attention to the submatrix $$ \mathbf{D}_n = \begin{bmatrix} \mathbf{X}^\top_{1}\mathbf{X}_{1} + \boldsymbol{\Gamma}_1 & \mathbf{X}^\top_{1}\mathbf{X}_{2} & \cdots & \mathbf{X}^\top_{1}\mathbf{X}_{n} \\ \mathbf{X}^\top_{2}\mathbf{X}_{1} & \mathbf{X}^\top_{2}\mathbf{X}_{2} + \boldsymbol{\Gamma}_2 &\cdots & \mathbf{X}^\top_{2}\mathbf{X}_{n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{X}^\top_{n}\mathbf{X}_{1} & \mathbf{X}^\top_{n}\mathbf{X}_{2} &\cdots & \mathbf{X}^\top_{n}\mathbf{X}_{n} + \boldsymbol{\Gamma}_n\end{bmatrix}, \quad n=1, \ldots, N. $$ It is notable that we can write this matrix in the form $\mathbf D_n = \mathbf G_n + \mathbf \Xi_n^T\mathbf \Xi_n$, where $$ \mathbf G_n = \operatorname{diag}(\mathbf \Gamma_1,\dots,\mathbf \Gamma_n), \quad \mathbf \Xi_n = \pmatrix{\mathbf X_1 & \cdots & \mathbf X_n}. $$ $\mathbf G_n$ is block-diagonal, so its inverse is easily computed as $$ \mathbf G_n^{-1} = \operatorname{diag}(\mathbf \Gamma_1^{-1},\dots,\mathbf \Gamma_n^{-1}). $$ From there, we can apply the Woodbury matrix identity to get $$ \mathbf D_n^{-1} =\mathbf G_n^{-1} - \overbrace{\mathbf G_n^{-1} \mathbf \Xi_n^T}^{\mathbf U_n^T}\overbrace{(\mathbf I_m + \mathbf \Xi_n \mathbf G_n^{-1}\mathbf \Xi_n^T)^{-1}}^{\mathbf Y_n}\overbrace{\mathbf \Xi_n \mathbf G_n^{-1}}^{\mathbf U_n}. $$ The term $\mathbf U_n^T \mathbf Y_n \mathbf U_n$ requires only the inversion of an $m \times m$ matrix. Note that we can rewrite $\mathbf Y_n, \mathbf U_n$ as follows: $$ \mathbf Y_n = \left(\mathbf I_m + \mathbf \Xi_n \mathbf G_n^{-1}\mathbf \Xi_n^T\right)^{-1} = \left(\mathbf I_m + \sum_{l=1}^n \mathbf X_l \mathbf \Gamma_l^{-1} \mathbf X_l^T\right)^{-1}. $$ Note that $\mathbf Y_n$ can itself be updated from $\mathbf Y_{n-1}$ using the Woodbury identity, which is especially useful if $k_n$ is small compared to $m$. We have $$ \mathbf U_n = [\mathbf X_1 \mathbf \Gamma_1^{-1} \ \ \cdots \ \ \mathbf X_n \mathbf \Gamma_n^{-1}]. $$ Thus, we have $$ \mathbf U_n^T \mathbf Y_n \mathbf U_n = \left[\mathbf \Gamma_i^{-1} \mathbf X_i^T\left(\mathbf I_m + \sum_{l=1}^n \mathbf X_l \mathbf \Gamma_l^{-1} \mathbf X_l^T\right)^{-1}\mathbf X_j\mathbf \Gamma_j^{-1}\right]_{i,j = 1}^n, $$ and taking $i = j$ gives the $i$-th diagonal block of $\mathbf D_n^{-1}$ as $\mathbf \Gamma_i^{-1} - \mathbf \Gamma_i^{-1}\mathbf X_i^T \mathbf Y_n \mathbf X_i \mathbf \Gamma_i^{-1}$.
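As a sanity check, the diagonal blocks require only a single $m \times m$ inversion; here is a short sketch with illustrative random data and invertible $\mathbf \Gamma_j$ (all sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
m, ks = 5, [3, 2, 4]                       # illustrative sizes (assumptions)
Xs = [rng.standard_normal((m, k)) for k in ks]
Gammas = [np.eye(k) for k in ks]           # invertible Gamma_j for this variant
Ginvs = [np.linalg.inv(G) for G in Gammas]

# Y_n = (I_m + sum_l X_l Gamma_l^{-1} X_l^T)^{-1}: the only m x m inverse needed
Y = np.linalg.inv(np.eye(m) + sum(X @ Gi @ X.T for X, Gi in zip(Xs, Ginvs)))

# i-th diagonal block of D_n^{-1}: Gamma_i^{-1} - Gamma_i^{-1} X_i^T Y X_i Gamma_i^{-1}
diag_blocks = [Gi - Gi @ X.T @ Y @ X @ Gi for X, Gi in zip(Xs, Ginvs)]
```

Only $\mathbf Y$ couples the blocks, so the per-block cost stays at $\max\{m, k_i\}$-sized operations.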


We can make this work for positive semidefinite $\mathbf \Gamma_j$ as follows. Consider the following definitions (listed so that each depends only on the ones before it):

  • For each $j$, let the matrix $L_j$ be such that $L_jL_j^T = \mathbf \Gamma_j$. In particular, I will take $L_j$ to be the positive semidefinite square root of $\mathbf \Gamma_j$, so that $L_j$ is symmetric.
  • $X = \operatorname{diag}(L_1,\dots,L_n)$
  • $Y = \mathbf \Xi_n^T$
  • $Z = (I - XX^+)Y$
  • $F = I + (I - Z^+Z) Y^\mathrm T (XX^\mathrm T)^+ Y (I - Z^+Z)$
  • $E = I - X^+Y (I - Z^+Z) F^{-1} (X^+Y)^\mathrm T$

We can then compute $$ \mathbf D_n^{-1} = \mathbf D_n^+ = (XX^\mathrm T + YY^\mathrm T)^+ = (ZZ^\mathrm T)^+ + (I - YZ^+)^\mathrm T X^{+\mathrm T} E X^+ (I - YZ^+), $$ and the diagonal of $\mathbf C_n^{-1}$ is a leading $1$ followed by the diagonal of $\mathbf D_n^{-1}$.

Some simplifications:

  • $X^+ = \operatorname{diag}(L_1^+,\dots,L_n^+)$
  • $(XX^T)^+ = \mathbf G_n^+ = (X^+)^2 = \operatorname{diag}([L_1^+]^2,\dots,[L_n^+]^2)$
$\textbf{Another answer}$

If it is not possible to store a matrix of the dimensions of $\mathbf{C}_n$, one possibility is to use a randomized estimator for the diagonal in conjunction with a linear-system solver.

In particular, note that for a matrix $\mathbf{A}$ and a random vector $\mathbf{v}$ with iid entries equal to $\pm 1$, each with probability $1/2$, \begin{equation*} \mathbb{E}[\mathbf{v}\circ (\mathbf{A} \mathbf{v})] = \operatorname{diag}(\mathbf{A}). \end{equation*} Here $\circ$ denotes the Hadamard product, i.e. entry-wise multiplication.

If you set $\mathbf{A} = \mathbf{C}_n^{-1}$, then you can compute $\mathbf{A}\mathbf{v}$ by solving the linear system $\mathbf{C}_n \mathbf{x} = \mathbf{v}$. This could be done with a Krylov subspace method, which only requires matrix–vector products and would allow you to avoid even forming $\mathbf{C}_n$ itself; I suspect you can take advantage of the structure of your matrix to speed up products with $\mathbf{C}_n$. If you have some other solver of choice, you could of course use that instead.

For a single vector $\mathbf{v}$, the variance of $\mathbf{v}\circ (\mathbf{A} \mathbf{v})$ may be large, but you can average many iid copies to reduce it. There are more clever approaches based on this general idea as well.
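A minimal sketch of the estimator (the test matrix, sample count, and the use of a dense `np.linalg.solve` are illustrative assumptions; in practice you would substitute your solver of choice, e.g. conjugate gradients, and never form the matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
# small SPD stand-in for C_n (illustrative; C_n itself would stay implicit)
P = rng.standard_normal((n, n)) * 0.01
C = 2.0 * np.eye(n) + (P + P.T) / 2.0

n_samples = 200
est = np.zeros(n)
for _ in range(n_samples):
    v = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
    x = np.linalg.solve(C, v)             # stand-in for a Krylov solve of C x = v
    est += v * x                          # v ∘ (C^{-1} v)
est /= n_samples                          # averages toward diag(C^{-1})
```

With a matrix-free solver such as `scipy.sparse.linalg.cg`, each solve needs only products with $\mathbf{C}_n$, which here reduce to products with the individual $\mathbf{X}_n$ blocks.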
