Computing $(A\otimes I + I \otimes A)^{-1} \text{vec}B$


Suppose I have two positive semi-definite $n$-by-$n$ matrices $A$, $B$ and an $n$-by-$n$ identity matrix $I$, and I'm looking for a way to compute, approximate or bound the following quantity:

$$(A\otimes I + I \otimes A)^{-1} \text{vec}B$$

Concretely, I'm dealing with matrices with $n$ ranging from $100$ to $4000$, so $A$ is easy to invert, while $A \otimes I$ is too large to form, so I need a way to compute this using operations on $n$-by-$n$ matrices.

Additionally, I found the following to give a decent approximation when $A$, $B$ can be Kronecker-factored, and I'm wondering if there's a reason for that.

$$0.5 A^{-0.5} B A^{-0.5}$$

Any tips or literature pointers are appreciated!
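For reference, here is a small NumPy sketch (my own check, with randomly generated test matrices) that computes the quantity directly at small $n$, confirms it solves the $n$-by-$n$ equation $AX + XA = B$, and confirms that the $0.5\,A^{-1/2} B A^{-1/2}$ heuristic is exact in the special case where $A$ and $B$ commute:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # positive definite test matrix
Nm = rng.standard_normal((n, n))
B = Nm @ Nm.T                    # positive semi-definite

# Direct dense Kronecker solve (only feasible for small n)
K = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
x = np.linalg.solve(K, B.ravel(order="F"))     # vec = column stacking
X = x.reshape((n, n), order="F")

# The same X solves the n-by-n equation A X + X A = B
print(np.allclose(A @ X + X @ A, B))           # True

# The heuristic 0.5 A^{-1/2} B A^{-1/2} is exact when A and B commute
w, V = np.linalg.eigh(A)
A_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
Bc = A @ A                                     # commutes with A by construction
xc = np.linalg.solve(K, Bc.ravel(order="F"))
Xc = xc.reshape((n, n), order="F")
print(np.allclose(Xc, 0.5 * A_inv_sqrt @ Bc @ A_inv_sqrt))  # True
```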

3 Answers

Best answer

Your equation is equivalent to solving the following equation for $P$

$$ A\,P + P\,A^\top = B. \tag{1} $$

Such an equation also arises for the infinite-time (controllability) Gramian, which can be calculated as

$$ P = -\int_0^\infty e^{A\,t}\,B\,e^{A^\top t}\,dt. \tag{2} $$

This integral is of size $n \times n$. However, $e^{A\,t}$ diverges as $t \to \infty$ when $A$ is positive semi-definite, so $(2)$ is not well defined as written. It can also be noted that $(1)$ is uniquely solvable for $P$ only when the spectra of $A$ and $-A$ are disjoint. In other words, $A$ can't have an eigenvalue of zero, which implies that $A$ should be positive definite rather than merely positive semi-definite. Multiplying both sides of $(1)$ by minus one keeps the problem equivalent; this amounts to replacing $A$ and $B$ with $-A$ and $-B$, which transforms $(2)$ into

$$ P = \int_0^\infty e^{-A\,t}\,B\,e^{-A^\top t}\,dt. \tag{3} $$

Now the term $e^{-A\,t}$ stays bounded and converges to zero as time goes to infinity. The time this integral needs to converge is roughly inversely proportional to the smallest eigenvalue of $A$, since the integrand decays like $e^{-2\lambda_{\min} t}$. Moreover, truncating the integral at any finite time yields a lower bound for $P$ in the positive semi-definite ordering, because the integrand is positive semi-definite.
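As an illustration of this (my sketch, not part of the original answer): truncate $(3)$ at a finite time $T$ and integrate it numerically with the midpoint rule, using SciPy's `solve_continuous_lyapunov` (a Bartels–Stewart solver for $AP + PA^\top = Q$) as the reference:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)       # positive definite, so e^{-A t} decays
Nm = rng.standard_normal((n, n))
B = Nm @ Nm.T

# Reference: direct solve of A P + P A^T = B
P_exact = solve_continuous_lyapunov(A, B)

# Midpoint-rule approximation of (3), truncated at time T
T, steps = 1.0, 4000
dt = T / steps
P_T = np.zeros_like(B)
for k in range(steps):
    E = expm(-A * (k + 0.5) * dt)
    P_T += E @ B @ E.T * dt

print(np.allclose(P_T, P_exact, atol=1e-3))              # close to the exact solution
print(np.linalg.eigvalsh(P_exact - P_T).min() >= -1e-4)  # lower bound, up to quadrature error
```

The truncated integral approaches $P$ from below; the small negative tolerance in the last check only absorbs quadrature error.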

Second answer

Given symmetric matrices $\mathrm A \succ \mathrm O_n$ and $\mathrm B \succeq \mathrm O_n$, let

$$\mathrm X (t) := \int_0^t e^{-\tau \mathrm A} \mathrm B e^{-\tau \mathrm A} \,\mathrm d \tau$$

Using integration by parts,

$$\mathrm A \mathrm X (t) = \cdots = -\dot{\mathrm X} (t) + \mathrm B - \mathrm X (t) \mathrm A$$

and, thus, we obtain the following matrix differential equation

$$\boxed{\dot{\mathrm X} = - \mathrm A \mathrm X - \mathrm X \mathrm A + \mathrm B}$$

with initial condition $\mathrm X (0) = \mathrm O_n$. Assuming $\rm X$ converges to a steady-state, $\dot{\mathrm X} = \mathrm O_n$, we obtain the following linear matrix equation

$$\mathrm A \mathrm X + \mathrm X \mathrm A = \mathrm B$$

whose solution we denote by $\bar{\rm X}$. This is the desired solution.

Using a numerical ODE solver, we then integrate the matrix differential equation above and find an approximate value of the matrix $\bar{\rm X}$. Since $\rm X (t)$ is symmetric for all $t \geq 0$, we can half-vectorize both sides of the matrix differential equation in order to obtain a $\binom{n+1}{2}$-dimensional state vector (rather than an $n^2$-dimensional state vector, which would be wasteful).
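A minimal version of this approach (my sketch; it uses the plain vectorization rather than the half-vectorization suggested above, to keep the code short) with SciPy's `solve_ivp`:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)     # positive definite, so X(t) converges
Nm = rng.standard_normal((n, n))
B = Nm @ Nm.T

def rhs(t, x):
    """Vectorized right-hand side of dX/dt = -A X - X A + B."""
    X = x.reshape(n, n)
    return (-A @ X - X @ A + B).ravel()

# Integrate from X(0) = 0 until the transient e^{-tA} has died out
sol = solve_ivp(rhs, (0.0, 5.0), np.zeros(n * n), rtol=1e-10, atol=1e-12)
X_ode = sol.y[:, -1].reshape(n, n)

# Compare against a direct solve of A X + X A = B
X_bar = solve_continuous_lyapunov(A, B)
print(np.allclose(X_ode, X_bar, atol=1e-6))   # True
```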


Appendix

Note that the solution of the matrix differential equation is given by

$$\mathrm X (t) = \bar{\mathrm X} - e^{-t \mathrm A} \bar{\mathrm X} e^{-t \mathrm A}$$

Since $\mathrm A \succ \mathrm O_n$,

$$\lim_{t \to \infty} \mathrm X (t) = \bar{\mathrm X}$$

If matrix $\rm A$ were allowed to have an eigenvalue at zero, $e^{-t \mathrm A}$ would no longer decay in the corresponding directions, and the limit could fail to exist.
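The closed-form solution above can be checked numerically (my sketch, with a random positive definite test matrix): it satisfies the matrix differential equation, verified here against a central finite difference, and the initial condition $\mathrm X(0) = \mathrm O_n$.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                  # positive definite
Nm = rng.standard_normal((n, n))
B = Nm @ Nm.T

X_bar = solve_continuous_lyapunov(A, B)  # steady state: A X̄ + X̄ A = B

def X(t):
    """Claimed solution X(t) = X̄ - e^{-tA} X̄ e^{-tA}."""
    E = expm(-t * A)
    return X_bar - E @ X_bar @ E

# Check dX/dt = -A X - X A + B at an arbitrary time
t, h = 0.7, 1e-5
X_dot_fd = (X(t + h) - X(t - h)) / (2 * h)   # central finite difference
X_dot_rhs = -A @ X(t) - X(t) @ A + B
print(np.allclose(X_dot_fd, X_dot_rhs, atol=1e-5))   # True
print(np.allclose(X(0.0), np.zeros((n, n))))         # True: X(0) = 0
```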

Third answer

$ \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vcc#1{\op{vec}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $The SANE algorithm by Raydan & Martinez is basically Barzilai-Borwein applied to a system of nonlinear equations rather than a scalar cost function. It satisfies a least-squares form of the Secant Equation, so it can also be viewed as a Quasi-Newton method.

Anyway, their algorithm can be easily adapted to matrix-valued functions $$\eqalign{ F(X) &\equiv AX+XA^T-B = 0 \qquad&\{ {\rm matrix\;NLE\;function} \} \\ \\ X_0 &= I &\{ {\rm initialize} \} \\ F_0 &= F(X_0) \\ \\ k &= 1 &\{ {\rm first\;step} \} \\ dX_1 &= -F_0\cdot 10^{-5} \\ X_1 &= X_0 + dX_1 \\ F_1 &= F(X_1) \\ dF_1 &= F_1 - F_0 \\ \\ {\rm while\;} \big\|F_k\big\|_F^2 \:&> 10^{-12} \\ dX_{k+1} &= -\fracLR{dX_k:dF_k}{dF_k:dF_k}F_k \qquad&\{ {\rm Barzilai\,Borwein\;step} \} \\ X_{k+1} &= X_k + dX_{k+1} \\ dF_{k+1} &= A\:dX_{k+1} + dX_{k+1}\,A^T &\equiv\; F_{k+1} - F_k \\ F_{k+1} &= F_k + dF_{k+1} \\ k &= k+1 \\ }$$ where a colon has been used to denote the matrix inner product, i.e. $$\eqalign{ A:B &= \sum_{i=1}^n\sum_{j=1}^n A_{ij}B_{ij} \\ A:A &= \|A\|^2_F \\ }$$ The nice thing about this algorithm is that it deals with $n\times n$ matrices rather than $n^2\times n^2$ Kronecker matrices. Furthermore, it does not require any matrix inversions or the evaluation of transcendental matrix functions. Each iteration requires exactly one evaluation of the objective function. All of these features make each iteration relatively quick.
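For completeness, here is a direct transcription of the iteration into NumPy (my sketch; `lyap_bb` is just an illustrative name, and the tolerances mirror the pseudo-code above):

```python
import numpy as np

def lyap_bb(A, B, tol=1e-12, max_iter=10_000):
    """Solve A X + X A^T = B with the matrix Barzilai-Borwein
    iteration sketched above. Step lengths come from Frobenius
    inner products; only n-by-n matrix products are used."""
    n = A.shape[0]
    X = np.eye(n)
    F = A @ X + X @ A.T - B          # F(X_0)
    dX = -1e-5 * F                   # first (tiny) step
    X = X + dX
    dF = A @ dX + dX @ A.T           # F is affine, so dF = A dX + dX A^T
    F = F + dF
    for _ in range(max_iter):
        if np.sum(F * F) <= tol:     # ||F||_F^2 below tolerance
            break
        step = np.sum(dX * dF) / np.sum(dF * dF)   # BB step length
        dX = -step * F
        X = X + dX
        dF = A @ dX + dX @ A.T
        F = F + dF
    return X

# Quick check on a random positive definite test problem
rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
Nm = rng.standard_normal((n, n))
B = Nm @ Nm.T
X = lyap_bb(A, B)
print(np.linalg.norm(A @ X + X @ A.T - B))   # small residual
```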