Deriving solution to matrix equation $AV + VA - tAVA = I$


Consider the matrix equation $$ AV + VA - tAVA = I, $$ with $V$ square. Here we assume that $A$ and scalar $t$ are given, with $A$ symmetric positive definite and $0 < t < \tfrac{2}{\lambda_{\rm max}(A)}$. Here, $\lambda_{\rm max}(A)$ denotes the largest eigenvalue of $A$. We consider solutions of this equation in matrix variable $V$.

Apparently, a solution to this equation is $V = (2A - tA^2)^{-1}$. And indeed, it is easy to verify this solution is valid: $$ AV + VA -tAVA = 2(2I - tA)^{-1} - t(2I - tA)^{-1}A = (2I - tA)^{-1}(2I - tA) = I. $$ (This calculation follows since $A$ is nonsingular under the stated hypotheses.)

But I wonder if there is a clean/intuitive/straightforward way to derive this. One can see it easily if one knows that $AV = VA$, i.e., that they commute. It is also easy to see in dimension 1: $$ av + va - tava = 1 \quad \mbox{implies} \quad (2a - ta^2)v = 1. $$ But is there a nice way to see this in general?
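For what it's worth, the claimed solution is easy to check numerically. A minimal sketch (random SPD $A$; all variable names are mine, not from the question):

```python
import numpy as np

# Sketch: verify that V = (2A - tA^2)^{-1} solves AV + VA - tAVA = I
# for a randomly generated symmetric positive definite A.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # SPD by construction
t = 1.0 / np.linalg.eigvalsh(A)[-1]  # satisfies 0 < t < 2 / lambda_max(A)

V = np.linalg.inv(2 * A - t * A @ A)
residual = A @ V + V @ A - t * A @ V @ A - np.eye(n)
print(np.max(np.abs(residual)))      # essentially zero
```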


Best answer:

Here $\|A\|$ denotes the induced $2$-norm of $A$, i.e. the largest singular value of $A$. Since $A$ is positive definite, if we denote its eigenvalues by $a_1\ge a_2\ge\cdots\ge a_n\,(>0)$, then $\|A\|=\lambda_{\rm max}(A)=a_1$. Now, if $0<t<2\|A\|^{-1}=2a_1^{-1}$, then $a_i^{-1}>\frac{t}{2}$ for every $i$. Hence $$ a_i+a_j-ta_ia_j=a_ia_j\left(a_i^{-1}+a_j^{-1}-t\right)>0\tag{1} $$ for all $i$ and $j$. It follows that $I\otimes A+A\otimes I-tA\otimes A$ is positive definite. The equation $(I\otimes A+A\otimes I-tA\otimes A)\operatorname{vec}(V)=\operatorname{vec}(I)$, or equivalently $AV+VA-tAVA=I$, is therefore uniquely solvable. It remains to verify that $V=(2A-tA^2)^{-1}$ is indeed the solution.
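The vectorized argument above can be checked numerically. The following sketch (names mine) builds $I\otimes A + A\otimes I - tA\otimes A$, confirms it is positive definite, and compares the unique solution of the linear system with the closed form:

```python
import numpy as np

# Sketch: solve (I⊗A + A⊗I - t A⊗A) vec(V) = vec(I) directly and compare
# with the closed form (2A - tA^2)^{-1}.  (A is symmetric, so the row- vs
# column-major vec convention does not change the coefficient matrix.)
rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                      # SPD
t = 1.5 / np.linalg.eigvalsh(A)[-1]              # inside (0, 2/lambda_max)

I = np.eye(n)
K = np.kron(I, A) + np.kron(A, I) - t * np.kron(A, A)
assert np.all(np.linalg.eigvalsh(K) > 0)         # positive definite, as in (1)

V = np.linalg.solve(K, I.ravel()).reshape(n, n)  # the unique solution
V_closed = np.linalg.inv(2 * A - t * A @ A)
print(np.max(np.abs(V - V_closed)))              # essentially zero
```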


Alternatively, by a change of orthonormal basis, we may assume that $A=a_1I_{r_1}\oplus\cdots\oplus a_mI_{r_m}$, where $a_1,a_2,\ldots,a_m$ are distinct. Partition $V$ accordingly and denote its $(i,j)$-th sub-block as $V_{ij}$. Then $AV+VA-tAVA=I$ can be rewritten as $$\begin{cases} (2a_i-ta_i^2)V_{ii}=I_{r_i}&\text{for each } i,\\ (a_i+a_j-ta_ia_j)V_{ij}=0&\text{when }i\ne j. \end{cases}$$ By $(1)$, $a_i+a_j-ta_ia_j$ is always positive. Therefore $V_{ii}=(2a_i-ta_i^2)^{-1}I_{r_i}$ and $V_{ij}=0$, i.e. $V=(2A-tA^2)^{-1}$.
Another answer:

If $f(\cdot)$ is a rational function or expressible as a power series in $A$, then $f(A)$ commutes with $A$. Thus, if we assume that $V = f(A)$ for such a function, we can commute $V$ and $A$ and formally derive a solution $V = f(A) = (2A-tA^2)^{-1}$ as you did. More generally, this "guess and check" trick works for many equations of the form "a finite linear combination of products of $A$ and $V$ equals $0$", but it doesn't work in general if, say, $A^\top$ gets involved. Note that this method doesn't ensure that the solution produced in this way is unique.
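A quick numerical sanity check of this (a sketch, names mine): $V=(2A-tA^2)^{-1}$ is a rational function of $A$, hence commutes with it, and under that commutation the equation collapses to $(2A - tA^2)V = I$:

```python
import numpy as np

# Sketch: V = f(A) commutes with A, so under the ansatz V = f(A) the
# original equation collapses to (2A - tA^2) V = I.  Check both facts.
rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # SPD
t = 1.0 / np.linalg.eigvalsh(A)[-1]

V = np.linalg.inv(2 * A - t * A @ A)   # V is a rational function of A ...
print(np.max(np.abs(A @ V - V @ A)))   # ... hence commutes with A
lhs = (2 * A - t * A @ A) @ V          # the collapsed equation
print(np.max(np.abs(lhs - np.eye(n))))
```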

Another answer:

One nice observation for this equation is that it behaves well with respect to diagonalization of $A$: since $A$ is symmetric, there is an orthogonal matrix $U$ and a diagonal matrix $D$ such that $A = U D U^T$, and hence this equation is equivalent to $$ U D U^T V + V U D U^T - t U D U^T V U D U^T = I \\ \iff U\Big( D (U^T V U) + (U^T V U) D - t D (U^T V U) D \Big) U^T = I. $$

Now set $V' := U^T V U$; then this is just

$$ U\Big( D V' + V'D - t D V' D \Big) U^T = I \iff D V' + V'D - t D V' D = I.$$

If we can solve this latter equation for $V'$, then we have also solved our original equation by setting $V = U V' U^T$.

This makes the situation easier to analyze.
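Indeed, the diagonalized equation can be solved entrywise: the $(i,j)$ entry of $DV' + V'D - tDV'D$ is $(d_i + d_j - t\, d_i d_j)\, v'_{ij}$, so $V'$ is obtained by dividing the identity entrywise by these coefficients. A numerical sketch (names mine):

```python
import numpy as np

# Sketch: diagonalize A = U D U^T and solve D V' + V' D - t D V' D = I
# entrywise.  The (i, j) coefficient is d_i + d_j - t*d_i*d_j; since the
# right-hand side is the identity, V' comes out diagonal, and V = U V' U^T
# recovers the closed form.
rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                           # SPD
d, U = np.linalg.eigh(A)                              # A = U diag(d) U^T
t = 1.0 / d[-1]                                       # inside (0, 2/lambda_max)

coeff = d[:, None] + d[None, :] - t * np.outer(d, d)  # (i, j) coefficients
Vp = np.eye(n) / coeff                                # entrywise solve
V = U @ Vp @ U.T
print(np.max(np.abs(V - np.linalg.inv(2 * A - t * A @ A))))  # essentially zero
```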


One important remark which we can see from here: The solution $V' = (2 D - t D^2)^{-1}$ is not always the only one. Consider for example the $2 \times 2$-situation, with $D = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}.$

Then, for $V' = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$, this system takes the following shape:

$$\begin{pmatrix} ax & ay \\ bz & bw \end{pmatrix} + \begin{pmatrix} ax & by \\ az & bw \end{pmatrix} - t \begin{pmatrix} a^2x & aby \\ abz & b^2w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1\end{pmatrix}.$$

Whenever $t = \frac{a + b}{ab},$ the off-diagonal equations above reduce trivially to $0 = 0$. That means the off-diagonal entries of $V'$ can be chosen arbitrarily, and in particular there are non-diagonal solutions. In contrast, the "standard" solution $V' = (2 D - t D^2)^{-1}$ is always diagonal.
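This nonuniqueness is easy to demonstrate numerically; a sketch with arbitrarily chosen values (names mine). Note that $t = \frac{a+b}{ab}$ lies outside the interval $(0, 2/\max(a,b))$ from the question, which is consistent with the uniqueness argument in the accepted answer:

```python
import numpy as np

# Sketch: at t = (a + b)/(a b) the off-diagonal equations vanish, so any
# off-diagonal entries y, z give a valid (non-diagonal) solution V'.
a, b = 1.0, 3.0
t = (a + b) / (a * b)                    # outside (0, 2/max(a, b)) when a != b
D = np.diag([a, b])

for y, z in [(0.0, 0.0), (2.5, -7.0)]:   # arbitrary off-diagonal choices
    Vp = np.array([[1 / (2 * a - t * a**2), y],
                   [z, 1 / (2 * b - t * b**2)]])
    res = D @ Vp + Vp @ D - t * D @ Vp @ D - np.eye(2)
    print(np.max(np.abs(res)))            # essentially zero in every case
```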


However, if you restrict to solutions where $V'$ is diagonal, then the system becomes very easy: writing it in coordinates, you just get a collection of scalar equations along the diagonal of the shape $av + va - tava = 1$, each solved by $v = (2a - ta^2)^{-1}$, giving $V' = (2 D - t D^2)^{-1}$. (This closer analysis also makes it possible to broaden the range of values of $t$ you're allowed to use, which you should then be able to phrase in terms of the eigenvalues of $A$, I think.)

Lastly, this assumption that $V'$ is diagonal of course seems kind of strong, but phrased back in terms of the original matrix $V$, it just means that you assume $A$ and $V$ are simultaneously diagonalizable. I don't think there is a weaker requirement you can put on $V$. That might be the same as in epperly's answer, where you require that $V$ can be expressed as a formal series in $A$; apparently this is even more general than requiring that $V$ is a power series in $A$, by epperly's comment! Hooray.

Alternatively, you could instead require a family of solutions that is smooth in $t$ on some region, or something like that. That may again single out your unique family of solutions. But this post is getting long enough :)