Matrix equation with transpose


How can I solve this matrix equation for $X$: $$ A^T X = B(X-Y)C, $$ where $A^T$ is the transpose of $A$?

Here, all matrices are small (e.g., $2\times2$). I am especially interested in the following two cases: (a) $Y$ is the matrix with all entries having the value 2, and (b) $Y=2I$, where $I$ is the identity matrix.


If $B$ is invertible, or if both $A$ and $C$ are invertible, then this can be easily converted to a Sylvester equation, and you can use standard direct methods for solving it. If not (or if your matrices are not amenable to direct factorizations and you want to use an iterative method instead), then read on; the problem becomes very interesting.
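For instance, in the invertible-$B$ case, left-multiplying by $B^{-1}$ gives $$B^{-1}A^T X - XC = -YC,$$ which is exactly the Sylvester form $\tilde{A}X + X\tilde{B} = Q$ with $\tilde{A} = B^{-1}A^T$, $\tilde{B} = -C$, and $Q = -YC$. A minimal sketch (assuming $B$ square and invertible, random data for illustration) using Matlab's built-in `sylvester`, which solves $A_1 X + X B_1 = Q$:

```matlab
% Sketch, assuming B is square and invertible:
% B^{-1} A^T X - X C = -Y C  is a Sylvester equation A1*X + X*B1 = Q
n = 4; A = randn(n); B = randn(n); C = randn(n); Y = randn(n);
X = sylvester(B\A', -C, -Y*C);   % sylvester(A1,B1,Q) solves A1*X + X*B1 = Q
norm(A'*X - B*(X-Y)*C)           % should be near machine precision
```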

You can use the relationship between vectorization, $\text{vec}(\cdot)$, and the Kronecker product, $\otimes$, to set up a meta-matrix representing the linear system for the vectorization of $X$. In general, the following useful facts hold for any matrices $U,V,W,Z,Q$ of compatible sizes (with $U$ and $V$ invertible where inverses appear): $$\text{vec}(UVW) = (W^T \otimes U) \text{vec}(V),$$ $$(U \otimes V)^{-1} = U^{-1} \otimes V^{-1},$$ $$(U \otimes V)^T = U^T \otimes V^T,$$ $$(U \otimes V)(Z \otimes Q) = (UZ) \otimes (VQ).$$ We can use this in our situation as follows, \begin{align} A^T X &= B(X-Y)C, \\ \text{vec}(A^T X) &= \text{vec}(B (X - Y)C), \\ (I \otimes A^T) \text{vec}(X) &= (C^T \otimes B) \text{vec}(X) - (C^T \otimes B)\text{vec}(Y), \\ (\underbrace{I \otimes A^T}_{M_1} - \underbrace{C^T \otimes B}_{M_2})\underbrace{\text{vec}(X)}_{x} &= - \underbrace{(C^T \otimes B)}_{M_2}\underbrace{\text{vec}(Y)}_{y}, \end{align} which yields a system of the general form $$(M_1 - M_2)x=-M_2 y.$$ In the case where the matrices $M_1$ and $M_2$ are square, $M_1-M_2$ is invertible, and the sizes are not too large, you can directly form the matrices and solve for the vector of entries of the desired matrix, $x = \text{vec}(X)$. E.g., see the following Matlab code,

A = randn(5,5); B = randn(5,5); C = randn(3,3); Y = randn(5,3);

M1 = kron(eye(3),A');
M2 = kron(C',B);
b = -reshape(B*Y*C, [],1); % = -M2*y
x = (M1-M2)\b;
X = reshape(x,5,3);
norm(A'*X - B*(X-Y)*C) % small, on the order of machine epsilon

But what if the matrices $M_1$ and $M_2$ are not square? If $A$ is a tall matrix, then $M_1-M_2$ is a wide matrix. Unless we are particularly unlucky, this matrix will have full row rank, so the system is underdetermined and we can still find a solution (one of many) by solving the normal equations, $$(M_1 - M_2)^T(M_1 - M_2)x = -(M_1 - M_2)^T M_2 y,$$ or, since $(M_1 - M_2)^T(M_1 - M_2)$ is singular in the wide case, the slightly regularized version, $$\left((M_1 - M_2)^T(M_1 - M_2) + \epsilon I \right)x = -(M_1 - M_2)^T M_2 y.$$ For example,

A = randn(5,3); B = randn(3,5); C = randn(4,4); Y = randn(5,4);

M1 = kron(eye(4),A');
M2 = kron(C',B);
b = -reshape(B*Y*C, [],1); % = -M2*y
epsilon=1e-7;
x = ((M1-M2)'*(M1-M2) + epsilon*eye(numel(Y)))\((M1-M2)'*b);
X = reshape(x,size(Y));
norm(A'*X - B*(X-Y)*C) % returns error on the order of epsilon

If $A$ is a wide matrix, then $M_1-M_2$ is a tall matrix and the system is overdetermined; generally there is no exact solution unless we get particularly lucky, although the same normal equations still yield a least-squares solution.

In the special case (a) where $Y=2 \mathbb{1}\mathbb{1}^T$ is the matrix of all 2's, the right hand side takes the form, $$y = (C^T \otimes B)\text{vec}\left(2\mathbb{1}\mathbb{1}^T\right) = 2\text{vec}\left(B \mathbb{1}\mathbb{1}^T C\right) = 2\text{vec}\left((B \mathbb{1})(C^T \mathbb{1})^T\right),$$ where $\mathbb{1}$ is the vector of all ones.

In the special case (b) where $Y=2I$, the right hand side takes the form, $$y = (C^T \otimes B)\text{vec}\left(2I\right) = 2\text{vec}(BIC) = 2\text{vec}(BC).$$
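A quick numerical sanity check of both special-case formulas (sizes and data chosen arbitrarily for illustration):

```matlab
% Verify the right-hand-side formulas for the two special cases
n = 4; B = randn(n); C = randn(n); one = ones(n,1);
Ya = 2*(one*one');                                          % case (a): all entries 2
norm(kron(C',B)*Ya(:) - 2*reshape((B*one)*(C'*one)', [],1)) % ~ 0
Yb = 2*eye(n);                                              % case (b): Y = 2I
norm(kron(C',B)*Yb(:) - 2*reshape(B*C, [],1))               % ~ 0
```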

If the problem is large enough that forming $M_1$ and $M_2$ explicitly is prohibitive, you can still apply the action of $M_1$ and $M_2$ on a vector $u=\text{vec}(U)$ by doing some matrix multiplications, $$M_1u = \text{vec}(A^T U), \quad M_2u = \text{vec}(BUC),$$ which allows you to use iterative methods like GMRES. Furthermore, we can precondition the system as follows: if one of the matrices $M_1$ or $M_2$ is dominant over the other, you can use that matrix as a preconditioner.

If both matrices are of roughly equal importance, you can use both together as preconditioners, either additively or multiplicatively.
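Putting the matrix-free pieces together, a minimal GMRES sketch (square invertible case, no preconditioner; the sizes mirror the first code snippet, and `applyM` is a name introduced here for illustration) might look like:

```matlab
% Matrix-free GMRES: apply (M1-M2)*u = vec(A'*U - B*U*C) without forming Kroneckers
n = 5; p = 3;
A = randn(n); B = randn(n); C = randn(p); Y = randn(n,p);
applyM = @(u) reshape(A'*reshape(u,n,p) - B*reshape(u,n,p)*C, [],1);
b = -reshape(B*Y*C, [],1);                % = -M2*vec(Y)
x = gmres(applyM, b, [], 1e-12, n*p);     % gmres accepts a function handle
X = reshape(x, n, p);
norm(A'*X - B*(X-Y)*C)                    % small if GMRES converged
```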