$AXBYA^\top=C$ matrix product: find $XY$


I have a problem of the form

\begin{equation} \mathbf{A}\mathbf{\Gamma}_1\mathbf{A}^\top\mathbf{A}\mathbf{\Gamma}_2\mathbf{A}^\top=\mathbf{C} \end{equation} where $\mathbf{\Gamma}_1$ and $\mathbf{\Gamma}_2$ are diagonal and $\mathbf{A}$ is nonsquare. I would like to solve for the product $\mathbf{\Gamma}_1\mathbf{\Gamma}_2$ by minimizing an expression \begin{equation} \min_\mathbf{x} ~ \left\| \mathbf{E} \mathbf{x} - \mathbf{c} \right\|^2 \end{equation} where $\mathbf{x}=\text{vec}\left(\mathbf{\Gamma}_1\mathbf{\Gamma}_2\right)$ and $\mathbf{c}=\text{vec}(\mathbf{C})$. I know that for the simpler expression \begin{equation} \mathbf{A}\mathbf{\Gamma}\mathbf{A}^\top=\mathbf{C} \end{equation} it's possible to use the Kronecker product to obtain \begin{equation} (\mathbf{A}\otimes\mathbf{A})\text{vec}(\mathbf{\Gamma})=\text{vec}(\mathbf{C}) \end{equation} which does the trick in that case. Is there anything analogous that would work for my problem?
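For reference, the simpler Kronecker identity is easy to check numerically; a minimal numpy sketch with random data, assuming a column-stacking $\text{vec}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3
A = rng.standard_normal((n, p))
G = np.diag(rng.standard_normal(p))  # diagonal Gamma

C = A @ G @ A.T

# vec(A Gamma A^T) = (A kron A) vec(Gamma), with column-stacking vec
assert np.allclose(np.kron(A, A) @ G.flatten('F'), C.flatten('F'))
```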


4 Answers

Best answer (score 3)

Define the matrices $$\eqalign{ B=A^TA,\quad F=A^+C(A^+)^T,\quad X=\Gamma_1,\quad Y=\Gamma_2 \cr }$$ where $A^+$ denotes the pseudoinverse of $A$.

Let's also use a naming convention where uppercase letters are matrices and the corresponding lowercase letter is the vector formed from the main diagonal, e.g. $$\eqalign{ x = {\rm diag}(X),\quad y = {\rm diag}(Y),\quad f = {\rm diag}(F),\quad etc }$$ Solve the given equation for $(XBY)$ in a least-squares sense. $$\eqalign{ AXBYA^T &= C \quad&\implies\; XBY = F \cr }$$ Then extract the diagonals and isolate the term of interest. $$\eqalign{ {\rm diag}(XBY) &= {\rm diag}(F) \cr x\odot b\odot y &= f \cr x\odot y &= f\oslash b \;= {\rm diag}(XY) \cr \Gamma_1\Gamma_2 \;=\; XY &= {\rm diag}^{-1}(f\oslash b) \cr }$$ where $(\odot, \oslash)$ represent elementwise/Hadamard multiplication and division, and the ${\rm diag}^{-1}(v)$ operator creates a diagonal matrix from the input vector.

NB:  The formula to rewrite $\,{\rm diag}\big(XBY\big)\,$ as the Hadamard product of vectors does not apply to general matrices; it is only valid when $X$ and $Y$ are both diagonal matrices.

Because $B=A^TA$, none of the elements on the diagonal (i.e. the vector $b$) will be zero unless $A$ contains an entire column of zeros. Therefore it should be safe to perform the indicated Hadamard division $(f\oslash b)$.
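This recipe can be verified numerically; a numpy sketch with random data, assuming $A$ has full column rank so that $A^+A=I$ and the least-squares solution $XBY=F$ is exact:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3                        # tall A with full column rank
A = rng.standard_normal((n, p))
x = rng.standard_normal(p)         # diag(Gamma_1)
y = rng.standard_normal(p)         # diag(Gamma_2)
X, Y = np.diag(x), np.diag(y)

C = A @ X @ A.T @ A @ Y @ A.T      # the given data

B = A.T @ A
Ap = np.linalg.pinv(A)
F = Ap @ C @ Ap.T                  # least-squares solution of XBY = F
f, b = np.diag(F), np.diag(B)      # b has no zeros unless A has a zero column

XY = np.diag(f / b)                # recovered Gamma_1 Gamma_2
assert np.allclose(XY, X @ Y)
```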

Answer (score 3)

You're trying to solve for the product $\mathbf \Gamma_1 \mathbf \Gamma_2$. In doing so, you are assuming that the elements of $\mathbf A \mathbf \Gamma_1 \mathbf A^T \mathbf A \mathbf \Gamma_2 \mathbf A^T$ depend only on $\mathbf \Gamma_1 \mathbf \Gamma_2$. Since both matrices are diagonal, say $\mathbf \Gamma_n = {\rm diag}\{\gamma_{n,i}\}$ for $n=1,2$, this means $\mathbf A \mathbf \Gamma_1 \mathbf A^T \mathbf A \mathbf \Gamma_2 \mathbf A^T$ would depend only on the products $\gamma_{1,i} \gamma_{2,i}$ and not on any of the cross terms $\gamma_{1,i} \gamma_{2,j}$ for $i \neq j$. I would claim that this assumption is in general not true.

It is (trivially) true if $\mathbf A^T \mathbf A =\mathbf I$, in which case your problem is trivial. Otherwise, it will not be. For instance, for $\mathbf A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, we obtain $$\mathbf A \mathbf \Gamma_1 \mathbf A^T \mathbf A \mathbf \Gamma_2 \mathbf A^T = \begin{bmatrix} \gamma_{2,2} \cdot (\gamma_{1,1}+2\gamma_{1,2})+\gamma_{2,1}\cdot(\gamma_{1,1}+\gamma_{1,2}) & \gamma_{2,2} \cdot (\gamma_{1,1}+2\gamma_{1,2}) \\ \gamma_{1,2} \cdot (\gamma_{2,1}+2\gamma_{2,2}) & 2 \gamma_{1,2}\gamma_{2,2}\end{bmatrix}.$$

Therefore, it does depend on $\gamma_{1,i} \gamma_{2,j}$ for $i \neq j$ and can hence not be expressed as a function of $\mathbf \Gamma_1 \mathbf \Gamma_2$ alone.
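The counterexample is easy to confirm numerically: two factor pairs with the same elementwise product $\gamma_1\odot\gamma_2$ give different left-hand sides (a numpy sketch):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

def lhs(g1, g2):
    # A Gamma_1 A^T A Gamma_2 A^T for diagonal Gamma_i = diag(g_i)
    return A @ np.diag(g1) @ A.T @ A @ np.diag(g2) @ A.T

# two factor pairs with the same elementwise product g1 * g2 = (2, 6)
g1a, g2a = np.array([1.0, 2.0]), np.array([2.0, 3.0])
g1b, g2b = np.array([2.0, 1.0]), np.array([1.0, 6.0])
assert np.allclose(g1a * g2a, g1b * g2b)

# yet the matrices differ, so the left-hand side is not a
# function of Gamma_1 Gamma_2 alone
assert not np.allclose(lhs(g1a, g2a), lhs(g1b, g2b))
```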

What you could do:

  • Express each element of $\mathbf A \mathbf \Gamma_1 \mathbf A^T \mathbf A \mathbf \Gamma_2 \mathbf A^T$ as a bilinear form $\mathbf \gamma_1^T \mathbf G_{i,j} \mathbf \gamma_2$, where $\mathbf \gamma_n$ collects the diagonal elements of $\mathbf \Gamma_n$ for $n=1,2$ (e.g., following ideas like equation (524) in the Matrix Cookbook). Then you would need to solve a system of bilinear equations. Not trivial but doable. One way: solve for $\mathbf{\bar{\gamma}} = \mathbf \gamma_1 \otimes \mathbf \gamma_2$ and enforce the Kronecker structure via constraints (essentially, a rank-one constraint). This is a non-convex problem, but a semidefinite relaxation may work.
  • Express the problem as $\mathbf M(\mathbf \gamma_1) \cdot \mathbf \gamma_2 = \mathbf c$ and $\mathbf M(\mathbf \gamma_2) \cdot \mathbf \gamma_1 = \mathbf c$ (using the technique you already mentioned), apply alternating minimization.
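The second bullet can be sketched in numpy. The `khatri_rao` helper and the all-ones initialization are illustrative choices, not a prescribed algorithm; the two vectorization identities are checked explicitly, and each exact least-squares step can only decrease the residual:

```python
import numpy as np

def khatri_rao(X, Y):
    # column-wise Kronecker product: column k is kron(X[:, k], Y[:, k])
    return np.einsum('ik,jk->ijk', X, Y).reshape(X.shape[0] * Y.shape[0], -1)

rng = np.random.default_rng(1)
n, p = 4, 3
A = rng.standard_normal((n, p))
B = A.T @ A
g1, g2 = rng.standard_normal(p), rng.standard_normal(p)
C = A @ np.diag(g1) @ B @ np.diag(g2) @ A.T
c = C.flatten('F')                       # column-stacking vec

# with g1 fixed, vec(A G1 B G2 A^T) = M(g1) g2 is linear in g2
M1 = khatri_rao(A, A @ np.diag(g1) @ B)
assert np.allclose(M1 @ g2, c)
# with g2 fixed, it is linear in g1
M2 = khatri_rao((B @ np.diag(g2) @ A.T).T, A)
assert np.allclose(M2 @ g1, c)

# alternating minimization: each exact least-squares step is non-increasing
e1, e2 = np.ones(p), np.ones(p)
res = []
for _ in range(50):
    e2 = np.linalg.lstsq(khatri_rao(A, A @ np.diag(e1) @ B), c, rcond=None)[0]
    e1 = np.linalg.lstsq(khatri_rao((B @ np.diag(e2) @ A.T).T, A), c, rcond=None)[0]
    res.append(np.linalg.norm(A @ np.diag(e1) @ B @ np.diag(e2) @ A.T - C))
assert res[-1] <= res[0] + 1e-9
```

Alternating minimization is not guaranteed to reach the global minimum of a bilinear problem, but each half-step is a plain linear least-squares solve, which keeps the iteration cheap.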

By the way, when you try to solve $\mathbf A \mathbf \Gamma \mathbf A^T = \mathbf C$ for a diagonal matrix $\mathbf \Gamma$, you can use the rule ${\rm vec}\{\mathbf A \mathbf \Gamma \mathbf A^T\} = (\mathbf A \diamond \mathbf A)\cdot {\rm vecd}\{\mathbf \Gamma\}$, where the operator ${\rm vecd}\{\cdot\}$ returns a vector that contains the diagonal elements of its argument and $\diamond$ represents the column-wise Kronecker (Khatri-Rao) product.
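This vecd/Khatri-Rao rule can also be checked numerically (numpy sketch, column-stacking vec):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 3
A = rng.standard_normal((n, p))
g = rng.standard_normal(p)                   # vecd(Gamma)

# column-wise Kronecker (Khatri-Rao) product of A with itself
A_kr_A = np.einsum('ik,jk->ijk', A, A).reshape(n * n, p)

lhs = (A @ np.diag(g) @ A.T).flatten('F')    # vec(A Gamma A^T)
assert np.allclose(A_kr_A @ g, lhs)
```

Note the economy over the full Kronecker form: the system matrix is $n^2\times p$ rather than $n^2\times p^2$, because only the $p$ diagonal unknowns remain.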

Answer (score 0)

i) Let $f:(V,W)\in\Delta^2\mapsto AVA^TAWA^T-C$, where $\Delta$ is the set of $p\times p$ diagonal matrices.

$f=0$ is a system of $n^2$ equations of degree $2$ in the $2p$ unknowns $(v_i),(w_j)$.

Generically (i.e., if $A\in M_{n,p}$ and $C\in M_n$ are randomly chosen), some computations "show" the following:

If $2p\leq n^2$, then there are no solutions (even when $2p=n^2$).

If $2p>n^2$, then there are infinitely many solutions (at least in $\mathbb{C}$), depending on $2p-n^2$ parameters.

ii) When $2p\leq n^2$, we consider the least-squares method. Note that, generically, we may assume that $v_1=1$ (leaving $2p-1$ unknowns).

We seek $\min_{V,W}g(V,W)$, where $U=AVA^TAWA^T$ and $g={\rm tr}((U-C)^T(U-C))$. The candidate minimizers $(V,W)$ make the partial derivatives of $g$ vanish, that is, for every $(H,K)\in\Delta^2$

$tr(A^T(U-C)AWA^TAH)=0,tr(A^TAVA^T(U-C)AK)=0$.

That is equivalent to: for every $i\leq p$

$(A^T(U-C)AWA^TA)_{i,i}=(A^TAVA^T(U-C)A)_{i,i}=0$: $2p$ equations of degree $3$ in $2p-1$ unknowns. (We assume $V\not= 0$; in fact, if $v_1=0$, then generically $V=W=0$.) This system is complicated; for example, when $p=3$ we generically obtain a finite number of complex solutions, more precisely $39$ candidates. Fortunately, few of them are real ($3$ in my random test); once you obtain these $3$ real candidates, you can easily deduce the required minimum.

(Note that this gives $3$ real solutions for the product $VW$; in other words, finding $VW$ is equivalent to finding $V,W$.)
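As a sanity check, the stationarity conditions above agree with a finite-difference gradient of $g$ (a numpy sketch with random real data; since $g$ is quadratic in each diagonal separately, central differences are essentially exact):

```python
import numpy as np

rng = np.random.default_rng(3)
n = p = 3
A = rng.standard_normal((n, p))
C = rng.standard_normal((n, n))
v, w = rng.standard_normal(p), rng.standard_normal(p)

def g(v, w):
    U = A @ np.diag(v) @ A.T @ A @ np.diag(w) @ A.T
    return np.trace((U - C).T @ (U - C))

U = A @ np.diag(v) @ A.T @ A @ np.diag(w) @ A.T
R = U - C
# the diagonals appearing in the stationarity conditions (times 2)
grad_v = 2 * np.diag(A.T @ R @ A @ np.diag(w) @ A.T @ A)
grad_w = 2 * np.diag(A.T @ A @ np.diag(v) @ A.T @ R @ A)

eps = 1e-6
for i in range(p):
    e = np.zeros(p); e[i] = eps
    fd_v = (g(v + e, w) - g(v - e, w)) / (2 * eps)
    fd_w = (g(v, w + e) - g(v, w - e)) / (2 * eps)
    assert abs(fd_v - grad_v[i]) < 1e-4 * max(1.0, abs(grad_v[i]))
    assert abs(fd_w - grad_w[i]) < 1e-4 * max(1.0, abs(grad_w[i]))
```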

Anyway, when $p$ is large, it's difficult to obtain all the real solutions of an algebraic system of degree $>2$.

iii) On the other hand, you can directly compute $\min g$ numerically (note that $\deg(g)=4$), for instance with Maple's NLPSolve; however, this often yields only a local minimum.

Answer (score 0)

Define

\begin{equation} \mathbf{C} = \mathbf{U}\mathbf{\Lambda}\mathbf{V}^\top \end{equation}

to be the singular value decomposition of $\mathbf{C}$, where $\mathbf{\Lambda}$ is diagonal and $\mathbf{U}$ and $\mathbf{V}$ are orthogonal matrices. Left-multiply both sides of the equation by $\mathbf{U}^\top$ and right-multiply by $\mathbf{V}$ to obtain

\begin{equation} \mathbf{U}^\top\mathbf{A}\mathbf{\Gamma}_1\mathbf{A}^\top\mathbf{A}\mathbf{\Gamma}_2\mathbf{A}^\top\mathbf{V}=\mathbf{\Lambda} \end{equation}

Apply the vecd operation to both sides and use the Khatri-Rao product to obtain

\begin{equation} (\mathbf{U}^\top\mathbf A \diamond \mathbf{V}^\top\mathbf A) \cdot {\rm vecd}( \mathbf{\Gamma}_1\mathbf{A}^\top\mathbf{A}\mathbf{\Gamma}_2) = {\rm vecd}(\mathbf{\Lambda} ) \end{equation}

Define $\mathbf{\gamma_i}={\rm vecd}(\mathbf{\Gamma_i})$ and $\mathbf{b}={\rm vecd}(\mathbf{A}^\top\mathbf{A})$. Then, because the $\mathbf{\Gamma_i}$ are diagonal, we have

$$ \eqalign{ {\rm vecd}( \mathbf{\Gamma}_1\mathbf{A}^\top\mathbf{A}\mathbf{\Gamma}_2) &= \mathbf{\gamma_1}\odot \mathbf{b}\odot \mathbf{\gamma_2} \cr &= {\rm diag}(\mathbf{b}) \cdot (\mathbf{\gamma_1}\odot \mathbf{\gamma_2}) \cr &= {\rm diag}(\mathbf{b}) \cdot {\rm vecd}(\mathbf{\Gamma}_1\mathbf{\Gamma}_2) } $$

where ${\rm diag}(\mathbf{b})$ denotes the square matrix with the elements of $\mathbf{b}$ along its diagonal. Then, substituting back, we find

$$ \left[(\mathbf{U}^\top\mathbf A \diamond \mathbf{V}^\top\mathbf A) \cdot {\rm diag}(\mathbf{b})\right]{\rm vecd}(\mathbf{\Gamma}_1\mathbf{\Gamma}_2)={\rm vecd}(\mathbf{\Lambda}). $$

This has the desired form $\mathbf{Ex}=\mathbf{c}$.

I wonder if the SVD on $\mathbf{C}$ is too restrictive. One effect is that any uncertainty on $\mathbf{C}$ that should be used in the minimization would have to be described in terms of the SVD basis.