Find the best $X$ to minimize the diagonal entries of $A^TXA - D$, allowing arbitrarily large off-diagonals


Given an (arbitrary) matrix $A \in \mathbb{R}^{m \times n}$ and a diagonal matrix $D \in \mathbb{R}^{n \times n}$, I want to find a matrix $X \in \mathbb{R}^{m \times m}$ such that the diagonal entries of $A^TXA$ "best approximate" $D$. For now I'm saying that means minimizing the sum of squares of the diagonal entries of $A^TXA - D$, but I'm flexible about altering the definition of "best approximates" if it helps find beautiful solutions. I explicitly do not care about the off-diagonals, and want the smallest possible diagonal values of $A^TXA - D$ at the expense of arbitrarily large off-diagonals.

There are no guarantees about the relationship between $m$ and $n$. In some cases $m < n$, so $X$ is in effect part of a low-rank factorization of $D$. The trouble with the obvious $X = A^{\dagger T}DA^{\dagger}$ is that using pseudo-inverses effectively forces $X$ to try to match the off-diagonal zeros of $D$ as well, which I explicitly do not want: it reduces the accuracy of the diagonals, which are the only values I care about.
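To make the problem concrete, here's a minimal numpy sketch (random data for illustration only) of that obvious pseudo-inverse choice, showing the nonzero diagonal error it leaves behind when $m < n$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 3                      # m < n, so A^T X A has rank at most m
A = rng.standard_normal((m, n))
D = np.diag(rng.standard_normal(n))

# The "obvious" choice X = (A^+)^T D A^+ fits ALL entries of D,
# spending its limited rank on the off-diagonal zeros too.
Ap = np.linalg.pinv(A)
X_naive = Ap.T @ D @ Ap

diag_err = np.diag(A.T @ X_naive @ A) - np.diag(D)
print(np.sum(diag_err**2))       # generally nonzero
```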

What's the best way to solve for $X$? There's obviously gradient descent as a last resort, but that's not very beautiful. It feels like there's something elegant hiding in this problem description. Is there anything clever and closed form I could do?


I've worked on this all morning, and found an analytical solution! I gather from my years of lurking that it's bad form to answer your own question, but I figured I'd post it here for posterity. Let me know if I made any mistakes!

Doing this optimization for just the diagonals in closed form requires a bit of gymnastics, but is possible. We'll use the notation that $a_i$ corresponds to the $i$'th column of $A$, $x_i$ to the $i$'th column of $X$, and $a_{ij}$ is the $j$'th entry of the $a_{i}$ vector. Similarly, $D_{ii}$ corresponds to the $i$'th diagonal entry, and $d$ is the vector of just diagonal entries of $D$.

Let's begin by rewriting the optimization objective directly:

$$\min \sum_i \left( (A^T X A)_{ii} - D_{ii} \right)^2$$

Note that:

$$(A^T X A)_{ii} = a_i^T X a_i = \sum_j \underbrace{a_{ij}}_{\text{scalar}}(a_i^Tx_j) = \sum_j {\underbrace{(a_{ij} a_i)}_{\text{vector}}}^Tx_j$$

So it becomes clear that we can construct a long vector $q \in \mathbb{R}^{m^2}$ by stacking the columns of $X$ end to end. We can also construct a matrix $W \in \mathbb{R}^{m^2 \times n}$ whose $i$'th column $w_i$ is the vectors $a_{ij} a_i$ for $j = 1, \dots, m$ stacked end to end; equivalently, $w_i = \operatorname{vec}(a_i a_i^T)$. Then we have:

$$a_i^T X a_i = w_i^T q$$
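As a sanity check of this identity, here's a small numpy sketch (random data; `W` built exactly as described, column $i$ being the blocks $a_{ij} a_i$ stacked end to end):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.standard_normal((m, n))
X = rng.standard_normal((m, m))

q = X.ravel(order="F")  # columns of X stacked end to end
# Column w_i: the blocks a_{ij} * a_i stacked end to end, i.e. vec(a_i a_i^T)
W = np.column_stack([np.outer(A[:, i], A[:, i]).ravel(order="F")
                     for i in range(n)])

lhs = np.array([A[:, i] @ X @ A[:, i] for i in range(n)])  # a_i^T X a_i
print(np.allclose(lhs, W.T @ q))  # True
```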

Now, collecting the diagonal entries $D_{ii}$ into the vector $d \in \mathbb{R}^n$ defined above, we can write our optimization problem as a linear least-squares problem:

$$\min_X \sum_i ((A^T X A)_{ii} - D_{ii})^2 = \min_q \, \| W^Tq - d \|_2^2$$

This is a standard least squares problem, solved (taking the minimum-norm solution) by:

$$q = W^{\dagger T} d$$

Once we have a value of $q$, we can reconstruct the original matrix $X$ by taking each column of $X$ to be the corresponding $m$-entry segment of $q$.
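Putting the whole recipe together in numpy (a sketch on random data; note that only the symmetric part of $X$ affects $a_i^T X a_i$, so an exact diagonal match is generically possible when $n \le m(m+1)/2$, which holds in this example even though $m < n$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 3                      # m < n, yet n <= m(m+1)/2
A = rng.standard_normal((m, n))
d = rng.standard_normal(n)       # diagonal of D
D = np.diag(d)

# W: column i is vec(a_i a_i^T), i.e. the blocks a_{ij} a_i stacked end to end
W = np.column_stack([np.outer(A[:, i], A[:, i]).ravel(order="F")
                     for i in range(n)])

# Minimum-norm least-squares solution q = W^{+T} d,
# then fold q back into X column by column
q = np.linalg.pinv(W.T) @ d
X = q.reshape((m, m), order="F")

err = np.sum((np.diag(A.T @ X @ A) - d)**2)
print(err)                       # expected ~0 for generic A

# Compare with the pseudo-inverse construction from the question
Ap = np.linalg.pinv(A)
X_naive = Ap.T @ D @ Ap
err_naive = np.sum((np.diag(A.T @ X_naive @ A) - d)**2)
print(err_naive)                 # strictly worse on the diagonals
```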