Calculating block diagonalization / canonical bases with linear optimization?


Edit: Even though I have started answering my own question, I am still eager to hear feedback and new ideas, so feel free to tell me if anything comes to mind.


In linear algebra there are many types of similarity transformations $${\bf A} = {\bf T}^{-1}{\bf DT}$$ where $\bf D$ is (block-)diagonal. Famous examples include the eigenvalue decomposition, the Jordan normal form and, in the matrix representation theory of groups, a fully reduced representation: $${\bf A}_k = {\bf T}^{-1}{\bf D}_k{\bf T}, \hspace{0.5cm} {\bf A}_k \hspace{0.25cm} \text{represents group element} \hspace{0.25cm} g_k$$

where all the ${\bf D}_k$ share the same block sizes pairwise (but not the same block contents).
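As a concrete instance of the first formula, a symmetric matrix is orthogonally diagonalizable, which we can check numerically (a small numpy illustration; the variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((4, 4))
A = R @ R.T                      # symmetric, hence orthogonally diagonalizable

lam, V = np.linalg.eigh(A)       # A = V diag(lam) V^T
D = np.diag(lam)
T = V.T                          # orthogonal, so T^{-1} = V

# A = T^{-1} D T, the similarity transformation above
assert np.allclose(A, V @ D @ T)
```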

Anyway, many algorithms exist to calculate $\bf T$ and $\bf D$. I had an idea for an approach which, as far as I know, has not been tried, so I would like to ask whether it has already been done, and for feedback on whether it would be practical or useful.

If we rewrite $\bf TA = DT$, or even ${\bf TA - DT = 0}$, we get an equation system with one equation for each element of these products. It is linear in $\bf T$ and linear in $\bf D$ if we consider them separately. Now we can impose constraints that encourage $\bf D$ to be sparse in a block-like way: add a penalty $\|{\bf W}\,\text{vec}({\bf D})\|_k$, where $\bf W$ is diagonal with ${\bf W}_{ll} = g(|i-j|)$ for the entry ${\bf D}_{ij}$ stored at $\text{vec}({\bf D})_l$, $g(x)$ is monotonically increasing, and $\|\cdot\|_k$ is some suitable vector norm.

Also note that we can replace the equality with an optimization cost: $${\bf TA-DT=0} \to \min_{{\bf D, T}}\left\{\|{\bf TA-DT}\|\right\}$$ and in the case where we have two sequences of matrices $$\{{\bf A}_1 , \cdots, {\bf A}_n \}\hspace{1cm}\{{\bf D}_1 , \cdots, {\bf D}_n\}$$ we can simply add up the costs: $$\min_{{\bf D}_l, {\bf T}}\left\{\sum_l\|{\bf TA}_l - {\bf D}_l{\bf T}\|\right\}$$

Now we can try the following:

  1. Initialize $\bf T$ and $\bf D$(s) to some initial state, for example drawn from a random distribution.

  2. Alternately solve:

     1. the optimization with respect to $\bf T$ given the current $\bf D$(s);
     2. the optimization with respect to $\bf D$(s) given the current $\bf T$.

  3. Stop if the solutions fluctuate too much (to restart at a better point) or once they converge (to accept the solution).

Maybe we will need a tiny regularizer keeping down the total norms of $\bf D$(s) and $\bf T$ when solving the respective subproblems. We can do this linearly for $k=2$, and for $k\neq 2$ by iterative reweighting. So if it works, we would in fact get away with solving a sequence of linear equation systems.
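For $k=2$ the two subproblems reduce to plain (weighted) least-squares solves, and the whole scheme can be sketched in a few lines of numpy. This is only a sketch under my own assumptions: a single matrix $\bf A$, the $\bf T$-step taken as the unit-norm minimizer via an SVD (one way to avoid the trivial solution ${\bf T}={\bf 0}$, standing in for the "tiny regularizer"), and illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
R = rng.standard_normal((n, n))
A = R @ R.T                        # symmetric positive definite target

I = np.eye(n)
eps = 1e-3
# weight on vec(D): off-diagonal entries cost 1, diagonal entries cost eps
w = (np.ones((n, n)) - (1 - eps) * I).flatten(order='F')

D = rng.standard_normal((n, n))
objs = []
for _ in range(100):
    # T-step: vec(TA - DT) = (kron(A.T, I) - kron(I, D)) vec(T); the
    # unit-norm minimizer is the right singular vector belonging to the
    # smallest singular value (this also rules out T = 0)
    M = np.kron(A.T, I) - np.kron(I, D)
    T = np.linalg.svd(M)[2][-1].reshape(n, n, order='F')

    # D-step: vec(TA) ~ kron(T.T, I) vec(D), with the weight penalty
    # stacked on as extra least-squares rows
    K = np.vstack([np.kron(T.T, I), np.diag(w)])
    b = np.concatenate([(T @ A).flatten(order='F'), np.zeros(n * n)])
    D = np.linalg.lstsq(K, b, rcond=None)[0].reshape(n, n, order='F')

    objs.append(np.linalg.norm(T @ A - D @ T) ** 2
                + np.linalg.norm(w * D.flatten(order='F')) ** 2)
```

Since each half-step is an exact minimization, the objective is non-increasing, so the iteration always converges in objective value (possibly to a local minimum).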

Could this be a useful approach, or are there reasons why it will refuse to work (at all), or why it will be infeasible to perform the calculations (considering convergence speed, among other things)?

Best Answer

Not having found any reason why this should not work, I could not resist trying it.

Initial investigations

Letting ${\bf A} = {\bf RR}^T$ with ${\bf R} \in \mathcal{N}(\mathbb{R}^{n\times n})$ (i.i.d. standard normal entries) guarantees that $\bf A$ is symmetric positive definite (almost surely), hence diagonalizable with $\lambda_k({\bf A})\in \mathbb{R}^+$.

We use the vectorization formula with Kronecker products to turn the matrix equations into linear systems in $\text{vec}({\bf T})$ and $\text{vec}({\bf D})$.
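The relevant identities are $\text{vec}({\bf TA}) = ({\bf A}^T \otimes {\bf I})\,\text{vec}({\bf T})$ and $\text{vec}({\bf DT}) = ({\bf T}^T \otimes {\bf I})\,\text{vec}({\bf D}) = ({\bf I} \otimes {\bf D})\,\text{vec}({\bf T})$, which a quick numerical check confirms (column-major vectorization):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A, D, T = (rng.standard_normal((n, n)) for _ in range(3))
I = np.eye(n)
vec = lambda M: M.flatten(order='F')   # column-major vec

# vec(TA) = kron(A.T, I) vec(T)
assert np.allclose(np.kron(A.T, I) @ vec(T), vec(T @ A))
# vec(DT), linear in D: kron(T.T, I) vec(D)
assert np.allclose(np.kron(T.T, I) @ vec(D), vec(D @ T))
# vec(DT), linear in T: kron(I, D) vec(T)
assert np.allclose(np.kron(I, D) @ vec(T), vec(D @ T))
```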

Furthermore, let $$\text{unvec}(\text{diag}({\bf W})) = {\bf 1}_n{{\bf 1}_n}^T-(1-\epsilon) {\bf I}_n$$

specify our weight matrix. In other words, off-diagonal entries of $\bf D$ cost $1$ and diagonal entries cost $\epsilon$.
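In code the weight matrix (here in its unvectorized form) is a one-liner; a small numpy sketch with illustrative names:

```python
import numpy as np

n, eps = 4, 1e-3
# off-diagonal entries of D are penalized with weight 1, diagonal with eps
W_unvec = np.ones((n, n)) - (1 - eps) * np.eye(n)
w = W_unvec.flatten(order='F')   # the diagonal of the weight matrix W
```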

Please note that no regularization is applied in the $\bf T$ phase, which could explain the jumpiness of the convergence.

In the plots below, $n=4$ and $\epsilon = 10^{-3}$. The level of the "floor" we are hitting seems to be a function of $\epsilon$.

[Two convergence plots for $n=4$, $\epsilon = 10^{-3}$]


Setting ${\bf A} = {\bf R}$, ${\bf R} \in \mathcal{N}(\mathbb{R}^{n\times n})$, also works, and the results converge to the same eigenvectors and eigenvalues as Octave's "eig" gives, but only if we initialize $\bf D$ and $\bf T$ with a non-zero imaginary part. This is not strange: the real numbers are closed under addition and multiplication, and sequences of those operations (and their inverses) are the only ones the iterative method uses, so purely real iterates can never produce complex eigenvalues!
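To illustrate the closure argument with a minimal example: a real matrix can have strictly complex eigenvalues, which no sequence of real additions, multiplications and inversions applied to real initial $\bf D$ and $\bf T$ can ever reach:

```python
import numpy as np

# 90-degree rotation: a real matrix with purely imaginary eigenvalues +-i
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
lam = np.linalg.eigvals(A)
assert np.iscomplex(lam).all()
# so an iteration built only from real +, *, and inverses, started from
# real D and T, can never converge to this eigenvalue decomposition
```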