How to compress a linear operator and have the lossless composition property.

140 Views Asked by At

Consider a linear operator on $\mathbf{R}^n$ represented by a square matrix of size $n \times n$, call it $A$. The matrix acts on a row vector, call it $x$ and returns a row vector, call it $x'$, so we have $x' = xA$. Now we know that this system mainly works on vectors which can be represented on a lower-dimensional subspace, i.e. we are given a matrix $\Phi$ of size $n \times m$, where $m < n$, whose rows represent a lower dimensional representation of the components of the $x$es. For example, given an $x$, its lower dimensional representation $\phi(x)$ is given by $\phi(x) = x \Phi$.

The task is to compress the matrix $A$, i.e. construct a matrix $\pi(A)$ of size $m \times m$ which only works on the lower dimensional representations. It should have the property that $\phi(xA)$ should approximate $\phi(x) \pi(A)$. It means that $xA\Phi$ should approximate $\phi(x) \pi(A)$. If we consider $x$es from the canonical basis, it means that the matrix $IA\Phi = A\Phi$ should approximate $\Phi \pi(A)$. We can now state this as an optimization problem in the following way.

$\min_{\pi(A)} \| A\Phi - \Phi \pi(A) \|_\Xi$

In the above, the matrix norm $\| X \|_\Xi$ is defined as the sum of the diagonal entries of $X \Xi X^\top$, where $\Xi$ is a known weighing vector.

If this was the end of the problem, it could easily be obtained that the optimum value of $\pi(A)$ is $\pi(A) = (\Phi^\top\Xi\Phi)^{-1}\Phi^\top\Xi A\Phi$, which corresponds to standard weighted least squares projection.

However, there is an additional constraint. Consider many such operators: $A_1$, $A_2$ and so on and we want to compress all of them. We require the property that the compression of the composition of the operators is the composition of the compressed operators, i.e. for any $A_1$ and $A_2$ we require $\pi(A_1A_2) = \pi(A_1)\pi(A_2)$.

The question is how to obtain $\pi(A)$ given this constraint. If an optimal solution is hard to find, I would also be quite happy with a family of compression operators $\pi$ which meet the constraint and for which the optimized value is reasonably small (not necessarily minimal).