Fast algorithm for orthogonal projection in $\mathbb{R}^m$ for large $m$


Let $m=2^n$. I consider an orthogonal basis of $\mathbb{R}^m$, arranged in a matrix $M \in \mathcal{M}_{m,m}$ containing only $1$s and $-1$s, built in the following way:

First, each column $M_{*,j}$ of $M$ is associated with a distinct subset $s_j$ of the set $\{ 1, \cdots, n \}$ (there are $m=2^n$ such subsets).

Then: $M_{i,j}=\begin{cases} 1 & \text{if there is an even number of $1$s at the positions $s_j$ of the binary representation of $i$} \\ -1 & \text{otherwise} \end{cases}$

It can be shown that the columns of $M$ are pairwise orthogonal, i.e. $M^tM = mI$ (up to a column permutation, $M$ is a Walsh–Hadamard matrix).

For instance, for $m=4$, $M=\begin{pmatrix}1 & 1 & 1 & 1 \\ 1 & 1 &-1&-1 \\ 1 & -1 &1&-1 \\ 1 & -1 &-1&1\end{pmatrix}$
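This construction is easy to check numerically. Below is a minimal sketch, under the assumption that subset $s_j$ is encoded by the bits of the column index $j$ (bit $k$ set $\Leftrightarrow$ $k+1 \in s_j$); a different subset-to-column assignment only permutes the columns and does not affect orthogonality.

```python
import numpy as np

def parity_matrix(n):
    """Build M for m = 2**n: M[i, j] = +1 if i has an even number of 1-bits
    at the positions selected by s_j, else -1. Assumes s_j is encoded by
    the bits of j; another assignment is just a column permutation."""
    m = 1 << n
    parities = np.array([[bin(i & j).count("1") & 1 for j in range(m)]
                         for i in range(m)])
    return 1 - 2 * parities  # parity 0 -> +1, parity 1 -> -1

M = parity_matrix(2)
print(M)          # entries are all +/-1
print(M.T @ M)    # = m * I: the columns are pairwise orthogonal
```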

Let us partition the columns of $M$ into $M_1 \in \mathcal{M}_{m,p}$ and $M_2 \in \mathcal{M}_{m,m-p}$. Let $E_1$ be the span of the columns of $M_1$ and $E_2$ that of $M_2$.

The context is a gradient ascent algorithm. I have a gradient $\nabla \in \mathcal{M}_{m,1}$ and linear constraints on the domain that I want to explore: this domain is $E_1$. So I can project $\nabla$ onto $E_1$, which is quite easy since the columns of $M_1$ are orthogonal. I just have to project $\nabla$ onto each column of $M_1$: $\nabla_{E_1}=\sum_{j=1}^p \langle \nabla, u_j\rangle \, \frac{u_j}{\|u_j\|^2}$ (where the $u_j$ are the columns of $M_1$; here $\|u_j\|^2 = m$ since all entries are $\pm 1$).
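In code, since every column has squared norm $m$, this projection reduces to two matrix–vector products. A minimal numpy sketch, using the $m=4$ example above with $p=2$:

```python
import numpy as np

def project_onto_span(M1, g):
    """Project g onto span(M1), assuming the columns of M1 are pairwise
    orthogonal with squared norm m (all entries are +/-1)."""
    m = M1.shape[0]
    return M1 @ (M1.T @ g) / m  # sum_j <g, u_j> u_j / ||u_j||^2

M = np.array([[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, 1, -1],
              [1, -1, -1, 1]], dtype=float)
M1 = M[:, :2]                       # keep p = 2 columns
g = np.array([3.0, 1.0, 2.0, 0.0])  # a sample gradient
p = project_onto_span(M1, g)
print(p)                 # → [2. 2. 1. 1.]
print(M1.T @ (g - p))    # → [0. 0.]: the residual is orthogonal to E_1
```

The cost is $O(mp)$, matching the complexity count given further down.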

However, I have observed that gradient ascent is much faster in my case after a simple change of parameters (a non-orthogonal change of parameters can greatly affect gradient ascent).

Let $D^{-1} \in \mathcal{M}_{m,m}$ be the diagonal matrix that converts the old parameters into the new ones. Transposing the problem into the new system of parameters, we have:

$\nabla'=D\nabla$

$M_1'=D^{-1}M_1$

$M_2'=DM_2$

Let $E_1'$ be the span of $M_1'$ and $E_2'$ that of $M_2'$.

It is clear that $E_1' \perp E_2'$, since $(M_1')^t M_2' = M_1^t D^{-1} D M_2 = M_1^t M_2 = 0$. But the columns of $M_1'$ and $M_2'$ are no longer orthogonal, so projecting the new gradient onto $E_1'$ is much more costly: I would need to orthogonalize one of those matrices, with the Gram–Schmidt method for instance.
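Both claims are easy to verify numerically. A small sketch with the $m=4$ example and an arbitrary positive diagonal $D$ (the specific values of $D$ below are made up for illustration):

```python
import numpy as np

M = np.array([[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, 1, -1],
              [1, -1, -1, 1]], dtype=float)
M1, M2 = M[:, :2], M[:, 2:]

d = np.array([0.5, 1.0, 2.0, 4.0])       # arbitrary positive diagonal
D, Dinv = np.diag(d), np.diag(1.0 / d)

M1p = Dinv @ M1   # M1' = D^{-1} M1
M2p = D @ M2      # M2' = D M2

print(M1p.T @ M2p)   # zero matrix: E1' is orthogonal to E2'
print(M1p.T @ M1p)   # NOT diagonal: the columns of M1' lost their orthogonality
```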

Taking a look at complexity: projection onto an orthogonal basis is $O(mp)$, while Gram–Schmidt is $O(mp^2)$.

In my case, typically $n=12$ (so $m=4096$) and $p=2000$, so the orthogonal projection costs around $8$ million operations, while Gram–Schmidt costs around $16$ billion.

My first question was whether there is a way to compute the projection faster in the general case. In the comments, I was told that the problem (I suppose the general problem of projection onto a non-orthogonal basis) is analogous to linear least squares, which is $O(mp^2)$.

I have seen that the projection matrix onto $\operatorname{Im}(R)$, for a matrix $R$ with linearly independent (not necessarily orthogonal) columns, is $P = R(R^tR)^{-1}R^t$. In our case, where $M_1'=D^{-1}M_1$, we have:

$P = D^{-1}M_1(M_1^tD^{-2}M_1)^{-1}M_1^tD^{-1}$

So we would need the inverse of $M_1^tD^{-2}M_1$, which does not seem easy.
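In practice one would not form that inverse explicitly: the projection $Pg$ can be obtained by solving a least-squares problem, which is numerically more stable but still $O(mp^2)$. A minimal sketch of this route:

```python
import numpy as np

def project_least_squares(R, g):
    """Project g onto Im(R) for a full-column-rank R: P g = R (R^t R)^{-1} R^t g,
    computed via a least-squares solve instead of an explicit inverse."""
    coeffs, *_ = np.linalg.lstsq(R, g, rcond=None)
    return R @ coeffs

# M1' = D^{-1} M1, with the small m = 4 example:
M = np.array([[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, 1, -1],
              [1, -1, -1, 1]], dtype=float)
Dinv = np.diag([2.0, 1.0, 0.5, 0.25])
M1p = Dinv @ M[:, :2]

g = np.array([1.0, 2.0, 3.0, 4.0])
p = project_least_squares(M1p, g)
print(M1p.T @ (g - p))   # ~[0, 0]: the residual is orthogonal to Im(M1')
```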

My last hope is to find some specific advantage in the particular structure of my constraint basis $M$.

It seems that some column rearrangements of $M_1$ might help.

In the example for $n=2$, suppose for instance that $ M_1 = \begin{pmatrix}1 & 1 \\ 1 & 1 \\ 1 & -1 \\ 1 & -1 \end{pmatrix}$

Then, by column recombination, I can find an equivalent matrix $N_1 = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$ such that $DN_1 = \begin{pmatrix} d_1 & 0 \\ d_2 & 0 \\ 0 & d_3 \\ 0 & d_4 \end{pmatrix}$ remains orthogonal.
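Numerically, for this example: the recombined columns $(c_1+c_2)/2$ and $(c_1-c_2)/2$ span the same subspace but have disjoint supports, so any diagonal scaling preserves their orthogonality. A small sketch of this check:

```python
import numpy as np

M1 = np.array([[1, 1],
               [1, 1],
               [1, -1],
               [1, -1]], dtype=float)

# Column recombination: same span, but disjoint supports.
T = 0.5 * np.array([[1.0, 1.0],
                    [1.0, -1.0]])
N1 = M1 @ T
print(N1)            # → [[1,0],[1,0],[0,1],[0,1]]

D = np.diag([0.3, 1.7, 2.0, 5.0])   # arbitrary diagonal scaling
DN1 = D @ N1
print(DN1.T @ DN1)   # diagonal: the scaled columns stay orthogonal
```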

My idea would be to use such rearrangements to make the projection faster...

Any help welcome.