As context, I will present the wavelet lifting scheme. In matrix terms it is a factorization of the convolution (transform) matrix into two kinds of steps:
- Predict ($P$) step.
- Update ($U$) step.
$$S=\begin{bmatrix}I&0\\U&I\end{bmatrix}\begin{bmatrix}I&-P\\0&I\end{bmatrix}$$
Note the locations of the $I$ and $0$ blocks: in practice they mean that, within each sub-step, one half of the vector is only read and the other half is only written.
Consequently each sub-step can be carried out fully in parallel and in place, which is very nice for memory use and for modern multi-core CPUs.
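To make the read-only/write-only split concrete, here is a minimal NumPy sketch of a single step $S_k$ applied in place; `lifting_step` is my own illustrative name, not an established API:

```python
import numpy as np

def lifting_step(x, P, U):
    """Apply S = [[I, 0], [U, I]] @ [[I, -P], [0, I]] to x = [a; b], in place.

    Predict: a -= P @ b   (reads only b, writes only a)
    Update:  b += U @ a   (reads only a, writes only b)
    Within each sub-step the written half's entries are independent,
    so they could be computed in parallel.
    """
    n = len(x) // 2
    a, b = x[:n], x[n:]   # views into x, so the updates happen in place
    a -= P @ b            # the [[I, -P], [0, I]] block
    b += U @ a            # the [[I, 0], [U, I]] block
    return x
```

Applying the two explicit block matrices to a copy of `x` reproduces the same result, which is a quick way to check the in-place version.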
The whole transform (matrix multiplication) is then a concatenation of such "steps":
$$S = S_N\cdots S_2 S_1$$
with possibly distinct $U_1,\cdots,U_N$ and $P_1,\cdots,P_N$:
$$S_k=\begin{bmatrix}I&0\\U_k&I\end{bmatrix}\begin{bmatrix}I&-P_k\\0&I\end{bmatrix}$$
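The concatenation $S = S_N\cdots S_2 S_1$ can be sketched the same way, applying the steps in order to the vector; `lifting_transform` and the `(P_k, U_k)` pairs are illustrative names under the block structure above, not a library API:

```python
import numpy as np

def lifting_transform(x, steps):
    """Apply S = S_N ... S_2 S_1 to x = [a; b], in place.

    steps: list of (P_k, U_k) pairs, applied in order k = 1..N,
    each one a predict/update pair as in the single-step case.
    """
    n = len(x) // 2
    a, b = x[:n], x[n:]   # views into x, so everything stays in place
    for P, U in steps:
        a -= P @ b        # predict block [[I, -P], [0, I]]
        b += U @ a        # update block  [[I, 0], [U, I]]
    return x
```

The whole transform therefore never needs a scratch buffer: each step only overwrites the half of the vector it is allowed to write.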
Now to the question: is there an algorithm that can find such a factorization not only for wavelet transforms but for general matrices?