I'm working with binary matrices. Let's assume that I have an algorithm that is very efficient at transposing 8×8 or 8×16 matrices, but I would like to transpose matrices of arbitrary size.
After some thinking and scribbling, it seems possible to break the transposition of a big matrix down into transposing several smaller matrices, but I can't prove that this always works for arbitrary sizes.
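To make the idea concrete, here is a minimal sketch of the decomposition I have in mind, in Python. It relies on the block identity that if a matrix is split into tiles, the tile at block position (i, j) of the input, once transposed, is the tile at block position (j, i) of the output. The sketch assumes that sizes which are not multiples of 8 can be handled by zero-padding up to the next multiple of 8 and cropping afterwards. `transpose_tile` is a hypothetical stand-in for the fast 8×8 routine, and `transpose_blocked` and `BLOCK` are names made up for illustration.

```python
BLOCK = 8  # assumed tile size handled by the fast kernel


def transpose_tile(tile):
    """Hypothetical stand-in for the fast 8x8 kernel; here a plain transpose."""
    return [list(col) for col in zip(*tile)]


def transpose_blocked(a):
    """Transpose an m x n 0/1 matrix (list of lists) using only 8x8 tile transposes."""
    m = len(a)
    n = len(a[0]) if m else 0
    # Round both dimensions up to a multiple of BLOCK and pad with zeros.
    pm = -(-m // BLOCK) * BLOCK
    pn = -(-n // BLOCK) * BLOCK
    padded = [list(row) + [0] * (pn - n) for row in a]
    padded += [[0] * pn for _ in range(pm - m)]
    out = [[0] * pm for _ in range(pn)]
    for i in range(0, pm, BLOCK):
        for j in range(0, pn, BLOCK):
            tile = [padded[i + r][j:j + BLOCK] for r in range(BLOCK)]
            t = transpose_tile(tile)
            # Tile (i, j) of the input lands at tile (j, i) of the output.
            for r in range(BLOCK):
                out[j + r][i:i + BLOCK] = t[r]
    # Drop the padding: the result is n x m.
    return [row[:m] for row in out[:n]]
```

Calling `transpose_blocked` on a 10×17 matrix, for example, pads it to 16×24, transposes six 8×8 tiles, and crops the result to 17×10. The padding wastes some work on the border tiles, which is the price of only ever calling the fixed-size routine.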