Suppose one has a processor that computes the QR decomposition of a 4 x 4 complex matrix. To decompose an M x M complex matrix A, one can represent it as an R x R block matrix [Cij] (with 4 x 4 blocks to match the processor's dimension, so R = M/4) and perform the 4 x 4 decomposition N times to obtain A = QR. The supporting operations between steps are omitted for simplicity.
So the question is: how does N depend on R?
My assumption is N = R^3. This result holds for block matrix multiplication (according to Golub and Van Loan). My reasoning is that QR and multiplication are both level 3 algorithms, so their complexity is O(n^3) for an n x n matrix.
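To make the multiplication half of that analogy concrete, here is a minimal sketch in Python that multiplies two M x M complex matrices block by block and counts the 4 x 4 block products. The names `block_matmul`, `get_block`, `matmul` and `random_complex_matrix` are hypothetical helpers introduced only for this illustration, and M is assumed to be a multiple of 4. It checks the R^3 count for multiplication only and says nothing about how many 4 x 4 QR decompositions are needed.

```python
import random

B = 4  # block size, matching the 4 x 4 processor in the question


def matmul(X, Y):
    """Plain product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def get_block(A, i, j):
    """Return the B x B block in block row i, block column j of A."""
    return [row[j * B:(j + 1) * B] for row in A[i * B:(i + 1) * B]]


def block_matmul(X, Y):
    """Multiply X and Y block by block; return (product, number of block products)."""
    M = len(X)
    R = M // B  # assumes M is a multiple of B
    P = [[0j] * M for _ in range(M)]
    count = 0
    for i in range(R):
        for j in range(R):
            for k in range(R):
                # One B x B product per (i, j, k) triple.
                T = matmul(get_block(X, i, k), get_block(Y, k, j))
                count += 1
                # Accumulate the partial product into block (i, j) of P.
                for r in range(B):
                    for c in range(B):
                        P[i * B + r][j * B + c] += T[r][c]
    return P, count


def random_complex_matrix(M, seed=0):
    """Hypothetical test helper: M x M matrix with random complex entries."""
    rng = random.Random(seed)
    return [[complex(rng.random(), rng.random()) for _ in range(M)]
            for _ in range(M)]
```

For example, calling `block_matmul` on two 8 x 8 matrices (R = 2) returns a count of 8, and on two 12 x 12 matrices (R = 3) a count of 27, which matches R^3 for the multiplication case.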
Thanks.