I'm trying to solve problem 4.2-6 in the fourth edition of CLRS:
Suppose that you have a $\Theta(n^\alpha)$-time algorithm for squaring $n \times n$ matrices, where $\alpha \geq 2$. Show how to use that algorithm to multiply two different $n \times n$ matrices in $\Theta(n^\alpha)$ time.
I think that the authors might have forgotten the assumption that $AB=BA$, which enables the identity $$A B = \frac{(A+B)^2-(A-B)^2}{4},$$ since $(A+B)^2-(A-B)^2 = 2(AB+BA) = 4AB$ when $A$ and $B$ commute.
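For concreteness, here is a quick numerical sanity check (a sketch in NumPy with matrices I made up; note the denominator is 4, since $(A+B)^2-(A-B)^2 = 4AB$ for commuting matrices). It confirms the identity recovers $AB$ when $A$ and $B$ commute, and only the symmetrization $(AB+BA)/2$ when they do not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Commuting pair: B is a polynomial in A, so AB = BA.
A = rng.standard_normal((n, n))
B = A @ A + 3 * A + 2 * np.eye(n)

lhs = ((A + B) @ (A + B) - (A - B) @ (A - B)) / 4
assert np.allclose(lhs, A @ B)  # identity recovers AB when AB = BA

# Non-commuting pair: the same formula returns (AB + BA)/2, not AB.
C = rng.standard_normal((n, n))
mid = ((A + C) @ (A + C) - (A - C) @ (A - C)) / 4
assert np.allclose(mid, (A @ C + C @ A) / 2)
assert not np.allclose(mid, A @ C)
```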
Without that assumption, a single call to SQUARE on a linear combination of $I$, $A$, and $B$ (using no multiplications of our own) gives a matrix of the form
$$(c I+aA+bB)^2=c^2 I+2ca\,A+2cb\,B+a^2 A^2+ab\,AB+ab\,BA+b^2 B^2,$$
and no matter what linear combination of these squares I try to form,
$$\sum_i \lambda_i (c_i I+a_iA+b_iB)^2$$
(writing $\lambda_i$ for the scalar weights, to avoid clashing with the exponent $\alpha$), I find that $AB$ and $BA$ always have the same coefficient, namely $\sum_i \lambda_i a_i b_i$, and are thus indistinguishable.
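This obstruction can be checked numerically (a sketch with made-up random coefficients): for arbitrary weights and non-commuting $A, B$, the combination of squares always equals an expression in which $AB$ and $BA$ appear only through the symmetric term $AB+BA$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 5  # matrix size, number of calls to the squaring oracle

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
I = np.eye(n)

# Arbitrary scalar weights lam_i and coefficients a_i, b_i, c_i.
lam, a, b, c = rng.standard_normal((4, k))

# What k squaring calls on linear combinations of I, A, B can produce.
combo = np.zeros((n, n))
for i in range(k):
    M = c[i] * I + a[i] * A + b[i] * B
    combo += lam[i] * (M @ M)  # one call to SQUARE

# Expansion: AB and BA share the single coefficient sum(lam * a * b),
# so only the symmetric combination AB + BA is ever reachable.
expanded = ((lam @ c**2) * I
            + 2 * (lam * c * a).sum() * A
            + 2 * (lam * c * b).sum() * B
            + (lam @ a**2) * (A @ A)
            + (lam @ b**2) * (B @ B)
            + (lam * a * b).sum() * (A @ B + B @ A))

assert np.allclose(combo, expanded)
```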
Is it possible to reduce matrix multiplication to matrix squaring, in the non-commutative case? Or is that assumption necessary?
Thanks!