Proving the associativity of Matrix Multiplication


So, here's what I'm trying to prove right now.

Let $A \in M(m \times n, F)$, $B \in M(n \times s,F)$ and $C \in M(s \times p,F)$. Then:

$$A(BC) = (AB)C$$


Proof Attempt:

The linear maps corresponding to each of the matrices above are denoted as follows:

$$f:F^n \to F^m$$

$$g:F^s \to F^n$$

$$h:F^p \to F^s$$

Now, consider the following:

$[f \circ (g \circ h)](x) = f((g \circ h)(x)) = f(g(h(x))) = (f \circ g)(h(x)) = [(f \circ g) \circ h](x)$

In other words, composition of functions is associative. Since the matrix product corresponds to the composition of the corresponding linear maps, it follows that the above corresponds to the following law:

$$A(BC) = (AB)C$$

That proves the desired result.
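As a quick sanity check alongside the proof, here is a minimal numeric sketch of $A(BC) = (AB)C$ using a hand-rolled matrix product over plain lists (the helper name `matmul` and the sample matrices are illustrative, not from the text):

```python
# A small numeric check of A(BC) = (AB)C over integer matrices.

def matmul(X, Y):
    """Multiply an m x n matrix X by an n x p matrix Y (lists of rows)."""
    m, n, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]             # 3 x 2
C = [[2, 1, 0],
     [1, 0, 1]]          # 2 x 3

# Both groupings yield the same 2 x 3 matrix.
assert matmul(A, matmul(B, C)) == matmul(matmul(A, B), C)
```

Of course, checking finitely many examples proves nothing; the point of the proof below is that this holds for all matrices of compatible sizes.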

I know how to prove the given result directly from the definition of matrix multiplication. However, my book seems to be focusing quite a lot on the relationship between function composition and matrix multiplication. In all honesty, that makes more sense in my mind so I'm trying to do the proofs of these laws with reference to that. Can someone have a look at what I've written above and see if it is correct or not? How could I improve the proof?

Best Answer:

You mentioned that you are aware of the linear bijection between the space of matrices and the space of linear maps; let me be more explicit. Let $m,n,p$ be positive integers. We can define a function $\alpha: M_{m \times n}(F) \to \text{Hom}(F^n, F^m)$ by the rule: for any $A \in M_{m \times n}(F)$, we define $\alpha(A): F^n \to F^m$ to be the function defined by \begin{align} \left(\alpha(A) \right)(x) &:= A \cdot x \qquad \text{for all $x \in F^n$.} \tag{$*$} \end{align}
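The rule ($*$) can be sketched concretely: $\alpha$ takes a matrix and returns a function. Here is a minimal Python version, where `alpha` returns a closure that left-multiplies by `A` (the helper names `alpha` and `matvec` are illustrative):

```python
# alpha assigns to each matrix A the linear map alpha(A): x -> A . x.

def matvec(A, x):
    """Apply an m x n matrix A (list of rows) to a vector x of length n."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def alpha(A):
    """Return the linear map F^n -> F^m given by left-multiplication by A."""
    return lambda x: matvec(A, x)

A = [[1, 2],
     [3, 4],
     [5, 6]]             # 3 x 2, so alpha(A) maps F^2 -> F^3
f = alpha(A)
assert f([1, 0]) == [1, 3, 5]   # images of the basis vectors
assert f([0, 1]) == [2, 4, 6]   # are exactly the columns of A
```

Note that `alpha(A)` is a genuinely different kind of object from `A` itself (a function rather than an array), which is exactly the distinction the letters $\alpha, \beta, \xi$ are meant to keep visible.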

Now, you should verify that this function $\alpha$ is really a bijection between those spaces, and that it is linear (if you've gotten stuck, then maybe you should move on temporarily, but definitely be sure to come back to this and prove it rigorously). We can of course define similar maps $\beta: M_{n \times p}(F) \to \text{Hom}(F^p, F^n)$ and $\xi: M_{m \times p}(F) \to \text{Hom}(F^p, F^m)$.

Now, the statement that "matrix product corresponds to function composition" means that the function $\xi$ has the following property: for all $A \in M_{m \times n}(F)$ and all $B \in M_{n \times p}(F)$, \begin{align} \xi(A \cdot B) &= \alpha(A) \circ \beta(B). \tag{$\ddot{\smile}$} \end{align}

The way to read this equation in words is "the linear map associated with the product $A \cdot B$ equals the composition of the linear map associated with $A$ and the linear map associated with $B$" (verify using the definition that $\xi$ really does have this property).
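Property $(\ddot{\smile})$ can be checked pointwise on a sample vector: applying $A \cdot B$ to $x$ agrees with applying $B$ and then $A$. A minimal sketch, with illustrative helper names:

```python
# Pointwise check that the map attached to A.B agrees with the
# composite of the maps attached to A and B.

def matmul(X, Y):
    n, p = len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(X))]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 0],
     [0, 1, 1]]          # 2 x 3
B = [[1, 0],
     [2, 1],
     [3, 3]]             # 3 x 2
x = [4, -1]              # a vector in F^2

lhs = matvec(matmul(A, B), x)   # xi(A . B) applied to x ...
rhs = matvec(A, matvec(B, x))   # ... equals (alpha(A) o beta(B))(x)
assert lhs == rhs
```

This is only a spot check, of course; the verification asked for above is the general computation with indices.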


Up until now, I have intentionally introduced new letters $\alpha, \beta, \xi$ to emphasize that these are all different functions with different domains and target spaces. However, in practice people usually write $\alpha(A)$ simply as $L_A$, to mean "left-multiplication by $A$", and similarly write $L_B$ to mean "left-multiplication by $B$" instead of $\beta(B)$. The reason for such language is evident from ($*$), because the value of the function $L_A$ on the vector $x$ is simply obtained by multiplying the vector $x$ on the left by the matrix $A$.

With such notation, the equation $(\ddot{\smile})$ may be written in a more memorable manner: \begin{align} L_{A \cdot B} &= L_A \circ L_B \end{align}

This should make it more apparent that matrix product "really corresponds to" function composition (of the associated linear maps).

I hope you appreciate that this notation of left-multiplication is really much more convenient than the notation $\alpha, \beta, \xi$, which I have introduced so far (but of course, one needs to implicitly keep track of which $L$ refers to which).

One other thing to mention is that bijectivity of $\alpha$ in particular means it is injective so that if $\alpha(A_1) = \alpha(A_2)$ then $A_1 = A_2$. Or, if we use the $L$ notation, this is the same as saying if $L_{A_1} = L_{A_2}$ then $A_1 = A_2$. Similar statements hold for $\beta, \xi$. Now, the proof of matrix multiplication associativity becomes almost trivial, like you said \begin{align} L_{(AB)C} &= L_{AB} \circ L_C\\ & = \left(L_A \circ L_B\right) \circ L_C \\ &= L_A \circ \left(L_B \circ L_C \right) \\ &= L_A \circ L_{BC} \\ &= L_{A(BC)}. \end{align}

Hence, by my remark about injectivity earlier, this implies $(AB)C = A(BC)$.
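The chain of equalities above can also be mirrored numerically: $L_{(AB)C}$ and $L_{A(BC)}$ agree on every vector we feed them, which is exactly the functional statement the proof establishes in general. A small sketch (helper names `L`, `matmul`, `matvec` are illustrative):

```python
# L_{(AB)C} and L_{A(BC)} agree pointwise on sample vectors.

def matmul(X, Y):
    n, p = len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(X))]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def L(A):
    """Left-multiplication by A, i.e. the linear map x -> A . x."""
    return lambda x: matvec(A, x)

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2
C = [[1, 2],
     [3, 4]]             # 2 x 2

for x in ([1, 0], [0, 1], [2, -3]):
    assert L(matmul(matmul(A, B), C))(x) == L(matmul(A, matmul(B, C)))(x)
```

Injectivity of the correspondence is what upgrades this pointwise agreement of maps into an equality of the matrices themselves.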


Final Words:

  • You seemed to grasp the main idea of the proof. All I did was try to make super explicit what the "correspondence" between matrix multiplication and function composition is, so that the logic of the proof is airtight.
  • Note that while the actual proof of associativity above is definitely shorter than the proof using explicit matrix multiplication, this proof requires some "background machinery", and there were several details which I left for you to verify. So, this isn't necessarily a "quicker" proof. However, it is definitely more conceptually clear, and it is worth learning: once you understand the correspondence between matrices and linear transformations, proving new or more complicated theorems about matrices can become very quick if you rephrase them in terms of linear maps (and sometimes, proving facts about linear maps becomes much more obvious once you consider the associated matrices).