Intuition behind Matrix Multiplication


If I multiply two numbers, say $3$ and $5$, I know it means add $3$ to itself $5$ times or add $5$ to itself $3$ times.

But if I multiply two matrices, what does it mean? I can't think of it in terms of repetitive addition.

What is the intuitive way of thinking about multiplication of matrices?

There are 8 answers below.

Best answer:

Matrix "multiplication" is the composition of two linear functions. The composition of two linear functions is again a linear function.

If one linear function is represented by $A$ and another by $B$, then $AB$ represents their composition, and $BA$ represents the composition in the reverse order.

That's one way of thinking of it. It explains why matrix multiplication is the way it is instead of entrywise multiplication.
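This can be checked numerically. Here is a small NumPy sketch (the two matrices are made-up examples): applying the function for $B$ and then the function for $A$ agrees with multiplying by $AB$, and reversing the order gives a different result.

```python
import numpy as np

# Assumed 2x2 examples: A scales the axes, B rotates by 90 degrees.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
B = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def f(v):  # the linear function represented by A
    return A @ v

def g(v):  # the linear function represented by B
    return B @ v

v = np.array([1.0, 2.0])

# Applying g first and then f is the same as multiplying by AB.
assert np.allclose(f(g(v)), (A @ B) @ v)

# Reversing the order corresponds to BA, a generally different function.
assert not np.allclose(A @ B, B @ A)
```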

Answer:

The short answer is that a matrix corresponds to a linear transformation. To multiply two matrices is the same thing as composing the corresponding linear transformations (or linear maps).

The following is covered in a text on linear algebra (such as Hoffman-Kunze):

This makes most sense in the context of vector spaces over a field. You can talk about vector spaces and (linear) maps between them without ever mentioning a basis. When you pick a basis, you can write the elements of your vector space as a sum of basis elements with coefficients in your base field (that is, you get explicit coordinates for your vectors in terms of for instance real numbers). If you want to compute something, you typically pick bases for your vector spaces. Then you can represent your linear map as a matrix with respect to the given bases, with entries in your base field (see e.g. the above mentioned book for details as to how). We define matrix multiplication such that matrix multiplication corresponds to composition of the linear maps.

Added (Details on the presentation of a linear map by a matrix). Let $V$ and $W$ be two vector spaces with ordered bases $e_1,\dots,e_n$ and $f_1,\dots,f_m$ respectively, and $L:V\to W$ a linear map.

First note that since the $e_j$ generate $V$ and $L$ is linear, $L$ is completely determined by the images of the $e_j$ in $W$, that is, $L(e_j)$. Explicitly, note that by the definition of a basis any $v\in V$ has a unique expression of the form $a_1e_1+\cdots+a_ne_n$, and $L$ applied to this pans out as $a_1L(e_1)+\cdots+a_nL(e_n)$.

Now, since $L(e_j)$ is in $W$ it has a unique expression of the form $b_1f_1+\dots+b_mf_m$, and it is clear that the value of $e_j$ under $L$ is uniquely determined by $b_1,\dots,b_m$, the coefficients of $L(e_j)$ with respect to the given ordered basis for $W$. In order to keep track of which $L(e_j)$ the $b_i$ are meant to represent, we write (abusing notation for a moment) $m_{ij}=b_i$, yielding the matrix $(m_{ij})$ of $L$ with respect to the given ordered bases.

This might be enough to play around with why matrix multiplication is defined the way it is. Try for instance a single vector space $V$ with basis $e_1,\dots,e_n$, and compute the corresponding matrix of the square $L^2=L\circ L$ of a single linear transformation $L:V\to V$, or say, compute the matrix corresponding to the identity transformation $v\mapsto v$.
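To make the suggested exercise concrete, here is a short NumPy sketch using an assumed example map on $\mathbb{R}^3$ (the cyclic shift $L(x,y,z)=(y,z,x)$, chosen purely for illustration): the matrix of $L$ has $L(e_j)$ as its $j$-th column, and the matrix of $L^2 = L\circ L$ is the square of that matrix.

```python
import numpy as np

# Assumed example: L(x, y, z) = (y, z, x), a cyclic shift of coordinates.
def L(v):
    x, y, z = v
    return np.array([y, z, x])

# The matrix of L w.r.t. the standard basis has L(e_j) as its j-th column.
E = np.eye(3)
M = np.column_stack([L(E[:, j]) for j in range(3)])

# The matrix of the composition L∘L is the square M @ M.
v = np.array([1.0, 2.0, 3.0])
assert np.allclose(L(L(v)), (M @ M) @ v)

# The identity transformation v -> v yields the identity matrix.
identity = lambda v: v
I = np.column_stack([identity(E[:, j]) for j in range(3)])
assert np.allclose(I, np.eye(3))
```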

Answer:

Apart from the interpretation as the composition of linear functions (which is, in my opinion, the most natural one), another viewpoint is on some occasions useful.

You can view it as something akin to a generalization of elementary row/column operations. If you compute $AB$, then the coefficients in the $j$-th row of $A$ tell you which linear combination of the rows of $B$ you should compute and put into the $j$-th row of the product.

Similarly, you can view $AB$ as making linear combinations of columns of $A$, with the coefficients prescribed by the matrix $B$.

With this viewpoint in mind you can easily see that, if you denote by $\vec{a}_1,\dots,\vec a_k$ the rows of the matrix $A$, then the equality $$\begin{pmatrix}\vec a_1\\\vdots\\\vec a_k\end{pmatrix}B= \begin{pmatrix}\vec a_1B\\\vdots\\\vec a_kB\end{pmatrix}$$ holds. (Of course, you can obtain this equality directly from the definition or by many other methods. My intention was to illustrate a situation where familiarity with this viewpoint can be useful.)
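Both viewpoints are easy to verify numerically. A small NumPy sketch (with randomly generated example matrices): each row of $AB$ is a combination of the rows of $B$, and each column of $AB$ is a combination of the columns of $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(2, 3)).astype(float)
B = rng.integers(-3, 4, size=(3, 4)).astype(float)

AB = A @ B

# Each row of AB is a linear combination of the rows of B,
# with coefficients taken from the corresponding row of A.
for j in range(A.shape[0]):
    row = sum(A[j, k] * B[k, :] for k in range(B.shape[0]))
    assert np.allclose(AB[j, :], row)

# Dually, each column of AB is a linear combination of the columns of A,
# with coefficients taken from the corresponding column of B.
for j in range(B.shape[1]):
    col = sum(B[k, j] * A[:, k] for k in range(A.shape[1]))
    assert np.allclose(AB[:, j], col)
```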

Answer:

Asking why matrix multiplication isn't just componentwise multiplication is an excellent question: in fact, componentwise multiplication is in some sense the most "natural" generalization of real multiplication to matrices: it satisfies all of the axioms you would expect (associativity, commutativity, existence of identity and inverses (for matrices with no 0 entries), distributivity over addition).

The usual matrix multiplication in fact "gives up" commutativity; we all know that in general $AB \neq BA$ while for real numbers $ab = ba$. What do we gain? Invariance with respect to change of basis. If $P$ is an invertible matrix,

$$P^{-1}AP + P^{-1}BP = P^{-1}(A+B)P$$ $$(P^{-1}AP)(P^{-1}BP) = P^{-1}(AB)P$$ In other words, it doesn't matter which basis you use to represent the matrices $A$ and $B$: whatever choice you make, their sum and product are represented consistently.

It is easy to see by trying an example that the second property does not hold for multiplication defined component-wise. This is because the inverse of a change of basis $P^{-1}$ no longer corresponds to the multiplicative inverse of $P$.
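Here is one such example, sketched in NumPy with made-up $2\times 2$ matrices: conjugation by an invertible $P$ commutes with the ordinary matrix product but not with the componentwise (Hadamard) product, written `*` in NumPy.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])
P = np.array([[1.0, 1.0], [0.0, 1.0]])  # an invertible change of basis
Pinv = np.linalg.inv(P)

# Ordinary matrix product respects the change of basis:
assert np.allclose((Pinv @ A @ P) @ (Pinv @ B @ P), Pinv @ (A @ B) @ P)

# The componentwise (Hadamard) product does not:
assert not np.allclose((Pinv @ A @ P) * (Pinv @ B @ P), Pinv @ (A * B) @ P)
```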

Answer:

You shouldn't try to think in terms of scalars and fit matrices into that way of thinking. It's exactly like the relationship between real and complex numbers: it's difficult to build intuition about complex operations if you try to think in terms of real operations.

Scalars are a special case of matrices, just as real numbers are a special case of complex numbers.

So you need to look at it from the other, more abstract side. If you think about real operations in terms of complex operations, they make complete sense (they are a simple case of the complex operations).

The same is true for matrices and scalars. Think in terms of matrix operations, and you will see that the scalar operations are a simple (special) case of the corresponding matrix operations.

Answer:

One way to try to "understand" it is to think of the two factors in a product of numbers as two different entities: one is multiplying and the other is being multiplied. For example, in $5\cdot4$, $5$ is the one multiplying, and $4$ is the one being multiplied. In terms of repetitive addition, you are adding $4$ to itself $5$ times: you are doing something to $4$ to get another number, and that something is characterized by the number $5$. We forbid the interpretation that you are adding $5$ to itself $4$ times here, because $5\cdot$ is an action associated with the number $5$; it is not itself a number. Now, what happens if you multiply $4$ by $5$, and then multiply the result by $3$? In other words,

$3\cdot(5\cdot4)=?$

or more generally,

$3\cdot(5\cdot x)=?$

when we want to vary the number being multiplied. Suppose, for example, that we need to multiply many different numbers $x$ by $5$ first and then by $3$, and we want to simplify this process or find a way to compute it quickly. Without going into the details, we all know the above is equal to

$3\cdot(5\cdot x)=(3\times 5)\cdot x$,

where $3\times 5$ is just the ordinary product of two numbers, but I used different notation to emphasize that this time we are multiplying "multipliers" together: the result of this operation is an entity that multiplies other numbers, in contrast to the result of $5\cdot4$, which is an entity that gets multiplied. Note that in some sense, in $3\times5$ the numbers $5$ and $3$ participate on an equal footing, while in $5\cdot4$ the numbers play different roles. For the operation $\times$ no interpretation is available in terms of repetitive addition, because we have two actions, not an action and a number. In linear algebra, the entities that get multiplied are vectors, the "multiplier" objects are matrices, the operation $\cdot$ generalizes to the matrix-vector product, and the operation $\times$ extends to the product between matrices.
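The analogy can be sketched in NumPy (the particular matrices are arbitrary examples): for matrices $A$, $B$ and a vector $x$ we have $A(Bx)=(AB)x$, and scalars reappear as the $1\times 1$ case.

```python
import numpy as np

# The scalar identity 3·(5·x) = (3×5)·x generalizes: "multipliers" compose
# into a new multiplier, A(Bx) = (AB)x.
A = np.array([[3.0, 0.0], [1.0, 3.0]])
B = np.array([[5.0, 2.0], [0.0, 5.0]])
x = np.array([1.0, 4.0])

assert np.allclose(A @ (B @ x), (A @ B) @ x)

# Scalars are the 1x1 case: "3" as a multiplier is the matrix [[3]].
three, five = np.array([[3.0]]), np.array([[5.0]])
assert np.allclose(three @ (five @ np.array([4.0])), np.array([60.0]))
```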

Answer:

To get away from how these two kinds of multiplication are implemented (repeated addition for numbers vs. row/column dot products for matrices) and how they behave symbolically/algebraically (associativity, distributivity over 'addition', annihilators, etc.), I'd like to answer in the manner of 'what are they good for?'

Multiplication of numbers gives area of a rectangle with sides given by the multiplicands (or number of total points if thinking discretely).

Multiplication of matrices (since it involves quite a few more actual numbers than simple multiplication does) is understandably quite a bit more complex. Though the composition of linear functions (as mentioned elsewhere) is the essence of it, that's not the most intuitive description (to someone without the abstract-algebra experience). A more visual intuition is that one matrix, multiplying another, transforms a set of points (the columns of the right-hand matrix) into a new set of points (the columns of the resulting matrix). That is, take a set of $n$ points in $n$ dimensions and put them as columns in an $n\times n$ matrix; if you multiply this from the left by another $n \times n$ matrix, you'll transform that 'cloud' of points into a new cloud.

This transformation isn't some random thing: it might rotate the cloud, it might expand the cloud, it might collapse the cloud onto a line (into fewer dimensions), but it won't 'translate' the cloud. And it transforms the whole cloud all at once, smoothly (near points get mapped to near points).
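A small NumPy sketch of this picture (the point cloud and matrices are made-up examples): a rotation matrix moves the whole cloud at once, the origin never moves, and a rank-1 matrix collapses the cloud onto a line.

```python
import numpy as np

# Columns of X are points of a "cloud" in the plane:
# here, the corners of the unit square.
X = np.array([[0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation by 90 degrees

Y = R @ X  # the whole cloud, rotated at once

# The origin stays fixed: a matrix never translates the cloud.
assert np.allclose(Y[:, 0], [0.0, 0.0])

# A rank-1 matrix collapses the cloud onto a line.
C = np.array([[1.0, 1.0],
              [2.0, 2.0]])
Z = C @ X
assert np.allclose(Z[1, :], 2 * Z[0, :])  # every image lies on y = 2x
```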

So that is one way of getting the 'meaning' of matrix multiplication.

I have a hard time finding a good metaphor (any metaphor) between matrix multiplication and simple numerical multiplication, so I won't force it; hopefully someone else can come up with a better visualization that shows how they're alike beyond sharing some algebraic properties.

Answer:

First, understand multiplication of a vector by a scalar.

Then, think of a matrix multiplied by a vector. The matrix is a "vector of vectors".

Finally, matrix × matrix extends the former concept.
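The three steps can be sketched in NumPy (with arbitrary example values): scalar × vector, then matrix × vector as a stack of dot products, then matrix × matrix as the previous step applied column by column.

```python
import numpy as np

v = np.array([1.0, 2.0])

# Step 1: a scalar times a vector.
assert np.allclose(3 * v, [3.0, 6.0])

# Step 2: a matrix times a vector -- each row of the matrix
# takes a dot product with v.
M = np.array([[1.0, 0.0],
              [1.0, 1.0]])
assert np.allclose(M @ v, [np.dot(M[0], v), np.dot(M[1], v)])

# Step 3: a matrix times a matrix -- the right factor is a
# "vector of column vectors", each multiplied as in step 2.
N = np.array([[2.0, 1.0],
              [0.0, 2.0]])
cols = [M @ N[:, j] for j in range(N.shape[1])]
assert np.allclose(M @ N, np.column_stack(cols))
```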