Why, intuitively, is the order reversed when taking the transpose of the product?


It is well known that for invertible matrices $A,B$ of the same size we have $$(AB)^{-1}=B^{-1}A^{-1} $$ and a nice way for me to remember this is the following sentence:

The opposite of putting on socks and shoes is taking the shoes off, followed by taking the socks off.
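(As a quick numerical sanity check of the inverse identity, not a proof — the matrices and seed below are arbitrary; a random square Gaussian matrix is invertible with probability 1:)

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))  # invertible with probability 1
B = rng.standard_normal((3, 3))

# The inverse of AB is B^{-1} A^{-1}: undo B first, like shoes before socks.
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
ok = bool(np.allclose(lhs, rhs))
```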

Now, a similar law holds for the transpose, namely:

$$(AB)^T=B^TA^T $$

for matrices $A,B$ such that the product $AB$ is defined. My question is: is there any intuitive reason as to why the order of the factors is reversed in this case?
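(A quick numerical illustration of the identity itself, with arbitrary non-square shapes chosen so that the order reversal is forced — $A^T B^T$ is not even defined here:)

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

lhs = (A @ B).T          # 4 x 2
rhs = B.T @ A.T          # (4x3) @ (3x2): only this order is defined
ok = bool(np.allclose(lhs, rhs))
```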

[Note that I'm aware of several proofs of this equality, and a proof is not what I'm after]

Thank you!

9 Answers

Accepted answer (score 12)

One of my best college math professors always said:

Make a drawing first.

[animated drawing (video demo) omitted]

Though he couldn't have made this one on the blackboard.

Answer (score 6)

By dualizing $AB: V_1\stackrel{B}{\longrightarrow} V_2\stackrel{A}{\longrightarrow}V_3$, we have $(AB)^T: V_3^*\stackrel{A^T}{\longrightarrow}V_2^*\stackrel{B^T}{\longrightarrow}V_1^*$.

Edit: $V^*$ is the dual space $\text{Hom}(V, \mathbb{F})$, the vector space of linear transformations from $V$ to its ground field, and if $A: V_1\to V_2$ is a linear transformation, then $A^T: V_2^*\to V_1^*$ is its dual defined by $A^T(f)=f\circ A$. By abuse of notation, if $A$ is the matrix representation with respect to bases $\mathcal{B}_1$ of $V_1$ and $\mathcal{B}_2$ of $V_2$, then $A^T$ is the matrix representation of the dual map with respect to the dual bases $\mathcal{B}_1^*$ and $\mathcal{B}_2^*$.
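In coordinates, the dual map really is given by the transposed matrix. A small NumPy illustration (not part of the original answer): representing a functional $f$ on $V_2$ by its coefficient row vector, the functional $f\circ A$ has row vector $fA$, i.e. coefficient column $A^T f$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))   # linear map V1 (dim 2) -> V2 (dim 3)
f = rng.standard_normal(3)        # a functional on V2, as a row vector

# The dual map sends f to f∘A, a functional on V1, with row vector f @ A.
fA_row = f @ A

# In the dual bases, the dual map acts on the coordinate column of f
# as multiplication by A.T:
fA_col = A.T @ f
ok = bool(np.allclose(fA_row, fA_col))
```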

Answer (score 1)

Here's another argument. First note that if $v$ is a column vector then $(Mv)^T = v^T M^T$. This is not hard to see - if you write down an example and do it both ways, you will see you are just doing the same computation with a different notation. Multiplying the column vector $v$ on the right by the rows of $M$ is the same as multiplying the row vector $v^T$ on the left by the columns of $M^T$.

Now let $( \cdot , \cdot )$ be the usual inner product on $\mathbb{R}^n$, that is, the dot product. Then the transpose $N = M^T$ of a matrix $M$ is the unique matrix $N$ with the property

$$(Mu, v) = (u, Nv).$$

This is just a consequence of associativity of matrix multiplication. The dot product of vectors $u,v$ is given by thinking of $u,v$ as column vectors, transposing the first, and multiplying: $(u,v) = u^T v$.

Then $(Mu,v) = (Mu)^T v = (u^T M^T) v = u^T (M^Tv) = (u, M^Tv)$.

Exercise: Show uniqueness!

With this alternate definition we can give a shoes-and-socks argument. We have

$$( ABu, v) = (Bu, A^Tv) = (u, B^TA^Tv)$$

for all $u,v$, and so $(AB)^T = B^T A^T$. The argument is exactly the same as the one for inverses, except we are "moving across the inner product" instead of "undoing".
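The "moving across the inner product" steps can be checked numerically (a sketch with arbitrary random matrices and vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
u = rng.standard_normal(5)
v = rng.standard_normal(4)

lhs = np.dot(A @ B @ u, v)        # (ABu, v)
mid = np.dot(B @ u, A.T @ v)      # (Bu, A^T v): move A across
rhs = np.dot(u, B.T @ A.T @ v)    # (u, B^T A^T v): then move B across
ok = bool(np.isclose(lhs, mid) and np.isclose(mid, rhs))
```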

Answer (score 0)

(Short form of Jair Taylor's answer)

In the expression $v^T A B w$, the vectors $v$ and $w$ "see" $A$ and $B$ from different ends, hence in different order.

Answer (score 2)

The intuitive reason is that the entries of a product matrix are Feynman path integrals, and transposing the matrices corresponds simply to reversing the arrow of time for traveling along the paths.

(So it's practically the same idea as in your shoes-and-socks example: matrix transposition is about time-reversal, just like function inversion is about time-reversal.)

The $(i,k)$th entry in a product matrix $AB$ is the sum over $j$ of $A_{ij} B_{jk}$. In other words, it's a sum over all "2-step paths" $(i,j,k)$ from $i$ to $k$, each path visiting one intermediate point $j$ on its way from $i$ to $k$.

This sum over paths is called a "Feynman path integral". If you read Feynman's original paper on the subject, focusing on the parts that are easy to understand, you'll see that this was Feynman's basic message: whenever you have a big long string of matrices to multiply, each entry in the product matrix is a "sum over paths", aka "path integral", with the contribution of each particular path being a long product of "transition quantities", each associated with one transition-step along the path.

This "path" interpretation of matrix multiplication actually gets more intuitive for longer strings of matrices, because then each path consists of many steps. For example, each entry of a matrix product $AB\cdots Z$ of 26 matrices is a sum over 26-step paths; each path visits 27 points but has just 26 transition-steps from one point to the next.
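The path-sum description is easy to check directly: summing the product of transition quantities over all paths reproduces the matrix product. A small sketch with three matrices (shapes arbitrary):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
a = rng.standard_normal((2, 3))
b = rng.standard_normal((3, 4))
c = rng.standard_normal((4, 2))

# Entry (i, l) of abc: sum over all 3-step paths i -> j -> k -> l,
# weighting each path by the product of its transition quantities.
path_sum = np.zeros((2, 2))
for i, l in product(range(2), range(2)):
    path_sum[i, l] = sum(a[i, j] * b[j, k] * c[k, l]
                         for j, k in product(range(3), range(4)))

ok = bool(np.allclose(path_sum, a @ b @ c))
```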

Answer (score 0)

A matrix is a collection of entries that may be represented with 2 indices. When we multiply two matrices, each resultant entry is the sum of the products

$$C_{ik} = \sum_j A_{ij} B_{jk} $$

Crucially, the 'middle' index, $j$, must be the same for both matrices (the first must be as wide as the second is tall).

A transpose is just a reversal of indices:

$$(A^T)_{ij} = A_{ji}$$

It should now go without saying that

$$(C^T)_{ik} = C_{ki} = \sum_j A_{kj} B_{ji} = \sum_j (B^T)_{ij} (A^T)_{jk} = (B^T A^T)_{ik}$$

Memory shortcut: with non-square matrices, the multiplication fails immediately (the dimensions don't match) if you forget to reverse the order when taking the transpose.
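Spelling the index computation out in code (an illustration; shapes arbitrary): building $(C^T)_{ik} = \sum_j B_{ji} A_{kj}$ entry by entry reproduces both `C.T` and `B.T @ A.T`.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = A @ B                      # C_{ik} = sum_j A_{ij} B_{jk}

# (C^T)_{ik} = C_{ki} = sum_j B_{ji} A_{kj} = (B^T A^T)_{ik}
CT = np.empty((4, 2))
for i in range(4):
    for k in range(2):
        CT[i, k] = sum(B[j, i] * A[k, j] for j in range(3))

ok = bool(np.allclose(CT, C.T) and np.allclose(CT, B.T @ A.T))
```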

Answer (score 3)

Each element of the matrix $AB$ is the inner product of a row of $A$ with a column of $B$.

$(AB)^T$ has the same elements that $AB$ does (just in different places), so its elements too must each come from a row of $A$ and a column of $B$.

However, if we want to start with $A^T$ and $B^T$, then a row of $A$ is the same thing as a column of $A^T$ (and vice versa for $B$ and $B^T$), so we need something that takes columns of $A^T$ and rows of $B^T$. The matrix that we take columns from is always the right factor, so $A^T$ must be the right factor in the multiplication.

Similarly, $B^T$ must be the left factor because we need its rows (which are columns of the original $B$).

Answer (score 0)

[This is an attempt to combine two previously given answers, mdup's video demo and my "path-sum" story, so it might help to refer to those.]

After watching mdup's video demo I started wondering how it relates to the "path-sum" interpretation of matrix multiplication. The key seems to be that mdup's hand-drawn picture of the matrix product $AB$ wants to be folded up to form the visible faces of an oblong box whose three dimensions correspond precisely to the points $i$, $j$, and $k$ in a three-point path $(i,j,k)$. This is illustrated by the pairs of pictures below, each pair showing the oblong box first in its folded-up 3-dimensional form and then in its flattened-out 2-dimensional form. In each case the box is held up to a mirror to portray the effect of transposition of matrices.

In the first pair of pictures, the $i$, $j$, and $k$ axes are marked, and in the folded-up 3-dimensional form you can see how transposition reverses the order of the axes from $i,j,k$ to $k,j,i$. In the flattened-out 2-dimensional form you can see how it wants to be folded up, because the edges marked $j$ are all the same length (and also because it was folded up like that when I bought the soap).

[photos: the oblong box, folded-up and flattened-out, held to a mirror (source: ucr.edu)]

The second pair of pictures indicates how an entry of the product matrix is calculated. In the flattened-out 2-dimensional form, a row of the first matrix is paired with a column of the second matrix, whereas in the folded-up 3-dimensional form, that "row" and that "column" actually lie parallel to each other because of the 3d arrangement.

[photos: calculating a product entry in the folded-up and flattened-out forms (source: ucr.edu)]

In other words, each 3-point path $(i,j,k)$ corresponds to a location inside the box, and at that location you write down (using a 3-dimensional printer, or else just writing on the air) the product of the transition-quantities for the two transition-steps in the path: $A_{ij}$ for the transition-step from $i$ to $j$, and $B_{jk}$ for the transition-step from $j$ to $k$. This results in a 3-dimensional array of numbers written on the air inside the box, but since the desired matrix product $AB$ is only a 2-dimensional matrix, the 3-dimensional array is squashed down to 2 dimensions by summing over the $j$ dimension. This is the path-sum: in order for two paths to contribute to the same path-sum they're required to be in direct competition with each other, beginning at the same origin $i$ and ending at the same destination $k$, so the only index that we sum over is the intermediate index $j$.

The 3-dimensional folded-up form and the 2-dimensional flattened-out form each have their own advantages and disadvantages. The 3-dimensional folded-up form brings out the path-sums and the 3-dimensional nature of matrix multiplication, while the 2-dimensional flattened-out form is better adapted to writing the calculation down on 2-dimensional paper (which remains easier than writing on 3-dimensional air, even still today).

Anyway, I'll get off my soapbox for now ...

Answer (score 0)

Considering the dimensions of the various matrices shows that reversing the order is necessary.

If $A$ is $m \times p$ and $B$ is $p \times n$, then $AB$ is $m \times n$, so $(AB)^T$ is $n \times m$.

Now $A^T$ is $p \times m$ and $B^T$ is $n \times p$. Thus $B^T A^T$ is $n \times m$, the same dimension as $(AB)^T$, while the product $A^T B^T$ is not even defined unless $m = n$.
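This dimension bookkeeping can be sketched in NumPy (the values of $m, p, n$ are arbitrary):

```python
import numpy as np

m, p, n = 2, 3, 4
A = np.ones((m, p))
B = np.ones((p, n))

shape_ABT = (A @ B).T.shape      # (n, m)
shape_BTAT = (B.T @ A.T).shape   # n x p times p x m gives (n, m)
match = shape_ABT == shape_BTAT == (n, m)

# A^T @ B^T would be p x m times n x p, undefined unless m == n:
other_order_defined = (A.T.shape[1] == B.T.shape[0])   # m == n, False here
```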