A confusion in the proof of $(AB)^T = (B^T A^T)$


Let $A = (a_{i,j})_{n\times n}$ and $B = (b_{i,j})_{n\times n}$

$(AB) = (c_{i,j})_{n\times n}$, where $c_{i,j} = \sum_{k=1}^n a_{i,k} b_{k,j}$, so

$(AB)^T = (c_{j,i})$, where $c_{j,i} = \sum_{k=1}^n a_{j,k}b_{k,i} $, and $B^T = b_{j,i}$ and $A^T = a_{j,i}$, so

$B^T A^T = d_{j,i}$ where $d_{j,i} = \sum_{k=1}^n b_{j,k} a_{k,i}$, but this means that $(AB)^T \neq (B^T A^T)$, so where is the problem in this derivation?

Edit: To be clear, let's be more precise. Let $A = (a_{x,y})_{p\times n}$ and $B = (b_{z,t})_{n\times q}$

So, $A^T_{n\times p} = (a_{y,x})$ and $B^T_{q\times n} = (b_{t,z})$, which implies

$$(B^T A^T)_{i,j}^{q \times p} = \sum_{k=1}^n b_{i,k} a_{k,j},$$ and

$(AB)_{c,d}^{p\times q} = \sum_{k=1}^n a_{c,k} b_{k,d}$, which implies $$((AB)^T)_{d,c}^{q\times p} = \sum_{k=1}^n a_{d,k} b_{k,c}.$$ Since $i,d \in \{1,\dots,q\}$ and $j,c \in \{1,\dots,p\}$, $$((AB)^T)_{d,c}^{q\times p} = \sum_{k=1}^n a_{d,k} b_{k,c} = \sum_{k=1}^n a_{i,k} b_{k,j},$$ which again concludes that $(AB)^T \neq (B^T A^T)$.

4

There are 4 best solutions below

0
On BEST ANSWER

You seem to know that the $(i,j)$-entry of $B^T$ is $b_{j,i}$, which is probably why you write $B^T = (b_{j,i})$. The issue is that this notation does not tell you which index denotes the row and which denotes the column. I guess this is where you get confused.

To make things clear, let us use some unconventional notation: A matrix $A$ whose $(i,j)$-entry is $a_{i,j}$ is denoted by the following function notation

$$A = [(i,j)\mapsto a_{i,j}].$$

So if you know that $A$ is given by $A = [\text{some function of the pair }(i, j)]$, then you simply evaluate that function at $(i,j)$ to retrieve its $(i,j)$-entry. This seemingly stupid tautology does in fact help, because the transpose of $A$ is written as $A^T = [(i,j) \mapsto a_{j,i}]$, where the roles of $i$ and $j$ are now explicit. Then

\begin{align*} \text{$(i,j)$-entry of $B^TA^T$} &= \sum_{k=1}^{n} (\text{$(i,k)$-entry of $B^T$})\cdot(\text{$(k,j)$-entry of $A^T$}) \\ &= \sum_{k=1}^{n} (\text{value of $(x, y)\mapsto b_{y,x}$ at $(x,y) = (i,k)$}) \\ &\hspace{3em} \cdot(\text{value of $(z, t)\mapsto a_{t,z}$ at $(z,t) = (k,j)$}) \\ &= \sum_{k=1}^{n} b_{k,i} a_{j,k} = \sum_{k=1}^{n} a_{j,k}b_{k,i} = \text{$(j,i)$-entry of $AB$}. \end{align*}
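The function notation above translates almost literally into code. Here is a minimal sketch in plain Python, assuming 1-based indices and arbitrary sample entries; the helper names `entry_fn`, `transpose` and `product` are made up for this illustration and are not from any library.

```python
# Matrices as functions of a 1-based index pair (i, j),
# mirroring the notation A = [(i, j) -> a_{i,j}].
# The sample entries are arbitrary illustration values.
A_rows = [[1, 2, 3],
          [4, 5, 6]]          # 2 x 3
B_rows = [[7, 8],
          [9, 10],
          [11, 12]]           # 3 x 2
n = 3  # shared inner dimension


def entry_fn(rows):
    """Return the function (i, j) -> rows[i-1][j-1]."""
    return lambda i, j: rows[i - 1][j - 1]


def transpose(f):
    """Transposing swaps the roles of the two indices: (i, j) -> f(j, i)."""
    return lambda i, j: f(j, i)


def product(f, g, inner):
    """(i, j) -> sum over k = 1..inner of f(i, k) * g(k, j)."""
    return lambda i, j: sum(f(i, k) * g(k, j) for k in range(1, inner + 1))


A = entry_fn(A_rows)
B = entry_fn(B_rows)

AB = product(A, B, n)                          # 2 x 2
BtAt = product(transpose(B), transpose(A), n)  # 2 x 2

# The (i, j)-entry of B^T A^T should equal the (j, i)-entry of AB.
all_match = all(BtAt(i, j) == AB(j, i) for i in (1, 2) for j in (1, 2))
print(all_match)
```

Because `transpose` only ever swaps the arguments, there is no way to lose track of which index is the row and which is the column.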


If it is still not convincing, it never hurts to consider a concrete example. Consider

$$ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{pmatrix} $$

Then $[AB]_{11} = a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31}$ as expected. Now consider their transposes:

$$ A^{T} = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{pmatrix}, \qquad B^{T} = \begin{pmatrix} b_{11} & b_{21} & b_{31} \\ b_{12} & b_{22} & b_{32} \end{pmatrix}. $$

Now the $(1,2)$-entry of the product $B^T A^T$ is given by

\begin{align*} [B^T A^T]_{12} &= [B^T]_{11}[A^T]_{12} + [B^T]_{12}[A^T]_{22} + [B^T]_{13}[A^T]_{32} \\ &= b_{11}a_{21} + b_{21}a_{22} + b_{31}a_{23} \\ &= a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} \\ &= [AB]_{21} \end{align*}

13
On

OK, your notation is confusing. It's very hard to understand what you mean by $(c_{j,i})$, so let's start over.


Let's call $AB = C$, and let's call $B^TA^T=D$. What we want to prove is $C^T=D$.

First of all, let's say that the element in $C$'s $i$-th row and $j$-th column is $c_{ij}$. Then you know that $$c_{ij} = \sum_{k=1}^n a_{ik}b_{kj}.$$

Now, let's say that the element in $D$'s $i$-th row and $j$-th column is $d_{ij}$. Then we know that $$d_{ij} = \sum_{k=1}^n b'_{ik}a'_{kj}$$

where $b'_{ik}$ is the $(i,k)$-entry of $B^T$ and $a'_{kj}$ is the $(k,j)$-entry of $A^T$.

Next, since that's what transposition is, we know that $b'_{ik} = b_{ki}$ and $a'_{kj} = a_{jk}$, which means that $$d_{ij} = \sum_{k=1}^n b_{ki}a_{jk}$$


Now you have to show that $d_{ij} = c_{ji}$ for all $i,j$. Since you know what $c_{ij}$ is equal to, you know that $c_{ji}$ is equal (if you replace every $i$ with $j$ and every $j$ with $i$) to

$$c_{ji} = \sum_{k=1}^n a_{jk}b_{ki}$$ which is exactly the same as $d_{ij}$ and you are done.


If you don't like the fact that we switch $i$ and $j$, you can also introduce new variables to make it more clear:

We want to prove that $C^T=D$, which means we want to prove that $c_{qp} = d_{pq}$ for all $q,p$ (I am using different indices to avoid confusion). Since we know what $c_{ij}$ and $d_{ij}$ are for any values of $i,j$, we can substitute $i=q$, $j=p$ in the formula for $c_{ij}$ and $i=p$, $j=q$ in the formula for $d_{ij}$ to get

$$c_{qp} = \sum_{k=1}^n a_{qk} b_{kp}$$

and

$$d_{pq} = \sum_{k=1}^n b_{kp} a_{qk}.$$

Now it's easy to see that $c_{qp}=d_{pq}$.


AFTER EDIT:


After your edit, the mistake comes in the line

$$(B^T A^T)_{i,j}^{q \times p} = \sum_{k=1}^n b_{i,k} a_{k,j}$$

because that is not true. In fact, $$\sum_{k=1}^n b_{i,k} a_{k,j}$$ is equal to $$(BA)_{i,j}.$$

Since you want $(B^TA^T)_{i,j}$, you want

$$(B^T A^T)_{i,j}^{q \times p} = \sum_{k=1}^n b'_{i,k} a'_{k,j}$$

where $b'_{i,k} = (B^T)_{i,k}$ and $a'_{k,j} = (A^T)_{k,j}$.

Then, you use the facts that $$b'_{i,k} = b_{k,i}$$ and $a'_{k,j} = a_{j,k}$ to get the correct result, which is

$$(B^T A^T)_{i,j}^{q \times p} = \sum_{k=1}^n b_{k,i} a_{j,k}$$
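To see the difference between the two sums numerically, here is a small sketch in plain Python. It assumes square $2\times 2$ matrices with arbitrary sample entries (so that both sums are defined) and 0-based indices; `matmul` and `transpose` are helper names made up for this illustration.

```python
# Square 2 x 2 matrices with arbitrary illustration values (0-based indices).
A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [5, 6]]
n = 2


def matmul(X, Y):
    """Standard product: (XY)[i][j] = sum_k X[i][k] * Y[k][j]."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Rows become columns."""
    return [list(col) for col in zip(*X)]


# The sum from the question: sum_k b_{i,k} a_{k,j}.
question_sum = [[sum(B[i][k] * A[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

# The corrected sum: sum_k b_{k,i} a_{j,k}.
corrected_sum = [[sum(B[k][i] * A[j][k] for k in range(n)) for j in range(n)]
                 for i in range(n)]

ABt = transpose(matmul(A, B))

print(question_sum == matmul(B, A))                          # question's sum is BA
print(corrected_sum == matmul(transpose(B), transpose(A)))   # corrected sum is B^T A^T
print(corrected_sum == ABt)                                  # which matches (AB)^T
print(question_sum == ABt)                                   # BA does not, for this sample
```

For these sample matrices $BA$ and $(AB)^T$ differ, so the mismatch in the question comes from computing $BA$ rather than $B^TA^T$.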

4
On

Why are you making things so difficult?

Let $A$ be an $m \times n$ matrix and $B$ be an $n \times p$ matrix.

\begin{align*} (B^TA^T)_{ij} &= \sum_{k=1}^{n}(B^{T})_{ik}(A^T)_{kj} \\ &= \sum_{k=1}^{n}B_{ki}A_{jk} \\ &= \sum_{k=1}^{n}A_{jk}B_{ki} \\ &=(AB)_{ji} \\ &=((AB)^{T})_{ij} \end{align*}

Therefore we conclude

$$(AB)^T = B^T A^T$$
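If you want a quick sanity check of the rectangular case, here is a sketch in plain Python. The sizes $m=2$, $n=3$, $p=4$, the seed, and the helper names are arbitrary choices for this illustration. Integer entries are used so the comparison is exact; this only checks one random sample and is not a substitute for the proof.

```python
import random


def matmul(X, Y):
    """Standard product: (XY)[i][j] = sum_k X[i][k] * Y[k][j]."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Rows become columns."""
    return [list(col) for col in zip(*X)]


random.seed(0)          # fixed seed so the run is repeatable
m, n, p = 2, 3, 4       # A is m x n, B is n x p

A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(m)]
B = [[random.randint(-5, 5) for _ in range(p)] for _ in range(n)]

lhs = transpose(matmul(A, B))               # (AB)^T, a p x m matrix
rhs = matmul(transpose(B), transpose(A))    # B^T A^T, also p x m

identity_holds = lhs == rhs
print(identity_holds)
```

Note that $A^TB^T$ would not even be defined here, since $A^T$ is $n\times m$ and $B^T$ is $p\times n$; the shapes alone already force the order $B^TA^T$.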

0
On

In the world of real vector spaces, one can define $A^T$ to be the adjoint of $A$ with respect to the Euclidean inner product $\langle \cdot,\cdot \rangle$ (this adjoint is unique). More precisely, $A^T$ is the unique linear mapping such that $$\langle Ax,y\rangle = \langle x,A^Ty\rangle \qquad \forall x,y.$$

Similarly, $B^T$ and $(AB)^T$ are the unique linear mappings such that $$\langle Bx,y\rangle = \langle x,B^Ty\rangle \qquad \forall x,y$$ and $$\langle (AB)x,y\rangle = \langle x,(AB)^Ty\rangle \qquad \forall x,y.$$

Now, note that $$ \langle (AB)x,y\rangle=\langle A(Bx),y\rangle=\langle Bx,A^Ty\rangle=\langle x,B^TA^Ty\rangle=\langle x,(B^TA^T)y\rangle.$$ By uniqueness of the adjoint, we directly obtain $(AB)^T=B^TA^T$.

The advantage of this proof is that it holds for the adjoint operator of matrices defined with respect to any inner product. Doing the same for complex vector spaces, we obtain $(AB)^*=B^*A^*$, where $A^*$ is the Hermitian conjugate of $A$.

Note: To obtain uniqueness of the adjoint, simply plug in all the pairs of vectors taken from an orthogonal basis for $x,y$ in the above equations.
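The chain of inner-product equalities can also be checked on a concrete example. Below is a sketch in plain Python, assuming the Euclidean inner product and arbitrary integer sample data; `matvec`, `dot` and `transpose` are helper names made up for this illustration. It verifies the identity for one pair of vectors only, so it illustrates the argument rather than proving it.

```python
def matvec(M, v):
    """Matrix-vector product: (Mv)[i] = sum_k M[i][k] * v[k]."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def transpose(X):
    """Rows become columns."""
    return [list(col) for col in zip(*X)]


# Arbitrary illustration values: A is 2 x 3, B is 3 x 2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [2, -1],
     [0, 3]]
x = [1, -2]   # B acts on x
y = [3, 4]    # A(Bx) is paired with y

# <A(Bx), y>
left = dot(matvec(A, matvec(B, x)), y)

# <Bx, A^T y>: move A across the inner product
middle = dot(matvec(B, x), matvec(transpose(A), y))

# <x, B^T (A^T y)>: move B across as well
right = dot(x, matvec(transpose(B), matvec(transpose(A), y)))

print(left, middle, right)
```

Each step moves one matrix to the other side of the inner product, which is why $A^T$ is applied to $y$ first and $B^T$ second.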