If $AB = I$ then $BA = I$


If $A$ and $B$ are square matrices such that $AB = I$, where $I$ is the identity matrix, show that $BA = I$.

I do not understand anything more than the following.

  1. Elementary row operations.
  2. Linear dependence.
  3. Row reduced forms and their relations with the original matrix.

If the entries of the matrix are not from a mathematical structure which supports commutativity, what can we say about this problem?

P.S.: Please avoid using the transpose and/or inverse of a matrix.

There are 17 answers below.

Best answer (score 9)

Dilawar says in 2. that he knows linear dependence! So I will give a proof, similar to that of TheMachineCharmer, which uses linear independence.

Suppose each matrix is $n$ by $n$. We consider our matrices to all be acting on some $n$-dimensional vector space with a chosen basis (hence isomorphism between linear transformations and $n$ by $n$ matrices).

Then $AB$ has range equal to the full space, since $AB=I$. Thus the range of $B$ must also have dimension $n$. For if it did not, then a set of $n-1$ vectors would span the range of $B$, so the range of $AB$, which is the image under $A$ of the range of $B$, would also be spanned by a set of $n-1$ vectors, hence would have dimension less than $n$.

Now note that $B=BI=B(AB)=(BA)B$. By the distributive law, $(I-BA)B=0$. Thus, since $B$ has full range, the matrix $I-BA$ gives $0$ on all vectors. But this means that it must be the $0$ matrix, so $I=BA$.
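The chain $B = B(AB) = (BA)B$ and the conclusion can be sanity-checked on a concrete pair of matrices. Below is a minimal Python sketch with a hypothetical $2\times 2$ example (the matrices $A$, $B$ are chosen for illustration and are not part of the argument), using exact integer arithmetic:

```python
# Sanity check of the argument on a hypothetical 2x2 example with AB = I.

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I = [[1, 0], [0, 1]]
A = [[1, 2], [3, 5]]
B = [[-5, 2], [3, -1]]   # chosen so that AB = I exactly

assert matmul(A, B) == I                 # the hypothesis AB = I
assert matmul(B, matmul(A, B)) == B      # B = B(AB) = (BA)B
assert matmul(B, A) == I                 # the conclusion BA = I
```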

Answer (score 0)

We have the following general assertion:

Lemma. Let $A$ be a finite-dimensional $K$-algebra, and $a,b \in A$. If $ab=1$, then $ba=1$.

For example, $A$ could be the algebra of $n \times n$ matrices over $K$.

Proof. The sequence of subspaces $\cdots \subseteq b^{k+1} A \subseteq b^k A \subseteq \cdots \subseteq A$ must be stationary, since $A$ is finite-dimensional. Thus there is some $k$ with $b^{k+1} A = b^k A$, so there is some $c \in A$ such that $b^k = b^{k+1} c$. Now multiply by $a^k$ on the left to get $1=bc$. Then $ba = ba \cdot 1 = ba(bc) = b(ab)c = bc = 1$. $\square$

The proof also works in every left- or right-Artinian ring $A$. In particular, the statement is true in every finite ring.

Note that some finiteness condition is essential here: there is no purely algebraic manipulation with $a,b$ that shows $ab = 1 \Rightarrow ba=1$.

In fact, there is a $K$-algebra with two elements $a,b$ such that $ab=1$, but $ba \neq 1$. Consider the left shift $a : K^{\mathbb{N}} \to K^{\mathbb{N}}$, $a(x_0,x_1,\dotsc) := (x_1,x_2,\dotsc)$ and the right shift $b(x_0,x_1,\dotsc) := (0,x_0,x_1,\dotsc)$. Then $a \circ b = \mathrm{id} \neq b \circ a$ holds in the $K$-algebra $\mathrm{End}_K(K^{\mathbb{N}})$.

See SE/298791 for a proof of $AB=1 \Rightarrow BA=1$ for square matrices over a commutative ring.
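The shift counterexample can be exhibited concretely. Here is a minimal Python sketch acting on sequences truncated to finite lists (a truncation of $K^{\mathbb{N}}$, which is enough to show $a \circ b = \mathrm{id}$ while $b \circ a$ loses information):

```python
# Left shift a and right shift b, acting here on finite lists.

def a(x):          # left shift: (x0, x1, x2, ...) -> (x1, x2, ...)
    return x[1:]

def b(x):          # right shift: (x0, x1, ...) -> (0, x0, x1, ...)
    return [0] + x

x = [7, 1, 4]
assert a(b(x)) == x              # a after b is the identity
assert b(a(x)) == [0, 1, 4]      # b after a replaced x0 = 7 by 0
assert b(a(x)) != x              # so ba != 1
```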

Answer (score 5)

Since there seem to be some lingering beliefs that an approach which does not make explicit use of the finite-dimensionality could be valid, here is a familiar counterexample in the infinite dimensional case.

Let $V = \mathbb{R}[t]$ be the vector space of real polynomial functions. Let $B: V \rightarrow V$ be differentiation, $p \mapsto p'$, and let $A: V \rightarrow V$ be anti-differentiation with constant term zero: $\sum_{n=0}^{\infty} a_n t^n \mapsto \sum_{n=0}^{\infty} \frac{a_n}{n+1} t^{n+1}$.

These are both $\mathbb{R}$-linear maps and $B \circ A$ is the identity, but $A \circ B$ is not (the constant term is lost).
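This counterexample can be checked with polynomials represented as coefficient lists; a short Python sketch (using exact rational arithmetic, with `A` and `B` named as in the answer):

```python
from fractions import Fraction

# Polynomials as coefficient lists [a0, a1, a2, ...] for a0 + a1*t + a2*t^2 + ...

def B(p):    # differentiation
    return [Fraction(n) * p[n] for n in range(1, len(p))]

def A(p):    # anti-differentiation with constant term zero
    return [Fraction(0)] + [p[n] / (n + 1) for n in range(len(p))]

p = [Fraction(5), Fraction(0), Fraction(3)]                 # 5 + 3 t^2
assert B(A(p)) == p                                         # B after A is the identity
assert A(B(p)) == [Fraction(0), Fraction(0), Fraction(3)]   # the constant term 5 is lost
```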

Answer (score 3)

You might have a look at the following note "Right inverse implies left inverse and vice versa": http://www.lehigh.edu/~gi02/m242/08m242lrinv.pdf

Answer (score 2)

I prefer to think in terms of linear operators rather than matrices. A function has a right inverse iff it is surjective and it has a left inverse iff it is injective. For a linear operator, this means that having a right inverse is equivalent to having range equal to the entire space and having a left inverse is equivalent to having trivial kernel. For a linear operator on a finite-dimensional space, the dimension of its kernel + the dimension of its range = the dimension of the entire space, so this does the job.
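The finite-dimensional phenomenon this relies on has a finite-set analogue: for self-maps of a finite set, injective and surjective coincide (this is an analogy only, not the linear-algebra statement itself). A tiny Python check enumerates every self-map of a 3-element set:

```python
from itertools import product

# For self-maps of a finite set, injective and surjective coincide --
# the pigeonhole fact underlying the finite-dimensional argument.
S = range(3)

def injective(f):               # f is a tuple of images (f(0), f(1), f(2))
    return len(set(f)) == len(f)

def surjective(f):
    return set(f) == set(S)

# enumerate all 27 functions S -> S as tuples of images
for f in product(S, repeat=len(S)):
    assert injective(f) == surjective(f)
```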

Answer (score 3)

It follows by the pigeonhole principle. Here's an excerpt from my Dec 11 2007 sci.math post:

Recall (proof below) $\rm\; AB \:\:=\:\:\: I \:\;\Rightarrow\; BA \:\:=\:\: I\;\;\:$ easily reduces to:

THEOREM $\;$ $\rm\;\;B\;$ injective $\rm\;\Rightarrow\:\: B\;$ surjective, $\:$ for linear $\rm\:B\:$ on a finite dim vector space $\rm\:V$

Proof $\rm\ \ \ B\;$ injective $\rm\;\Rightarrow\ B\;$ preserves injections: $\rm\;R < S \;\Rightarrow\; BR < BS\;$
Hence for $\rm\;\;\; R \;\: < \;\; S < \cdots < \; V\;\;$ a chain of maximum length (= dim $\rm V\:$)
its image $\rm\;BR < BS < \cdots < BV \le V\;\;\:\;$ would have length greater
if $\rm\ BV < V\:,\: $ hence, instead $\rm\:\:\:\ \ BV = V\:,\;\:$ i.e. $\rm\; B \;$ is surjective. $\;$ QED

Notice how this form of proof dramatically emphasizes the essence of the matter, namely that
injective maps cannot decrease heights (here = dimension = length of max subspace chain).

Below is the standard reduction to injective/surjective form. See said sci.math post for much more,
including references to folklore generalizations, e.g. work of Armendariz and Vasconcelos in the seventies.

First, notice that $\rm\;\;\ AB = I \;\Rightarrow\: B\:$ injective, since $\rm\;A\;$ times $\rm\;B\:x = B\:y\;$ yields $\rm\;x = y\:,\:$ and

$\rm\ B\ $ surjective $\rm\ \Rightarrow\ BA = I \;\;$ since for all $\rm\;x\;$ exists $\rm\; y : \;\; x = B\:y = B(AB)\:y = BA \: x$

Combining them: $\rm\: AB = I \;\Rightarrow\: B\:$ injective $\rm\;\Rightarrow\; B\:$ surjective $\rm\;\Rightarrow\: BA = I$

Answer (score 7)

Let $x_1, x_2, \dots, x_n$ be a basis of the space. First we prove that $Bx_1, Bx_2, \dots, Bx_n$ is also a basis; to do so, we need to prove that these vectors are linearly independent. Suppose this is not true. Then there exist numbers $c_1, c_2, \dots, c_n$, not all equal to zero, such that $$c_1 Bx_1 + c_2 Bx_2 + \cdots + c_n B x_n = 0.$$ Multiplying by $A$ from the left, we get $$c_1 ABx_1 + c_2 ABx_2 + \cdots + c_n ABx_n = 0,$$ hence $$c_1 x_1 + c_2 x_2 + \cdots + c_n x_n = 0,$$ and so the vectors $x_1, x_2, \dots, x_n$ are linearly dependent, contradicting the assumption that they form a basis.

Since $Bx_1, Bx_2, \dots, Bx_n$ is a basis, every vector $y$ can be represented as a linear combination of those vectors. This means that for any vector $y$ there exists some vector $x$ such that $Bx = y$.

Now we want to prove that $BA = I$. It is the same as to prove that for any vector $y$ we have $BAy = y$. Now given any vector $y$ we can find $x$ such that $Bx = y$. Hence $$BAy = BABx = Bx = y$$ by associativity of matrix multiplication.

Answer (score 1)

For a more elementary treatment ...

Fact. If the rows of $A$ are linearly dependent, then the rows of $A B$ are linearly dependent.

Proof of fact. Consider the 3x3 case, where the linearly-dependent rows of $A$ are $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3 = h \mathbf{a}_1 + k \mathbf{a}_2$ (for some scalars $h$ and $k$):

$$A = \begin{bmatrix}\mathbf{a}_1 \\ \mathbf{a}_2 \\ h\mathbf{a}_1 + k\mathbf{a}_2\end{bmatrix} = \begin{bmatrix}p & q & r \\ s & t & u \\ hp + ks & hq + kt & hr + ku \end{bmatrix}$$

Writing $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$ for the rows of $B$, we have

$$A B = \begin{bmatrix}p & q & r \\ s & t & u \\ hp + ks & hq + kt & hr + ku \end{bmatrix} \begin{bmatrix}\mathbf{b}_1 \\ \mathbf{b}_2 \\ \mathbf{b}_3\end{bmatrix} = \begin{bmatrix}p \mathbf{b}_1+ q\mathbf{b}_2 + r\mathbf{b}_3 \\ s\mathbf{b}_1 + t\mathbf{b}_2 + u\mathbf{b}_3 \\ (hp + ks)\mathbf{b}_1 + (hq + kt)\mathbf{b}_2 + (hr + ku)\mathbf{b}_3 \end{bmatrix}$$

$$= \begin{bmatrix}p \mathbf{b}_1+ q\mathbf{b}_2 + r\mathbf{b}_3 \\ s\mathbf{b}_1 + t\mathbf{b}_2 + u\mathbf{b}_3 \\ h(p\mathbf{b_1}+q\mathbf{b}_2+r\mathbf{b}_3) + k(s\mathbf{b}_1 + t\mathbf{b}_2 + u\mathbf{b}_3) \end{bmatrix}$$

Generally, the linear dependence of the rows of $A$ carries over to the rows of the product, proving our Fact. (This reasoning actually shows the more precise fact that $\operatorname{rank}(AB)\le \operatorname{rank}(A)$.)

We can restate the Fact this way:

Re-Fact. If the rows of $AB$ are linearly independent, then the rows of $A$ are linearly independent.

To your question: If $A B = I$, then (by the Re-Fact) the rows of $A$ must be linearly independent. This implies that $A$ can be row-reduced to a diagonal matrix with no zero entries on that diagonal: the row-reduced form of $A$ must be the Identity matrix.

Note that row-reduction is actually an application of matrix multiplication. (You can see this in the equations above, where (left-)multiplying $B$ by $A$ combined the rows of $B$ according to the entries in the rows of $A$.) This means that, if $R$ is the result of some row combinations of $A$, then there exists a matrix $C$ that "performed" the combinations:

$$C A = R$$

If (as in the case of your problem) we have determined that $A$ can be row-reduced all the way down to the Identity matrix, then the corresponding $C$ matrix must be a (the) left-inverse of $A$:

$$C A = I$$

It's then straightforward to show that left and right inverses of $A$ must match. This has been shown in other answers, but for completeness ...

$$A B = I \;\; \to \;\; C (A B) = C \;\; \to \;\; (C A) B = C \;\; \to \;\; I B = C \;\; \to \;\; B = C$$

Once you start thinking (ahem) "outside the box (of numbers)" to interpret matrices as linear transformations of vectors and such, you can interpret this result in terms of mapping kernels and injectivity-vs-surjectivity and all the kinds of sophisticated things other answers are suggesting. Nevertheless, it's worth noting that this problem is solvable within the realm of matrix multiplication, plain and simple.

Answer (score 10)

If $\rm\,B\,$ is a linear map on a finite dimensional vector space $\rm\, V\,$ over field $\rm\,K,\,$ then easily by finite dimensionality (cf. Note below) there is a polynomial $\rm\,0\ne p(x)\in K[x]\;$ with $\rm\,p(B) = 0.\,$ W.l.o.g. $\rm\,p(0) \ne 0,\,$ by canceling any factors of $\rm\,B\;$ from $\rm\;p(B)\;$ by left-multiplying by $\rm A,\,$ using $\rm\, AB = 1.$

Notice $\rm\ AB=1 \, \Rightarrow\, (BA-1)\, B^n =\, 0\;$ for $\,\rm\;n>0\;$

so by linearity $\rm\, 0 \,=\, (BA-1)\ p(B)\, =\, (BA-1)\ p(0) \;\Rightarrow\; BA=1 \quad\quad$ QED

This is essentially a special case of computing inverses by the Euclidean algorithm - see my Apr 13 1999 sci.math post on Google or mathforum.

Note $ $ The existence of $\rm\;p(x)\;$ follows simply from the fact that $\rm\,V\;$ finite-dimensional implies the same for the vector space $\rm\,L(V)\,$ of linear maps on $\rm\,V\,$ (indeed if $\rm\,V\;$ has dimension $\rm n$ then a linear map is uniquely determined by its matrix of $\,\rm n^2\,$ coefficients). So $\rm\, 1,\, B,\, B^2,\, B^3,\,\cdots\;$ are $\rm\,K$-linearly dependent in $\rm\, L(V)$ which yields the sought nonzero polynomial annihilating $\rm\,B.$
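To make the Note concrete for a hypothetical $2\times 2$ matrix $B$: the Cayley-Hamilton polynomial $p(x) = x^2 - \operatorname{tr}(B)\,x + \det(B)$ annihilates $B$, and when $p(0) = \det(B) \ne 0$, solving $p(B)=0$ exhibits a two-sided inverse, just as the proof's manipulation suggests. A sketch in Python with exact arithmetic (the example matrix is an assumption for illustration):

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

I = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
B = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]   # hypothetical example

tr  = B[0][0] + B[1][1]                        # trace of B
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]    # determinant of B = p(0)

# Cayley-Hamilton: p(B) = B^2 - tr*B + det*I = 0, with p(0) = det != 0
pB = add(add(matmul(B, B), scale(-tr, B)), scale(det, I))
assert pB == scale(Fraction(0), I)

# Cancelling one factor of B from p(B) = 0 exhibits the two-sided inverse:
A = scale(Fraction(1) / det, add(scale(tr, I), scale(Fraction(-1), B)))
assert matmul(A, B) == I and matmul(B, A) == I
```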

Answer (score 0)

Coincidentally, a totally unrelated MathSciNet search turned up this article, which gives a result along the lines of (but slightly stronger than) the one in Martin Brandenburg's answer.

In particular:

Theorem (Hill, 1967): Let $R$ be a ring satisfying the ascending chain condition on right ideals. Then:
a) For any $n \in \mathbb{Z}^+$, the matrix ring $M_n(R)$ also satisfies ACC on right ideals.
b) If $a,b \in R$ are such that $ab = 1$, then also $ba = 1$.

Answer (score 1)

Here is a try. We will use $AB = I$ to show that the columns of $B$ are linearly independent, and then use that to show that $BA$ is the identity on the range of $B$, which is all of the space by the linear independence of the columns of $B$; this implies that $BA$ is the identity. Linear independence of the columns of $B$ follows if we can show that $Bx = 0$ implies $x = 0$: assume $Bx = 0$; then $x = Ix = ABx = A0 = 0$. Now, to show that $BA$ is the identity on the range of $B$, we have $(BA)Bx = B(AB)x = Bx$, and we are done.

Answer (score 2)

Motivation $\ $ If vector space $\rm V\:$ has a $1$-$1$ map $\rm\,B\,$ that's not onto, i.e. $\rm V > BV\:,\:$ then this yields an $\infty$ descending chain of subspaces by $\rm V > \: BV > \;\cdots\: > B^i V$ by repeatedly applying $\rm B\:$.

Theorem $\rm\:\ AB = 1 \;\Rightarrow\; BA=1\:$ for linear maps $\rm\:A,B\:$ on a finite dimensional vector space $\rm\: V $

Proof $\rm\;\; V > BV, \;$ times $\rm\; B^i\:\Rightarrow\: B^i V > B^{i+1} V \;$ (else $\rm\; B^i V = B^{i+1} V, \;$ times $\rm\; A^i \Rightarrow V = BV)$

$\rm\ \ \ \ \Rightarrow\rm\;\;\; V > BV > B^2 V > \cdots \:$ is an $\infty$ descending chain $\rm\; \Rightarrow\; dim\: V = \infty\,\:$ contra hypothesis.

Hence $\rm\ \ \ V = BV \;\Rightarrow\; (BA\!-\!1)V = (BA\!-\!1)BV = B(AB\!-\!1)V = 0 \quad$ QED

Remark $\;\;$ Hence vector space $\rm\:V\;$ has infinite dimension $\rm\;\iff V\:$ is Dedekind infinite, i.e. $\rm\:V\:$ has an isomorphic proper subspace $\rm BV,\:$ viz. the theorem proves $(\Leftarrow)$ and the converse follows by taking $\rm B\:$ to be a basis shift map $\rm\; (v_1,v_2,v_3,\cdots\:)\:\to\: (0,v_1,v_2,v_3,\cdots\:)\:,\;$ i.e. said simply, a vector space is infinite iff it has a subspace chain model of Hilbert's infinite hotel. $\:$ That is the simple essence of the matter.

Answer (score 1)

Disclaimer: The following proof was written by me, but the main idea was given to me by a friend in some discussion, and he probably proved it too.

The main idea is that this claim is true when $A$ is an elementary row matrix, and we can induct on the number of row operations needed to be done on $A$ in order to reduce it.

  • Some background:

    An elementary row-operation matrix $E$ corresponds to one of 3 operations: row addition, row multiplication and row switching. Every $E$ has a matrix $E'$ which is another row-operation matrix of the same type, such that $EE'=E'E=I$ (you don't need the term "inverse of a matrix" to believe this - this "inverse" $E'$ is described here and you can verify it yourself).

    By performing elementary row operations on a matrix $A$, one gets the following equality: $E_{k} E_{k-1} \cdots E_1 A = C$, where $C$ is in row reduced form and $E_i$ are elementary row matrices.

Claim 1: $A$ can be written as $E_{k} \cdots E_{1}C$, where $E_i$ are some elementary row matrices (not necessarily the same matrices as before). Proof: Multiply by $E'_1 E'_2 \cdots E'_k$ from the left.

Claim 2: If $AB=I$, then this $C$ is the identity. Proof: Using determinants, this is fairly easy. The condition $AB=I$ ensures $\det(C) \neq 0$ by multiplicativity of the determinant, and since $C$ is upper triangular with every leading coefficient being 1 (the only nonzero entry in its column), the determinant is nonzero iff $C=I$. This can also be proved without determinants, but it is not the main focus of the proof (it is an important claim nonetheless).

So we showed that $A=E_{k} E_{k-1} \cdots E_{1}$.

Claim 3: If $k=0$ or $1$ ($k$ is the number of row operations required to reduce $A$), then $AB=I \implies BA=I$. Proof: If $k=0$, $A=I \implies B=I \implies BA=I$. If $k=1$, $A$ is an elementary row matrix: $A=E$, so $AB=EB=I \implies B = E'EB = E' \implies BA=E'E=I$.

Claim 4: $AB = I \implies BA = I$. I will induct on $k$. Let's assume this is true while $k \le n$, $n \ge 1$. I am going to prove this for $k=n+1$. Let $A=E_{n+1}E_{n} \cdots E_{1}$.

$$ AB = E_{n+1} \cdots E_{1} B = I \implies E_{n} \cdots E_{1} B = E'_{n+1} \implies $$ $$ (E_{n} \cdots E_{1}) (B E_{n+1})= I \implies (B E_{n+1})(E_{n} \cdots E_{1}) = I \implies $$ $$ BA= I$$
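The row-reduction bookkeeping this proof relies on can be sketched in code: reduce $A$ to $I$ while applying each elementary operation to a matrix $C$ (initially $I$), so that the accumulated $C = E_k \cdots E_1$ satisfies $CA = I$. A minimal Python sketch with exact arithmetic (the example matrix and the function name `reduce_with_record` are assumptions for illustration):

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def reduce_with_record(A):
    """Row-reduce A to I, applying every elementary operation to C (initially I).
    The accumulated C equals the product E_k ... E_1 of the proof, so C A = I."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]   # work on a copy
    C = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        r0 = next(r for r in range(col, n) if A[r][col] != 0)   # row switching
        A[col], A[r0] = A[r0], A[col]
        C[col], C[r0] = C[r0], C[col]
        piv = A[col][col]                                       # row multiplication
        A[col] = [x / piv for x in A[col]]
        C[col] = [x / piv for x in C[col]]
        for r in range(n):                                      # row addition
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                C[r] = [x - f * y for x, y in zip(C[r], C[col])]
    return C

A = [[Fraction(2), Fraction(1)], [Fraction(5), Fraction(3)]]   # hypothetical example
C = reduce_with_record(A)
I = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
assert matmul(C, A) == I     # C is a left inverse built from row operations
assert matmul(A, C) == I     # and, as the answers prove, also a right inverse
```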

Answer (score 0)

Theorem:

If $A$ and $B$ are two square matrices such that $A B=I$, then $B A=I$.

Proof:

By applying the properties of determinants of square matrices, we get that

$\det A\cdot\det B=\det(A B)=\det I=1\ne0$.

So it results that $\;\det A\ne0\;$ and $\;\det B\ne0\;$.

Now we consider the matrix

$C=\frac{1}{\det A}\cdot\text{adj}(A)$

where $\;\text{adj}(A)\;$ is the adjugate matrix of $A$ which is the transpose of the cofactor matrix of $A$.

$\text{adj}(A)= \begin{pmatrix} A_{1,1} & A_{2,1} & \cdots & A_{n,1} \\ A_{1,2} & A_{2,2} & \cdots & A_{n,2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1,n} & A_{2,n} & \cdots & A_{n,n} \end{pmatrix}$

where $\;A_{i,j}\;$ is the cofactor of the element $a_{i,j}$ of the matrix $A$.

So $\;A_{i,j}=(-1)^{i+j}\det M_{i,j}$

where $\;M_{i,j}\;$ is the submatrix of $A$ formed by deleting the $i^{th}$ row and the $j^{th}$ column.

We are going to use the Laplace expansions which are the following equalities:

$a_{i,1}A_{j,1}+a_{i,2}A_{j,2}+\ldots+a_{i,n}A_{j,n}= \begin{cases} \det A\;,\quad\text{ if } i=j\\ 0\;,\quad\quad\;\,\text{ if } i\ne j \end{cases}$

$A_{1,i}a_{1,j}+A_{2,i}a_{2,j} +\ldots+A_{n,i}a_{n,j}= \begin{cases} \det A\;,\quad\text{ if } i=j\\ 0\;,\quad\quad\;\,\text{ if } i\ne j \end{cases}$

By applying Laplace expansions, we get that

$A C = C A = I$.

Since for hypothesis $\;A B =I\;,\;$ then $\;C(A B)= C I\;,\;$ hence $\;(C A)B=C,\;$ therefore $\;I B = C\;$ and so we get that $\;B=C$.

Consequently,

$B A = C A = I\;,$

so we have proved that

$B A = I$.
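The adjugate identities used above can be sanity-checked on a hypothetical $3\times 3$ example; a Python sketch with exact rational arithmetic, verifying $AC = CA = I$ for $C=\frac{1}{\det A}\,\text{adj}(A)$:

```python
from fractions import Fraction

def minor(M, i, j):
    """Delete row i and column j from the 3x3 matrix M."""
    return [[M[r][c] for c in range(3) if c != j] for r in range(3) if r != i]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def det3(M):
    # Laplace expansion along the first row
    return sum((-1) ** j * M[0][j] * det2(minor(M, 0, j)) for j in range(3))

def adjugate(M):
    """Transpose of the cofactor matrix: adj(M)[j][i] = (-1)^(i+j) det(M_{i,j})."""
    adj = [[Fraction(0)] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            adj[j][i] = (-1) ** (i + j) * det2(minor(M, i, j))
    return adj

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[Fraction(x) for x in row] for row in [[1, 2, 0], [0, 1, 3], [2, 0, 1]]]
C = [[x / det3(A) for x in row] for row in adjugate(A)]    # C = adj(A)/det(A)

I = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
assert matmul(A, C) == I and matmul(C, A) == I   # the Laplace-expansion identities
```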

Answer (score 0)

The OP from 10 years ago said he understood

  1. Elementary row operations.
  2. Linear dependence.
  3. Row reduced forms and their relations with the original matrix.

I'll use only a bit more than (1) and (3). Each elementary row operation corresponds to left multiplication by an elementary matrix and every elementary matrix is invertible. I won't use (2) but I will use

(4) If the rref of square matrix has a zero row, then it does not have a left inverse.

Proof. Let square $H$ have rref $R$ whose last row is zero. Using $R$ and setting a free parameter equal to $1$, we obtain a nonzero column vector $X$ such that $HX={\bf 0}.$ Suppose $KH=I.$ Then $$ X=IX=KHX=K(HX)=K{\bf{0}}={\bf 0},\ \text{a contradiction.}$$

Let $AB=I.$ By (4), $I$ is the rref of $B.$ Therefore, $CB=I,$ where $C$ is a product of elementary matrices. By taking, in reverse order, the product of the inverses of these elementary matrices, it follows that $C$ is invertible. $$C^{-1}=C^{-1}(CB)=IB=B$$ so that $BC=I.\tag 1$ $$A=AC^{-1}C=(AB)C=IC=C\tag 2 $$ Combining (1) and (2), we conclude that $BA=I.$

(I just noticed the P.S. of the OP. Sorry. Also, there's probably overlap with earlier proofs. I just wanted to give a short, simple proof that did not use any concepts from vector spaces.)

Answer (score 0)

I wanted a simpler and clearer proof, which is as follows:

Prop. Let $A$ and $B$ be real square matrices of size $n$. If $AB = I$, then $BA=I$.

Notation. Let $L_A : \mathbb{R}^n \to \mathbb{R}^n$ be the linear transformation $L_A(x) = Ax$. We treat $L_A$ and $A$ interchangeably for simplicity. $e_i$ is the $i$-th standard basis element. The span of (column) vectors $b_1,\dots,b_n$ is denoted by $\text{span}(b_i) = \{\sum_{i=1}^n c_i b_i : c_i \in \mathbb{R}\}$.

Proof. Denote $B=[b_1 \cdots b_n]$ with columns $b_i$. Then the $b_i$ are independent: if $$ 0 =\sum_i c_ib_i, $$ then $$ 0 = \sum_i c_i A b_i = \sum_i c_i e_i $$ gives $c_i=0$, since the $e_i$ are independent and $AB=I$ means $e_i = Ab_i$ for all $i$. Thus the $b_i$ span $\mathbb{R}^n$, i.e. $\mathbb{R}^n = \text{span}(b_i)$. Hence, the range of $A$ is equal to $$ A(\mathbb{R}^n) = \{ \sum_{i=1}^n c_iAb_i : c_i \in \mathbb{R} \} = \{ \sum_{i=1}^n c_ie_i : c_i \in \mathbb{R} \} = \mathbb{R}^n. $$ Hence, by the Rank-Nullity Theorem, $\ker A = \{0\}$.

Now, rearranging $$ A = IA = (AB)A = A(BA) $$ gives $0 = A(I - BA)$. Denote $I-BA = [z_1 \cdots z_n]$ with columns $z_i$. Since $\ker A = \{0\}$, we have $Az_i = 0$ for all $i$, proving $I - BA= 0$; i.e., $BA = I$. This completes the proof.

Note. This answer is a clearer version of the selected answer.

Note. The Rank-Nullity Theorem is proved purely via linear dependence and/or row operations and reduced echelon forms. It states that $$ \dim \ker A + \dim A(\mathbb{R}^n) = n. $$

Answer (score 0)

Since matrices represent linear maps between finite-dimensional vector spaces, the question can be recast as:

Let $K$ be a field. Prove that if $f:K^n\to K^n$ and $g:K^n\to K^n$ are linear maps, and $f\circ g=\DeclareMathOperator{\id}{id}\id$, then $g\circ f=\id$.

Since $f\circ g=\id$, the function $g$ has a left-inverse, and so it is injective. Now, the rank-nullity theorem asserts that $$ \dim(\operatorname{im} g)+\dim(\operatorname{ker} g)=n \, . $$ As $g$ is injective, its kernel is $\{\mathbf{0}\}$, which has dimension $0$. Hence, $\dim(\operatorname{im} g)=n$. If a linear subspace $U$ of $V$ has the same dimension as $V$, then $U=V$. Since the image of $g$ is a linear subspace of $K^n$, it follows that the image of $g$ equals $K^n$, and so $g$ is surjective. Hence, $g$ is bijective, with $f$ being its unique two-sided inverse.