Motivation for the $3\times 3$ matrix inversion formula


When $A$ is a square non-singular matrix, $A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A)$. What is the motivation for this formula, and how can I get there? I have been told that the $2\times 2$ formula is guessable, but that is not a convincing argument for higher dimensions.

I am less interested in a highly rigorous proof that works backwards from the result, or, at the other extreme, a derivation consisting only of words.

5 Answers

---

Actually, for any square matrix size $n$, you have the formula $$A\cdot\operatorname{adj}(A)=\operatorname{adj}(A)\cdot A=\det(A)\, I_n $$ which comes from the formula for the expansion of a determinant along a row or a column.
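This identity is easy to check numerically. Here is a minimal pure-Python sketch for the $3\times 3$ case (the helper names `minor2`, `det3`, `adjugate`, and `matmul` are my own, not from any library):

```python
def minor2(M, r, c):
    # 2x2 determinant of M with row r and column c deleted.
    rows = [row[:c] + row[c+1:] for i, row in enumerate(M) if i != r]
    return rows[0][0] * rows[1][1] - rows[0][1] * rows[1][0]

def det3(M):
    # Laplace expansion along the first row.
    return sum((-1) ** j * M[0][j] * minor2(M, 0, j) for j in range(3))

def adjugate(M):
    # Entry (i, j) of adj(M) is the (j, i) cofactor of M.
    return [[(-1) ** (i + j) * minor2(M, j, i) for j in range(3)]
            for i in range(3)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = [[2, 1, 1], [1, 3, 2], [1, 0, 0]]
d = det3(A)
# Both products give det(A) times the identity.
assert matmul(A, adjugate(A)) == [[d, 0, 0], [0, d, 0], [0, 0, d]]
assert matmul(adjugate(A), A) == [[d, 0, 0], [0, d, 0], [0, 0, d]]
```

Integer matrices are used so the check is exact, with no floating-point tolerance needed.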

---

Let's consider the $n \times n$ case.

Let the $n\times n$ matrix $A^{-1} = \begin{pmatrix} \vec{v_1} & \vec{v_2} & \vec{v_3} & \dots & \vec{v_n} \end{pmatrix} $ where $\vec{v_1}, \vec{v_2}, \vec{v_3}, \dots , \vec{v_n}$ are the $n \times 1$ column vectors representing the columns of $A^{-1}$.

$AA^{-1} = I_n$. This means $A\vec{v_j} = \vec{e_j}$ for all $1 \le j \le n$ where $\vec{e_j}$ is the $n \times 1$ column vector whose $j$th row is a $1$ and every other row is a $0$.

According to Cramer's rule the solution to the equation $A\vec{v_j} = \vec{e_j}$ for $\vec{v_j}$ is as follows:

  • The number in the $i$th row of the column vector $\vec{v_j}$ is given by $A^{-1}_{ij} = v_{ij} = \frac{\det(D_{ij})}{\det(A)}$ where $\det(D_{ij})$ is the determinant of the matrix $D_{ij}$ formed by replacing the $i$th column of $A$ with the column vector $\vec{e_j}$.

Now another way to find $\det(D_{ij})$ is to expand this determinant along the $i$th column of $D_{ij}$. This column is just the basis vector $\vec{e_j}$. Only the $j$th position down this column is non-zero.

So $\det(D_{ij}) = (-1)^{i+j}\det(M_{ji}) = C_{ji}$ where $M_{ji}$ is the matrix formed by removing the $j$th row and $i$th column of $D_{ij}$.

But since $D_{ij}$ is just $A$ with column $i$ replaced, $M_{ji}$ is the same as the matrix formed by removing/crossing out the $j$th row and $i$th column of $A$, since the column in $A$ that was replaced by $\vec{e_j}$ is being removed/crossed out anyway.

Thus, $M_{ji}$ is the matrix formed by removing/crossing out the $j$th row and $i$th column of $A$. It is the $j, i$ minor matrix of $A$. We also see then that $C_{ji}$ is the $j, i$ cofactor of $A$.

So we have $A^{-1}_{ij} = v_{ij} = \frac{(-1)^{i+j}\det(M_{ji})}{\det(A)} = \frac{C_{ji}}{\det(A)}$.

Note the placement of the indices: the $(i,j)$ entry of $A^{-1}$ is the $(j,i)$ cofactor of $A$ divided by $\det(A)$.

In words, the entry in the $i$th row and $j$th column of $A^{-1}$ is the entry in the $j$th row and $i$th column of the cofactor matrix $C$, divided by the determinant of $A$. By swapping rows and columns of the cofactor matrix (transposing), we get the final statement: the entry in the $i$th row and $j$th column of $A^{-1}$ is the entry in the $i$th row and $j$th column of the transpose of the cofactor matrix of $A$, divided by $\det(A)$.

The transpose of the cofactor matrix of $A$ is the adjugate of $A$, written $\operatorname{Adj}(A)$.

So in the end we have arrived at the formula $A^{-1} = \frac{1}{\det(A)}\operatorname{Adj}(A)$.
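The whole derivation translates directly into a short program. Here is a minimal sketch in pure Python (the names `minor`, `det`, and `inverse` are my own), working over exact rationals with `Fraction` so that $AA^{-1}=I$ can be checked with exact equality:

```python
from fractions import Fraction

def minor(M, r, c):
    # M with row r and column c deleted.
    return [row[:c] + row[c+1:] for i, row in enumerate(M) if i != r]

def det(M):
    # Recursive Laplace expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j))
               for j in range(len(M)))

def inverse(M):
    # A^{-1}[i][j] = C_{ji} / det(A); note the transposed indices.
    n, d = len(M), det(M)
    return [[Fraction((-1) ** (i + j) * det(minor(M, j, i)), d)
             for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
Ainv = inverse(A)
I = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]
assert I == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

The recursive determinant is exponential in $n$, so this is a pedagogical sketch of the cofactor formula, not a practical inversion routine.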

The crux of this proof was the part where we invoked Cramer's rule, so if you can understand an argument for Cramer's rule, you can understand the inverse matrix formula. The first two sections of the Wikipedia article on Cramer's rule could help you with that. See https://en.wikipedia.org/wiki/Cramer%27s_rule .

Also see here https://people.math.carleton.ca/~kcheung/math/notes/MATH1107/wk07/07_cofactor_expansion.html to learn more about expanding a determinant along a row or column.

---

Cramer's rule for solving $Ax=b$ with $A$ square and $\det A\ne0$ is equivalent to the statement $A_{ij}\det A_j=b_i\det A$ (summation over the repeated index $j$ implied), where $A_j$ is the matrix obtained by replacing $A$'s $j$th column with $b$. Let's motivate this equation.

If you want to solve $Ax=b$, the solution ought to carry a factor of $\det A$ on one side, so that $x$ is not uniquely determined when $\det A=0$. But if $b_i\det A$ is equal to something, then to match the indices that something would have to look like $\det A_i$, or $A_{ij}\det A_j$, or $A_{ij}A_{jk}\det A_k$, and so on. Having one overall copy of the matrix in front of the determinant makes sense: if we double the $n\times n$ matrix $A$, both sides of $A_{ij}\det A_j=b_i\det A$ multiply by $2^n$.
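A quick numerical illustration of the indexed identity (a pure-Python sketch; `det3` and `solve_cramer` are my own names): replacing the $j$th column of $A$ by $b$ and dividing by $\det A$ recovers $x_j$.

```python
from fractions import Fraction

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve_cramer(A, b):
    # x_j = det(A_j) / det(A), where A_j has column j replaced by b.
    d = det3(A)
    def replaced(j):
        return [[b[i] if k == j else A[i][k] for k in range(3)]
                for i in range(3)]
    return [Fraction(det3(replaced(j)), d) for j in range(3)]

A = [[2, 1, 1], [1, 3, 2], [1, 0, 0]]
b = [3, 2, 1]
x = solve_cramer(A, b)
# Check the identity A_{ij} x_j = b_i, i.e. A_{ij} det(A_j) = b_i det(A).
assert all(sum(A[i][j] * x[j] for j in range(3)) == b[i] for i in range(3))
```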

---

The adjugate matrix $\operatorname{adj}(A)$ of a (possibly singular) matrix $A\in\mathbb R^{3\times3}$ can be defined as the matrix $B$ that satisfies $$ \det\pmatrix{Au&Av&w}=\det\pmatrix{u&v&Bw}\tag{1} $$ for all $u,v,w\in\mathbb R^3$. This definition raises a few questions:

  1. Is $B$ well defined? That is, given $A$, does $B$ exist and is it unique?
  2. Does this definition of $B$ agree with the conventional definition of the adjugate matrix (defined using minors of $A$)?
  3. In particular, is it true that $BA=\det(A)I$?

We will answer these questions in reverse order. Suppose $B$ exists. Putting $w=Ax$ in $(1)$, we get \begin{aligned} \det\pmatrix{u&v&BAx} &=\det\pmatrix{u&v&Bw}\\ &=\det\pmatrix{Au&Av&w}\\ &=\det\pmatrix{Au&Av&Ax}\\ &=\det(A)\det\pmatrix{u&v&x}\\ &=\det\pmatrix{u&v&\det(A)x} \end{aligned} for every $u,v,x\in\mathbb R^3$. Hence $BA$ must equal $\det(A)I$.

Note that by permuting the columns of the matrices on both sides, $(1)$ is equivalent to \begin{align} \det\pmatrix{Au&w&Av}&=\det\pmatrix{u&Bw&v},\tag{2}\\ \text{and }\ \det\pmatrix{w&Au&Av}&=\det\pmatrix{Bw&u&v}.\tag{3} \end{align} So, if $B$ exists, we must have \begin{align} b_{11}&=\det\pmatrix{Be_1&e_2&e_3}=\det\pmatrix{e_1&Ae_2&Ae_3}=m_{11},\\ b_{12}&=\det\pmatrix{Be_2&e_2&e_3}=\det\pmatrix{e_2&Ae_2&Ae_3}=-m_{21},\\ b_{13}&=\det\pmatrix{Be_3&e_2&e_3}=\det\pmatrix{e_3&Ae_2&Ae_3}=m_{31},\\ b_{21}&=\det\pmatrix{e_1&Be_1&e_3}=\det\pmatrix{Ae_1&e_1&Ae_3}=-m_{12},\\ b_{22}&=\det\pmatrix{e_1&Be_2&e_3}=\det\pmatrix{Ae_1&e_2&Ae_3}=m_{22},\\ b_{23}&=\det\pmatrix{e_1&Be_3&e_3}=\det\pmatrix{Ae_1&e_3&Ae_3}=-m_{32},\\ b_{31}&=\det\pmatrix{e_1&e_2&Be_1}=\det\pmatrix{Ae_1&Ae_2&e_1}=m_{13},\\ b_{32}&=\det\pmatrix{e_1&e_2&Be_2}=\det\pmatrix{Ae_1&Ae_2&e_2}=-m_{23},\\ b_{33}&=\det\pmatrix{e_1&e_2&Be_3}=\det\pmatrix{Ae_1&Ae_2&e_3}=m_{33}, \end{align} where $m_{ij}$ denotes the $(i,j)$-th minor of $A$ (obtained by deleting row $i$ and column $j$); in general, $b_{ij}=(-1)^{i+j}m_{ji}$. It follows that if $B$ exists, it is unique and it agrees with the conventional definition of $\operatorname{adj}(A)$. Finally, if we define $B$ entrywise using the above nine equations, then by the alternation and multi-linearity of the determinant function, $(1)$ is satisfied. Hence $B$ indeed exists.
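As a sanity check, the defining identity $(1)$ can be tested numerically against the conventional cofactor formula for the adjugate. A pure-Python sketch (all names are my own):

```python
def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def minor2(M, r, c):
    rows = [row[:c] + row[c+1:] for i, row in enumerate(M) if i != r]
    return rows[0][0] * rows[1][1] - rows[0][1] * rows[1][0]

def adjugate(M):
    # Conventional definition: entry (i, j) is the (j, i) cofactor.
    return [[(-1) ** (i + j) * minor2(M, j, i) for j in range(3)]
            for i in range(3)]

def matvec(M, x):
    return [sum(M[i][k] * x[k] for k in range(3)) for i in range(3)]

def cols(u, v, w):
    # Matrix whose columns are u, v, w.
    return [[u[i], v[i], w[i]] for i in range(3)]

A = [[1, 2, 0], [3, 1, 4], [0, 2, 1]]
B = adjugate(A)
vectors = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 2, 3], [2, -1, 5]]
for u in vectors:
    for v in vectors:
        for w in vectors:
            # Identity (1): det(Au, Av, w) = det(u, v, B w).
            assert det3(cols(matvec(A, u), matvec(A, v), w)) \
                == det3(cols(u, v, matvec(B, w)))
```

Checking on the basis vectors alone would suffice by multilinearity; the extra vectors are just reassurance.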

---

Since the matrix is invertible, the function $f(x)=Ax$, with $x$ a column vector, is bijective and allows us to study $A$ by considering how it acts on the ambient space. In particular, the unit cube defined by the columns of the identity matrix is mapped by this function to the parallelepiped spanned by the columns of $A$. Rather than carry around the $f(x)$ notation, we'll simply talk about $A$ as though it were the map throughout.

The identity matrix has signed volume $\det(I)=1$, while $\det(A)$ is an arbitrary non-zero number, so we can interpret this as $A$ scaling the volume of the unit hypercube by its determinant. This motivates the scalar portion of the formula: we simply want to shrink (or grow) the parallelepiped back to unit volume.

However, to fully calculate $A^{-1}$ it's insufficient to simply scale the volume correctly; we must also move the vectors back to their canonical positions. We'll need a matrix, which we'll call $B$, that maps the columns of $A$ back to the canonical basis vectors they are associated with. This will give us a formula of the form $A^{-1}=\frac{1}{\det(A)}B$. As you may have guessed, $B$ will be the adjugate, i.e. the transpose of the cofactor matrix.

We now exploit the fact that $A$ maps the hyperfaces of the canonical parallelepiped to distinct hyperfaces, and the inverse transformation must account for this. $B$ must encode how the relative positions of the vectors change across the hyperfaces, and it must scale each hyperface appropriately to satisfy the area requirements of the transformation.

Now we're almost done. It's not hard to see that each hyperface can be accounted for by removing one vector from the parallelepiped, which is the same as removing a column from $A$. Since we're mostly interested in directional information, we can ignore where the vectors lie in the hyperplane they span and find a vector normal to it whose length equals the area of the hyperface. There are two normal vectors with this property; to determine which we should use, note that each hyperface of the canonical unit cube is associated with its own canonical basis vector normal to it, and that $A$ will only map it to one of them.

To see what's happening algebraically, we're looking at vectors orthogonal to each of the columns of $A$ because they will give us zeroes in the off-diagonal positions, which is most of the identity matrix. Multiplying our formula by $A$ on the left gives us $AB=\det(A)I$, which captures the key relationship between $A$ and $B$: orthogonality. The rows of $A$ are orthogonal to the columns of $B$ except when the row and column coincide. Putting all the pieces together, we have finally arrived at the motivation for the adjugate, $B$. Metaphorically, I think of the $\det(A)$ term as a sort of "least common orthogonal multiple" for this calculation, similar to how one might clear denominators to move a calculation from the rationals to the integers.
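In $\mathbb R^3$ the normal-vector picture can be made concrete: the rows of the adjugate are exactly the cross products of pairs of columns of $A$, each normal to a face of the image parallelepiped, with length equal to that face's area. A pure-Python sketch (all names are my own):

```python
def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def column(M, j):
    return [M[i][j] for i in range(3)]

A = [[2, 1, 1], [1, 3, 2], [1, 0, 0]]
a1, a2, a3 = (column(A, j) for j in range(3))

# Rows of B: each is orthogonal to two of A's columns.
B = [cross(a2, a3), cross(a3, a1), cross(a1, a2)]
d = dot(cross(a2, a3), a1)  # scalar triple product = det(A)

# B A = det(A) I: row i of B is orthogonal to column j of A
# unless i = j, in which case the dot product is det(A).
BA = [[dot(B[i], column(A, j)) for j in range(3)] for i in range(3)]
assert BA == [[d, 0, 0], [0, d, 0], [0, 0, d]]
```

This is the 3-dimensional case of the "hyperface normal" construction described above; in higher dimensions the cross product generalizes to a determinant of $n-1$ columns.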