Why do determinants have their particular form?


I know that for a matrix $A$, if $\det(A)=0$ then the matrix does not have an inverse, and hence the associated system of equations does not have a unique solution. However, why do the determinant formulas have the form they do? Why all the complicated co-factor expansions and alternating signs?

To sum it up: I know what determinants do, but it's unclear to me why they look the way they do. Is there an intuitive explanation that can be attached to a co-factor expansion?

8 Answers

Answer (score 11, accepted)

Two exercises that may give you the answer you need (no work, no gain):

  1. Assume you have a square $[0,1]\times [0,1]$ in the $(x,y)$-plane. Assume for some reason you need to change the variables you are using. The new variables you are using are now $w=a x + b y$ and $z=c x + d y$, where $a,b,c$ and $d$ are numbers. What is the area of the original square under the new coordinate system, the $(w,z)$-plane?
  2. A multilinear map on $\mathbb{R}^2$ (bilinear in this case) is a function $M:\mathbb{R}^2\times \mathbb{R}^2\rightarrow \mathbb{R}$ such that $M( ax+b \hat x, y)= a M(x,y)+b M(\hat x,y)$ and $M(x,a y + b\hat y)=a M(x,y)+bM(x,\hat y)$. The map is alternating if $M(x,y)=-M(y,x)$. These two properties are very useful. Exercise: show that if $M$ has these properties then $M(x,y)=k\cdot \det\pmatrix{x_1 & y_1 \\ x_2 & y_2}$ for some constant $k$.
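Exercise 1 can also be checked numerically. Here is a minimal sketch (the coefficients $a,b,c,d$ below are arbitrary choices of mine): the image of the unit square under $(x,y)\mapsto(w,z)$ is a parallelogram whose area equals $|ad-bc|$.

```python
# Numerical check of exercise 1: the unit square under the change of
# variables w = a*x + b*y, z = c*x + d*y has area |a*d - b*c|.
# (Illustrative sketch; the coefficients below are arbitrary choices.)

def shoelace_area(pts):
    """Area of a polygon given its vertices in order (shoelace formula)."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

a, b, c, d = 2.0, 1.0, 0.5, 3.0

# Images of the corners of [0,1] x [0,1] under (x, y) -> (w, z).
corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
images = [(a * x + b * y, c * x + d * y) for x, y in corners]

print(shoelace_area(images), abs(a * d - b * c))  # both 5.5 here
```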
Answer (score 8)

The determinant is actually determined by a few simple rules, as it is (up to multiplying by a constant) the only multilinear antisymmetric functional on the space of matrices.

We don't settle on a very complicated definition by choice; rather, we take two simple properties (multilinear + antisymmetric) and see what we get. And what we get is slightly complicated (but very useful).
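These two defining properties are easy to test numerically. The following sketch (my own, with arbitrarily chosen numbers) checks them for the $3\times 3$ determinant regarded as a function of the columns:

```python
# A small numeric sanity check (not a proof) that the 3x3 determinant is
# linear in each column and flips sign when two columns are swapped.

def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))

u, v, w = [1, 2, 3], [4, 5, 6], [7, 8, 10]
x = [1, 0, 2]
a, b = 2.0, -3.0

# Linearity in the first column:
lhs = det3([a * ui + b * xi for ui, xi in zip(u, x)], v, w)
rhs = a * det3(u, v, w) + b * det3(x, v, w)
print(lhs, rhs)  # equal (6.0)

# Antisymmetry under swapping the first two columns:
print(det3(v, u, w), -det3(u, v, w))
```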

Answer (score 2)

A quick consultation of Wikipedia suggests that the determinant was used for linear systems long before it was understood that those systems could be written in terms of matrices.

Nowadays, of course, we know we can write such systems in terms of matrices, or more broadly as linear operators. It turns out that a simple extension of a linear operator under the exterior algebra helps replicate the determinant:

The exterior algebra uses a wedge product, denoted with $\wedge$, and wedge products of several vectors allow us to treat planes, volumes, and such as algebraic elements.

The natural extension of a linear operator across the wedge product is to wedge every vector in that product: so given a wedge product $a \wedge b \wedge c \wedge \ldots$, the natural extension of a linear operator $\underline T$ is

$$\underline T(a \wedge b \wedge c \wedge \ldots) \equiv \underline T(a) \wedge \underline T(b) \wedge \underline T(c) \wedge \ldots$$

That is, have the operator act on each vector individually, then compute the wedge.

When you do this with the highest-graded wedge product, a wedge product of $n$ vectors in an $n$-dimensional space, the action of the linear operator reduces to a scalar multiplication. Let the $n$-vector be denoted $i$, and we get

$$\underline T(i) = (\det \underline T)i$$

This expresses how the "volume" of the space changes orientation and is dilated or shrunk under the transformation. The antisymmetry of the wedges captures exactly the same alternating plus and minus signs typically used to compute the determinant.

Edit: Why does the change in the volume matter? Well, you can grasp how this relates to invertibility: if the linear operator maps volume to volume, then you can see how there may be a bijection between vectors, but if the operator maps volume to zero volume, then the image is at most something smaller (in dimensionality) than volume, and as with any projection, that means multiple vectors are mapped to the same output vector---such a map cannot be invertible.
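In $\mathbb{R}^3$ this scaling identity can be checked concretely, since the top-grade wedge $a \wedge b \wedge c$ corresponds to the scalar triple product $a \cdot (b \times c)$. A minimal sketch (the matrix and vectors are arbitrary choices of mine):

```python
# Concrete R^3 illustration: applying T to each factor of a ∧ b ∧ c
# (realized as the scalar triple product) multiplies it by det T.

def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def triple(a, b, c):
    """Scalar triple product a . (b x c): the oriented volume a ∧ b ∧ c."""
    return dot(a, cross(b, c))

def apply(T, v):
    """Apply the matrix T (given as a list of rows) to the vector v."""
    return [dot(row, v) for row in T]

T = [[2, 1, 0],
     [0, 1, 1],
     [1, 0, 3]]
det_T = triple([2, 0, 1], [1, 1, 0], [0, 1, 3])  # triple of T's columns

a, b, c = [1, 2, 0], [0, 1, 1], [2, 0, 1]
print(triple(apply(T, a), apply(T, b), apply(T, c)),
      det_T * triple(a, b, c))  # both 35
```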

Answer (score 1)

Let $V$ be an $n$-dimensional vector space. You can consider the set $W_n$ (there is a technical symbol for this space, which I will not bother you with) of maps $f\colon V^n\to \mathbb R$ satisfying the following conditions:

  • $f$ is multilinear, that is: linear in each argument: $$f(v_1, \ldots, v_{i-1},av_i+bu_i, v_{i+1},\ldots, v_n)=af(v_1, \ldots, v_{i-1},v_i, v_{i+1},\ldots, v_n)+bf(v_1, \ldots, v_{i-1},u_i, v_{i+1},\ldots, v_n)$$
  • $f$ is alternating, i.e. $f(v_1,\ldots, v_n)=0$ whenever $v_i=v_j$ for some $i\ne j$.

The set $W_n$ is a vector space (under the obvious addition and scalar multiplication), and we can wonder what its dimension is. Intriguingly, $\dim W_n=1$. As a consequence, each $f\in W_n$ is already determined by its value at one nontrivial point. In particular, there is a unique $f$ with the property that $f(e_1,\ldots, e_n)=1$ (where the $e_i$ are the standard basis vectors). Then, for any $n\times n$ matrix $A$ with columns $v_1,\ldots, v_n$, one can show that we simply have $\det A=f(v_1,\ldots, v_n)$. Hence all the idiosyncrasies of $\det$ are not so unusual at all: they come naturally from the simple requirements of being multilinear and alternating (and these are also naturally related to solvability of linear equations).
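To sketch where the alternating signs come from: expand each column in the standard basis, $v_j=\sum_i a_{ij}e_i$. Multilinearity gives $$f(v_1,\ldots,v_n)=\sum_{i_1,\ldots,i_n} a_{i_1 1}\cdots a_{i_n n}\, f(e_{i_1},\ldots,e_{i_n}),$$ and the alternating property kills every term in which a basis vector repeats, so only permutations $\sigma$ of $\{1,\ldots,n\}$ survive. Swapping two arguments flips the sign, so $f(e_{\sigma(1)},\ldots,e_{\sigma(n)})=\operatorname{sgn}(\sigma)\,f(e_1,\ldots,e_n)$, and with the normalization $f(e_1,\ldots,e_n)=1$ we land exactly on the Leibniz formula $$\det A=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\, a_{\sigma(1)1}\cdots a_{\sigma(n)n};$$ grouping these terms by the entry chosen from one fixed column recovers the co-factor expansion.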

Answer (score 3)

Because the determinant is just the signed volume of the parallelotope whose sides are the column vectors, the determinant formula is equivalent to (and can be derived from) the product of the norms of the vectors produced by the Gram-Schmidt process applied to the column vectors -- at least when the column vectors are linearly independent. Note that this process computes the volume of an orthotope, but that equals the volume of the original parallelotope by Cavalieri's Principle. The sign is then determined by whether the column vectors form a right-handed or left-handed arrangement.
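This claim is easy to verify numerically. A minimal sketch (the example matrix is my own arbitrary choice):

```python
# Numerical sketch: for linearly independent columns, |det A| equals the
# product of the lengths of the Gram-Schmidt orthogonalized (not
# normalized) columns.
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Orthogonalize the vectors in order, without normalizing."""
    ortho = []
    for v in vectors:
        w = list(v)
        for u in ortho:
            coeff = dot(v, u) / dot(u, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho

def det3_cols(c1, c2, c3):
    """Determinant of the 3x3 matrix with columns c1, c2, c3."""
    (a, b, c), (d, e, f), (g, h, i) = c1, c2, c3
    return a*(e*i - f*h) - d*(b*i - c*h) + g*(b*f - c*e)

cols = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
volume = math.prod(math.sqrt(dot(w, w)) for w in gram_schmidt(cols))
print(volume, abs(det3_cols(*cols)))  # both 5.0, up to rounding
```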

Answer (score 0)

This is just an addendum to Hagen von Eitzen's excellent answer. Historically, determinants arose as a way of writing down the solution to a system of linear equations. In modern notation, such a system may be written $$A\boldsymbol x=\boldsymbol b,$$ where $A$ is an $m\times n$ real matrix, $\boldsymbol b$ is a known $m$-vector, and $\boldsymbol x$ is the $n$-vector to be determined. A necessary condition for the system to have a unique solution is that $m\geqslant n$. In fact, if $m>n$, then at least $m-n$ of the equations are redundant. So the interesting case is $m=n$. Even then, the system may be inconsistent (and perhaps redundant as well), in which case it has no solution; or it may be consistent but redundant, in which case the solution is not determined. The key case is when the system is square ($m=n$), consistent, and irredundant. Then the solution is given by $$\boldsymbol x=A^{-1}\boldsymbol b,$$ where $A^{-1}$ is the usual matrix inverse. Determinants arise as the entries of $A^{-1}$: each entry is a ratio, all entries having the common denominator $\det A$, which must therefore be nonzero. The numerators are the cofactors of $A$, which are also determinants. The properties of multilinearity and alternation of $\det$ correspond to familiar properties of linear equations; for example, replacing one equation by itself plus a linear combination of other equations doesn't change the solution (or $\det A$).
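For the $2\times 2$ case this is just Cramer's rule, which is easy to demonstrate; the system below is my own example:

```python
# Sketch of the point above for the 2x2 case: the solution's entries are
# ratios of determinants (Cramer's rule), with det A as the common
# denominator.  The system solved below is an arbitrary example.

def det2(a, b, c, d):
    """Determinant of [[a, b], [c, d]]."""
    return a * d - b * c

def solve2(a, b, c, d, p, q):
    """Solve a*x + b*y = p, c*x + d*y = q by Cramer's rule."""
    denom = det2(a, b, c, d)
    if denom == 0:
        raise ValueError("system has no unique solution")
    x = det2(p, b, q, d) / denom   # replace first column by (p, q)
    y = det2(a, p, c, q) / denom   # replace second column by (p, q)
    return x, y

# 2x + y = 5, x + 3y = 10  ->  x = 1, y = 3
print(solve2(2, 1, 1, 3, 5, 10))  # (1.0, 3.0)
```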

Answer (score 1)

Suppose you want to tell whether a square matrix is invertible or not. You may try to come up with a function f on the space of square matrices which is "as simple as possible" and has the property that f(A)=0 if and only if A is not invertible. What can be simpler than a polynomial function of the matrix coefficients? So, you try to find a polynomial function. How do you measure the complexity of a polynomial? Probably by its degree. Thus, you look for a polynomial of minimal degree which has this property. Then you discover

Theorem. The only minimal-degree polynomials f with the required property are scalar multiples of the determinant.

What is left is to pick the right scalar multiple. You decide (a bit arbitrarily) to require that f(I)=1, where I is the identity matrix. Now, you conclude that f is the determinant.

Answer (score 1)

Of course there are many good answers. However, I think what I post below adds some value. I show why the determinant might be discovered merely as a consequence of systematic row reduction in seeking a criterion for invertibility. The conclusion of the derivation is by no means unique, and that is where volume and orientability come into play. I do not have much to say about those, as they have been addressed in other answers already. What follows is actually taken from my 2014 Linear Algebra notes.

The base case $n=1$ has $A=a \in \mathbb{R}$ as we identify $\mathbb{R}^{ 1 \times 1}$ with $\mathbb{R}$. The equation $ax=b$ has solution $x=b/a$ provided $a \neq 0$. Thus, the simple criterion in the $n=1$ case is merely that $\boxed{a \neq 0}$.

The $n=2$ case has $A = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right]$. We learned that the formula for the $2 \times 2$ inverse is: $$ A^{-1} = \frac{1}{ad-bc}\left[ \begin{array}{cc} d & -b \\ -c & a \end{array} \right]. $$ The necessary and sufficient condition for invertibility here is just that $ad-bc \neq 0$. That said, it may be helpful to derive this condition from row reduction. For brevity of discussion (you could break into further cases if you want a more complete motivating discussion, our current endeavor is to explain why the determinant formula is natural) we assume $a,c \neq 0$. $$ A = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right] \ \underrightarrow{ \ cr_1, ar_2 \ } \ \left[ \begin{array}{cc} ac & bc \\ ac & ad \end{array} \right] \ \underrightarrow{ \ r_2- r_1 \ } \ \left[ \begin{array}{cc} ac & bc \\ 0 & ad-bc \end{array} \right] $$ Observe that $\boxed{ad-bc \neq 0}$ is a necessary condition to reduce the matrix $A$ to the identity.
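The quoted $2 \times 2$ inverse formula can be sanity-checked numerically; below is a minimal sketch with an arbitrary example matrix of my own:

```python
# Quick numeric check of the 2x2 inverse formula quoted above:
# A^{-1} = (1/(ad - bc)) * [[d, -b], [-c, a]], valid when ad - bc != 0.

def inverse2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via the cofactor formula."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(A, B):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0],
             A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0],
             A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

A = [[2, 1], [1, 3]]
print(matmul2(A, inverse2(2, 1, 1, 3)))  # identity, up to rounding
```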

The $n=3$ case has $A = \left[ \begin{array}{ccc} a & d & g \\ b & e & h \\ c & f & i \end{array} \right]$. I assume here for brevity that $a,b,c,d,e,f \neq 0$. \begin{align} \notag A = \left[ \begin{array}{ccc} a & d & g \\ b & e & h \\ c & f & i \end{array} \right] \ \ &\underrightarrow{ \ bcr_1, \ acr_2,\ abr_3 \ } \ \left[ \begin{array}{c|c|c} abc & dbc & gbc \\ acb & ace & ach \\ abc & abf & abi \end{array} \right] \\ \notag &\underrightarrow{ \ r_2-r_1, \ r_3-r_1 \ } \ \left[ \begin{array}{c|c|c} abc & dbc & gbc \\ 0 & c(ae-db) & c(ah-gb) \\ 0 & b(af-dc) & b(ai-gc) \end{array} \right] \\ \notag &\underrightarrow{ \ r_1/(bc), \ r_2/c, r_3/b \ } \ \left[ \begin{array}{c|c|c} a & d & g \\ 0 & ae-db & ah-gb \\ 0 & af-dc & ai-gc \end{array} \right] \\ \notag &\underrightarrow{ \ r_2/(ae-db) \ } \ \left[ \begin{array}{c|c|c} a & d & g \\ 0 & 1 & \frac{ah-gb}{ae-db} \\ 0 & af-dc & ai-gc \end{array} \right] \\ \notag &\underrightarrow{ \ r_3- (af-dc)r_2 \ } \ \left[ \begin{array}{c|c|c} a & d & g \\ 0 & 1 & \frac{ah-gb}{ae-db} \\ 0 & 0 & ai-gc -(af-dc)\frac{ah-gb}{ae-db} \end{array} \right] \\ \notag &\underrightarrow{ \ (ae-db)r_3 \ } \ \left[ \begin{array}{c|c|c} a & d & g \\ 0 & 1 & \frac{ah-gb}{ae-db} \\ 0 & 0 & (ai-gc)(ae-db) -(af-dc)(ah-gb) \end{array} \right] \\ \notag \end{align} Apparently, we need $(ai-gc)(ae-db) -(af-dc)(ah-gb) \neq 0$. Let's see if we can simplify it, \begin{align} \notag (ai-gc)(ae-db) -(af-dc)(ah-gb) &= a^2ie-aidb-gcae+gcdb-a^2fh+afgb+dcah-dcgb\\ \notag &= a[aie-idb-gce-afh+fgb+dch] \end{align} We already assumed $a \neq 0$ so it is most interesting to require: $$ \boxed{aie-idb-gce-afh+fgb+dch \neq 0} $$ The condition above would seem to yield invertibility of $A$. To be careful, the calculation above does not prove anything about matrices for which the above row operations are forbidden. Technically, you'd need to examine those cases separately to prove the boxed criterion suffices for invertibility of $A$.
That said, perhaps this section helps motivate why we define the following determinants: \begin{align} \notag \text{det}[a] &= a, \\ \notag \text{det}\left[ \begin{array}{cc} a & b \\ c & d \end{array} \right] &=ad-bc, \\ \notag \text{det}\left[ \begin{array}{ccc} a & d & g \\ b & e & h \\ c & f & i \end{array} \right] &= aie-idb-gce-afh+fgb+dch \end{align} If $x \neq 0$ then $-x \neq 0$, thus the invertibility criterion alone does not suffice to uniquely determine the determinant. We'll see in a later section that the choice of sign has geometric significance. If a set of $n-1$ linearly independent vectors $v_1,\dots, v_{n-1}$ spans a hyperplane in $\mathbb{R}^n$ and we consider $\text{det}[v_1|\cdots |v_{n-1} |w]$ for some vector $w$, then the determinant is positive if $w$ is on one side of the hyperplane and negative if $w$ is on the other side. If $w$ is on the hyperplane then the determinant is zero. These facts serve to determine the definition of the determinant in general.
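The sign claim can be illustrated numerically in $\mathbb{R}^3$ (my own example, with $v_1, v_2$ spanning the $xy$-plane):

```python
# Illustration of the sign claim: with v1, v2 spanning the xy-plane in
# R^3, det[v1 | v2 | w] is positive for w on one side of the plane,
# negative on the other, and zero on the plane itself.

def det3_cols(c1, c2, c3):
    """Determinant of the 3x3 matrix with columns c1, c2, c3."""
    (a, b, c), (d, e, f), (g, h, i) = c1, c2, c3
    return a*(e*i - f*h) - d*(b*i - c*h) + g*(b*f - c*e)

v1, v2 = [1, 0, 0], [0, 1, 0]          # span the xy-plane
above, below, on = [0, 0, 1], [0, 0, -1], [2, 3, 0]

print(det3_cols(v1, v2, above),   # 1: positive side
      det3_cols(v1, v2, below),   # -1: negative side
      det3_cols(v1, v2, on))      # 0: on the plane
```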