Detail in Proof that Reduced Row-Echelon Form is Unique


I've been reading a proof that the reduced row-echelon form of a given matrix is unique, but there was one part that made me wonder.

This step of the proof shows that if $B$ and $C$ are row-equivalent and in reduced row-echelon form, where $r$ is the number of non-zero rows in $B$ and $r'$ is the same for $C$, then $r = r'$. Note that $d_k$ is the column of the $k^{th}$ pivot in $B$, and $d'_k$ is the same in $C$. Previously, it was proved that $d_k = d'_k$, which makes sense to me - this proof assumes that.

Without further ado, the proof (paraphrased from the University of Puget Sound's free textbook on Linear Algebra). If my annotations below are too convoluted, refer to page 34 here.

Suppose $r' < r$. For $1 \leq \mathscr{l} \leq r'$, we have $[B]_{rd_\mathscr{l}} = 0$, since $[B]_{kd_\mathscr{l}} \neq 0$ iff $k = \mathscr{l}$, and $r > r' \geq \mathscr{l}$. Because each row of $B$ (including row $r$) is a linear combination of the rows of $C$, we have $0 = [B]_{rd_\mathscr{l}} = \sum_{k=1}^{m} \delta_{rk} [C]_{kd_\mathscr{l}}$, where $\delta_{ik}$ is the coefficient by which row $k$ of $C$ is multiplied to contribute to row $i$ of $B$.

We can decompose this sum as $\sum_{k=1}^{r'} \delta_{rk} [C]_{kd_\mathscr{l}} + \sum_{k=r' + 1}^{m} \delta_{rk} [C]_{kd_\mathscr{l}}$, and since $[C]_{kd_\mathscr{l}} = 0$ for $k > r' \geq \mathscr{l}$, we can drop the second sum, leaving $\sum_{k=1}^{r'} \delta_{rk} [C]_{kd_\mathscr{l}}$.

Since we know $d_k = d'_k$, this becomes $\sum_{k=1}^{r'} \delta_{rk} [C]_{kd'_\mathscr{l}}$. Pulling out a single term, we have $\delta_{r\mathscr{l}}[C]_{\mathscr{l}d'_\mathscr{l}} + \sum_{k=1, k \neq \mathscr{l}}^{r'} \delta_{rk} [C]_{kd'_\mathscr{l}}$. Because $[C]_{kd'_\mathscr{l}} = 1$ if $k = \mathscr{l}$, and otherwise equals $0$, the previous expression reduces to $\delta_{r\mathscr{l}}(1) + \sum_{k=1, k \neq \mathscr{l}}^{r'} \delta_{rk}(0) = \delta_{r\mathscr{l}}.$ Thus, $\delta_{r\mathscr{l}} = 0$...

This proof makes sense to me (especially after drawing a diagram), but I wonder why we split off the sum from $r' + 1$ to $m$ in the second paragraph. It's perfectly fine, but wouldn't the proof be shorter like this?

Suppose $r' < r$. For $1 \leq \mathscr{l} \leq r'$, we have $[B]_{rd_\mathscr{l}} = 0$, since $[B]_{kd_\mathscr{l}} \neq 0$ iff $k = \mathscr{l}$, and $r > r' \geq \mathscr{l}$. Because each row of $B$ (including row $r$) is a linear combination of the rows of $C$, we have $0 = [B]_{rd_\mathscr{l}} = \sum_{k=1}^{m} \delta_{rk} [C]_{kd_\mathscr{l}}$, where $\delta_{ik}$ is the coefficient by which row $k$ of $C$ is multiplied to contribute to row $i$ of $B$.

Since we know $d_k = d'_k$, this becomes $\sum_{k=1}^{m} \delta_{rk} [C]_{kd'_\mathscr{l}}$. Pulling out a single term, we have $\delta_{r\mathscr{l}}[C]_{\mathscr{l}d'_\mathscr{l}} + \sum_{k=1, k \neq \mathscr{l}}^{m} \delta_{rk} [C]_{kd'_\mathscr{l}}$. Because $[C]_{kd'_\mathscr{l}} = 1$ if $k = \mathscr{l}$, and otherwise equals $0$ (in particular, rows of $C$ beyond $r'$ are zero rows), the previous expression reduces to $\delta_{r\mathscr{l}}(1) + \sum_{k=1, k \neq \mathscr{l}}^{m} \delta_{rk}(0) = \delta_{r\mathscr{l}}.$ Thus, $\delta_{r\mathscr{l}} = 0$...

Note that the second paragraph is now gone, and in the third paragraph $r'$ in the upper limit of sums has been replaced with $m$. My question is ultimately: is my shorter proof correct? If so, why wouldn't the proof be presented this way in the first place?
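Not a proof, of course, but the uniqueness claim itself is easy to sanity-check numerically. Below is a minimal Gauss–Jordan reduction over the rationals (the function name `rref` and the test matrix are my own, not from the textbook): scrambling a matrix with random elementary row operations and then reducing lands on the same reduced row-echelon form every time.

```python
from fractions import Fraction
import random

def rref(rows):
    """Return the reduced row-echelon form of `rows` (a list of lists),
    computed exactly over the rationals via Gauss-Jordan elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivot_row = 0
    for col in range(n):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, m) if M[r][col] != 0), None)
        if pr is None:
            continue
        M[pivot_row], M[pr] = M[pr], M[pivot_row]
        # Scale the pivot row so the pivot entry is 1.
        p = M[pivot_row][col]
        M[pivot_row] = [x / p for x in M[pivot_row]]
        # Clear the pivot column in every other row.
        for r in range(m):
            if r != pivot_row and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return M

# Scramble A with random row operations, then reduce: the RREF is unchanged.
A = [[2, 4, 6, 8], [1, 3, 5, 7], [0, 1, 1, 2]]
B = [row[:] for row in A]
random.seed(0)
for _ in range(20):
    i, j = random.sample(range(3), 2)
    c = random.randint(-3, 3)
    B[i] = [a + c * b for a, b in zip(B[i], B[j])]  # add c * (row j) to row i

assert rref(A) == rref(B)
```

Any sequence of row operations could be substituted for the random ones here; row-equivalent matrices always reduce to the same RREF, which is exactly what the proof above establishes.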

Here's a proof I made up when teaching the class. Maybe it will interest someone. Basically, the matrix A determines its null space, which in turn equals the graph of a linear transformation whose matrix is the negative of the unknown part of the reduced echelon form. QED.

more detail:

Conceptual proof of uniqueness of reduced echelon form:

It is fundamental that a matrix and its reduced echelon form have the same null space. Indeed, that is the reason reduced echelon forms are useful for finding the null space of the original matrix. I claim that the null space entirely determines the reduced echelon form. For simplicity, take the case where the "pivot" columns all appear first, followed by the non-pivot columns. (Note that a column is a pivot column if and only if it does not depend linearly on earlier columns. This is obvious for a reduced echelon matrix, and hence also true for the original matrix, since the columns of both matrices satisfy exactly the same linear relations.)

If the matrix A is n by m with rank r, then the reduced echelon form has its last n-r rows equal to zero and its upper left r by r block equal to an identity matrix. Thus it suffices to show that the null space determines the remaining upper right r by (m-r) block. Such a block of course determines, and is determined by, a unique linear transformation from (m-r)-space to r-space. Further, that linear transformation is determined by its graph, an (m-r)-dimensional linear subspace of m-space.

That subspace, i.e. that graph, except for a minus sign, is precisely the null space. That is, from looking at the reduced echelon form one can see that the negative of the upper right r by (m-r) block is exactly the matrix of the linear map whose graph is the null space. Looked at another way, if A is the given matrix, the equation AX = 0 implicitly determines a linear function from (m-r)-space to r-space whose matrix is the negative of the upper right r by (m-r) block of the reduced echelon form of A.

Note: if f is my function, I have written the graph entries here in the order (f(t),t) rather than (t, f(t)).

In general, the null space determines a linear map from the coordinate subspace spanned by the non-pivot variables to that spanned by the pivot variables, whose matrix columns are (when augmented at the bottom by zeroes) exactly minus the sequence of non-pivot columns of the reduced echelon form of A. Since both the location and the content of the pivot columns are known, the reduced form is determined by the null space.
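The general recipe in the paragraph above can be sketched in a few lines of code (the helper name `nullspace_basis_from_rref` and the example matrix are mine, not the answerer's): given a matrix already in reduced echelon form, each free column produces one null space basis vector, whose pivot-position entries are the negatives of that column's entries.

```python
from fractions import Fraction

def nullspace_basis_from_rref(R):
    """Given R already in reduced row-echelon form, return a basis for its
    null space: one vector per non-pivot (free) column."""
    m, n = len(R), len(R[0])
    # The pivot column of each nonzero row is the index of its leading 1.
    pivots = []
    for row in R:
        lead = next((j for j, x in enumerate(row) if x != 0), None)
        if lead is not None:
            pivots.append(lead)
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)             # set the free variable to 1
        for i, p in enumerate(pivots):
            v[p] = -Fraction(R[i][f])  # pivot variable = minus the RREF entry
        basis.append(v)
    return basis

# The pivot columns need not come first: here column 1 is free.
R = [[1, 2, 0, 3],
     [0, 0, 1, 5]]
basis = nullspace_basis_from_rref(R)
# Each basis vector is annihilated by every row of R.
for v in basis:
    for row in R:
        assert sum(row_entry * b for row_entry, b in zip(row, v)) == 0
```

Running the construction in reverse (reading the non-pivot columns of the RREF off from a null space basis in this special form) is exactly why the null space pins down the reduced form.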

(I hope I said this right, I'm doing this in my head.)

E.g., if the reduced form has two non-zero rows (1 0 3 4) and (0 1 5 2),

this says the upper right 2 by 2 part, with rows (3 4) and (5 2), is minus the matrix of the linear map (x,y) = f(z,w), where x = -3z - 4w and y = -5z - 2w. The graph of this map f is the plane in 4-space spanned by the vectors (-3, -5, 1, 0) and (-4, -2, 0, 1), which is the null space of the matrix.
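The arithmetic in this example is quick to verify directly: each claimed spanning vector is annihilated by both rows of the reduced form.

```python
# Rows of the reduced echelon form from the example above.
R = [[1, 0, 3, 4],
     [0, 1, 5, 2]]
# Claimed null space basis: minus the upper-right block, stacked on an identity.
v1 = [-3, -5, 1, 0]
v2 = [-4, -2, 0, 1]
for v in (v1, v2):
    for row in R:
        # Each dot product row . v is zero, so v lies in the null space.
        assert sum(a * b for a, b in zip(row, v)) == 0
```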

Oops, I see my definition of "graph" is backwards from the usual, in that my entries appear in the opposite order: my graph of (x,y) = f(z,w) has entries (f(z,w), (z,w)) = ((x,y), (z,w)), instead of ((z,w), f(z,w)).