Theoretically, why does this work? How do we know the resulting matrix performs the operation on ALL other matrices?
I guess what I'm asking is whether you can prove without "checking" the resulting matrix that it performs the same operation on all other matrices.
So something like: Let $E_o$ be an elementary operation. Then applying $E_o$ to $I$, we get
$I \to E_m$, where $E_m$ is the matrix associated with the elementary operation.
Three questions:
1) How do we know $I$ functions as the "identity" when we're applying these operations? All we know is that it works that way when we multiply.
2) How do we know $E_m$ performs the operation we applied?
3) How do we know $E_m$ works for all matrices?
It's really crucial that the proof doesn't explicitly find the resulting matrix and then just check that it does what it's supposed to for all matrices.
$$ \newcommand{\mm} {\mathbf M} \newcommand{\mk} {\mathbf K} \newcommand{\mi} {\mathbf I} \newcommand{\me} {\mathbf E} \newcommand{\ml} {\mathbf L} $$ If you extend your notion of "elementary operation" to include the "do nothing operation" --- call it $N$ for "nothing" --- then the corresponding matrix is $\mi$. That is to say, applying $N$ to a matrix $\mm$ gives the same result as multiplying the matrix on the left by $\mi$: $$ N(\mm) = \mi \cdot \mm. $$

Now let's look at some other elementary operation $S$; we can compute $$ \me = S(\mi), \tag{1} $$ right? That is to say, the operation $S$, applied to $\mi$, must produce some matrix, and I'm going to call that matrix $\me$.
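Here's equation (1) with actual numbers, as a small NumPy sketch (illustration only, not proof --- that's the whole point of your question). The names are mine; I'm taking $S$ to be "swap the first two rows" of a $3 \times 3$ matrix:

```python
# Illustration of equation (1): applying a concrete elementary operation S
# (swap rows 0 and 1) to the identity produces the elementary matrix E.
import numpy as np

def S(M):
    """The elementary operation 'swap rows 0 and 1' of M."""
    M = M.copy()
    M[[0, 1]] = M[[1, 0]]
    return M

E = S(np.eye(3))   # equation (1): E = S(I)
print(E)
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]]
```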
Now I'm going to define a new operation $T$ on matrices: $$ T(\mm) = \me \cdot \mm. $$
And your question, now that I've got the definitions out of the way, is
"How do I know that $$S(\mm) = T(\mm)$$ for every matrix $\mm$?"
The first thing to realize is that what I've written above, without the word 'elementary' in "some other elementary operation $S$", is not enough to guarantee this result. It's possible that $S$ might be defined so that $S(\mi) = \me$, where $\me$ is some non-identity matrix, but $S(\mm) = \mi$ for every $\mm \ne \mi$. In that case, the claim that $S(\mm) = T(\mm)$ would be false.
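In fact you can write such a pathological (non-elementary) $S$ down explicitly; here's a sketch (the name `S_pathological` is mine) exhibiting one and confirming that for it, $S(\mm) \ne T(\mm)$:

```python
# A non-elementary operation S that sends I to a non-identity matrix E
# but sends everything else to I; for it, S(M) != E . M in general.
import numpy as np

E = np.array([[0., 1.],
              [1., 0.]])    # some non-identity matrix

def S_pathological(M):
    return E.copy() if np.array_equal(M, np.eye(2)) else np.eye(2)

M = np.array([[1., 2.],
              [3., 4.]])
T_of_M = S_pathological(np.eye(2)) @ M   # T(M) = S(I) . M = E . M
assert not np.array_equal(S_pathological(M), T_of_M)
```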
So we're going to need to use the notion of "elementary" somehow. It turns out that the key property that elementary operations share --- the one that makes this proof go through --- is that $$ S(\mm \cdot \mk) = S(\mm) \cdot \mk \tag{2} $$ where the $\cdot$ here represents matrix multiplication. Supposing, for the moment, that our elementary operation has this property, it's easy to prove that $S(\mm) = T(\mm)$ as follows. We know that \begin{align} \mi \cdot \mm &= \mm & \text{property of identity} \\ S(\mi \cdot \mm) &= S(\mm) & \text{apply $S$ to both sides} \\ S(\mi) \cdot \mm &= S(\mm) & \text{use equation 2 above} \\ \me \cdot \mm &= S(\mm) & \text{use equation 1 above} \\ \end{align} and that's the result we wanted, because the left hand side is exactly the definition of $T(\mm)$.
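Again as a numeric sanity check (not a proof): with the row-swap $S$ and its matrix $\me$ from the earlier sketch, both equation (2) and the conclusion $S(\mm) = \me \cdot \mm$ hold on random matrices:

```python
# Sanity check of equation (2) and of S(M) = E . M for the row-swap S.
import numpy as np

def S(M):                          # swap rows 0 and 1, as before
    M = M.copy(); M[[0, 1]] = M[[1, 0]]; return M

E = S(np.eye(3))                   # equation (1)

rng = np.random.default_rng(0)
M = rng.integers(0, 10, (3, 3)).astype(float)
K = rng.integers(0, 10, (3, 3)).astype(float)

assert np.array_equal(S(M @ K), S(M) @ K)   # equation (2)
assert np.array_equal(S(M), E @ M)          # hence S(M) = T(M)
```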
Now how do we prove that fundamental claim, equation 2? We need to use elementary-ness to do so. It's going to come down to cases, alas. For instance, if $S$ is an operation that swaps rows $p$ and $q$, I need to write out $S$ explicitly: $$ S(\mm)_{ab} = \begin{cases} m_{ab} & a \ne p \text{ and } a \ne q \\ m_{pb} & a = q \\ m_{qb} & a = p \end{cases} $$ That is to say, the $ab$-entry of $S(\mm)$ is the $ab$-entry of $\mm$, unless the row ($a$) is either $p$ or $q$, in which case you have to swap things around.
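That case-by-case formula translates directly into code; here's a sketch (names mine, with $p = 0$ and $q = 1$) checking that the entrywise definition really does swap rows $p$ and $q$:

```python
# The entrywise definition of S above, transcribed literally (p = 0, q = 1),
# compared against simply swapping the rows.
import numpy as np

p, q = 0, 1

def S_entrywise(M):
    out = np.empty_like(M)
    for a in range(M.shape[0]):
        for b in range(M.shape[1]):
            if a == q:
                out[a, b] = M[p, b]
            elif a == p:
                out[a, b] = M[q, b]
            else:
                out[a, b] = M[a, b]
    return out

M = np.arange(9.0).reshape(3, 3)
assert np.array_equal(S_entrywise(M), M[[q, p, 2]])   # rows p and q swapped
```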
Now looking at a pair of matrices $\mm$ and $\mk$, we need to know a formula for the $ab$-entry of the product $\mm \cdot \mk$; that's $$ (mk)_{ab} = \sum_i m_{ai} k_{ib}. $$ I'm following the usual convention that the entries of $\mm$ are $m_{ij}$ and those of $\mk$ are $k_{ij}$, and using the two-letter name "$mk$" (in parens) to indicate entries of the product, so that the $i$th row, $j$th column entry of $\mm \cdot \mk$ is denoted $(mk)_{ij}$.
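Here's that entry formula written out as a loop and compared against NumPy's built-in product (a sketch, names mine):

```python
# The entry formula (mk)_ab = sum_i m_ai k_ib, checked against NumPy's @.
import numpy as np

def product_entry(M, K, a, b):
    return sum(M[a, i] * K[i, b] for i in range(M.shape[1]))

M = np.arange(6.0).reshape(2, 3)
K = np.arange(12.0).reshape(3, 4)
MK = M @ K
assert all(product_entry(M, K, a, b) == MK[a, b]
           for a in range(2) for b in range(4))
```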
And now we can look at $S(\mm \cdot \mk)_{ab}$ and $(S(\mm) \cdot \mk)_{ab}$ and compare to see whether they're the same. Well $$ \begin{align} S(\mm \cdot \mk)_{ab} &= \begin{cases} (mk)_{ab} & a \ne p \text{ and } a \ne q \\ (mk)_{pb} & a = q \\ (mk)_{qb} & a = p \end{cases} & \text{definition of $S$} \\ &= \begin{cases} \sum_i m_{ai} k_{ib} & a \ne p \text{ and } a \ne q \\ \sum_i m_{pi} k_{ib} & a = q \\ \sum_i m_{qi} k_{ib} & a = p \end{cases} & \text{definition of matrix multiply} \\ &= \sum_i \left( \begin{cases} m_{ai} & a \ne p \text{ and } a \ne q \\ m_{pi} & a = q \\ m_{qi} & a = p \end{cases} \right) k_{ib} & \text{pull the sum and the common factor $k_{ib}$ out of the cases} \\ &= \sum_i S(\mm)_{ai} k_{ib} & \text{the cases are exactly the definition of $S(\mm)_{ai}$} \\ &= (S(\mm) \cdot \mk)_{ab}. \end{align} $$ So they are, in fact, equal, and we're done (with this case).
That still leaves the "multiply a row by a constant" case and the "add $r$ times one row to another" case to work out, but the proofs for those cases are nearly identical to this one, so I leave them to you.
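If you want to convince yourself numerically before writing those proofs out (remembering, once more, that checking is not proving), the same sanity check goes through for both remaining operations; the function names below are mine:

```python
# Equation (2) and S(M) = E . M also hold for the other two elementary
# operations: scaling a row, and adding a multiple of one row to another.
import numpy as np

def scale(M, p=0, c=3.0):              # multiply row p by the constant c
    M = M.copy(); M[p] *= c; return M

def add_multiple(M, p=0, q=1, r=2.0):  # add r times row q to row p
    M = M.copy(); M[p] += r * M[q]; return M

rng = np.random.default_rng(1)
M = rng.integers(0, 10, (3, 3)).astype(float)
K = rng.integers(0, 10, (3, 3)).astype(float)

for op in (scale, add_multiple):
    E = op(np.eye(3))                          # equation (1): E = op(I)
    assert np.allclose(op(M @ K), op(M) @ K)   # equation (2)
    assert np.allclose(op(M), E @ M)           # hence op(M) = E . M
```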