What is an intuitive explanation of why elimination preserves the solutions of the original system of equations?


I've studied linear algebra before, however, I wanted to come back to the foundations and understand it again from the beginning.

I was looking at the following innocuous linear equations:

$$ x - 2y = 1 $$ $$ 3x + 2y = 11 $$

and after elimination one has:

$$ x - 2y = 1 $$ $$ 8y = 8 $$

the claim is that both systems of equations have the same solution. Geometrically:

[Figure: both systems plotted as lines in the plane, intersecting at the same point $(3, 1)$.]

If you've never seen this before, it almost seems like magic! We managed to combine different equations and the solution is still preserved. The new equations seem completely different, and the lines we care about don't even have the same gradients. Basically, at first glance, the problem seems to have changed dramatically!

So even though the system seems to have changed a lot, in reality it didn't change that much, since the lines still intersect at the same point.
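(A quick numerical check of that claim, not part of the original question: feeding both systems to a solver, for example NumPy's, returns the same point.)

```python
import numpy as np

# Original system:  x - 2y = 1,  3x + 2y = 11
A = np.array([[1.0, -2.0],
              [3.0,  2.0]])
b = np.array([1.0, 11.0])

# Eliminated system:  x - 2y = 1,  8y = 8
A_elim = np.array([[1.0, -2.0],
                   [0.0,  8.0]])
b_elim = np.array([1.0, 8.0])

print(np.linalg.solve(A, b))            # [3. 1.]
print(np.linalg.solve(A_elim, b_elim))  # [3. 1.]  -- same intersection point
```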

My question is: what is the intuition for why manipulating a system of equations in this way, combining its equations, retains the original solution?

The intuition/justification that I used to think of was that, if we combine equations, in principle the "total" information that we had at the beginning is preserved, as long as we are combining different equations and we don't discard any of them. Basically, combining two equations implicitly retains the information that we had about the old equation. However, we can "forget" the old form of the new equation, because that information is preserved even though the equation changed. I.e., it's OK to combine equations 1 and 2 to form 2' and discard 2, since 2' AND 1 together contain all the information of the original system.

This is sort of the intuition that I use, but I wasn't sure if that was a good way to think about it, or if people had better intuitions or justifications for why elimination works.


There are 7 answers below.

Answer 1

I will likely be proven wrong by another answer, but I don't think that you can get a geometrical explanation. Algebraically, the situation is very clear: you have a system $Ax=b$, and when you perform row reduction, you are multiplying on the left by an invertible (elementary) matrix. So the system becomes $EAx=Eb$, with the same solution. Multiplying by an invertible matrix does not have an easy geometrical interpretation, as far as I can tell.
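As a concrete sketch of that algebra (illustrative only; NumPy is assumed), the elimination step in the question corresponds to the elementary matrix $E$ that subtracts $3$ times row 1 from row 2:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  2.0]])
b = np.array([1.0, 11.0])

# Elementary matrix: subtract 3 * (row 1) from (row 2)
E = np.array([[ 1.0, 0.0],
              [-3.0, 1.0]])

EA, Eb = E @ A, E @ b
print(EA)   # [[ 1. -2.]
            #  [ 0.  8.]]  <- the eliminated system
print(Eb)   # [1. 8.]

# E is invertible, so EAx = Eb has exactly the same solutions as Ax = b.
print(np.linalg.solve(A, b), np.linalg.solve(EA, Eb))  # [3. 1.] [3. 1.]
print(np.linalg.inv(E))   # [[1. 0.], [3. 1.]] -- undoes the row operation
```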

Answer 2

There is only one elimination rule: if I perform an operation on two copies of the same number, both copies change in the same way.

If $x-2y=1$ then $3x-6y=3$. I have changed nothing. I have merely noticed that if I multiply $1$ by $3$ I get $3$.

If I now subtract $3x-6y=3$ from $3x+2y=11$, I am just subtracting $3$ from $11$, which gives me $8y=8$. If I now divide $8$ by $8$, I get $y=1$.

In all cases I am just doing the same thing to the same number written in two different ways.
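The same bookkeeping, written out as a small Python sketch (the representation and variable names are mine): each equation is a triple of numbers, and every step performs one operation on both sides at once.

```python
# Each equation is stored as (coefficient of x, coefficient of y, right-hand side).
eq1 = (1, -2, 1)    # x - 2y = 1
eq2 = (3,  2, 11)   # 3x + 2y = 11

# Multiply eq1 by 3: both sides change in the same way.
eq1_times_3 = tuple(3 * c for c in eq1)                    # (3, -6, 3)

# Subtract it from eq2: again the same operation on both sides.
eq2_new = tuple(a - b for a, b in zip(eq2, eq1_times_3))   # (0, 8, 8) -> 8y = 8

y = eq2_new[2] / eq2_new[1]             # 8 / 8 = 1
x = (eq1[2] - eq1[1] * y) / eq1[0]      # back-substitute into eq1: x = 1 + 2*1 = 3
print(x, y)                             # 3.0 1.0
```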

Answer 3

> So even though the system seems to have changed a lot, in reality it didn't change that much, since the lines still intersect at the same point.
>
> My question is: what is the intuition for why manipulating a system of equations in this way, combining its equations, retains the original solution?

The reason that the solution space stays the same is algebraic in nature:

As shown by your images, each row defines an affine hyperplane (an affine subspace of dimension $n-1$): $$ a_i^\top x = \beta_i $$ where the vector $a_i$ is a normal vector to the hyperplane.

Elementary scaling of a row does not change the points $x$ of the hyperplane, only (the length of) its normal vector, which stays normal, with the inhomogeneity scaled to compensate: \begin{align} a_i^\top x = \beta_i \iff \\ s\, a_i^\top x = s\, \beta_i \iff \\ {a'}_i^\top x = {\beta'}_i \end{align} Adding a multiple of one row to another turns $$ a_i^\top x = \beta_i \quad (*) $$ into $$ a_i^\top x + s\, a_j^\top x = (a_i + s\, a_j)^\top x = \beta_i + s\, \beta_j, \quad \text{i.e.} \quad {a'}_i^\top x = {\beta'}_i \quad (**) $$ On its own, $(**)$ can be a new hyperplane with a new set of points $x$. However, for an $x$ in the common intersection, $a_j^\top x = \beta_j$ still holds, so from $(**)$ we can recover $(*)$ again. So $$ a_i^\top x = \beta_i \;\wedge\; a_j^\top x = \beta_j $$ has the same solutions as $$ a_i^\top x + s\, a_j^\top x = \beta_i + s\, \beta_j \;\wedge\; a_j^\top x = \beta_j $$ The geometric consequence is that the hyperplanes $(*)$ and $(**)$ have those points $x$ in common for which $a_j^\top x = \beta_j$ holds, and these form at least an $(n-2)$-dimensional affine subspace. For $n=2$ this means staying the same or rotating around the intersection point. For $n=3$ it means staying the same or rotating around the intersection line.
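Here is a minimal NumPy sketch of that "rotation about the intersection" (the code and the chosen values of $s$ are mine): adding different multiples $s$ of row 2 to row 1 produces different lines, all passing through the same intersection point.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  2.0]])
b = np.array([1.0, 11.0])
x0 = np.linalg.solve(A, b)          # the common intersection point, (3, 1)

for s in (-2.0, -0.5, 0.0, 1.0, 4.0):
    A2, b2 = A.copy(), b.copy()
    A2[0] += s * A2[1]              # replace row 1 by (row 1) + s * (row 2)
    b2[0] += s * b2[1]
    # The modified first hyperplane is a different line for each s ...
    # ... but it still passes through x0, so the solution is unchanged.
    assert np.allclose(A2[0] @ x0, b2[0])
    assert np.allclose(np.linalg.solve(A2, b2), x0)

print("intersection point stays at", x0)
```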

OK, so we can do affine rotations without changing the solution space. However, we do not do this at random, but follow a strategy which allows us to read the solutions off the parameters $a$ and $\beta$:

The goal of elimination is to reach row echelon form. Row by row, dependencies between the variables are removed.

[Figure: elimination]

In your image, the $x$-dependency of the second line has been removed; it depends only on $y$, which, because it is the last variable, turns out to be constant. It is an affine rotation.

As we now have the $y$-dependency in the second row, we could remove it from the first row, keeping only the $x$-dependency there: $$ \left[ \begin{array}{rr|r} 1 & -2 & 1 \\ 0 & 8 & 8 \end{array} \right] \to \left[ \begin{array}{rr|r} 1 & -2 & 1 \\ 0 & 1 & 1 \end{array} \right] \to \left[ \begin{array}{rr|r} 1 & 0 & 3 \\ 0 & 1 & 1 \end{array} \right] $$ The matrix is now in reduced row echelon form.

[Figure: elimination, continued]

We have aligned each line parallel to one of the coordinate axes. In higher dimensions a similar alignment happens. We can read the fixed coordinates of the solution directly from this form.
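If you want to reproduce this reduction programmatically, SymPy's `rref` does it in one call (a minimal sketch; not part of the original answer):

```python
import sympy as sp

# Augmented matrix [A | b] after the first elimination step
M = sp.Matrix([[1, -2, 1],
               [0,  8, 8]])

R, pivots = M.rref()
print(R)        # Matrix([[1, 0, 3], [0, 1, 1]])  ->  x = 3, y = 1
print(pivots)   # (0, 1)
```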

Further we can make statements about the dimension of the solution space:

If the row echelon form has $k$ zero rows, then only $n-k$ hyperplanes remain, and their intersection is no longer zero-dimensional (a point, a unique solution) but, if it is non-empty, an affine subspace of dimension at least $k$. If the hyperplanes have no common intersection, there is no solution.
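To illustrate, here is a minimal SymPy sketch (the particular $3\times 3$ example is mine) where the row echelon form has one zero row, so the solution set is a line rather than a point:

```python
import sympy as sp

# A system whose row echelon form has one zero row:
#   x +  y +  z = 6
#  2x + 2y + 2z = 12   (a multiple of the first equation)
#   x -  y      = 0
M = sp.Matrix([[1,  1, 1,  6],
               [2,  2, 2, 12],
               [1, -1, 0,  0]])

R, pivots = M.rref()
print(R)
# Matrix([[1, 0, 1/2, 3], [0, 1, 1/2, 3], [0, 0, 0, 0]])
print(len(pivots))   # 2 pivots, 3 unknowns -> a 1-dimensional affine solution set (a line)
```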

Answer 4

Suppose someone gives you a pair of distinct lines in the plane, neither horizontal nor vertical, and intersecting at a point $O$. To find the Cartesian coordinates of $O$, you could rotate one line until it was horizontal, then rotate the other line until it was vertical. You could then read off the coordinates by checking where the rotated lines hit the coordinate axes.

Rotating a line to be horizontal means you've eliminated $x$ from its equation; rotating a line to be vertical means you've eliminated $y$. Perhaps the surprising thing is that elementary row operations effectively rotate lines.

To make this algebra-to-geometry dictionary more precise, write the original lines as \begin{align*} a_{1}x + b_{1}y + c_{1} &= 0, \\ a_{2}x + b_{2}y + c_{2} &= 0. \\ \end{align*} View each left-hand side as the value of a function of two variables whose graph is a plane.

Multiplying an equation by a non-zero scalar "rotates" the corresponding plane in space about its zero set, but doesn't change the zero set.

Fixing one equation and subtracting a non-zero scalar multiple of the other amounts to intersecting planes, then projecting the line of intersection to the $(x, y)$-plane; as the scalar multiple changes, (the shadow of) the line of intersection rotates about $O$. If you tilt the planes just right, the projected line of intersection is parallel to a coordinate axis.
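To make that concrete, here is a small Python sketch (the helper `combine` and the variable names are mine) that picks the scalar multiples which make the combined lines horizontal and vertical for the system in the question:

```python
# Lines written as a*x + b*y + c = 0, stored as (a, b, c).
line1 = (1, -2,  -1)    #  x - 2y -  1 = 0
line2 = (3,  2, -11)    # 3x + 2y - 11 = 0

def combine(l1, l2, s):
    """Replace l2 by l2 - s * l1 (an elementary row operation)."""
    return tuple(v2 - s * v1 for v1, v2 in zip(l1, l2))

# Choose s so the x-coefficient of line2 vanishes: the new line is horizontal.
horizontal = combine(line1, line2, s=line2[0] / line1[0])
print(horizontal)   # (0.0, 8.0, -8.0)   ->  8y - 8 = 0,   i.e. y = 1

# Choose s so the y-coefficient of line1 vanishes: the new line is vertical.
vertical = combine(line2, line1, s=line1[1] / line2[1])
print(vertical)     # (4.0, 0.0, -12.0)  ->  4x - 12 = 0,  i.e. x = 3
```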

Answer 5

I know this is an old post, but I remember my excellent high school teacher giving the following intuitive explanation.

$$ \text{I:} \qquad x + y = k $$

$$ \text{II:} \qquad x + q = p $$

Combined (subtracting II from I):

$$ x + y - ( x + q) = k - p $$

Simplifies to:

$$ y - q = k - p $$

$$ y = k - p + q $$

Inserting $y$ into the first equation...

$$ x + (k - p + q) = k $$

...gives you the second

$$ x + q = p$$

The reasoning is as follows: if you combine two equations into one, solve for a variable that occurs in both equations, and insert the result into the first equation, you get back the second. In other words, the solution for that variable satisfies both equations...
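The same argument can be replayed symbolically, for example with SymPy (a sketch of mine, keeping $k$, $p$, $q$ symbolic):

```python
import sympy as sp

x, y, q, k, p = sp.symbols('x y q k p')

eq1 = sp.Eq(x + y, k)    # I
eq2 = sp.Eq(x + q, p)    # II

# "Combined": subtract II from I, then solve for y.
combined = sp.Eq(eq1.lhs - eq2.lhs, eq1.rhs - eq2.rhs)
y_sol = sp.solve(combined, y)[0]
print(y_sol)             # k - p + q

# Insert y back into I; what remains is exactly II again.
back = eq1.subs(y, y_sol)
print(sp.simplify(back.lhs - back.rhs))   # x - p + q  ->  x + q = p, i.e. II
```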

Answer 6

A visual to supplement the answers of Andrew Hwang and mvw.

One can 'see' the transformation by subtracting the first equation 'slowly' over $0\le t\le 3$: $$ (3x+2y) - t\,(x-2y) = 11 - t\cdot 1 $$ $$ (3-t)x+(2+2t)y=11-t $$

[Animation: the intermediate lines $(3-t)x+(2+2t)y=11-t$ sweeping around the intersection point as $t$ goes from $0$ to $3$.]

The transformation is not a rotation but a shear along the other line. The point of intersection appears to act as the origin because if $\vec{x}_0$ ($=(3,1)$) is the solution of $A\vec{x}_0=\vec{b}$ then the intermediate lines satisfy $$(3-t)x+(2+2t)y=(-t,1)\begin{pmatrix}1&-2\\3&2\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=(-t,1)\begin{pmatrix}1\\11\end{pmatrix}=11-t$$ That is, writing $v_t=(-t,1)$ and $\vec{b}=(1,11)$, $$v_t^\top A\vec{x}=v_t^\top\vec{b}\iff v_t^\top A(\vec{x}-\vec{x}_0)=0$$
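A one-line symbolic check of that claim (a sketch using SymPy; not part of the original answer): the solution $\vec{x}_0=(3,1)$ lies on every intermediate line.

```python
import sympy as sp

t = sp.symbols('t')

# Family of lines obtained by subtracting t * (first equation) from the second:
#   (3 - t) x + (2 + 2t) y = 11 - t
lhs_at_x0 = (3 - t) * 3 + (2 + 2 * t) * 1   # plug in the solution (x, y) = (3, 1)
rhs = 11 - t

print(sp.simplify(lhs_at_x0 - rhs))   # 0  ->  (3, 1) lies on the line for every t
```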

Answer 7

Funnily enough, I was actually thinking about this question earlier this morning. I also found myself quite stumped by the intuition behind elimination. I will try to lay out the thought process by which I came to terms with it, and hopefully we will end up in the same boat.

Examples and Explanation: Let's start with a simple example. You are going about your day and come upon a troll who tells you to solve two problems before you can cross a bridge. The first problem is this: one TV and two DVDs together weigh 5 kg, and one TV and one DVD weigh 3 kg. How much does the DVD weigh? When you write out the equations you get $x+2y=5$ and $x+y=3$. You make the witty realization that the only difference between the two weighings is the weight of one DVD, and that difference is 2 kg. You tell the troll that the DVD must weigh 2 kg and you get the first problem right.

Let's take a pause here. You already know how to solve this mostly trivial problem, but there's a lesson in it. Two different equations in two variables 'should' always be solvable, but that is only true when you know how to manipulate the equations correctly. It isn't immediately obvious how to solve a system without first realizing that you need to be willing to combine the equations. Two equations only give you two static relationships, but the insight is that they also give you a ton of combinations, combinations that, when used appropriately as in Gauss-Jordan elimination, let you simplify and isolate certain variables without changing anything, only building on what was given. Next example.

The troll says he's impressed with how quickly you solved the first problem, so he makes the next one harder. He says: one TV and two DVDs weigh 5 kg, and two TVs and one DVD weigh 4 kg. How much does the TV weigh?

When solving, you write the equations $x+2y=5$ and $2x+y=4$. This time you can't immediately isolate a weight. But you know that by subtracting the second equation from the first twice, you get a combination where the TV weight is isolated. You get the equation $-3x=-3$, so $x$ is 1 kg. You tell the troll the TV weighs 1 kg (a small TV) and you get it right again.

Let's go over what happened here. You combined two equations and got an answer; that is all. It is almost exactly the same as the first example in that regard. In the first example you technically did $y = (x+2y) - (x+y) = (5) - (3) = 2$. Each equation is like a block, and you fit the blocks together to cancel the shared components. When we said "the difference between the two weighings is 2", this is technically what happened. Here, though, what we did was $(x+2y) - (2x+y) - (2x+y) = (5) - (4) - (4)$, i.e. $-3x = -3$. The insight was that in order to find the variable we wanted, we had to allow the other variable to be cancelled. So long as the proportions in the two equations are different, this is always possible, and it is critical to Gauss-Jordan elimination. It is essentially an abstraction of the idea that any combination of equations is fine so long as the equalities of the original equations are maintained.
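Here is a short NumPy sketch of mine replaying the second troll problem as a row operation:

```python
import numpy as np

# Troll problem 2:  x + 2y = 5,  2x + y = 4   (x = TV weight, y = DVD weight)
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
b = np.array([5.0, 4.0])

# "Subtract the second equation from the first twice":
# row 1  <-  row 1 - 2 * row 2
A[0] -= 2 * A[1]
b[0] -= 2 * b[1]
print(A[0], b[0])             # [-3.  0.] -3.0  ->  -3x = -3, so x = 1
print(np.linalg.solve(A, b))  # [1. 2.]         ->  TV = 1 kg, DVD = 2 kg
```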

Tying It Together: You might be wondering how to reconcile all this notation with the changing graphs in the original question. The most intuitive way to think about it is actually in vector/column space. I'll attach a 3Blue1Brown video that covers the concept in more detail if you need a refresher before continuing. In short, a matrix moves the vectors fed into it to new locations, and every square matrix that is full rank (no information lost) moves each unique vector to a unique location. All unique inputs go to unique outputs, and this can always be undone with the inverse matrix. If there was one solution to the system before a full-rank square matrix transformation, there is still one solution after. If there were infinitely many solutions before, there are still infinitely many after. If there was no solution before, there is still no solution after.

Adding one row to another is an elementary matrix transformation that is square and full rank. All it does is stretch the space slightly, but it stretches the vector you want to find proportionally, and it can always be undone without changing any truths in the system. In the Cartesian plane it is a little harder to see, other than that the point of intersection never changes as the graphs change. A rough guide is that the closer you are to solving the system, at least through Gauss-Jordan elimination, the more the lines become perpendicular to the $x$- and $y$-axes, since you are getting closer to $x = {?}$, $y = {?}$, which when graphed are lines perpendicular to each other.

In summary, it's not that you did a manipulation to the equations and happened to get the same point of intersection; it's that you only did manipulations that keep the same point of intersection, and you repeat this over and over until you solve the system.

Hope this helped.

3b1b - Linear Combinations and Matrices: https://www.youtube.com/watch?v=kYB8IZa5AuE&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=3

EDIT: I changed my answer quite a bit to accommodate a few more insights; also, thank you @vitamin d for the LaTeX.