Why can ALL quadratic equations be solved by the quadratic formula?

In algebra, all quadratic problems can be solved by using the quadratic formula. I read a couple of books, and they told me only HOW and WHEN to use this formula, but they don't tell me WHY I can use it. I have tried to figure it out by proving these two equations are equal, but I can't.

Why can I use $x = \dfrac{-b\pm \sqrt{b^{2} - 4 ac}}{2a}$ to solve all quadratic equations?

There are 9 best solutions below

0
On

The formula comes from the technique known as completing the square.

4
On

Probably the easiest way to understand where the quadratic formula comes from is by 'completing the square': solving equations of the form '$x^2$=whatever' is easy, so let's see if we can put our quadratic equation ($ax^2+bx+c=0$) in that form.

The first thing to do is divide by $a$; of course this doesn't work if $a=0$, but then if that's the case our equation wasn't quadratic in the first place! This gives us $x^2+{b\over a}x+{c\over a}=0$. Now, that $b\over a$ term keeps us from having a clean square - but if we remember how to square a sum of two numbers - $(m+n)^2=m^2+2mn+n^2$ - then by substituting $x$ for $m$, we can see that our $n$ should be half of the coefficient of the linear term: $(x+{b\over 2a})^2 = x^2+{b\over a}x + {b^2\over 4a^2}$. But now the constant term isn't right; we have to adjust it to make it $c\over a$. A correction of $({c\over a}-{b^2\over 4a^2})$ will do this; we get $(x+{b\over 2a})^2+({c\over a}-{b^2\over 4a^2}) = 0$.

But this is exactly what we wanted; we can move that second term over to the right and get $(x+{b\over 2a})^2 = {b^2\over 4a^2}-{c\over a}$. Getting the right-hand-side cleaned up a little bit makes it ${b^2-4ac\over 4a^2}$ - just multiply the numerator and denominator of $c\over a$ by $4a$ and combine terms. Now, we can go ahead and take the square root of both sides: $x+{b\over 2a} = \pm \sqrt{b^2-4ac\over 4a^2} = \pm {\sqrt{b^2-4ac}\over\sqrt{4a^2}} = {\pm\sqrt{b^2-4ac}\over 2a}$. The last step is to subtract $b\over 2a$ from both sides, finally giving the familiar: $x = {-b\pm\sqrt{b^2-4ac}\over 2a}$
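As a sanity check on the derivation, here is a minimal Python sketch. The function names `solve_quadratic` and `completed_square_residual` are made up for illustration. It evaluates the formula and then confirms that each root makes the completed-square form $(x+{b\over 2a})^2 - {b^2-4ac\over 4a^2}$ vanish. It uses `cmath.sqrt` so that a negative discriminant gives complex roots instead of an error.

```python
import cmath


def solve_quadratic(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 via the quadratic formula."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    # cmath.sqrt returns a complex value when the discriminant is negative.
    root_of_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root_of_disc) / (2 * a), (-b - root_of_disc) / (2 * a)


def completed_square_residual(a, b, c, x):
    """Value of (x + b/2a)^2 - (b^2 - 4ac)/(4a^2); it should be 0 at a root."""
    return (x + b / (2 * a)) ** 2 - (b * b - 4 * a * c) / (4 * a * a)


if __name__ == "__main__":
    for coeffs in [(1, -3, 2), (1, 2, 5), (2, 3, -2)]:
        for root in solve_quadratic(*coeffs):
            print(coeffs, root, abs(completed_square_residual(*coeffs, root)))
```

The residuals printed in the last column should be zero up to floating-point rounding.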

0
On

Why? Because we let $a$, $b$, and $c$ be anything (usually real numbers, with $a \ne 0$).
So our result doesn't depend on the particular coefficients, only on the fact that we had a polynomial of degree two (i.e. a quadratic).

2
On

The other answers tell you where the formula "comes from" (namely, from completing the square). If you are just happy checking that the formula gives the correct solutions whatever $a$, $b$ and $c$, you may verify that the identity $$ aX^2+bX+c=a\left(X-\frac{-b+\sqrt{b^2-4ac}}{2a}\right)\left(X-\frac{-b-\sqrt{b^2-4ac}}{2a}\right) $$ holds for every $a$, $b$ and $c$.
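The identity can be proved by expanding the right-hand side by hand. If you only want numerical reassurance first, a rough sketch like the following compares both sides at a few sample points. The helper names and the sample points are arbitrary choices of mine, and a floating-point check of this kind is evidence, not a proof.

```python
import cmath


def expanded_form(a, b, c, x):
    """Left-hand side: a*x^2 + b*x + c."""
    return a * x * x + b * x + c


def factored_form(a, b, c, x):
    """Right-hand side: a*(x - r_plus)*(x - r_minus) with the formula's roots."""
    s = cmath.sqrt(b * b - 4 * a * c)
    r_plus = (-b + s) / (2 * a)
    r_minus = (-b - s) / (2 * a)
    return a * (x - r_plus) * (x - r_minus)


def max_gap(a, b, c, samples=(-2.0, 0.5, 3.0, 1 + 2j)):
    """Largest absolute difference between the two sides over the samples."""
    return max(
        abs(factored_form(a, b, c, x) - expanded_form(a, b, c, x))
        for x in samples
    )


if __name__ == "__main__":
    for coeffs in [(1, -3, 2), (2, 1, 5), (-3, 0, 7)]:
        print(coeffs, max_gap(*coeffs))
```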

19
On

I would like to prove the Quadratic Formula in a cleaner way. Perhaps if teachers see this approach they will be less reluctant to prove the Quadratic Formula.

Added: I have recently learned from the book Sources in the Development of Mathematics: Series and Products from the Fifteenth to the Twenty-first Century (Ranjan Roy) that the method described below was used by the ninth century mathematician Sridhara. (I highly recommend Roy's book, which is much broader in its coverage than the title would suggest.)

We want to solve the equation $$ax^2+bx+c=0,$$ where $a \ne 0$. The usual argument starts by dividing by $a$. That is a strategic error: division is ugly and produces formulas that are unpleasant to typeset.

Instead, multiply both sides by $4a$. We obtain the equivalent equation $$4a^2x^2 +4abx+4ac=0.\tag{1}$$ Note that $4a^2x^2+4abx$ is almost the square of $2ax+b$. More precisely, $$4a^2x^2+4abx=(2ax+b)^2-b^2.$$ So our equation can be rewritten as $$(2ax+b)^2 -b^2+4ac=0 \tag{2}$$ or equivalently $$(2ax+b)^2=b^2-4ac. \tag{3}$$ Now it's all over. We find that $$2ax+b=\pm\sqrt{b^2-4ac} \tag{4}$$ and therefore $$x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}. \tag{5}$$
No fractions until the very end!
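To see the "no fractions until the end" point concretely, here is a small sketch that follows steps (1) to (5) in exact integer arithmetic. It is restricted, as a simplifying assumption of mine, to integer coefficients whose discriminant is a perfect square, so that every intermediate value stays an integer. The function name is hypothetical.

```python
from fractions import Fraction
from math import isqrt


def solve_by_multiplying_by_4a(a, b, c):
    """Solve a*x^2 + b*x + c = 0 for integers a, b, c (a != 0).

    Assumes the discriminant is a non-negative perfect square, so the
    only division happens in the final step.
    """
    if a == 0:
        raise ValueError("a must be nonzero")
    disc = b * b - 4 * a * c          # right-hand side of (3)
    if disc < 0:
        raise ValueError("negative discriminant: no real roots")
    s = isqrt(disc)
    if s * s != disc:
        raise ValueError("discriminant is not a perfect square")
    # (4): 2ax + b = +s or -s.  (5): divide by 2a only now.
    return Fraction(-b + s, 2 * a), Fraction(-b - s, 2 * a)


if __name__ == "__main__":
    # 2x^2 + 3x - 2 = 0 becomes (4x + 3)^2 = 25 after multiplying by 4a = 8.
    print(solve_by_multiplying_by_4a(2, 3, -2))
```

For $2x^2+3x-2=0$ this gives $(4x+3)^2=25$, hence $4x+3=\pm 5$ and $x=\tfrac12$ or $x=-2$.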

Added: I have tried to show that initial division by $a$, when followed by a completing the square procedure, is not the simplest strategy. One might remark additionally that if we first divide by $a$, we end up needing a couple of additional "algebra" steps to partly undo the division in order to give the solutions their traditional form.

Division by $a$ is definitely a right beginning if it is followed by an argument that develops the connection between the coefficients and the sum and product of the roots. Ideally, each type of proof should be presented, since each connects to an important family of ideas. And a twice proved theorem is twice as true.

3
On

Here is a slightly less ad-hoc approach to deriving the formula.

You look at the polynomial $ax^2+bx+c$ and you think of it as being composed of two kinds of indeterminates: coefficients $a$, $b$, $c$, and variable $x$. What you wish to do is this: if $ax^2+bx+c=a(x-r_1)(x-r_2)$, you want to find an expression for $r_1$ and $r_2$ in terms of $a,b,c$ involving only the operations $+,-,\times,\div$ and $\sqrt[n]{}$.

But how are $r_1$ and $r_2$ related to $a,b$ and $c$? If you look at the expression $ax^2+bx+c=a(x-r_1)(x-r_2)$, it is easy to compute that $b=-a(r_1+r_2)$ and $c=ar_1r_2$.

Intuitively, because you know that $r_1+r_2=-\frac ba$, determining $r_1$ and $r_2$ is the same as determining $r_1-r_2$. Let $E=r_1-r_2$ and note that $2r_1=(r_1+r_2)+(r_1-r_2)=-\frac ba+E$ and $2r_2=(r_1+r_2)-(r_1-r_2)=-\frac ba-E$, so we already have most of our quadratic formula: $$r_1,r_2=\frac{-b}{2a}\pm\frac{E}2$$

All we need to do, then, is express $E=(r_1-r_2)$ using $+,-,\times,\div,\sqrt[n]{}$ in terms of $a,b,c$. In order to do this, we need to take a small detour to see what expressions in $+,-,\times,\div$ and $a,b,c$ could possibly be.

Note that the coefficients $b=-a(r_1+r_2)$ and $c=ar_1r_2$ are symmetric functions in $r_1$ and $r_2$ in the sense that if you exchange $r_1$ and $r_2$, the values of $b$ and $c$ do not change. Furthermore, $b$ and $c$ are in fact scalar multiples of the so-called elementary symmetric functions, which have the property that any symmetric function (in $2$ variables) can be expressed uniquely as a polynomial (quotient of polynomials for our purposes) in them.

In particular, we can "symmetrize" the quantity $E=(r_1-r_2)$ to obtain the discriminant $D=(r_1-r_2)^2$ which is in some sense "the smallest" symmetric function of $r_1$ and $r_2$ that becomes 0 if $r_1=r_2$. Technically, though, the above is the discriminant only when $a=1$ because our coefficients $b$ and $c$ are elementary symmetric functions scaled by $a$, so we define the general discriminant to be $D=a^2(r_1-r_2)^2$. Because $D$ is symmetric and $b$ and $c$ are (up to a multiplicative factor) elementary symmetric, we should be able to express $D$ as a polynomial in $b$ and $c$.

We do so in a somewhat ad-hoc manner (though there are algorithms that will do this procedurally): $$D=a^2(r_1-r_2)^2$$ so $$D=a^2(r_1^2-2r_1r_2+r_2^2)$$ hence $$D=a^2(r_1^2+2r_1r_2+r_2^2-4r_1r_2)$$ and finally $$D=a^2(r_1+r_2)^2-4a^2r_1r_2$$ giving us, since $a(r_1+r_2)=-b$ and $ar_1r_2=c$, $$D=b^2-4ac$$

Evidently, now we have that $\sqrt{D}=a(r_1-r_2)=aE$ and so $E=\frac{\sqrt{D}}a$. This allows us to rewrite our formula so far to get from $$r_1,r_2=\frac{-b}{2a}\pm\frac{E}{2}$$ to $$r_1,r_2=\frac{-b}{2a}\pm\frac{\sqrt{D}}{2a}$$ and finally $$r_1,r_2=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$$
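The chain of reasoning above (sum of roots, then difference of roots via the discriminant) translates almost line by line into code. The following is only an illustrative sketch with a made-up function name. It picks one of the two square roots of $D$, which amounts to choosing which root is called $r_1$.

```python
import cmath


def roots_from_symmetric_functions(a, b, c):
    """Recover r1, r2 from their sum and from E = r1 - r2."""
    if a == 0:
        raise ValueError("a must be nonzero")
    total = -b / a                    # r1 + r2
    D = b * b - 4 * a * c             # equals a^2 * (r1 - r2)^2
    E = cmath.sqrt(D) / a             # r1 - r2, for one choice of sign
    return (total + E) / 2, (total - E) / 2


if __name__ == "__main__":
    a, b, c = 2, -4, -6
    r1, r2 = roots_from_symmetric_functions(a, b, c)
    print(r1, r2)
    # The relations used in the derivation:
    print(-a * (r1 + r2), "should equal b =", b)
    print(a * r1 * r2, "should equal c =", c)
    print(a * a * (r1 - r2) ** 2, "should equal D =", b * b - 4 * a * c)
```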


The only strange question is: why did we only have to take one square root in order to get the formula, i.e. why did the quantity $E=(r_1-r_2)$ turn out to be a square root of a nice polynomial in $a,b,c$? That is where modern Galois theory comes in.

What's really happening is this: the first four suggest that you think of the coefficients as living in the field $F$ (a set of expressions such that adding, subtracting, multiplying, or dividing any two of them gives another expression in the set) consisting of $\{\dfrac {p(a,b,c)}{q(a,b,c)}\}$ where $p$ and $q$ are polynomials in three variables (and rational coefficients). Then $r_1$ and $r_2$ will generate an extension field $E$ of $F$, that is, the smallest field $E$ that contains $F$ and also $r_1$ and $r_2$. Galois theory says that this extension field $E$ will be a $2!=2$-dimensional vector space over $F$ and hence a single square root will be sufficient to generate $E$. Thus we need an expression in the coefficients (symmetric expression in the roots) whose square root is an expression in the roots, but not symmetric, and a natural choice then is the most elementary anti-symmetric function known as the Vandermonde determinant which is precisely $(r_1-r_2)$ in this case (anti-symmetric=swapping two variables flips the sign, obviously the square of an anti-symmetric function is a symmetric function).

For general polynomials, the extension field will be of higher dimension, and so you will need to take possibly several roots of different orders. Galois theory allows us to compute what these roots ought to be and in what order (giving us the cubic and quartic formulas in a way that is not ad-hoc at all), and also shows that the general degree $5$ and above polynomial does not have a formula involving only $+,-,\times,\div,\sqrt[n]{}$. (Some people feel frightened by this, because taking roots should invert the raising of powers, but this is not the case because the order of operations matters...) Now, if the coefficients of the higher degree polynomial satisfy some additional relations (i.e. are not completely independent from each other), then Galois theory also gives procedures for computing formulas for those cases and also for determining what such relations ought to be.

6
On

First, let's examine an analogous simpler special case. Why does the difference of squares formula $\rm\: x^2 - a^2\ =\ (x-a)\ (x+a)\:$ always work? Well, let's consider the obvious proof. Expanding the RHS we obtain $\rm\ (x-a)\ (x+a)\ =\ x^2 - a\ x + x\ a - a^2\ $ which indeed equals $\rm x^2 - a^2\ $ as long as $\rm\ a\ x = x\ a\ $ for all $\rm\:x\:,\:$ i.e. as long as $\rm\:a\:$ commutes with all elements of the ring. Because the proof employed only the commutative law in addition to the standard ring axioms (most notably the distributive law) this difference of squares formula works in all commutative rings. However, generally it fails in noncommutative rings, e.g. rings involving difference or differential operators.

In fact, if we consider both $\rm\:x\:$ and $\rm\:a\:$ as indeterminates, then we can specialize the "generic" factorization $\rm\: x^2 - a^2 = (x-a)\ (x + a)\:$ in the ring $\rm\:\mathbb Z[x,a]\:$ to any commutative ring by using an evaluation homomorphism mapping $\rm\:x,a\:$ to specific values in the target ring (such an evaluation map always exists by the universal property of polynomial rings). So this formula is an identity of commutative rings, a formula universally true, i.e. true in every commutative ring. Many other well-known formulas and proofs are of this sort, e.g. the binomial theorem, resultant formulas which determine if polynomials have a common root, the Cayley-Hamilton theorem, etc.

A somewhat similar remark holds true for the well-known derivation of the quadratic formula (see e.g. André's answer here). However, it is not truly a universal formula for commutative rings because, in addition to the commutative ring axioms, we have invoked some special properties in its derivation. Namely, we have assumed that $\rm\,2a\,$ is invertible, and we have assumed that the discriminant has a square root in the ring. So the proof of the quadratic formula goes through in any commutative ring satisfying these two additional hypotheses. More technically we could carry out the proof generically in a ring where such elements exist, say $\rm\:e = 1/(2a),\ d^2 = b^2 - 4ac\:,\:$ so the proof works naturally in the ring $\rm\:\mathbb Z[a,b,c,d,e]/(2ae-1,\ d^2-b^2+4ac)\:.\:$ Therefore any invocation of the quadratic formula can be obtained simply by specializing the proof in this generic ring, just as we did above for the difference of squares formula.
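To illustrate specialization to a ring other than the reals, here is a sketch of the quadratic formula in $\rm\:\mathbb Z/p\:$ for an odd prime $\rm p$. The two hypotheses show up explicitly: $\rm e$ is the inverse of $\rm 2a$, and $\rm d$ ranges over the square roots of the discriminant, found here by brute force. The function name is mine, primality of `p` is assumed and not checked, and `pow(x, -1, p)` needs Python 3.8 or later.

```python
def solve_quadratic_mod_p(a, b, c, p):
    """Roots of a*x^2 + b*x + c = 0 in Z/p, with p assumed to be an odd prime."""
    if (2 * a) % p == 0:
        raise ValueError("2a is not invertible modulo p")
    e = pow(2 * a, -1, p)                       # e = 1/(2a) in Z/p
    disc = (b * b - 4 * a * c) % p
    # Brute-force search for every d with d^2 = disc in Z/p.
    square_roots = [d for d in range(p) if (d * d) % p == disc]
    # x = (-b + d) * e for each square root d (covers both signs).
    return sorted({((-b + d) * e) % p for d in square_roots})


if __name__ == "__main__":
    print(solve_quadratic_mod_p(1, 1, 1, 7))   # x^2 + x + 1 over Z/7
    print(solve_quadratic_mod_p(1, 0, 1, 7))   # x^2 + 1 over Z/7
```

An empty result means the discriminant has no square root in $\rm\:\mathbb Z/p,\:$ which is exactly the case where the second hypothesis fails.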

Such "generic" or "universal" proofs can yield quite nontrivial results, e.g. one can "generically" algebraically cancel "apparent singularities" in one fell swoop, before evaluation - thus avoiding alternative dense topological arguments. For example, see this slick proof of Sylvester's determinant identity $\rm\ det\ (1+AB)=det\ (1+BA)\ $ that proceeds by universally cancelling $\rm\ det\ A\ $ from the $\rm\ det\ $ of $\rm\ \ (1+A\ B) A = A (1+B\ A)\,$ in $\,\rm\Bbb Z[a_{\,ij},b_{\,ij}]\,$ where the matrix entries are indeterminates $\rm\,a_{\,ij},b_{\,ij}.\,$ Such proofs exploit to the hilt the universal properties of formal polynomials (vs. less general polynomial functions - see here for much more on this distinction).

Remark $ $ It's worth emphasizing that in general rings it is possible for quadratic equations to have more than $2$ roots, e.g. $\rm\:x^2 = 1\:$ has roots $\rm\:\pm1,\:\pm3\, \in\, \mathbb Z/8 =$ ring of integers modulo $8.\,$ Thus plugging one root of the discriminant into the quadratic formula doesn't necessarily yield all the roots of a quadratic. Such anomalies cannot occur in domains, i.e. rings without zero divisors, where $\rm\ xy = 0\ \Rightarrow\ x=0\ \ or\ \ y=0.\,$ Indeed, a polynomial $\rm\ f(x)\in D[x]\ $ has at most $\rm\ deg\ f\ $ roots in the ring $\rm\:D\ $ iff $\rm\ D\:$ is a domain. For a simple proof see this answer, where I illustrate it constructively in $\rm\ \mathbb Z/m\ $ by showing that, given any $\rm\:f(x)\:$ with more roots than its degree, we can quickly compute a nontrivial factor of $\rm\,m\,$ via a $\rm\:gcd.\,$ The quadratic case of this result is at the heart of many integer factorization algorithms, which attempt to factor $\rm\:m\:$ by searching for a nontrivial square root in $\rm\: \mathbb Z/m,\,$ e.g. a square root of $1$ that is not $\:\pm 1$.
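Both claims in the remark are easy to try out. The sketch below lists the roots of $\rm x^2 = 1$ modulo $8$ by brute force, and then shows how a square root of $1$ other than $\pm 1$ modulo $\rm m$ yields a factor of $\rm m$ through a gcd. It is a toy illustration with function names of my own choosing, not one of the factoring algorithms alluded to above.

```python
from math import gcd


def roots_mod(coeffs, m):
    """All x in Z/m with f(x) = 0, where coeffs lists f highest degree first."""
    def value(x):
        acc = 0
        for k in coeffs:              # Horner evaluation modulo m
            acc = (acc * x + k) % m
        return acc
    return [x for x in range(m) if value(x) == 0]


def factor_from_square_root_of_one(s, m):
    """Given s with s^2 = 1 (mod m), try to split m using gcd(s - 1, m)."""
    if (s * s - 1) % m != 0:
        raise ValueError("s is not a square root of 1 modulo m")
    g = gcd(s - 1, m)
    if g in (1, m):
        raise ValueError("no nontrivial factor found from this s")
    return g


if __name__ == "__main__":
    print(roots_mod([1, 0, -1], 8))              # roots of x^2 - 1 in Z/8
    print(factor_from_square_root_of_one(4, 15))  # 4^2 = 16 = 1 (mod 15)
```

The idea is that $\rm m$ divides $\rm (s-1)(s+1)$ without dividing either factor, so each factor must share a proper divisor with $\rm m$.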

0
On

I translate from my 1968 Algebra book by Sebastião e Silva and Silva Paulo, because I really loved it, and still do.

Consider any quadratic equation

$$ax^{2}+bx+c=0,\qquad (a\neq 0).\qquad (1)$$

Multiplying both sides by $1/a$ we get the equivalent equation

$$x^{2}+\frac{b}{a}x+\frac{c}{a}=0.$$

We are going to show that it is possible to find $h$ and $\alpha $ such that

$$x^{2}+\frac{b}{a}x+\frac{c}{a}=(x+h)^{2}-\alpha .$$

Expanding the RHS gives

$$x^{2}+\frac{b}{a}x+\frac{c}{a}=x^{2}+2hx+h^{2}-\alpha .$$

This means, applying the method of undetermined coefficients, that

$$\left\{ \begin{array}{l} \frac{b}{a}=2h \\ \frac{c}{a}=h^{2}-\alpha. \end{array} \right. $$

Hence

$$\left\{ \begin{array}{l}h=\frac{b}{2a} \\ \alpha=h^{2}-\frac{c}{a}=\frac{b^{2}}{4a^{2}}-\frac{c}{a}=\frac{b^{2}-4ac}{4a^{2}}.\end{array}\right. \qquad (2)$$

In this way the given equation is reduced to the binomial equation in $x+h$

$$\left( x+h\right) ^{2}-\alpha =0\qquad \text{equivalent to}\qquad \left( x+h\right) ^{2}=\alpha $$

and thus it is satisfied when

$$x+h=\sqrt{\alpha }\qquad \text{or}\qquad x+h=-\sqrt{\alpha },$$

i.e. when

$$x=-h+\sqrt{\alpha }\qquad \text{or}\qquad x=-h-\sqrt{\alpha }.$$

Denoting the first value of $x$ by $x_{1}$ and the second by $x_{2}$ and replacing $h$ and $\alpha $ by their expressions given by $(2)$, yields

$$x_{1}=\frac{-b+\sqrt{b^{2}-4ac}}{2a}\qquad \text{or}\qquad x_{2}=\frac{-b-\sqrt{b^{2}-4ac}}{2a}.\qquad (3)$$

As can be seen, the method of completing the square is not mentioned anywhere. Rather, the derivation relies on the method of undetermined coefficients, which the book had fully explained beforehand.
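For a concrete instance of the undetermined-coefficients step, the following sketch computes $h$ and $\alpha$ from $(2)$ in exact rational arithmetic and checks the identity $x^{2}+\frac{b}{a}x+\frac{c}{a}=(x+h)^{2}-\alpha$ at a few points. The function names and the test points are illustrative choices of mine.

```python
from fractions import Fraction


def shift_and_constant(a, b, c):
    """Return (h, alpha) with x^2 + (b/a)x + c/a == (x + h)^2 - alpha."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be nonzero")
    h = b / (2 * a)                   # from b/a = 2h
    alpha = h * h - c / a             # from c/a = h^2 - alpha
    return h, alpha


def identity_holds(a, b, c, points=(-3, 0, 1, Fraction(5, 2))):
    """Check the identity exactly at the given sample points."""
    h, alpha = shift_and_constant(a, b, c)
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return all(
        x * x + (b / a) * x + c / a == (x + h) ** 2 - alpha
        for x in map(Fraction, points)
    )


if __name__ == "__main__":
    print(shift_and_constant(2, 3, -2))
    print(identity_holds(2, 3, -2))
```

For $2x^{2}+3x-2=0$ this gives $h=\frac34$ and $\alpha=\frac{25}{16}$, so $x=-\frac34\pm\frac54$.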

1
On

Pre-note: Since the asker needs an insight, I'd present a non-rigorous proof/intuition.

Try working backwards!

Assuming that the quadratic formula holds true, $$x = {-b \pm\sqrt{b^2 - 4ac} \over 2a},$$ clear the fraction and the square root step by step: $$\begin{align}2ax &= -b \pm \sqrt{b^2 - 4ac} \\ 2ax + b &= \pm\sqrt{b^2 - 4ac} \\ (2ax + b)^2 &= b^2 - 4ac \\ 4a^2x^2 + 4abx + b^2 &= b^2 - 4ac \\ 4a^2x^2 + 4abx + 4ac &= 0 \\ ax^2 + bx + c &= 0 \end{align}$$ Note that we divided both sides by $4a$ in the last step, assuming $a \ne 0$, which is what we have learnt all along: in the polynomial $a_nx^n + a_{n - 1}x^{n - 1} + \cdots + a_0x^0$, $a_n \ne 0$.

Also, if you see everything from bottom to top, you'd almost get André Nicolas' proof!