Why is a derivative defined using limits?


From our childhood, we learn mathematics through a gradual progression in which topics build sequentially on one another.

The discussion of calculus almost always starts with the concept of limits. This is where most beginners start to gasp for air. The limit feels like a disconnected concept, one that only superficially connects algebra and calculus.

Why is a derivative defined using the concept of limit?

There are 8 answers below.

On BEST ANSWER

You need limits because limits are where calculus gets interesting.

We don't actually start by learning limits then derivatives. We start by learning slopes, before Calculus even starts. We know that given two points on a line, we can construct the function of a line between them. We also know that we can approximate the area under the curve via a series of trapezoids (the precursor to Riemann sums).
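The trapezoid approximation mentioned here is easy to play with in a few lines of Python (the choice of $f(x)=x^2$ on $[0,1]$, where the exact area is $1/3$, is my own illustration, not from the answer):

```python
# Approximate the area under f on [a, b] with n trapezoids.
# More trapezoids give a better approximation of the true area.
def trapezoid_area(f, a, b, n):
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        left = f(a + i * h)
        right = f(a + (i + 1) * h)
        total += (left + right) * h / 2  # area of one trapezoid
    return total

f = lambda x: x * x
for n in (2, 10, 100):
    print(n, trapezoid_area(f, 0, 1, n))  # approaches 1/3 as n grows
```

This is the precursor to Riemann sums mentioned above: the approximations improve as $n$ grows, but pinning down the exact value $1/3$ is precisely where a limit is needed.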

We can even start to get the hint that the slope between two points that are getting closer and closer approaches some particular slope (i.e. its derivative).

However, what makes Calculus impressive is the idea that we can formally define limits. Without such limits, Zeno's paradox remains unsolved, and it's not clear whether we can ever reach what we now call the derivative. It is the formal definition of limits that permits one to define derivatives formally. Without it, all we have is basic algebra, with sums and products. The formal definition of limits is what permits all of Calculus to come forth.

On

Why is a derivative defined using the concept of limit?

Because the concept of limit is the way mathematicians found to express rigorously the geometric notion of tangent line.

On

The idea of a derivative-as-limit was introduced in the 17th Century by Newton and Leibniz (Newton's first description of the derivative pre-dates Leibniz's by 20 or so years, but Newton didn't publish at the time, and the modern consensus is that Leibniz built the theory independently). We remember the names Newton and Leibniz in large part because they had the insight to use the concept of a limit to describe instantaneous rates of change. This was a very difficult idea which (perhaps) required the intellectual force of giants such as Newton and Leibniz.

Even so, neither Newton nor Leibniz really used ideas that we would recognize as limits (in the sense of $\varepsilon$-$\delta$ arguments). Instead, they estimated the quantities of interest with an error term, e.g. $$ \frac{(x+o)^2 - x^2}{o} $$ where $o$ is an "infinitesimal" error term, then performed algebraic manipulations (here, the quotient simplifies to $2x + o$) and made the error terms disappear with a wave of the hands. While this approach can be made rigorous (see Robinson's Non-standard analysis, cited below, for a more modern approach to infinitesimals), it isn't quite how we usually think of things.

The modern notion of limit came later. I honestly don't know when it was introduced or by whom (it feels like something that Cauchy or one of his contemporaries might have come up with?). In any event, I would guess that modern $\varepsilon$-$\delta$ arguments date to the early 19th Century (they were certainly well established by the beginning of the 20th Century, but I don't think that mathematicians like Euler or Fourier used an entirely modern approach). In any event, the definition of a limit was another profound intellectual achievement, and is only "obvious" in retrospect.

The point being, it should not be surprising that the jump to calculus via limits is difficult for many students. The notion of a derivative as an instantaneous rate of change was a difficult concept which took a couple of hundred years and the attention of some very smart people to develop.

This comic may be relevant.


That being said, there are certain classes of curves that can be discussed in geometric or algebraic terms. We can build the theory in the following manner (note that this is ahistorical, but makes a pretty good narrative for, say, a group of students in a precalculus class).

The motivating question might be the following:

Given a curve in the plane (or in some higher dimensional space?!) and a point on that curve, what does it mean for a line to be tangent to the curve?

For a circle, we have a really good idea of what we want this to mean: the line touches the curve at exactly one point. From this definition, we are able to do a lot: tangents are perpendicular to radii, we can (after coordinatization) define a bunch of trigonometric functions related to tangent lines, etc. This notion of tangency also generalizes fairly well to other conic sections. However, it does not generalize well to arbitrary curves in the plane (or even arbitrary algebraic curves), which is particularly annoying if you are interested in the graphs of functions.

Another idea is the following: when we look at a line tangent to a circle, the line does not cross the circle—it touches at a point, then "bounces off". This isn't a very rigorous idea, but we can make it a little more rigorous. To do this, let's first consider a parabola.

Using our basic geometric ideas, we can define

Definition: We say that a line $\ell$ is tangent to the graph of $f(x) = ax^2 + bx + c$ if

  • $\ell$ is the graph of a function of the form $\ell(x) = mx + k$ for two real constants $m$ and $k$ (i.e. $\ell$ is not a vertical line; please excuse my abuse of notation, namely using $\ell$ both for the line and the function defining the line); and
  • $\ell$ intersects the graph of $f$ at exactly one point.

This first constraint may seem silly, but we want to eliminate the "obviously wrong" vertical lines which intersect the graph at a single point, but which don't really look like the kinds of tangent lines that we would expect.

This idea can be expressed algebraically: if $\ell$ is tangent to $f$ at the point $(r,f(r))$, then we need $(f-\ell)(r) = 0$ (which means that $f$ and $\ell$ intersect when $x=r$), and we need $(f-\ell)(x) \ne 0$ for all other $x$ (the line and parabola intersect exactly once). In other words, the function $(f-\ell)(x) = (ax^2 + bx + c) - (mx + k)$ is a quadratic with exactly one real root, namely $x=r$; a quadratic with exactly one real root must have a double root there, so by the factor theorem there is some constant $C$ such that $$ ax^2 + (b-m)x + (c-k) = (f-\ell)(x) = C(x-r)^2. $$ Expanding out the right-hand side and equating coefficients, we have $$ ax^2 + (b-m)x + (c-k) = Cx^2 - 2Crx + Cr^2 \implies \begin{cases} a = C \\ b-m = -2Cr \\ c-k = Cr^2. \end{cases} $$ Solving for $m$ and $k$, we have $$ m = b+2Cr = b+2ar \qquad\text{and}\qquad k = c - Cr^2 = c-ar^2. $$ Therefore the line tangent to the graph of $$ f(x) = ax^2 + bx + c $$ at $(r, f(r))$ is the graph of the function $$ \ell(x) = mx + k = (b+2ar)x + (c-ar^2). $$ This Desmos demonstration should be mildly convincing (you can move the point of tangency about by clicking and dragging, and adjust the coefficients $a$, $b$, and $c$ using the sliders).
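As a sanity check on the algebra, here is a small Python sketch (the coefficient values are arbitrary illustrative choices, not from the answer): it builds $\ell$ from the formulas above and confirms that $f-\ell$ has zero discriminant, i.e. exactly one intersection point, located at $x=r$.

```python
# Build the tangent line l(x) = (b + 2ar)x + (c - ar^2) to
# f(x) = ax^2 + bx + c at x = r, then check that the quadratic
# (f - l)(x) = ax^2 + (b - m)x + (c - k) has a double root at r.
def tangent_line(a, b, c, r):
    m = b + 2 * a * r
    k = c - a * r * r
    return m, k

a, b, c, r = 2, -3, 5, 4
m, k = tangent_line(a, b, c, r)
# Coefficients of (f - l)(x):
A, B, C = a, b - m, c - k
disc = B * B - 4 * A * C
print(disc)          # 0: exactly one intersection point
print(-B / (2 * A))  # 4.0: the double root equals r
```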

The really slick idea here is that tangency has something to do with the way in which a line intersects the parabola. If we look at the difference function $f-\ell$, the point of intersection is a root of order two. After some experimentation, it is reasonable to propose the following, slightly more general definition of tangency:

Definition: Let $p$ be a polynomial of degree $n$. We say that a line $\ell$ is tangent to $p$ at $(r,p(r))$ if the difference function $p-\ell$ has a root of order at least 2 at $r$. That is, $$ (p-\ell)(x) = (x-r)^2 q(x), $$ where $q$ is a polynomial of degree $n-2$.

This notion of tangency actually works rather well, and isn't much more difficult to work out than learning limits (once you know how limits work, have an analytic definition of a tangent line, and have proved useful things like the Power Rule, this algebraic version isn't so great, but learning all that other stuff sounds hard $\ddot\frown$). Generally speaking, you are going to have to multiply out the polynomial $$ (x-r)^2 q(x), $$ which is a relatively tractable problem, then equate coefficients (which reduces the problem to a system of linear equations). If $p$ is of very high degree, this can be tedious, but it requires no knowledge beyond high school algebra (or, perhaps more to the point, it requires no ideas that post-date Newton and Leibniz; Descartes could have figured it out, and in fact did).
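One pleasant consequence of the definition: since $p(x) = (x-r)^2 q(x) + \ell(x)$ and $\ell$ has degree at most $1$, the tangent line is exactly the remainder when $p$ is divided by $(x-r)^2$. A short Python sketch of this observation (the helper names and the cubic example are my own choices):

```python
# Per the definition above, l is tangent to p at (r, p(r)) iff p - l is
# divisible by (x - r)^2.  Equivalently, the tangent line is the linear
# remainder when p is divided by (x - r)^2.  Polynomials are coefficient
# lists, highest degree first.
def poly_divmod(num, den):
    num = list(num)
    q = []
    while len(num) >= len(den):
        coef = num[0] / den[0]
        q.append(coef)
        for i in range(len(den)):
            num[i] -= coef * den[i]
        num.pop(0)  # leading coefficient is now zero; drop it
    return q, num   # quotient, remainder

def tangent_via_double_root(p, r):
    # (x - r)^2 = x^2 - 2rx + r^2
    _, rem = poly_divmod(p, [1.0, -2 * r, r * r])
    return rem  # [m, k] for the tangent line l(x) = mx + k

# p(x) = x^3 at r = 2: the tangent line is l(x) = 12x - 16.
print(tangent_via_double_root([1.0, 0.0, 0.0, 0.0], 2.0))  # [12.0, -16.0]
```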

This basic definition generalizes very well to rational functions and, using the fact that the graph of an inverse function is the reflection of the original graph across the line $y=x$, can be further generalized to deal with functions involving $n$-th roots. If you want to go really deep down the rabbit hole, you might try to prove something like the implicit function theorem and show that this idea can also give you implicit derivatives of any algebraic curve (I don't know how easy or hard this would be to do; I wonder if it might not require some modern ideas out of algebraic geometry? $\ast$shudder$\ast$... sheaves are scary).

Robinson, Abraham, Non-standard analysis, Princeton, NJ: Princeton Univ. Press. xix, 293 p. (1996). ZBL0843.26012.

On

Xander Handerson's excellent answer shows how you can define the derivative of a polynomial function without needing to refer to limits; here is another (perhaps simpler?) approach.

Consider any polynomial $p(x)$, and choose a value $a\in\mathbb R$. We all know that you can use long division (the polynomial division algorithm) to divide $p(x)$ by $x-a$ and obtain a quotient, $q_a(x)$, and a remainder, $r(x)$. The remainder is guaranteed to have lower degree than the divisor, and the divisor $x-a$ has degree $1$, so $r(x)$ must be a constant, which we'll just write as $r$. Then we have $$p(x) = (x-a)q_a(x) + r$$ Moreover, setting $x=a$ in the above equation leads immediately to the result $r=p(a)$. In other words, the remainder you get when you divide $p(x)$ by $x-a$ is just $p(a)$. This is usually called the "Remainder Theorem."

Now let's take the equation $p(x) = (x-a)q_a(x) + p(a)$ and rearrange it just slightly:

$$q_a(x) = \frac{p(x) - p(a)}{x-a}$$

This provides a natural geometric interpretation of the polynomial $q_a(x)$: given any point $b \ne a$, $q_a(b)$ is the slope of the line joining $(a, p(a))$ and $(b, p(b))$.

These observations motivate the following definition:

Definition: For any polynomial $p(x)$ and any $a\in \mathbb R$, the derivative of $p(x)$ at $a$, denoted $p'(a)$, is $$p'(a) = q_a(a),$$ where $q_a(x)$ is the quotient obtained by dividing $p(x)$ by $x-a$.

For example, with $p(x) = x^2 - 3x$, if we choose $a=1$ we find that $p(x) = (x-1)(x-2) - 2$, so $q_1(x)=x-2$, and therefore $p'(1) = q_1(1) = -1$. More generally for any $a$ we have $x^2 - 3x = (x-a)(x-3+a) + (a^2-3a)$, so $q_a(x) = x-3+a$ and $p'(a) = q_a(a) = 2a - 3$, exactly as the "usual" definition gives.

Computationally, this makes finding the derivative of even higher-degree polynomials relatively straightforward: just divide $p(x)$ by $x-a$ using long division (or synthetic division), throw away the remainder, and evaluate the quotient at $a$. However, I don't see any reasonable way to extend this notion to transcendental functions.
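For polynomials, the whole procedure fits in a few lines of Python via synthetic division (the function names are mine; the example is the $p(x) = x^2 - 3x$ worked above):

```python
# Divide p(x) by (x - a) with synthetic division; the constant remainder
# is p(a) (the Remainder Theorem), and evaluating the quotient at a
# gives the derivative p'(a).
def divide_by_linear(coeffs, a):
    # coeffs: highest degree first.  Returns (quotient coeffs, remainder).
    q = [coeffs[0]]
    for c in coeffs[1:]:
        q.append(c + a * q[-1])
    return q[:-1], q[-1]

def poly_eval(coeffs, x):
    # Horner's rule, coefficients highest degree first.
    v = 0.0
    for c in coeffs:
        v = v * x + c
    return v

# p(x) = x^2 - 3x, the example above: p'(a) = 2a - 3.
p = [1.0, -3.0, 0.0]
for a in (1.0, 2.5):
    q, r = divide_by_linear(p, a)
    print(a, poly_eval(q, a))  # prints p'(a) = 2a - 3
```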

On

To find the gradient of a curve at a given point, you need to keep zooming in:

[animation: zooming in on several different curves at a point]

From the above animation, you can guess the gradients of the different curves (either $0$, $1$ or undefined) but in order to be sure, you'd need an $\infty$ zoom.

This process of "zooming in indefinitely" is why you need the concept of limits when defining the derivative.

PS: I used the same animation as in one of my previous answers.

On

I will share a different perspective. In one Calc I class I TA'd for (the first, actually), the professor went a completely different route. Instead of starting with the limit, he started by defining the derivative purely algebraically.

He defined a new algebraic operation $\frac{d}{dx}$ with the properties $\frac{dx}{dx}=1$, $\frac{d(af+bg)}{dx}=a\frac{df}{dx}+b\frac{dg}{dx}$, and $\frac{d(fg)}{dx}=f\frac{dg}{dx}+g\frac{df}{dx}$. While this looked incredibly awful to me back then, it can actually be made completely rigorous and shown to be equivalent to the derivative (in most cases; there are pathologies, of course).

He then taught the chain rule and continued by differentiating $\cos(x)$, $\sin(x)$ and $e^x$ using their series. Only towards the end of the course did we get to limits. This, coupled with copious homework, resulted in possibly the best understanding I've seen from most students.

It's worth noting that the definition as given can (I seem to recall) be shown to be unique and equivalent to the usual one, at least for continuous functions, using Stone–Weierstrass.
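For what it's worth, the three rules really do pin down the derivatives of polynomials on their own. Here is a minimal Python sketch of that recursion (the dict representation and function names are my own, not the professor's):

```python
# Starting only from d(x)/dx = 1, linearity, and the product rule,
# the derivative of x^n follows by recursion:
#   d(x^n) = d(x * x^(n-1)) = x * d(x^(n-1)) + x^(n-1) * d(x)/dx.
# Polynomials are dicts mapping exponent -> coefficient.
def d_power(n):
    """Derivative of x^n as a dict, using only the three rules."""
    if n == 0:
        return {}          # d(1)/dx = 0, since d(1*1) = 2*d(1) forces it
    if n == 1:
        return {0: 1}      # the defining rule d(x)/dx = 1
    inner = d_power(n - 1)           # d(x^(n-1))
    result = {n - 1: 1}              # the x^(n-1) * d(x)/dx term
    for exp, coef in inner.items():  # the x * d(x^(n-1)) term
        result[exp + 1] = result.get(exp + 1, 0) + coef
    return result

def d_poly(p):
    """Extend to arbitrary polynomials by linearity."""
    out = {}
    for exp, coef in p.items():
        for e, c in d_power(exp).items():
            out[e] = out.get(e, 0) + coef * c
    return out

print(d_power(5))             # {4: 5}, i.e. 5x^4
print(d_poly({2: 1, 1: -3}))  # derivative of x^2 - 3x is 2x - 3
```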

On

You need to understand what differentiability means, and what a derivative does, in order to be able to understand why one cannot avoid studying the convergence properties of functions (which is what I hope you understand by "limits") in order to define the derivative -- at least if you're working with the real numbers (i.e., no infinitesimals).

By the way, you can't avoid limits for long, as they pervade all of calculus, and in consequence all of higher mathematics. But since you have asked in the context of derivatives, we shall focus on that.

So, what does a derivative do? What do we need them for? Why are they important? Well, in the study of functions, the derivative is important because it describes the behaviour of the function. This is what it does. For example, you may know nothing about a function other than that it satisfies some relation involving its derivatives (called a differential equation), and just this equation may give you a lot of insight into how the function behaves -- but only if you understand and can work with derivatives.

How does the derivative do this? Let us focus on a real function of a real variable. Such functions can be represented by curves in the Cartesian plane that any vertical line intersects at most once. It turns out the simplest functions to understand are the linear functions, always of the form $ax+b$ for constants $a$ and $b$. These functions have the remarkable property that their value $b$ at $x=0$ and the number $a$, the difference between the values at two points that are $1$ apart (called the slope), are sufficient to understand them. That is, these two numbers $a$ and $b$ completely characterise the linear functions. If $b=0$, then the line passes through the origin, so that the slope $a$ totally determines it -- in particular by telling us its direction relative to the positive $x$-axis. This direction is clearly constant.

When we want to understand other functions in this way, we see that the curves describing them don't have constant direction, but bend and turn as $x$ runs through a set of real numbers. If at each point on such a curve we can associate a unique linear function that best approximates the curve near that point, then our problem is solved, and we can understand any function that has such an association (these are called differentiable functions, and the lines are called tangent lines). This is the main problem of the differential calculus.

When one then begins to study tangents to curves at points where they are differentiable, one discovers that one can only approximate (get near to) the unique tangent, which must exist since the curve is differentiable (that is, it resembles a line about that point). This is where limits enter. Since the ancient mathematicians, most notably Archimedes, the idea of solving a mathematical problem by successive approximations that become better and better has been used with great success, and it is reasonable too (i.e., it does not lead to inconsistencies). So, when we have such problems that cannot be solved by any means other than approximation, one often wants to find out (especially if one is mathematically inclined, for the practical man stops his approximations when he gets a value within a tolerable error) how much further one can continue to approximate, and whether one can obtain the "exact" answer in this manner. This is the intuition behind the use of limits -- when successive approximations, carried on arbitrarily often, get arbitrarily close to a certain value, but never quite attain it, one can think of that value as exactly the answer one sought.

You might object that this is not concrete, or that it is impossible to achieve in practice, but: (1) All of continuous mathematics will be closed to you if you go this way, (2) Even in discrete mathematics, there is a close analogue of this, namely the axiom of (mathematical) induction; if you accept this, I don't see why you cannot accept limits, (3) There are actually ways to compute these limits and show that they are indeed the limit sought; so what should stop us using them?

Once you grasp the idea of successively better approximations, and how that extends to the idea of limits, then all of analysis is open to you. Good luck.

On

I'd like to step back for a minute and take a look at what the idea of a derivative actually is: finding the slope of a function at a single point.

Let's look at how we might find slope.

Slope is defined as rise over run, which we can write as $$\frac{y_2 - y_1}{x_2-x_1}$$ Suppose $y$ in this case is a function $y(x)$. Let's define an interval, $\Delta x$, which we want to measure the slope over, and let's define $y(x)$ as the beginning of that interval. We then have

$$\frac{y(x+\Delta x) - y(x)}{\Delta x}$$

which is probably starting to look a little familiar if you've taken calculus. (Just to make this very clear, let's change our function from $y(x)$ to $f(x)$.)

$$\frac{f(x+\Delta x) - f(x)}{\Delta x}$$

But before we go further, we've got a bit of a pernicious point. What the heck does it mean to have an instantaneous slope? Slope is rise over run - there's gotta be run for there to be a rise! So how can we define this so we're not creating a paradox?

Well, let's imagine we have a line that only touches our function $f(x)$ at the point where we want to find the "instantaneous" slope, and treat the "instantaneous" slope as simply the slope of that line. This makes some kind of sense, so let's proceed.

This type of line is called a tangent line. To find its slope, we must use a limit: we take the slope above and let the interval $\Delta x$ shrink to zero, so that the slope becomes that of the tangent line:

$$\lim_{\Delta x\rightarrow 0} \frac{f(x+\Delta x) - f(x)}{\Delta x}$$

or a derivative.
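Numerically, you can watch the difference quotient home in on the derivative as $\Delta x$ shrinks. A tiny Python sketch (the choice of $f(x)=x^2$ at $x=3$, where the limit is $6$, is my own illustration):

```python
# The difference quotient (f(x + dx) - f(x)) / dx approaches the
# derivative as dx shrinks toward zero.
def difference_quotient(f, x, dx):
    return (f(x + dx) - f(x)) / dx

f = lambda x: x * x
for dx in (1.0, 0.1, 0.001, 1e-6):
    print(dx, difference_quotient(f, 3.0, dx))  # approaches 6
```

Of course, no finite $\Delta x$ ever gives exactly $6$; it is the limit that turns these ever-better approximations into an exact answer.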

Tl;dr:

We must define a derivative using a limit because to make the idea of "instantaneous slope" make sense, we have to use the idea of a tangent line, whose slope is defined using a limit.