Incorrect reasoning during Taylor series derivation?


I want to derive the Taylor series approximation of a function $f(x)$ at a point $p$ using the following reasoning, but "my" Taylor series formula misses the inverse factorial scaling of individual terms.

Why? Is my reasoning missing some steps? Are my assumptions incorrect?

  1. The goal is to build a local approximation of a function $f(x)$ at point $p$ using incremental changes of functions.
  2. Our first local approximation of $f(x)$ will be the constant function $f_{0}(x) = f(p)$. This is a very crude approximation, which we seek to incrementally improve by incorporating our knowledge of the first derivative $f'(x)$.
  3. The numerical value of the first derivative $f'(x)$ tells us by how much the value of the function $f$ changes if we move one unit from the point $x$ along the $X$ axis (but this is just a local linear approximation, valid only near the point $x$, not necessarily one unit away from it).
  4. We can therefore improve our first guess $f_{0}$ to $f_{1}(x) = f(p) + (x - p)f'(p)$. The term $(x - p)f'(p)$ uses the local measure of change $f'(p)$ to construct a linear function $(x - p)f'(p)$, which represents an incremental offset that improves the approximation provided by $f_{0}$.
  5. $(x - p)f'(p)$ is only a linear function, so to further reduce our approximation error $\vert f(x) - f_{1}(x)\vert$, we can use the higher-order derivative $f''(x)$ to construct another linear approximation of how the function $f'(x)$ changes: $f_{2}(x) = f(p) + (x - p) \left( f'(p) + (x - p) f''(p) \right) $.

In general, we can keep recursively improving the linear approximations by adding incremental correction terms for lower-order derivatives, e.g. adding the offset $(x - p) f^{(n)}(p)$ that will bring the value of $f^{(n-1)}(p)$ closer to the exact value $f^{(n-1)}(x)$:

$$ \begin{aligned} \Delta x &= x - p\\ f_{n}(x) &= f(p) + \underbrace{\Delta x \left( f'(p) + \underbrace{\Delta x \left(f''(p) + \underbrace{\Delta x \left(f'''(p) + \cdots\right)}_\text{incremental correction for $f''(p)$} \right)}_\text{incremental correction for $f'(p)$} \right)}_\text{incremental correction for $f(p)$}\\ f_{n}(x) &= f(p) + f'(p)\Delta x + f''(p)\Delta x^2 + f'''(p) \Delta x^3 + \cdots \end{aligned} $$

Yet the correct Taylor expansion looks like this:

$$ f_{T}(x) = f(p) + \frac{f'(p)}{1!}\Delta x + \frac{f''(p)}{2!}\Delta x^2 + \frac{f'''(p)}{3!} \Delta x^3 + \cdots $$
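To see the difference concretely, here is a quick numerical comparison of the two series (a sketch; the choice $f = \exp$, $p = 0$ is arbitrary, and there every derivative $f^{(n)}(0) = 1$):

```python
import math

def naive_partial_sum(derivs, dx, order):
    # "my" series: f(p) + f'(p)*dx + f''(p)*dx^2 + ...  (no factorials)
    return sum(derivs[n] * dx**n for n in range(order + 1))

def taylor_partial_sum(derivs, dx, order):
    # correct series: f(p) + f'(p)*dx + f''(p)/2! * dx^2 + ...
    return sum(derivs[n] * dx**n / math.factorial(n) for n in range(order + 1))

# f = exp at p = 0: every derivative equals 1
derivs = [1.0] * 5
dx = 0.5
true_value = math.exp(dx)

naive_err = abs(naive_partial_sum(derivs, dx, 4) - true_value)
taylor_err = abs(taylor_partial_sum(derivs, dx, 4) - true_value)
```

At order 4 with $\Delta x = 0.5$, the factorial-free sum is already off by about $0.29$ (it is heading toward the geometric series $\sum_n \Delta x^n$), while the Taylor sum is within about $3 \times 10^{-4}$ of $e^{0.5}$.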

Why is my reasoning incorrect? Why is "my" recursive formula not an optimal function approximation when compared to the equivalent Taylor expansion? And how can I change my reasoning to arrive at the "correct" Taylor polynomials that include inverse factorials?

A side-question: seeking a clearer geometrical understanding, what does the term $\frac{1}{n!}$ scale? Does it scale the term $f^{(n)}(p)$ or does it scale $\Delta x^n$?


There are 4 answers below.

On BEST ANSWER

Note

After reading Dunham's answer once more, I realized he directly pointed out a critical mistake you've made (which I had also done here, indirectly). In the bulk of this answer, I will explain from scratch how the Taylor series is just nested linear approximations.


You need to increase the number of sample points to involve higher derivatives!

Taylor series is one of my favourite topics in mathematics, and I have gone through this same question myself before. The issue lies in how you incorporated the second derivative (step 5).

We will first talk about approximating functions by discrete sampling, and then talk about what happens when the sampling becomes infinitely fine.

To talk about the $n$-th derivative, we need information about the function at $n+1$ points. For example, if we talk about acceleration (the 2nd derivative), we need the function's values at three points.

This can be understood, for instance, from the first-principles definition of the derivative (*):

$$f''(x) = \lim \frac{f'(x+h) - f'(x)}{h} = \lim \frac{ \left(\frac{f(x+2h) - f(x+h)}{h} \right) - \left(\frac{f(x+h) -f(x)}{h} \right) }{h}$$
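This nested difference quotient expands to $(f(x+2h) - 2f(x+h) + f(x))/h^2$, which can be checked numerically (a sketch; $f = \sin$ and $x = 0.3$ are arbitrary choices):

```python
import math

def second_difference(f, x, h):
    # (f(x+2h) - 2*f(x+h) + f(x)) / h^2, the expanded form of the nested quotient
    return (f(x + 2*h) - 2*f(x + h) + f(x)) / h**2

x, h = 0.3, 1e-4
estimate = second_difference(math.sin, x, h)
exact = -math.sin(x)   # second derivative of sin
```

For small $h$ the estimate agrees with $f''(x)$ to within an error of order $h$.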

Now, the point is that the second derivative's contribution can only be captured if we sample at least three points (i.e. take at least two steps of size $h$).

If we solve this for $f(x+2h)$ (using $f(x+h) = f(x) + f'(x)h$), we have:

$$ f(x+2h) = f(x) + 2hf'(x) + h^2f''(x)$$

Similarly, we can find

$$ f(x+3h) = f(x) + 3 hf'(x)+ 3h^2f''(x) + h^3f'''(x)$$

We'd find that, in general, we have:

$$f(x+nh) = f(x) + \binom{n}{1} hf'(x) + \binom{n}{2} h^2 f''(x) + \cdots$$
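The binomial pattern can be verified by literally iterating the linear-approximation step on a list of derivative values (a sketch; the cubic $f(x) = x^3$ at $x = 1$ is an arbitrary choice whose derivatives are easy to list):

```python
from math import comb

# derivatives of f(x) = x^3 at x = 1: f, f', f'', f''', and then zeros
derivs = [1.0, 3.0, 6.0, 6.0, 0.0]

def nested_linear_steps(derivs, h, n):
    # One step replaces each g by its linear approximation g + h*g'.
    d = list(derivs)
    for _ in range(n):
        d = [d[i] + h * d[i + 1] for i in range(len(d) - 1)] + [d[-1]]
    return d[0]

def binomial_formula(derivs, h, n):
    # f(x+nh) ~ sum_k C(n,k) h^k f^(k)(x)
    return sum(comb(n, k) * h**k * derivs[k] for k in range(len(derivs)))

h, n = 0.1, 4
nested = nested_linear_steps(derivs, h, n)
direct = binomial_formula(derivs, h, n)
```

Both routes give the same number ($\approx 2.584$), an approximation to the exact $f(1.4) = 1.4^3 = 2.744$; the gap closes as $n \to \infty$ with $nh$ held fixed.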

Now, why does this look like a binomial expansion? That's answered in the section "what is the meaning of all this?"; if you can accept the result for now, you can continue reading straight through the next sections.


Direct calculation of (*):

Write $$ \begin{align}f(x+h) &= f(x+ \frac{h}{2} + \frac{h}{2})\\ &= f(x+\frac{h}{2}) + f'(x+ \frac{h}{2}) \frac{h}{2} \\ &= f(x) + \frac{h}{2} f'(x) +\left[ f'(x) + f''(x) \frac{h}{2} \right] \frac{h}{2} \\ &= f(x) + f'(x)h + f''(x) \frac{h^2}{4} \end{align}$$

A three-way split would similarly let you compute $f(x+h)$ while involving the third derivative.


Intuitive calculation of $(*)$ [how you should have actually done it]

You could think of $f(x)$ as measuring the distance travelled by an accelerating car, with $a$ the initial point in time and $b$ the final point. We introduce a point in the middle, $a + \frac{b-a}{2}$, to involve the second derivative.

The contribution of velocity to the change over the whole interval would be $f'(a)(b-a)$; what about acceleration? Well, suppose we gain velocity $\delta v$ over the first $\frac{b-a}{2}$ seconds; then that velocity can only affect the distance during the remaining $\frac{b-a}{2}$ seconds. Hence, we have:

$$ f(b) = f(a) + f'(a) (b-a) +( f''(a) \frac{b-a}{2} \frac{b-a}{2}) = f(a) + f'(a)(b-a) + f''(a) (\frac{b-a}{2})^2$$

Now, you'd still be confused, since there is an additional factor of $1/2$. Obviously, this is not yet a good approximation: if we slice the time interval more finely, then even in the finest slicing the acceleration contributes during every subinterval after the first, so we have to take the $f(x+nh)$ formula with $n \to \infty$, $h \to 0$, $nh = b-a$.


How do we get the Taylor series from the approximation based on $n$-point sampling?

If we want to approximate $f(x+t)$ given $f(x)$, we set $nh = t = b-a$ and have:

$$ f(x+t) = f(x) + \frac{ \binom{n}{1}}{n} tf'(x) + \binom{n}{2} t^2\frac{f''(x)}{n^2} + \cdots$$

We find that as $n \to \infty$ the above expression turns to,

$$ f(x + t) = f(x) + tf'(x) + \frac{t^2}{2!}f''(x) + \cdots$$
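This limit is easy to watch numerically (a sketch; with $f = \exp$, $x = 0$, $t = 1$, every $f^{(k)}(0) = 1$, so the sampled sum collapses to $(1 + 1/n)^n$):

```python
from math import comb, e

def sampled_sum(n, t=1.0):
    # sum_k C(n,k) (t/n)^k f^(k)(0)  with f = exp, where every f^(k)(0) = 1
    return sum(comb(n, k) * (t / n)**k for k in range(n + 1))

# distance from Euler's number shrinks as the sampling gets finer
errors = [abs(sampled_sum(n) - e) for n in (10, 100, 1000)]
```

The errors decrease steadily toward zero, exactly as the $n \to \infty$ limit above predicts.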

and so on.


Now, what on earth is the meaning of all this?

This is probably the most interesting and conceptually rich section in this post, as these ideas are used even in the most advanced calculus calculations. So, buckle up!

We have the same premise as before: we want to approximate $f$ at $b$ given its value at $a$, and we split the interval $\left[a, b \right]$ into $\left[ a , a+h \right] , \left[ a+h , a+2h \right], \cdots, \left[ a+(n-1)h , a+nh=b \right] $

So, we will build the Taylor series by approximating in steps. First question: how does the value of the function change over $\left[ a, a+h \right]$? By linear approximation we have:

$$ f(a+h) = f(a) + f'(a) h $$

But let's write this in a different way:

$$f(a+h) = \left[(1 + h \frac{d}{dx}) f(x)\right]_{x=a}$$

We can see this is the same as the previous result by unpacking everything (let me know if this step wasn't clear).

Now, how does the value of the function change on the interval $\left[ a+h , a+ 2h \right]$?

$$ f(a+2h) = f(a+h) + h f'(a+h)$$

What happens if we use the previously mentioned trick? We have:

$$ f(a+2h) = \left[ (1 + h\frac{d}{dx}) f(x+h)\right]_{x=a}$$

But, hey we can do the trick again on the inner $f(x+h)$, we have:

$$f(a+2h) = \bigg[ (1 + h\frac{d}{dx}) \left[ (1+ h\frac{d}{dx}) f(x) \right] \bigg]_{x=a}$$

Now, we do another trick: we think of $(1+h \frac{d}{dx})$ as a function in itself, which takes functions and gives out functions. This is known as an operator. Then we consider $(1+ h\frac{d}{dx})^2$ to be this map applied twice in succession. We have,

$$f(a+2h) = \left[ (1+ h\frac{d}{dx})^2 f(x)\right]_{x=a}$$

By induction, we can show that:

$$f(a+nh) = \left[ (1+h \frac{d}{dx})^n f(x) \right]_{x=a}$$
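The operator power can be tested on polynomials, where $\frac{d}{dx}$ acts simply on coefficient lists (a sketch; the particular cubic and the numbers are arbitrary choices):

```python
def deriv(coeffs):
    # d/dx on coefficients [c0, c1, c2, ...] of c0 + c1*x + c2*x^2 + ...
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]

def op_step(coeffs, h):
    # apply the operator (1 + h*d/dx) to a polynomial
    d = deriv(coeffs)
    return [c + h * (d[k] if k < len(d) else 0.0) for k, c in enumerate(coeffs)]

def evaluate(coeffs, x):
    return sum(c * x**k for k, c in enumerate(coeffs))

# f(x) = 2 + x + x^3, shifting from a = 0.5 by a total amount n*h = 1
f = [2.0, 1.0, 0.0, 1.0]
a, total = 0.5, 1.0

errors = []
for n in (10, 100, 1000):
    coeffs = list(f)
    h = total / n
    for _ in range(n):          # apply (1 + h*d/dx) n times
        coeffs = op_step(coeffs, h)
    errors.append(abs(evaluate(coeffs, a) - evaluate(f, a + total)))
```

As $n$ grows with $nh$ fixed, $\left[(1+h\frac{d}{dx})^n f\right]_{x=a}$ approaches the exact shifted value $f(a + nh)$.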

Now, suppose we fix $n \cdot h = b-a$ (number of partitions $\times$ size of each partition), and send the number of partitions to infinity; we have:

$$ \begin{align}f(b)= \lim_{ n \to \infty} f(a+nh) &= \lim_{n \to \infty} \left[ ( 1 + h \frac{d}{dx})^n f(x) \right]_{x=a} \\ &=\left[\lim_{n \to \infty} ( 1 + h \frac{d}{dx})^n f(x) \right]_{x=a} \end{align}$$

Now, here is something interesting, which we can show (try to see why):

$$\lim_{n \to \infty} ( 1+ h \frac{d}{dx})^n = \lim_{n \to \infty} \left( 1+ \frac{b-a}{n} \frac{d}{dx}\right)^n = 1+ (b-a) \frac{d}{dx} + \frac{(b-a)^2}{2!} \frac{d^2}{dx^2} + \cdots = e^{(b-a)\frac{d}{dx} }$$

We identify the operator series as evaluating the series for the exponential at the operator. Finally,

$$f(b) = \left[ e^{(b-a)\frac{d}{dx} } f(x) \right]_{x=a}$$

And that's it! That's also the basis of the fancy-shmancy category theory answer that tp1 wrote. It may be remarked that the bracket-evaluation trick I kept doing is also the basic idea behind a shadowy version of calculus called Umbral Calculus.


Why did this whole procedure "feel" like doing an integral?

See the accepted answer here.


Bonus : Shifting points of evaluation generally in higher calculus

[Figure: excerpt from Roger Penrose's Road to Reality]

The idea of linear approximation is quite profound, and more general than single-variable calculus itself. Thought of abstractly, it says: to find the value a little bit away, we take the value at the initial point plus (the change in the parameter) $\times$ (the rate of change of the function with respect to that parameter). We have,

$$E_h f(x) = ( I+ h \nabla_t) f$$

In other words, we can write the value of the function at a later point $x+h$ using only data at the initial point together with the derivative. This idea also works in higher calculus to change the point of evaluation of a function of many variables. To capture the idea of the point of evaluation being changed, we think of the curve as being made up of a bunch of tiny tangent vectors, and, by some math magic, associate the change in the output as we move along these tangent vectors with a derivative operator acting on the function. Once we have that last line, we can immediately use the Taylor series idea to shift the evaluation point of the function.

On

$f^\prime(p)+(x-p)f^{\prime\prime}(p)$ is an approximation to $f^\prime(x)$

As a starting point, you might consider the following question:

Which approximation is better?

  1. $f(x)\approx f(p)+f^\prime(p)(x-p)$ (Taylor polynomial)
  2. $f(x)\approx f(p)+f^\prime(x)(x-p)$
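A numerical probe of this question (a sketch; $f = \exp$, $p = 0$, $x = 0.1$ are arbitrary choices) is instructive:

```python
import math

f, fprime = math.exp, math.exp   # exp is its own derivative
p, x = 0.0, 0.1

taylor = f(p) + fprime(p) * (x - p)   # option 1: slope taken at p
other  = f(p) + fprime(x) * (x - p)   # option 2: slope taken at x

err_taylor = abs(f(x) - taylor)
err_other  = abs(f(x) - other)
```

For this function and point the Taylor form wins slightly, and both errors are of the same order $(x-p)^2$: swapping $f'(p)$ for $f'(x)$ flips the sign of the leading error term but does not shrink its order.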
On

Taylor series is best explained by the following identity: $$ f(x+k) = e^{k\frac{d}{dx}} [f(x)] $$

This is called the "translation operator", $x \mapsto x+k$, which is equivalent to the Taylor series:

\begin{array}{c} \hspace{2.0cm}\mathbf{(x)} \overset{+k}{\longrightarrow} \mathbf{(x+k)} \\ \hspace{1.4cm}\scriptstyle{\mathbf{f}}\hspace{0.1cm}\mathbf{\downarrow} \hspace{1.7cm} \mathbf{\downarrow} \hspace{0.1cm}\scriptstyle{\mathbf{f}}\\ \hspace{1.8cm}\mathbf{f(x)} \underset{\mathbf{e^{k\frac{d}{dx}}}}{\longrightarrow} \mathbf{f(x+k)} \end{array}

Then we need to remember the famous formula (from the first pages of Rudin): $$ \exp(z) = \sum_{n=0}^{\infty} \frac{z^n}{n!} $$ (This basically answers where the $n!$ comes from: it's part of the definition of $e^{z}$.)

When applied to the translation operator: $$ f(x+k) = \sum_{n=0}^{\infty} \frac{k^n}{n!}\frac{d^n}{dx^n} [f(x)]$$

This expression is a function $$f(x+k) :: (x,k,f(x),f'(x),f''(x),f'''(x),\ldots,f^{(\infty)}(x)) \rightarrow \mathbb{R}$$ (This shows what data is needed to compute the Taylor series.)

This starts to look very much like the ordinary Taylor series.
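The translation-operator series can be checked numerically (a sketch; $f = \sin$ is a convenient arbitrary choice because its derivatives cycle through $\sin, \cos, -\sin, -\cos$):

```python
import math

def translate(x, k, terms=20):
    # sum_n k^n / n! * f^(n)(x)  with f = sin; derivatives cycle with period 4
    cycle = [math.sin, math.cos, lambda t: -math.sin(t), lambda t: -math.cos(t)]
    return sum(k**n / math.factorial(n) * cycle[n % 4](x) for n in range(terms))

x, k = 0.3, 1.2
approx = translate(x, k)
exact = math.sin(x + k)   # what the translation operator should produce
```

With 20 terms the partial sum already matches $\sin(x+k)$ to well below $10^{-9}$.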

On

I'd say your intuitive idea of improving the approximation is trying to change the wrong things.

$f(x)-f(p)=f'(p)\cdot(x-p)+\psi_1(x-p)$ where $\psi_1(u)/u\to0$ as $u\to0$ by definition of derivative. $f'(p)$ is the (well, represents the) best linear approximation to $f(x)-f(p)$. In light of that, to want to replace $f'(p)(x-p)$ with $(\text{something else})(x-p)$ is, I would say, the wrong philosophy (yes, the Taylor expression can be factored so that it is in that form, but that places the wrong emphasis imo: we don't want "better" linear approximations, we want a perfect linear approximation + a perfect quadratic approximation + ...)

You reasoned: "$f'(p)$ is a measure of the change, but we can improve upon it by using the more sensitive measure $f'(p)+(x-p)f''(p)$". I personally don't buy this argument, but I appreciate the desire for a better measure of the change. It is the combination $x\mapsto f'(p)(x-p)$ which is the measure of the change (in some sense a "perfect" one), and if you want to improve upon it, you should focus on reducing the error. Changing either of the terms $(x-p)$ or $f'(p)$ individually has no reason to work; you can only expect improvements by reducing the error term itself.

At the moment, we can only say that the error function - $\psi_1$ - is bounded by some constant multiple of $(x-p)$ in some small neighbourhood of $p$. The natural thing to want to do is push this to $(x-p)^2$ (and then to $(x-p)^3$, etc... and if your function is very nice (analytic) you can push this error right down to zero), but if you change $f'(p)(x-p)$ into something else you're being counterproductive: let's instead work on changing $\psi_1$ into something else!

So you can ask the question: focusing on the error term, what is the magic number $a$ (if it exists) so that: $$\psi_1(x-p)=a(x-p)^2+\psi_3(x-p)$$I.e. so that: $$f(x)-f(p)=f'(p)(x-p)+a(x-p)^2+\psi_3(x-p)$$where $\psi_3(u)/u^{\color{red}{2}}\to0$ as $u\to0$? This magic number exists iff $f$ is twice differentiable at $p$, and in this case it equals $\frac{1}{2}f''(p)$. The correct thing to do is simply calculate $a$, rather than try to guess at its value. But this value can be intuitively motivated, and that has been done many times online - I'm not going to do a better job of it.

All I'll say is: if you want to approximate $f$, you want your approximation to start in the correct place (at $f(p)$) and then change in the same way as $f$ does (so the approximation can "keep up" with $f$ as it moves away from $f(p)$). That can be made precise by demanding the (first few) derivatives at $p$ are equal, and if you make that calculation you find $a=\frac{1}{2}f''(p)$ is correct (rather than your $a=f''(p)$ suggestion).
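The magic number $a$ can also be estimated numerically, by measuring the quadratic part of the error left over after the linear term is removed (a sketch; $f = \exp$, $p = 0$ are arbitrary choices, for which $\frac{1}{2}f''(0) = \frac{1}{2}$):

```python
import math

def quadratic_coefficient(f, fprime, p, u):
    # a ~ (f(p+u) - f(p) - f'(p)*u) / u^2, the leftover after the linear part
    return (f(p + u) - f(p) - fprime(p) * u) / u**2

estimate = quadratic_coefficient(math.exp, math.exp, 0.0, 1e-3)
```

The estimate converges to $\frac{1}{2}f''(p)$ as $u \to 0$, consistent with the calculation of $a$ described above.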