Rigorous treatment of integration by parts in a Calculus 1 course


I will be teaching Calculus 1 soon and I am trying to find some justifications for fishy arguments that are widespread out there.

In a standard Calculus 1 course, the following concepts are presented to students.

Antiderivative: A function $F$ is called an antiderivative of a function $f$ in an interval if $F'=f$ in that interval.

Indefinite integral: the family of all the antiderivatives of a function $f$ is called the indefinite integral of $f$ and is denoted by $\int f(x)dx$. Having shown that the difference of any two antiderivatives of the same function on an interval is constant, if $F$ is an antiderivative of $f$, then we write $\int f(x)dx=F(x)+C$, where $C$ is a constant.

The problem I see is that some textbooks define the differential in a very vague manner and then foster the use of the equality $dy=y'dx$ without justification.

For example, when presenting integration by parts, everything starts fine with the product rule for two differentiable functions $u$ and $v$: $$(uv)'=u'v+uv'\implies uv'=(uv)'-u'v$$ which implies that $$\int u(x)v'(x)dx=u(x)v(x)-\int u'(x)v(x)dx\quad\quad\quad (A)$$

The problem starts with the manipulation of the dummy symbols in the notation of the indefinite integral by the substitutions $dv=v'(x)dx$ and $du=u'(x)dx$ resulting in the popular formula: $$\int udv=uv-\int vdu\quad\quad\quad (B)$$

When I look at the definition of indefinite integral, equality (A) is well-defined but (B) is not.

Into practice: Calculate $\int 2x\cos(x)dx$.

A student using (A) will write: let $u(x)=2x$ and $v'(x)=\cos(x)$. Then $u'(x)=2$ and $v(x)=\int \cos(x)dx=\sin(x)$ (here understanding that we need just one antiderivative, and any one will do).

Then by (A) we have: $\int 2x\cos(x)dx=2x\sin(x)-\int 2\sin(x)dx=2x\sin(x)+2\cos(x)+C$.
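As a quick sanity check on the result above, one can differentiate the candidate antiderivative numerically and compare it with the integrand. This is only a sketch: the helper names `central_difference` and `max_error`, the step size, and the sample points are my own choices for illustration, and a numerical check is of course not a proof.

```python
import math

def F(x):
    # Candidate antiderivative obtained from (A), with the constant C set to 0.
    return 2 * x * math.sin(x) + 2 * math.cos(x)

def f(x):
    # Integrand of the worked example.
    return 2 * x * math.cos(x)

def central_difference(func, x, h=1e-5):
    # Symmetric difference quotient, a numerical stand-in for the derivative.
    return (func(x + h) - func(x - h)) / (2 * h)

def max_error(points):
    # Largest gap between the numerical derivative of F and f on the sample points.
    return max(abs(central_difference(F, x) - f(x)) for x in points)

# Sample points chosen arbitrarily for illustration.
sample_points = [-3.0, -1.0, 0.0, 0.5, 2.0, 4.0]
print(max_error(sample_points))
```

The printed gap should be tiny compared with the values of $f$, which is consistent with $F'=f$, the only thing the definition of the indefinite integral asks for.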

When using (B), students set $u=2x$ and $dv=\cos(x)dx$. Then they compute $du=2dx$ and $v=\sin(x)$, and finally substitute the pieces into (B) as if they were TeX processors. I mean, the method relies on the syntax of (B), not on the definition of the indefinite integral.

Question: what is the mathematical justification to accept the use of (B)? The justification should be at the level of students taking Calculus 1.

Remark: Note that substitutions of the type $dy=y'dx$ are not necessary for the substitution techniques of integration in a Calculus 1 course.

Indeed, if $F'=f$, then the chain rule shows: $$(F\circ g)'(x)=f(g(x))g'(x)$$ so by the definition of indefinite integral $$\int f(g(x))g'(x)dx=F(g(x))+C,$$ or equivalently, $$\int f(g(x))g'(x)dx=\left.\int f(u)du\right|_{u=g(x)}.$$


Update: Thanks to the answers posted, I realized that my concern was justified: (B) is (apparently) only justified after considering contents that are not part of a calculus 1 course, say, through Stieltjes integrals or differentials. Thank you for the well-presented answers and for the comments and resources presented in the comment sections.

I am well aware that it would not be good to hide (B) from my students since, as was pointed out in the comments, students will face it sooner or later and they should be prepared for it. That is why I posted this question. I think I will present and mostly use (A) during the course. I will mention (B), stating that it is true but that we do not have the tools to prove it, and that for now it can be used as a notational shortcut for (A). That way they have a way to justify steps that appear in many calculus textbooks, steps that are laid out without a proper justification (and you wonder why people do not understand mathematics).


There are 4 best solutions below

0
On

(B) is also well defined whenever the integrals are Stieltjes integrals, i.e., limits of sums of the form $$ S_1=\sum_{i=1}^nu(\xi_i)(v(x_i)-v(x_{i-1}))\,,\quad\quad S_2=\sum_{i=1}^nv(\eta_i)(u(x_i)-u(x_{i-1}))\,. $$ The Stieltjes integral does not depend on the choice of $\xi_i,\eta_i\in[x_{i-1},x_i]\,.$ Therefore we can choose $\xi_i=x_i$ and $\eta_i=x_{i-1}\,.$ Then the sum telescopes (with $x_0=0$ and $x_n=x$): $$ S_1+S_2=\sum_{i=1}^n\bigl(u(x_i)v(x_i)-u(x_{i-1})v(x_{i-1})\bigr)=u(x)v(x)-u(0)v(0)\,. $$ Taking the limit we have shown $$ \int_0^xu\,dv+\int_0^xv\,du=u(x)v(x)-u(0)v(0)\,. $$ The interval $[0,x]$ is arbitrary, therefore $\int u\,dv+\int v\,du=uv$ up to an additive constant.
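The telescoping argument can be tried out numerically. The sketch below is an illustration under my own assumptions: a uniform partition, the sample functions $u(x)=2x$ and $v(x)=\sin x$ borrowed from the question, the interval $[0,1.5]$, and a hypothetical helper named `stieltjes_sums`. It tags $S_1$ at right endpoints and $S_2$ at left endpoints, as in the choice $\xi_i=x_i$, $\eta_i=x_{i-1}$.

```python
import math

def stieltjes_sums(u, v, a, b, n):
    # Uniform partition a = x_0 < x_1 < ... < x_n = b (an assumption of this sketch).
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    # S1 tags u at the right endpoint xi_i = x_i.
    s1 = sum(u(xs[i]) * (v(xs[i]) - v(xs[i - 1])) for i in range(1, n + 1))
    # S2 tags v at the left endpoint eta_i = x_{i-1}.
    s2 = sum(v(xs[i - 1]) * (u(xs[i]) - u(xs[i - 1])) for i in range(1, n + 1))
    return s1, s2

def u(x):
    return 2 * x

v = math.sin
a, b = 0.0, 1.5

s1, s2 = stieltjes_sums(u, v, a, b, 1000)
boundary = u(b) * v(b) - u(a) * v(a)

# With these tags S1 + S2 telescopes, so it matches the boundary term
# for every n (up to floating-point rounding), not only in the limit.
print(s1 + s2, boundary)
```

With this choice of tags the identity $S_1+S_2=u(b)v(b)-u(a)v(a)$ holds for each partition, so only the separate convergence of $S_1$ and $S_2$ needs the limit.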

6
On

I want to say that I strongly disagree with the view presented by OP and further I think that we are doing a disservice to students by hiding the approach B from them.

First we have the notation $$\frac{dy}{dx}=y^{\prime}$$ I think you do not disagree with this notation, even though it is not really a fraction. We can then write the expression as, $$dy=y^{\prime}dx$$ and this is just equivalent notation. Or even better to write it as $$dy=\frac{dy}{dx}dx.$$ I would present this to the students as simply notation. In this sense equation B is the same as equation A in an alternative notation.

The main point, however, is that expression B is much easier, especially for students, to remember and use in calculation. Further, it simplifies calculations significantly. I find, working with students, that once you get them to accept this $dy$ notation (and this may require a little practice), they make rapid progress in applications of integration. Indeed, many are confused by the standard A approach that is given to them, and it hampers their progress. For example, this is how I write the integration of $\int x^2 \cos x dx$, $$\int x^2\cos x dx=\int x^2d(\sin x)$$ $$=x^2\sin x-\int \sin x d(x^2)$$ $$=x^2\sin x-\int 2x\sin x dx$$ $$=x^2\sin x+\int 2x d(\cos x)$$ $$=x^2\sin x+2x\cos x-\int 2\cos x dx$$ $$=x^2\sin x+2x\cos x- 2\sin x+C$$

This is very streamlined and it makes difficult problems easier to solve.
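For readers who want to double-check the final line of that computation, here is a small numerical sketch. It compares a composite Simpson approximation of $\int_0^2 x^2\cos x\,dx$ with $G(2)-G(0)$, where $G$ is the antiderivative found above. The interval $[0,2]$, the number of subintervals, and the helper name `simpson` are arbitrary choices of mine, and the comparison relies on the fundamental theorem of calculus.

```python
import math

def G(x):
    # Antiderivative obtained above, with the constant set to 0.
    return x**2 * math.sin(x) + 2 * x * math.cos(x) - 2 * math.sin(x)

def g(x):
    # Integrand.
    return x**2 * math.cos(x)

def simpson(func, a, b, n=200):
    # Composite Simpson rule; n must be even.
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Interior nodes get weight 4 at odd indices and 2 at even indices.
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

numeric = simpson(g, 0.0, 2.0)
exact = G(2.0) - G(0.0)
print(numeric, exact)
```

The two printed numbers should agree to many decimal places.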

There is also another issue: how will you treat substitution? Will you not write $x=f(u)$ and so $$dx=f^{\prime}(u)du$$ Thus you will have to use the $dy$ notation in any case, so why not harmonize the two methods of integration? There is also a fact that you will have to reconcile with: students will hopefully continue in mathematics, and they will encounter the other notation, thus they should be prepared for it.

As to the question of rigor there are two options: the Stieltjes integral as noted in the previous answer (which is a good idea to indicate), or the idea of differential forms, too advanced but it is rigorous.

For me, I would use the notation of the B form almost exclusively, with the following proof. $$(uv)^{\prime}=uv^{\prime}+vu^{\prime}$$ $$uv=\int uv^{\prime}dx+\int vu^{\prime}dx$$ $$uv=\int udv+\int vdu$$ And then I would provide many exercises to promote facility with manipulation of this notation. But, of course, people have been arguing about how to teach calculus for decades.

12
On

Here's my honest suggestion: just avoid (B) altogether if you want to be super rigorous but don't want to go too deep down the rabbit hole; otherwise, just present both ways of writing things down and emphasize that both are talking about undoing the product rule. At this stage people don't even define the symbol $d(\text{anything})$ carefully, so there's no point trying to make rigorous sense out of it, or out of any theorems which follow from it. Often people treat $\int(\cdots)$ and $dx$ as merely a pair of symbols that "go together", like a "." at the end of a sentence.

Anyway, if you want to make things careful, we must first start with a careful definition of $d$ and its effect on functions.

Definition $1$.

Let $I\subset\Bbb{R}$ be an interval (say non-empty and open for simplicity), and let $F:I\to\Bbb{R}$ be a differentiable function. $dF$ shall mean the mapping $I\to \text{Hom}(\Bbb{R},\Bbb{R})$ such that for each $a\in I$, $dF_a:\Bbb{R}\to\Bbb{R}$ is the linear transformation defined as $dF_a(h):= F'(a)\cdot h$.

The symbol $x$ will now denote the inclusion function $x:I\to \Bbb{R}$, defined by setting for each $a\in I$, $x(a):= a$. This is clearly also a differentiable function on $I$, so according to the above definition, we can consider the object $dx$. One can then prove by unwinding definitions that $dF=F'\,dx$, meaning that for all $a\in I$ and all $h\in \Bbb{R}$, $dF_a(h)=F'(a)(dx)_a(h)=F'(a)\cdot h$.

This definition of $d$ shouldn't be introduced with the sole intention of making "indefinite integration rigorous". Rather, the idea of $d$ should be introduced when teaching differential calculus, because it really drives home the idea of local linear approximations (which is really the essence of differential calculus); maybe you'd like to read this answer for a few extra comments.

Definition $2$.

A differential $1$-form on an interval $I\subset\Bbb{R}$ is a mapping $\omega:I\to\text{Hom}(\Bbb{R},\Bbb{R})$. So, for each $a\in I$, we have a linear transformation $\omega_a:\Bbb{R}\to\Bbb{R}$.

Now, to each $\omega$, we can define a corresponding function $f:I\to\Bbb{R}$ as $f(a):=\omega_a(1)$. Thus, based on the definition of $dx$, you can write $\omega=f\,dx$. So, every $1$-form $\omega$ can be written as $\omega=f\,dx$ for some unique $f$, and conversely given any function $f:I\to\Bbb{R}$, we get a corresponding $1$-form $f\,dx$.

So, we now have the vocabulary of a differential 1-form, and of the differential/exterior derivative of a differentiable function $F:I\to\Bbb{R}$. The object $dF$ is thus an example of a differential 1-form.

You can prove that $d$ obeys some nice rules:

  • $d(F+G)=dF+dG$
  • $d(FG)=(dF)\cdot G + F\cdot (dG)$.
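To make Definition 1 and the product rule concrete, here is a small sketch that models $dF$ as a function sending a point $a$ to the linear map $h\mapsto F'(a)\cdot h$. Everything in it is illustrative: the derivative is approximated by a central difference rather than computed exactly, and the helper names `derivative` and `d`, the sample functions, and the sample values of $a$ and $h$ are my own.

```python
import math

def derivative(F, a, eps=1e-6):
    # Central-difference approximation of F'(a) (an assumption of this sketch).
    return (F(a + eps) - F(a - eps)) / (2 * eps)

def d(F):
    # The 1-form dF: a point a is sent to the linear map h -> F'(a) * h.
    return lambda a: (lambda h: derivative(F, a) * h)

def x(a):
    # The inclusion function x(a) = a.
    return a

F = math.sin

def G(t):
    return t**2

def FG(t):
    return F(t) * G(t)

a, h = 0.7, 3.0

# dx_a is (approximately) the identity map on R.
print(d(x)(a)(h))

# Product rule: d(FG)_a(h) versus (dF)_a(h) * G(a) + F(a) * (dG)_a(h).
lhs = d(FG)(a)(h)
rhs = d(F)(a)(h) * G(a) + F(a) * d(G)(a)(h)
print(lhs, rhs)
```

Up to the error of the difference quotient, the two sides of the product rule agree, and $dx_a(h)$ returns $h$, which is the content of $dF=F'\,dx$ evaluated at $(a,h)$.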

The problem of indefinite integration can thus be described in this language as follows:

Antiderivative/Primitive. Let $\omega$ be a differential $1$-form on an interval $I$. We say $\omega$ has an antiderivative/primitive on $I$ if there is a differentiable function $F:I\to\Bbb{R}$ such that $dF=\omega$.

The set of all primitives of $\omega$ shall be denoted $\int \omega$, i.e., $\int\omega=\{F\,| \text{$F:I\to\Bbb{R}$ is differentiable and $dF=\omega$}\}$. Now, we define the sum of two such sets to be the set of all possible sums of individual functions, one from each set. Define scalar multiplication similarly. Then, you can prove things like

  • for any differentiable function $F:I\to\Bbb{R}$, $\int dF=\{F+c\,:\, c\in\Bbb{R}\}$ (this requires the mean-value theorem and the fact that $I$ is connected).
  • If $\omega,\eta$ are differential $1$-forms on $I$ which have primitives, then so does $\omega+\eta$ and in this case, $\int (\omega+\eta)=\int\omega+\int \eta$.
  • If $\omega$ is a differential $1$-form on $I$ which has a primitive then for any $c\in \Bbb{R}$, so does $c\omega$. In this case, $\int c\omega=c\int\omega$.

So, now you can interpret integration by parts for $1$-forms as the reverse of the product rule: if $u,v:I\to\Bbb{R}$ are differentiable functions then $\int u\,dv= \int[d(uv)-v\,du]=\int d(uv)-\int v\,du= \{uv+C\,:\, C\in\Bbb{R}\}-\int v\,du$, or by slight abuse of notation, just $uv-\int v\,du$.


Remarks.

The above is a relatively self-consistent presentation of $d$, integration by parts, and so on.

Is the justification at the level of Calc 1? Well, one could definitely introduce the definitions in a calc 1 course (assuming the students have seen the concept of a function as a "rule" between two sets, not just between subsets of reals). So in this regard I would say yes, it is at the level of a calc 1 student.

The more important question however is whether one should introduce these definitions at the level of calc 1? Here I would very strongly say NO. One should not introduce these definitions, unless you have a group of very theoretically-minded and curious students. Why do I say this? Well, the concept of a differential form is undoubtedly very important in higher math, but when the domain is an interval $I\subset\Bbb{R}$, this makes things needlessly complicated, and mainly because linear algebra in one-dimension is very trivial: if $V$ is a vector space over a field $\Bbb{F}$, then we have a canonical isomorphism $\text{Hom}(\Bbb{F},V)\cong V$, the isomorphism being "evaluation at $1$". Above, we have $\Bbb{F}=V=\Bbb{R}$. Because of this fact, it suffices to deal only with functions between subsets of $\Bbb{R}$ (i.e just with functions $F$ and their derivatives $F'$) without dealing with more complicated target spaces (and hence with differential forms $\omega$ and $dF$).

Just to be clear, it is not that integration is trivial; rather differential calculus becomes much simpler because we're only dealing with an open subset $I\subset\Bbb{R}$.

Also, every textbook aimed at this level only introduces functions and their derivatives, so it's best to stick with that, and just emphasize that integration by parts is the product rule with a slight rearrangement: $uv'=(uv)'-vu'$.

2
On

I offer that the main purpose of beginning courses in calculus is to empower science and engineering students (not math majors) with needed tools to do their work. And that work is very far removed from mathematical rigor.

Universities have a series of courses called "Advanced Calculus" in which mathematical rigor is emphasized. And then a series called "Real Analysis" that goes even deeper.