Binomial theorem - e.g. $(1+x)^{-2}$, why must the first number in the brackets be 1?


So, as an A-level student, when I came across the binomial theorem I was told the first number must be $1$. If it was $2$ or $3$, we needed to factor it out so as to make the first term $1$.

But why is this?

An example of what I mean: $(4+3x)^{-2}$ becomes $4^{-2}\left(1+\frac34x\right)^{-2}$


There are 2 best solutions below


There isn't any rule that you have to do this. I am only guessing, but the reason you were taught it is probably that it makes the calculations simpler. For example,

$$ (5+3x)^3=5^3\left(1+3\times\frac35x+3\times\left(\frac35x\right)^2+\left(\frac35x\right)^3\right) $$

and you can keep the factor $5^3$ outside and not worry about multiplying all the powers out until the end:

$$ (5+3x)^3=5^3+3^2 5^2x+3^3 5 x^2+3^3x^3 $$

Incidentally, the binomial theorem also applies to things like $(x+y)^2$, where the first term can't easily be reduced to $1$.
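The two forms of $(5+3x)^3$ can be checked against each other with a minimal Python sketch. The helper names `direct_coeffs` and `factored_coeffs` are hypothetical, introduced only for this illustration; both use the ordinary binomial theorem with exact fractions.

```python
from fractions import Fraction
from math import comb

def direct_coeffs(a, b, n):
    """Coefficients of x^0..x^n in (a + b*x)^n, expanded directly."""
    return [comb(n, k) * Fraction(a) ** (n - k) * Fraction(b) ** k
            for k in range(n + 1)]

def factored_coeffs(a, b, n):
    """Same coefficients, computed as a^n * (1 + (b/a)*x)^n."""
    ratio = Fraction(b, a)
    return [Fraction(a) ** n * comb(n, k) * ratio ** k
            for k in range(n + 1)]

# Both routes should give the same list of coefficients.
print([int(c) for c in direct_coeffs(5, 3, 3)])
print(direct_coeffs(5, 3, 3) == factored_coeffs(5, 3, 3))
```

Factoring out $5^3$ changes only the bookkeeping, not the result.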


TL;DR: What they call the "binomial theorem" here is actually the Taylor series of the function $t\mapsto t^{-2}$ around the point $t_0=1$, evaluated at $1+t$ with the substitution $t=\frac{3}{4}x$ (in your example). Known theorems about the radius of convergence imply that this series converges for $|t|<1$, i.e. $\left|\frac{3}{4}x\right|<1$. The first term is taken to be $1$ to simplify the calculations, which otherwise get messier.

Longer answer: at A-level, you are asked to accept that the "classic" binomial formula:

$$(a+b)^\alpha=\sum_{i=0}^\alpha{\alpha\choose i}a^i b^{\alpha-i}$$

which is known to work for a positive integer $\alpha$, somehow also "works" when $\alpha=-2$ (and you would also have examples with non-integer $\alpha$, e.g. $\alpha=\frac{2}{3}$ or $\alpha=-\frac{1}{4}$) ... but you need to set $a=1$. (?!)

What is going on there?

Let us start from your example $\left(1+\frac{3}{4}x\right)^{-2}$ (I presume you have already taken out the factor $4^{-2}$.) To calculate that, you calculate $(1+t)^{-2}$ and then substitute $t=\frac{3}{4}x$. And, in general, you are supposed to follow the same procedure if the exponent is any real number $\alpha$, positive or negative, integer or not (unlike the "classic" binomial formula, where the exponent must be a positive integer).

But why? What is the motivation for this calculation? And why does it work?

As it happens, there is such a thing as a Taylor series. Its purpose is this: if you know the value of a function at one point (say at the point $1$), it can help you find the value of the same function at a nearby point (say at the point $1+t$), where $t$ is "small enough" (and for rigour I must say the function has to be "smooth" enough, whatever that means!). The formula uses the derivatives of the function (which you may have covered at A-level) and, for a given point $t_0$ and a "small" distance $t$, it looks like this:

$$f(t_0+t)=f(t_0)+f'(t_0)t+\frac{f''(t_0)}{2!}t^2+\frac{f'''(t_0)}{3!}t^3+\cdots$$

... and yes, this is an infinite sum, and the equality above doesn't always hold; in fact, the sum on the right-hand side may not even exist in all circumstances.

However, for "small enough" $t$, the sum often converges, which loosely means that the finite sums (those made out of just the first few terms) get closer and closer to the actual full infinite sum as the number of terms increases. To quantify this, Taylor series come with a couple of theorems that estimate the error you make if, instead of the full infinite sum, you add only the first $n$ terms, and further theorems that tell you for which $t$ the sum converges to the original function in the first place.

So now take $f:t\mapsto t^\alpha$, where $\alpha$ is your exponent (e.g. $\alpha=-2$), and write the formula for $t_0=1$. Oh, wait, and what are $f'(t)$, $f''(t)$, $f'''(t)$ etc.? You may have learned at A-level how to differentiate this function:

$$f(t)=t^\alpha$$ $$f'(t)=\alpha t^{\alpha-1}$$ $$f''(t)=\alpha(\alpha-1)t^{\alpha-2}$$ $$f'''(t)=\alpha(\alpha-1)(\alpha-2)t^{\alpha-3}$$

etc. Now, put all of this into the formula, and you get:

$$(1+t)^\alpha=1^\alpha+\alpha 1^{\alpha-1}t+\frac{\alpha(\alpha-1)}{2!} 1^{\alpha-2}t^2+\frac{\alpha(\alpha-1)(\alpha-2)}{3!} 1^{\alpha-3}t^3+\cdots$$

which, if you take into account that $1$ to any power is $1$, and that you can use "generalised binomial coefficients" ${\alpha\choose n}:=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}$, makes this formula look really familiar:
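The generalised binomial coefficient defined above is easy to compute exactly. What follows is a minimal sketch, not a library routine: the function name `gen_binom` is hypothetical, and `Fraction` is used so that non-integer exponents such as $\frac12$ stay exact.

```python
from fractions import Fraction

def gen_binom(alpha, n):
    """Generalised binomial coefficient alpha(alpha-1)...(alpha-n+1) / n!."""
    result = Fraction(1)
    for k in range(n):
        result *= Fraction(alpha) - k   # next factor in the numerator
        result /= k + 1                 # next factor of n! in the denominator
    return result

print([gen_binom(-2, n) for n in range(5)])              # negative exponent
print([gen_binom(3, n) for n in range(6)])               # positive integer exponent
print([gen_binom(Fraction(1, 2), n) for n in range(4)])  # fractional exponent
```

For a positive integer exponent the product picks up a zero factor once $n>\alpha$, so all later coefficients vanish.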

$$(1+t)^\alpha=1+{\alpha\choose 1}t+{\alpha\choose 2}t^2+{\alpha\choose 3}t^3+\cdots$$
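As a worked instance of this formula, using the question's own example, take $\alpha=-2$. The coefficients follow from the definition of ${\alpha\choose n}$:

$${-2\choose 1}=-2,\qquad {-2\choose 2}=\frac{(-2)(-3)}{2!}=3,\qquad {-2\choose 3}=\frac{(-2)(-3)(-4)}{3!}=-4$$

so that

$$(1+t)^{-2}=1-2t+3t^2-4t^3+\cdots$$

Substituting $t=\frac34x$ and restoring the factor $4^{-2}$ gives

$$(4+3x)^{-2}=\frac1{16}\left(1-\frac32x+\frac{27}{16}x^2-\frac{27}{16}x^3+\cdots\right)$$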

So, this is roughly what you are doing. You are not using the binomial formula. You are using the Taylor series of the function $f(t)=t^\alpha$ around the point $t_0=1$, and then you approximate it with a finite sum, with the number of terms pretty much always given in your problem. (At A-level, AFAIK, you don't have any tools to figure out for yourself how many terms you will need to reach the desired precision in your sum, so the number of terms has to be given to you.)

There are two more observations to make:

  • If $\alpha$ is a positive integer, then the generalised binomial coefficients are just ordinary binomial coefficients. For those, note also that ${\alpha\choose n}$ is zero whenever $n>\alpha$, i.e. the infinite sum above becomes finite, as all the subsequent terms for $n>\alpha$ are zero. Thus, you can fully calculate this sum, and it coincides with the "classic" binomial formula. (This gives some justification why at A-levels you still call this special case of Taylor series a "binomial theorem".)
  • It can be shown (way out of the scope of this answer) that the Taylor series for $(1+t)^\alpha$ always converges in the range $|t|<1$. It may or may not converge for other $t$, but at least for $t\in(-1,1)$ it will converge. In other words, calculating more and more terms and adding them will get you closer and closer to the real value of $(1+t)^\alpha$. This fact is not proven in your A-level course, but is given at face value to A-level students, so they can use it in some problems.
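Both observations can be seen numerically. The sketch below is an illustration only, not a proof: the function name `partial_sum` is hypothetical, and it uses floating-point arithmetic. It adds up the first few terms of the series, building each term from the previous one.

```python
def partial_sum(alpha, t, n_terms):
    """Sum of the first n_terms terms of 1 + C(alpha,1) t + C(alpha,2) t^2 + ..."""
    total = 0.0
    term = 1.0  # the n = 0 term
    for n in range(n_terms):
        total += term
        # C(alpha, n+1) t^(n+1) = C(alpha, n) t^n * (alpha - n) / (n + 1) * t
        term *= (alpha - n) / (n + 1) * t
    return total

alpha = -2
for t in (0.5, 1.5):  # one value inside |t| < 1, one outside
    exact = (1 + t) ** alpha
    for n_terms in (5, 20, 80):
        approx = partial_sum(alpha, t, n_terms)
        print(f"t={t}, terms={n_terms}: partial sum={approx:.6g}, exact={exact:.6g}")

# Positive integer exponent: the terms vanish after n = alpha, so the sum is finite.
print(partial_sum(3, 2.0, 10), (1 + 2.0) ** 3)
```

For $t=0.5$ the partial sums settle on $(1+t)^{-2}$, while for $t=1.5$ they grow without bound. For $\alpha=3$ the sum stops changing after four terms, even with $|t|>1$.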

From the above, it should be clear that making the first term $1$ is done to simplify all the above calculations and to give you a clear formula for the interval in which the series converges ($|t|<1$). You can try to do the whole derivation with the first term not being $1$ (e.g. looking at $t_0=4$ in your example). You should get the same result, but the calculation will be messier, and you will need to run through the theorems I mentioned above (which you did not learn!) to calculate the interval of convergence.
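To illustrate the messier route, here is a sketch that applies the same Taylor formula with $t_0=4$ and writes $s=3x$ for the distance from $4$ (the symbol $s$ is introduced here only for illustration). The derivatives of $f(t)=t^{-2}$ give

$$(4+s)^{-2}=4^{-2}-2\cdot4^{-3}s+3\cdot4^{-4}s^2-4\cdot4^{-5}s^3+\cdots$$

and with $s=3x$:

$$(4+3x)^{-2}=\frac1{16}-\frac{3}{32}x+\frac{27}{256}x^2-\frac{27}{256}x^3+\cdots$$

This agrees term by term with $4^{-2}\left(1+\frac34x\right)^{-2}$ expanded via $(1+t)^{-2}=1-2t+3t^2-4t^3+\cdots$, but every coefficient now carries its own power of $4$. The series converges for $|s|<4$, which is the same condition as $\left|\frac34x\right|<1$.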

Hope this helps.