How to reasonably (numerically) estimate $n\int_0^\infty\left(1-\left(1-e^{-t}\left(1+t+\ldots+{{t^{m-1}}\over{(m-1)!}}\right)\right)^n\right)dt$?

376 Views Asked by At

Recently in doing some expected value calculations, I've derived the following two integrals:

$$6\int_0^\infty\left(1-\left(1-e^{-t}\left(1+t\right)\right)^6\right)dt$$

$$6\int_0^\infty\left(1-\left(1-e^{-t}\left(1+t + {{t^2}\over2}\right)\right)^6\right)dt$$

Plugging these into Wolfram Alpha can get us numerical answers. For instance, the first integral comes out to about $24.1$, and the second integral comes out to about $32.7$.

However, I'm wondering if anyone can give a reasonable numerical estimate for these $2$ integrals from first principles (pencil and paper) without using a calculator, Wolfram Alpha, or a computer.

I've tried but made little to no progress, and I consulted some nearby PhD students and they didn't know either, so asking here.

4

There are 4 best solutions below

0
On BEST ANSWER

I think the sigmoid approximation is a little more workable by hand than leonbloy suggests. Following his answer, we will approximate the integral as $6t_0$ where $g(t_0) = \frac{1}{2}$. For the first integral this means estimating the solution to

$$\frac{1 + t}{e^t} = 1 - 2^{-\frac{1}{6}} \approx \frac{\log 2}{6}$$

or

$$\frac{e^t}{1 + t} \approx \frac{6}{\log 2} \approx 9$$

(using $\log 2 \approx 0.69 \dots $). The LHS can be shown to be an increasing function of $t$ so we can try to estimate the root using the intermediate value theorem. Substituting $t = 3$ gives $\frac{e^3}{4} < \frac{3^3}{4} < 7$ while substituting $t = 4$ gives $\frac{e^4}{5}$. We can estimate this by just computing $e^4$ to two significant digits, which gives

$$e^2 \approx 2.7^2 \approx 7.3, e^4 \approx 7.3^2 \approx 53$$

so $\frac{e^4}{5} \approx 10.6$. So $t_0$ in this case is between $3$ and $4$ and likely closer to $4$. If we just estimate $t_0 \approx 3.5$ as the midpoint we get an estimate of $6 \times 3.5 = \boxed{ 21 }$ for the first integral, sadly with no error bound.

For the second integral similarly we want to estimate the solution to

$$\frac{e^t}{1 + t + \frac{t^2}{2}} \approx 9.$$

Substituting $t = 4$ gives $\frac{e^4}{13} < \frac{81}{13} \approx 6$ while substituting $t = 5$ gives $\frac{e^6}{18.5}$. We again compute to two significant digits that $e^6 \approx 140$, so $\frac{e^6}{18.5} \approx \frac{140}{18.5} = 10 - \frac{45}{18.5} \approx 7.5$. So $t_0$ in this case is a bit larger than $5$ (we could substitute $t = 6$ to confirm this but I won't). If we just estimate $t_0 \approx 5$ we get an estimate of $6 \times 5 = \boxed{ 30 }$ for the second integral, again sadly with no error bound.

For the general integral we now want to estimate the solution to

$$e^{-t} \left( \sum_{i=0}^{m-1} \frac{t^i}{i!} \right) = 1 - 2^{-\frac{1}{n}} \approx \frac{\log 2}{n}.$$

We can think of the LHS as $\mathbb{P}(X \le m-1)$ where $X \sim \text{Pois}(t)$. Since we want this probability to be small we want $t \ge m-1$, so using the Chernoff bound adapted to this case we get

$$\mathbb{P}(\text{Pois}(t) \le a) \le \exp \left( a - t + a \log \frac{t}{a} \right)$$

where $a = m-1$. Setting $t = a + x$, which simplifies things a bit, we get

$$\mathbb{P}(\text{Pois}(a + x) \le a) \le \exp \left( - x + a \log \left( 1 + \frac{x}{a} \right) \right).$$

Taking the Chernoff bound as our approximation and setting it equal to $\frac{\log 2}{n}$ gives

$$x \approx \log n - \log \log 2 + (m-1) \log \left( 1 + \frac{x}{m-1} \right)$$

which has dominant growth $x \approx \log n$; substituting this into the logarithm term gives

$$x_0 \approx \log n - \log \log 2 + (m-1) \log \left( 1 + \frac{\log n}{m-1} \right)$$

which gives $t_0 \approx m-1 + x_0$, so we get a final approximation for the quantity in the title of your question:

$$\boxed{ n \left( \log n + (m-1) \left( 1 + \log \left( 1 + \frac{\log n}{m-1} \right) \right) - \log \log 2 \right) }.$$

The dominant growth here looks like $n \log n + (m-1)n \log \log n$ which seems roughly reasonable as an answer to the coupon collector problem you describe in the comment. For $m = 2, n = 6$ we get an approximation of

$$6 \left( \log 6 + \left( 1 + \log \left( 1 + \log 6 \right) \right) - \log \log 2\right) \approx \boxed{ 25.1 }$$

whereas for $m = 3, n = 6$ we get an approximation of

$$6 \left( \log 6 + 2 \left( 1 + \log \left( 1 + \frac{\log 6}{2} \right) \right) - \log \log 2\right) \approx \boxed{ 32.7 }.$$

The Chernoff bound itself can be used to estimate the integrand (we should split into two integrals, one from $0$ to something like $m$ and another from something like $m$ to $\infty$) which can maybe do better; the integrand itself can be written $1 - \mathbb{P}(\text{Pois}(t) \ge m)^n$, which means for fixed $t$ it gives the probability that the minimum of $n$ iid Poisson random variables with intensity $t$ is $\le m-1$. This seems closely related to the coupon collector problem you describe but I have to admit I don't actually see the relationship at the moment.

Edit: The asymptotics for the generalized coupon collector problem are actually described on Wikipedia and given by $n \log n + (m-1) n \log \log n + O(n)$, due to Newman and Shepp. So the above estimate is not too bad!

0
On

Let $f(t)=1-(1-e^{-t}(1+t))^6$.

  1. Binomial Theorem: $f(t)=1-\sum_k\binom{6}{k}e^{-kt}(1+t)^{6-k}$.
  2. Binomial Theorem: $f(t)=1-\sum_{j+k\le 6}\binom{6}{j,k,6-j-k}e^{-kt}t^j$.
  3. Cheat by looking up the moments of the exponential distribution or integrate yourself (using induction). \begin{aligned} \int_0^\infty e^{-kt}t^j\mathrm{d}t &=-\int_0^\infty\frac{j}{k}e^{-kt}t^{j-1}-e^{-kt}t^j\mathrm{d}t+\int_0^\infty\frac{j}{k}e^{-kt}t^{j-1}\mathrm{d}t\\ &=-\left[\frac{1}{k}e^{-kt}t^j\right]_{t=0}^\infty+\frac{j}{k}\cdot\frac{(j-1)!}{k^j} =\frac{j!}{k^{j+1}}. \end{aligned} Evaluating the multinomial coefficients and aggregating is straightforward (don't forget to multiply with $6$ in the end). Solving the second integral is completely analogous.
0
On

Firstly, $$I(m,n)=n\int\limits_0^\infty\left(1-\left(1-e^{-t}\left(1+t+\ldots+{t^m\over m!}\right)\right)^n\right)\,\text dt=n\sum\limits_{k=1}^n (-1)^{k-1} \dbinom nk J(m,k),\tag1$$ where $$J(m,k)=\int\limits_0^\infty e^{-kt}\left(1+t+\ldots+{t^m\over m!}\right)^k\,\text dt.\tag2$$

Then $$J(1,k)=\int\limits_0^\infty e^{-kt}\left(1+t\right)^k\,\text dt =e^k\text E_{-k}(k),\tag3$$ $$J(2,k)=\int\limits_0^\infty e^{-kt}\left(1+t+\dfrac12t^2\right)^k\,\text dt=\dfrac1{2^k}\int\limits_0^\infty e^{-kt} \left(1+(1+t)^2\right)^k\,\text dt=\frac1{2^k}\sum_{j=0}^k\dbinom kj R(k,j),$$ $$R(k,j)=\int\limits_0^\infty e^{-kt}(1+t)^{2j} \,\text dt =e^kE_{-2j}(k),$$ $$J(2,k)=\left(\frac e2\right)^k\sum_{j=0}^k\dbinom kj E_{-2j}(k),\tag4$$ where $$\text E_n(x)=\int\limits_1^\infty \dfrac{e^{-xt}\,\text dt}{t^n}$$ is E_n-function.

Some calculated values of $J(m,k)$ are collected in the table $(5).$

$$\begin{vmatrix} k & J(1,k) & J(2,k) & (-1)^{k-1}\dbinom6k\\ 1 & 2 & 3 & 6\\ 2 & \dfrac54 &\dfrac{33}{16} & -15\\ 3 & \dfrac{26}{27} & \dfrac{409}{243} & 20\\ 4 & \dfrac{103}{128} & \dfrac{48039}{32768} & -15\\ 5 & \dfrac{2194}{3125} & \dfrac{2580591}{1953125} & 6\\ 6 & \dfrac{1223}{1944} & \dfrac{4084571}{3359232} & -1\\ \end{vmatrix}\tag5$$

Taking in account $(1),(5),$ easily to get $$\color{green}{\mathbf{I(1,6)= \dfrac{390968681}{16200000} \approx 24.13386919753}},$$ $$\color{green}{\mathbf{I(2,6)= \dfrac{2286878604508883}{69984000000000} \approx 32.67716341605}.}$$

Edit of 23.10.22

Looks possible to get exact expressions also for $\;J(3,k)\;$ and $\;I(3,n).\;$

Really, $$J(3,k)=\int\limits_0^\infty e^{-kt}\left(1+t+\dfrac{t^2}2+\dfrac{t^3}6\right)^k\,\text dt =\dfrac1{6^k}\int\limits_1^\infty e^{-k(x-1)}(x^3+3x+2)^k\,\text dx$$ $$=\left(\dfrac e6\right)^k\int\limits_1^\infty e^{-kx}\sum\limits_{j=0}^k\dbinom kj\left((x^3+2)^j (3x)^{k-j}\right)\,\text dx,$$ with the clear structure of the result.

For instance, $J(3,6)=\dfrac{247150321423}{132239526912}\approx 1.8689595100.$

On the other hand, using of Taylor series near $\;x=0,\;x=2,\;x=\infty\;$ allows to write: \begin{align} &J_1(3,k)=\int\limits_0^1 \left(e^{-x}\left(1+x+\dfrac12x^2+\dfrac16 x^3\right)\right)^k\,\text dx\\[4pt] &\approx \int\limits_0^1 \left(1-\dfrac16 x^4\left(\dfrac14-\dfrac x5 + \dfrac {x^2}{12}-\dfrac {x^3}{42}\right)\right)^k\,\text dx,\\[4pt] &J_2(3,k)=\int\limits_1^{3,5} \left(e^{-x}\left(1+x+\dfrac12x^2+\dfrac16 x^3\right)\right)^k\,\text dx\\[4pt] &\approx e^{-2k}\int\limits_1^{3.5} \left(\dfrac{331}{45}+\dfrac29x^2-\dfrac49 x^3 +\dfrac18x^4-\dfrac1{90} x^5\right)^k\,\text dx,\\[4pt] &J_3(3,k)=\int\limits_{3,5}^\infty \left(e^{-t}\left(1+t+\dfrac12t^2+\dfrac16 t^3\right)\right)^k\,\text dt =\left(\dfrac e6\right)^k\int\limits_{4.5}^\infty e^{-kx}\left(x^3+3x+2\right)^k\,\text dx\\[4pt] &\approx \left(\dfrac e6\right)^k\int_{4.5}^\infty e^{-kx} \left(x^6+k\bigg(3x^4+2x^3+\dfrac{k-1}2\big(9x^2+12x+9k-14\big)\bigg) x^{3k-6}\right)\,\text dx. \end{align}

This allows to calculate $$J(3,6) = J_1(3,6)+J_2(3,6)+J_3(3,6)\approx 0.97507+0.884825+ 0.00901 = 1.86890.$$

Using obtained formulas, also can be calculated $$I(3,6)\approx 40.788.$$

0
On

I doubt that you can get a reasonable estimate with pencil and paper.

For a quick and rough approximation (still requires calculator):

Let $E_1 = 6 I_1 = 6 \int_{0}^\infty g_1(t) dt$ with $g_1(t)=1-h_1(t)^6$ and $h_1(t)=1-e^{-t}(1+t)$.

Same for $E_2$, with $h_2(t)=1-e^{-t}(1+t+\frac12 t^2)$

In both cases, we can see that $h(t)$ increase monotonously from $0$ to $1$ (can be checked computing the derivative). Hence we can expect that, after the composition with $1-(\cdot)^6$, $g(t)$ has a sigmoid shape, starting at $g(0) =1$ with some quite steep transition to $g(+\infty)=0$. If we assume this, we can try to approximate this sigmoid by a piecewise linear activation function. And in this case, the integral is $I=t_{0}$, where $g(t_0)=1/2$.

Under this assumption, we are left with computing $t_0$ as the solution of

$$ 1-(1-e^{-t}(1+t))^6=1/2 \tag 1$$

or $$ t = \log(1+t) - \log \left(1- \sqrt[6]{1/2}\right) \tag 2$$

Again, this cannot be computed with paper and pencil, but the numerical solution can be found iteratively, eg with a spreadsheet.

(The attempt to use the Taylor series for the logarithm leads nowhere).

We get the values $I_1=3.78$ ($E_1=22.68$) for the first one and $I_2=5.196$ ($E_2=31.17$) for the second one.

The relative error wrt the exact values (from Yuri Negometyanov's answer) is around $5\%$.

See also Qiaochu Yuan's comment for a probabilistic bound.

For general $m,n$, the method used in the answer of this similar (but not quite) question might be of interest.