Use De Moivre–Laplace to approximate $1 - \sum_{k=0}^{n} {n \choose k} p^{k}(1-p)^{n-k} \log\left(1+\left(\frac{p}{1-p}\right)^{n-2k}\right)$

570 Views Asked by At

I am trying to use De Moivre–Laplace theorem to approximate $$1 - \sum_{k=0}^{n} {n \choose k} p^{k}(1-p)^{n-k} \log\left(1+\left(\frac{p}{1-p}\right)^{n-2k}\right)$$

The idea of an approximation is that we don't have the sum term which is difficult to calculate if $n$ is high.

Using the De Moivre–Laplace theorem gets us that: $${n \choose k} p^{k}(1-p)^{n-k} \approx \frac{1}{\sqrt{2 \pi np(1-p)}}e^{-\frac{(k-np)^2}{2np(1-p)}}$$ Now we see that \begin{align} F &= 1 - \sum_{k=0}^{n} {n \choose k} p^{k}(1-p)^{n-k} \log\left(1+\left(\frac{p}{1-p}\right)^{n-2k}\right) \\&\approx 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi np(1-p)}}e^{-\frac{(x-np)^2}{2np(1-p)}}\log_2\left(1+\left(\frac{p}{1-p}\right)^{n-2x}\right) dx \end{align}

my calculation is inspired by Entropy of a binomial distribution

If one has an other suggestion to approximate $F$ or get a closed for i would like to hear those. So far i've tried approximating $F$ with a least squares method using a tanh function as the fit function.

3

There are 3 best solutions below

3
On

The expression looks very much like the Bernstein approximation of a function ($1-f(x)$) on $[0,1]$. But the argument (in fact the degree $n-2k$) of $log$ function ruins everything.

Here is a quick idea. Denote $$ y(p)=\sum_{k=0}^{n} {n \choose k} p^{k}(1-p)^{n-k} \log\left(1+\left(\frac{p}{1-p}\right)^{n-2k}\right). $$

Let us assume that we can represent $y(p)$ in the form $y(p)=\sum_{m=0}^\infty y_m p^m$, where $y_m$ are constants not depending on $p$.

Note that $y(p)=y(1-p)$. Let us consider the equation $$ y(p)+y(1-p)=f(p). \tag{eq1}\label{eq1} $$ Although we can write out the expression for $f(p)$, let us think that we don't know how $f(p)$ looks like. But for sure, $f(p)$ must satisfy $f(p)=f(1-p)$. It is know (see for example http://eqworld.ipmnet.ru/en/solutions/fe/fe1116.pdf) that equations like \eqref{eq1} have a solution, for example, $$ \tag{eq2}\label{eq2} y(p)=f(p) \sin^2({\pi p \over 2}). $$ By expanding $\sin^2({\pi p \over 2})$ into the Maclaurin series we get $$ y(p)=f(p) \sum_{m=1}^\infty {(-1)^{m+1} 2^{2m-1} \over (2m)!} {p^{2m} \pi^{2m} \over 2^{2m}}. $$

Let us assume that $f(p)$ is an analytic function i.e. $f(p)=\sum_{m=0}^\infty {f^{(m)}(0)\over m!} p^m$. By writing \eqref{eq2} in the series form we have: $$ \sum_{m=0}^\infty y_m p^m = \left ( \sum_{m=0}^\infty {f^{(m)}(0)\over m!} p^m \right ) \left ( \sum_{m=1}^\infty {(-1)^{m+1} \over (2m)!} {p^{2m} \pi^{2m} \over 2} \right ). $$

From this relation it may be possible to find the expressions for $f^{(m)}(0)$ through $y_m$ by equating the coefficients at $p^m$. If this works out, we go back to the right part of \eqref{eq2} and try to find how many terms in the product $$ \left ( f^{(0)}(0) + f^{(1)}(0) p + f^{(2)}(0) {p \over 2} + \dots \right ) \left ( \sum_{m=1}^\infty {(-1)^{m+1} \over (2m)!} {p^{2m} \pi^{2m} \over 2} \right ). $$

yield the approximate value.

12
On

$$\color{brown}{\textbf{Transformations}}$$

Let WLOG the inequality $$q=\dfrac p{1-p}\in(0,1)\tag1$$ is valid. Otherwise, the corresponding opposite events can be reversed.

This allows to present the issue expression in the form of \begin{align} &S(n,p)=1 - (1-p)^n\sum_{k=0}^{n} {n \choose k} q^k\log\left(1+q^{n-2k}\right),\tag2\\[4pt] \end{align} or \begin{align} &=1 - (1-p)^n\sum_{k=0}^{n} {n \choose k}q^kq^{n-2k} - (1-p)^n\sum_{k=0}^{n} {n \choose k}q^k\left(\log\left(1+q^{n-2k}\right)-q^{n-2k}\right)\\[4pt] &=1 - (1-p)^n(1+q)^n - (1-p)^n\sum_{k=0}^{n} {n \choose k}q^k\left(\log\left(1+q^{n-2k}\right)-q^{n-2k}\right)\\[4pt] &S(n,p)= - (1-p)^n\sum_{k=0}^{n} {n \choose k}q^k\left(\log\left(1+q^{n-2k}\right)-q^{n-2k}\right).\tag3\\[4pt] \end{align} Formula $(3)$ can simplify the calculations, because it does not contain the difference of the closed values.

$$\color{brown}{\textbf{How to calculate this.}}$$

Note that the sum of $(3)$ contains both the positive and the negative degrees of $q.$ This means that in the case $n\to \infty$ the sum contains the terms of the different scale.

The calculations in the formula $(3)$ can be divided on the two parts.

$\color{green}{\textbf{The Maclaurin series.}}$

The Maclaurin series for the logarithmic part converges when the term $\mathbf{\color{blue}{q^{n-2k} < 1}}.$ This corresponds with the values $k<\frac n2$ in the case $\mathbf{q<1}$ and with the values $k>\frac n2$ in the case $\mathbf{q>1}.$ Then the Maclaurin series in the form of $$\log(1+q^{n-2k}) = \sum_{i=1}^\infty\frac{(-1)^{i+1}}{i}q^{(2n-k)i}\tag4$$ can be used.

If $\mathbf{\color{blue}{q^{n-2k} > 1}},$ then $$\log(1+q^{n-2k}) = \log(q^{2n-k}(1+q^{k-2n})) = (2n-k)\log q + \log(1+q^{k-2n}).\tag5$$

If $\mathbf{\color{blue}{q^{n-2k} = 1}},$ then $LHS(4) = \log2.$

If $\mathbf{\color{blue}{q^{n-2k} \lesssim 1}},$ then $$\log(1+q^{2n-k}) = \log\frac{1+r}{1-r} = 2r\sum_{i=0}^\infty\frac{(-1)^i}{2i+1}r^{2i},\quad \text{ where } r=\frac{q^{2n-k}}{2+q^{2n-k}}\approx\frac{q^{2n-k}}3,\tag6$$ and can be used some terms of the series.

$\color{green}{\textbf{The double summations.}}$

After the substitution of the $(4)$ or $(5)$ to $(3)$ the sums can be rearranged. For example, $$\sum_{k=0}^{L}{n \choose k}q^k\sum_{i=1}^\infty\frac{(-1)^{i+1}}{i}q^{(2n-k)i}= \sum_{i=1}^\infty\frac{(-1)^{i+1}}{i}\sum_{k=0}^{L}{n \choose k}q^kq^{(2n-k)i}$$ $$= q^{n+1}\sum_{i=1}^\infty\frac{(-1)^{i+1}}{i}\sum_{k=0}^{L}{n \choose k}\left(q^{i+1}\right)^{n-k},$$ wherein the order of the summation can be chosen, taking in account the given data.

5
On

We can apply a method similar to this. Since the summand has a sharp peak around $k = n/2$, we can take an expansion valid for large $n$ and for $k$ close to $n/2$ and then, also due to the tails being small, extend the summation range indefinitely:

$$a_k = \binom n k p^k q^{n - k} \ln \left( 1 + \left( \frac p q \right)^{n - 2 k} \right), \quad q = 1 - p, \\ a_{n/2 + i} \sim \sqrt {\frac 2 {\pi n}} \left( 2 \sqrt {p q} \right)^n \left( \frac p q \right)^i \ln \left( 1 + \left( \frac q p \right) ^{2 i} \right), \\ \sum_{k = 0}^n a_k \sim \sqrt {\frac 2 {\pi n}} \left( 2 \sqrt {p q} \right)^n \sum_{i = -\infty}^\infty \left( \frac p q \right)^i \ln \left( 1 + \left( \frac q p \right) ^{2 i} \right), \\ n \to \infty, p \text{ fixed}, 0 < p < 1, p \neq \frac 1 2.$$