A closed-form formula involving Legendre polynomials for the "information potential" of the binomial distribution

115 Views Asked by At

This is a follow-up of this recent question to which I have given an answer.

It is mainly the "Edit" part of this answer that has an interest here with the following formula I have found (proof below):

$$\sum_{k=0}^n \left(\binom{k}{n}p^k(1-p)^{n-k}\right)^2=(1-2p)^n P_n\left(1+\tfrac{2p^2}{1-2p}\right) \ \tag{1}$$

where $P_n$ is the $n$-th Legendre polynomial.

My question : is (1) a known formula ?

Some similar results can be found in the recent article here page 8, in which you will see that the quantity on the Left Hand Side of (1) is called the "information potential" in the framework of information theory in the case of the binomial distribution $Bin(n,p)$ (we assume $p<1/2$).

Remark: The interest of (1) is that it allows to use the numerous properties of polynomials $P_n$.

Proof of (1):

We start from the following formula for Legendre polynomials (see formula (33) here):

$$P_n(x)=\dfrac{1}{2^n}\sum_{k=0}^n \binom{n}{k}^2(x-1)^{n-k}(x+1)^k\tag{2}$$

Now let us look for $a$ such that

$$\begin{cases}x-1&=&ap^2\\x+1&=&a(1-p)^2\end{cases}\tag{3}$$

Elimination of $x$ gives

$$a=\dfrac{2}{1-2p}.\tag{4}$$

Plugging (3) and (4) in (2), we get:

$$P_n(1+ap^2)=\dfrac{1}{2^n}a^n \sum_{k=0}^n \binom{n}{k}^2 (p^{2k}(1-p)^{2(n-k)})$$

which can be further simplified into:

$$P_n(1+ap^2)=\dfrac{1}{(1-2p)^n}\sum_{k=0}^n \left(\binom{n}{k} p^{k}(1-p)^{(n-k)}\right)^2$$

which is equivalent to (1).

Remark 1: An example of the interest of (1) is that it gives a second order approximate expression to $P_n(x)$ in the vicinity of $1$ in the following way :

$$P_n(1+\varepsilon) \approx \underbrace{P_n(1)}_1+\varepsilon \underbrace{P'_n(1)}_{n(n+1)/2}+\frac12 \varepsilon^2 \underbrace{P_n''(1)}_{n(n+2)(n^2-1)/8} \ \text{with} \ \varepsilon=\frac{2p^2}{1-2p}\tag{2}$$

This approximation is good for small values of $p$, say $<0.1$.

About expressions for $P'_n(1)$ and $P_n''(1)$ : they can be verified by a recurrence reasoning using the differential equation verified by polynomial $P_n$.

Remark 2: A purely probabilistic interpretation of the LHS of (1) is $P(X=Y)$ where $X,Y$ are independent Binomial $Bin(n,p)$ random variables.

Remark 3: Here is a graphical representation of functions $f_n$ :

$$f_n(x)=\sum_{k=0}^n \left(\binom{k}{n}x^k(1-x)^{n-k}\right)^2$$

for $n=2,3,4,5$ with $f_n > f_{n+1}$.

It is interesting to compute the minimum value of these functions which is

$$f_n(\tfrac12)=\frac{1}{2^{2n}} \binom{2n}{n} \approx \frac{1}{\sqrt{\pi n}}$$

according to a classical approximation of the central binomial coefficient

enter image description here