Where is my error in finding the edgeworth expansion of the binomial distribution?

290 Views Asked by At

Let $B_n$ be a standardized binomial distribution. To illustrate the Edgeworth Expansion I made a plot showing $f(x)=P(B_n \le x)-\Phi(x)$ and $g(x)=\frac{p-q}6 (x^2-1) \phi(x) \frac{1}{\sqrt{npq}}$ for $n=1000$ and $p=\tfrac 14$ with only those values of $x$ where $P(B_n=x)$ is nonzero:

enter image description here

Because of the Edgeworth expansion $$P(B_n \le x) = \Phi(x) + \frac{p-q}6 (x^2-1) \phi(x) \frac{1}{\sqrt{npq}} + O\left(\tfrac 1n\right)$$ I assumed that $f(x)$ and $g(x)$ would be nearly the same which is not the case. Thus I guess I have made a failure but I cannot find it. It would be great if you can point me to the error I have made!

Derivation of Edgeworth Expansion

In [1, p. 46] Hall cites a formula from [2]:

$$P(\tilde B_n\le x) = \Phi(x) + \frac 1{6\sqrt n} k_3 (1-x^2) \phi(x) + \frac{R\left(n(np+x\sqrt{pq})\right)}{n\sqrt{pq}}\phi(x) + o\left(\frac 1{\sqrt n}\right)$$

with $R(x) = \lfloor x\rfloor -x+\frac 12$. For a Bernoulli distributed $X$ we find $E[X^4]=p<\infty$ so that $o\left(\frac 1{\sqrt n}\right)$ can be replaced with $O\left(\frac 1{n}\right)$ [1, p. 46]. $k_3$ is the 3rd cumulant of the standardized Bernoulli distribution [1, pp. 41-42] and thus its skewness [1, p. 45]. We have $k_3 = \frac{q-p}{\sqrt{pq}}$ [3]. Thus we find

$$P(\tilde B_n\le x) = \Phi(x) + \frac {p-q}{6\sqrt{npq}} (x^2-1) \phi(x) + O\left(\frac 1{n}\right)$$

Therefore I would aspect that $P(\tilde B_n\le x) - \Phi(x) \approx \frac {p-q}{6\sqrt{npq}} (x^2-1) \phi(x)$ which is not the case in the above diagram...

References:

  1. Peter Hall "The Bootstrap and Edgeworth Expansion", 1992, Springer series in statistics
  2. Gnedenko and Kolmogorov (1954, Section 43): Limit Distributions for Independent Random Variables. Addison-Wesley

Source code for the plot

I used scipy and matplotlib to create the plot. The source code is:

import matplotlib.pyplot as plt
import numpy as np
import os

from scipy.stats import norm, binom

p = 0.25
q = 1-p
n = 1000

ks = np.arange(0., n+1, 1.)
xs = ( ks - n * p ) / np.sqrt(n*p*q)

sel = (xs > -5) & (xs < 5)

plt.plot(xs[sel], binom.cdf(ks[sel], n, p) - norm.cdf(xs[sel]))
plt.plot(xs[sel], (p-q) / 6 * (xs[sel]**2-1) * norm.pdf(xs[sel]) / np.sqrt(n*p*q))
plt.legend(("f(x)", "g(x)"))
plt.savefig( "plot.png" )

Question: Can you my error in the derivation of the Edgeworth series or in the code of the plot? Thanks in advance for your answers!

1

There are 1 best solutions below

0
On BEST ANSWER

I have found the error ;-) I misread a variable $X$ in [1, p. 46] as being the standardized binomial distribution, but it must be the standardized Bernoulli distribution. Thus we have

$$P(\tilde B_n\le x) = \Phi(x) + \frac 1{6\sqrt n} k_3 (1-x^2) \phi(x) + \frac{R\left(np+x\sqrt{npq}\right)}{\sqrt{npq}}\phi(x) + O\left(\frac 1n\right)$$

Because $\frac{R\left(np+x\sqrt{npq}\right)}{\sqrt{npq}}\phi(x)$ is of the same order as $\frac 1{6\sqrt n} k_3 (1-x^2) \phi(x)$ we cannot omit this term. This shows also the following diagram with the additional $h(x)=\frac{R\left(np+x\sqrt{npq}\right)}{\sqrt{npq}}$:

enter image description here

After adding $h(x)=\frac{R\left(np+x\sqrt{npq}\right)}{\sqrt{npq}}$ to $g(x)$ so that I have $g(x)=\frac 1{6\sqrt n} k_3 (1-x^2) \phi(x) + \frac{R\left(np+x\sqrt{npq}\right)}{\sqrt{npq}}$ I get the expected plot

enter image description here

Remark: $R\left(np+x\sqrt{npq}\right)=\tfrac 12$ for $x_k = \frac{k-np}{\sqrt{npq}}$ for $k\in\mathbb Z$ -> only for $x_k$ are the points evaluated.

Note: In the above diagrams are only those values for $x$ considered where $R(x)=\frac 12$. In reality the functions oscillates much...