Approximation theorems in statistics

Poisson and binomial approximations can be used to estimate the distribution of the sum of a sequence of independent $0$-$1$ indicators. If the number of such indicators is fixed (say $n$), how large should $n$ be?

In general, can these approximations be used for small or medium-sized $n$?

Thank you in advance for your help.

Neither the binomial nor the Poisson distribution will be a good approximation if it fails to match the mean and variance of the sum, which are $\sum p_i$ and $\sum p_i(1-p_i)$ respectively. The binomial can be fixed by changing the $n$ parameter, but that could be seen as an unnatural step.

Suppose for example you have the $99$ $p_i$s $\{0.01,0.02,0.03,\ldots,0.99\}$. Then the resulting distribution of the sum of your indicators has mean $49.5$ and variance $16.665$.
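As a quick check of these two numbers, here is a minimal sketch that evaluates $\sum p_i$ and $\sum p_i(1-p_i)$ directly. The variable names are only illustrative.

```python
# The 99 success probabilities 0.01, 0.02, ..., 0.99 from the example.
probs = [i / 100 for i in range(1, 100)]

# Mean and variance of a sum of independent 0-1 indicators.
mean = sum(probs)                            # sum of p_i
variance = sum(p * (1 - p) for p in probs)   # sum of p_i * (1 - p_i)

print(round(mean, 3), round(variance, 3))
```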

  • A Poisson distribution with parameter $\lambda=49.5$ would have the same mean but variance $49.5$, which would be substantially overdispersed.

  • A binomial distribution with parameters $n=99$ and $p=0.5$ would have the same mean but variance $24.75$, which would still be overdispersed.

  • A binomial distribution with parameters $n=75$ and $p=0.66$ comes much closer: it too has the same mean, and it has variance $16.83$, which is quite close to what we are aiming for. But it is difficult to motivate using this value of $n$ except on empirical grounds.

  • A Gaussian distribution can have whatever mean and variance we wish, so $\mu=49.5$ and $\sigma^2=16.665$ seem like natural choices. If we want to keep the integer values of the sum of the indicators, then we could in effect round the values and use a continuity correction (though this will make a small difference to the variance)
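One possible way to arrive at a binomial with an adjusted $n$ is to match the first two moments and then round $n$ to an integer. The sketch below does that; the helper name `matched_binomial` is hypothetical, and this is not necessarily how the values $n=75$ and $p=0.66$ were chosen above, though it happens to reproduce them.

```python
def matched_binomial(mean, variance):
    """Pick integer n and probability p so that n*p equals the target mean
    and n*p*(1-p) is close to the target variance (an illustrative recipe)."""
    p_raw = 1 - variance / mean      # from n*p*(1-p) / (n*p) = 1 - p
    n = round(mean / p_raw)          # n must be an integer, so round it
    p = mean / n                     # re-solve p so the mean matches exactly
    return n, p

n, p = matched_binomial(49.5, 16.665)
binomial_variance = n * p * (1 - p)
print(n, round(p, 4), round(binomial_variance, 4))
```

Because $n$ has to be rounded, the variance is only matched approximately, whereas the Gaussian can match both moments exactly.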

Let's compare the probabilities for values near $49.5$ from these different approximations. It should be fairly clear that the Gaussian approximation is really very good:

 x      Poisson     Bin n=99    Bin n=75    Gaussian     actual 
35      0.007281    0.001123    0.000258    0.000183    0.000170
36      0.009910    0.001996    0.000557    0.000423    0.000401
37      0.013125    0.003399    0.001139    0.000918    0.000884
38      0.016924    0.005546    0.002212    0.001880    0.001831
39      0.021263    0.008675    0.004073    0.003627    0.003564
40      0.026048    0.013013    0.007116    0.006589    0.006517
41      0.031130    0.018726    0.011792    0.011277    0.011207
42      0.036318    0.025859    0.018531    0.018181    0.018128
43      0.041386    0.034278    0.027606    0.027615    0.027589
44      0.046089    0.043627    0.038974    0.039512    0.039522
45      0.050186    0.053322    0.052118    0.053257    0.053304
46      0.053459    0.062595    0.065980    0.067624    0.067698
47      0.055734    0.070586    0.079027    0.080889    0.080979
48      0.056895    0.076468    0.089487    0.091149    0.091245
49      0.056895    0.079589    0.095718    0.096757    0.096854
50      0.055757    0.079589    0.096619    0.096757    0.096854
51      0.053570    0.076468    0.091938    0.091149    0.091245
52      0.050480    0.070586    0.082370    0.080889    0.080979
53      0.046670    0.062595    0.069388    0.067624    0.067698
54      0.042349    0.053322    0.054876    0.053257    0.053304
55      0.037729    0.043627    0.040673    0.039512    0.039522
56      0.033013    0.034278    0.028197    0.027615    0.027589
57      0.028379    0.025859    0.018245    0.018181    0.018128
58      0.023976    0.018726    0.010992    0.011277    0.011207
59      0.019912    0.013013    0.006148    0.006589    0.006517
60      0.016261    0.008675    0.003182    0.003627    0.003564
61      0.013063    0.005546    0.001519    0.001880    0.001831
62      0.010324    0.003399    0.000666    0.000918    0.000884
63      0.008029    0.001996    0.000267    0.000423    0.000401
64      0.006148    0.001123    0.000097    0.000183    0.000170
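For anyone who wants to reproduce the "actual" and "Gaussian" columns, here is a rough sketch using only the Python standard library. It assumes the exact distribution is built by convolving in one indicator at a time, and that the Gaussian column is the probability the normal assigns to the interval $[x-0.5,\,x+0.5]$. The function names are illustrative.

```python
import math

def exact_pmf(probs):
    """Exact distribution of a sum of independent 0-1 indicators,
    built up by adding one indicator at a time."""
    pmf = [1.0]                              # with no indicators the sum is 0
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)         # this indicator is 0
            nxt[k + 1] += mass * p           # this indicator is 1
        pmf = nxt
    return pmf

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def gaussian_pmf(k, mu, var):
    """Continuity-corrected Gaussian probability for the integer k."""
    sigma = math.sqrt(var)
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)

probs = [i / 100 for i in range(1, 100)]
mu = sum(probs)
var = sum(p * (1 - p) for p in probs)
exact = exact_pmf(probs)

for k in (44, 49, 50, 55):
    print(k, round(gaussian_pmf(k, mu, var), 6), round(exact[k], 6))
```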

You also asked whether this works for small $n$. Suppose there are just $4$ indicators with $p_i$s $\{0.2,0.4,0.6,0.8\}$, so the distribution of the sum of the indicators has mean $2$ and variance $0.8$. We might consider approximating distributions with the same mean: a Poisson distribution with $\lambda=2$ (variance $2$); a binomial distribution with $n=4$ and $p=0.5$ (variance $1$); a binomial distribution with $n=3$ and $p=\frac23$ (variance $\frac23$); and a Gaussian with $\mu=2$ and $\sigma^2=0.8$, with a continuity correction applied. We would then get suggested approximate probabilities which look like

x      Poisson   Bin n=4  Bin n=3 Gaussian  actual
0       0.1353   0.0625   0.0370   0.0442   0.0384
1       0.2707   0.2500   0.2222   0.2413   0.2464
2       0.2707   0.3750   0.4444   0.4238   0.4304
3       0.1804   0.2500   0.2963   0.2413   0.2464
4       0.0902   0.0625   0.0000   0.0442   0.0384 

and I think the Gaussian approximation is still pretty good here. It might be possible to come up with cases which do not fit so well, but if you must have an approximation, I believe a Gaussian approximation is the way to go.
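With only $4$ indicators the exact distribution can be checked by brute force over all $2^4$ outcomes. The sketch below does that and prints the Poisson and the two binomial candidates alongside it; the helper names are illustrative, and `math.comb` assumes Python 3.8 or later.

```python
import math
from itertools import product

probs = [0.2, 0.4, 0.6, 0.8]

# Exact distribution: enumerate every 0/1 outcome and add up its probability.
exact = [0.0] * (len(probs) + 1)
for outcome in product([0, 1], repeat=len(probs)):
    weight = 1.0
    for x, p in zip(outcome, probs):
        weight *= p if x else 1 - p
    exact[sum(outcome)] += weight

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k, n, p):
    if k < 0 or k > n:
        return 0.0                  # outside the support of the binomial
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Columns: x, Poisson(2), Bin(4, 0.5), Bin(3, 2/3), exact
for k, actual in enumerate(exact):
    print(k,
          round(poisson_pmf(k, 2), 4),
          round(binom_pmf(k, 4, 0.5), 4),
          round(binom_pmf(k, 3, 2 / 3), 4),
          round(actual, 4))
```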