Implementing Bates distribution


I've been trying to plot the Bates distribution curve. The Bates distribution is the distribution of the mean of n independent standard uniform variates (on [0, 1]).

(I actually work on the interval [-1, 1], via a simple change of variable.)

The curve becomes unstable beyond a certain n, which prevents me from moving forward. To treat the variable x as (approximately) continuous, I sampled the interval with 10**6 points. Here are some examples for different n: [plot: distributions for several values of n]

But for n greater than 29, the curve diverges, and the larger n is, the closer the deformation caused by the divergence gets to the center (mean) of the curve: [plot: divergence of the distribution]

The Bates probability density is defined as:

$$f(x; n) = \frac{n^n}{(n-1)!} \sum_{k=0}^{\lfloor nx \rfloor} (-1)^k \binom{n}{k} \left(x - \frac{k}{n}\right)^{n-1}, \qquad x \in [0, 1].$$

My code:

import numpy as np
from math import factorial as fac

samples = 10**6

def combinaison(n, k):      # binomial coefficient: k out of n
    return fac(n) // (fac(k) * fac(n - k))

def dens_probas(a, b, n):
    x = np.linspace(a, b, num=samples)
    y = (x - a) / (b - a)   # map [a, b] back to [0, 1]
    F = []
    for i in range(len(y)):
        g = 0
        for k in range(int(n * y[i]) + 1):
            g += (-1)**k * combinaison(n, k) * (y[i] - k / n)**(n - 1)
        F.append((n**n / fac(n - 1)) * g)
    return F
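For what it's worth, the blow-up for n beyond ~29 looks like catastrophic cancellation: the alternating sum adds and subtracts huge terms (binomial coefficients times large powers) whose tiny difference is the answer, and double precision runs out of digits. One way to sidestep that, as a sketch, is to evaluate each point with exact rational arithmetic from Python's standard `fractions` module (the function name `bates_pdf_exact` here is my own, not from the original code):

```python
from fractions import Fraction
from math import comb, factorial

def bates_pdf_exact(x, n):
    """Bates PDF at x in [0, 1], evaluated with exact Fractions to avoid
    the catastrophic cancellation that breaks float evaluation for large n."""
    y = Fraction(x).limit_denominator(10**6)  # treat x as an exact rational
    total = Fraction(0)
    for k in range(int(n * y) + 1):
        term = comb(n, k) * (y - Fraction(k, n)) ** (n - 1)
        total += -term if k % 2 else term  # alternating signs, no rounding
    return float(Fraction(n**n, factorial(n - 1)) * total)
```

This is much slower than vectorized floats, so it is better suited to checking values (or to a coarser grid) than to all 10**6 sample points; a log-space formulation with sign tracking would be the faster compromise.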

Any idea how to correct the divergence for larger n?

Thank you


Here is a simulation (in R) of the distribution of the average a of 10 independent standard uniform random variables (based on a million averages of ten simulated standard uniforms).

set.seed(2020)
a = replicate(10^6, mean(runif(10)))
summary(a); sd(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1142  0.4377  0.5002  0.5001  0.5626  0.8917 
[1] 0.09139322

hist(a, prob=T, br=30, ylim=c(0,5), col="skyblue2")
  curve(dnorm(x, mean(a), sd(a)), add=T, lwd=2, col="red")

[plot: histogram of a with the matching normal density overlaid]

The red curve is for the normal density with mean matching the mean $(\approx 0.5)$ of a million averages of ten and SD matching the SD $(\approx 0.091)$ of the million averages.

The first 1000 averages pass a Shapiro-Wilk normality test.

shapiro.test(a[1:1000])

        Shapiro-Wilk normality test

data:  a[1:1000]
W = 0.99761, p-value = 0.1541

However, S-W tests on larger numbers of values from a may reject normality, because the match to normal is not quite perfect for averages of ten standard uniforms. Of course, the support of the distribution of $\bar X_{10}$'s will be $(0,1)$ [about $\pm 5\sigma],$ not $(-\infty,\infty).$
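That SD can also be computed directly: each Unif(0,1) variate has variance 1/12, so the mean of n of them has variance 1/(12n). A quick check for n = 10:

```python
import math

n = 10
# Var(U) = 1/12 for U ~ Unif(0,1), so Var(mean of n) = 1/(12n).
theoretical_sd = math.sqrt(1 / (12 * n))
print(theoretical_sd)  # about 0.0913, close to the simulated 0.0914
```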

Note: Ever since digital computers became available, there has been a demand for variates that are "indistinguishable in practice" from normal samples. In the early days of computation (1950s and 60s), when computing beyond basic arithmetic was expensive, it was common to use $Z = \sum_{i=1}^{12} U_i - 6,$ where $U_i \stackrel{iid}{\sim} \mathsf{Unif}(0,1),$ to simulate $Z \stackrel {aprx}{\sim} \mathsf{Norm}(0,1).$ [Support $(-6, 6).]$ Next was the Box-Muller transformation (two uniforms to get two normals via log and trig functions, see Wikipedia). [Because of overflow/underflow with transcendental functions, essentially limited to $(-7,7).]$ Nowadays, many statistical software programs (including R) use Michael Wichura's rational approximation to the normal quantile function, which is accurate within the limits of double-precision computation.
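That early-computing trick is a one-liner to verify; here is a sketch in Python (standard library only) checking that the sum of 12 uniforms minus 6 has roughly mean 0 and SD 1:

```python
import random
import statistics

random.seed(2020)
# Classic cheap normal generator: Z = sum of 12 Unif(0,1) variates, minus 6.
# Mean is 12*(1/2) - 6 = 0; variance is 12*(1/12) = 1; support is (-6, 6).
z = [sum(random.random() for _ in range(12)) - 6 for _ in range(10**5)]
print(statistics.mean(z), statistics.stdev(z))  # both close to (0, 1)
```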