Why Sampling distribution not skewed when np < 10 and nq < 10?

514 Views Asked by At

My aim is to study "Sample distributions" via simulations convincing myself of the outcomes.

After struggling with questions here, here and here, finally I hope am somewhat getting my head around it.

I tried to simulate a population of bernoulli random variable Y, with a given probability for Y=1. I tried to sample from it, each time sample size being 'n', for 'N' experiments. The sampling distribution turned out to be normal as expected.

I tried to then show, the problem would not be a good fit for normal approximation, when np < 10 or nq < 10 because the distribution would be skewed. But when I simulate in python, I do not observe much skewness. The "extreme" curves were as good as some of intermediate curves, so I do not get it.

I tried increasing sample size, no of experiments, but no improvement. I tried using bar instead of hist, but in vain. I tried varying width of them but no use (in fact changing width affects size for another purpose - to maintain total height as 1 due to density=True)

Kindly check and let me know if there is any issue in the math I am doing.

Code:

import matplotlib.pyplot as plt
import matplotlib.animation as animation
from SDSPSM import create_bernoulli_population, sample_for_SDSP
import numpy as np
from math import sqrt, pi

# control inputs
T = 10000
p_list = np.arange(0.1,1,0.1)  # varying probability

N = 2000  # no of experiments
n = 50    # sample size


fig, ax = plt.subplots(1,1,figsize=(12,4))
plt.close()


def animate(i): 

    ax.clear()

    p = round(p_list[i],4)

    # create  population to be sampled from - Note pop has to be re created every time due to changing p
    pops = create_bernoulli_population(T, p)    

    _, Y_mean_list = sample_for_SDSP(pops, N,n)

    # plot discrete density
    _, bins,_ = ax.hist(Y_mean_list, density=True)

    # normal approx
    mu, var, sigma = get_metrics(Y_mean_list)

    X = np.linspace(min(bins),max(bins),10*len(bins))
    if sigma>0:
        Cp = 1/(sigma*sqrt(2*pi))
        Ep = -1/2*((X-mu)/sigma)**2
        G = Cp*np.exp(Ep)    
        ax.plot(X, G, color='red')         
    metrics_text = '$\mu_x:{}$ \n$\sigma_x:{}$'.format(mu, sigma)
    ax.text(0.97, 0.98,metrics_text,ha='right', va='top',transform = ax.transAxes,fontsize=10,color='red')    


    ax.set_ylim([0,15])
    ax.set_xlim([0,1])

    np_factor = round(n*p,2)
    nq_factor = round(n*(1-p),2)
    metrics_text_2 = '$p:{}$ \n$np:{}$ \n$nq:{}$'.format(p, np_factor, nq_factor)
    ax.text(0.01, 0.98,metrics_text_2,ha='left', va='top',transform = ax.transAxes,fontsize=10,color='red')    

plt.show()

ani1 = animation.FuncAnimation(fig, animate, frames = range(0,len(p_list)), interval=1000)

from IPython.display import HTML
def display_animation(anim):
    plt.close(anim._fig)
    return HTML(anim.to_html5_video())

display_animation(ani1)

Helper file: SDSPSM

Output:
enter image description here