Find the height of a normal curve so it fits a sampling distribution?


I am trying to overlay a normal curve on a plot of the results of 1000 simulations of a difference in proportions. The simulation randomly assigns successes and failures to two groups, so that I can see how the difference in proportions is distributed under random assignment alone. I then want to add a normal curve to the plot to show that this distribution is approximately normal. The mean and standard deviation of the curve can be calculated easily from common formulas, but I don't know how to determine its height (scaling factor). Below is the code I wrote in R:

#Assign variables to the data
s1 = 210   #Number of successes in group 1
f1 = 1152  #Number of failures in group 1
s2 = 230   #Number of successes in group 2
f2 = 1106  #Number of failures in group 2

n = s1 + f1 + s2 + f2 #Calculate sample size
n1 = s1 + f1          #Calculate size of group1
n2 = s2 + f2          #Calculate size of group2
dp = s1/n1 - s2/n2    #Calculate difference in proportions

#Put results on cards
cards = c(rep("success",s1+s2),rep("failure",f1+f2))

simulations = 1000  #Number of simulations to run
results = numeric(simulations)  #Numeric vector to store results of simulations (difference in proportions)

for(i in 1:simulations){  #Do the following for each simulation:

  #Shuffle cards
  cards = sample(cards,n,replace = FALSE)

  #Divide cards into two groups
  group1 = cards[1:n1]
  group2 = cards[(n1+1):n]

  #Determine proportion of success in group1
  p1 = sum(group1 == "success")/n1

  #Determine proportion of success in group2
  p2 = sum(group2 == "success")/n2

  results[i] = p1 - p2

} #End of for loop

#Summary of results:
plot(table(round(results,log(n,10)+1)))
table(round(results,log(n,10)+1))
mean(results)
sd(results)

#Add normal curve to plot
curve(dnorm(x, 0, sd(results)), add = TRUE, col = "blue")

The plot produced is shown below:

[Figure: Plot of simulation results and normal curve]

It seems to me that the height of the curve does not match the plot. I'm wondering if I'm missing a scaling factor, and how that can be determined.
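
One possible way to think about the scaling factor, offered as a sketch rather than a definitive answer: the bars in the plot are counts, while dnorm() returns a density whose area is 1. A count at a given value is roughly (number of simulations) x (spacing between adjacent possible values) x (density at that value). Because the total number of successes is fixed, moving one success from group 2 to group 1 changes p1 - p2 by 1/n1 + 1/n2, so that is the spacing. The names step and scale_factor, the seed, and the choice of 4 rounding digits are assumptions introduced here for illustration; they are not in the code above.

```r
# Sketch only: data values are copied from the question; `step`,
# `scale_factor`, the seed, and the rounding digits are illustrative choices.
set.seed(1)  # assumed seed, only so the run is reproducible

s1 <- 210; f1 <- 1152   # successes / failures in group 1
s2 <- 230; f2 <- 1106   # successes / failures in group 2
n1 <- s1 + f1
n2 <- s2 + f2
n  <- n1 + n2
simulations <- 1000

cards <- c(rep("success", s1 + s2), rep("failure", f1 + f2))

# One simulated difference in proportions per shuffle of the cards
results <- replicate(simulations, {
  shuffled <- sample(cards)
  p1 <- sum(shuffled[1:n1] == "success") / n1
  p2 <- sum(shuffled[(n1 + 1):n] == "success") / n2
  p1 - p2
})

# Spacing between adjacent possible values of p1 - p2
step <- 1 / n1 + 1 / n2

# Converts density (area 1) to expected counts per distinct value
scale_factor <- simulations * step

# Rounding to 4 decimals is finer than `step`, so distinct values stay separate
plot(table(round(results, 4)))
curve(scale_factor * dnorm(x, 0, sd(results)), add = TRUE, col = "blue")
```

If the results were instead binned with hist(), the same idea would apply with the bin width in place of step.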