Finding P-Value and confidence interval


I am given that 48.5% of births are girls. A study found that $157$ of $329$ children were girls, and I need to use the binomial distribution to find a P-value for the test. I found that the probability of $157$ or fewer girls out of $329$ is $.4101$ (41.01%) and the probability of $157$ or more is $.6321$ (63.21%). I just wanted to verify that this is correct. I also need to find a 95% confidence interval for the probability that a child is a girl, but I am unsure how to find the standard deviation for this problem. Thank you in advance!


There are two approaches to this problem:

The exact binomial test and associated confidence interval. In R, this procedure is implemented as binom.test. The 2-sided version of this test, with $H_0: p = 0.485$ against $H_a: p\ne 0.485,$ is shown below. [The 2-sided test is the default; the extra argument alternative = "less" or alternative = "greater" requests a 1-sided test in either direction.]

binom.test(157, 329, p = 0.485)

        Exact binomial test

data:  157 and 329
number of successes = 157, number of trials = 329, 
 p-value = 0.783
alternative hypothesis: 
 true probability of success is not equal to 0.485
95 percent confidence interval:
 0.4221301 0.5326912
sample estimates:
probability of success 
             0.4772036

Because the P-value $0.783 > 0.05 = 5\%,$ we do not reject $H_0.$ In other words, we do not find the estimated probability $\hat p = 157/329 = 0.477$ of girl births to be statistically significantly different from the hypothetical value $p = 0.485$ at the 5% level.

Under the assumption that $p = 0.485,\ n = 329,$ we find $P(X \le 157) = 0.41.$ That is, the probability of observing a value $X \le 157,$ which lies at least $2.565$ below $\mu = np = 159.565,$ is about $0.41.$ Some programs just double this one-sided P-value to get the P-value of a 2-sided test, which would give about $0.82.$ However, the probability of a value at least as far above $\mu$ is a little smaller, so the two-sided P-value shown in the R output above is $0.78.$

pbinom(157, 329, 0.485)
[1] 0.4101271
mu = 329*0.485; mu
[1] 159.565
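As a cross-check of the two one-sided tail probabilities in the question, here is a minimal Python sketch (standard library only). It sums binomial probabilities directly. The helper name `binom_pmf` is a hypothetical name chosen for illustration and does not come from the original answer. The lower tail should reproduce R's `pbinom(157, 329, 0.485)`.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binom(n, p), using math.comb for the coefficient."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, x, p0 = 329, 157, 0.485

# Lower tail P(X <= 157), the question's first number
p_le = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))
# Upper tail P(X >= 157), the question's second number.
# Both tails include P(X = 157), which is why the two add to more than 1.
p_ge = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
mu = n * p0  # mean np

print(f"P(X <= {x}) = {p_le:.4f}")
print(f"P(X >= {x}) = {p_ge:.4f}")
print(f"mu = {mu:.3f}")
```

Because the point $X = 157$ is counted in both tails, the two one-sided probabilities overlap by exactly $P(X = 157)$.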

In the plot below, the bars show the PDF (or PMF) of $\mathsf{Binom}(n = 329, p = 0.485),$ the vertical blue line is at $\mu = np,$ the lower vertical red line is at the observed value of $X,$ and the P-value is the sum of the heights of the bars in the two tails outside the two vertical red lines. [The PMF is almost, but not exactly, symmetrical about $\mu.$]

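[Figure: bars of the PMF of Binom(n = 329, p = 0.485), with a blue vertical line at mu = np and red vertical lines bounding the two P-value tails.]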
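The two-tail sum described for the plot can be sketched in Python as follows. This sketch assumes the rule that R's `binom.test` follows, as I understand its source. Under that rule, a value $k$ counts as "at least as extreme" as the observed $x$ when $P(X = k) \le P(X = x)$, with a tiny relative tolerance for rounding ties. The helper name `binom_pmf` is hypothetical.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binom(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, x, p0 = 329, 157, 0.485
pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
d_obs = pmf[x]

# k is "at least as extreme" as x if its probability is no larger than P(X = x);
# the factor 1 + 1e-7 guards against floating-point ties
extreme = [k for k in range(n + 1) if pmf[k] <= d_obs * (1 + 1e-7)]
p_two = sum(pmf[k] for k in extreme)

# Smallest value in the upper tail (the part of the tail above the mean np)
upper_start = min(k for k in extreme if k > n * p0)

print(f"upper tail starts at X = {upper_start}")
print(f"two-sided P-value = {p_two:.3f}")
```

Because the PMF is slightly skewed, the upper tail does not start at the exact mirror image of 157. It starts at the first value whose probability drops below $P(X = 157)$.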

The confidence interval $( 0.422, 0.533)$ is made using exact binomial CDFs, not using normal approximations. It is essentially the Clopper-Pearson "exact" 95% CI for $p.$
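The Clopper-Pearson interval can be found by inverting the two binomial tails. The lower limit is the $p$ at which $P(X \ge 157) = 0.025,$ and the upper limit is the $p$ at which $P(X \le 157) = 0.025.$ The Python sketch below makes some assumptions of its own. It solves for each limit by plain bisection on the exact binomial CDF instead of using beta quantiles, and the function names are hypothetical.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binom(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_increasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 60) -> float:
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    alpha = 1 - conf
    lower, upper = 0.0, 1.0
    if x > 0:
        # P(X >= x) increases with p; find where it equals alpha/2
        lower = bisect_increasing(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    if x < n:
        # P(X <= x) decreases with p; find where it falls to alpha/2
        upper = bisect_increasing(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

lo, hi = clopper_pearson(157, 329)
print(f"95% Clopper-Pearson CI: ({lo:.4f}, {hi:.4f})")
```

Bisection is slow compared with a beta-quantile formula, but it keeps the link to the binomial tails explicit.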

Approximate Normal test and 95% CIs. By contrast, a traditional approach to testing $H_0: p = 0.485$ against $H_a: p \ne 0.485$ uses a normal approximation to $X \sim \mathsf{Binom}(n,p).$ Then $Z = \frac{X - np}{\sqrt{npq}} \stackrel{aprx}{\sim} \mathsf{Norm}(0,1),$ where $q = 1 - p.$ Here $np = 159.565$ and the standard deviation is $\sqrt{npq} = 9.065,$ so $Z = (157 - 159.565)/9.065 = -0.2830.$ We reject $H_0$ for a 2-sided test at level 5% if $|Z| \ge 1.96,$ so again we do not reject. The P-value of the approximate normal test is $$P(Z \le -0.2830)+P(Z\ge 0.2830) = 0.777.$$

mu = 329*.485;  mu
[1] 159.565
sg = sqrt(mu*(1-.485)); sg
[1] 9.065097
z = (157 - mu)/sg;  z
[1] -0.2829534
round(2 * pnorm(-abs(z)), 4)
[1] 0.7772
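The normal-approximation test uses the values from the question, $x = 157$ and $n = 329.$ The standard deviation the question asks about is $\sqrt{npq}$ for the count $X.$ Below is a minimal Python sketch of the test. It assumes Python 3.8+ for `statistics.NormalDist` and, like the test described here, applies no continuity correction.

```python
from statistics import NormalDist

n, x, p0 = 329, 157, 0.485
mu = n * p0                        # mean np
sg = (n * p0 * (1 - p0)) ** 0.5    # standard deviation sqrt(npq)
z = (x - mu) / sg                  # approximately Norm(0, 1) under H0
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided P-value

print(f"mu = {mu:.3f}, sd = {sg:.4f}, z = {z:.4f}, P-value = {p_value:.3f}")
```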

A 95% Wald CI is of the form $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}.$ For our example $(0.423,0.531).$

p.hat = 157/329
CI.wald = p.hat + qnorm(c(.025,.975))*sqrt(p.hat*(1-p.hat)/329)
CI.wald
[1] 0.4232317 0.5311756
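For the confidence interval, the relevant standard deviation is the estimated standard error of $\hat p,$ namely $\sqrt{\hat p(1-\hat p)/n},$ and not $\sqrt{npq}$ for the count. A Python equivalent of the R lines above is sketched below. It uses `NormalDist().inv_cdf` (Python 3.8+) in place of R's `qnorm`.

```python
from statistics import NormalDist

x, n = 157, 329
p_hat = x / n
se = (p_hat * (1 - p_hat) / n) ** 0.5   # estimated standard error of p_hat
z975 = NormalDist().inv_cdf(0.975)        # about 1.96, like qnorm(.975) in R
ci_wald = (p_hat - z975 * se, p_hat + z975 * se)

print(f"95% Wald CI: ({ci_wald[0]:.4f}, {ci_wald[1]:.4f})")
```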

A somewhat more accurate 95% CI for $n$ in the hundreds is due to Agresti and Coull: $(0.424, 0.531).$

p.est = (157+2)/(329+4)
CI.ac = p.est + qnorm(c(.025,.975))*sqrt(p.est*(1-p.est)/333)
CI.ac
[1] 0.4238293 0.5311256
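In general, the Agresti-Coull interval adds $z^2/2$ successes and $z^2$ trials before applying the Wald formula. With $z \approx 1.96,$ that is roughly 2 successes and 4 trials, which is the "plus four" simplification used in the R lines above. The Python sketch below follows that simplified form.

```python
from statistics import NormalDist

x, n = 157, 329
# "Plus four" form: add 2 successes and 2 failures, then use the Wald formula
n_tilde = n + 4
p_tilde = (x + 2) / n_tilde
se = (p_tilde * (1 - p_tilde) / n_tilde) ** 0.5
z975 = NormalDist().inv_cdf(0.975)
ci_ac = (p_tilde - z975 * se, p_tilde + z975 * se)

print(f"95% Agresti-Coull CI: ({ci_ac[0]:.4f}, {ci_ac[1]:.4f})")
```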

In this particular problem, there is little difference between the exact binomial test and its normal approximation. Also, for practical purposes, the confidence intervals are essentially the same.