Deriving p of a Bernoulli trial from number of successes observed


I have conducted $20$ Bernoulli trials. My observed outcome is $8$ successes.

It was proposed that $p_\text{suc}$ was $0.2.$ I have calculated the standard deviation of the success count as $\sqrt{npq} = \sqrt{20\times0.2\times0.8} = \sqrt{3.2} \approx 1.79.$

The proposed mean was $4,$ so my observed value sits $4/1.79 \approx +2.24$ standard deviations above it, giving a two-sided $p$-value of about $0.025.$ So this is significant by my measure ($<0.05$). However, is it valid to say that there is a $97.5\%$ chance that $p_\text{suc} > 0.2$?

Also, I want to define a range of values for $p_\text{suc}$ given my observed number of successes, which I can be $95\%$ confident in. So my estimate is $8/20 = 0.4 \pm x$.

Is there a way to calculate $x$ directly? Or can I use brute force: simulate $20$ trials, say, $10{,}000+$ times for different $p_\text{suc}$ values, and search for the upper and lower cut-off values of $p_\text{suc}$ at which my observed success count of $8$ appears in fewer than $5\%$ of simulations?
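A minimal sketch of that brute-force idea, assuming we keep each candidate $p$ for which the observed count of $8$ falls in neither $2.5\%$ tail of the simulated distribution (function names are mine; plain Python, no libraries):

```python
import random

def tail_fracs(p, n=20, k_obs=8, sims=2000, seed=1):
    """Simulate `sims` runs of n Bernoulli(p) trials; return the fraction
    of runs with >= k_obs successes and the fraction with <= k_obs."""
    rng = random.Random(seed)
    counts = [sum(rng.random() < p for _ in range(n)) for _ in range(sims)]
    ge = sum(c >= k_obs for c in counts) / sims
    le = sum(c <= k_obs for c in counts) / sims
    return ge, le

# Keep each candidate p for which the observed count 8 is NOT in either
# 2.5% tail; the extremes of the kept set approximate a 95% interval.
kept = []
for i in range(1, 200):          # p = 0.005, 0.010, ..., 0.995
    p = i / 200
    ge, le = tail_fracs(p)
    if ge >= 0.025 and le >= 0.025:
        kept.append(p)

print(min(kept), max(kept))      # endpoints of the simulated interval
```

The grid resolution and simulation count here are arbitrary; finer grids and more simulations tighten the Monte Carlo error at the boundaries.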


On BEST ANSWER

In the frequentist approach to hypothesis testing, $p_{\text{suc}}$ is not a random variable; it is a parameter whose value is fixed but unknown. Thus it makes no sense to say $\Pr[p_{\text{suc}} > 0.2] = 0.974$. All the frequentist can offer are inferences about the true value of that fixed parameter based on the evidence in the random data. So when the frequentist calculates a $p$-value, it is the probability of observing a result at least as extreme as the data obtained, under the supposition that the null hypothesis is true; hence the $p$-value measures how implausible it would be to see such data by random chance alone, not the chance of the parameter lying in a certain interval.
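As an aside, the normal approximation used in the question can be checked against the exact binomial tail; a short sketch in plain Python (the function name is mine):

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Exact one-sided p-value of seeing 8+ successes in 20 trials under p = 0.2
print(binom_upper_tail(8, 20, 0.2))  # ≈ 0.0321
```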

For a binomial proportion, the construction of a confidence interval for the parameter is not unique: there are several methods, each with advantages and disadvantages. The simplest, based on a normal approximation, is the Wald interval; the most conservative is the Clopper-Pearson "exact" interval, which guarantees at least the nominal coverage probability but can be unnecessarily wide under certain conditions. Another option is the Wilson score interval, which tends to strike a balance between the two.