Confidence when inferring p in a binomial distribution

Say you send n=100 cold sales emails and get 3 responses. One might say that the chance of a person responding is p=3%.

But it could also be p=2%, or p=4%; 3% is just the MLE. How do I get a PDF over the possible values of p, or find a range of values for p that I can report to the sales director with 90% confidence?

I've been reading about likelihood functions - they seem like a step in the right direction, but I'm not sure, because their integral is not required to be 1. If you normalize a likelihood function so that its integral is 1, can you use it as a PDF? (So you can answer questions like "what's the probability that $0.02 \le p \le 0.04$?")
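
As a concrete version of what I mean, here is a numerical sketch (plain Python, grid approximation) that normalizes the likelihood for my n=100, x=3 example and reads off an interval probability:

```python
# Grid-normalize the likelihood L(p) = p^3 (1-p)^97 and treat the
# result as a density over p (a numerical sketch of the idea above).
N = 100_000
grid = [(i + 0.5) / N for i in range(N)]        # midpoints covering (0, 1)
like = [p**3 * (1 - p)**97 for p in grid]
total = sum(like) / N                           # approximates the integral of L
prob = sum(l for p, l in zip(grid, like) if 0.02 <= p <= 0.04) / N / total
print(f"P(0.02 <= p <= 0.04) = {prob:.3f}")
```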

2 Answers

BEST ANSWER

The only way I know of to get an answer $[a,b]$ whose interpretation is that you believe there is a $90\%$ chance that $a \le p \le b$ is to use a Bayesian posterior distribution. Bayesian statistics allows you to talk about the probability of $p$ being in an interval. Frequentist results like confidence intervals do not have this interpretation, though it's a common mistake to believe they do. (Which is not to say that you shouldn't use them... you just shouldn't interpret them as giving the probability that $p$ lies in the interval.)

In order to get a Bayesian posterior distribution for the parameter $p,$ you need to compute $$ f(p\mid X_1=x_1,\ldots, X_{100}=x_{100}) = \frac{f(x_1,\ldots, x_{100}\mid p) f(p)}{\int_0^1 f(x_1,\ldots, x_{100}\mid p) f(p)\, dp}$$ which is just Bayes' rule. Here $f$ denotes all the relevant density functions. $f(p)$ is called the prior distribution for $p.$ It represents your initial belief about the value of $p$ before seeing any data. $f(x_1,\ldots, x_n\mid p)$ is the probability density of the observed data when the parameter is $p$. $f(p\mid X_1=x_1,\ldots, X_{100}=x_{100})$ is the posterior distribution, which is the distribution you should believe after you've seen the data.

So in order to compute the posterior distribution you need to have an assumed prior $f(p).$ There is much to say here about how one should go about choosing and interpreting this distribution, but for the sake of time, let's assume your prior is a uniform distribution $f(p) = 1,$ which would mean you initially believe it is equally likely for $p$ to assume any value in $[0,1].$ (I will say one thing: there's a good argument that this isn't actually the right prior to choose if you're "in a state of total ignorance," but let's ignore that for simplicity now.) Then we can compute $$ f(x_1,\ldots, x_n\mid p) = {100 \choose 3} p^3(1-p)^{97}$$ and then, setting $f(p)=1,$ $$\frac{f(x_1,\ldots, x_{100}\mid p) f(p)}{\int_0^1 f(x_1,\ldots, x_{100}\mid p) f(p)\, dp} = \frac{{100 \choose 3} p^3(1-p)^{97}}{\int_0^1 {100 \choose 3} p^3(1-p)^{97}\, dp} = \frac{ p^3(1-p)^{97}}{\int_0^1 p^3(1-p)^{97}\, dp}.$$

So now, you just need to find a nice interval $[a,b]$ for which $$ \frac{ \int_a^b p^3(1-p)^{97}\, dp}{\int_0^1 p^3(1-p)^{97}\, dp} = 0.9.$$ I won't go into exactly how to calculate this or what the best choice would be between the many possible intervals with $90\%$ probability. One simple choice that gives me a quick answer here is to make $0$ your lower endpoint (this wouldn't be a wise choice if the MLE wasn't so close to zero). Then, assuming I've calculated right, I get a probability of $90\%$ for $p$ to be in the interval $[0,0.0649].$
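
Numerically, the posterior above is a $\operatorname{Beta}(4, 98)$ density, so the upper endpoint can be read off its quantile function. A minimal sketch, with scipy as an assumed dependency:

```python
from scipy.stats import beta

n, x = 100, 3
# Uniform prior Beta(1, 1) + binomial likelihood => posterior Beta(x+1, n-x+1)
posterior = beta(x + 1, n - x + 1)          # Beta(4, 98)

# One-sided 90% credible interval [0, b]: choose b with P(p <= b) = 0.9.
b = posterior.ppf(0.90)
print(f"90% credible interval: [0, {b:.4f}]")
```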

One way is the Wald interval: if there are $n$ trials and $x$ responses, the statistic $\hat p = x/n$ is approximately normal with mean $\mu = \hat p$ and variance $\sigma^2 = \hat p(1-\hat p)/n$, since $x \sim \operatorname{Binomial}(n,p)$ with mean $np$ and variance $np(1-p)$. Then a $100(1-\alpha)\%$ confidence interval for $p$ is $$\hat p \pm z_{\alpha/2} \sqrt{\hat p (1 - \hat p)/n},$$ where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution; i.e., $$\Pr[Z > z_{\alpha/2}] = \alpha/2.$$
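
As a sketch, the Wald interval for this example needs only the standard library (`statistics.NormalDist` supplies the normal quantile $z_{\alpha/2}$):

```python
import math
from statistics import NormalDist

n, x, alpha = 100, 3, 0.10
p_hat = x / n
z = NormalDist().inv_cdf(1 - alpha / 2)     # ~1.645 for a 90% interval
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - margin, p_hat + margin
print(f"90% Wald interval: [{lo:.4f}, {hi:.4f}]")
```

Note how close the lower endpoint is to zero here, which hints at the boundary trouble discussed next.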

The Wald interval has poor coverage probability when $p$ is either close to $0$ or $1$, or if $n$ is too small. Alternatively, we can use the Clopper-Pearson interval, which is an exact interval that arises from computing the quantiles of the binomial distribution itself. This ensures that the coverage probability is at least $100(1-\alpha)\%$, but in some cases this leads to an overly conservative interval due to the fact that the binomial distribution is discrete.
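
The Clopper-Pearson endpoints can be written in terms of beta quantiles, using the standard duality between binomial tails and the beta distribution; a sketch with scipy as an assumed dependency:

```python
from scipy.stats import beta

n, x, alpha = 100, 3, 0.10
# Exact (Clopper-Pearson) endpoints via beta quantiles; the edge cases
# x = 0 and x = n pin the corresponding endpoint to 0 or 1.
lo = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
hi = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
print(f"90% Clopper-Pearson interval: [{lo:.4f}, {hi:.4f}]")
```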

A third option is the Wilson score interval, whose derivation I will not provide here but which is given by $$\frac{\hat p + \frac{z_{\alpha/2}^2}{2n}}{1 + \frac{z_{\alpha/2}^2}{n}} \pm \frac{z_{\alpha/2}}{1+\frac{z_{\alpha/2}^2}{n}} \sqrt{\frac{\hat p(1-\hat p)}{n} + \frac{z_{\alpha/2}^2}{4n^2}}.$$
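
A direct transcription of that formula, again standard library only:

```python
import math
from statistics import NormalDist

n, x, alpha = 100, 3, 0.10
p_hat = x / n
z = NormalDist().inv_cdf(1 - alpha / 2)
z2 = z * z
# Wilson score interval: the center is pulled toward 1/2 and the
# half-width is adjusted, exactly per the formula above.
center = (p_hat + z2 / (2 * n)) / (1 + z2 / n)
half = (z / (1 + z2 / n)) * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
lo, hi = center - half, center + half
print(f"90% Wilson interval: [{lo:.4f}, {hi:.4f}]")
```

Unlike the Wald interval for the same data, both endpoints stay comfortably inside $(0,1)$.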

The Wilson score interval has good coverage properties, although it is not exact. There exist other confidence intervals for a binomial proportion. See the Wikipedia article for more information.