How to calculate the lower and upper bound of the error in estimating a ratio with low sample size?


I have the following problem,

I am trying to estimate the conversion rate of online product sales. My data are simple: for each product I have the number of clicks and the number of sales.

Sales are sparse, so there are few sales per product. How can I estimate upper and lower bounds on the error for each product?

For example, if I have 21 clicks and 1 sale, the conversion rate is 1/21 ≈ 0.048. I want to be able to put a lower and an upper bound on the error of this estimate.

Thank you


If you are trying to get a confidence interval for the 'population' conversion rate based on $1$ sale among $21$ clicks, you can view this as a confidence interval for a binomial proportion. The relevant Wikipedia page, linked in @tomi's comment, discusses several styles of confidence intervals.

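In R, the procedure prop.test, with the continuity correction turned off, gives the Wilson score 95% CI $(0.0085, 0.2267)$ as shown below. This interval inverts the normal-approximation score test; it is not the 'exact' Clopper-Pearson interval based on binomial CDFs, which binom.test provides. [The suffix $conf.int shows just the CI, and round keeps four decimal places.]

round(prop.test(1, 21, correct=FALSE)$conf.int, 4)
[1] 0.0085 0.2267
attr(,"conf.level")
[1] 0.95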
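As a cross-check, the interval that prop.test returns without continuity correction can be reproduced from the closed-form Wilson score formula. The sketch below does this for $x = 1,$ $n = 21$ and also calls binom.test, which returns the Clopper-Pearson interval built from binomial tail probabilities. The helper name `wilson_ci` is my own and is not part of base R.

```r
# Wilson score interval from its closed form (helper name is illustrative).
wilson_ci <- function(x, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- x / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(centre - half, centre + half)
}

n <- 21; x <- 1
wilson_ci(x, n)                                   # closed form
# prop.test warns about the chi-squared approximation for small counts;
# the warning is suppressed here only to keep the output short.
as.numeric(suppressWarnings(prop.test(x, n, correct = FALSE))$conf.int)
as.numeric(binom.test(x, n)$conf.int)             # Clopper-Pearson interval
```

The first two lines should print the same pair of numbers; the third is typically somewhat wider, because the Clopper-Pearson interval is conservative.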

A Bayesian interval estimate, based on the Jeffreys noninformative prior $\mathsf{Beta}(.5,.5),$ is often used as a frequentist CI.

For your data, this 95% CI $(0.0052,0.2018)$ uses quantiles $.025$ and $.975$ of the Bayesian posterior distribution $\mathsf{Beta}(.5+x, .5+n-x) = \mathsf{Beta}(1.5, 20.5).$

qbeta(c(.025, .975), 1.5, 20.5)
[1] 0.005187043 0.201755968
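Because the question asks for bounds on each product, here is a minimal sketch that applies the same Jeffreys interval to a whole table of products at once. The function name `jeffreys_ci` and the `products` data frame are assumptions for illustration; only the first row (21 clicks, 1 sale) comes from the question.

```r
# Jeffreys interval for each product; qbeta is vectorised over x and n.
jeffreys_ci <- function(x, n, conf = 0.95) {
  a <- (1 - conf) / 2
  cbind(lower = qbeta(a,     x + 0.5, n - x + 0.5),
        upper = qbeta(1 - a, x + 0.5, n - x + 0.5))
}

# Hypothetical click and sale counts; only the first row is from the question.
products <- data.frame(clicks = c(21, 40, 15), sales = c(1, 3, 0))

# One row per product: raw rate plus lower and upper 95% limits.
cbind(products,
      rate = products$sales / products$clicks,
      jeffreys_ci(products$sales, products$clicks))
```

The same pattern works for any of the other interval types: write the interval as a function of vectors `x` and `n` and bind the result to the data.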

The Wald CI has the form $\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}},$ where $\hat p = x/n.$ This is an asymptotic interval meant for use with large $n,$ so one does not expect reliable results with $n = 21.$ For your data the Wald CI is taken as $(0, 0.1387),$ replacing the impossible negative lower limit with $0.$

n = 21;  x = 1;  p.h = x/n
CI.wald = p.h + qnorm(c(.025,.975))*sqrt(p.h*(1-p.h)/n)
CI.wald
[1] -0.04346329  0.13870138

Agresti and Coull (1998) proposed a modification of Wald's CI, using point estimate $\tilde p = (x+2)/(n+4)$ to obtain the interval $\tilde p \pm 1.96\sqrt{\frac{\tilde p(1-\tilde p)}{n+4}},$ which has a more reliable 95% coverage probability for small $n$ than the Wald CI.

The Agresti-Coull interval is currently used in many elementary and intermediate level statistics texts because it gives reasonably good results and can be computed without specialized software (e.g., using just a mobile phone calculator). For your data, this 95% CI amounts to $(0, 0.2474),$ after replacing the slightly negative lower limit with $0.$

n = 21;  x = 1;  p.e = (x+2)/(n+4)
CI.ac = p.e + qnorm(c(.025,.975))*sqrt(p.e*(1-p.e)/(n+4))
CI.ac
[1] -0.007382581  0.247382581
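To make the coverage claim concrete, the sketch below computes the exact coverage probability of the Wald and Agresti-Coull rules at $n = 21$ by summing binomial probabilities over every outcome $x$ whose interval contains the true $p.$ The function names and the 'true' rates in `p_true` are assumptions chosen for illustration, not values from the question.

```r
# Exact coverage probability of an interval rule at a given true p:
# add up the binomial probabilities of every x whose interval contains p.
coverage <- function(ci_fun, n, p) {
  x  <- 0:n
  ci <- ci_fun(x, n)                       # n + 1 rows: lower, upper
  sum(dbinom(x, n, p) * (ci[, 1] <= p & p <= ci[, 2]))
}

wald_ci <- function(x, n) {
  p <- x / n
  h <- qnorm(0.975) * sqrt(p * (1 - p) / n)
  cbind(p - h, p + h)
}

ac_ci <- function(x, n) {
  p <- (x + 2) / (n + 4)
  h <- qnorm(0.975) * sqrt(p * (1 - p) / (n + 4))
  cbind(p - h, p + h)
}

n <- 21
p_true <- c(0.02, 0.05, 0.10)              # assumed true rates, for illustration
data.frame(p    = p_true,
           wald = sapply(p_true, function(p) coverage(wald_ci, n, p)),
           ac   = sapply(p_true, function(p) coverage(ac_ci, n, p)))
```

At $p = 0.05$ the Wald interval for $x = 0$ collapses to the single point $0$ and so misses $p;$ that outcome alone has probability $0.95^{21} \approx 0.34,$ which is why the Wald coverage falls far below the nominal 95% there.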

Note: I have shown endpoints of 95% CIs to four decimal places to make clear the differences among types of intervals. In practice, with small $n,$ it might be appropriate to show two-place accuracy. To one place, the prop.test, Jeffreys, and Agresti-Coull intervals are all $(0.0, 0.2).$