We are measuring the number of times a certain event happens. We do this with sampling, so that each event is reported only with probability p; for example, p = 0.01 means about 1 in 100 events comes through. If we have measured N events, we estimate that N/p events actually occurred.
We would like to know the margin of error as a function of p and N: that is, to be able to say that, with 95% probability, the true count is within some margin of our estimate. How can this be calculated? We are looking for a function that takes N and p as parameters.
Thanks in advance for your help!
I think this question is a better fit for Cross Validated, the statistics site of Stack Exchange, but here is how I would approach it:
Count data (the number of events) is commonly modeled with a Poisson distribution, which has just one parameter, $\lambda$ (lambda). The Poisson distribution's mean and variance both equal $\lambda$.
So, for example, suppose you measured this N over several periods and have a vector of counts, Ns = [1000, 1020, 1040, 970, 900]. In R, to get $\lambda$ from such a vector, you would do `lambda <- mean(Ns)`, since the maximum-likelihood estimate of $\lambda$ is the sample mean.
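The same estimate in Python, as a minimal sketch (the variable names are illustrative, and the counts are the example vector from above):

```python
# Sketch: estimate the Poisson rate (lambda) from several measured counts.
# The maximum-likelihood estimate of lambda is simply the sample mean.
Ns = [1000, 1020, 1040, 970, 900]  # example counts from above
lambda_hat = sum(Ns) / len(Ns)
print(lambda_hat)  # 986.0
```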
Then, if your sample is large, you can get the lower and upper boundaries of the 95% confidence interval by multiplying the standard deviation (for a Poisson distribution, $\sqrt{\lambda}$) by 1.96 and subtracting it from or adding it to the mean, respectively.
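As a sketch of that step, assuming the Poisson model above (so the standard deviation is $\sqrt{\lambda}$, and the normal approximation is only reasonable for large $\lambda$):

```python
import math

def poisson_ci(lam, z=1.96):
    """Approximate 95% CI for a Poisson-distributed count with rate lam,
    using the normal approximation (valid when lam is large)."""
    sd = math.sqrt(lam)  # Poisson standard deviation
    return lam - z * sd, lam + z * sd

lo, hi = poisson_ci(986.0)
# roughly (924.5, 1047.5)
```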
Edit: forgot to mention that, now that you have a 95% CI, you can answer your question by checking whether the value you are interested in falls within it.
As for the sampling rate, I'm not sure it changes the approach much: if you know for sure that you keep roughly every 100th event (p = 0.01), just multiply the values in Ns by 1/p = 100.
Edit: maybe this also helps: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
That is, given your N and p, it quantifies the uncertainty about the proportion p itself.
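For completeness, a sketch of the simplest interval from that page, the normal-approximation (Wald) interval for a binomial proportion (the function name and example numbers are illustrative):

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a binomial proportion.
    Only reasonable when n is large and the proportion is not
    too close to 0 or 1."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_interval(10, 1000)
# roughly (0.0038, 0.0162)
```

The Wikipedia page also describes intervals with better coverage for small samples or extreme proportions (e.g. the Wilson score interval), which may be worth using instead.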