How do I know if I have enough data to calculate probability?


It's been a while since I took a stats class so I'm pretty rusty on this.

I have a hobby where I've been collecting data on an online gambling game called "Crash" (you can see an explanation of the game here if you're not familiar).

I now have the crash points for a bunch of game rounds, and I've added up the counts for each round where the crash point was higher than a given number. In other words, if the crash point was 3, then 3, 2, and 1 would've been acceptable bets and they have their counter incremented. You can see what I mean in a sample of my data here.
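The tallying described above can be sketched as follows. The crash values here are made up for illustration (the asker's actual data isn't shown):

```python
# Hypothetical sample of crash multipliers, one per game round
crashes = [1.00, 2.31, 1.05, 3.78, 1.52, 420.0, 1.18, 7.42]

# For each threshold, count the rounds whose crash point met or exceeded it;
# a bet at that multiplier would have paid out in exactly those rounds
thresholds = [1, 2, 3, 420]
counts = {t: sum(c >= t for c in crashes) for t in thresholds}
print(counts)  # -> {1: 8, 2: 4, 3: 3, 420: 1}
```

Dividing each count by the total number of rounds gives the empirical probability of a crash at or above that threshold.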

Using my data, I'm pretty sure I can say that the odds of getting a crash of at least 1 is about 96% because I have about 150k instances out of 156k rounds where the crash was greater or equal to 1. However, I don't think I could accurately estimate the chances of getting a crash of at least 420 since 335/156k instances are probably too few. But I don't know how to calculate how many samples I need to be reasonably certain of the results.

My question is, how many instances of a crash point do I need to have before I could somewhat accurately estimate the odds of hitting that crash point or higher?

Accepted answer:

If you have a sequence of 'binomial trials', independent and all with success probability $p,$ then you can get a confidence interval for $p.$

Traditional binomial confidence interval. For very large $n,$ you can use $\hat p = X/n$ as an estimate of $p,$ where $X$ is the number of successes in $n$ trials. Then a traditional 95% confidence interval for $p$ is of the form $$\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}}.$$

This interval assumes that $n$ and $p$ are such that $Z = \frac{X - np}{\sqrt{np(1-p)}}$ is nearly standard normal and that $\frac{\hat p(1 - \hat p)}{n}$ is a reasonably good estimate of $\frac{p(1-p)}{n}.$
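A minimal sketch of this interval, applied to the numbers from the question (about 150k of 156k rounds with a crash of at least 1):

```python
import math

def wald_ci(x, n, z=1.96):
    """Traditional (Wald) confidence interval for a binomial proportion.
    x = number of successes, n = number of trials, z = normal quantile."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_ci(150_000, 156_000)
print(f"95% CI for P(crash >= 1): ({lo:.4f}, {hi:.4f})")  # roughly (0.961, 0.962)
```

With this much data the interval is very narrow, which is why the 96% estimate for "crash of at least 1" is trustworthy.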

Determining n for a given margin of error. If you want to know $n$ that estimates $p$ within a particular margin of error $M,$ you can get that from the confidence interval as follows:

If you have a rough prior guess $p^*$ of $p$ in advance, then you can choose $n$ so that the margin of error $M = 1.96\sqrt{\frac{p^*(1-p^*)}{n}}$ is of a desired size.

If you have no idea of the size of $p,$ then you can use $p = .5$ as a worst-case because for a given $M$ that value of $p$ gives the largest $n.$ [Among public opinion pollsters there is a rough rule that the margin of sampling error of a poll is $M \approx \sqrt{1/n},$ which is based on this worst case.]
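Solving the margin-of-error formula for $n$ gives $n = z^2\,p(1-p)/M^2.$ A short sketch, where the second call uses a rough guess $p \approx 0.002$ (close to the question's $335/156\text{k}$ for a 420x crash):

```python
import math

def required_n(margin, p_guess=0.5, z=1.96):
    """Smallest n so the margin of error is at most `margin`.
    p_guess = 0.5 is the worst case (largest required n)."""
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

print(required_n(0.03))          # worst case, ~3-point margin: 1068
print(required_n(0.001, 0.002))  # rare event, 0.1-point margin: 7668
```

Note that for rare events a margin of $\pm 0.001$ is huge relative to $p \approx 0.002$ itself, so pinning down a rare probability to useful *relative* precision takes far more trials than the second number suggests.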

Notes: (1) For $n < 500$ (approximately) or $p$ very near 0 or 1, it is better to use $n^\prime = n + 4$ and $X^\prime = X + 2$ (instead of $n$ and $X$) to compute $\hat p$ and the confidence interval. [This makes an 'Agresti-Coull style of confidence interval', for which you can google explanations and a link to their paper in The American Statistician (1998) pp 119-126.] (2) A Bayesian 95% probability interval based on a uniform prior uses quantiles .025 and .975 of $\mathsf{Beta}(1 + X, 1 + n - X).$
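The Agresti-Coull adjustment in note (1) is just the Wald interval computed after adding two successes and two failures. A sketch, applied to the rare event from the question (335 crashes of at least 420x in 156k rounds):

```python
import math

def agresti_coull_ci(x, n, z=1.96):
    """Agresti-Coull interval: add 2 successes and 2 failures,
    then compute the usual Wald interval on the adjusted counts."""
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half

lo, hi = agresti_coull_ci(335, 156_000)
print(f"95% CI for P(crash >= 420): ({lo:.5f}, {hi:.5f})")
```

The Bayesian interval in note (2) can be computed with, e.g., `scipy.stats.beta.ppf([0.025, 0.975], 1 + x, 1 + n - x)` if SciPy is available.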