Estimating a random variable from repeated trials


I have an $n$ sided die and suspect that it is biased. I'm interested in the probability of rolling a $1$, so I roll the die $m$ times and count up the number of times I roll $1$, then divide the count by $m$ to calculate an estimate.

What is the expected error on this estimate?

For those of you hesitant to do people's homework, I'll use the answer to this to estimate how many crystal growth simulations one needs to run to get a good estimate of how likely one crystal layer is to follow another. Hopefully I can prove that estimating the probability this way is far slower than working it out using a Markov Chain, and I'll have something interesting to show my supervisor.

Cheers! Allen Hart


There are 2 answers below.

Best answer

Strictly speaking, an "expected error" is only defined relative to some distribution, and you don't know what that distribution is here. Nonetheless, the standard statistical answer is as follows. The number of $1$s is distributed as Binomial($m,p$), where $p=1/n$ would be the unbiased case. This has mean $mp$ and variance $mp(1-p)$. From here you can construct a confidence interval for $p$ using the normal approximation to the binomial distribution.

For instance, if you observed the sample proportion $\hat{p}$, then the half-width of the confidence interval for $p$ at confidence level $1-\alpha$ is $z_{\alpha/2} \left ( \frac{\hat{p}(1-\hat{p})}{m} \right )^{1/2}$, where $z_\beta$ satisfies $P(Z \geq z_\beta)=\beta$ for $Z$ standard normal. A common choice is $\alpha = 0.05$, which gives $z_{0.025} \approx 1.96$. The significance of this is that approximately $95\%$ of intervals generated in this fashion will contain the true value $p$. Another interpretation: you would fail to reject the null hypothesis $p=p_0$ at significance level $\alpha$ if and only if $p_0$ lies in your confidence interval.
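A minimal sketch of this computation in Python (the counts $120$ ones in $600$ rolls are made-up numbers for illustration, not data from the question):

```python
import math

def wald_halfwidth(p_hat, m, z=1.96):
    """Half-width of the normal-approximation (Wald) confidence
    interval for a proportion: z * sqrt(p_hat * (1 - p_hat) / m)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / m)

# Hypothetical example: 120 ones observed in 600 rolls.
p_hat = 120 / 600
half = wald_halfwidth(p_hat, 600)
print(f"estimate {p_hat:.3f} +/- {half:.3f}")
```

Note that the half-width shrinks like $1/\sqrt{m}$, so each extra decimal digit of precision costs roughly $100$ times as many rolls.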

The drawback is that you have no way of knowing whether the particular interval you generate contains the true value $p$. All you can do is reason about what would happen if you generated many such intervals (which we rarely do in the real world).
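What we rarely do in the real world we can do cheaply in simulation. The sketch below (all parameters are arbitrary choices, with $p = 1/6$ playing the role of the true bias) generates many such intervals and checks what fraction actually cover the true $p$:

```python
import math
import random

random.seed(0)
p_true, m, z = 1 / 6, 500, 1.96  # arbitrary true bias, rolls per trial, 95% z
trials = 2000
covered = 0

for _ in range(trials):
    # Simulate m rolls and count "successes" (rolling a 1).
    k = sum(random.random() < p_true for _ in range(m))
    p_hat = k / m
    half = z * math.sqrt(p_hat * (1 - p_hat) / m)
    # Does this interval contain the true value?
    if p_hat - half <= p_true <= p_hat + half:
        covered += 1

print(f"empirical coverage: {covered / trials:.3f}")  # should land near 0.95
```

The empirical coverage will typically come out slightly below $0.95$, since the Wald interval is only an approximation for moderate $m$.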

Second answer

You can't measure the "expected error".

In probability theory, if you repeat an experiment $N$ times (with $N$ large) and the event of interest happens $k$ times, then the fraction $k/N$ tends to a number regarded as "the real probability" (this is the law of large numbers).

So, by repeating the experiment many times, all you know is that "the bigger $N$ is, the better your estimate is", not "how close you are".
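One can still get a feel for the typical closeness by simulation: the binomial standard deviation suggests the error usually shrinks like $1/\sqrt{N}$. A quick sketch (the seed and the choice $p = 1/6$ are arbitrary; $1/\sqrt{N}$ is the typical scale of the error, not a guaranteed bound):

```python
import math
import random

random.seed(1)
p_true = 1 / 6  # arbitrary "real probability" for the simulation

for n in (100, 10_000, 1_000_000):
    # Estimate p_true from n simulated trials.
    k = sum(random.random() < p_true for _ in range(n))
    err = abs(k / n - p_true)
    # Compare the observed error with the 1/sqrt(n) scale.
    print(f"N={n:>9}  error={err:.6f}  1/sqrt(N)={1 / math.sqrt(n):.6f}")
```

In this scaling, halving the error requires roughly four times as many trials, which is why simulation alone converges slowly.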

I don't know whether you can run a few million trials, but a number of that order should be enough. Sorry if this isn't the answer you expected.