How much data needed to test my hypothesis

92 Views Asked by At

Suppose we have a data stream which outputs integer values from $0$ to $2^{32}-1$. We know that either (A) all values occur equally often or (B) exactly half of the values occur $2^{10}$ times more often than the other half (but it is unknown which values are more likely).

I'd like to know how I can approach whether (A) or (B) is true. Also, I've been asked to state how many values I would need to observe to be confident of my result with a probability of $0.999$. Moreover, assume we can only observe up to $2^{18}$ values.

My rough attempt at a solution. We gather a certain amount of data, for example $2^{10} = 1024$ integers. We then record how many of these fall in the first half of possible values, that is $0$ to $2^{31}-1$. If (A) is true, all values are equally likely so one would expect to see around $512$ fall in this half. If (B) is true, the expected value could be anything so we would almost definitely get a very different amount.

This method makes sense to me but I don't know how many values I would need to observe to be confident enough to say the result is true with probability $0.999$. I've been told it is something to do with using the 3-sigma rule for a normal distribution but I'm not sure how I should apply this here.

Any tips would be greatly appreciated.