My friend and I are playing a game and trying to determine who is the better player.
The game has an element of luck.
The current score is 9-7.
The following questions have arisen:
1) How many games should we play to get a statistically significant result that one of us is better than the other? We can assume p=9/16 or p=0.5, and a standard value for z.
2) What can we infer from the current score? We tried a binomial proportion confidence interval (we used the Wilson score interval, because N<30). Here are the results. I'm not sure how to interpret them. Does it mean that, with 95% confidence, the probability of one player winning a game, given the 9-7 score, is between 34% and 76%?
I'm not sure I understand your notation, but if we assume that one player wins 9/16 of the time and we want a confidence of 99.5% (p=0.5%), we can write the probability that a 9/16 win ratio would arise by chance (i.e. between two evenly matched players) as a function of the number of games, and then solve for the number of games that makes that probability 0.5%.
Edit: I wrote a function to calculate this for 9/16 and p=5%. The result is that it takes 265 games (149 wins by one player) to exceed 95% confidence.
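The search described above can be sketched as follows. This is not the function mentioned in the edit; it assumes a two-sided exact binomial test against 50% and rounds the expected win count 9n/16 half up to a whole game. The helper names `two_sided_p` and `games_needed` are made up for illustration. It uses `math.comb` to stay dependency-free; `scipy.stats.binomtest(wins, games, 0.5).pvalue` should give the same two-sided p-value. Because the threshold depends on those choices (one- vs two-sided test, exact test vs normal approximation, how wins are rounded), it need not land on exactly 265.

```python
import math


def two_sided_p(wins: int, games: int) -> float:
    """Exact two-sided binomial test p-value against p0 = 0.5 (evenly matched players)."""
    # For p0 = 0.5 the distribution is symmetric, so the two-sided p-value is
    # twice the upper tail starting at the larger of wins and losses, capped at 1.
    k = max(wins, games - wins)
    upper_tail = sum(math.comb(games, i) for i in range(k, games + 1)) / 2 ** games
    return min(1.0, 2 * upper_tail)


def games_needed(p_true: float = 9 / 16, alpha: float = 0.05, max_games: int = 5000) -> int:
    """Smallest number of games at which winning a p_true share is significant at level alpha."""
    for n in range(1, max_games + 1):
        # Assumption: the better player wins p_true * n games, rounded half up.
        wins = math.floor(p_true * n + 0.5)
        if two_sided_p(wins, n) < alpha:
            return n
    raise ValueError("no significant result within max_games")


if __name__ == "__main__":
    n = games_needed()
    print(n, "games,", math.floor(9 / 16 * n + 0.5), "wins")
```

Passing `alpha=0.005` reproduces the 99.5% variant from the first paragraph; expect a noticeably larger number of games.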
For part (2), you can infer that you can't yet say who is better with 95% confidence. All you can say with 95% confidence is that the "true" win probability of the player with 9 wins (the share of games they would win in the long run) lies somewhere between 34% and 76%. As long as your confidence interval includes 50%, you can't say who's better.
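To check whether the interval includes 50%, the Wilson score interval can be computed directly. This is a minimal sketch using only the standard library (it assumes Python 3.8+ for `statistics.NormalDist`); `wilson_interval` is a hypothetical helper name, not the code this answer refers to.

```python
from statistics import NormalDist


def wilson_interval(wins: int, games: int, confidence: float = 0.95):
    """Wilson score interval for the win probability of the player with `wins` wins."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = wins / games
    denom = 1 + z * z / games
    center = (p_hat + z * z / (2 * games)) / denom
    half_width = z * (p_hat * (1 - p_hat) / games + z * z / (4 * games * games)) ** 0.5 / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    low, high = wilson_interval(9, 16)
    print(f"95% CI: {low:.1%} to {high:.1%}")
    print("can't tell who is better" if low <= 0.5 <= high else "one player is better")
```

With 9 wins out of 16 this gives roughly 33% to 77%, essentially the interval quoted in the question, and it straddles 50%. Plugging in the 149 wins out of 265 games from the edit gives a lower bound just above 50%.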
Here's my sample code in Python (you need the scipy.stats package):