I'm trying to figure out how to combine historical lottery data with a traditional lottery number probability calculation technique to result in a more "accurate" probability of a number being drawn, but I don't know how to bring everything together in an accurate way.
The lottery pulls 6 numbers from a range of 1 - 75. In order for users to win, they need to match 1 to 6 numbers.
I'm using the following (hypergeometric) formula to calculate the probability of matching winning numbers, where n = total number of available numbers, k = number of numbers in the winning combination for the jackpot, and x = number of user numbers that match the winning set: $$ P(X=x)=\frac{C(k,x)\,C(n-k,\,k-x)}{C(n,k)} $$
For example, the probability of matching exactly 1 winning number is: $$ \frac{C(6,1)\,C(69,5)}{C(75,6)} \approx 0.3348789 $$
I also have a frequency table showing how many times each number has appeared in a winning set of numbers. For example:
| #  | Number of times drawn |
|----|-----------------------|
| 1  | 652 |
| 2  | 601 |
| 3  | 634 |
| …  | …   |
| 73 | 587 |
| 74 | 599 |
| 75 | 661 |
How can I incorporate the frequency of each number being drawn in a winning set of lotto numbers into the probability calculation?
Edit: Upon investigation, it looks like an ANOVA test might work, but I don't know how to construct it properly. Is that the correct route?
Assuming what you want is to compute empirical frequencies as proxies for probabilities, the formula is straightforward: $$ P(\text{number $i$ is selected at a draw})\approx f_i={\text{number of draws with $i$ selected}\over \text{total number of draws}}. $$ With your data, if $N$ denotes the total number of draws, $$ f_1={652\over N}={6\cdot 652\over 652+\cdots+661}, $$ since each draw selects 6 numbers and the counts therefore sum to $6N$. These frequencies should satisfy $$ \sum_i f_i = 6, $$ as 6 numbers are drawn each time.
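A minimal sketch of this computation in Python. Since only a few of the 75 counts appear in the question, the example uses a small hypothetical 3-of-5 lottery with made-up counts; the logic is identical for the real 6-of-75 data:

```python
# Hypothetical counts: number -> times drawn, for a toy 3-of-5 lottery.
counts = {1: 30, 2: 28, 3: 32, 4: 29, 5: 31}
k = 3  # numbers selected per draw

total_selections = sum(counts.values())  # equals k * (number of draws)
n_draws = total_selections // k

# Empirical per-draw frequency of each number: f_i = count_i / N
freqs = {num: c / n_draws for num, c in counts.items()}

# Sanity check: the frequencies sum to k, since k numbers are drawn each time.
print(sum(freqs.values()))
```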