Calculating Prize Line Expectation Part 2

Thanks in advance for any help.

Yesterday a very helpful member, @joriki, answered my original question on this, which brought that conversation to a conclusion. I have a second part that extends the problem a little.

This was the original post: Calculating Expected Increase In Prize Line Count

To summarise: the base problem, calculating the expected increase in prize lines from one turn when the number pools contain only numbers, was I think solved, but I was having difficulty once wildcard symbols were introduced. What joriki was finally able to get through to me was that my proposed solution overcounted the probability boost of the wildcard, because I was applying it to multiple incomplete lines originating from a particular cell. In reality a wildcard can only stand in for one number in the target column, and therefore can complete only one of the incomplete prize lines containing the originating cell. Once I made a pseudo-choice of target prize line in my line calculations, the calculated probability matched my brute-forced probability. So far so good again.

I went on in my investigation to increase the number of wildcards in the same column. My reasoning was that I would not be able to apply those wildcards to more incomplete lines but that the probability of achieving those lines to which it is applied would be increased because of the increased ratio of wildcards to numbers. Here are my workings:

[1] [x] [7] 
[2] [x] [8] 
[3] [x] [9] 

Number pools: 
 1   4   7 
 2   5   8 
 3   6   9 
         *
         *

This time the brute forced combinations and prize line increases are as follows. There are now 45 (3x3x5) possible draw combinations.

147(1),148(0),149(1),14*(1),14*(1),157(1),158(0),159(1),15*(1),15*(1),167(1),168(0),169(1),16*(1),16*(1)
247(0),248(1),249(0),24*(1),24*(1),257(0),258(1),259(0),25*(1),25*(1),267(0),268(1),269(0),26*(1),26*(1)
347(1),348(0),349(1),34*(1),34*(1),357(1),358(0),359(1),35*(1),35*(1),367(1),368(0),369(1),36*(1),36*(1)
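
The enumeration above can be reproduced mechanically. The following is a minimal Python sketch, not code from the original post. It numbers the board cells by the pool numbers shown, assumes a wildcard may stand in for any one number in its own column, and assumes the wildcard is always placed where it completes the most lines, which is what the counts in the listing imply. The names `best_gain` and `expected_gain` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Cells are named after the pool numbers: column 1 holds 1-3, column 2 holds 4-6,
# column 3 holds 7-9 (an assumption about the layout, taken from the diagrams).
LINES = [(1, 4, 7), (2, 5, 8), (3, 6, 9),   # rows
         (1, 2, 3), (4, 5, 6), (7, 8, 9),   # columns
         (1, 5, 9), (3, 5, 7)]              # diagonals
COLUMN_CELLS = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
WILD = "*"


def completed(marked):
    """Number of prize lines whose three cells are all marked."""
    return sum(all(cell in marked for cell in line) for line in LINES)


def best_gain(draw, marked):
    """Most new lines this draw can complete, trying every wildcard placement."""
    # A wildcard drawn in column `col` may mark any one cell of that column.
    options = [COLUMN_CELLS[col] if symbol == WILD else (symbol,)
               for col, symbol in enumerate(draw)]
    before = completed(marked)
    return max(completed(marked | set(choice))
               for choice in product(*options)) - before


def expected_gain(pools, marked):
    """Return (total new lines, number of draws, exact expectation)."""
    draws = list(product(*pools))
    total = sum(best_gain(draw, marked) for draw in draws)
    return total, len(draws), Fraction(total, len(draws))


# Board with column 2 already checked off and two wildcards in pool 3.
pools = [[1, 2, 3], [4, 5, 6], [7, 8, 9, WILD, WILD]]
marked = frozenset({4, 5, 6})
total, count, expectation = expected_gain(pools, marked)
print(total, count, float(expectation))
```

Passing the pools and pre-marked cells of a different board to `expected_gain` gives the brute-force value for that configuration.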

33 additional prize lines / 45 draw combinations = 0.73333 expected increase in complete prize lines. This time we apply our solution in the same way as when there was only 1 wildcard on the draw but increase the probability boost for the boosted lines to reflect the ratio of wildcards to numbers in the pool.

Line 1->4->7 = ⅓ * ⅗ = 0.20000
Line 2->5->8 = ⅓ * ⅗ = 0.20000
Line 3->6->9 = ⅓ * ⅗ = 0.20000
Line 1->2->3 = 0 (not possible to complete this line in a single draw) 
Line 7->8->9 = 0 (not possible to complete this line in a single draw) 
Line 1->5->9 = ⅓ * ⅕ = 0.06666
Line 3->5->7 = ⅓ * ⅕ = 0.06666

The sum of these probabilities is 0.73332 which ~= 0.73333. Again if we boost the precision we’re going to be very close indeed.

Again, so far so good. So then I decided to move a step further and add wildcard symbols to multiple number pools. Below you can see I've added a wildcard to column 1's number pool and left one in column 3's number pool. This time I've changed the board configuration so only 2 numbers are checked off in advance. Here are my workings:

[1] [4] [7] 
[2] [x] [8] 
[3] [x] [9] 

Number pools: 
 1   4   7 
 2   5   8 
 3   6   9 
 *       * 

This time we have 48 (4x3x4) possible number pool combinations to brute force. (Using W instead of * as it isn't translated well by the editor)

147(2),148(1),149(2),14W(2),157(0),158(0),159(1),15W(1),167(0),168(0),169(1),16W(1)
247(1),248(2),249(1),24W(2),257(0),258(1),259(0),25W(1),267(0),268(1),269(0),26W(1)
347(2),348(1),349(2),34W(2),357(1),358(0),359(1),35W(1),367(1),368(0),369(1),36W(1)
W47(2),W48(2),W49(2),W4W(2),W57(1),W58(1),W59(1),W5W(1),W67(1),W68(1),W69(1),W6W(1)

50 additional prize lines completed / 48 number pool combinations = 1.04167 expected additional lines completed by a single draw. Now we apply our proposed solution:

Line 1->4->7 = ½ * ⅓ * ½ = 0.5 * 0.33333 * 0.5 = 0.08333 //both wildcards used to boost 1 + 7
Line 2->5->8 = ½ * ½ = 0.5 * 0.5 = 0.25 //both wildcards used to boost 2 + 8
Line 3->6->9 = ½ * ½ = 0.5 * 0.5 = 0.25 //both wildcards used to boost 3 + 9
Line 1->2->3 = 0
Line 4->5->6 = ⅓ = 0.33333
Line 7->8->9 = 0
Line 1->5->9 = ¼ * ¼ = 0.25 * 0.25 = 0.0625
Line 3->5->7 = ¼ * ¼ = 0.25 * 0.25 = 0.0625

The sum of these probabilities = 1.04166 ~= 1.04167. Again more precision will get these numbers very close together indeed.

So once again, things look to be so far so good. However, if I experiment moving the wildcard probability boost around to different incomplete lines, I get diverging values once again. See my workings here (assuming the same board state and number pools as in the above example):

Line 1->4->7 = ¼ * ⅓ * ¼ = 0.25*0.33333*0.25 = 0.02083
Line 2->5->8 = ½ * ½ = 0.5 * 0.5 = 0.25 //both 2 + 8 boosted
Line 3->6->9 = ¼ * ¼ = 0.25 * 0.25 = 0.0625
Line 1->2->3 = 0
Line 4->5->6 = ⅓ = 0.33333
Line 7->8->9 = 0
Line 1->5->9 = ½ * ½ = 0.5 * 0.5 = 0.25 //both 1 + 9 boosted
Line 3->5->7 = ½ * ½ = 0.5 * 0.5 = 0.25 //both 3 + 7 boosted

The above probabilities sum to 1.16666, which is well above my brute-force value of 1.04167. I am hoping that this is because there is a rule for how wildcard probabilities must be applied that I am overlooking in my ignorance.
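
To take rounding out of the comparison, the two sets of line probabilities can be summed with exact fractions. This is only an arithmetic check of the values listed above, sketched in Python; the dictionary names are hypothetical labels for the two ways of placing the boost.

```python
from fractions import Fraction as F

# Brute-force value from the 48-draw listing: 50 new lines over 48 draws.
brute_force = F(50, 48)

# First allocation: both wildcards boost the three rows.
rows_boosted = {
    "1-4-7": F(1, 2) * F(1, 3) * F(1, 2),
    "2-5-8": F(1, 2) * F(1, 2),
    "3-6-9": F(1, 2) * F(1, 2),
    "4-5-6": F(1, 3),
    "1-5-9": F(1, 4) * F(1, 4),
    "3-5-7": F(1, 4) * F(1, 4),
}

# Second allocation: the boost is moved to the middle row and both diagonals.
diagonals_boosted = {
    "1-4-7": F(1, 4) * F(1, 3) * F(1, 4),
    "2-5-8": F(1, 2) * F(1, 2),
    "3-6-9": F(1, 4) * F(1, 4),
    "4-5-6": F(1, 3),
    "1-5-9": F(1, 2) * F(1, 2),
    "3-5-7": F(1, 2) * F(1, 2),
}

first_total = sum(rows_boosted.values())
second_total = sum(diagonals_boosted.values())
print(first_total, second_total, second_total - brute_force)
```

With exact arithmetic the first allocation comes to 25/24, which is 50/48 exactly, while the second comes to 7/6, which is 1/8 higher than the brute-force value.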

To be clear on how I'm adding the probability boost from a wildcard: if a number in pool 3 has a 1/4 chance of being drawn (1 instance of the number in a pool of 4 numbers and/or wildcards), then when I apply the wildcard to that incomplete prize line I boost the probability of matching that number to 1/2 (1 instance of the number plus 1 instance of the wildcard I wish to use, out of 4 numbers and/or wildcards in total). I do this in the same way for both columns when the boosts are applied.

I've also tried balancing the probability boost across all numbers, and splitting the wildcards amongst the incomplete lines under the rule that a number's probability may be boosted in only one incomplete prize line calculation. These methods also give values that diverge, to varying degrees, from the brute-forced value.

I think I'm making an incorrect assumption somewhere or not applying the correct rule to placement of these wildcards somewhere that's causing my problem but I'm not sure what else to try.

Many thanks in advance for your time and any help!

There is 1 best solution below

You’re again making multiple use of combinations. The draw W5W is rightly counted only once in your listing, as W5W(1), but the probabilities for the lines $258$, $159$ and $357$ all assume that you’d use a draw of W5W in those lines. Perhaps it would help to note down in the first listing how you’re using the wildcards, e.g. W[2]5W[8](1) where you have W5W(1); that might make it easier to count them only once in the other approach. You have to think about where you can use them in order to deduce the (1) anyway, so you might as well write it down, since it somehow seems to be less confusing for you in that context than in the other one.
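
The bookkeeping suggested here can be sketched in Python for the 48-draw example. This is an illustration under stated assumptions, not the answerer's own code: each wildcard is committed to exactly one cell of its column, the placement chosen is simply the first one that completes the most lines, and `annotate` and `new_lines` are hypothetical helper names. Because every draw is credited only to the lines it actually completes under that single placement, the per-line probabilities add up to the brute-force value.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

LINES = {"1-4-7": (1, 4, 7), "2-5-8": (2, 5, 8), "3-6-9": (3, 6, 9),
         "1-2-3": (1, 2, 3), "4-5-6": (4, 5, 6), "7-8-9": (7, 8, 9),
         "1-5-9": (1, 5, 9), "3-5-7": (3, 5, 7)}
COLUMN_CELLS = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
WILD = "W"


def new_lines(marked, before):
    """Names of complete lines that were not already complete."""
    return [name for name, cells in LINES.items()
            if name not in before and all(c in marked for c in cells)]


def annotate(draw, marked):
    """Commit each wildcard to one cell; return the label and lines completed."""
    before = set(new_lines(marked, set()))
    options = [COLUMN_CELLS[col] if s == WILD else (s,)
               for col, s in enumerate(draw)]
    # Tie-break assumption: max() keeps the first placement with the top score.
    best = max(product(*options),
               key=lambda choice: len(new_lines(marked | set(choice), before)))
    label = "".join(f"{WILD}[{cell}]" if s == WILD else str(s)
                    for s, cell in zip(draw, best))
    return label, new_lines(marked | set(best), before)


pools = [[1, 2, 3, WILD], [4, 5, 6], [7, 8, 9, WILD]]
marked = frozenset({5, 6})          # the two cells checked off in advance
draws = list(product(*pools))

tally = Counter()                   # how many draws complete each line
for draw in draws:
    label, lines = annotate(draw, marked)
    tally.update(lines)

per_line = {name: Fraction(n, len(draws)) for name, n in tally.items()}
print(annotate((WILD, 5, WILD), marked))
print(sum(per_line.values()))
```

Under this tie-break the double-wildcard draw is labelled `W[1]5W[9]` and credited to line 1-5-9 alone. Any other single placement, such as the W[2]5W[8] in the example above, would serve equally well, as long as the draw is not also credited to 2-5-8 and 3-5-7 at the same time.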