# of Raffle Wins (Probability Distribution)


MOTIVATION

I am considering investing a significant amount of money into a raffle. In order to decide the number of entries I purchase, I would like to find probability distributions for the number of prizes I will win with respect to the number of entries I purchase.

HOW THE RAFFLE WORKS

Total entries: 1000

Winning entries (# of prizes): 20

The raffle actually runs as 20 rounds of 50 entries each.

  • Entries 1-50 have a 1/50 chance to win prize 1
  • Entries 51-100 have a 1/50 chance to win prize 2

...

  • Entries 951-1000 have a 1/50 chance to win prize 20

The entry numbers are purchased in order, so technically if I can get entries 1-50 then I have a 100% chance to win prize 1. However, I don't expect I will be able to do this since many people will be trying to buy entries at the same time. For simplicity, perhaps we can just assume that my entries will be evenly distributed across all 20 rounds (see BONUS below for my thoughts on how this change impacts the solution and please correct me if I am wrong).

INITIAL THOUGHTS

From some quick research, I think my odds of winning at least ONE prize are approximately:

1 - [ (1000-n) / 1000 ]^20

where n = number of entries I purchase

WHAT I WANT TO KNOW

What I actually want is how to calculate the probability distribution of the number of prizes I win. So not just whether I win 1 prize or not.

Given n where n is the number of entries I purchase, I want to know the average (mean) number of prizes I should expect to win and the surrounding distribution. This way I can decide my risk tolerance and choose how many entries (n) it is worth it for me to buy.

BONUS

I mentioned we can simplify the problem by assuming my entries will be evenly distributed across all 20 rounds, but I am curious what the optimal strategy would be if I could choose my entry numbers.

For example, if n = 100 entries, is it best to buy entries 1-100 and have a 100% chance to win 2 prizes? Or would a more even distribution be better, for example 5 entries in each of the 20 rounds?

In other words, I could have:

  • 100% chance to win in 2 rounds (win 2 prizes) and 0% chance to win in the other 18 rounds
  • 10% chance to win in all 20 rounds

My understanding is that in both cases my expected number of wins is 2. The difference is that in the first case it is guaranteed, whereas in the second case I could get lucky and win more, or unlucky and win fewer. Correct?

Extrapolating from that, it seems like the more evenly distributed the entry numbers are across rounds, the more uncertainty in the number of prizes I will actually win. However, the expected number (mean) of the distribution should always be the same. Is this true?

There are 2 best solutions below


Generally, you are correct that the expected number of wins is the same however you split your tickets. Spreading your tickets across every round is the higher-risk, higher-reward option.

The thing is, as you stated earlier, there is no way you will get a sure 100% for both raffles 1 and 2. I estimate the highest probability you will get for any one raffle is about 50%, although this could vary widely.

5 Tickets in 20 Raffles

Now, for some math. Let's calculate the probability that you get fewer than 2 wins when buying 5 tickets in each raffle.

For 1 win, it's $\binom{20}{1} \cdot (\frac{1}{10})^1 \cdot (\frac{9}{10})^{19} =$ 27.017%.

And for 0, it's 12.158%.

Adding them up, we get the total probability as 39.175%.

The probability of getting exactly 2 wins when buying 5 tickets in each raffle is $\binom{20}{2} \cdot (\frac{1}{10})^2 \cdot (\frac{9}{10})^{18} =$ 28.518%, by the same reasoning.

Now, to calculate the probability of getting more than 2, we just add the probabilities from 0 to 2 and subtract that sum from 1.

The probability is 1 - 0.67693 = 32.307%.

Summing everything up,

The probability of getting fewer than 2 wins is 39.175%.

The probability of getting exactly 2 wins is 28.518%.

The probability of getting more than 2 wins is 32.307%.

As you can see, you are actually more likely to get fewer than 2 wins than more than 2.

10 Tickets in 10 Raffles

Now, we calculate the probabilities for when you enter 10 raffles with 10 tickets each.

Similar reasoning, but just change up the numbers a bit.

For 1 win, it's $\binom{10}{1} \cdot (\frac{1}{5})^1 \cdot (\frac{4}{5})^9 =$ 26.844%.

And for 0, it's 10.737%.

Therefore, getting under 2 wins is 37.581%.

Getting exactly 2 wins is 30.199%.

And getting more than 2 wins is 1 - 0.67780 = 32.22%.

Summing everything up,

The probability of getting less than 2 wins is 37.581%.

The probability of getting exactly 2 wins is 30.199%.

The probability of getting more than 2 wins is 32.22%.

As you can see, buying 5 tickets in each of 20 raffles gives you a higher chance of getting fewer than 2 wins, but also a higher chance of getting more than 2 wins. However, the gap between the two strategies in the fewer-than-2 probability is much larger than the gap in the more-than-2 probability.
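The figures above can be reproduced with a few lines of Python. This is only a sketch under the same assumptions as the calculation (every round is independent and you hold the same number of tickets in each round you enter); the function names `binomial_pmf` and `summarize` are made up for illustration.

```python
from math import comb


def binomial_pmf(rounds: int, p: float, wins: int) -> float:
    """Probability of exactly `wins` wins in `rounds` independent rounds,
    each won with probability `p`."""
    return comb(rounds, wins) * p**wins * (1 - p) ** (rounds - wins)


def summarize(rounds: int, tickets_per_round: int, round_size: int = 50):
    """Return (P(fewer than 2 wins), P(exactly 2 wins), P(more than 2 wins))."""
    p = tickets_per_round / round_size
    below = binomial_pmf(rounds, p, 0) + binomial_pmf(rounds, p, 1)
    exactly = binomial_pmf(rounds, p, 2)
    above = 1.0 - below - exactly
    return below, exactly, above


# The two scenarios compared in this answer.
for rounds, tickets in [(20, 5), (10, 10)]:
    below, exactly, above = summarize(rounds, tickets)
    print(
        f"{tickets} tickets in each of {rounds} rounds: "
        f"<2 wins {below:.3%}, =2 wins {exactly:.3%}, >2 wins {above:.3%}"
    )
```

Changing the pairs in the loop lets you try other splits, as long as the total number of tickets stays the same when you compare them.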

Using this data, make your own decision! Hope you win more than 2, at least :D

-FruDe

P.S. This was my first ever math answer on StackExchange, tell me what you think!

  • If you buy a total of $\ n\ $ tickets, the expected number of prizes you win is $\ \frac{n}{50}\ $, regardless of which rounds the tickets are in. You're therefore correct that your expected number of wins is always the same for the same number of tickets purchased.
  • If you buy $\ t_i\ $ tickets in round $\ i\ $ for $\ i=1,2,\dots,20\ $, and the winning ticket for each round is drawn randomly, and independently of the draws of all the other rounds, then the variance of the total number of prizes you win is the sum of the variances of the numbers of prizes you win in all rounds. You can only win no prize or $1$ prize in any single round, $\ i\ $, say, which you will do with probabilities $\ 1-\frac{t_i}{50}\ $ and $\ \frac{t_i}{50}\ $, respectively. The expected number of prizes you win in that round is therefore $\ \frac{t_i}{50}\ $, and the variance of that number is $$ \left(0-\frac{t_i}{50}\right)^2\left(1-\frac{t_i}{50}\right) +\left(1-\frac{t_i}{50}\right)^2\left(\frac{t_i}{50}\right)= \left(\frac{t_i}{50}\right) \left(1-\frac{t_i}{50}\right)\ . $$ Therefore the total variance in the number of prizes you will win is $$ \sum_{i=1}^{20} \left(\frac{t_i}{50}\right) \left(1-\frac{t_i}{50}\right)\ . $$

    You're also correct that this is minimised by concentrating all your tickets as much as possible in the same rounds. If you have $\ s_1\le s_2\le \dots \le s_j<50\ $ tickets in rounds $\ r_1, r_2, \dots, r_j\ $, respectively, for instance, then those tickets contribute a total of $$ \sum_{i=1}^j \left(\frac{s_i}{50}\right) \left(1-\frac{s_i}{50}\right) $$ to the variance. If you were to transfer $\ x\ $ of the tickets you have in round $\ r_1\ $ to round $\ r_j\ $, however (with $\ 0<x\le\min\left(s_1,50-s_j\right)\ $), the variance would then decrease by $$ \left(\frac{s_1}{50}\right)\left(1-\frac{s_1}{50}\right)+ \left(\frac{s_j}{50}\right)\left(1-\frac{s_j}{50}\right)-\left(\frac{s_1-x}{50}\right)\left(1-\frac{s_1-x}{50}\right) -\left(\frac{s_j+x}{50}\right)\left(1-\frac{s_j+x}{50}\right)=\frac{x\left(s_j+x-s_1\right)}{1250}>0\ . $$

    It follows from this that you will minimise the variance by concentrating all your tickets as much as possible in the same rounds, that is, by having $\ 0<t_i<50\ $ for at most one value of $\ i\ $. If you do that, then you're certain to win at least $\ \left\lfloor\frac{n}{50}\right\rfloor\ $ prizes, and at most $\ \left\lfloor\frac{n}{50}\right\rfloor+1\ $. You will win the former number with probability $\ 1+\left\lfloor\frac{n}{50}\right\rfloor-\frac{n}{50}\ $, and the latter number with probability $\ \frac{n}{50}-\left\lfloor\frac{n}{50}\right\rfloor\ $.
  • What your "optimal" strategy is depends on your own personal preferences. Typically, the "best" strategies are considered to be the ones which maximise your expected gain. If that's what you want to do, you should buy all the tickets in every round for which the value of the prize exceeds $50$ times the cost of a ticket. This would seem to me to be reasonable if the prizes are all cash, but might be problematic if they're not, because the nominal value of a prize might be much more than you would ever be willing to pay for it.

    If one of the prizes, for instance, were $\ \$250\ $ worth of cricket lessons from Sachin Tendulkar, for which you'd have to come up with your own travel expenses to India to take advantage of, and the cost of each raffle ticket were $\$3$, you'd have to ask yourself whether you'd be willing to buy such a set of cricket lessons for only $\$150$ and then travel to India to receive them. If not, then my advice would be to refrain from buying any tickets in the round for which that was the prize.

  • Just knowing your expected gain and its variance should be sufficient for you to determine what your optimum strategy is, and I don't think you'll gain much more by knowing the complete distribution of the number of prizes you will win. It is nevertheless possible to calculate that distribution for the two scenarios you mention, which I therefore do below.

  • If you have $\ t_i\ $ tickets in round $\ i\ $ for $\ i=1,2,\dots,20\ $, and $\ W\ $ is the number of prizes you win, then $$ P(W=w)=\sum_{\substack{S\subseteq\{1,2,\dots,20\}\\ |S|=w}}\ \prod_{i\in S}\frac{t_i}{50} \prod_{j\not\in S}\left(1-\frac{t_j}{50}\right)\ . $$ I doubt if this expression can be simplified much for general $\ t_i\ $, and the sum in it has $\ {20\choose w}\ $ terms, for a total of $\ 2^{20}\ $ terms over all values of $\ w\ $. The sums would thus be infeasible to calculate by hand, although they would be no problem for a modern computer. If you have the same number $\ t\ $ of tickets in every round, however, the distribution simplifies to the binomial: $$ P(W=w)={20\choose w}\left(\frac{t}{50}\right)^w \left(1-\frac{t}{50}\right)^{20-w} $$

  • If you have a total of $\ n\ $ tickets, randomly distributed over all rounds, then $\ t_1,t_2,\dots,t_{20}\ $ will be random variables with the following distribution: \begin{align} P\left(t_1=\tau_1,t_2=\tau_2,\dots,t_{20}=\tau_{20}\right)&= \frac{\prod\limits_{k=1}^{20}{50\choose\tau_k}}{1000\choose n}, \end{align} for $\ 0\le\tau_1,\tau_2,\dots,\tau_{20}\le50\ $ and $\ \sum\limits_{i=1}^{20}\tau_i=n\ $. These random variables are not independent, however, so your approximation, $\ 1-\left(\frac{1000-n}{1000}\right)^{20}\ $, for the probability of winning at least one prize is certainly not exact. If $\ n=1\ $, for instance, it gives the probability as approximately $\ 0.0198\ $, whereas the true probability is $\ \frac{1}{50}=0.02\ $.

    Given that the value of $\ t_i\ $ is equal to $\ \tau_i\ $ for all $\ i\ $, the probability of your not winning the prize for round $\ i\ $ is $\ 1-\frac{\tau_i}{50}\ $, and the probability of winning at least one prize is therefore $$ 1-\prod_{i=1}^{20}\left(1-\frac{\tau_i}{50}\right)\ , $$ i.e. one minus the probability that you don't win the prize for any round. Your exact probability of winning at least one prize is obtained by multiplying this by the probability that $\ t_i=\tau_i\ $ for all $\ i\ $ and summing over all possible values of the quantities $\ \tau_i\ $: \begin{align} 1-\frac{1}{1000\choose n}&\sum_{\substack{\tau:\sum_{k=1}^{20}\tau_k=n\\ 0\le\tau_k\le50}}\ \prod_{k=1}^{20}{50\choose\tau_k}\prod_{j=1}^{20}\left(1-\frac{\tau_j}{50}\right)\\ &=1-\frac{1}{1000\choose n}\sum_{\substack{\tau:\sum_{k=1}^{20}\tau_k=n\\ 0\le\tau_k\le49}}\ \prod_{j=1}^{20} {49\choose\tau_j}\ . \end{align} While the sum in this expression might look daunting for $\ n\ $ far away from the extremes of the range $0$-$1000$, there is nevertheless a recursive procedure for evaluating it quite efficiently over the whole of that range.

    The following table gives the approximate probabilities of winning at least one prize for various values of $\ n\ $ using both the exact formula and the approximation $\ 1-\left(\frac{1000-n}{1000}\,\right)^{20}\ $. The vulgar fractions in the first three columns of the first row are exact probabilities. \begin{array}{c|c|c|c|} n&1&2&3\\ \hline \text{exact}&\frac{1}{50}=0.02&\frac{1,979}{49,950}\approx0.0396&\frac{489,077}{8,308,350}\approx0.0589\\ \hline \text{approximate}&0.0198&0.0392&0.0583\\ \hline \end{array} \begin{array}{c|c|c|c|c|c|} n&4&5&6&7&8\\ \hline \text{exact}&0.0777&0.0963&0.1144&0.1322&0.1497\\ \hline \text{approximate}&0.0770&0.0954&0.1134&0.1311&0.1484\\ \hline \end{array} \begin{array}{c|c|c|c|c|c|} n&9&10&50&100&500\\ \hline \text{exact}&0.1669&0.1837&0.6451&0.8810&0.99999921\\ \hline \text{approximate}&0.1654&0.1821&0.6415&0.8784&0.99999905\\ \hline \end{array}

    Thus, the approximate formula gives a reasonably good estimate over this range. For $\ n=500\ $, the exact and approximate probabilities are $\ 1-7.86\times10^{-7}\ $ and $\ 1-9.54\times 10^{-7}\ $, respectively. Although the approximate probability of not winning a prize, $\ 9.54\times 10^{-7}\ $, is thus in error here by more than $20\%$, that error is of little consequence because the true probability itself is so small.

    More generally, the distribution of the number of prizes you win in this case is given by \begin{align} P(W&=w)=\\ &\frac{1}{1000\choose n}\sum_{\substack{\tau:\sum_{k=1}^{20}\tau_k=n\\ 0\le\tau_k\le50}}\ \prod_{k=1}^{20}{50\choose\tau_k}\sum_{\substack{S\subseteq\{1,2,\dots,20\}\\ |S|=w}}\ \prod_{i\in S}\frac{\tau_i}{50} \prod_{j\not\in S}\left(1-\frac{\tau_j}{50}\right)\\ =&\frac{1}{1000\choose n}\sum_{\substack{S\subseteq\{1,2,\dots,20\}\\ |S|=w}}\ \sum_{\substack{\tau:\sum_{k=1}^{20}\tau_k=n\\ 0\le\tau_k\le50}}\ \prod_{k=1}^{20}{50\choose\tau_k}\prod_{i\in S}\frac{\tau_i}{50} \prod_{j\not\in S}\left(1-\frac{\tau_j}{50}\right)\\ =&\frac{{20\choose w}}{1000\choose n}\sum_{\substack{\tau:\sum_{k=1}^{20}\tau_k=n\\ 0\le\tau_k\le50}}\ \prod_{k=1}^{20}{50\choose\tau_k}\prod_{i=1}^w\frac{\tau_i}{50} \prod_{j=w+1}^{20}\left(1-\frac{\tau_j}{50}\right)\\ =& \frac{{20\choose w}}{1000\choose n}\sum_{\substack{\sigma:\sum_{i=1}^{20}\sigma_i=n-w\\ 0\le\sigma_i\le49}}\ \prod_{i=1}^{20}{49\choose\sigma_i} \end{align} where the third step holds because, by symmetry, every subset $\ S\ $ of size $\ w\ $ contributes the same amount as $\ S=\{1,2,\dots,w\}\ $, and the last step comes from the identities $\ \displaystyle \prod_{i=1}^w{50\choose\tau_i}\frac{\tau_i}{50}=\prod_{i=1}^w{49\choose\tau_i-1}\ $ and $\ \displaystyle\prod_{j=w+1}^{20} {50\choose\tau_j}\left(1-\frac{\tau_j}{50}\right)=\prod_{j=w+1}^{20}{49\choose\tau_j}\ $, and setting $\ \sigma_i=\tau_i-1\ $ for $\ 1\le i\le w\ $ and $\ \sigma_i=\tau_i\ $ for $\ w+1\le i\le 20\ $.

    Note that the probability of your winning $\ w\ $ prizes when you buy $\ n\ $ tickets is just $\ {20\choose w}{1000\choose n-w}\big/{1000\choose n}\ $ times the probability of your winning no prizes when you buy $\ n-w\ $ tickets.
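The formulas in this answer can be checked numerically. What follows is a minimal Python sketch, not part of the derivation above, and all function names in it are invented for illustration. It makes two choices of its own. First, for a fixed allocation $\ t_1,\dots,t_{20}\ $ it builds the distribution of $\ W\ $ one round at a time rather than summing over subsets $\ S\ $. Second, for randomly distributed tickets it assumes Vandermonde's identity, $\ \sum\prod_{i=1}^{20}{49\choose\sigma_i}={980\choose n-w}\ $, which turns the last sum into the hypergeometric form $\ P(W=w)={20\choose w}{980\choose n-w}\big/{1000\choose n}\ $.

```python
from math import comb

ROUNDS, ROUND_SIZE = 20, 50  # raffle layout from the question


def mean_and_variance(tickets):
    """Mean and variance of the number of wins for a fixed allocation,
    where tickets[i] is the number of tickets held in round i."""
    probs = [t / ROUND_SIZE for t in tickets]
    return sum(probs), sum(p * (1 - p) for p in probs)


def win_distribution(tickets):
    """List of P(W = w) for w = 0..len(tickets), for a fixed allocation.

    Adds one round at a time, so the work grows with the square of the
    number of rounds instead of with the number of subsets.
    """
    dist = [1.0]  # before any round is drawn, W = 0 with certainty
    for t in tickets:
        p = t / ROUND_SIZE
        new = [0.0] * (len(dist) + 1)
        for w, prob in enumerate(dist):
            new[w] += prob * (1 - p)  # this round is lost
            new[w + 1] += prob * p    # this round is won
        dist = new
    return dist


def random_ticket_distribution(n):
    """List of P(W = w) for w = 0..20 when the n tickets are a uniformly
    random subset of all 1000 (hypergeometric form, assumed as above)."""
    total = ROUNDS * ROUND_SIZE
    denom = comb(total, n)
    return [
        comb(ROUNDS, w) * comb(total - ROUNDS, n - w) / denom if w <= n else 0.0
        for w in range(ROUNDS + 1)
    ]


if __name__ == "__main__":
    concentrated = [50, 50] + [0] * 18  # 100 tickets filling two rounds
    spread = [5] * 20                   # 100 tickets, 5 in every round
    print(mean_and_variance(concentrated))
    print(mean_and_variance(spread))
    print(1 - random_ticket_distribution(100)[0])  # P(at least one prize)
```

Both allocations in the demo have the same mean, and only the variance differs, which is the point made in the first two bullets.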