Using R, solve the birthday paradox

297 Views Asked by At

The probability that two students in a class have the same birthday is at least 75%. What is the minimum size of the class?

I tried,

Prob_Birthday <-1
Prob_Min = 0.75
Max_Class_Size = 40
Stop = FALSE
for (k in 2:Max_Class_Size){ 
 if(Stop == FALSE){
    Prob_Birthday[k]<-1-prod(365:(365-(k-1)))/365^k 
    if(Prob_Birthday[k]>=Prob_Min){
      print("Found")
      print(Prob_Birthday[k])
      print(k)
      Stop = TRUE
   }
  }
} 
if(Stop == FALSE){
  print("Probability not found, try increasing class size")
} 

But this code proves every class size until 40; so I can get when the size when the probability = 75%, (size = 32, when the probability = 75%). Even though, that's inefficient.

How could I find the only size of the class when the probability is 75%?

2

There are 2 best solutions below

1
On

The probability that $k$ people chosen at random do not share birthday is: $$ \dfrac{364}{365} \cdot \dfrac{363}{365} \cdot \ldots \cdot \dfrac{365-k+1}{365}. $$

If you want to do it in R, you should use vectorised operations or R will heavily penalise you in performance.

treshold <- 0.75
aux <- 364:1 / 365
probs <- cumprod(aux)
idx <- which(probs < treshold) #This gives you all the indices where probs < treshold
idx[[1]] + 1 #The number of people needed since we started aux at 364
1
On

In my opinion, the best way to use the computer to solve such a problem is independent of the programming language. Simply use logarithms. If there are $n$ people, and the chance that all $n$ have different birthdays is $p$, then you have that

$$\frac{(365) \times (364) \times \cdots \times (366-n)}{(365)^n} = p.$$

Taking logarithms you have that

$$\left[~\sum_{k = 1}^n \log(366-k) ~\right] - \left[~n \times \log(365) ~\right] = \log(p). \tag1$$

In this case, you (presumably) want the smallest value of $n$ such that the LHS of equation (1) above will be a smaller number (i.e. a negative number of greater magnitude) than $\log(1/4).$

Personally, I am not acquainted with the R programming language, but I would be extremely surprised if it did not include canned routines for logarithms, either base $(10)$ or base $(e)$ (i.e. natural logarithms) or both.