Expected maximum number of simultaneous celebrators

127 Views Asked by At

Suppose there's a set of 500 individuals. What is the expected number of coinciding birthday celebrators? That's easy - it's $500/365$, under the assumption of uniform distribution.

But a birthday can't be fractioned between different dates so of practical reasons, some days will have more and others less celebrators. I wonder what's the expected number of simultaneous birth dates on the most common day (i.e. the one with most people having been born).

My impression is that there's something I'm missing and that it's not as easy as the formula in the first paragraph. The reasoning is that if there's only 5 individuals, the expected number of celebrators on the most common day is not $5/365$ but only $1/365$ because it's most probable that the birthdays are spread across the year.

So, the suggested solution would be removing the first $365$ people, regarding them as likely having one day each. Then, keeping the remaining $135$, hence arriving at $1 + 135/365$.

But I'm so confused and uncertain that I need outside input.

edit based on comments

There's certainly at least two simultaneous celebrators on the most common day. So the expected number is $2 + something$. That something consists of the likelihood for each of the remaining $500 - 2$ individuals.

If there's $2$ people in the group, the expected value is $1 + 1/365$. If there are $3$ people in the group, the expected value is $1 + 2 * 1 / 365$. Is that correct?

Then, if we'd have $365$ people, the formula suggested above would render $1 + 365 * 1 / 365$, which is $2$. It seems intuitively correct to me. Am I missing something?

Finally, the original group size was $500$. Hence, the answer would be $2 + 135 * 1 / 365$ or, if you want, $1 + 500 * 1 / 365$. Can anybody shoot it down somehow?

1

There are 1 best solutions below

0
On BEST ANSWER

Your comment asked for an answer:

The following R code allows a simulation

maxhits <- function(x){ max(table(x)) }
meanmax <- function(people, days=365, cases=10000){
  eg <- matrix(sample(days, people*cases, replace = TRUE), nrow=cases)
  maxfreq <- table(apply(eg, 1, maxhits) )
  sum(as.numeric(names(maxfreq)) * maxfreq / cases)
  }

such as

> set.seed(1)
> meanmax(365)
[1] 4.9799
> meanmax(400)
[1] 5.2131
> meanmax(500)
[1] 5.8776
> meanmax(1000)
[1] 8.7057

though with 10,000 simulated cases I would not trust these results past the first decimal place

Added: This chart show simulated means of the maximum for up to $1200$ people

enter image description here

and rather strangely I managed empirically to get something which looks like a reasonable approximation to the mean of the maximum rounded to integers on this range with something like qpois(0.9984, lambda=(1:1200)/365) as a quantile of the Poisson distribution, illustrated in red. I would expect there to be a continuous function which did even better, though the first $40$ or so values do not appear to be on the same smooth curve as the rest of them