Multinomial example from online game

73 Views Asked by At

Suppose that a multinomial distribution has 25 outcomes, the first 24 have chance $\frac{1}{465}$ and the final has $\frac{441}{465}$ chance.

Find $n$ the number of trials for this such that the chance of having the first 24 outcomes happening at least once is approximately $0.75$. E.g. the vector $(1,1,1,...,1, n-24)$ with $n$ trials, etc.

Further, find the $n$ such that the chance is $0.95$ etc or characterise this result.

Thank you

1

There are 1 best solutions below

1
On BEST ANSWER

I commented

As an indication of scale, the expected number is $\sum\limits_{n=1}^{24} \frac{465}{n} \approx 1755.8$ so the answers are likely to be in the thousands and so easily calculated computationally by suitably combining $25$ geometric distributions.

The expectation comes in a similar way to that of the coupon collectors problem: you expect to need $\frac{465}{24}$ attempts to get any of the first $24$, then a further $\frac{465}{23}$ attempts to get any of the remaining $23$, and so on to a further $\frac{465}{1}$ attempts to get the final $1$, and you can add these up (so combining $24$ rather than $25$ geometric distributions).

If you believe that the answers are likely to be in the thousands, then you can work the probabilities out computationally up to $10000$ attempts, for example using R and the following slightly clunky code:

maxx <- 10000
probs <- c(1,rep(0,maxx)) # R starts counting at 1 not 0
for (n in 24:1){
  geomprobs <- c(0, dgeom(0:(maxx-1), n/465))
  newprobs <- probs[1] * geomprobs
  for (i in 2:(maxx-1)){
    newprobs <- newprobs + c(rep(0,i-1), probs[i]*geomprobs[1:(maxx-i+2)])
    }
  probs <- newprobs
  }
probs <- probs[-1]  # remove initial 0
sum(probs)          # check 1 (up to rounding)
# 1 
1-sum(probs)         # probability higher than maxx
# 1.072724e-08
sum(probs * (1:maxx)) # expectation
# 1755.82 

Here is the resulting Cumulative Distribution Function. Your question is asking when it first crosses $0.75$ and $0.95$, which is at $2059$ and $2857$.

plot(1:maxx, cumsum(probs), type="l") # cdf 
abline(h=0.75, col="red")
abline(h=0.95, col="blue")

min(which(cumsum(probs) >= 0.75))
# 2059
min(which(cumsum(probs) >= 0.95))
# 2857

enter image description here

You suggested that Monte Carlo simulation gave an approximation of about $1400$. This looks wrong to me, both on the basis of the calculations above and in terms of simulation, as the probability of all $24$ appearing in the first $1400$ attempts seems to be closer to just under $0.3$. Indeed, allowing for simulation noise, the answers above appear to be broadly confirmed by my simulations:

cumsum(probs)[1400]
# 0.2973073
cumsum(probs)[2059]
# 0.7503181
cumsum(probs)[2857]
# 0.9500517

simsample <- function(attempts){
  s <- sample(1:25, attempts, replace=TRUE, prob=c(rep(1,24), 441) / 465)
  length(unique(s[s != 25]))
  }
set.seed(2024)
sims <- replicate(10^5, simsample(1400)) 
mean(sims == 24)
# 0.29694
sims <- replicate(10^5, simsample(2059)) 
mean(sims == 24)
# 0.74945
sims <- replicate(10^5, simsample(2857)) 
mean(sims == 24)
# 0.95026