Probability question about meeting people

188 Views Asked by At

Here is the problem:

You go to a school that last 3 years, and you start in the first year. You meet one new random person everyday and there are 300 days in the school year.

In every year, there are 200 people (Do not include yourself as one of them), and so every year the people in the highest year leave and 200 newcomers leave.

The question is "What is probability that you will run out people to meet in school before you leave it?"

For example - In the first year you meet the 200 people in your year and 100 in the year above. Then, next year, you meet all 200 people in the new year below you and another 100 from the year above.

In your final year, there are only 200 new people joining the school and you talk to them but you still have 100 days in the year left and so you lost.

However, if you had started with the highest year and talked to them and 100 from the year above, then next year you would talk to the 100 remaining from the year above and the 200 in your year, and in the final year you have 400 people to talk to within 300 days, so you would win with 100 to spare.

3

There are 3 best solutions below

8
On BEST ANSWER

In your last year, there are $599$ students other than yourself, and there must be at least $300$ of them that you haven't met, so fewer than $300$ that you have already met. In your first two years, you met $600$ students, and if fewer than $300$ are still in school, more than $300$ must have graduated. Thus, you succeed if and only if, in the first two years, you met more than $300$ of those ahead of you.

Since there are only $200$ students one year ahead of you, you must have met at least $101$ of the graduating seniors. Let $s$ be the number of seniors you met and $j$ be the number of juniors you met, where $j\leq300-s$. Then you met $300-s-j$ members of your own class.

In your second year, there are $j$ students in the class above you and $300-s-j$ student in your own class that you've already met, so $299+s$ strangers. Let $k$ be the number of students from the class ahead of you that you meet this year. We must have $k+j+s\geq301$, so $k\geq301-s-j$ and obviously $k$ is at most $200-j$. There remaining $300-k$ people you meet are chosen from the $299+s-(200-j)=99+s+j$ strangers who aren't one year ahead of you.

The probability of success is

$$\sum_{s=101}^{200}\sum_{j=0}^{300-s}\sum_{k=301-s-j}^{200-j}\frac{\binom{200}{s}\binom{200}{j}\binom{199}{300-s-j}}{\binom{599}{300}}\frac{\binom{200-j}{k}\binom{99+s+j}{300-k}}{\binom{299+s}{300}}$$

I wrote a python script to evaluate this.

from math import factorial

def choose(n,m):
    if m > n:
        return 0
    return factorial(n)//(factorial(m)*factorial(n-m))

choices = choose(599, 300)

prob = sum(choose(200,s)*choose(200,j)*choose(199,300-s-j)/choices*\
           choose(200-j,k)*choose(99+s+j,300-k)/choose(299+s,300)\
           for s in range(101,201) for j in range(301-s) for k in range(301-s-j,201-j))
print(prob)

The script output

4.296291459801449e-06

so if I haven't made any mistakes, the probability is a little less than $4.3\cdot10^{-6}$.

1
On

An exact solution (not closed form), plus an intuitive argument why you will very likely run out of people to meet.

For ease of explanation, I assume your class has $200$ other people (i.e. $201$ people including you). Tracking you as one of the $200$ just adds tediousness without any new insights.

Exact solution: Whether you run out of people depends on what happens during the first two years only. Let $A$ denote the class two years above you, $B$ denote the class one year above you, $C$ denote your class, and $D$ denote the class one year below you. For any $X \in \{A,B,C,D\}$ let $X_1, X_2$ denote the number of people you meet in class $X$ during your $1$st and $2$nd year respectively. (Note that $A_2$ is undefined, since class $A$ already left by your $2$nd year. Or you can consider $A_2 \equiv 0$. Same with $D_1$ since class $D$ hasn't arrived yet during your $1$st year.) Since you meet $300$ people every year, we have:

$$A_1 + B_1 + C_1 = B_2 + C_2 + D_2 = 300$$

Meanwhile, at the beginning of your $3$rd year, there is a pool of $200 \times 3 = 600$ people in classes $C, D, $ and the one after $D$. The number of new-to-you people is

$$Y = 600 - C_1 - C_2 - D_2 = A_1 + B_1 + B_2$$

The answer you seek is

$$p := P(\text{run out of people}) = P(Y \le 299) = P(A_1 + B_1 + B_2 \le 299)$$

Well, $A_1, B_1, B_2$ are dependent variables, but we can find the joint distribution easily. For any $(a_1, b_1, b_2)$ in range, define $c_1 = 300 - a_1 - b_1$, and we have:

$$P(A_1 = a_1, B_1 = b_1, B_2 = b_2) = {{200 \choose a_1} {200 \choose b_1} {200 \choose c_1} \over {600 \choose 300}} \times {{200-b_1\choose b_2} {400-c_1 \choose 300 - b_2} \over {600 - b_1 - c_1 \choose 300}}$$

The first term is $P(A_1 = a_1, B_1 = b_1)$ and the second term is $P(B_2 = b_2 \mid A_1 = a_1, B_1 = b_1)$. Now we just need to specify the set $S$ of $(a_1, b_1, b_2)$ triplets being summed over and we will have:

$$p = \sum_{(a_1, b_1, b_2) \in S} P(A_1 = a_1, B_1 = b_1, B_2 = b_2)$$

$S$ is basically the requirement on $Y$ plus for each binomial coefficient ${n \choose k}$ above we require $0 \le k \le n$. I.e. $S$ is defined by satisfying all of these:

  • $a_1 + b_1 + b_2 \le 299$

  • $0 \le a_1 \le 200$

  • $0 \le b_1 \le 200$

  • $0 \le c_1 = 300 - a_1 - b_1 \le 200$

  • $0 \le b_2 \le 200 - b_1$

  • $0 \le 300 - b_2 \le 400- c_1 = 400- (300 - a_1 - b_1)$

  • Note: the second denominator leads to: $300 \le 600 - b_1 - c_1 = 300 + a_1$ which is a repeat of $a_1 \ge 0$.

Sorry I haven't given too much thought into simplifying this. Instead, I threw together some approximations and came up with...

Intuitive argument why $p$ is high: The randomization happens year by year, sequentially, and during year $1$ we have $E[A_1] = E[B_1] = E[C_1] = 100$. So at the beginning of an "average" year $2$, you have only $100$ people left in classes $B$ and $C$ whom you haven't met, plus $200$ people in the new class $D$. Out these these $400$ you have to meet $300$, so $E[B_2] \approx \frac34 \times 100 = 75$. Thus

$$E[Y] = E[A_1 + B_1 + B_2] \approx 100 + 100 + 75 = 275 < 299$$

and this intuitively explains why you're likely to run out of people.

(While $275$ does not seem so far away from $299$, a secondary effect might also be hurting you: the variables $A_1, B_1$ are negatively correlated, and so are $B_1, B_2$, so intuitively the sum has smaller variance than it would if the three variables were mutually independent.)

0
On

The number is too long for a comment. For what it's worth, following saulspatz's answer with symbolic calculation using sympy yields

$$\frac{ 25585645982315492067657035143398653202691256916040715862276856268098480786289230303320231861285620700607883570859663670172278055071403580682911962391917951584664685470603522699406454625724625028610757477494945584536006865312829861448815083644996689944023154224929291879521966819855500125885198335483 }{ 5955286372376153632422599235793338630796552887670448794005330442636929487935438012394654039405297733558338499773863496276051902568528360918988070182251577225786820296667920754831940756698755817470229054601232971118016993631964253947354269321884124022884419619027034949838735322255862869435768006237486940 },$$

which evaluates to $4.2962914598021⋅10^{−6}$, so in accordance with the calculated value in his answer.

The code (runs in under four minutes on my machine):

import sympy
s = sympy.Symbol('s', integer=True)
j = sympy.Symbol('j', integer=True)
k = sympy.Symbol('k', integer=True)

tsum = 0
for s in range(101, 201):
    c = sympy.binomial(200, s)/sympy.binomial(299 + s, 300)
    for j in range(0, 301-s):
        d = c * sympy.binomial(200, j) * sympy.binomial(199, 300 - s - j)
        for k in range(301-s-j, 201-j):
            tsum += d * sympy.binomial(200 - j, k) * sympy.binomial(99+s+j, 300 -k)

prob = tsum/sympy.binomial(599, 300)
prob

And to evaluate:

1.0*prob