Birthday problem - men and women

1k Views Asked by At

Birthday problem is well known. Here is the graph illustrating counterintuitive fact that only 23 people in a group are enough for probability of two having the same birthday to be higher that 1/2:

enter image description here

Here is slightly different problem:

In a set of n randomly chosen people, what is the probability of a man and a woman having the same birthday?

Is it 1/3 of the probability from original birthday problem? (given that there are 3 possibilities for same birthday: man-man, woman-woman, woman-man)

I don't know how to approach it. :(

4

There are 4 best solutions below

2
On BEST ANSWER

Suppose a person chosen at random can have any of $N$ possible birthdays with uniform probability and can be male or female with equal probability.

Consider a group of $k$ persons with $1$ to $k$ unique birthdays, such that any two or more persons with the same birthday are all of the same sex. When we add one person to this group, the possible outcomes can be organized into three events:

Event $A_{k+1}$: person $k+1$ has a different birthday than any person already present. In this case there is no "match," and the number of unique birthdays present increases by one.

Event $B_{k+1}$: person $k+1$ has the same birthday and same sex as one or more people already present. In this case there is no "match" and no change in the number of unique birthdays present.

Event $C_{k+1}$: person $k+1$ has the same birthday as one or more people already present, but opposite sex. In this case there is a "match."

If there were $U_k$ unique birthdays among the first $k$ persons, then $$P(B_{k+1}) = P(C_{k+1}) = U_k/(2N)$$ and $$P(A_{k+1}) = (2N - 2U_k)/(2N) = (N - U_k)/N.$$

Let $AB_k = A_k \cup B_k$, which is the event that the first $k$ persons did not include a SBOS (same-birthday opposite-sex pair. The probability that we first observe such a pair when we add person $k + 1$ is \begin{eqnarray} P(C_{k+1}|AB_k) &=& \sum_{u=1}^k P(C_{k+1}|U_k=u) \, P(U_k=u|AB_k) \\ &=& \sum_{u=1}^k \frac{u}{2N} P(U_k=u|AB_k) \\ &=& E_k \end{eqnarray} where $E_k = E(U_k|AB_k)$ is the conditional expectation of the number of unique birthdays among the first $k$ persons given that no male and female have the same birthday. The value of $E_k$ can be computed by the recursion \begin{eqnarray} E_{k+1} &=& \frac{(E_k + 1) \, P(A_k) + E_k \, P(B_k)}{P(A_k) + P(B_k)} \\ &=& E_k + 1 - \frac{E_k}{2N - E_k}. \end{eqnarray} where $E_1 = 1$.

In order not to find a SBOS pair in the first $k$ persons, we must not find such a pair when we add a second person to the group; then we must not find match when we add a third person, given that there was match before then; and similarly for the fourth person and every other up to and including the $k$th person. The probability of finding a SBOS pair in the first $k$ persons is $$P_k = 1 - \prod_{i=1}^{k-1} \left( 1 - \frac{E_i}{2N} \right).$$

Discussion:

The formula for $P_k$ above is related to the usual question of matching birthdays. When we ask only to match birthdays without regard to gender, however, the numerator $E_k$ becomes $k+1$ (because all birthdays in a non-matching set of people must be unique), and of course the denominator is just $N$ rather than $2N$.

If you assume that a person chosen at random is exactly 50% likely to be female and 50% likely to be male, and that the genders of any two people are independent in probability, then the chance that two people chosen at random are of opposite genders is $1/2$, not $1/3$. That's because to get two females, you need two independent events each with probability $1/2$, so the probability of the two events both occurring is $(1/2) \cdot (1/2) = 1/4$. Likewise, the probability of two males is also $1/4$. The probability of one of each is $1 - ((1/4) + (1/4)) = 1/2$; no matter which gender the first person has, the second person has a $1/2$ probability of matching it.

The naïve conclusion from this is that the probability of finding a man and a woman with the same birthday among a randomly selected set of $N$ people (where $N$ might be 23, for example) is just $1/2$ times the probability that there will be any pair of individuals in that set (regardless of gender) who have the same birthday. This is not accurate. For example, with 23 people, there is approximately a 1.32% chance that three people share the same birthday, and in that case the probability that you have at least one man and one woman is $3/4$. There is also a non-zero probability that you have two matching pairs (two different birthdays, each shared by two people), in which case the probability that at least one of those pairs is a male-female pair is $3/4$. There are also cases with more than two matching pairs, larger numbers of people with the same birthday, and so forth.

The total probability of a male and a female with the same birthday is therefore more than $1/2$ times the probability that there will be any two individuals with the same birthday.

How much more is a difficult question. The usual method for the probability of a matching pair does not work, because at any stage, as you add a new person to a group that already contains $k$ people, the probability that this person will form part of a matching pair is not $1 - k/(2 \cdot 365)$ as you might expect. If you were trying to get two people of the same gender with the same birthday, that would be the correct probability, but as you are trying to find two people of opposite gender, among the $k$ people you have already collected there may be two or more individuals that have the same birthday and the same gender. When that happens, there are fewer than $k$ possible gender-birthday combinations that will make an opposite-gender matching pair with anyone already in the room.

The Wikipedia page http://en.wikipedia.org/wiki/Birthday_problem gives a method for computing the probability of an opposite-gender pair with the same birthday in a room containing $m$ men and $n$ women, but that is not the question you asked. To use the formula from Wikipedia to find the probability for 23 randomly-chosen people, you could consider 24 separate cases: one for each possible split between men and women ranging from 23 women and no men, 22 women and one man, 21 women and 2 men, all the way to 23 men and no women. For each of these cases you could compute its probability (using a binomial distribution), multiply that by the probability of a suitable pair (using the formula from Wikipedia), and then add all those products together to get your total probability. I'll leave that exercise to someone else, at least for now.

Another tempting approach is to take the probability of matching birthdays (regardless of gender), subtract the probability of matching birthdays and gender, and trust that the result is the probability of matching birthdays and opposite genders. Unfortunately, as soon as three people are gathered, there is a non-zero probability that they will include a same-sex pair with the same birthday and an opposite-sex pair with the same birthday. The sum of the probabilities of those events is therefore not equal to the probability of any matching birthday.

3
On

Assuming you mean that we choose among an infinite set of people having birthdays uniformly distributed among $\{1,...,365\}$ and genders uniformly distributed among $\{m,f\}$ a randomly chosen person will be represented by coordinates $(x,y)$ where $x$ is the birthday and $y$ is the gender. There are $365\cdot 2$ such pairs. Since birthday and gender are independent these are uniformly distributed as well.

Now let $\overline p(n)$ denote the probability that among $n$ people none of them have the same coordinates. For $n<365\cdot 2$ this can be calculated as $$ \begin{align} \overline p(n)&=1\cdot\left(1-\frac{1}{365\cdot 2}\right)\cdots\left(1-\frac{n-1}{365\cdot 2}\right)\\ &=\frac{(365\cdot 2)!}{(365\cdot 2)^n\cdot(365\cdot 2-n)!} \end{align} $$ And the probability $P_n$ that at least two out of $n$ people have the same gender and birthday will thus be $$ P_n=1-\overline p(n) $$ which I do not care to write out in further detail :)

0
On

PLEASE NOTE. This is really a comment rather than an answer, but it is too long to be posted as a comment. I request that people do not upvote or downvote it. However comments are welcome!


The point I would like to make is that we cannot, as some have suggested, treat the problem of cross-gender birthday coincidences (CGBCs) in the same way as the "classical" birthday problem but with twice as many options: in effect, taking a $730$-day year instead of the normal $365$ days.

First reason: in the $730$-day classical problem, the problem of no coincidences for $n=731$ is clearly zero. However the probability of no CGBCs for $n=731$, though presumably very small, is not zero, as all $731$ people chosen could be of the same gender. The probability is therefore at least $1/2^{730}$, assuming equal distribution of men and women in an essentially infinite population.

Second reason: we compare the classical problem and the CGBC problem for $n=3$. In the classical problem the probability of no coincidence for $n=3$ is $$P(\hbox{no coincidence for $n=2$})\times P(\hbox{no coincidence for $n=3$}\mid\hbox{no coincidence for $n=2$})\ .$$ The first factor is $\frac{729}{730}$. The second is $\frac{728}{730}$, because two of the $730$ dates are already "taken".

In the current problem we have again that the problem of no CGBC for $n=3$ is $$P(\hbox{no CGBC for $n=2$})\times P(\hbox{no CGBC for $n=3$}\mid\hbox{no CGBC for $n=2$})\ ,$$ and again the first factor is $\frac{729}{730}$. But the second is now much harder to calculate since it depends on whether the reason for no CGBC with $n=2$ was same gender, or different birthday. In fact it is probably easier to compute the desired probability from scratch. With $n=3$ there are two mutually exclusive options.

  • We choose three people of the same gender: probability $\frac{1}{4}$.
  • We choose one person of one gender, and two of the other gender, with birthdays not matching that of the first person: probability $\frac{3}{4}(\frac{364}{365})^2$.

The required probability is $$\frac{1}{4}+\frac{3}{4}\Bigl(\frac{364}{365}\Bigr)^2 =\frac{365^2+3\times364^2}{730^2} =\frac{728^2+728+1}{730^2} =\frac{729\times728+1}{730^2}$$ and not $\frac{729\times728}{730^2}$ as before.

0
On

I reconsidered the problem since I realized how solving the same-gender problem is NOT that closely related to solving the cross-gender problem, just like many others have noted.

But instead of following the approaches suggested by others, I have reformulated the setup and derived recurrence relations for my reformulation. Here is what I did:

Let $P_n(m,f)$ denote the probability of ending up with $n$ people with NO pair of man and woman sharing birthday and where the male birthdays occupy $m$ different dates and the female birthdays occupy another $f$ distinct dates.

With the above we have for $n\geq 0$: $$ P_{n+1}(m,f)= \begin{array}{ll} +&P_n(m-1,f)\cdot\tfrac{1}{2}\cdot\tfrac{365-((m-1)+f)}{365} &\text{if }m>0\\ +&P_n(m,f-1)\cdot\tfrac{1}{2}\cdot\tfrac{365-(m+(f-1))}{365} &\text{if }f>0\\ +&P_n(m,f)\cdot\tfrac{1}{2}\cdot\left(\tfrac{m}{365}+\tfrac{f}{365}\right) &\text{if }m+f\leq n \end{array} $$ and this $P_{n+1}(m,f)$ only makes sense for $m+f\leq\min(n+1,365)$ since otherwise we would have more birthdates than either people or days of year. One can easily replace $n+1$ by $n$ and $n$ by $n-1$ in the above, but to make those lines more reader friendly I chose to let the most repeated index be $n$ to shorten notation.

Explanations:

  1. $\tfrac{1}{2}\cdot\tfrac{365-((m-1)+f)}{365}$ is the probability of the $(n+1)$'th person being male and not having one of the $(m-1)+f$ already occupied birthdates thus adding $1$ to number of male dates.

  2. $\tfrac{1}{2}\cdot\tfrac{365-(m+(f-1))}{365}$ has the exact same explanation as the above figure with the word male replaced by female.

  3. $\tfrac{1}{2}\cdot\left(\tfrac{m}{365}+\tfrac{f}{365}\right)$ denotes for one part the probability of the $(n+1)$'th person being a male having one of the $m$ birthdates already occupied by men thus not changing the number. This accounts for the $\tfrac{1}{2}\cdot\tfrac{m}{365}$ part of the figure. A similar argument for adding a female on one of the occupied female birthdates accounts for the other part.

With this we can now define $$P_n=\sum_{m,f} P_n(m,f)$$ as the total probability of $n$ people not having a pair of opposite gender sharing a birthday.

I implemented these definitions in an ipython notebook via the following code:

from __future__ import division
P={(0,0,0):1.0,0:1.0}
maxn=40
for n in range(1,maxn+1):
    P[n]=0
    for m in range(0,min(n,365)+1):
        for f in range(0,min(n-m,365-m)+1):
            P[(n,m,f)]=0
            if m>0:
                P[(n,m,f)]+=P[(n-1,m-1,f)]*0.5*(365-(m-1+f))/365
            if f>0:
                P[(n,m,f)]+=P[(n-1,m,f-1)]*0.5*(365-(m+f-1))/365
            if m+f<n:
                P[(n,m,f)]+=P[(n-1,m,f)]*0.5*(m/365+f/365)
            P[n]+=P[(n,m,f)]
for n in range(1,maxn+1):
    print n, ':', P[n]

producing the following output:

1 : 1.0
2 : 0.998630136986
3 : 0.995896040533
4 : 0.99180893637
5 : 0.986385569532
6 : 0.979648089944
7 : 0.97162390155
8 : 0.962345476415
9 : 0.951850135573
10 : 0.940179798689
11 : 0.927380704889
12 : 0.913503107365
13 : 0.89860094457
14 : 0.882731491026
15 : 0.865954990904
16 : 0.84833427764
17 : 0.829934382971
18 : 0.81082213875
19 : 0.791065774958
20 : 0.770734517272
21 : 0.749898187457
22 : 0.728626809789
23 : 0.706990226541
24 : 0.685057725405
25 : 0.662897681551
26 : 0.640577216773
27 : 0.61816187797
28 : 0.595715336916
29 : 0.573299113048
30 : 0.550972320697
31 : 0.528791441903
32 : 0.5068101257
33 : 0.485079014433
34 : 0.463645597425
35 : 0.442554092023
36 : 0.421845351807
37 : 0.401556801477
38 : 0.381722397735
39 : 0.362372615245
40 : 0.343534456566

where you can see that for 33 people the probability of not having a cross-gender pair passes 50%. So for 33 people it is more likely to have a cross-gender pair sharing a birthday than not to have it.

If we plot $1-P_n$ which is the probability of male and female sharing birthdays using my program for $n=1..70$ it looks as follows:

enter image description here

Perhaps someone could cleverly transform the recurrence relation into something nicer, but that is all I have got for now.