Formula for the probability of two people with the same name in a group.

I'm looking for a formula to calculate the probability that two people in a group share the same full name. Specifically, my question is: in a group of $X$ people (ranging from $25$ to $250,000$), if I select one person at random, what is the probability that someone else in the group has the same name? This will of course depend on how common the randomly selected person's name is, as well as on the size of $X$.

To solve this, I have census data that tells me the number of people with any particular given (first) name and the number with any particular surname. Unfortunately, I can't get the combined figure (the number of people with any particular full name).

I imagine I first need to combine the frequencies of the two names to get an estimate of how common the full name is, and then scale this to the group size. I'm not sure whether this problem is some modification of the birthday paradox, or whether that is a bad place to start, because I'm looking for a match to one specific name and the number of potential names is effectively unlimited. Either way, I'm not really sure where to take it from here, but hopefully brighter minds can give guidance.
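To make the idea concrete, here is a rough Python sketch of the two steps I have in mind. It rests on assumptions I can't verify from my data: that given name and surname are independent (so their population shares multiply), and that the group is a random sample of the census population. Under those assumptions, if the selected person's full name has population share $p$, the chance that at least one of the other $X-1$ people has it is $1-(1-p)^{X-1}$. The function names and the example counts are made up for illustration.

```python
import math


def full_name_share(first_count, surname_count, population):
    """Estimated share of the population with a given full name.

    Assumes given name and surname are independent, so the two
    individual shares are multiplied. This is an assumption, not a
    property of the census data.
    """
    return (first_count / population) * (surname_count / population)


def prob_someone_else_shares(p_full, group_size):
    """Probability that at least one of the other group_size - 1
    people has the selected person's full name: 1 - (1 - p)^(X - 1).
    """
    if group_size <= 1:
        return 0.0
    if p_full >= 1.0:
        return 1.0
    # log1p/expm1 keep precision when p_full is very small.
    return -math.expm1((group_size - 1) * math.log1p(-p_full))


# Hypothetical counts: 1,000,000 people with the given name and
# 100,000 with the surname, in a population of 100,000,000.
p = full_name_share(1_000_000, 100_000, 100_000_000)
for x in (25, 2_500, 250_000):
    print(x, prob_someone_else_shares(p, x))
```

Because only one specific name has to be matched, this is simpler than the birthday problem, which asks whether any pair in the group matches.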