Estimate the probability of duplicate patients


I would like to estimate the probability that two patients in a system have the same first and last name and birthdate.

This is not easy, as names are not evenly distributed and I don't know how many distinct names exist. So I thought I would estimate a worst-case scenario as follows, and I would like to know whether this makes sense:

The last name Müller (German) is the most common last name, with 1.6%. The first name Marie (also German) is currently the most common first name, with 2.6%. As birthdates are not evenly distributed across a year, I estimate a probability of 1/300 for the average birthdate.

Now I just multiplied those three to get an estimate for the case that the most common combination has a duplicate. Does this make sense?
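To make the multiplication concrete, here is a minimal sketch using the three figures quoted above. It assumes the three attributes are independent; the variable names are only illustrative. Note that the plain product is the probability that one randomly chosen patient has this combination; the probability that two randomly chosen patients both have it is the square of that product.

```python
# Figures quoted in the question, treated here as population shares.
p_last = 0.016          # share of people with the last name "Müller"
p_first = 0.026         # share of people with the first name "Marie"
p_birthdate = 1 / 300   # rough per-day probability used in the question

# Probability that ONE randomly chosen patient has exactly this combination,
# assuming (as a simplification) that the three attributes are independent.
p_combo = p_last * p_first * p_birthdate

# Probability that TWO randomly chosen patients both have this combination.
p_pair = p_combo ** 2

print(f"single patient: {p_combo:.3e}")
print(f"both of two patients: {p_pair:.3e}")
```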

Best answer

Let $f_i, l_j, b_k$ be the probabilities that a person has the $i$-th first name, the $j$-th last name, and the $k$-th birthday. Then the probability that two randomly chosen people in the database both have a given combination of the three characteristics is: $$ (f_i l_j b_k)^2, $$ and the probability that two randomly chosen people share all three characteristics, whatever they are, is: $$\sum_{i,j,k}(f_i l_j b_k)^2. $$

This assumes that all three characteristics are independent, though in reality the first name and the birthday can be strongly correlated.
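The sum above can be evaluated numerically once the three distributions are known. The sketch below is illustrative only: the toy distributions are made up, and the function names are hypothetical. Under the independence assumption the triple sum factorizes into $\sum_i f_i^2 \cdot \sum_j l_j^2 \cdot \sum_k b_k^2$, which the code uses. The step from one pair to a database of $n$ patients is not part of the answer above; it is a standard Poisson-style approximation that treats the $\binom{n}{2}$ pairs as roughly independent.

```python
import math

def pair_match_probability(first, last, birth):
    """Probability that two random people share first name, last name and birthday.

    Equals sum over i, j, k of (f_i * l_j * b_k)^2, which factorizes
    into three separate sums when the attributes are independent.
    """
    return (sum(f * f for f in first)
            * sum(l * l for l in last)
            * sum(b * b for b in birth))

def approx_any_duplicate(n, q):
    """Approximate probability of at least one matching pair among n people.

    Assumption: the C(n, 2) pairs are treated as independent, giving
    1 - exp(-C(n, 2) * q). This is an approximation, not an exact result.
    """
    return -math.expm1(-math.comb(n, 2) * q)

# Toy distributions (made up for illustration, not real name statistics).
first = [0.5, 0.5]          # two equally likely first names
last = [0.5, 0.5]           # two equally likely last names
birth = [1 / 365] * 365     # uniform birthdays

q = pair_match_probability(first, last, birth)
print(f"pair match probability: {q:.6f}")
print(f"approx. P(any duplicate among 100): {approx_any_duplicate(100, q):.4f}")
```

With real data, `first`, `last` and `birth` would be replaced by the observed relative frequencies; a few very common names dominate the sums of squares, so the result is larger than a uniform-distribution estimate would suggest.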