Combining statistical distributions


I have a situation where a quantity depends on two random variables, one of which follows a Poisson distribution and the other a normal distribution, and I want to establish how to calculate the mean and spread of the dependent variable.

Specifically

I have an algorithm which matches names against a bad guy list. The bad guy list is static. For a single name, the matching algorithm can identify 0, 1, 2, or more matches, i.e. a Poisson distribution. So if I only apply one name, I can calculate the lambda for the distribution.

But I don't just apply one name; I apply many names, once per day. So on day 1 I might have 100 names, on day 2, 150, etc. The number of names applied each day follows a normal distribution, for which I can calculate the mean and s.d.

What I want to find is the number of matches I can expect each day, and the potential spread, i.e. the result of combining the two distributions.

The reason for doing this is so that I can determine how many people I need to review the matches, given that it takes a set amount of time to review each one. Getting the calculation wrong can be costly, or can increase the risk that we fail to identify a bad guy because we don't have enough staff.


There are 1 best solutions below


Some starting points: The ten largest airports in the world average 70,124,224 passengers per year (2013; ATL, PEK, LHR, HND, ORD, LAX, DXB, CDG, DFW, CGK)$^1$, or 192,121 passengers per day. Assume that 1 in 1,000 passengers is a bad guy (adjust as appropriate). It is reasonable to assume that $A$, the number of passenger $A$rrivals at the airport's checkpoint, is Poisson distributed with mean $\lambda$. Then the number of $B$ad guys matched, given the number of $A$rrivals at the checkpoint, is Binomial, and the conditional probability of matching $k$ bad guys among $j$ arrivals is:

$P_{Binomial}(B = k | A = j) = \dfrac{j!}{k!(j-k)!} p^k(1-p)^{j-k}$

with probability of success $p$ = 0.001. The unconditional probability distribution of $B$ad guys matched at the checkpoint will be:

$P(B=k) = \sum_{i=k}^{\infty} P_{Binomial}(B = k | A = i)P_{Poisson}(A = i)$

or

$P(B=k) = \sum_{i=k}^{\infty} \dfrac{i!}{k!(i-k)!} p^k(1-p)^{i-k} \dfrac{\lambda^i e^{-\lambda}}{i!}$

Let $m=i-k$. Then

$P(B=k) = \sum_{m=0}^{\infty} p^k(1-p)^m \dfrac{\lambda^{k+m} e^{-\lambda}}{k!m!} = \dfrac{(\lambda p)^ke^{-\lambda}}{k!} \sum_{m=0}^{\infty} \dfrac{(\lambda(1-p))^m}{m!} = \dfrac{(\lambda p)^ke^{-\lambda}}{k!} e^{\lambda (1-p)}$

so that, since $e^{-\lambda} e^{\lambda(1-p)} = e^{-\lambda p}$,

$P_{Poisson}(B=k) = \dfrac{(\lambda p)^ke^{-\lambda p}}{k!}$
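
As a sanity check on the derivation, the mixture sum can be evaluated numerically and compared with the closed-form Poisson probability. The sketch below is only illustrative: the function names, the truncation of the infinite sum at 400 terms, and the test values $\lambda = 50$, $p = 0.1$ are assumptions chosen for convenience, not part of the original answer.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def mixture_pmf(k, lam, p, terms=400):
    """Truncated version of sum_{i>=k} Binomial(k | i, p) * Poisson(i | lam)."""
    return sum(
        binomial_pmf(k, i, p) * poisson_pmf(i, lam)
        for i in range(k, k + terms)
    )

# Illustrative values (assumed, small enough that 400 terms cover the tail).
lam, p = 50.0, 0.1

for k in range(15):
    lhs = mixture_pmf(k, lam, p)      # mixture over the number of arrivals
    rhs = poisson_pmf(k, lam * p)     # closed form: Poisson(lam * p)
    print(f"k={k:2d}  mixture={lhs:.12f}  poisson={rhs:.12f}")
```

The two columns should agree up to floating-point rounding, which is what the algebra above predicts.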

Hence, $B$ is Poisson distributed with mean $\lambda p$ and standard deviation $\sqrt{\lambda p}$. A reasonable range for the number of $B$ad guys matched at the checkpoint is then $\lambda p \pm 3\sqrt{\lambda p}$ for sufficiently large $\lambda$. For example, if 192,121 passengers are expected to arrive ($\lambda$) and the probability of any arrival being a bad guy is 0.001 ($p$), then $\lambda p \approx 192$ bad guys are expected, plus or minus $3\sqrt{192.121} \approx 42$ bad guys.
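
The range above is simple to compute, and it can be turned into a rough staffing figure of the kind the question asks about. This is a minimal sketch under stated assumptions: the function names are made up for illustration, and the review time (5 minutes per match) and shift length (480 minutes) are hypothetical placeholders to be replaced with real figures.

```python
import math

def poisson_band(lam, p, z=3.0):
    """Mean, standard deviation and mean +/- z*sd band for B ~ Poisson(lam * p)."""
    mean = lam * p
    sd = math.sqrt(mean)
    low = max(0.0, mean - z * sd)   # a count cannot be negative
    high = mean + z * sd
    return mean, sd, low, high

def reviewers_needed(matches, minutes_per_review, minutes_per_shift):
    """Whole number of reviewers needed to clear `matches` within one shift."""
    return math.ceil(matches * minutes_per_review / minutes_per_shift)

# Values from the worked example in the answer.
mean, sd, low, high = poisson_band(lam=192_121, p=0.001)
print(f"expected matches: {mean:.1f}, sd: {sd:.1f}")
print(f"3-sigma range: {low:.1f} to {high:.1f}")

# Hypothetical staffing inputs (assumptions, not from the original post).
staff = reviewers_needed(high, minutes_per_review=5, minutes_per_shift=480)
print(f"reviewers needed to cover the upper end: {staff}")
```

Sizing against the upper end of the band rather than the mean is a deliberately conservative choice, since the question treats understaffing as the costlier mistake.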

$^1$ Wikipedia: http://en.wikipedia.org/wiki/World%27s_busiest_airports_by_passenger_traffic#2013_statistics