What is the chance that 2 scanned licence plates from 2 different cameras are from the same vehicle?

Suppose I have two ANPR (automatic number plate recognition) cameras, which scan the licence plates of vehicles. Most of the scanned vehicles have licence plates containing 6 characters, but some have a different number. Also, the scans are not perfect, so the result of the ANPR software is a list of the characters of the licence plate and, for each character, a probability that it is the correct character.

For example, if a camera gives me the result ['A', 'B', 'C', '1', '2', '3'] and [0.89, 0.78, 0.94, 0.56, 0.78] then the chance that the first character is an A is 89% and that it isn't an A is 11%, the chance that the second character is a B is 78%, etc.

Suppose I get 2 results in this format, for licence plates of the same length (the same number of characters), from 2 different cameras. How do I calculate the chance that they are the licence plate of the same vehicle?

Second, how do I calculate the chance that they are the licence plate of the same vehicle if the lengths of the results from the 2 different cameras are not the same? This can happen, for example, when one character is missed by a camera.

Best answer

The second question is a combinatorics hell, and you should make several assumptions as @TonyK mentioned in the comments to even start to tackle the problem. I will try to answer the first, which also requires several assumptions to be made but is much easier to answer.

Let's say a license plate is composed of $n$ characters. You observe license plates in the wild, which are samples from the random vector $\boldsymbol{C} = (C_1, \ldots, C_n)$, corrupted by your devices/detectors (the software of the cameras). That is, the sample $\boldsymbol{c} = (c_1,\ldots,c_n)$ may be corrupted by camera A and what you get is $\boldsymbol{c}^A = (c_1^A,\ldots,c_n^A)$, and if it is corrupted by camera B you get $\boldsymbol{c}^B = (c_1^B,\ldots,c_n^B)$. You want to know if, given two corrupted samples $\boldsymbol{x}^A$ and $\boldsymbol{y}^B$, they come from the same original, true sample. That is, you want to compute: $$P(\boldsymbol{x} = \boldsymbol{y} | \boldsymbol{x}^A, \boldsymbol{y}^B)$$

We can express this differently by summing over the values the random vector $\boldsymbol{C}$ can take:

\begin{align} P(\boldsymbol{x} = \boldsymbol{y} | \boldsymbol{x}^A, \boldsymbol{y}^B) &= \sum_{\boldsymbol{s}} P(\boldsymbol{x} = \boldsymbol{s}, \boldsymbol{y} = \boldsymbol{s} | \boldsymbol{x}^A, \boldsymbol{y}^B) \\ &= \sum_{\boldsymbol{s}} P(\boldsymbol{x} = \boldsymbol{s} | \boldsymbol{x}^A) P(\boldsymbol{y} = \boldsymbol{s} | \boldsymbol{y}^B) \\ &= \sum_{s_1,\ldots,s_n} P((x_1,\ldots,x_n) = (s_1,\ldots,s_n) | \boldsymbol{x}^A) P((y_1,\ldots,y_n) = (s_1,\ldots,s_n) | \boldsymbol{y}^B) \\ &= \sum_{s_1,\ldots,s_n} \prod_i P(x_i = s_i | x_i^A)P(y_i = s_i | y_i^B) \end{align}

We have made two assumptions here. First (second equality), that $\boldsymbol{x}$ and $\boldsymbol{y}$ are independent given the observations. This may not be the case, since observing a set of plates with one camera constrains what the other will likely see: for example, if the cameras are near each other, cars that appear in one might appear in the other. Second (last equality), that the individual characters in a plate are independent of each other. That might also be false, since plate numbers are generally assigned sequentially.
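As a side remark on the last line (a rewriting under the same two assumptions, not an extra result): because the summand is a product of terms that each involve a single position $i$, the distributive law lets the sum over whole plates be exchanged with the product over positions:

$$\sum_{s_1,\ldots,s_n} \prod_i P(x_i = s_i | x_i^A)P(y_i = s_i | y_i^B) = \prod_{i=1}^{n} \sum_{s_i} P(x_i = s_i | x_i^A)P(y_i = s_i | y_i^B)$$

So instead of enumerating every possible plate, it is enough to compute one sum over the character set for each of the $n$ positions and multiply the results.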

Now, we need to know $P(x_i = s_i | x_i^A)$, that is, the probability that the true $i$-th character is $s_i$ given that the observed $i$-th character is $x_i^A$. The best way to obtain these values would be to calibrate them by running the system on an annotated dataset. However, we already know $P(x_i = s_i | x_i^A)$ when $s_i = x_i^A$, because that is the score your algorithm returns for character $i$. For the other cases we can make (yet another!) assumption: that the remaining probability is distributed equally among the other characters. That is: $$P(x_i = s_i | x_i^A) = \begin{cases} \text{score}_A(i)\quad &\text{if } s_i = x_i^A\\ \frac{1-\text{score}_A(i)}{M-1}\quad &\text{otherwise} \end{cases}$$
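The piecewise definition above can be sketched as a small Python function. The name `char_posterior`, its parameters, and the 36-symbol alphabet in the example are illustrative assumptions, not part of any ANPR library; `alphabet_size` is the size of the character set.

```python
def char_posterior(candidate: str, observed: str, score: float, alphabet_size: int) -> float:
    """P(true character == candidate | observed character and its score)."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    if candidate == observed:
        # The camera's own score for the character it reported.
        return score
    # Assumption from the text: the leftover probability mass is spread
    # evenly over the other alphabet_size - 1 characters.
    return (1.0 - score) / (alphabet_size - 1)


# Hypothetical example: 36 symbols (A-Z plus 0-9), camera read 'A' with score 0.89.
p_same = char_posterior("A", "A", 0.89, 36)
p_other = char_posterior("B", "A", 0.89, 36)
```

By construction the values over the whole character set sum to 1, which is a quick sanity check on any calibrated replacement for the uniform assumption.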

Here $M$ is the size of the character set. The same reasoning applies to $P(y_i = s_i | y_i^B)$. Naturally, this is not amenable to calculation by hand on a piece of paper, but you can easily embed the computation in your software.
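A minimal sketch of such a computation, under the same assumptions as above (independent scans, independent characters, leftover probability spread uniformly). It relies on the fact that a sum over all plates of a product of per-position terms equals the product of the per-position sums (distributive law), so it loops over positions rather than over every possible plate. The function names, the alphabet of uppercase letters and digits, and the example scores are all hypothetical.

```python
import string
from math import prod
from typing import Dict, Sequence

# Assumption: plates use uppercase letters and digits, so M = 36.
ALPHABET = string.ascii_uppercase + string.digits


def position_posterior(observed: str, score: float, alphabet: str) -> Dict[str, float]:
    """Distribution over the true character at one position of one scan."""
    if observed not in alphabet:
        raise ValueError(f"observed character {observed!r} is not in the alphabet")
    # Uniform share of the leftover mass for every non-observed character.
    rest = (1.0 - score) / (len(alphabet) - 1)
    return {s: (score if s == observed else rest) for s in alphabet}


def same_plate_probability(
    chars_a: Sequence[str],
    scores_a: Sequence[float],
    chars_b: Sequence[str],
    scores_b: Sequence[float],
    alphabet: str = ALPHABET,
) -> float:
    """Probability that two equal-length scans come from the same true plate."""
    if len({len(chars_a), len(scores_a), len(chars_b), len(scores_b)}) != 1:
        raise ValueError("all four sequences must have the same length")
    per_position = []
    for a, sa, b, sb in zip(chars_a, scores_a, chars_b, scores_b):
        post_a = position_posterior(a, sa, alphabet)
        post_b = position_posterior(b, sb, alphabet)
        # Probability that both true characters at this position agree.
        per_position.append(sum(post_a[s] * post_b[s] for s in alphabet))
    return prod(per_position)


# Hypothetical scans of a six-character plate from cameras A and B.
p = same_plate_probability(
    list("ABC123"), [0.89, 0.78, 0.94, 0.56, 0.78, 0.90],
    list("ABC123"), [0.95, 0.81, 0.90, 0.70, 0.85, 0.88],
)
print(f"P(same plate) = {p:.4f}")
```

The cost is proportional to $n \cdot M$, and the uniform `position_posterior` could be swapped for calibrated per-character distributions without changing `same_plate_probability`.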