Estimate number of guesses for credentials

83 Views Asked by At

Each account credential consists of User Number and Date of Birth.

User Number is a unique, sequential number that must be in the range of 1,000,000 to 22,000,000. So we know there are 21,000,000 existing User Numbers. A User Number will only ever exist in one credential.

Date of Birth can be any date between 1/1/1920 and 1/1/2002. So that's 29,952 possible birth dates. There are many valid account credentials for each Date of Birth. For instance, there may be 100 valid accounts for users born on 5/5/1955.

I am able to guess credentials and receive a correct/incorrect response. If I were to systematically take each Date of Birth and then guess all the possible User Numbers to figure out which are correct and which are incorrect, how many guesses would it take in the worst-case scenario?

Here's a simple illustration of the algorithm:

Guess 1: 1/1/1920 - 1000000 - Incorrect
Guess 2: 1/1/1920 - 1000001 - Correct
Guess 3: 1/1/1920 - 1000002 - Correct
Guess 4: 1/1/1920 - 1000003 - Incorrect
...
Guess N1: 1/2/1920 - 1000000 - Incorrect
Guess N2: 1/2/1920 - 1000003 - Incorrect
Guess N3: 1/2/1920 - 1000004 - Incorrect

Some quick multiplication tells us that this will result in 628,992,000,000 guesses. Although, that figure doesn't account for a heuristic that should significantly reduce the number of required guesses. Knowing that each User Number can only be used in one credential pair, we needn't guess a User Number again once we've found its matching Date of Birth. I've attempted to show this in the illustration above via Guess 2 and Guess 3.

Let's assume that the User Numbers are distributed equally among the birth dates. So every day should have 701 valid numbers (21,000,000/29,952).

If we wanted to find all of the valid credentials, how would we figure out a worst-case or expected-case guessing scenario? Obviously, the guessing algorithm will become more efficient as additional correct credential pairs are discovered.

2

There are 2 best solutions below

5
On BEST ANSWER

Suppose there's only one user number. You'll need to guess $29951$ dates at most, because after guessing $29951$ dates wrong you'll know that the $29952$nd date is the correct date. If you get any guess right, you can stop after that. Assuming the record date is uniformly random from among the possible dates, the expected number of guesses you'll need is

$$\frac{1+2+3+4+5+6+\ldots+29949+29950+29951+2995\color{red}{1}}{29952}.$$

If there are $21000000$ user numbers, you'll just have to repeat the above process $21000000$ times, once for each user number, so you'll need $21000000$ times the guesses both worst-case and expected-case.

0
On

Worst-case: Every single one happens to be on the last date you check. You have to do $29951 \cdot 21000000 = 628971000000$ checks before you have one date left and you know that everyone must be from that date.

Expected-case: @Magma is correct. Here is another justification for only looking at one credential - suppose the random variable $X_i$ denotes the number of dates you tested with credential $i$ before finding the correct value. Then, the expected total number of tests you need to do is $E(X_1 + X_2 + \dots + X_{21000000})$. By linearity of expectation that is $E(X_1) + E(X_2) + \dots + E(X_{21000000})$ and if we assume no special knowledge about any credential, all of those are equal and we get an expected number of $21000000\cdot E(X_1)$. Now, the expected number of tests for a single credential is $\frac{29951}{2}$, so the expected number of tests total is half the worst-case scenario, $\frac{29951 \cdot 21000000}{2}$.