What's the probability of guessing a secret code if the attempts are limited and you stop at the first success?

1k Views Asked by At

Some time ago I tried to answer a question on Security Stack Exchange, but I realised I'm not sure I got the maths right, and I'm here to ask for help.

The scenario is as follows: somebody is trying to break into another user's account. That account is protected by a password and by a so-called "second factor" (like a code sent vis SMS, or generated by an app like Google Authenticator), and the hypothesis is that the attacker knows the password, so the only protection is given by this code. If it is a 3-digit code, there are 1000 possible codes, and with a single attempt the probability to guess it is $p = \frac{1}{1000}$. This is easy. The point where I'm stuck at is calculating the probability of guessing the right code if he has more than one attempt. For example, he can try at most 3 times, and after that the account is locked for protection, and it's game over. What's the probability of success, in that case?

Here's what I tried.

First of all, we have to establish whether the correct code is the same for all three attempts, or whether it changes every time. I've explored both cases.

Case A: the code is always the same.

I think this can be modelled with a hypergeometric variable. As Wikipedia says, it's "the probability of k successes in n draws, without replacement, from a finite population of size N that contains exactly K successes, wherein each draw is either a success or a failure". In our case, with N = 1000, K = 1, n = 3, k = 1, it's

$$ \require{cancel} P_{Code\,is\,guessed} = \frac{\binom{1}{1} \binom{1000-1}{3-1}}{\binom{1000}{3}} = \frac{\binom{999}{2}}{\binom{1000}{3}} = \frac{\frac{999!}{2!\cdot\cancel{997!}}}{\frac{1000!}{3!\cdot\cancel{997!}}} = \frac{\frac{\cancel{999!}}{\cancel{2}}}{\frac{1000\cdot\cancel{999!}}{3\cdot\cancel{2}\cdot1}} = \frac{3}{1000} $$

To confirm this, I've drawn a tree of the possible outcomes: at every node there's a guess. The green branch corresponds to guessing the right code, the red branch to not guessing it and proceeding to the next attempt (unless it's the last one): Tree of possible outcomes when the code is the same

The probability of ending up at a certain leaf is the product of all the probabilities of the intermediate nodes, and the probability of reaching any of the "green" leaves is simply the sum, as they are independent events. Therefore,

$$ P_{Code\,is\,guessed} = \frac{1}{1000} + (\frac{\cancel{999}}{1000} \cdot \frac{1}{\cancel{999}}) + (\frac{\cancel{999}}{1000} \cdot \frac{\cancel{998}}{\cancel{999}} \cdot \frac{1}{\cancel{998}}) = \frac{1}{1000} + \frac{1}{1000} + \frac{1}{1000} = \frac{3}{1000} $$

This looks good, because it's the same result. I'm fairly confident it's correct. Is it right?

Case B: the code changes every time

I tried to apply the same reasoning to the case where the code changes at every failed attempt. Intuitively, the fact that the code changes every time means that for his second attempt the attacker can't restrict the range to 999 possible codes, but he has to consider, again, 1000. I think this corresponds to a binomial distribution where he tries to have k = 1 successes in n = 3 attempts, but I already have my doubts: if we guess the right code, we immediately stop, but I think the binomial assumes that we make 3 attempts in any case, that is, even if we guess the right code at the first attempt, the binomial assumes we try (and fail) 2 more times.

Anyway, with $p=\frac{1}{1000}$, n=3, k=1, the probability is

$$ P_{Code\,is\,guessed} = \binom{n}{k} \cdot p^k \cdot (1-p)^{(n-k)} = \binom{3}{1} \cdot \frac{1}{1000} \cdot (\frac{999}{1000})^{(3-1)} = 3 \cdot \frac{1}{1000} \cdot (\frac{999}{1000})^2 = 0.002994003 $$

Again, to confirm the result I tried building the tree: Tree of possible outcomes when the code changes every time

And in this case,

$$ P_{Code\,is\,guessed} = \frac{1}{1000} + \frac{999}{1000} \cdot \frac{1}{1000} + (\frac{999}{1000})^2 \cdot \frac{1}{1000} = 0.002997001 $$

which is different from the previous result, the one that I got using the binomial! This means at least one of these calculations is wrong (and possibly both), but I'm not sure where the error is.

Anyway the tree convinces me more, and as I said I suspect that the problem is that the binomial isn't the right choice here, because it assumes that the attacker always makes 3 attempts, and this is not realistic in this scenario.

So, these are my questions:

  1. Are my calculations correct for case A (the code is always the same)?
  2. For case B, I'm sure there's an error. Where is it?

I even tried simplifying the problem to rolling a die trying to get a certain face, and if the result is not the desired one, rolling one more time. I think the trick might be that if the first roll is successful we don't have to add the probability of guessing at the second roll, because there simply won't be a second roll. But, again, I'm confused.

2

There are 2 best solutions below

4
On

For part A you are correct but there's a more simple solution. Your guesses are three different numbers from $1$ to $1000.$ The right answer is a random one of the $1000$ numbers. You cover three of them with your answer, so there's a $3/1000$ chance you cover the right answer and a $997/1000$ chance you don't.

For B it is correct that you should model this as independent bernoulli trials with $p=1/1000,$ to which the binomial distribution is applicable. And you're right that we don't want the probability of one correct guess. Rather, we want the probability of one or more correct guesses, because while you only need to get it right once to have succeeded in your goal of correctly guessing the password, you must also count the (very rare) situations where you succeed twice or three times as successes. The easiest way to calculate this is to realize it's the same thing as the probability that you won't fail three times in a row. The probability that you fail three times in a row is $(999/1000)^3$ so the probability that you succeed one or more times is $$ 1-(999/1000)^3$$ which is the same as your second answer that you arrived at by the tree.

0
On

Here is a different way to work out part B:

Consider the proposition: \begin{align} A_i\equiv\textrm{$i$-th guess is correct, }i\in\{1,2,3\} \end{align}

In what follows sum of propositions indicates logical OR between them and product of propositions indicates logical AND between them. $\overline{()}$ indicates logical negation of the proposition. Since you stop at the first correct guess, the proposition that the code is guessed right is $A_1+\bar{A}_1A_2+\bar{A}_1\bar{A_2}A_3$. The probability that the code is guessed right in one of three attempts, with stopping at first success, is: \begin{align} P(A_1+\bar{A}_1A_2+\bar{A}_1\bar{A_2}A_3)&=P(A_1)+P(\bar{A}_1A_2)+P(\bar{A}_1\bar{A_2}A_3)\quad\textrm{(mutually exclusive)}\\ &=P(A_1)+P(\bar{A}_1)P(A_2)+P(\bar{A}_1)P(\bar{A_2})P(A_3)\quad\textrm{(independence)}\\ &=\frac{1}{1000}+\frac{999}{1000}\times\frac{1}{1000}+\left(\frac{999}{1000}\right)^2\times\frac{1}{1000}\\ &=\frac{2997001}{1000000000}\approx 0.003 \end{align}