Estimate password entropy by "trying out" passwords

105 Views Asked by At

The entropy of a password of a fixed length $n$ and $c$ possible characters is calculated by $n*log_2(c)= log_2(c^n)$, see also here: https://ritcyberselfdefense.wordpress.com/2011/09/24/how-to-calculate-password-entropy/

Assuming that I have a kind of "blackbox" that has unknown requirements on a password, especially like: Min occurences of a certain group of characters, different possible characters for different character positions, etc. I can give this blackbox a password that I generated and ask if it is an accepted one or not.

Is it possible to estimate the entropy of the "room" of available passwords by generating an amount of random passwords (lets say 1 million) and see how many have been accepted? What I do know of the room of available passwords is which characters may be used, so I only have to try passwords with these characters, not with the whole UTF-8 table. From this ratio $r$ of tried and accepted password, I can calculate the entropy with a formular like this:

$e_{estimated} = log_2(r*c^n)$

From my tests this formular works quote well for an estimation of the entropy. I am looking for another formular that can tell me how many passwords I have to try to make an assumtion like this:

With a probability of $p$, $|e_{actual} - e_{estimated}| < epsilon$

Is something like this even possible?

1

There are 1 best solutions below

3
On BEST ANSWER

Yes, this is entirely possible. As long as your test passwords are randomly distributed in the space, you will get the measure you are seeking. The hard part comes if the requirements on passwords allow most or very few of them to pass. You will detect that, but will not get a good measure.

As an example, let us assume a character set of $26$ upper case letters, $26$ lower case letters, $10$ numbers and $10$ symbols, for a total of $62$ characters, and passwords $12$ characters long. If the requirement is just one symbol or number, there are $62^{12}\approx 3.23 \cdot 10^{21}$ character strings, of which $62^{12}-52^{12}\approx 2.84 \cdot 10^{21}$ are valid passwords. That is about $90\%$ valid, and your sampling will work well. If we allowed all passwords, your sampling would show that but you might worry that since you only sampled a tiny fraction of the space there are some invalid ones out there.

To estimate the error, you can use the binomial distribution. If you do $N$ samples and the fraction of valid passwords is $p$, the standard deviation of the number of valid ones found is $\sqrt{Np(1-p)}$ This is valid when $N$ is large and $p, 1-p$ are not too close to zero. Since $\sqrt {p(1-p}$ has a maximum of $1/2$, you can say the standard deviation of the fraction of valid passwords is less than $2/\sqrt N$