How To Calculate Binomial Distribution Of Really Small %?

I asked this question on the Bitcoin Forum, but I think it's more appropriate for a mathematics forum.

I'm making an informative video and I need a binomial distribution calculation. I want to find out how many trials are needed to get a 1%, 50%, and 90% likelihood of one or more successes. The problem is that the probability of success is 1 in 2^160 (the number of distinct Bitcoin/Ethereum addresses).

Normally for something like this, I would use a binomial distribution calculation in Excel using this formula:

=1-BINOM.DIST(0,????,2^-160,TRUE)

I would then tinker with the ???? until the cell returned 1%, 50% or 90%. However, Excel can't handle numbers anywhere near this large. Does anyone know of a way to calculate the number of trials required for these three percentages, given the infinitesimally small chance of success? It would be great if there were an online tool I could use to support my results.

Just to illustrate what I'm looking for: if this analysis were for something much simpler, say a 1% probability of success, then I could calculate the results to be:

  • 229 trials needed for 90%, | 89.99%=1-BINOM.DIST(0,229,0.01,TRUE)
  • 69 trials needed for 50%, | 50.01%=1-BINOM.DIST(0,69,0.01,TRUE)
  • 1 trial needed for 1%, | 1.00%=1-BINOM.DIST(0,1,0.01,TRUE)
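The hand-tinkering above can also be scripted. Here is a small Python sketch (my own illustration, not part of the original question) that inverts $1-(1-p)^n \ge T$ directly instead of searching by trial and error:

```python
from math import ceil, log

def trials_needed(p, target):
    """Smallest n such that 1 - (1 - p)**n >= target."""
    # 1 - (1-p)**n >= target  <=>  n >= log(1 - target) / log(1 - p)
    return ceil(log(1 - target) / log(1 - p))

for target in (0.90, 0.50, 0.01):
    n = trials_needed(0.01, target)
    print(f"{target:.0%}: n = {n}, actual = {1 - 0.99 ** n:.4%}")
```

Note that rounding up yields n = 230 for the 90% case, because 229 trials only reach 89.99% (just under the target, as the Excel row shows); the 50% and 1% cases give 69 and 1, matching the values above.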
There are 5 answers below.

Accepted answer:

By manual tinkering in R, I get the following values:

pbinom(0,3.365231884e48,2^(-160), FALSE)   = 0.9000000000339017
pbinom(0,1.0130357393e48,2^(-160), FALSE)  = 0.5000000000001161
pbinom(0,1.46885823057e46,2^(-160), FALSE) = 0.01000000000005571

As a check

pbinom(0,229,0.01, FALSE) = 0.8998941257385102
pbinom(0,69,0.01, FALSE)  = 0.5001629701008011
pbinom(0,1,0.01, FALSE)   = 0.01
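These tail probabilities can also be cross-checked in ordinary double precision, for example in Python, by working in log space with math.log1p and math.expm1 (my own sketch, not part of this answer; note that 2^-160 is tiny but still representable as a double):

```python
from math import log1p, expm1

p = 2.0 ** -160  # ~6.8e-49, well within double-precision range

def p_at_least_one(n, p):
    """P(X >= 1) for X ~ Binomial(n, p), i.e. 1 - (1-p)^n, computed in log space."""
    # (1-p)^n = exp(n * log(1-p)); log1p/expm1 preserve precision for tiny p
    return -expm1(n * log1p(-p))

print(p_at_least_one(3.365231884e48, p))    # ~0.9
print(p_at_least_one(1.0130357393e48, p))   # ~0.5
print(p_at_least_one(1.46885823057e46, p))  # ~0.01
```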
Answer:

Letting $p$ be the probability of success on one trial, and letting $T$ be the desired probability that you want to exceed (for at least one success in $n$ trials), we need $$1-(1-p)^n>T$$ This can be rewritten as $(1-p)^n<1-T$.

Taking logarithms, we want $n\ln(1-p)<\ln(1-T)$,

which is equivalent to $n>\frac{\ln(1-T)}{\ln(1-p)}$ (the inequality flips because $\ln(1-p)$ is negative).

If $p$ is very small, you get a very good approximation to $\ln(1-p)$ with a degree-one Taylor approximation: $\ln(1-p)\approx -p$. (The next term of the Taylor series is $-\frac{p^2}{2}$, which can be considered negligible; the overall error is also around this size.)

So you would want $n$ to be around $\frac{\ln(1-T)}{-p}$

In the case of $p=2^{-160}$, this gives $n > -2^{160}\cdot \ln(1-T)$
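Plugging the actual $p=2^{-160}$ into this bound is straightforward in Python (a sketch of my own; math.log1p evaluates $\ln(1-p)$ accurately even for such a tiny $p$), and the results agree with the accepted answer's trial counts:

```python
from math import log, log1p

p = 2.0 ** -160

# n > log(1-T) / log(1-p); log1p(-p) computes log(1-p) without underflow
for T in (0.01, 0.50, 0.90):
    n = log(1 - T) / log1p(-p)
    print(f"T = {T}: n > {n:.6e}")
```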

Answer:

I observe that $229\times 0.01 \approx 2.3 \approx 3.36\times 10^{48}\cdot 2^{-160}$, so I believe @gammatester's numbers. What I don't believe is that we have to search $229\%$ of the address space to be only $90\%$ confident of finding the one address for which we search.

(More precisely, $\log_2\left(3.365231884\times 10^{48}\right) = 161.203\dots$, so the required number of trials is $2^{1.203}\approx 2.30$ times the size of your address space.)

The process you describe with this binomial distribution is randomly picking an address, testing it, then placing it back into the population of addresses that may be picked in the future. A substantially more efficient method is to check addresses sequentially -- meaning that we are $100\%$ certain to have found the address after checking $100\%$ of the address space. In fact, we are $1\%$ certain after checking $1\%$ of the space, $50\%$ certain after $50\%$, and so on.
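To put numbers on that comparison, here is a small Python illustration (my own sketch; it uses a toy $2^{16}$ address space, but the ratios are the same for $2^{160}$):

```python
from math import log

N = 2 ** 16  # toy address space; only the ratios matter

for q in (0.01, 0.50, 0.90):
    sequential = q * N                  # distinct checks: probability q after q*N trials
    replacement = N * log(1 / (1 - q))  # binomial model: n ~ (1/p) * ln(1/(1-q))
    print(f"q = {q}: sequential = {sequential:.0f}, "
          f"with replacement = {replacement:.0f}, ratio = {replacement / sequential:.2f}")
```

At the $90\%$ confidence level, sampling with replacement costs roughly $2.56$ times as many trials as a sequential sweep; at $1\%$ the two are nearly identical.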

Answer:

For a $\text{bin}(n,p)$, $P(0\text{ successes}) = (1-p)^n$

To have a $q = 0.01$, $0.50$, or $0.99$ chance of one or more successes, you would correspondingly have a $1-q$ -- i.e. $0.99$, $0.50$, or $0.01$ -- chance of $0$ successes (complementary probabilities of the complementary event).

For $P(0\text{ successes}) = 1-q$ we set $1-q = (1-p)^n$, or $n=\log(1-q)/\log(1-p)$ (for any base of logs you like). I'll work with natural logs.

Now $\log(1-p)\approx -p = -2^{-160}$ (to a high degree of accuracy), so we have $n\approx 2^{160}\log(\frac{1}{1-q})$.

Now for $q=0.01, 0.5$ and $0.99$: $\log(\frac{1}{1-q}) \approx 0.01005, 0.69315$ and $4.60517$.

More generally, for very small $p$ you need $n\approx \frac{1}{p}\log(\frac{1}{1-q})$. You might try it for, say, $p=2^{-16}$, where it's still reasonably accurate; it's much more accurate for very small $p$.

So (calculating in R):

n <- (2^16)*log(1/(1-c(.01,.5,.99)))
n
[1]    658.6588  45426.0936 301804.4333

(1-(2^-16))^n
[1] 0.989999924 0.499997356 0.009999649

or, since $n$ can't be fractional, if we truncate:

(1-(2^-16))^trunc(n)
[1] 0.990009876 0.499998070 0.009999715

These give essentially the $1-q$ values we required.

Answer:

The 'bc' Unix utility can be used to find the minimum number of trials using paw88789's formula:

$n > \ln(1-T) / \ln(1-p).$

#!/bin/sh
# Evaluate l(1-t/100)/l(1-p) in bc at 1000-digit precision, then strip
# bc's backslash line continuations and the fractional part of each result.
bc -l <<EOF |
scale=1000

p=1/(2^(160))

t=1;  print t, "/100: minimum trials needed "; l(1-t/100)/l(1-p); print "|\n";
t=50; print t, "/100: minimum trials needed "; l(1-t/100)/l(1-p); print "|\n";
t=90; print t, "/100: minimum trials needed "; l(1-t/100)/l(1-p); print "|\n";

EOF
tr -d '\\\n' | tr '|' '\n' | sed 's@\..*@@'

$ sh ./2845598

1/100: minimum trials needed 14688582305617833934466621173201888921196164071
50/100: minimum trials needed 1013035739299659071135698605846798551586536899366
90/100: minimum trials needed 3365231883504527125129111981421568120141379515366