Finding minimum value of bloom filter function

183 Views Asked by At

Bloom filter definition from Wikipedia:

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter); the more elements that are added to the set, the larger the probability of false positives.

If m is the number of bits in the array, k is the number of hash functions and we have n entries, then the false positive probability upper bound can be approximated as:

$$ (1 - e^\frac{-kn}{m})^k $$

It further states in the wiki article that:

The number of hash functions, k, must be a positive integer. Putting this constraint aside, for a given m and n, the value of k that minimizes the false positive probability is:

$$ k={\frac {m}{n}}\ln 2 $$

How can I find the k value that minimizes the false positive function ?

1

There are 1 best solutions below

0
On

I don't think that that is correct. Perhaps one of the $k$'s is wrong.

Anyway, here's my take.

Let $f(k) =(1 - e^\frac{-kn}{m})^k =(1 - e^{ak})^k $ where $a = -n/m$.

Then

$\begin{array}\\ f'(k) &=((1 - e^{ak})^k)'\\ &=(e^{k\ln(1 - e^{ak})})'\\ &=(k\ln(1 - e^{ak}))'(e^{k\ln(1 - e^{ak})}) \qquad\text{since } (e^{g(k)})' = g'(k)e^{g(k)}\\ &=(\ln(1 - e^{ak})+k(\ln(1 - e^{ak}))')(e^{k\ln(1 - e^{ak})})\\ &=(\ln(1 - e^{ak})+k\dfrac{(1 - e^{ak})'}{1 - e^{ak}})(1 - e^{ak})^k \qquad\text{since } (\ln(g))' = \dfrac{g'}{g}\\ &=(\ln(1 - e^{ak})+k\dfrac{- ae^{ak}}{1 - e^{ak}})(1 - e^{ak})^k\\ &=\dfrac{\ln(1 - e^{ak})(1 - e^{ak})- kae^{ak}}{1 - e^{ak}}(1 - e^{ak})^k\\ &=0 \qquad\text{when }(1 - e^{ak})\ln(1 - e^{ak})=kae^{ak}\\ \end{array} $

Let $1-e^{ak} = y$, so $k = \ln(1-y)/a $.

Then $(1 - e^{ak})\ln(1 - e^{ak})=kae^{ak} $ becomes $y\ln(y)=(\ln(1-y)/a)(1-y) $ or $y^y =(1-y)^{(1-y)/a} $.

I don't see how to solve this.