Bucket loads with hashtable

52 Views Asked by At

I have a math problem that derives from the frequency with which buckets are occupied in a hash table. Suppose we have a hash table that has as many buckets as there are nodes, i.e. the nominal load factor is 1.0.
I wrote a small program that simulates and outputs the distribution of the depths of the individual buckets.

#include <iostream>
#include <random>
#include <vector>
#include <map>
#include <iomanip>

using namespace std;

int main()
{
    constexpr size_t N_BUCKETS = 0x1000000;
    vector<size_t> buckets( N_BUCKETS, 0 );
    mt19937_64 mt;
    uniform_int_distribution<size_t> uidHash( 0, -1 );
    for( size_t i = N_BUCKETS; i--; )
        ++buckets[uidHash( mt ) % N_BUCKETS];
    using map_t = map<size_t, ptrdiff_t>;
    map_t mappedDepths;
    for( size_t load : buckets )
        ++mappedDepths[load];
    map_t::const_iterator
        mapBegin = mappedDepths.cbegin(),
        mapIt = mapBegin;
    for( ; mapIt != mappedDepths.cend(); ++mapIt )
    {
        double d = (double)mapIt->second;
        cout << mapIt->first << "\t" << setw( 7 ) << trunc( d / N_BUCKETS * 1'000'000.0 + 0.5 ) / 10'000.0 << "%";
        if( mapIt != mapBegin )
            cout << "\t" << trunc( mapBegin->second / d + 0.5 );
        cout << endl;
    }
}

The program outputs the following:

0       36.7897%
1        36.779%        1
2       18.4064%        2
3        6.1273%        6
4        1.5309%        24
5        0.3074%        120
6        0.0508%        724
7        0.0076%        4856
8        0.0008%        43467
9        0.0001%        342905
10            0%        2.05743e+06
11            0%        6.17228e+06

The first column shows the bucket depth, the second how many nodes there are that have this depth. The third column indicates how large the relation of the bucket depth zero is compared to the current column. This reveals an interesting phenomenon: this specification is the factorial of the bucket depth!
I also wonder why there are as many buckets with no entry as buckets with one entry and why that's about 36%.
Maybe someone here will see the mathematical logic behind it.

1

There are 1 best solutions below

0
On

The expected number of items in slot $i$ is $1$, regardless of $i$. For a large number of slots, the distribution of the number of items in a slot is close to a Poisson distribution with $\lambda = 1$. So, for example, the probability that a slot is empty is $$P(X=0) = e^{-\lambda} = e^{-1} = 0.367879$$

The reason you see the factorials in the third column is that $$\frac{P(X=0)}{P(X=n)} = \frac{e^{-\lambda}}{(1/n!) \lambda^n e^{-\lambda}} = n!$$