Where does the bias come from in LogLog and HyperLoglog?

209 Views Asked by Bumbble Comm At 25 Mar 2026 - 11:10

In the original LogLog paper from P. Flajolet, the cardinality estimation function is presented as

$$E := \alpha_m m2^{1 \over m} {^{\sum_{j} M^{(j)}}}$$

Where $\alpha$ is needed to correct the systematic bias in the asymptotic limit which cause overestimates.

However I am having trouble identifying where does this bias come from.

My intuition tells me this is related to the fact we retain the $max$ count of leading zeros, which is probably skewed to higher values.

Wikipedia states this is due to the risk of hash collisions, but that seems wrong as hash collision would underestimate not overestimates (this is actually an improvement in the HyperLogLog paper)

Has anyone a more explicit definition (ideally in layman terms) or reason explaining this bias?

Thank you!

Original Q&A

Where does the bias come from in LogLog and HyperLoglog?

Related Questions in PROBABILITY-THEORY

Related Questions in PROBABILITY-LIMIT-THEOREMS

Trending Questions

Popular # Hahtags

Popular Questions