Assume a fixed set of messages $D = \{d_1, \dots, d_n\}$ and an associated distribution for selecting each message, where message $d_i$ is selected with probability $p_i$ and $\sum_{i} p_i = 1$. We create a cache with $M$ bits and $k$ hash functions, where each message from $D$ is mapped to $k$ unique entries in $[1,M]$ (i.e., a Bloom filter). More formally, a Bloom filter is an array of $M$ bits, all initially set to 0. It employs $k$ hash functions, denoted $h_1, h_2, \cdots, h_k$, each of which maps a set element to one of the $M$ array positions uniformly at random. To add an element $x$ to the Bloom filter, we evaluate the $k$ hash functions to determine $k$ positions in the bit array and set the bits at those positions to 1. In other words, the bit at position $h_i(x)$ is set to 1 for each $i = 1, \dots, k$.
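For concreteness, here is a minimal sketch of such a filter. The salted-SHA-256 construction of the $k$ hash functions is an illustrative choice, not part of the setup above, and it does not enforce that the $k$ positions are distinct:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: an M-bit array with k hash functions."""

    def __init__(self, M, k):
        self.M = M
        self.k = k
        self.bits = [0] * M

    def _positions(self, x):
        # Derive k positions in [0, M) by salting a single hash; this
        # stands in for k independent hash functions h_1, ..., h_k.
        return [
            int(hashlib.sha256(f"{i}:{x}".encode()).hexdigest(), 16) % self.M
            for i in range(self.k)
        ]

    def add(self, x):
        for pos in self._positions(x):
            self.bits[pos] = 1

    def occupied(self):
        # Corresponds to Y_j below: the number of set bits.
        return sum(self.bits)
```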
We introduce a random variable $Y_j$ to represent the count of occupied slots in the Bloom Filter at the $j$-th iteration. Furthermore, we assume that once $Y_j$ reaches or exceeds a threshold $\sigma$, the Bloom Filter is reset such that the $M$ bits are set to 0 and the process concludes. I am interested in proving the following lemma:
Let $X_{j} = \mathbb{1}_{Y_j \geq \sigma\; | \; Y_{j-1} < \sigma}$ (i.e., the $j$-th draw results in a reset given that the $(j-1)$-st did not), then $P(X_j = 1) \leq P(X_{j+1} = 1)$.
I am finding it challenging to prove this lemma. I have verified its validity via simulations, and it appears to hold. I can also brute-force the computation for very simple cases, where it works as well.
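For what it's worth, a sketch of the kind of simulation I mean looks like this. It assumes uniform message selection and randomly assigned (distinct) hash positions per message, which are simplifications of the general setup:

```python
import random

def simulate_reset_probs(n, M, k, sigma, max_j, trials=50_000, seed=0):
    """Estimate P(X_j = 1) = P(Y_{j-1} < sigma, Y_j >= sigma) by Monte Carlo.

    Each of the n messages is assigned k distinct random positions in
    [0, M), standing in for the hash functions h_1, ..., h_k; messages
    are drawn uniformly (a simplifying assumption on the distribution).
    """
    rng = random.Random(seed)
    positions = [set(rng.sample(range(M), k)) for _ in range(n)]
    hits = [0] * (max_j + 1)  # hits[j]: runs whose first crossing is at draw j
    for _ in range(trials):
        bits = set()
        for j in range(1, max_j + 1):
            bits |= positions[rng.randrange(n)]
            if len(bits) >= sigma:  # Y_j >= sigma: reset and stop
                hits[j] += 1
                break
    return [hits[j] / trials for j in range(1, max_j + 1)]
```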
Please note that this situation differs from the following: for $X'_{j} = \mathbb{1}_{Y_j \geq \sigma}$, the property $P(X'_j = 1) \leq P(X'_{j+1} = 1)$ is relatively simple to establish using a sample-path argument. I mention this because I originally thought the proof would be trivial using that argument. The main difficulty is that, for any given number of set bits, one can construct message sequences of arbitrary length that leave the filter in exactly that state.
I would appreciate any help. A counter-example would also do the trick. Thanks!
Thanks for the question, Burdy. Perhaps I have misunderstood the definition of Bloom filters, but here is a simple counter-example. The first two random variables $Y_0, Y_1$ are in fact deterministic, namely $P(Y_0 = 0) = 1$ and $P(Y_1 = k) = 1$, where $k$ is the number of hashing functions. For any $1 \leq \sigma \leq k$, we therefore have $$P(X_1 = 1) = P(Y_0 < \sigma, Y_1 \geq \sigma) = 1$$ while $$P(X_2 = 1) = P(Y_1 < \sigma, Y_2 \geq \sigma) = 0$$ contradicting your conjecture. Something further needs to be assumed, perhaps on the value of $\sigma$. But even if, say, $\sigma > k$, we can construct the following counter-example for all $k < \sigma \leq 2k$.
Let $Z_j$ be the random variable corresponding to the message sampled at iteration $j$, and assume that all $n$ messages are equally likely to be sampled, namely, $P(Z_j = d_i) = 1/n$ for all $i, j$. Assume moreover that the hash functions have disjoint support on distinct messages, namely, $h_a(d_i) \neq h_b(d_j)$ for all $i \neq j$ and all $a, b \in \{1, \dots, k\}$. (I assume this defeats the purpose of a Bloom filter, but it serves for an extreme counter-example.) Then, again, we have $P(Y_1 = k) = 1$, while $P(Y_2 = k) = P(Z_2 = Z_1) = 1/n$ and $P(Y_2 = 2k) = P(Z_2 \neq Z_1) = 1-1/n$. We then obtain, for any $k < \sigma \leq 2k$, $$P(X_2 = 1) = P(Y_1 < \sigma, Y_2 \geq \sigma) = P(Y_2 = 2k) = 1-1/n$$ while $$P(X_3 = 1) = P(Y_2 < \sigma, Y_3 \geq \sigma) \leq P(Y_2 < \sigma) = P(Y_2 = k) = 1/n \,,$$ which contradicts your conjecture for all $n \geq 3$, since then $1/n < 1-1/n$. I imagine this sort of construction could be extended to produce a counter-example for arbitrary $\sigma$. Perhaps you must therefore add an assumption on the supports of the different hash functions?
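A quick Monte-Carlo check of this construction (uniform sampling and fully disjoint supports, as above; the concrete values $n = 4$, $k = 2$, $\sigma = 3$ are one instance of my parameter range, and the estimator itself is just a sketch) confirms the gap:

```python
import random

def first_crossing_probs(positions, sigma, max_j, trials=200_000, seed=1):
    """Estimate P(X_j = 1) = P(Y_{j-1} < sigma, Y_j >= sigma) by Monte Carlo."""
    rng = random.Random(seed)
    msgs = list(positions)
    hits = [0] * (max_j + 1)  # hits[j]: runs whose first crossing is at draw j
    for _ in range(trials):
        bits = set()
        for j in range(1, max_j + 1):
            bits |= positions[rng.choice(msgs)]
            if len(bits) >= sigma:  # first time Y_j >= sigma: reset, stop
                hits[j] += 1
                break
    return [hits[j] / trials for j in range(1, max_j + 1)]

# Counter-example instance: n = 4 messages, k = 2 hashes with fully
# disjoint supports, and k < sigma = 3 <= 2k.
n, k, sigma = 4, 2, 3
positions = {i: {k * i, k * i + 1} for i in range(n)}  # disjoint bit positions
probs = first_crossing_probs(positions, sigma, max_j=4)
# Expect P(X_2 = 1) ~ 1 - 1/n = 0.75 and
# P(X_3 = 1) = P(Z_2 = Z_1) * P(Z_3 != Z_1) = (1/n)(1 - 1/n) = 0.1875,
# so P(X_2 = 1) > P(X_3 = 1), contradicting the conjecture.
print(probs)
```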