Problem:
Given P persons (say 1000) distribued among N rooms (say 50) using a discrete random distribution function.
What would be the probability of a room having at least K persons (say 30 or more persons in the same room) ? document
Context:
I'm trying to store large data sets (1010 entries) in smaller capacity databases (maximum 108 documents).
The idea is to group multiple entries is the same DB document using a hash function in such a way that every DB documents would store 100 data entries on average.
BUT, I'm affrayed of having DB documents with too many entries say 1000+ in the same DB (some kind of the birthdays paradox with multiple collisions)
![Chernoff's inequality vs Markov's inequality for K from 20 to 100]](https://i.stack.imgur.com/uSBvM.png)
Let's try an easier question: what is the expected number of rooms with $K$ or more occupants?
Considering a single room, we can use a binomial distribution:
So in your first example with $P=1000, N=50, K=30$ you get the probability of at least $K$ occupants of that room being about $0.0207$, while with $P=10^{10}, N=10^8, K=10^3$ you get the probability of at least $K$ occupants of that room being about $10^{-611}$
There are $N$ rooms, so the expected number of rooms with at least $K$ occupants is $N$ times the previously calculated probabilities, which in your first example is about $1.035$ and in your second is about $10^{-603}$. You can find the power of $10$ in the second case using R with
You can be sure that the probability of any room having at least $K$ occupants is no more than this, which is not particularly helpful for the first example but is for the second since the expectation is much smaller than $1$ and so the probability is still about $10^{-603}$.
If the expected number $E$ of rooms having at least $K$ occupants is much smaller than $N$ and $N$ is large, then perhaps you could try as a Poisson approximation for the probability of any room having at least $K$ occupants the value $1-e^{-E}$, which for the first example is about $0.645$, perhaps a little low since a simulation suggested a figure closer to about $0.676$.