Suppose you have a hashmap where you store values. The hashmap is of size m and supports chaining. The hash function is simple uniform hashing assumption. I understand that the average length of a chain is the load factor. But if you were storing values from a set into the hashmap, what is the probability that when you are storing value x, there is another value x already in the hashmap?
I know, that if x already exists, it would map to the same linked list in the hashmap. But how do you go further?
If a size of hash table is $m$, a size of your set is $n$, and you are picking the numbers from this set by uniform distribution, this is a probability that one random element $z$, which is in $N$, is in $M$, where $M \subseteq M$. This is sure $m \over n$. If you want to know a conditional probability, where you know already $x$ is in a certain chain, you need to determine the probability that the size of a full class of $x$ is equal to some $|class|$, when you know already, $|class| \ge |chain|$. $P(|class| = n ~\backslash~ |class| \ge |chain|) = {P(|class| = n) \over P(|class| \ge |chain|)}$, if $|class| \ge |chain|$, and $0$ if $|class| < |chain|$. Note that |class| and |chain| are here constants. The full probability that $x$ is in chain already is $$\sum_{C = |chain|}^{n}{P(x~in~chain~\backslash~|class|=C)P(|class|=C ~\backslash~ |class| \ge |chain|)} = \\ =\sum_{C = |chain|}^{n}({|chain|\over C}\cdot{P(|class| = C) \over P(|class| \ge |chain|)}) ~=~ \sum_{C = |chain|}^{n}({|chain|\over C}\cdot{{C_{n}^{C}p^{C}(1-p)^{n-C}} \over P(|class| \ge |chain|)}) ~=\\ =~ ({|chain| \over P(|class| \ge |chain|)}) \cdot {\sum_{C = |chain|}^{n} {{C_{n}^{C}p^{C}(1-p)^{n-C}} \over {C}}}$$ Here $|class|$ is already a binomial random variable, $|chain|$ is a constant you know, and $1 \over p$ is a number of classes. This function can be sure approximated.