I am interested in statistics of pseudo random functions. In particular, in the following. Given:
- a pseudo random function $f : S \to S$ (with $s = |S|$),
- and a set $D \subseteq S$ (with $n = |D|$) of arbitrary elements,
what is the expected size of the preimage of $D$ through $f$? In other words, $\mathbb{E}[|\{x \in S \mid f(x) \in D\}|]$.
The reverse problem is easier (to me): given the same $f$ and a set $C \subseteq S$ (with $m = |C|$) of arbitrary elements what is the expected size of the image of $C$ through $f$?
We have $\mathbb{E}[f(C)] = s(1-(1-1/s)^m) \approx s(1-\exp(-m/s))$. See e.g. this question for argument on a very similar development.
Curious if we can use a similar reasoning for the preimage scenario.
My attempt so far:
I start by computing the probability that the preimage size is a given size: \begin{align*} \Pr(|C| = m \mid |D| = n) &= \Pr(n \text{ points all map to a different image}) \times \Pr(\text{the other } m-n \text{ points map to an existing image}) \times {m \choose n}\\ &= \left(\prod_{i=0}^{n-1} 1-\frac i s\right) (n/s)^{m-n} {m \choose n}\\ \Pr(|C| < n \mid |D| = n) &= 0 \end{align*}
Then compute the weighted average for the expected value: \begin{align*} \mathbb{E}(|C| \mid |D| = n) &= \sum_{m=0}^s m \Pr(|C| = m \mid |D| = n)\\ &= \sum_{m=n}^s m \left(\prod_{i=0}^{n-1} 1-\frac i s\right) (n/s)^{m-n} {m \choose n}\\ &= \left(\prod_{i=0}^{n-1} 1-\frac i s\right) \sum_{m=n}^s m (n/s)^{m-n} {m \choose n} \end{align*}
However the result is unweildy (in particular difficult to compute if $s$ is large), and I am not sure of its correctness.