Let us define a hash function $$\begin{align*} H \colon A &\to B\\ a &\mapsto b \end{align*}$$
$$|A| \geq |B|$$ Assuming it is perfectly random (Hyp 1), we estimate the probability that it is not surjective. We construct a mapping $$ H(a_{i}) = \left\{\def\arraystretch{1.2}% \begin{array}{@{}c@{\quad}l@{}} b_{i} & \text{for } i \in \{1,...,|B|-1\} \\ \text{any value except }\beta & \text{for } i \in \{|B|, ..., |A|\} \\ \end{array}\right. $$
where all but one element of $B$ has a pre-image in $A$; $\beta$ denotes the one remaining element (the second part of the mapping is left implicit).
The probability that we don't hit $\beta$ after one try is $$1 - |B|^{-1}$$
We assume that events are independent (Hyp 2). The number of tries left is given by: $$|A|-|B|+1$$
We then derive the following expression:
$$P(\text{"there exists }\beta \in B \text{ with no pre-image"}) = (1-|B|^{-1})^{|A|-|B| + 1}$$
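To see how this expression behaves, here is a small sanity check at toy sizes (function names are mine): it compares the conjectured formula against the exact non-surjectivity probability (inclusion-exclusion over which elements of $B$ are missed) and a Monte Carlo estimate.

```python
import random
from math import comb

def p_not_surjective_exact(a, b):
    # Inclusion-exclusion: P(at least one element of B has no pre-image)
    return sum((-1) ** (k + 1) * comb(b, k) * (1 - k / b) ** a
               for k in range(1, b + 1))

def p_not_surjective_mc(a, b, trials=200_000, seed=0):
    # Empirical frequency of non-surjective maps among uniform random maps A -> B
    rng = random.Random(seed)
    misses = sum(len({rng.randrange(b) for _ in range(a)}) < b
                 for _ in range(trials))
    return misses / trials

a, b = 12, 4
conjectured = (1 - 1 / b) ** (a - b + 1)   # the formula from the question
print(conjectured, p_not_surjective_exact(a, b), p_not_surjective_mc(a, b))
```

At these sizes the conjectured formula and the exact probability already disagree noticeably, which is worth keeping in mind for the question below.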
In the case of SHA256, we have
$$ |A| \approx 2^{2^{64}} = 2^{18446744073709551616}; \quad |B| = 2^{256} $$
This gives us: $$(1-2^{-256})^{2^{18446744073709551616}-2^{256} + 1} \leq 10^{-5000000} \approx 0$$ where the upper bound was computed numerically.
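The numbers here are far too large to evaluate directly, but the magnitude can be checked in log space. This sketch uses $\log(1-x) \approx -x$ for tiny $x = 1/|B|$, so $\log_{10} p \approx -2^{2^{64}-256}/\ln 10$:

```python
from math import log10, log

# |A| = 2**(2**64) and |B| = 2**256 cannot be represented directly,
# so compute log10(-log10 p) instead, where p = (1 - 1/|B|)**(|A| - |B| + 1).
log2_ratio = 2.0 ** 64 - 256                          # log2(|A| / |B|)
log10_neg_log10_p = log2_ratio * log10(2) - log10(log(10))
print(log10_neg_log10_p)   # ~5.55e18
```

So $p \approx 10^{-10^{5.5\times 10^{18}}}$, which makes the quoted $10^{-5000000}$ bound consistent, and in fact extremely loose.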
- Is my formula correct?
- How could I estimate this assuming a given bias in the hash function? (i.e. some values are more likely than other)
I'm not following your argument (why "construct" a mapping, when you want random mappings?)
I'm going to start from scratch.
If $b=|B|$ and $a=|A|$ there are at most $b(b-1)^a$ maps that miss an element - we pick an element $\beta \in B$ in $b$ ways, then pick a map from $A$ to $B\setminus \{\beta\}$ in $(b-1)^a$ ways. This over-counts maps by counting multiple times any maps that miss multiple elements, but that will tend to be a relatively tiny amount, and we have at least an upper bound for the number of such maps.
So an upper bound for our probability is: $$p<\frac{b(b-1)^a}{b^a} = b\left(1-\frac{1}{b}\right)^a$$
(This isn't a great upper bound when $a,b$ are close, since $(1-1/b)^b\approx 1/e$. But when $a$ is much larger than $b$, this is actually a useful upper bound. The value $b(1-1/b)^a$ is actually the expected number of missed elements of $B$.)
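The "expected number of missed elements" claim follows from linearity of expectation (each element of $B$ is missed with probability $(1-1/b)^a$), and is easy to check empirically at toy sizes (helper name is mine):

```python
import random

def missed_count_mc(a, b, trials=100_000, seed=1):
    # Average number of elements of B with no pre-image, over uniform random maps
    rng = random.Random(seed)
    total = sum(b - len({rng.randrange(b) for _ in range(a)})
                for _ in range(trials))
    return total / trials

a, b = 20, 8
expected = b * (1 - 1 / b) ** a    # b(1 - 1/b)^a, by linearity of expectation
print(expected, missed_count_mc(a, b))
```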
Taking the log, we get $$\log p<\log b + a\log(1-1/b)<\log b - \frac{a}{b},$$
because $\log(1-x)<-x$ for $x\in[0,1)$.
So this means the probability is less than $be^{-a/b}$, which is indeed very tiny when $a=2^{2^{64}}, b=2^{256}$ - it is bounded above by $e^{-2^{63}}$.
If the probabilities are uneven, then let $p(\beta)$ be the probability of getting $\beta\in B$, and let $p_{min}=\min_{\beta\in B} p(\beta)$.
Then the above argument, refactored, will show that:
$$p<b(1-p_{min})^a \leq be^{-ap_{min}}$$
Depending on how badly the probabilities vary, this might not be a great estimate, but $p_{min}$ would have to be very tiny for the bound to be non-negligible when $a=2^{2^{64}}$ and $b=2^{256}$.
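The biased bound $b(1-p_{min})^a$ can also be checked at toy sizes. This sketch (function name and the particular bias are mine, purely for illustration) samples maps with a skewed output distribution and compares the empirical miss probability against the bound:

```python
import random

def p_misses_mc(a, weights, trials=100_000, seed=2):
    # Probability that a biased random map A -> B misses some output value
    rng = random.Random(seed)
    b = len(weights)
    return sum(len(set(rng.choices(range(b), weights=weights, k=a))) < b
               for _ in range(trials)) / trials

a = 30
weights = [0.4, 0.3, 0.2, 0.1]            # hypothetical biased output distribution
b, p_min = len(weights), min(weights)
bound = b * (1 - p_min) ** a              # the bound b(1 - p_min)^a from above
print(p_misses_mc(a, weights), bound)
```

As expected, the empirical probability is dominated by the least likely output value, and sits well under the bound.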