There are $5$ billion unique email addresses in the World. If I created a database containing their SHA-256 hashes, how any unique hashes would we expect that database to contain?
By my crude methods, a SHA-256 hash is $256$ bits long so there are $k=2^{256}$ possible hashes. Therefore the probability of any two email addresses chosen at random having the same hash is roughly $p=\frac{1}{k}=2^{-256}$.
Let $n=5\times10^9$ be the number of distinct email addresses.
Therefore the expected number of duplicates of any one chosen hashed email address is given by $E=np=4.3\times10^{-68}$
Now suppose we need to check all hashed email addresses in turn for duplicates we will expect to find $n^2p=2.16\times10^{-58}$ email addresses having a duplicate hash. We would then roughly halve this since any duplicates would have been found twice (and by the smallness of the probability, the likelihood of there existing triplets does not materially affect the result).
So in intuitive terms, we would need to inhabit $10^{58}$ planet Earths to the same density as we do this one, before the expected number of duplicate hashed email addresses reached $1$.
Therefore we can be extremely certain that every hash will be unique. Is this correct, and can this estimate be refined or improved? A key (reasonable?) assumption I've made is that the common structure which email addresses share does not impact the likelihood of hashes matching.
Your reasoning is basically right.
You can compute two related but different things.
First, the expected number of coincidences (or collisions). This is given by
$$C= \frac{n (n-1)}{2} \frac{1}{k} \approx \frac{n^2}{2k} $$
Then the probability that there is at least a collission. This is
$$P = 1- \frac{k!}{n!}\frac{1}{k^n} \approx 1 - \exp \left( -\frac{n^2}{2k}\right)$$
Both numbers almost coincide if the probability of collissions is small. For details see the birthday problem.