Constructive proof of secure hash collision

111 Views Asked by At

The SHA3-512 hash algorithm can be considered as a map h from the set F of "files" (finite sequences of octets, each octet being an integer in the range 0 to 255) to the set H of "hashes" (sequences of 64 octets). I'm curious about proving the non-injectivity of h:

Proposition P: there exist a, b, such that ab and h(a)=h(b)

P seems obviously true to me, because F is infinite and H is finite. However, nobody knows of any particular a,b pair. Given that, is there a constructive proof of P?

2

There are 2 best solutions below

5
On BEST ANSWER

Some people summarize constructive proofs as "proofs that directly provide a specific example". This is not exactly correct: a constructive proof is one that allows you to produce a specific example after putting in some well-defined work: a constructive proof does not need to give an object with the desired property, just an unambiguous recipe/algorithm for finding an object with the desired property (aside: in terms of proof theory, this additional work amounts to performing cut-elimination). With this in mind, we can refine your question into two questions.

1. Is there a constructive proof that directly provides a specific example $(a,b)$ with $h(a) = h(b)$ and $a \neq b$?

As of 23 April 2020, we don't know such a proof.

2. Is there a constructive proof that provides an algorithm for finding such $(a,b)$?

Sure, there is (although the best one that we know as of 23 April 2020 is quite slow)! Here's such an algorithm. Construct the first $2^{512} + 1$ files, and compute their hashes. Look and see if there are two with the same hash. If so, you can return those two as your result $(a,b)$ and you are done. If not, then you have just constructed $2^{512} + 1$ different elements of the $2^{512}$-element set $H$, so $2^{512} + 1 \leq 2^{512}$. This is a contradiction: everything follows from a contradiction, including the claim that any two files have the same hash, so you can just return your two favorite files $(x,y)$ as your result (this just illustrates the logical structure of the argument: you won't ever find yourself in this second, contradictory case, as you will necessarily find two files with the same hash).



Does this argument rely on the law of excluded middle? After all, we do a case analysis, our result depends on whether there are two elements with the same hash in our list, i.e. $(\exists a < b < 2^{512} + 1. h(a) = h(b)) \vee (\forall a < b < 2^{512} + 1. h(a) \neq h(b))$, where $h(n)$ denotes the hash of the $n$th file in our list.

Fortunately, $(\exists a < b < 2^{512} + 1. h(a) = h(b)) \vee (\forall a < b < 2^{512} + 1. h(a) \neq h(b))$ is itself a constructively provable theorem. This is intuitively obvious: there's an algorithm that decides which of the two cases holds. Nonetheless, anticipating further objections, I now give an excruciatingly detailed proof of this fact.

We can prove something stronger: $\forall k. (\exists a < b < k. h(a) = h(b)) \vee (\forall a < b < k. h(a) \neq h(b))$ by induction on $k$. Here's how:

  • If $k=0$, then the second disjunct holds vacuously and we are done.
  • Otherwise $k=\ell + 1$ for some $\ell \in \mathbb{N}$, and by inductive hypothesis we have $(\exists a < b < \ell. h(a) = h(b)) \vee (\forall a < b < \ell. h(a) \neq h(b))$. We do a proof by cases.
    • In the first case, we have $\exists a < b < \ell. h(a) = h(b)$, so $\exists a, b < \ell + 1. h(a) = h(b)$ holds a fortiori and we are done.
    • In the second case, we have $forall a < b < \ell. h(a) \neq h(b)$. We do a case analysis on whether $\exists a < \ell. h(a) = h(\ell)$ or $\forall a < \ell. h(a) \neq h(\ell)$.
      • In the first case, $\exists a, b < \ell + 1. h(a) = h(b)$ holds with $b = \ell$.
      • Otherwise $\forall a < b < \ell + 1. h(a) \neq h(b)$ holds since either $b = \ell$, in which case $\forall a < \ell. h(a) \neq h(\ell)$ applies, or $b < \ell$, in which case the inductive hypothesis $\forall a < b < \ell. h(a) \neq h(b)$ applies.

At this point we can ask the question: does this previous argument rely on the law of excluded middle? After all, we do a case analysis on $(\exists a < \ell. h(a) = h(\ell)) \vee (\forall a < \ell. h(a) \neq h(\ell))$. But this again has a constructive proof: there's an algorithm that, given $\ell$, decides which of the disjuncts holds. Nonetheless, I now give an excruciatingly detailed proof.

I shall prove $\forall k. (\exists a < k. h(a) = x) \vee (\forall a < k. h(a) \neq x)$ by induction on $k$ for any fixed hash $x$. Here's how:

  • If $k=0$ then the second disjunct holds vacuously and we are done.
  • Otherwise $k=\ell+1$ and we have $(\exists a < \ell. h(a) = x) \vee (\forall a < \ell. h(a) \neq x)$ by inductive hypothesis.
    • If the first disjunct holds, we have $\exists a < \ell. h(a) = x)$, so $\exists a < \ell + 1. h(a) = x$ holds a fortiori.
    • Otherwise, we have $\forall a < \ell. h(a) \neq x$. We check whether $h(\ell) = x$ or $h(\ell) \neq x$.
      • In the first case, we have $\exists a < \ell + 1. h(a) = x$, with $a = \ell$.
      • In the second case, $\forall a < \ell + 1. h(a) \neq x$ since either $a < \ell$ and the inductive hypothesis applies, or $a = \ell$ and $h(\ell) \neq x$ applies.

At this point we can ask the question: does this previous argument rely on the law of excluded middle? After all, we do case analysis on $h(\ell) = x$. But this works because hashes are just whole numbers, and you can decide if two whole numbers are equal by just comparing them. More formally, $\forall x,y \in \mathbb{N}. x = y \vee x \neq y$ is a theorem of constructive arithmetic, but one that I now leave as an exercise for the reader.

1
On

The constructive proof involves an algorithm that constructs $2^{512}+1$ distinct files, say $F_0, \ldots F_{2^{512}+1}$, calculates their SHA3-512 hash values and then searches for $i \neq j$ such that $h(F_i) = h(F_j)$. This algorithm is guaranteed to terminate by the pigeon-hole principle, but the running time is infeasible in the real world.

A constructive proof giving an algorithm that had a feasible running time would expose a flaw in the design of the SHA3-513 algorithm. We hope that no such algorithm exists, but, as happens so often in cryptographical questions of this kind, we have no proof of that.