An "additive" cryptographic hash-function that allows calculating the hash of some data from the hashes of the parts of that data.

1.1k Views Asked by At

I want to use a cryptographic hash-function that allows calculating the hash of some data from the hashes of the parts of that data.

This question was already asked on SO - "Additive" hash function, but without the requirement that the hash-function function is cryptographically strong.

Mathematically speaking I want to find a hash-function $h$ and a hash-hash function $hh$ such that:

For every two sequences $d_1$ (length $n_1$) and $d_2$ (length $n_2$), the hash code of the combined sequence $d$ (length $n_1+n_2$) $d$ can be calculated both directly (hash function $h$) and using only hashes of $d_1$ and $d_2$ (hash-hash function $hh$): $h(d) = hh(h(d1), h(d2))$

Another take: For any sequence $d$ of length $n$, its hash $h(d)$ can be calculated by breaking it in any position $n_1<n$ into two sequences $d_1$ (length $n_1$) and $d_2$ (length $n_2 = n - n_1$), taking the hashes of the sub-sequences - $h_1=h(d_1)$, $h_2=h(d_2)$ and calculating the hash of full sequence using only the sub-sequence hashes: $h(d) = hh(h_1, h_2)$

Trivial solutions like $h(d)=d$ or $XOR$ satisfy the above requirements. Many checksums like CRC32 satisfy it too. But I'm looking for an irreversible cryptographic hash function.

I seek:

  1. A simplified "mathematical" property that the functions should hold to satisfy my criteria. I guess, some function must be associative, but it's a bit hard for me to think about associativity when the inputs have variable length.
  2. A cryptographic hash function that satisfies my criteria (it can be build on top of some common cryptographic hash-function) OR a proof that it cannot exist.

P.S. I intended to use such hash-function for a version-control system to allow addressing parts of files.

3

There are 3 best solutions below

2
On

That doesn't work, as it would allow to replace parts of the message rather easily. Cryptographic hash functions are designed to not allow this.

4
On

Just a few observations and formalizations. So we have some (finite) set $\Omega$, and to find a function $$ h \,:\, \Omega^\mathbb{N}_\textrm{fin} \to [0,N] \text{,} $$ where $\Omega^\mathbb{N}_\textrm{fin}$ means the set set of all finite sequences of elements from $\Omega$. We additional want a has-combining operator $$ \oplus \,:\, [0,N] \times [0,N] \to [0,N] $$ such that $$ h(\omega_1,\ldots,\omega_k,\ldots,\omega_n) = h(\omega_1,\ldots,\omega_k) \oplus h(\omega_{k+1},\ldots,\omega_n) $$ for all sequences $\omega_1,\ldots,\omega_n$ and all $1 \leq k < n$.

In particular, that implies that $$\begin{eqnarray} h(\omega_1,\omega_2,\omega_3) &=& h(\omega_1) &\oplus& h(\omega_2,\omega_3) &=& h(\omega_1) &\oplus& \bigg(h(\omega_2) &\oplus& h(\omega_3)\bigg) \\ h(\omega_1,\omega_2,\omega_3) &=& h(\omega_1,\omega_2) &\oplus& h(\omega_3) &=& \bigg(h(\omega_1) &\oplus& h(\omega_2)\bigg) &\oplus& h(\omega_3) \end{eqnarray}$$ meaning that $\oplus$ is associative (i.e., we don't have to worry about parenthesis when chaining multiple $\oplus$ invocations). It follows that for every sequence $\omega_1,\ldots,\omega_n$, $$ h(\omega_1,\ldots,\omega_n) = h(\omega_1) \oplus \ldots \oplus h(\omega_n) \text{.} $$

That's a very strong property, and certainly seems to make an attacker's life unnecessary easy.

0
On

There is a talk from FP-Syd titles "FP-Syd - Hashing with SL2: Hash Functions as Monoid Homomorphisms by Sam Reis (Nov 2015)" on YouTube which discusses hashes with this property if I remember correctly, and they're using it for exactly the use case you mention, "...version-control[just file storage] system to allow addressing parts of file". There is an implementation in Haskell in the hwsl2 package.