I want to use a cryptographic hash-function that allows calculating the hash of some data from the hashes of the parts of that data.
This question was already asked on SO - "Additive" hash function, but without the requirement that the hash-function function is cryptographically strong.
Mathematically speaking I want to find a hash-function $h$ and a hash-hash function $hh$ such that:
For every two sequences $d_1$ (length $n_1$) and $d_2$ (length $n_2$), the hash code of the combined sequence $d$ (length $n_1+n_2$) $d$ can be calculated both directly (hash function $h$) and using only hashes of $d_1$ and $d_2$ (hash-hash function $hh$): $h(d) = hh(h(d1), h(d2))$
Another take: For any sequence $d$ of length $n$, its hash $h(d)$ can be calculated by breaking it in any position $n_1<n$ into two sequences $d_1$ (length $n_1$) and $d_2$ (length $n_2 = n - n_1$), taking the hashes of the sub-sequences - $h_1=h(d_1)$, $h_2=h(d_2)$ and calculating the hash of full sequence using only the sub-sequence hashes: $h(d) = hh(h_1, h_2)$
Trivial solutions like $h(d)=d$ or $XOR$ satisfy the above requirements. Many checksums like CRC32 satisfy it too. But I'm looking for an irreversible cryptographic hash function.
I seek:
- A simplified "mathematical" property that the functions should hold to satisfy my criteria. I guess, some function must be associative, but it's a bit hard for me to think about associativity when the inputs have variable length.
- A cryptographic hash function that satisfies my criteria (it can be build on top of some common cryptographic hash-function) OR a proof that it cannot exist.
P.S. I intended to use such hash-function for a version-control system to allow addressing parts of files.
That doesn't work, as it would allow to replace parts of the message rather easily. Cryptographic hash functions are designed to not allow this.