What is the purpose of the "diffusion" property of a hash function?

169 Views Asked by At

https://www.cs.cornell.edu/courses/cs312/2008sp/lectures/lec21.html says

For a hash table to work well, we want the hash function to have two properties:

Injection: for two keys k1 ≠ k2, the hash function should give different results h(k1) ≠ h(k2), with probability m-1/m.

Diffusion (stronger than injection): if k1 ≠ k2, knowing h(k1) gives no information about h(k2). For example, if k2 is exactly the same as k1, except for one bit, then every bit in h(k2) should change with 1/2 probability compared to h(k1). Knowing the bits of h(k1) does not give any information about the bits of h(k2).

What is the purpose of the "diffusion" property?

It doesn't reduce collision, correct?

Is "diffusion" needed to increase performance in open addressing hash table? But not in chaining hash table?

Is it needed in cryptographic hash function?

Thanks.

1

There are 1 best solutions below

0
On

Is it needed in cryptographic hash function?

Yes, it is needed for the cryptographic hash functions. It is more knowns as Strict avalanche criterion

The strict avalanche criterion (SAC) is a formalization of the avalanche effect. It is satisfied if, whenever a single input bit is complemented, each of the output bits changes with a 50% probability.

Consider that a cryptographic hash function candidate fails this. So, $k_1$ will give information about $k_2$. How one can use this?

Consider it is a 4-bit hash function where the first output bit doesn't change %50 with the first input bit. Let's say the relation is true with %80. Then this will

  • reduce the timing of the preimage attacks. For random input, we can immediately say that the input can validate or not with at least %80 probability. Or we simply generate only those on searching/attacking. If one can find more biased bits, it causes less time to attack.

  • And, similarly reducing the timing of the collision attack - first output is 1 then the first input is 1 with %80 probability. Concentrate on the rest...

This is a small demonstration. If there is a bias in the first component, probably more in the others even with some combinations. too. So, hash function designers either test their outputs or use well-established components to achieve this. Merkle-Damgard hash function can use block ciphers that have no related keys ( AES has it, so it is not a hermetic cipher and we don't see AES-based hash function). One can also, rely on the fact that even tiny round functions like TEA block cipher, can be secure with a high number of rounds like the SHA-1,2 series.

In cryptography, a tiny bias can be exploitable, and we don't want it. A very good example is in ECDSA two-bit bias doom!