Why are cryptographic hash functions apparently easier to create than lossless compression schemes?


Both cryptographic hash functions and lossless compression schemes map certain long sequences of bits to shorter sequences of bits. Theoretically, lossless compression schemes are injective, while hash functions are not.

However, by design, cryptographic hash functions behave as if they were injective when their domain is restricted to the sequences of bits humans will ever actually produce. For example, this answer on Stack Overflow creatively describes the chance of an SHA-256 collision. Practically, if every file ever generated by humans, past and future, were hashed with SHA-256, there would almost certainly be no collisions.
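To make the fixed-output-length point concrete, here is a minimal sketch using Python's standard `hashlib` module: SHA-256 emits exactly 256 bits regardless of input length, and even a one-byte change in the input yields an unrelated digest.

```python
import hashlib

# SHA-256 maps inputs of any length to a fixed 256-bit (32-byte) digest.
short = hashlib.sha256(b"hello world").hexdigest()
long_ = hashlib.sha256(b"hello world" * 10_000).hexdigest()
close = hashlib.sha256(b"hello worle").hexdigest()  # one byte changed

print(len(bytes.fromhex(short)) * 8)  # 256 bits, no matter the input
print(len(bytes.fromhex(long_)) * 8)  # still 256 bits
print(short == close)                 # False: nearby inputs, unrelated digests
```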

State-of-the-art general-purpose lossless compression algorithms can be expected to compress a typical file (i.e. a sequence of bits useful to a human) to roughly 0.8 times its original size on average, though the ratio varies widely with the file's redundancy.
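The defining contrast with a hash is that compression is exactly reversible, and its output length tracks the input's redundancy. A small sketch with Python's standard `zlib` module illustrates both properties (the ratio shown is illustrative; real corpora and stronger algorithms give different figures):

```python
import zlib

# Highly redundant input compresses well; the round trip is exact.
data = b"the quick brown fox jumps over the lazy dog " * 100

compressed = zlib.compress(data, level=9)
restored = zlib.decompress(compressed)

print(restored == data)             # True: lossless by construction
print(len(compressed) < len(data))  # True: redundant input shrinks
```

Note that the output length is variable: incompressible input (e.g. random bytes) can come out slightly larger, which is exactly the pigeonhole constraint a hash function is free to ignore.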

SHA-256 maps every file to just 256 bits. No lossless compression algorithm can compress every file to just 256 bits. The catch is that SHA-256's mapping is not reversible, also by design.
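The impossibility half of that claim is a pigeonhole argument, which can be spelled out with a trivial count: there are only 2^256 distinct 256-bit outputs, but already more distinct inputs of length 257 bits alone, so any function into 256 bits must send some pair of inputs to the same output and therefore cannot be reversed on all inputs.

```python
# Pigeonhole counting: more inputs than possible digests.
n_outputs = 2 ** 256  # distinct 256-bit digests
n_inputs = 2 ** 257   # distinct inputs of exactly 257 bits

# Some two inputs must share a digest, so no inverse can exist
# for all inputs.
print(n_inputs > n_outputs)  # True
```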

So why is it apparently much harder to make a reversible, practically injective function than an irreversible, practically injective function from long sequences of bits to short sequences of bits?