As an initial disclaimer, I know virtually nothing about coding theory. I apologize in advance for incorrect or inappropriate terminology.
The problem space I'm exploring is ensuring the integrity of bulk data transported "trans-continentally" across multiple R&E networks. Messages can be many gigabytes in size. I suspect that byte swapping occurs in transit at a very low rate (less than once per 10^12 bytes transmitted), and that the 16-bit ones' complement checksum used by our data format, TCP, and the IPv4 header is insensitive to it. The CRC schemes used by the various layer 2 segments in the path should be sensitive to byte swapping, but I (A) lack the knowledge to estimate the probability of corruption that could get past both a CRC and a 16-bit ones' complement checksum, (B) can't rule out that the corruption is occurring "in between" layer 2 segments, and (C) still need an end-to-end mechanism to detect this type of event, so I feel that trying to answer (A) isn't necessarily helpful.
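As a minimal sketch of the insensitivity described above: this assumes the corruption exchanges two 16-bit-aligned words, which may or may not match what actually happens on the wire. The function `internet_checksum` is written here for illustration in the style of RFC 1071 and is not taken from our data format; `zlib.crc32` stands in for a layer 2 CRC.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

original = bytes.fromhex("0102030405060708")
# Hypothetical corruption: the first two 16-bit words trade places.
swapped = original[2:4] + original[0:2] + original[4:]

same_checksum = internet_checksum(original) == internet_checksum(swapped)
same_crc = zlib.crc32(original) == zlib.crc32(swapped)
print(same_checksum, same_crc)  # True False
```

The checksum is a sum of 16-bit words, and addition does not care about order, so the reordered message passes. The CRC catches this particular case because the damaged region fits within 32 bits.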
The current solution has been to independently transmit an MD5 digest (on the same network path). However, it has occurred to me that the separate MD5 digest is also vulnerable to the byte swapping issue while in transit. In practice, it's generally possible to retransmit either the message or the digest, but I would like to consider the case where this isn't possible. This has led me to the question: What is the most robust method of distinguishing between corruption in the digest and corruption in the message?
Is there a hash algorithm that produces a digest with inherent ECC properties?
In "practice", are there examples of ECCs being calculated to protect digests?
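To make the idea of an ECC over a digest concrete, here is a rough sketch using a threefold repetition code with a byte-wise majority vote, which is about the simplest ECC there is. The names `encode` and `decode` are hypothetical, and this is not a claim about what is used in practice; a real deployment would more likely reach for something like Reed-Solomon.

```python
from collections import Counter

def encode(digest: bytes, copies: int = 3) -> bytes:
    """Repeat the digest back to back (a plain repetition code)."""
    return digest * copies

def decode(received: bytes, copies: int = 3) -> bytes:
    """Recover the digest by majority vote at each byte position."""
    n = len(received) // copies
    parts = [received[i * n:(i + 1) * n] for i in range(copies)]
    # zip(*parts) yields, for each position, that byte from every copy.
    return bytes(Counter(column).most_common(1)[0][0] for column in zip(*parts))
```

A swap of two bytes inside one copy is outvoted by the other two copies. The scheme fails if the same position is damaged in two of the copies.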
Is it reasonable to use two independent hash functions to produce digests of the same message?
- If so, how does one go about selecting multiple hash functions that are likely to have different [in]sensitivities?
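A minimal sketch of the two-digest idea, assuming MD5 and SHA-256 from Python's `hashlib` as the pair (the choice of pair and the function name `diagnose` are illustrative assumptions, not a recommendation):

```python
import hashlib

def diagnose(message: bytes, md5_hex: str, sha256_hex: str) -> str:
    """Compare a received message against two received digests."""
    md5_ok = hashlib.md5(message).hexdigest() == md5_hex
    sha_ok = hashlib.sha256(message).hexdigest() == sha256_hex
    if md5_ok and sha_ok:
        return "intact"
    if not md5_ok and not sha_ok:
        # Both digests disagree with the message at once.
        return "message likely corrupted"
    # Exactly one digest disagrees, so that digest is the likelier victim.
    return "one digest likely corrupted"
```

With cryptographic hashes, any change to the message should alter both digests with overwhelming probability, so a mismatch on only one of them points at the digest rather than the message. This only localizes the damage; it does not repair it.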
Should I be considering using a form of ECC over the entire message instead of independent digests?
- If so, is there an ECC algorithm that both decodes faster than polynomial time and is independent of the original message such that it can be used without requiring an ECC decode on every read?
I realize this is a bit of a novel. Many thanks in advance to anyone willing to answer any of these questions.