Deciphering XOR-encrypted text with frequency analysis.

149 Views Asked by At

I know nothing about cryptography, so I appreciate simple explanations.
Assume each letter of an English Alphabet is represented by a 5-bit string. I read that XOR-encrypted text can be deciphered using frequency analysis. I suppose the accuracy of the frequency analysis depends on the amount of encrypted text. My guess is that more text yields better results.

So, let's assume we have $n$ unique random English words with a length of at most $m$ characters. We convert these words to binary and then use a single secret binary string $s$ that is XORed to each individual binary word.

My questions are:

  1. Is there a runtime complexity analysis that depends on $n$ and $m$ for an algorithm that recovers the secret string $s$?
  2. Is there a simple error analysis (some bounds on the accuracy or some uncertainty measure) of the recovered secret key $s$ given $n$ unique words of length $m$?
1

There are 1 best solutions below

0
On BEST ANSWER

One simple approach is to ignore the word structure and assume the letters are drawn randomly with a probability distribution of the letters in English. There are only $32$ choices for $s$. Try each one and do a chi-squared test on the resulting set of letters. It doesn't take too many letters to "guarantee" that there will be one clearly better than all the rest. The reason that guarantee is in scare quotes is that whoever made the text may be nefarious. The substitution cyphers in the newspaper often had no e's, for example. If you are given that each batch of characters is a word you can look them up in a dictionary. Much encryption respaces the text to five letter blocks so you don't initially know where the word breaks are. My simple approach is not fooled by that, but it loses information if the respacing is not done.