Shannon Entropy


Shannon entropy is a measure of the randomness of a discrete random variable $X$, defined as $$H(X) = -\sum_{i=1}^n P(x_i) \log(P(x_i)),$$

where each $x_i$ is a possible value of $X$.

For the purposes of cryptographic investigation we consider $X$ to be the output of a random number generator, with $x_i$ denoting all the possible values it could take. By using logarithms in base $2$, we can think of entropy in terms of bits. Some key properties of entropy include:

• Data can have less entropy than the total number of bits, but not more.

• Different length blocks of data can contain the same amount of entropy.

• Identical length blocks of data can contain different amounts of entropy.
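The definition and the properties above can be checked directly. The following is a minimal sketch (the function name `shannon_entropy` is mine, not from the text): a uniform distribution over $N$ outcomes attains the maximum $\log_2 N$ bits, while a biased distribution over the same outcomes carries less.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution.

    Terms with probability 0 are skipped, since p*log2(p) -> 0 as p -> 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution over 8 outcomes has entropy log2(8) = 3 bits.
print(shannon_entropy([1/8] * 8))      # 3.0

# A biased "coin" carries less than 1 bit even though it is a 1-bit value,
# illustrating that data can have less entropy than its length in bits.
print(shannon_entropy([0.9, 0.1]))     # ~0.469
```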

This is a very important concept in cryptography, particularly when considering cryptographic keys and passwords. For instance, an $8$ byte key, where the value of each byte is independent and uniformly distributed, has $64$ bits of entropy. We can see this by applying the above formula to each byte to obtain

$$-\sum_{i=0}^{255} P(i) \log_2 P(i) = -256\cdot\left(\frac{1}{256}\cdot \log_2{2^{-8}}\right) = 8,$$

and then summing this over all the bytes.

However, if each byte can only take the values $0$ or $1$ (each with probability $\frac{1}{2}$), then the entropy in the key cannot exceed $8$ bits, since per byte $$-\left(\frac{1}{2}\log_2{2^{-1}}+\frac{1}{2}\log_2{2^{-1}}+0+0+\cdots\right) = 1,$$ where the zero terms correspond to the $254$ byte values of probability $0$.
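Both key calculations above can be reproduced numerically. This sketch (the helper name `byte_entropy` is mine) computes the per-byte entropy for the uniform case and the restricted $\{0,1\}$ case, then scales by the $8$ bytes:

```python
import math

def byte_entropy(probs):
    # Shannon entropy in bits of one byte's value distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform byte: all 256 values equally likely -> 8 bits per byte,
# so the 8-byte key carries 8 * 8 = 64 bits.
uniform = [1/256] * 256
print(8 * byte_entropy(uniform))      # 64.0

# Restricted byte: only 0 and 1, each with probability 1/2; the other
# 254 values have probability 0 and contribute nothing -> 1 bit per byte,
# so the key holds at most 8 bits.
restricted = [1/2, 1/2] + [0] * 254
print(8 * byte_entropy(restricted))   # 8.0
```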

a. (i) If each byte in this $64$ bit key has exactly $1$ non-zero bit, and this bit can take any position in the byte with equal probability, how much entropy is there in the entire key?

(ii) The entropy in a stream is preserved under a 1-to-1 map. State a 1-to-1 map from the 64 bit space in (i) such that the resultant stream will have maximal entropy.

If we take two independent streams $A$ and $B$ of $64$ bits each and concatenate them, then $A||B$ has entropy equal to the sum of the entropies of $A$ and $B$.
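The additivity of entropy under independence can be checked on small distributions. In this sketch (the toy distributions `A` and `B` are my own examples, not from the text), the joint distribution of the concatenation is built from the product of probabilities, and its entropy matches $H(A)+H(B)$:

```python
import math
from itertools import product

def H(dist):
    # Shannon entropy in bits of an {outcome: probability} distribution
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Two small independent "streams" with non-uniform distributions.
A = {'00': 0.5, '01': 0.25, '10': 0.25}
B = {'0': 0.75, '1': 0.25}

# Joint distribution of the concatenation A||B under independence.
AB = {a + b: pa * pb for (a, pa), (b, pb) in product(A.items(), B.items())}

print(H(A) + H(B))   # matches H(AB) up to floating-point rounding
print(H(AB))
```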

b. (i) If $A$ and $B$ are dependent, in that $B = F(A)$, where $F$ is a 1-to-1 map, how much entropy does $A||B$ have?

(ii) If $F$ is not invertible (say a cryptographic hash) how would that affect your answer? Justify your reasoning briefly.

In a cryptographic system, in order for a key to have full entropy, random data is taken from a random source and pooled into a store much larger than the size of the key to be generated. When a key is required, this large quantity of data is processed and compacted to produce the key using a non-invertible function. This ensures that even if the collected data contains relatively little entropy per sample, enough can be accumulated for the resultant key to have close to full entropy.

Consider the following example system. We have a timer which starts at zero, and has accuracy to $1$ microsecond ($10^{−6}$ seconds). We have a perfect random source which will reset the timer to $0$ at most $1$ second after the previous reset. The value of the timer at reset is read off as a $64$ bit value and added to our entropy pool. When a $64$ bit key is needed, all the $64$ bit timer values in the entropy pool are bitwise XORed.

Bitwise XORing two independent data streams results in a stream with entropy greater than or equal to that of either input stream.

c. (i) How much entropy is in each $64$ bit timer value? (Use the fact that $2^{10} \approx 1000$ to help with your estimate.)

(ii) How many of these values (assuming independence) would need to be concatenated to produce a value with $\ge 64$ bits of entropy?

(iii) What is wrong with the suggested XOR combining function? Suggest a better function to produce a high entropy $64$ bit output.

On BEST ANSWER

Partial answer:

Note that if there are $N$ equally likely possibilities then $H(X) = - \sum_{k=1}^N {1 \over N} \log_2 {1 \over N} = \log_2 N$.

a. (i): Each of the 8 bytes has exactly one bit set, giving $8$ equally likely patterns per byte, so $N = 8^8 = 2^{24}$ and $H(X) = \log_2 8^8 = 24$.
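The count in (i) can be sanity-checked by enumerating the one-bit-set byte patterns directly:

```python
import math

# The 8 byte patterns with exactly one bit set: 1, 2, 4, ..., 128.
patterns = [1 << i for i in range(8)]
print(len(patterns))          # 8 choices per byte

# 8 independent bytes -> N = 8**8 equally likely keys.
N = len(patterns) ** 8
print(math.log2(N))           # 24.0
```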

a. (ii): Since entropy is preserved by a one-to-one map, any such map leaves the entropy unchanged. Hence we can choose the identity map, or, if that is unsatisfactory, the map that reverses the order of the bits in each byte.

b. (i), (ii): If $B=F(A)$, then each message has the form $x||F(x)$, and we see that $H(A||F(A)) = -\sum_k p(x_k||F(x_k)) \log_2 p(x_k||F(x_k)) = -\sum_k p(x_k) \log_2 p(x_k) = H(A)$. This holds for an arbitrary (deterministic) $F$, invertible or not.
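This identity is easy to verify numerically. The sketch below (the toy distribution and the deliberately non-invertible map `F` are my own examples) shows that each message $x||F(x)$ inherits the probability of $x$, so the entropies coincide exactly:

```python
import math

def H(dist):
    # Shannon entropy in bits of an {outcome: probability} distribution
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Any distribution for A and any deterministic F; this F is not 1-to-1.
A = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
F = lambda x: x % 2

# A||F(A): the message (x, F(x)) occurs with the same probability as x.
AB = {(x, F(x)): p for x, p in A.items()}

print(H(A))    # 1.75
print(H(AB))   # 1.75 — identical, even though F is not invertible
```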

c. (i): Since the timer has a resolution of $1\,\mu\mathrm{s}$ and is reset at most one second after the previous reset, we can assume that every value in the range $0,\ldots,10^6-1$ is equally likely, so from the first note we have $H(A) = \log_2 10^6 \approx \log_2 2^{20} = 20$.

c. (ii): If the $A_k$ are independent, then $H(A_1 || \cdots || A_n) = H(A_1) + \cdots + H(A_n)$, so we look for the smallest $n$ satisfying $20 n \ge 64$, namely $n = 4$.
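Both estimates can be confirmed without the $2^{10} \approx 1000$ shortcut:

```python
import math

h = math.log2(10 ** 6)       # exact entropy of one uniform timer value
print(h)                     # ~19.93 bits, close to log2(2**20) = 20

# Smallest number of independent values whose entropies sum to >= 64 bits.
print(math.ceil(64 / h))     # 4
```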

c. (iii): I don't have a complete answer for this. The only criticism I can think of is that if you request many numbers quickly, they will often be the same, but I don't see how that would be resolved by a different combining function. (Perhaps this is a pseudo-political question; see http://www.theregister.co.uk/2013/09/10/torvalds_on_rrrand_nsa_gchq/ :-).) Perhaps the questioner was looking for hashing as a response? (See https://blog.cloudflare.com/ensuring-randomness-with-linuxs-random-number-generator/ for more of the same.)
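One concrete observation consistent with the estimate in c. (i), offered here as an editor's sketch rather than part of the original answer: every timer value is below $10^6 < 2^{20}$, so bits $20$ through $63$ are always zero, and XOR can never set a bit that is zero in all of its inputs. The pooled "64-bit key" therefore never exceeds $20$ bits of entropy, no matter how many values are combined, which is why a hash-based compaction is preferable:

```python
import random

# Simulate pooling many timer readings, each uniform in [0, 10**6).
random.seed(1)
values = [random.randrange(10 ** 6) for _ in range(1000)]

# XOR them all together, as the proposed combining function does.
key = 0
for v in values:
    key ^= v

# The top 44 bits of every input are zero, so they are zero in the key too:
# the "64-bit" output is confined to a 20-bit subspace.
print(key < 2 ** 20)   # True
```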