Understand cross entropy's calculation


Screenshot from this video.

Why does he calculate the predicted probability as $\frac{1}{2^{\text{bit length}}}$, and what do those calculated probabilities mean? When I add up the predicted distribution it comes to about $93\%$, not $100\%$, while the true distribution sums to $100\%$. Why don't they match, and what does the difference mean? Is this difference the KL divergence?


Best answer:

The probability values come from Claude Shannon's paper A Mathematical Theory of Communication, mentioned at the beginning of the video. They express each message's implied probability in terms of its length in binary bits, $n$:

$\frac{1}{2^n}$

where the base is 2 because the data is being sent as binary data. He mentions in the video that the code doesn't use any messages starting with 1111, so when you add up the implied probabilities they don't sum to 100%.
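As a quick sketch of that calculation: each codeword of bit length $n$ is assigned the implied probability $1/2^n$, and if some codewords are reserved or unused, the total falls short of 1. The code lengths below are hypothetical, not the exact ones from the video.

```python
# Hypothetical binary codeword lengths (not the ones from the video).
code_lengths = [2, 2, 3, 3, 4, 4]

# A codeword of bit length n has implied probability 1 / 2^n.
predicted = [1 / 2**n for n in code_lengths]

total = sum(predicted)
print(predicted)  # [0.25, 0.25, 0.125, 0.125, 0.0625, 0.0625]
print(total)      # 0.875 -- less than 1 because some codewords are unused
```

This is the Kraft inequality in action: for any uniquely decodable binary code, $\sum_i 2^{-n_i} \le 1$, with equality only when every possible codeword prefix is used.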

The gap between the cross entropy and the entropy is the Kullback–Leibler (KL) divergence. Cross entropy is always at least as large as entropy, and the difference is zero exactly when the predicted distribution matches the true one, so the KL divergence measures how far off the prediction is.