Screenshot from this video.
Why does he calculate the predicted probability as $\frac{1}{2^{\text{bit length}}}$, and what do those calculated probabilities mean? When I add up all the predicted probabilities I get $93.x\%$, not $100\%$, but the true distribution sums to $100\%$. Why don't they match, and what does the difference mean? Is this difference the KL divergence?

The probability values come from Claude Shannon's paper A Mathematical Theory of Communication, mentioned at the beginning of the video. Each predicted probability is expressed in terms of the message's length in binary bits, n:
$\frac{1}{2^n}$
The base is 2 because the data is being sent as binary digits. He mentions in the video that the code doesn't use messages starting with 1111, so if you add up the values they don't sum to 100%. The gap you're asking about is the difference between the cross entropy (the average bits per message under the predicted probabilities) and the entropy (the same average under the true probabilities); that difference is the Kullback–Leibler (KL) divergence.
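Here is a small sketch of both effects. The code lengths and the true distribution below are hypothetical, chosen only to illustrate the calculation (the actual values come from the video); they happen to reproduce a sum below 100%:

```python
import math

# Hypothetical code lengths in bits, e.g. the codewords 0, 10, 110, 1110.
# No codeword starts with 1111, which is why probability mass is "missing".
code_lengths = [1, 2, 3, 4]

# Predicted probability of each message: 1 / 2**n
predicted = [2 ** -n for n in code_lengths]
print(sum(predicted))  # 0.9375 -- less than 1 (the 93.x% effect)

# A hypothetical true distribution over the same four messages (sums to 1).
true_dist = [0.5, 0.25, 0.125, 0.125]

# KL divergence D(true || predicted) = cross entropy - entropy, in bits.
kl = sum(p * math.log2(p / q) for p, q in zip(true_dist, predicted))
print(kl)  # 0.125 bits: only the last message's probability differs
```

The KL divergence is zero exactly when the predicted probabilities match the true ones; here only the fourth message is mispredicted (0.0625 vs 0.125), contributing the entire 0.125 bits.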