Screenshot from this video.
Why does he calculate the predicted probability as $\frac{1}{2^{\text{bit length}}}$, and what do those calculated probabilities mean? When I add up all the predicted probabilities I get $93.x\%$, not $100\%$, but the true distribution sums to $100\%$. Why don't they match, and what does the difference mean? Is this difference the KL divergence?

The probability values come from Claude Shannon's paper A Mathematical Theory of Communication, mentioned at the beginning of the video. Each predicted probability is expressed in terms of the message's length in binary bits, n:
$\frac{1}{2^n}$
The base is 2 because the data is being sent as binary digits. He mentions in the video that the code doesn't use messages starting with 1111, so if you add up the values they don't sum to 100%. The gap you're asking about is the difference between the cross entropy (the average bits per message under the predicted probabilities) and the entropy (the same average under the true probabilities); that difference is the Kullback–Leibler (KL) divergence.
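Here is a small sketch of both effects. The code lengths and the true distribution below are hypothetical, chosen only to illustrate the calculation (the actual values come from the video); they happen to reproduce a sum below 100%:

```python
import math

# Hypothetical code lengths in bits, e.g. the codewords 0, 10, 110, 1110.
# No codeword starts with 1111, which is why probability mass is "missing".
code_lengths = [1, 2, 3, 4]

# Predicted probability of each message: 1 / 2**n
predicted = [2 ** -n for n in code_lengths]
print(sum(predicted))  # 0.9375 -- less than 1 (the 93.x% effect)

# A hypothetical true distribution over the same four messages (sums to 1).
true_dist = [0.5, 0.25, 0.125, 0.125]

# KL divergence D(true || predicted) = cross entropy - entropy, in bits.
kl = sum(p * math.log2(p / q) for p, q in zip(true_dist, predicted))
print(kl)  # 0.125 bits: only the last message's probability differs
```

The KL divergence is zero exactly when the predicted probabilities match the true ones; here only the fourth message is mispredicted (0.0625 vs 0.125), contributing the entire 0.125 bits.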