I have this structure, representing the percentage distribution of letter usage in the English language:
letterFrequency = {
'E' : 12.0,
'T' : 9.10,
'A' : 8.12,
'O' : 7.68,
'I' : 7.31,
'N' : 6.95,
'S' : 6.28,
'R' : 6.02,
'H' : 5.92,
'D' : 4.32,
'L' : 3.98,
'U' : 2.88,
'C' : 2.71,
'M' : 2.61,
'F' : 2.30,
'Y' : 2.11,
'W' : 2.09,
'G' : 2.03,
'P' : 1.82,
'B' : 1.49,
'V' : 1.11,
'K' : 0.69,
'X' : 0.17,
'Q' : 0.11,
'J' : 0.10,
'Z' : 0.07 }
Fairly simple. What I am having difficulty understanding is that this book (https://eclass.uniwa.gr/modules/document/file.php/CSCYB105/Reading%20Material/[Jonathan_Katz%2C_Yehuda_Lindell]_Introduction_to_Mo(2nd).pdf?fbclid=IwAR1hf1OTKAhf4ZHvswERpcZ3ZVDQMxHuP2FWRg2tvlo3-tUMSdFIPLWZR_8) [Introduction to Modern Cryptography, page 11] makes the following claim about this distribution:
Let $p_i$, with $0 \le p_i \le 1$, denote the frequency of the $i$th letter in normal English text. Calculation using Figure 1.3 gives:
$ \sum_{i=0}^{25}p_i^2\approx0.065 $
This makes no sense to me: when I do the calculation and sum the squares of all the frequencies, I get 646.6717. What am I doing wrong?
The table gives percentages, so to obtain probabilities each entry must be divided by $100$. If you skip that step and square the raw percentages, the sum of squares comes out as $646.6717$, and you then still need to divide it by $10000$ to get $0.06466717$, which rounds to the book's value of $0.065$.
[Since your raw numbers are $100$ times the correct ones, their squares are $10000$ times the correct squared frequencies.]
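As a quick check of the arithmetic above, here is a minimal Python sketch that reuses the `letterFrequency` dictionary from the question. The names `raw_sum` and `prob_sum` are introduced here only for illustration. It computes the sum of squares both ways: on the raw percentages, and after dividing each entry by $100$.

```python
# Letter frequencies in percent, copied from the question.
letterFrequency = {
    'E': 12.0, 'T': 9.10, 'A': 8.12, 'O': 7.68, 'I': 7.31, 'N': 6.95,
    'S': 6.28, 'R': 6.02, 'H': 5.92, 'D': 4.32, 'L': 3.98, 'U': 2.88,
    'C': 2.71, 'M': 2.61, 'F': 2.30, 'Y': 2.11, 'W': 2.09, 'G': 2.03,
    'P': 1.82, 'B': 1.49, 'V': 1.11, 'K': 0.69, 'X': 0.17, 'Q': 0.11,
    'J': 0.10, 'Z': 0.07,
}

# Sum of squares of the raw percentages (the number from the question).
raw_sum = sum(q ** 2 for q in letterFrequency.values())

# Convert each percentage to a probability first, then square and sum.
prob_sum = sum((q / 100) ** 2 for q in letterFrequency.values())

print(round(raw_sum, 4))   # 646.6717
print(round(prob_sum, 3))  # 0.065
```

The two results differ by exactly the factor $100^2 = 10000$, since $\sum_i (q_i/100)^2 = \frac{1}{10000}\sum_i q_i^2$ when $q_i$ is the percentage for the $i$th letter.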