The hypothesized letter-frequency values below are taken from Pavel Micka's website, which cites Robert Lewand's Cryptological Mathematics.
The actual appearances were counted manually from "A Study in Scarlet" by Arthur Conan Doyle.
Use the Chi-squared statistic to test whether the hypothesized frequencies are correct.
This will be a one-tailed test by design. Use a $5\%$ chance of a Type I error.
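| Letter | Hypothesized frequency | Actual appearances |
|:-:|--:|--:|
| a | 0.08167 | 19890 |
| b | 0.01492 | 1701 |
| c | 0.02782 | 5556 |
| d | 0.04253 | 10578 |
| e | 0.12703 | 29479 |
| f | 0.02228 | 4252 |
| g | 0.02015 | 5601 |
| h | 0.06094 | 8663 |
| i | 0.06966 | 9267 |
| j | 0.00153 | 276 |
| k | 0.00772 | 2244 |
| l | 0.04025 | 9458 |
| m | 0.02406 | 7184 |
| n | 0.06749 | 13765 |
| o | 0.07507 | 16986 |
| p | 0.01929 | 5887 |
| q | 0.00095 | 153 |
| r | 0.05987 | 7984 |
| s | 0.06327 | 11181 |
| t | 0.09056 | 27087 |
| u | 0.02758 | 5277 |
| v | 0.00978 | 3031 |
| w | 0.02360 | 7670 |
| x | 0.00150 | 200 |
| y | 0.01974 | 3396 |
| z | 0.00074 | 159 |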
Attempted Solution:
I added up all the actual appearances to get $216{,}925$. Then I multiplied each hypothesized frequency by that number to get the expected counts. I then used the formula $Q = \sum \frac{(O-E)^2}{E}$ for the chi-squared statistic and got $14598.17$. The critical value I found from the table was $37.652$, so I rejected the null hypothesis.
I was wondering if I did this correctly. I suspect I did not because of how much larger my Chi-square statistic was than my critical value.
Any help would be much appreciated.
EDIT:
I think I need to take the square root of my chi-square statistic, which gives $120.8$. That is still much larger than my critical value of $37.652$.
If you used the 'Actual appearances' for $O$ in the formula for the chi-squared statistic $Q$, and $n = 216{,}925$ times the 'Hypothesized frequency' for $E$, then your method of computing $Q$ is correct.
You have $k = 26$ 'categories' (letters of the alphabet), so $Q \stackrel{aprx}{\sim}\mathsf{Chisq}(\nu = k - 1 = 25).$ So for a test at the 5% level, the critical value is 37.65248, as you say.
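If no table is at hand, the critical value can be reproduced with a short sketch like the one below. It assumes only the Python standard library: the chi-square CDF is evaluated through the power series for the regularized lower incomplete gamma function $P(\nu/2, x/2)$, and the upper 5% point is found by bisection. The function names `chi2_cdf` and `chi2_critical_value` are illustrative; in practice a statistics library (for example SciPy's `scipy.stats.chi2.ppf`) would normally be used instead.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    k = 1
    while term > total * 1e-15:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_critical_value(df, alpha=0.05, hi=200.0):
    """Upper-tail critical value: the x with P(X > x) = alpha, found by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k = 26                      # number of categories (letters)
crit = chi2_critical_value(k - 1)
print(f"critical value (df = {k - 1}, alpha = 0.05): {crit:.4f}")
```

The series form is used because it is short and accurate for moderate arguments like this one; it is not meant for very large $x$ (such as evaluating a p-value at $Q \approx 14598$), where the intermediate terms would overflow.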
With a large amount of data, it is not unusual to get a very large value of $Q$, indicating a very bad fit of the observed to expected frequencies.
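To see why, let $p_i$ be the true proportion of letter $i$ and $p_{0i}$ the hypothesized one, so $E_i = np_{0i}$ while the counts $O_i$ are multinomial with $\mathbb{E}[O_i] = np_i$ and $\mathrm{Var}(O_i) = np_i(1-p_i)$. This treats the letters as independent draws, which is an idealizing assumption for running text. Since $\mathbb{E}[(O_i - E_i)^2] = \mathrm{Var}(O_i) + (np_i - np_{0i})^2$,

$$\mathbb{E}[Q] = \sum_{i=1}^{k}\frac{np_i(1-p_i) + n^2(p_i-p_{0i})^2}{np_{0i}} = \sum_{i=1}^{k}\frac{p_i(1-p_i)}{p_{0i}} + n\sum_{i=1}^{k}\frac{(p_i-p_{0i})^2}{p_{0i}}.$$

Under $H_0$ ($p_i = p_{0i}$ for all $i$) the first sum equals $k-1$ and the second vanishes. Under any fixed departure from $H_0$, the second term grows linearly in $n$, so with $n = 216{,}925$ even modest differences in proportions produce a huge $Q$. Here $Q/n \approx 14598.17/216925 \approx 0.067$.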
However, I tried putting your data into Minitab 17 software. When I cut/pasted from your data table, data in the rows for letters 'q' and 'u' were missing. (Maybe there are hidden non-printing characters in your data table that prevented transfer.) I entered these two rows into the Minitab worksheet by hand. Then, as a check, I summed the hypothesized frequencies (total 1) and the actual appearances (total 216,925, which agrees with your computation). Also, I got the same $Q$ you did. [You do not need to take the square root.]
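For readers without Minitab, here is a minimal sketch of the same checks in plain Python (standard library only). The names `FREQS`, `COUNTS`, and `chi_squared_statistic` are illustrative, the data are transcribed from the question's table, and the critical value is hard-coded from the table value quoted in the question rather than computed.

```python
import math

# Data transcribed from the question's table (index 0 = 'a', ..., 25 = 'z').
FREQS = [
    0.08167, 0.01492, 0.02782, 0.04253, 0.12703, 0.02228, 0.02015, 0.06094, 0.06966,  # a-i
    0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987,  # j-r
    0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074,           # s-z
]
COUNTS = [
    19890, 1701, 5556, 10578, 29479, 4252, 5601, 8663, 9267,   # a-i
    276, 2244, 9458, 7184, 13765, 16986, 5887, 153, 7984,      # j-r
    11181, 27087, 5277, 3031, 7670, 200, 3396, 159,            # s-z
]

def chi_squared_statistic(freqs, counts):
    """Return (n, Q) with Q = sum((O - E)**2 / E) and E = n * hypothesized frequency."""
    if len(freqs) != len(counts):
        raise ValueError("need one hypothesized frequency per observed count")
    n = sum(counts)  # total number of letters observed
    q = 0.0
    for p, observed in zip(freqs, counts):
        expected = n * p
        q += (observed - expected) ** 2 / expected
    return n, q

# Sanity checks: all 26 rows present, frequencies summing to 1.
assert len(FREQS) == len(COUNTS) == 26
assert math.isclose(sum(FREQS), 1.0, abs_tol=1e-9)

n, q = chi_squared_statistic(FREQS, COUNTS)
CRITICAL_VALUE = 37.652  # table value for 25 df, 5% upper tail, as quoted in the question
print(f"n = {n}, Q = {q:.2f}")
print("reject H0" if q > CRITICAL_VALUE else "do not reject H0")
```

The row-count assertion is there to catch exactly the kind of silent copy/paste loss described above, where two rows went missing.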
So my computations agree with yours, and the data do not fit the hypothetical frequencies.
Particularly large 'contributions' to $Q$ come from the letters t, i, r, h, and w (too few h's, i's, and r's; too many t's and w's for a good match). There may be some quirk in the subject matter of the Doyle story or in his writing style (over- or under-use of common words such as the, in, it, at, to, with, that, which, and so on) that accounts for the discrepancy.
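The per-letter contributions $(O-E)^2/E$ can be ranked with a sketch like the following, again in plain Python with the data transcribed from the question's table. The function name `contributions` is illustrative. For each of the five largest contributions it also reports whether the letter appears more or less often than the hypothesized frequency predicts.

```python
import string

# Data transcribed from the question's table (index 0 = 'a', ..., 25 = 'z').
FREQS = [
    0.08167, 0.01492, 0.02782, 0.04253, 0.12703, 0.02228, 0.02015, 0.06094, 0.06966,  # a-i
    0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987,  # j-r
    0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074,           # s-z
]
COUNTS = [
    19890, 1701, 5556, 10578, 29479, 4252, 5601, 8663, 9267,   # a-i
    276, 2244, 9458, 7184, 13765, 16986, 5887, 153, 7984,      # j-r
    11181, 27087, 5277, 3031, 7670, 200, 3396, 159,            # s-z
]

def contributions(freqs, counts):
    """Map each letter to (observed, expected, (O - E)**2 / E)."""
    n = sum(counts)
    result = {}
    for letter, p, observed in zip(string.ascii_lowercase, freqs, counts):
        expected = n * p
        result[letter] = (observed, expected, (observed - expected) ** 2 / expected)
    return result

contrib = contributions(FREQS, COUNTS)
# Sort letters by their contribution to Q, largest first.
ranked = sorted(contrib.items(), key=lambda item: item[1][2], reverse=True)
for letter, (observed, expected, c) in ranked[:5]:
    direction = "more" if observed > expected else "fewer"
    print(f"{letter}: O = {observed}, E = {expected:.1f}, "
          f"contribution = {c:.1f} ({direction} than expected)")
```

The contributions sum to $Q$, so this listing shows directly which letters drive the lack of fit and in which direction.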