In a corpus of text the expected letter frequency might be:
- e = 30%
- t = 30%
- a = 20%
- o = 20%
Actual recorded frequency:
- e = 90%
- t = 10%
- a = 0%
- o = 0%
I want to know the OVERALL percentage difference (not the average difference) between the expected frequencies and the actual frequencies. How would this be done? For example: "Overall there is a 35% difference between what we expected our frequency count to be and our actual frequencies".
90% of the letters are e, but we expected only 30% of them to be e. So there is a 60% difference.
But this doesn't seem right to me: if it was expected to be 30% but we've got 90%, then that is a 200% difference, isn't it?
It really depends on your definition of "percent difference." Some people use the term for the point difference between two percentages (that is, $90\% - 30\% = 60$ percentage points).
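However, other people use percent difference to mean a specific fraction, the change relative to a reference value: $$\Delta\% = \frac{x - x_0}{x_0}\times100\%$$ (where $x$ is a given reading, and $x_0$ is the reference reading). With the question's numbers for e, $x = 90\%$ and $x_0 = 30\%$ give $\Delta\% = \frac{90 - 30}{30}\times100\% = 200\%$.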
Which definition you should use depends on your specific application.
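
As a quick check, here is a minimal Python sketch that applies both definitions to the frequencies from the question. The function names `point_difference` and `relative_difference` are made up for illustration, and raising an error when the reference value is zero is my own assumption, since the fraction is undefined in that case.

```python
# Frequencies from the question, in percent.
expected = {"e": 30.0, "t": 30.0, "a": 20.0, "o": 20.0}
actual = {"e": 90.0, "t": 10.0, "a": 0.0, "o": 0.0}


def point_difference(x, x0):
    """Difference in percentage points: x - x0."""
    return x - x0


def relative_difference(x, x0):
    """Relative change in percent: (x - x0) / x0 * 100."""
    if x0 == 0:
        # Assumption: treat a zero reference as an error rather than infinity.
        raise ValueError("relative difference is undefined for x0 == 0")
    return (x - x0) / x0 * 100.0


for letter in expected:
    x, x0 = actual[letter], expected[letter]
    print(
        f"{letter}: {point_difference(x, x0):+.1f} points, "
        f"{relative_difference(x, x0):+.1f}% relative"
    )
```

For e this reports +60.0 points and +200.0% relative, which are the two numbers in the question: both are correct, they just answer different questions.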