When using the $\chi^2$ statistic, if the errors (difference between observed and expected) are too low, the resulting statistic will be low. If we repeat the experiment several times with similar errors, the statistic values will remain low.
However, the actual PDF function of the Chi-squared distribution has a mean of $k$ and a mode of $max(0, k - 2)$.
I'm trying to find a way to set the errors so that the distribution of all the $\chi^2$ statistic values I generate follow that distribution.
For example, let's say I have 8 frequencies, and they must sum up to 100. Let's say the expected frequencies are all equal, that is 12.5.
The way I'm trying to generate a frequency list is by starting with 8 elements of 12.5, and to each, add an error generated using the normal distribution using mean $0$ and standard deviation $\sqrt{12.5}$. This actually works, and the statistic values are distributed the same way as the function (which I plotted using Scipy on Python).
However, I'd like an analytical explanation as to why this works. I got to this value with a bit of trial and error.
My guess (not sure if it's correct):
If we take a look at every term that's being summed up in the $\chi^2$ statistic, we get (note that all expected values $E_i$ are equal in this case):
$$ \frac{(O_i - E)^2}{E}\\ = \frac{(O_i - E)^2}{\sqrt{E}^2}\\ = (\frac{O_i - E}{\sqrt{E}})^2\\ $$
Since $O_i$ was generated using $E + \lambda$ using $\lambda \sim N(0, \sqrt{E})$, then we can write:
$$ = (\frac{(E + \lambda) - E}{\sqrt{E}})^2\\ = (\frac{\lambda}{\sqrt{E}})^2\\ $$
Dividing by the standard deviation is the process of standardizing a normal variable (we don't need to subtract the mean since it's 0). So this means the expression $\frac{\lambda}{\sqrt{E}}$ is distributed under $Z$ (standard normal distribution).
This fits the Chi-squared definition, which says $\sum_{i=1}^k Z_i^2 \sim \chi^2(k)$. This means the generated statistic values will match the PDF plot when the errors are generated that way.
