Derivation 9.97 in Jaynes' Probability Theory


On page 298 of Jaynes' Probability Theory: The Logic of Science, at equation (9.97), Jaynes says:

We expect that, if hypothesis $H$ is true, then $n_k$ will be close to $np_k$, in the sense that the difference $|n_k-np_k|$ will grow with $n$ only as $\sqrt n$. Call this 'condition A'. Then using the expansion $\log(x) = (x-1)-(x-1)^2/2+...$, we find that

$$\sum_{k=1}^mn_k\log\left[\frac{n_k}{np_k}\right] = \frac 1 2\sum_k\frac{(n_k-np_k)^2}{np_k} + O\left(\frac 1 {\sqrt n}\right)$$

In the above, $\sum_kp_k = 1$ and $\sum_kn_k=n$, where $n$ is the total number of trials in a series of Bernoulli trials and $n_k$ is the number of trials that had outcome $k$.

I'd like to know how he got from that expansion to the quoted equation, especially given that the squared term in the expansion has a negative sign.

--EDIT to add info--

Jaynes' nomenclature confused me a bit. He calls hypotheses of the type "there are $m$ possible outcomes of an experiment, each being observed with probability $p_k$ independent of previous or future repetitions of that experiment" the "Bernoulli class." $n_k$ is the number of performed experiments that had outcome $k$, $n$ is the total number of experiments performed. But that is all that's defined by Jaynes, and all that's known about the hypotheses.

(A thought occurs: what if Jaynes meant that condition A is that $|n_k -np_k| \approx O(1/\sqrt n)$? I haven't explored this possibility enough to know whether it makes sense.)
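As a quick numerical sanity check (my own addition, not in the book; the probabilities and $n$ below are arbitrary choices), the two sides of (9.97) can be compared on one simulated multinomial sample:

```python
# Sanity check of Jaynes' (9.97): for a multinomial sample of size n,
#   sum_k n_k*log(n_k/(n*p_k))  should be close to  (1/2)*sum_k (n_k - n*p_k)^2/(n*p_k),
# with the discrepancy shrinking like 1/sqrt(n).
import math
import random

random.seed(0)
p = [0.2, 0.3, 0.5]   # arbitrary outcome probabilities, sum to 1
n = 100_000           # number of trials

# Draw one multinomial sample of size n by inverse-CDF sampling.
counts = [0] * len(p)
for _ in range(n):
    u = random.random()
    acc = 0.0
    for k, pk in enumerate(p):
        acc += pk
        if u < acc:
            counts[k] += 1
            break

lhs = sum(nk * math.log(nk / (n * pk)) for nk, pk in zip(counts, p))
rhs = 0.5 * sum((nk - n * pk) ** 2 / (n * pk) for nk, pk in zip(counts, p))
print(lhs, rhs)  # the two sides agree up to a term of order 1/sqrt(n)
```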


BEST ANSWER

$\sum_k (n_k-np_k)^2/(np_k)$ is Pearson's chi-squared statistic. The left side of the equation is the log likelihood ratio for the multinomial distribution, comparing the MLE $\hat f_k = n_k/n$ against the probabilities under the null hypothesis, the $p_k$'s. A standard result is that twice the log likelihood ratio (likelihood under the alternative over likelihood under the null) has an asymptotic chi-squared distribution under the null hypothesis with $m-1$ degrees of freedom. It is also well known that Pearson's chi-squared statistic for the goodness-of-fit test has the same asymptotic $\chi^2_{m-1}$ distribution. So the left and right sides of the equation have the same asymptotic distribution under the null hypothesis, and the proof of either of these results should help.
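A small simulation of my own (arbitrary $p_k$, $n$, and replication count) illustrating this: under the null, both $G = 2\sum_k n_k\log(n_k/(np_k))$ and Pearson's $X^2 = \sum_k(n_k-np_k)^2/(np_k)$ should be approximately $\chi^2_{m-1}$, so their sample means over many replications should both be near $m-1$:

```python
# Compare 2*(log likelihood ratio) and Pearson's X^2 under the null:
# both are asymptotically chi-squared with m-1 degrees of freedom,
# so their averages over many replications should be close to m-1.
import math
import random

random.seed(1)
p = [0.25, 0.25, 0.5]  # null probabilities (arbitrary)
n, reps = 2000, 500    # trials per replication, number of replications
df = len(p) - 1        # m - 1 degrees of freedom

g_vals, x2_vals = [], []
for _ in range(reps):
    counts = [0] * len(p)
    for _ in range(n):
        u = random.random()
        acc = 0.0
        for k, pk in enumerate(p):
            acc += pk
            if u < acc:
                counts[k] += 1
                break
    g_vals.append(2 * sum(nk * math.log(nk / (n * pk)) for nk, pk in zip(counts, p)))
    x2_vals.append(sum((nk - n * pk) ** 2 / (n * pk) for nk, pk in zip(counts, p)))

print(sum(g_vals) / reps, sum(x2_vals) / reps, df)  # both means near df
```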

As a side note, the statement that $n_k-np_k$ grows at rate $\sqrt{n}$ could mean that, since marginally $n_k \sim \mathrm{Binomial}(n, p_k)$,

$$\frac{1}{\sqrt{n}}(n_k-np_k) \overset{d}{\to} N(0,\, p_k(1-p_k)) \text{ as } n\to \infty,$$

i.e., the normal approximation to the binomial. Or it could mean that $\vert n_k-np_k \vert = O_p(\sqrt{n})$ or $O(\sqrt{n})$.
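A quick simulation (my own illustration, with arbitrary $p_k$, $n$, and replication count) of this scaling: the standardized deviation $(n_k-np_k)/\sqrt n$ should have mean near $0$ and variance near $p_k(1-p_k)$:

```python
# Check the scaling claim: for n_k ~ Binomial(n, p_k), the quantity
# (n_k - n*p_k)/sqrt(n) has mean ~ 0 and variance ~ p_k*(1 - p_k),
# so |n_k - n*p_k| is O_p(sqrt(n)).
import random

random.seed(2)
pk, n, reps = 0.3, 1000, 2000  # arbitrary parameters

zs = []
for _ in range(reps):
    nk = sum(random.random() < pk for _ in range(n))  # one Binomial(n, pk) draw
    zs.append((nk - n * pk) / n ** 0.5)

mean = sum(zs) / reps
var = sum((z - mean) ** 2 for z in zs) / reps
print(mean, var, pk * (1 - pk))  # variance should be near 0.3 * 0.7 = 0.21
```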

ANSWER

The best I can get is $$ \begin{aligned} \sum_{k=1}^m n_k\log\left[\frac{n_k}{n p_k}\right] & = \sum_{k=1}^m n_k\left[\left(\frac{n_k}{n p_k}-1\right)-\frac12\left(\frac{n_k}{n p_k}-1\right)^2+O\left(\left(\frac{n_k}{n p_k}-1\right)^3\right)\right] \\ & = \sum_{k=1}^m n_k\left[-\frac12\left(\frac{n_k}{n p_k}\right)^2+\frac{2n_k}{n p_k}-\frac32+O\left(\left(\frac{n_k-n p_k}{n p_k}\right)^3\right)\right] \\ & = \sum_{k=1}^m -\frac{n_k^3}{2n^2 p_k^2}+\frac{2n_k^2}{n p_k}-\frac{3n_k}2+n_k O\left(\left(\frac{\sqrt n}{n p_k}\right)^3\right) \qquad\text{condition A} \\ & = \sum_{k=1}^m -\frac{n_k^3}{2n^2 p_k^2}+\frac{2n_k^2}{n p_k}-\frac{3n_k}2+n_k O\left(n^{-3/2}\right) \\ & = \left[\sum_{k=1}^m -\frac{n_k^3}{2n^2 p_k^2}+\frac{2n_k^2}{n p_k}\right]-\frac{3n}2+O\left(n^{-1/2}\right) \end{aligned} $$

But something isn't right. The desired expression is $O(1)$, while the expansion appears to be $O(\sqrt n)$: condition A says $n_k=np_k+O(\sqrt n)$, so that term by term the left-hand side is $$\sum_k(np_k+O(\sqrt n))\log\left(1+O(n^{-1/2})\right)=\sum_k(np_k+O(\sqrt n))\,O(n^{-1/2})=O(\sqrt n).$$ I think we need to use something about the definitions of $p_k$, $n_k$, and $n$.
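One way to close the gap (a sketch, using only the constraints $\sum_k n_k = n$ and $\sum_k p_k = 1$ stated in the question): write $\delta_k = n_k - np_k$, so that $\sum_k \delta_k = 0$ exactly, and under condition A $\delta_k = O(\sqrt n)$. Expanding each term,

$$ \begin{aligned} n_k\log\left[\frac{n_k}{np_k}\right] &= (np_k + \delta_k)\left[\frac{\delta_k}{np_k} - \frac{\delta_k^2}{2(np_k)^2} + O\!\left(n^{-3/2}\right)\right] \\ &= \delta_k + \frac{\delta_k^2}{np_k} - \frac{\delta_k^2}{2np_k} + O\!\left(n^{-1/2}\right) \\ &= \delta_k + \frac{\delta_k^2}{2np_k} + O\!\left(n^{-1/2}\right). \end{aligned} $$

Summing over $k$, the $O(\sqrt n)$ linear terms cancel exactly, since $\sum_k \delta_k = \sum_k n_k - n\sum_k p_k = n - n = 0$, leaving $$\sum_{k=1}^m n_k\log\left[\frac{n_k}{np_k}\right] = \frac12\sum_k\frac{(n_k-np_k)^2}{np_k} + O\!\left(\frac1{\sqrt n}\right),$$ which is Jaynes' (9.97). The apparent $O(\sqrt n)$ above is an artifact of bounding term by term rather than summing first.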