I'm trying to understand the second part of a proof which was presented here before. Basically, we want to establish:
$$ \lim_{n\to \infty} P_{\theta_0} [L(\theta _0, X) > L(\theta, X)]=1, \,\,\forall \theta \neq\theta _0$$
The following steps are already established:
$$ \lim_{n\to \infty} P_{\theta_0} [L(\theta _0, X) > L(\theta, X)] = \lim_{n\to \infty} P_{\theta_0} \Big[\frac1n \sum_{i=1}^n \log \Big[\frac{f(X_i;\theta)}{f(X_i;\theta_0)}\Big] < 0\Big] $$
Using Jensen's inequality and the weak law of large numbers one establishes that $ \frac1n \sum_{i=1}^n \log \Big[\frac{f(X_i;\theta)}{f(X_i;\theta_0)}\Big] \to_{P_{\theta_0}} E_{\theta_0} \log \Big[\frac{f(X_1;\theta)}{f(X_1;\theta_0)}\Big] < 0$. From here one concludes the wanted limit equality.
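To see this convergence concretely, here is a quick numerical sketch of my own (not part of the proof) for unit-variance normal densities, where $E_{\theta_0} \log \Big[\frac{f(X_1;\theta)}{f(X_1;\theta_0)}\Big] = -\frac{(\theta-\theta_0)^2}{2}$; the parameter values are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, theta, n = 0.0, 1.0, 100_000  # arbitrary illustration values

# Sample X_1, ..., X_n from the true density f(.; theta0) = N(theta0, 1)
x = rng.normal(theta0, 1.0, size=n)

# log[f(x; theta) / f(x; theta0)] for unit-variance normals
# (the normalizing constants cancel)
log_ratio = -(x - theta) ** 2 / 2 + (x - theta0) ** 2 / 2

# Average should be close to the limit -(theta - theta0)^2 / 2 = -0.5
avg = log_ratio.mean()
print(avg)
```

The printed average lands near $-0.5$, i.e. strictly below zero, which is the behavior the weak law of large numbers guarantees here.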
My questions
1. How do I know that the expected value of $\log \Big[\frac{f(X_1;\theta)}{f(X_1;\theta_0)}\Big]$ is finite? In Kullback's book *Information Theory and Statistics* I read that the Kullback-Leibler divergence always exists but may be infinite.
2. How can I conclude from the last fact that the original limit statement is true? Is it because I can interchange the limit and the probability $P_{\theta_0}$?
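Regarding the first question, in concrete parametric families the divergence is often finite with a closed form; for instance, for two normal densities with common variance, $KL\big(N(\theta_0,\sigma^2)\,\|\,N(\theta,\sigma^2)\big) = \frac{(\theta-\theta_0)^2}{2\sigma^2}$. A quick Monte Carlo check of my own (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, theta, sigma = 0.0, 2.0, 1.0  # arbitrary illustration values

# Closed form: KL(N(theta0, s^2) || N(theta, s^2)) = (theta - theta0)^2 / (2 s^2)
kl_exact = (theta - theta0) ** 2 / (2 * sigma ** 2)

# Monte Carlo estimate of E_{theta0} log[f(X; theta0) / f(X; theta)]
x = rng.normal(theta0, sigma, size=200_000)
log_ratio = ((x - theta) ** 2 - (x - theta0) ** 2) / (2 * sigma ** 2)
kl_mc = log_ratio.mean()

print(kl_exact, kl_mc)  # the two values should be close
```

The general question of when the divergence is finite (rather than $+\infty$) of course remains.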
Towards the answer
- For the first part, it seems that the Kullback-Leibler divergence does the trick. But why is the Kullback-Leibler divergence finite, and not merely defined, in my case?
- I write $ \frac1n \sum_{i=1}^n \log \Big[\frac{f(X_i;\theta)}{f(X_i;\theta_0)}\Big] \to_{P_{\theta_0}} E_{\theta_0} \log \Big[\frac{f(X_1;\theta)}{f(X_1;\theta_0)}\Big]$ by definition of convergence in probability as
$$\forall \epsilon > 0\colon \lim_{n\to \infty} P_{\theta_0}\Big[\Big|\frac1n \sum_{i=1}^n \log \Big[\frac{f(X_i;\theta)}{f(X_i;\theta_0)}\Big]- E_{\theta_0} \log \Big[\frac{f(X_1;\theta)}{f(X_1;\theta_0)}\Big]\Big| < \epsilon\Big] = 1$$
and this says that, with probability tending to $1$, the average is within $\epsilon$ of a negative number. Taking $\epsilon$ smaller than the absolute value of that number then forces the average itself to be negative on that event. It would be nice if someone dared to formalize this bit rigorously.
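My attempt at formalizing the last step (a sketch, assuming the expectation $\mu$ below is finite and strictly negative, which is what Jensen's inequality gives): let $\mu = E_{\theta_0} \log \Big[\frac{f(X_1;\theta)}{f(X_1;\theta_0)}\Big] < 0$ and $S_n = \frac1n \sum_{i=1}^n \log \Big[\frac{f(X_i;\theta)}{f(X_i;\theta_0)}\Big]$. Choosing $\epsilon = -\mu/2 > 0$, the event $\{|S_n - \mu| < \epsilon\}$ implies $S_n < \mu + \epsilon = \mu/2 < 0$, so
$$P_{\theta_0}[S_n < 0] \geq P_{\theta_0}\big[|S_n - \mu| < \epsilon\big] \to 1 \quad (n \to \infty),$$
and since probabilities are bounded above by $1$, this gives $\lim_{n\to\infty} P_{\theta_0}[S_n < 0] = 1$.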