Does almost sure convergence of $\frac{1}{m}L_m$ imply almost sure convergence of $\frac{1}{m}\max\limits_{k\leq m}L_k$?

Assume we have a family of distributions with densities $f(x;\theta)$, indexed by a parameter $\theta$.

If we draw $m$ i.i.d. samples $X(1),\dots,X(m)$ from the distribution with parameter $\theta$, we can define the cumulative log-likelihood ratio of the $m$ samples with respect to the two distributions parameterized by $\theta$ and $\lambda$, respectively, as:

$$L_m=\sum_{j=1}^m \log \frac{f\left(X(j) ; \theta\right)}{f\left(X(j) ; \lambda\right)}$$

By the strong law of large numbers, $\frac{1}{m}L_m \xrightarrow{a.s.} D(\theta || \lambda)$, where $D(\cdot\,||\,\cdot)$ denotes the KL divergence.

Why does this imply that $\frac{1}{m}\max\limits_{k\leq m}L_k \xrightarrow{a.s.}D(\theta || \lambda)$?

I read this claim in an article, but it seems odd: the terms in the sum may be negative, so the sum might attain its maximum at the first index, giving a sequence of maxima that never changes.

There are 1 best solutions below

On BEST ANSWER

This is true for general sequences of random variables. Let $(X_i)_{i\in\mathbb N}$ be a sequence of $\mathbb R$-valued random variables such that $$\frac{1}{m}X_m \rightarrow Z \hspace{1cm} \mathbb P-\text{a.s.}$$ for a non-negative random variable $Z$.

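Then there is a null set $N$ such that for every $\omega\in \Omega\setminus N$ and every $\epsilon>0$ there is a $K\in\mathbb N$ (depending on $\omega$ and $\epsilon$) with $|\frac{1}{m}X_m(\omega)-Z(\omega)|\leq \epsilon$ for all $m\geq K$. For $m>K$ we can split the maximum as $$\max_{i=1}^m X_i(\omega) = \max\bigg(\max_{i=1}^KX_i(\omega),\max_{i=K+1}^mX_i(\omega)\bigg).$$ Since $K$ is fixed, $\frac{1}{m}\max_{i=1}^K X_i(\omega) \rightarrow 0$ as $m\to\infty$. For the second term, we have $Z(\omega)-\epsilon \leq \frac{1}{i}X_i(\omega)\leq Z(\omega)+\epsilon$ for all $i>K$. The lower bound at $i=m$ gives $\max_{i=K+1}^m X_i(\omega)\geq X_m(\omega)\geq m(Z(\omega)-\epsilon)$, and because $Z(\omega)+\epsilon>0$, the upper bound gives $X_i(\omega)\leq i(Z(\omega)+\epsilon)\leq m(Z(\omega)+\epsilon)$ for $K<i\leq m$. Hence $$Z(\omega)-\epsilon \leq \frac{1}{m}\max_{i=K+1}^m X_i(\omega) \leq Z(\omega)+\epsilon .$$ Combining both terms and using $Z(\omega)\geq 0$, we obtain $$Z(\omega)-\epsilon \leq \liminf_{m\to\infty}\frac{1}{m}\max_{i=1}^m X_i(\omega) \leq \limsup_{m\to\infty}\frac{1}{m}\max_{i=1}^m X_i(\omega) \leq Z(\omega)+\epsilon .$$ Letting $\epsilon\to 0$ gives the desired result.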

Note that the Kullback-Leibler divergence is non-negative, so the statement above applies with $X_m = L_m$ and $Z = D(\theta || \lambda)$.
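
As a quick numerical sanity check (not a proof), here is a minimal sketch that simulates $L_m$ and its running maximum. The setup is an assumption made purely for illustration: $f(x;\theta)$ is taken to be the $N(\theta,1)$ density and the samples are drawn under $\theta$, so that $\log\frac{f(x;\theta)}{f(x;\lambda)} = (\theta-\lambda)\big(x-\frac{\theta+\lambda}{2}\big)$ and $D(\theta||\lambda)=\frac{(\theta-\lambda)^2}{2}$. The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(theta, lam, m, seed=0):
    """Return (L_m / m, max_{k<=m} L_k / m, D(theta || lam)).

    Assumption for illustration: f(x; theta) is the N(theta, 1) density
    and the samples X(1), ..., X(m) are drawn i.i.d. from N(theta, 1).
    """
    rng = random.Random(seed)
    diff = theta - lam
    mid = 0.5 * (theta + lam)

    L = 0.0                       # running log-likelihood ratio L_k
    running_max = float("-inf")   # running maximum max_{j<=k} L_j
    for _ in range(m):
        x = rng.gauss(theta, 1.0)
        # Log-likelihood ratio of one unit-variance Gaussian sample.
        L += diff * (x - mid)
        running_max = max(running_max, L)

    kl = 0.5 * diff ** 2          # closed-form KL divergence in this model
    return L / m, running_max / m, kl


if __name__ == "__main__":
    for m in (100, 10_000, 100_000):
        avg, mx, kl = simulate(theta=1.0, lam=0.0, m=m, seed=1)
        print(f"m={m}: L_m/m={avg:.4f}, max_k L_k/m={mx:.4f}, D={kl:.4f}")
```

Individual increments can be negative here, so early partial sums may dip, but the positive drift $D(\theta||\lambda)$ should eventually dominate and the two normalized quantities should approach the same value as $m$ grows.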