Intuition behind expectation of likelihood ratio function being 1


Suppose we have two probability density functions $f(x)$ and $g(x)$, with $g(x) > 0$ wherever $f(x) > 0$ so that the ratio below is well defined. Let $\mathbb{P}_g$ be the probability measure induced by $g$. Consider the likelihood ratio function $\Lambda(x) = \frac{f(x)}{g(x)}$, and its expected value $E_g[\Lambda]$ w.r.t. $\mathbb{P}_g$. Assuming that $\Lambda(x)$ is integrable, we have

$$ E_g[\Lambda] = \int_{\mathbb R} \frac{f(x)}{g(x)} \ g(x) \ dx = \int_{\mathbb R} f(x) \ dx = 1. $$

While the math is trivial, I don't understand why this is true intuitively. Intuitively, $\Lambda(x)$ measures how much more "likely" it is to "draw" $x$ from $f$ than from $g$. Now, if the probability space is given by $\mathbb{P}_g$, then $x$ is actually "drawn" from $g$. Hence I would intuitively expect $\Lambda(x)$ to be lower than $1$ "on average", and so $E_g[\Lambda] < 1$.

Where is the flaw in the heuristic reasoning?
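As a quick sanity check, here is a Monte Carlo sketch (the two Gaussian densities are an illustrative choice, not from the question): the samples are drawn from $g$, yet the empirical mean of $\Lambda$ comes out near $1$ even though most individual ratios are below $1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: f = N(1, 1), g = N(0, 1).
def f(x):
    return np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)

def g(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

x = rng.normal(0.0, 1.0, size=1_000_000)  # draws from g
lam = f(x) / g(x)                          # likelihood ratio at each draw

print(lam.mean())        # close to 1, despite...
print((lam < 1).mean())  # ...well over half the draws having a ratio below 1
```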



Answer 1:

Since we are talking about heuristic reasoning, I feel comfortable sharing my thoughts informally (so I'll use plenty of " ... "). The examples may seem trivially simple, but this is the way I like to think.

You say that

"...the function $\Lambda(x)$ measures how much more "likely" it is to "draw" $x$ from $f$ vs $g$."

The flaw is in how you use the function $\Lambda$ to measure whether something is more or less "likely". In particular, you cannot reason about the expectation simply in terms of $\Lambda$ being smaller (or bigger) than $1$ at individual points.

Think of it in very simple terms: let $f$ and $g$ be the "densities" of two Bernoulli variables with parameters $p_f$ and $p_g$. Then you'd have $$ \Lambda (0) = \frac{1-p_f}{1-p_g} , \qquad \Lambda (1) = \frac{p_f}{p_g}. $$ Here, as you can see, if $p_f > p_g$ then $\Lambda(0) < 1$ and $\Lambda(1) > 1$, and the two values "level out" exactly when the underlying distribution is $g$: under $g$ you observe zeros and ones in the proportion $(1-p_g) : p_g$, and these are precisely the weights that make the small value $\Lambda(0)$ and the large value $\Lambda(1)$ average to $1$. You can visualize it as repeatedly choosing between two objects, a bigger one and a smaller one, each (on average) exactly as often as needed to balance them out: the ratio between the numbers of picks is inversely proportional to the ratio of their sizes.
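Making the leveling-out explicit (a one-line computation in the same Bernoulli setting), the weights cancel the denominators term by term:

$$ E_g[\Lambda] = (1-p_g)\,\Lambda(0) + p_g\,\Lambda(1) = (1-p_g)\,\frac{1-p_f}{1-p_g} + p_g\,\frac{p_f}{p_g} = (1-p_f) + p_f = 1. $$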

On the other hand, if the underlying distribution actually is $f$ (still with $p_f > p_g$), then zeros are observed less often and ones more often than under $g$, so the large value $\Lambda(1)$ gets extra weight and the expected value $E_f[\Lambda]$ will be bigger than one.
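This can also be checked directly in the Bernoulli setting (a small computation, not in the original answer):

$$ E_f[\Lambda] = (1-p_f)\,\frac{1-p_f}{1-p_g} + p_f\,\frac{p_f}{p_g} = \frac{(1-p_f)^2}{1-p_g} + \frac{p_f^2}{p_g} \ \ge\ \big((1-p_f) + p_f\big)^2 = 1, $$

where the inequality is Cauchy–Schwarz (using $(1-p_g) + p_g = 1$), with equality iff $p_f = p_g$.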

This is just a small thought in a very easy (and very discrete) setting, but I think everything carries over to the continuous case without too much effort.

Answer 2:

When $g(x) > f(x)$ you have $0 \le \frac{f(x)}{g(x)} \lt 1$

When $0 < g(x) < f(x)$ you have $1 \lt \frac{f(x)}{g(x)} \lt \infty$

In a handwaving sense, you are correct that the first case is more common than the second when measured using $g$, but the second can contribute much larger values when it does occur; indeed in some cases $\frac{f(x)}{g(x)}$ is unbounded above, while in the first case it is confined to the small interval $[0,1)$.

I think it is not intuitively obvious which effect dominates. As the calculation shows, the two effects balance each other exactly, so long as $g(x)$ is never $0$ on the support of $f(x)$, and once both effects are in view this should not be intuitively implausible.
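The balance of the two effects can be made concrete with a small simulation (a sketch using an illustrative pair of Gaussians under which $\Lambda$ is indeed unbounded above): the region where $\Lambda < 1$ covers most of the draws from $g$, yet contributes only a small share of the mean, and the rare large values supply the rest.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative choice: g = N(0, 1), f = N(2, 1), so Lambda(x) = exp(2x - 2),
# which is below 1 on most draws from g but unbounded above.
x = rng.normal(size=1_000_000)      # draws from g
lam = np.exp(2 * x - 2)             # likelihood ratio f(x) / g(x)

below = lam < 1
print(below.mean())                  # most draws land in the Lambda < 1 region
print(lam[below].sum() / lam.size)   # ...but they contribute only a small part of the mean
print(lam[~below].sum() / lam.size)  # the rare large values supply the rest
print(lam.mean())                    # and the two contributions add up to about 1
```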