Why use exponentials in the Markov inequality?

694 Views Asked by At

The Markov inequality gives an upper bound to the probability of a non-negative rv:

$$P(X\geq a) \leq \frac{E(X)}{a}$$

in terms of the mean of the random variable. But the next, rather unmotivated step in this topic (distribution bounds) is to say... Well, since the exponential is a monotonically increasing function, we can just instead do this...

$$P(X\geq a)=P(e^{tX}\geq e^{ta}) \leq \frac{E(e^{tX})}{e^{ta}}$$

the numerator on the RHS is the moment generating function of the variable. And the last inequality is the Chernoff bound.

So I guess we can say that as long as there is an MGF for $X$, there is a bound. That sounds remotely useful. But perhaps the intention is to get some expression where the probability decays exponentially to conclude that asymptotic values are rare, motivating the introduction of the exponents in the MGF.

But I don't know the reason... What is the motivation to all of a sudden do the switch from $X$ to the MGF?

3

There are 3 best solutions below

1
On

The motivation is greed.

For any positive increasing function $\phi$ one has the inequality $P(x\ge a)\le E[\phi(X)]/\phi(a)$. The faster-growing the function $\phi$ is, the tighter the upper bound looks, as a function of $a$.

2
On

enter image description here

enter image description here

Look at the "geometry" behind the proof of Markov's and Chebyshev's inequality. The essence is to bound from above some "step" functions so that the "remaining area" is as small as possible! Then it makes sense to use a exponential function, which is a smooth convex function that achieves (intuitively) our desire.

5
On

Given an increasing function $f$, you have $P(X \geq a)=P(f(X) \geq f(a)) \leq \frac{E[f(X)]}{f(a)}$. The numerator doesn't depend on $a$. This means that the asymptotic decay rate of your bound improves as you make $f$ grow faster at infinity, until you cross over into $f$ such that $E[f(X)]$ is infinite.

This is useful for studying tails but it may or may not be useful for the particular value of $a$ that you care about in a specific scenario, because for a fixed $a$, making $f$ bigger causes $E[f(X)]$ and $f(a)$ to both get bigger, and so it may or may not reduce the ratio.

Now why exponentials rather than some other rapidly increasing function? That's harder to put your finger on, without having a concrete goal for the inequality in mind.