Why does Hoeffding's Lemma do taylor expansion in the exponent?

406 Views Asked by At

While reading through the proof, I realized that one part of the proof I took for granted was that Hoeffding's lemma expanded on the exponent portion of the inequality after the reparametrization of the bounds:

Note, I will use different variable names. $a$ is the $\theta$, and $x$ is the $u$: https://en.wikipedia.org/wiki/Hoeffding%27s_lemma

They took the function $e^{-ax}(1-a+ae^x)$, and redefined it in terms of $e^{f(x)}$ where $f(x) = -ax + log(1-a+ae^x)$, and did a taylor expansion on the top, around $x = 0$, to get $e^{1/2(a(1-a))x^2}$, which translated to the final result after some trivial observations.

However, I don't understand what prompted the expansion on $f(x)$, rather than the entire function itself. Can someone elaborate on this?

1

There are 1 best solutions below

2
On BEST ANSWER

Part of the motivation for this approach is the more general Chernoff approach to large deviation bounds, where, by Markov's inequality, one has (for $t>0$):

$$\mathbb{P}(X>x) = \mathbb{P}(e^{tX}>e^{tx}) = \le \frac{\mathbb{E}[e^{tX}]}{e^{tx}} = \exp(-[tx-C(t)])$$

with $C(t)=\log \mathbb{E}[e^{tX}]$, the cumulant-generating function of $X$.

To proceed, one then calculates the convex conjugate of $C$:

$$r(x) = \sup_{t>0}[tx-C(t)]$$

to deduce that $\mathbb{P}(X>x) \le \exp(-r(x))$ (by taking the $t$ which attains the supremum above).

The idea of this standard proof of Hoeffding's inequality is to:

  • upper bound $\mathbb{P}(X>x)$, by ...
  • lower bounding $r(x)$, by ...
  • upper bounding $C(t)$ - which is what the proof does.

The choice to do this by a Taylor expansion is partially motivated by the fact that $C(t)$ is convex (this is true for all cumulant-generating functions), and as such, we have information about its derivatives and curvature.