Motivation behind steps in proof of Hoeffding Inequality

339 Views Asked by At

The lemma that is proved for proving Hoeffding's inequality is:

If $a\leq X\leq b$ and $E[X]=0$, $E[e^{tX}] \leq e^{\frac{t^2(b-a)^2}{8}}$ Here's a link to the proof: http://www.stat.cmu.edu/~larry/=stat705/Lecture2.pdf

There's a particular step in the proof the motivation of which I don't understand. Equation (3) in the pdf is $$Ee^{tX} \leq \frac{-a}{b-a}e^{tb}+\frac{b}{b-a}e^{ta}=e^{g(u)}$$ where $u=t(b-a)$, $g(u)=-\gamma u+\log (1-\gamma+\gamma e^u)$ and $\gamma=\frac{-a}{b-a}$.

I can understand that somebody would try to get the inequality in the form $E[e^{tX}] \leq e^{g(u)}$, but I don't see why one would choose $u=t(b-a)$. Put another way, I don't think I would have tried the above step. Is there a motivation behind why someone would think of defining $u$ the above way and get the complicated expression of $g(u)$, and hope that this will lead to something useful?

1

There are 1 best solutions below

1
On

The definition $u:=t(b-a)$ naturally arises from the fact that $a\le X\le b$. $t$ is a reduced variable, independent of the interval length.

The unexpected form of $g(u)$ is a rewrite of $\log(\frac{-a}{b-a}e^{tb}+\frac{b}{b-a}e^{ta})=\log(e^{ta}(\frac{-a}{b-a}e^{t(b-a)}+\frac{b}{b-a}))$, with the intent to bound it from its Taylor's expansion.

The main "trick" in this derivation is the use of the convexity property. The rest is commonplace function approximation.