I am reading a paper which states
$$ \underbrace{\left(\left( 1 -a + \frac{a^2}{2} (B+2) \right) e^ {a (1-\epsilon)}\right)^k}_{(i)} \leq \underbrace{\left(e^{-a + \frac{a^2(B+2)}{2} - \frac{1}{2}\left(a - \frac{a^2(B+2)}{2}\right)^2} e^ {a (1-\epsilon)}\right)^k}_{(ii)} \\ \leq \underbrace{e^{-\frac{(\epsilon^2-\epsilon^3)k}{2(B+1)}}}_{(iii)}, \tag{1}$$ with $B \geq 1$. It is stated that $(i) \to (ii)$ can be done by using $1 - x \leq e^{-x - x^2/2}$ for $0 \leq x < 1$, applied with $x = a - \frac{a^2(B+2)}{2}$. (I suspect the form I originally copied, $e^{-x + x^2/2} \geq 0$, is a typo, since that is trivially true; the exponent of $(ii)$ is exactly $-x - x^2/2$ for this choice of $x$.) Supposing this is correct, I am trying to prove $(ii) \to (iii)$.
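Before attempting a proof, I convinced myself numerically that the chain $(i) \leq (ii) \leq (iii)$ does hold. This is only a sanity check with arbitrarily chosen values ($B = 1$, $\epsilon = 0.1$, $k = 10$) and the suggested $a = \epsilon/(B+1)$:

```python
import math

# Arbitrary test values; a = eps / (B + 1) as suggested in the paper.
B, eps, k = 1, 0.1, 10
a = eps / (B + 1)

# Terms (i), (ii), (iii) of the chain in (1).
term_i = ((1 - a + a**2 / 2 * (B + 2)) * math.exp(a * (1 - eps)))**k
term_ii = (math.exp(-a + a**2 * (B + 2) / 2
                    - 0.5 * (a - a**2 * (B + 2) / 2)**2)
           * math.exp(a * (1 - eps)))**k
term_iii = math.exp(-(eps**2 - eps**3) * k / (2 * (B + 1)))

assert term_i <= term_ii <= term_iii
```

Of course this proves nothing, but it rules out a transcription error in $(1)$.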
The description says that the upper bound $(iii)$ can be obtained by setting $a = \epsilon/ (B+1)$ in $(ii)$ and noting that $B \geq 1$.
The first thing I tried was to rewrite $(1)$ as
$$\left(e^{\frac{a^2(B+2)}{2} - \frac{1}{2}\left(a - \frac{a^2(B+2)}{2}\right)^2} e^ {-a\epsilon}\right)^k \leq e^{-\frac{(\epsilon^2-\epsilon^3)k}{2(B+1)} }\tag{2}$$ hoping to use the result here. The next thing I thought was that the value $a = \epsilon/ (B+1)$ mentioned in the paper is some root that gives a tighter bound. So I decided to define
$$x(a) = \frac{a^2(B+2)}{2} - \frac{1}{2}\left(a - \frac{a^2(B+2)}{2}\right)^2\tag{3}$$ and $$f(a) = e^{x(a)}\tag{4}$$ with the goal of finding the optimal $a$ that minimizes $f(a)$, in order to get a tighter bound on $(ii)$. To this end we need the $a$ that satisfies $\partial_a f(a) = 0$, i.e.,
$$\partial_a f(a) = f(a)\, \partial_a x(a) = 0,\tag{5}$$ which, since $f(a) > 0$, reduces to
$$\partial_a x(a) = 0\tag{6}$$ or more specifically
$$\partial_a x(a) = a\left(-\frac{1}{2} (B+2)^2 a^2 + \frac{3}{2} (B+2) a + (B+1)\right) = 0.\tag{7}$$
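For what it is worth, I double-checked the factorisation in $(7)$ against a central finite-difference derivative of $(3)$ at a few sample points, so the computation itself seems right:

```python
def x(a, B):
    """Exponent x(a) from (3)."""
    c = B + 2
    return c * a * a / 2 - (a - c * a * a / 2)**2 / 2

def claimed_dx(a, B):
    """Claimed derivative, the right-hand side of (7)."""
    c = B + 2
    return a * (-0.5 * c * c * a * a + 1.5 * c * a + (B + 1))

# Independent check: central finite differences at several (B, a) pairs.
h = 1e-6
for B in (1, 2, 5):
    for a in (0.01, 0.05, 0.2):
        fd = (x(a + h, B) - x(a - h, B)) / (2 * h)
        assert abs(fd - claimed_dx(a, B)) < 1e-6
```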
However, the discriminant of the quadratic factor in $(7)$ does not seem to lead to anything related to $a = \epsilon / (B+1)$. Could someone please shed some light on this? Any help is highly appreciated.
The inequality is from the end of the proof of Theorem 1 in the paper "An algorithmic theory of learning: Robust concepts and random projection".
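For reference, here is the numerical check that convinced me the claimed choice $a = \epsilon/(B+1)$ does satisfy $(2)$ (comparing the per-factor exponents, i.e. the case $k = 1$), so the bound itself appears correct; it is only the derivation via minimizing $f(a)$ that I cannot reproduce:

```python
def exponent_lhs(B, eps):
    """Per-factor exponent on the left of (2), with a = eps / (B + 1)."""
    a = eps / (B + 1)
    x = a**2 * (B + 2) / 2 - 0.5 * (a - a**2 * (B + 2) / 2)**2
    return x - a * eps

def exponent_rhs(B, eps):
    """Per-factor exponent of (iii)."""
    return -(eps**2 - eps**3) / (2 * (B + 1))

# The claimed inequality holds for every (B, eps) pair tried here.
for B in (1, 2, 5, 10):
    for eps in (0.05, 0.1, 0.3, 0.5):
        assert exponent_lhs(B, eps) <= exponent_rhs(B, eps)
```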