Help me understand this part in proving a theorem relating to Rademacher complexity. For simplicity, I am going to avoid all extraneous variables (Eq (4)). Random variable $\psi(S)$ is written as $\psi$ for simplicity. Here $S$ is a sample set, $t$ and $\delta$ are small real positive values. $D$ is the data distribution.
Given, $$Pr[\psi \leq E[\psi] + t] \leq \delta $$
Further, complementing the probability we get
$$Pr[\psi \leq E[\psi] + t] = 1- Pr[\psi -E[\psi] \geq t] \leq \delta$$
So, we can write it as
$$ Pr[\psi -E[\psi] \geq t] \geq 1-\delta$$
Finally, we can say that $ \psi\geq t + E[\psi] $ with atleast probability $1-\delta$.
Now, here is the confusing part, the authors are replacing the inequality in another inequality given as
$$ E_D[f] \leq E_S[f] + \psi $$
implies
$$E_D[f] \leq E_S[f] + E[\psi] + t $$ with probability atleast $1-\delta$.
As $\psi$ is larger than $E[\psi] + t$ this substitution doesn't work correct? Where am I going wrong?
TIA
In the paper that you have linked they assume that $$ \mathbb{P}\Big( \varphi(S) \ge \mathbb{E}[\varphi(S)] + t\Big) \le \delta $$ and they prove that such inequality holds iif $t \ge \sqrt{\frac{\log(\delta^{-1})}{m}}$
That implies that, if $t \ge \sqrt{\frac{\log(\delta^{-1})}{m}}$ $$ \mathbb{P}\Big(\varphi(S) \le \mathbb{E}\left[\varphi(S)\right]+t\Big)=1-\mathbb{P}\Big( \varphi(S) \ge \mathbb{E}\left[\varphi(S)\right] + t\Big) \ge 1-\delta $$ And so you get, with probability at least $1-\delta$ $$\mathbb{E}_D\left[f(z)\right] \le \mathbb{E}_S[f(z)]+\varphi(S) \le \mathbb{E}_S\left[f(z)\right] +\mathbb{E}\left[\varphi(S)\right]+t$$