Rademacher Complexity Result


I was looking at one of the Rademacher Generalisation bound proofs, which says:

If $G$ is a family of functions mapping from $Z$ to $[0, 1]$ and $\mathcal{R}_m(G)$ denotes the Rademacher complexity of $G$, then for any $\delta > 0$, $$ \Pr \left[ \forall g \in G: \; E[g(z)] - \frac{1}{m} \sum_{i = 1}^{m} g(z_i) \leq 2 \mathcal{R}_m(G) + \sqrt{\frac{\log (1/\delta)}{2m}} \right] \geq 1 - \delta $$

The proof proceeds by defining $\Phi(z_1, z_2, \ldots, z_m) = \sup_{g \in G} \left( E[g(z)] - \frac{1}{m} \sum_{i = 1}^{m} g(z_i) \right)$ and applying McDiarmid's inequality to $\Phi$.
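For context, the standard argument (as in Mohri, Rostamizadeh and Talwalkar's *Foundations of Machine Learning*) combines two steps, sketched here:

```latex
% Bounded differences: since every g maps into [0,1], replacing one sample
% changes Phi by at most 1/m:
\left| \Phi(z_1, \ldots, z_i, \ldots, z_m) - \Phi(z_1, \ldots, z_i', \ldots, z_m) \right| \leq \frac{1}{m}
% McDiarmid's inequality then gives, with probability at least 1 - \delta,
\Phi(S) \leq E_S[\Phi(S)] + \sqrt{\frac{\log(1/\delta)}{2m}}
% and a symmetrization argument bounds the expectation of the supremum:
E_S[\Phi(S)] \leq 2\,\mathcal{R}_m(G)
```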

I do not understand why we cannot instead take $f(z_1, z_2, \ldots, z_m) = \frac{1}{m} \sum_{i = 1}^{m} g(z_i)$, use that $\left| f(z_1, \ldots, z_i, \ldots, z_m) - f(z_1, \ldots, z_i', \ldots , z_m) \right| \leq \frac{1}{m}$ (since $g$ takes values in $[0,1]$), and apply McDiarmid's inequality to $f$ to get $$ \Pr \left[ E_{S \sim Z^m} [f(S)] - f(S) \leq \epsilon \right] \geq 1 - \exp{(-2 \epsilon^2 m)} \tag{1}$$

Here I write $f(S)$ to denote $f(z_1, z_2, \ldots, z_m)$. Now, $$E_{S \sim Z^m} [f(S)] = E_{S \sim Z^m} \left[\frac{1}{m} \sum_{i = 1}^{m} g(z_i)\right] = \frac{1}{m} \sum_{i = 1}^{m} E_{S \sim Z^m} [g(z_i)]$$ Since each $z_i$ in $S$ is sampled independently from $Z$, $$ \frac{1}{m} \sum_{i = 1}^{m} E_{S \sim Z^m} [g(z_i)] = \frac{1}{m} \sum_{i = 1}^{m} E[g(z)] = E[g(z)]$$
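For a fixed $g$, the bound (1) does hold, and a quick Monte Carlo sketch confirms it. The choice $g(z) = z$ with $z \sim \mathrm{Uniform}[0,1]$ (so $E[g(z)] = 1/2$) is my own illustrative assumption, not from the original discussion:

```python
import random
import math

random.seed(1)

# Fixed g(z) = z with z ~ Uniform[0,1], so E[g(z)] = 1/2.
# Estimate P[E[g] - (1/m) * sum_i g(z_i) <= eps] and compare it with
# the McDiarmid guarantee 1 - exp(-2 * eps^2 * m) from (1).
m, eps, trials = 100, 0.1, 20000

hits = 0
for _ in range(trials):
    sample_mean = sum(random.random() for _ in range(m)) / m
    if 0.5 - sample_mean <= eps:
        hits += 1

empirical = hits / trials
lower_bound = 1 - math.exp(-2 * eps ** 2 * m)
print(f"empirical probability: {empirical:.4f}")
print(f"McDiarmid lower bound: {lower_bound:.4f}")
```

The empirical probability should comfortably exceed the McDiarmid lower bound of about $1 - e^{-2} \approx 0.865$, as expected for a fixed $g$.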

Now plugging $\epsilon = \sqrt{\frac{\log(1/\delta)}{2m}}$ into (1), i.e. taking $\delta = \exp{(-2 \epsilon^2 m)}$, we get

$$ \Pr \left[ E[g(z)] - \frac{1}{m} \sum_{i = 1}^{m} g(z_i) \leq \sqrt{\frac{\log (1/\delta)}{2m}} \right] \geq 1 - \delta $$

This result must be wrong, because the Rademacher complexity term is missing from the generalisation bound, but I cannot figure out where the error is. Please help.

Best Answer

The problem is in this line of the question:

$$\frac{1}{m} \sum_{i = 1}^{m} E_{S \sim Z^m} [g(z_i)] = \frac{1}{m} \sum_{i = 1}^{m} E[g(z)] = E[g(z)]$$

This equality implicitly assumes that $g$ is a fixed function, chosen independently of $S$; it rules out the possibility that $g$ depends on $S$.

For example, we may want $g$ to be the empirical risk minimizer on $S$, i.e. $g = \mathrm{ERM}_G(S)$, and we may write $g(z)$ as $g(z; S)$ to make its dependence on $S$ explicit. In this case, the equality $E_S[g(z_i; S)] = E_z[g(z; S)]$ clearly does not hold in general. The uniform bound over all of $G$ is exactly what the Rademacher complexity term pays for.
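This failure is easy to see numerically. Below is a small sketch (the finite family of indicator functions is my own illustrative choice, not from the original discussion): for a $g$ fixed in advance, the empirical average is unbiased, whereas for the $g$ that best fits the sample it is systematically optimistic.

```python
import random

random.seed(0)

# Toy family G of indicators g_j(z) = 1[z == j] on Z = {0, ..., N-1},
# with z uniform on Z, so E[g_j(z)] = 1/N for every j.
N, m, trials = 1000, 50, 2000

gap_fixed = 0.0  # (empirical mean - true mean) when g is fixed in advance
gap_erm = 0.0    # same gap when g is chosen to fit the sample S
for _ in range(trials):
    S = [random.randrange(N) for _ in range(m)]

    # Fixed g = g_0, chosen before seeing S: the gap averages out to ~0.
    gap_fixed += sum(z == 0 for z in S) / m - 1.0 / N

    # Data-dependent g: the index with the largest empirical average on S
    # (an ERM-like choice).  Its true mean is still 1/N, but its empirical
    # mean is at least 1/m, so the gap is systematically positive.
    j_star = max(set(S), key=S.count)
    gap_erm += S.count(j_star) / m - 1.0 / N

print(f"average gap, fixed g:    {gap_fixed / trials:+.4f}")
print(f"average gap, ERM-like g: {gap_erm / trials:+.4f}")
```

With these parameters the ERM-like gap is at least $1/m - 1/N = 0.019$ on every trial, which is exactly the kind of sample-dependent bias that the $2\mathcal{R}_m(G)$ term in the theorem accounts for.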