The step of swapping terms in the Rademacher bound


On page 28 here: https://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/lecture-notes/MIT15_097S12_lec14.pdf the step marked with "why?" is not entirely clear to me. Can someone please explain?

I see that the effect of introducing the Rademacher variables is to randomly swap the terms of the differences inside the sum, and it doesn't change the distribution of the sum. But it can change the g that achieves the supremum, so why is it that in expectation the supremum remains the same? (Would this trick work with any function in place of 'sup'?)

1 Answer

BEST ANSWER

Correct, the supremum is not important here. It is just a complicated instance of "if $X$ and $Y$ have the same distribution, then $E_X[f(X)]=E_Y[f(Y)]$." In your example, $X=(X_g)_{g \in \mathcal{G}}$ where $X_g = \frac{1}{m} \sum_i (g(Z'_i) - g(Z_i))$, and $Y=(Y_g)_{g \in \mathcal{G}}$ where $Y_g =\frac{1}{m} \sum_i \sigma_i (g(Z'_i) - g(Z_i))$. These two processes have the same joint distribution: each $\sigma_i = -1$ simply swaps $Z_i$ with $Z'_i$, and since the pairs $(Z_i, Z'_i)$ are i.i.d. copies, swapping within any pair leaves the distribution of the whole vector $(X_g)_{g \in \mathcal{G}}$ unchanged. Hence $E[\sup_g X_g] = E[\sup_g Y_g]$. And yes, the same argument works with any measurable function of the process in place of the supremum.
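A quick Monte Carlo check makes this concrete. The sketch below (not from the lecture notes; the finite class of threshold functions $g_t(z) = \mathbf{1}[z \le t]$, the uniform distribution for $Z$, and all parameters are illustrative choices) estimates $E[\sup_g X_g]$ and $E[\sup_g Y_g]$ and shows they agree up to simulation error:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_trials = 20, 20000
# Illustrative finite class G: threshold functions g_t(z) = 1[z <= t]
thresholds = np.array([0.2, 0.4, 0.6, 0.8])

def sup_over_class(z, z_prime, signs):
    # diffs[t, i] = g_t(Z'_i) - g_t(Z_i) for each threshold t
    diffs = (z_prime[None, :] <= thresholds[:, None]).astype(float) \
          - (z[None, :] <= thresholds[:, None]).astype(float)
    # sup over g of (1/m) * sum_i signs_i * (g(Z'_i) - g(Z_i))
    return np.max((signs * diffs).mean(axis=1))

plain, signed = [], []
for _ in range(n_trials):
    z = rng.uniform(size=m)          # Z_1, ..., Z_m
    z_prime = rng.uniform(size=m)    # independent ghost sample Z'_1, ..., Z'_m
    sigma = rng.choice([-1.0, 1.0], size=m)  # Rademacher variables
    plain.append(sup_over_class(z, z_prime, np.ones(m)))   # sup_g X_g
    signed.append(sup_over_class(z, z_prime, sigma))       # sup_g Y_g

# The two Monte Carlo averages agree up to sampling error
print(np.mean(plain), np.mean(signed))
```

Flipping $\sigma_i$ is the same as relabeling which of $Z_i, Z'_i$ is the "ghost" sample, which is why the averages match even though the maximizing $g$ can differ from draw to draw.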