Rademacher complexity of a mixture of distributions


I would like to prove that the expected Rademacher complexity, when the dataset is drawn from a mixture of $K$ distributions $D_1, D_2, \dots, D_K$, is lower bounded by the expected Rademacher complexity when a fixed number of samples---equal to the expected count under the mixture---is drawn from each distribution, i.e. $$ \begin{aligned} & m \mathop{\mathbb{E}}_{S \sim \left( \sum_{k=1}^K \frac{m_k}{m} D_k\right)^m} \mathcal{R}(F,S)\\ & = \mathop{\mathbb{E}}_{\substack{\left(M_1, M_2, \dots, M_K\right) \sim \operatorname{Multi}\left(m ; \frac{m_1}{m}, \frac{m_2}{m}, \cdots, \frac{m_K}{m}\right)\\ \left\{S_k^{\prime} \sim D_k^{M_k}\right\}_{k=1}^{K}\\ \left\{\sigma_k \sim \operatorname{Unif}\left(\{-1,1\}\right)^{M_k}\right\}_{k=1}^K}} \quad \sup_{f \in F} \sum_{k=1}^K \sum_{i=1}^{M_k} \sigma_{k, i} f\left(z_{k, i}^{\prime}\right)\\ &\ge \mathop{\mathbb{E}}_{\substack{\left\{S_k \sim D_k^{m_k}\right\}_{k=1}^{K}\\ \left\{\sigma_k \sim \operatorname{Unif}\left(\{-1,1\}\right)^{m_k}\right\}_{k=1}^K}} \sup_{f \in F} \sum_{k=1}^K \sum_{i=1}^{m_k} \sigma_{k, i} f\left(z_{k, i}\right) \end{aligned} $$ where $m= \sum_{k=1}^K m_k$, $S_k=\{z_{k,i}\}_{i=1}^{m_k}$, $S_k^\prime=\{z^\prime_{k,i}\}_{i=1}^{M_k}$, $\operatorname{Unif}\left(\{-1,1\}\right)$ denotes the uniform distribution over $\{-1, 1\}$, and $\operatorname{Multi}\left(m ; p_1, p_2, \dots, p_K\right)$ denotes the multinomial distribution with $m$ trials and probabilities $p_1, p_2, \dots, p_K$.
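For intuition, the two sides can be compared numerically by Monte Carlo. The sketch below is purely illustrative: I assume $K=2$ Gaussian components $D_1 = \mathcal{N}(0,1)$, $D_2 = \mathcal{N}(3,1)$, sizes $m_1 = m_2 = 5$, and the toy class $F = \{z \mapsto wz : |w| \le 1\}$, for which $\sup_{f} \sum_i \sigma_i f(z_i) = |\sum_i \sigma_i z_i|$; none of these choices come from the question itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not from the question):
# K = 2 components, D_1 = N(0, 1), D_2 = N(3, 1), m_1 = m_2 = 5.
mus = np.array([0.0, 3.0])
m_k = np.array([5, 5])
m = int(m_k.sum())
p = m_k / m
n_trials = 20_000

def sup_term(z, sigma):
    # For F = {z -> w*z : |w| <= 1}, sup_f sum_i sigma_i f(z_i) = |sum_i sigma_i z_i|.
    return abs(np.dot(sigma, z))

# Left-hand side: counts M ~ Multi(m; m_1/m, ..., m_K/m), then M_k samples from each D_k.
lhs = 0.0
for _ in range(n_trials):
    M = rng.multinomial(m, p)
    z = np.concatenate([rng.normal(mu, 1.0, size=n) for mu, n in zip(mus, M)])
    sigma = rng.choice([-1.0, 1.0], size=m)
    lhs += sup_term(z, sigma)
lhs /= n_trials

# Right-hand side: exactly m_k samples from each D_k.
rhs = 0.0
for _ in range(n_trials):
    z = np.concatenate([rng.normal(mu, 1.0, size=n) for mu, n in zip(mus, m_k)])
    sigma = rng.choice([-1.0, 1.0], size=m)
    rhs += sup_term(z, sigma)
rhs /= n_trials

print(f"multinomial-counts side: {lhs:.4f}")
print(f"fixed-counts side:       {rhs:.4f}")
```

Such a simulation only estimates the two expectations for one particular class and mixture, so it can suggest whether the inequality is plausible but of course proves nothing.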

I believe the expectation over the Rademacher variables $\sigma_k$ is not needed for the result to hold.

In my opinion this intermediate result is needed to prove Theorem 5 in Mansour et al., "Three approaches for personalization with applications to federated learning" (https://arxiv.org/abs/2002.10619).