There is a well-developed theory of generalization bounds for deep neural networks based on VC dimension and Rademacher complexity. These bounds are distribution-free: they hold for any underlying "true" distribution $\mathcal{D}$ of the data.
However, according to a lecturer I listened to recently, this worst-case view is often too pessimistic and thus does not fully explain why deep networks work so well in practice.
My question is quite simple: Are there statistical conditions we can impose on $\mathcal{D}$ that guarantee "faster" generalization than one would usually get? Are there papers that address this?
(For example, in linear regression we can quantify the expected prediction error quite precisely under the assumption that the response $Y$ is Gaussian with mean $X\beta$ for some $\beta \in \mathbb{R}^{n}$.)
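To make that parenthetical concrete, here is a sketch of the standard calculation, under the assumptions of a fixed full-rank design $X \in \mathbb{R}^{n \times p}$ (writing $p$ for the number of predictors, $n$ for the sample size) and $Y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$. With the OLS estimator $\hat\beta = (X^\top X)^{-1} X^\top Y$ and the hat matrix $H = X(X^\top X)^{-1}X^\top$ (symmetric, idempotent, $\operatorname{tr}(H) = p$):

$$
X\hat\beta - X\beta = H\varepsilon,
\qquad
\mathbb{E}\Big[\tfrac{1}{n}\,\|X\hat\beta - X\beta\|^2\Big]
= \tfrac{\sigma^2}{n}\operatorname{tr}(H^\top H)
= \tfrac{\sigma^2}{n}\operatorname{tr}(H)
= \frac{\sigma^2 p}{n},
$$

so the expected in-sample prediction error on fresh responses is $\sigma^2\!\left(1 + \tfrac{p}{n}\right)$. The Gaussian (or at least homoscedastic, well-specified) assumption on $Y$ is exactly what pins down this explicit $p/n$ rate, in contrast to the distribution-free bounds above.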
Here's an example of one such work: https://arxiv.org/abs/1901.08584. The authors define a "data-dependent complexity measure" involving a special Gram matrix of the data (whose smallest eigenvalue is assumed to be bounded below), which yields a tighter generalization bound for two-layer neural networks trained via gradient descent, provided the sample size is sufficiently large relative to this measure.