Generalization in Neural Networks: Can one Impose Conditions on the Data?


There is a well-developed theory on generalization bounds for deep neural networks, based on VC dimension and Rademacher complexity. These bounds hold for any underlying "true" distribution $\mathcal{D}$ of the data.

However, according to a lecturer I listened to recently, such distribution-free bounds are often too pessimistic and thus do not fully explain why neural networks generalize so well in practice.

My question is now quite simple: are there statistical conditions that we can impose on $\mathcal{D}$ that guarantee "faster" generalization than one would usually get? Are there papers that address this?

(For example, in linear regression we can quantify the expected prediction error quite precisely under the assumption that the response $Y$ is Gaussian with mean $X\beta$ for some $\beta \in \mathbb{R}^{n}$.)
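To make the linear-regression example concrete: under the model $Y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ and a fixed design, the expected in-sample prediction error of OLS is exactly $\sigma^2 p / n$, where $p$ is the number of coefficients. A minimal Monte Carlo sketch checking this formula (the constants $n$, $p$, $\sigma$ below are arbitrary choices for illustration):

```python
import numpy as np

# Monte Carlo check of the classical fixed-design OLS risk:
#   E[ ||X(beta_hat - beta)||^2 / n ] = sigma^2 * p / n
rng = np.random.default_rng(1)
n, p, sigma = 200, 5, 1.0
X = rng.normal(size=(n, p))       # fixed design matrix
beta = rng.normal(size=p)         # true coefficients

errs = []
for _ in range(2000):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    errs.append(np.sum((X @ (beta_hat - beta)) ** 2) / n)

print(np.mean(errs))        # empirical risk, close to...
print(sigma**2 * p / n)     # ...the theoretical value 0.025
```

This is the kind of sharp, distribution-specific guarantee the question asks about: the Gaussian assumption on the noise buys an exact rate rather than a worst-case bound.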

1 Answer

Here's an example of such work: https://arxiv.org/abs/1901.08584. The authors define a "data-dependent complexity measure" based on a special Gram matrix of the data (together with a lower bound on its smallest eigenvalue), which yields a tighter generalization bound for two-layer neural networks trained via gradient descent, assuming a sufficiently large sample size that depends on this measure.
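As a rough illustration of what such a data-dependent quantity looks like: for two-layer ReLU networks, the Gram matrix in that line of work has the closed form $H^{\infty}_{ij} = x_i^\top x_j \,(\pi - \arccos(x_i^\top x_j))/(2\pi)$ for unit-norm inputs, and the complexity measure involves $y^\top (H^{\infty})^{-1} y$. The sketch below computes this matrix, its smallest eigenvalue, and the resulting quantity on random data; the specific data and the exact scaling $\sqrt{2 y^\top (H^\infty)^{-1} y / n}$ follow my reading of the paper and should be checked against it before use.

```python
import numpy as np

def relu_gram(X):
    """Gram matrix H_ij = (x_i.x_j) * (pi - arccos(x_i.x_j)) / (2*pi).

    Rows of X are assumed to be unit-norm, as required for arccos.
    """
    G = X @ X.T
    G = np.clip(G, -1.0, 1.0)  # guard against numerical drift before arccos
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # project onto the unit sphere
y = rng.choice([-1.0, 1.0], size=n)             # synthetic +/-1 labels

H = relu_gram(X)
lam0 = np.linalg.eigvalsh(H).min()              # smallest eigenvalue of H
complexity = np.sqrt(2.0 * (y @ np.linalg.solve(H, y)) / n)
print(lam0, complexity)
```

The point of the example: both `lam0` and `complexity` depend only on the data $(X, y)$, not on the network's parameter count, which is what makes the resulting bound "data-dependent" rather than worst-case over all distributions.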