Rewriting a one-layer neural network as an equivalent infinitely-deep network


Problem

In the paper I am currently reading, the author rewrites the class of generalized linear predictors with ReLU activation $$\{\mathbf{x}\mapsto \sigma(\mathbf{w}^\top\mathbf{x}): \Vert\mathbf{w}\Vert_2\leq M\}$$ as a (possibly) infinitely-deep "super-thin" network with ReLU activation $$ \left\{\mathbf{x} \mapsto \sigma\left(w_{d} \cdot \sigma\left(\ldots w_{2} \cdot \sigma\left(\mathbf{w}_{1}^{\top} \mathbf{x}\right)\right)\right) :\left\|\mathbf{w}_{1}\right\| \cdot \prod_{j=2}^{d}\left|w_{j}\right| \leq M\right\}, $$ claims the two classes are equivalent, and uses this to motivate the subsequent result that the sample complexity of a deep neural network can be independent of the depth $d$.

One naive way to see this is to set $w_i=1$ for $i=2,3,\cdots,d$: since the ReLU is idempotent on nonnegative inputs ($\sigma(\sigma(z))=\sigma(z)$), the deep network then collapses to the one-layer predictor. More generally, positive homogeneity of the ReLU ($\sigma(cz)=c\,\sigma(z)$ for $c>0$) lets the scalar weights $w_2,\ldots,w_d$ be pulled out as a single multiplicative factor. Still, I am not sure whether there is a more "interesting" way to interpret this equivalence.
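As a quick sanity check of the collapse argument, here is a minimal numerical sketch (the depth, dimension, and weight values are arbitrary choices of mine, not from the paper): for positive scalars $w_2,\ldots,w_d$, the deep thin network should equal $\sigma\big((\prod_{j=2}^d w_j)\,\mathbf{w}_1^\top\mathbf{x}\big)$ by positive homogeneity.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

# Hypothetical example: a depth-4 "super-thin" network
w1 = rng.normal(size=5)               # first-layer weight vector w_1
scalars = np.array([0.7, 2.0, 1.3])   # w_2, ..., w_d, all positive

x = rng.normal(size=5)                # a random input

# Deep network: sigma(w_d * sigma(... w_2 * sigma(w_1^T x)))
h = relu(w1 @ x)
for w in scalars:
    h = relu(w * h)

# Equivalent one-layer predictor: sigma((prod_j w_j) * w_1^T x),
# using positive homogeneity sigma(c z) = c * sigma(z) for c > 0
shallow = relu(np.prod(scalars) * (w1 @ x))

assert np.isclose(h, shallow)
```

Note that the sign restriction matters: a negative $w_j$ zeroes out the nonnegative hidden value $\sigma(\cdot)$, so the positive-scalar case is the one that exhibits the equivalence directly.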