Universal Approximation with Fixed Width


Fix an activation function $\sigma$, and denote by $NN^{\sigma}$ the class of all neural networks from $\mathbb{R}$ to $\mathbb{R}$ built from this activation function.

The classical universal approximation theorem for neural networks on $C([0,1];\mathbb{R})$ states that for a fixed nice activation function $\sigma$, any $f\in C([0,1];\mathbb{R})$ and any $\epsilon >0$, there exists some $f_{\epsilon} \in NN^{\sigma}$ satisfying \begin{equation} \sup_{x \in [0,1]} |f_{\epsilon}(x)-f(x)|\leq \epsilon. \label{eq_1} \end{equation} Moreover, a single hidden layer suffices.
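As a minimal numerical sketch of the one-hidden-layer statement (my own construction, not part of the theorem): place ReLU hinges on a grid in $[0,1]$ and solve only for the output weights by least squares. The target $\sin(2\pi x)$ and all widths here are illustrative choices.

```python
import numpy as np

# Sketch: one-hidden-layer ReLU network approximating f(x) = sin(2*pi*x) on [0, 1].
# Hidden weights are fixed (hinges on a grid); only the output layer is solved
# by least squares. The sup error shrinks as the width grows.
f = lambda x: np.sin(2 * np.pi * x)

x = np.linspace(0.0, 1.0, 400)
width = 50
b = np.linspace(0.0, 1.0, width)                   # hinge locations in [0, 1]
phi = np.maximum(x[:, None] - b[None, :], 0.0)     # ReLU features relu(x - b_j)
phi = np.hstack([phi, np.ones((x.size, 1))])       # affine (bias) column

c, *_ = np.linalg.lstsq(phi, f(x), rcond=None)     # output-layer weights
err = np.max(np.abs(phi @ c - f(x)))               # sup-norm error on the grid
print(f"sup error with width {width}: {err:.4f}")
```

Since a one-hidden-layer ReLU network is exactly a piecewise-linear function with breakpoints at the hinges, the error here is governed by the knot spacing, consistent with \eqref{eq_1}.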

My question is: if we instead consider all neural networks of arbitrary depth but fixed width, can we still obtain universal approximation using only this subset?


Let's assume $NN^{\sigma}$ is the set of arbitrarily deep feedforward networks with the ReLU activation function, all of whose layers have constant width $H$.

1) If $H>1$ (more than one neuron in each layer), then the universal approximation property indeed holds: $\forall f\in C([0,1];\mathbb{R})~~ \forall \varepsilon > 0 ~~ \exists f_{\varepsilon} \in NN^{\sigma}~~ \sup_{x \in [0,1]} |f_{\varepsilon}(x)-f(x)|\leq \varepsilon$.
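To see that width $2$ is already expressive, here is a hand-built deep width-2 ReLU network (an illustration of mine, not the construction from the cited result) that computes the tent map composed with itself three times; each extra pair of width-2 layers doubles the number of oscillations.

```python
import numpy as np

x = np.linspace(0, 1, 1001)

# Each hidden layer holds the pair [relu(t), relu(t - 1/2)], where
# t = 2*h[0] - 4*h[1] is the tent map applied to the previous value.
W = np.array([[2.0, 2.0], [-4.0, -4.0]])   # width-2 weight matrix
b = np.array([0.0, -0.5])                  # biases for the two neurons

h = np.stack([np.maximum(x, 0), np.maximum(x - 0.5, 0)], axis=1)  # first layer
for _ in range(2):                          # two more width-2 ReLU layers
    h = np.maximum(h @ W + b, 0.0)
out = h @ np.array([2.0, -4.0])             # linear output layer: tent composed 3x

# tent∘tent∘tent is piecewise linear with 4 peaks (at x = 1/8, 3/8, 5/8, 7/8)
```

No width-1 network can produce such oscillation, which is exactly where case 2) below comes in.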

2) If $H=1$ (only 1 neuron in each layer), then there is a function in $C([0, 1]; \mathbb{R})$ which can't be approximated in this way. So the universal approximation property does not hold.
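The obstruction in the $H=1$ case is easy to check numerically: each width-1 ReLU layer $x\mapsto\mathrm{relu}(wx+b)$ is monotone, and a composition of monotone maps is monotone, so the whole network computes a monotone function and cannot approximate, say, the tent function. A small sketch (my own, with a hypothetical helper name) verifying this for random deep width-1 networks:

```python
import numpy as np

rng = np.random.default_rng(1)

def width1_relu_net(x, depth):
    """Evaluate a random ReLU network with one neuron per hidden layer."""
    h = x.copy()
    for _ in range(depth):
        w, b = rng.normal(size=2)
        h = np.maximum(w * h + b, 0.0)   # each layer is a monotone map of h
    a, c = rng.normal(size=2)
    return a * h + c                     # affine output: still monotone

x = np.linspace(0.0, 1.0, 500)           # sorted inputs
for trial in range(100):
    y = width1_relu_net(x, depth=int(rng.integers(1, 20)))
    d = np.diff(y)
    # along sorted inputs the output never changes direction
    assert np.all(d >= -1e-12) or np.all(d <= 1e-12)
print("all width-1 networks were monotone")
```

Any non-monotone target, e.g. $f(x)=|x-1/2|$, therefore stays at a fixed positive distance from every such network in the sup norm.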

This is a special case of a result obtained by Hanin and Sellke in 2017. In general, it states that the universal approximation property holds if and only if the layer width $H$ is strictly larger than the input dimension $\nu$.

As of 2019, a number of related open problems remain (source: lecture notes from the deep learning theory course by D. Yarotsky):

1) Even though for $H=\nu$ the approximation property is no longer universal (it does not hold for all functions in $C([0,1];\mathbb{R})$), what are the necessary and sufficient conditions for a function to be approximable by such a network?

2) What is the minimal width for other activation functions?