Do we have a universal approximation theorem for ReLU-activated neural networks already?


In the 1990s, several researchers (Hornik, Cybenko, etc.) proved that feedforward neural networks with a bounded, non-constant activation function can approximate any $L_p$-integrable function. However, many popular activation functions today, such as ReLU, do not satisfy the boundedness prerequisite. Is there any theory yet on the approximation capability of neural networks with ReLU as the activation function?

Answer:

You can simulate a node with a bounded activation such as $$n(x)=\cases{0& if $x<0$\\x& if $0\leq x\leq 1$\\1& otherwise}$$ using two ReLU nodes $n_1,n_2$ like so: $$ n(x)=n_1(x)-n_2(x-1). $$ So anything that can be done with the activation $n$ can be replicated exactly with ReLU, using at most twice as many nodes. Since $n$ is bounded and non-constant, the classical universal approximation results apply to it, and hence carry over to ReLU networks.
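A quick numerical check of this identity (a sketch using NumPy; the function names `relu` and `n` are mine, and `n` is implemented with `np.clip`, which matches the piecewise definition above):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(x, 0)."""
    return np.maximum(x, 0.0)

def n(x):
    """Target bounded activation: 0 for x < 0, x on [0, 1], 1 for x > 1."""
    return np.clip(x, 0.0, 1.0)

# The construction from the answer: n(x) = ReLU(x) - ReLU(x - 1),
# checked on a grid that covers all three pieces of n.
x = np.linspace(-2.0, 3.0, 501)
assert np.allclose(relu(x) - relu(x - 1.0), n(x))
```

The same difference-of-ReLUs trick generalizes: any continuous piecewise-linear function of one variable can be written as a finite sum of shifted, scaled ReLUs.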