Universal Approximation Theorem -- Neural Networks


The universal approximation theorem states that "the standard multilayer feed-forward network with a single hidden layer, which contains a finite number of hidden neurons, is a universal approximator among continuous functions on compact subsets of $R^n$, under mild assumptions on the activation function."

I understand what this statement means, but the relevant papers are too far above my level of mathematical understanding to grasp why it is true, or how a hidden layer approximates non-linear functions.

So, in terms only a little more advanced than basic calculus and linear algebra, how does a feed-forward network with one hidden layer approximate non-linear functions? The answer need not be totally concrete.

I also posted this question at TCS and CV. For a while no one gave a solution, but there is now a really excellent and comprehensive answer here.


If the hidden units are radial basis functions (i.e., they have a peak response when the input pattern is close, in the Euclidean distance sense, to the parameter vector of the hidden unit), then each hidden unit essentially generates a localized "bump". A weighted superposition of such bumps can then be used to approximate an arbitrary continuous function. Other types of hidden units, such as sigmoidal units, can produce the same kind of bump response: a single sigmoid is monotone, but the difference of two shifted sigmoids forms a localized bump, so a pair of sigmoidal hidden units plays the same role as one radial basis unit.
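To make this concrete, here is a small numerical sketch of the bump idea (the target function, centre spacing, and bump width below are all arbitrary illustrative choices, not anything prescribed by the theorem): a hidden layer of Gaussian radial basis units is fitted to a non-linear target by least squares, and a pair of opposed sigmoids is shown to form a bump.

```python
import numpy as np

# Target: an arbitrary smooth non-linear function on [0, 1] (illustrative choice).
def target(x):
    return np.sin(2 * np.pi * x)

x = np.linspace(0.0, 1.0, 200)

# Hidden layer of Gaussian radial basis units: unit j peaks when x is
# close to its centre c_j, producing a localized "bump".
centers = np.linspace(0.0, 1.0, 20)   # 20 evenly spaced centres (arbitrary)
width = 0.1                            # bump width (arbitrary)

# Design matrix: Phi[i, j] = response of hidden unit j to input x_i.
Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))

# Output layer: least-squares fit of the superposition weights.
w, *_ = np.linalg.lstsq(Phi, target(x), rcond=None)
approx = Phi @ w

max_err = np.max(np.abs(approx - target(x)))
print("max abs error of bump superposition:", max_err)

# Sigmoidal units make bumps too: the difference of two shifted,
# steep sigmoids is ~1 on (0.4, 0.6) and ~0 elsewhere.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

bump = sigmoid(50.0 * (x - 0.4)) - sigmoid(50.0 * (x - 0.6))
print("bump at 0.5:", bump[np.argmin(np.abs(x - 0.5))])
print("bump at 0.1:", bump[np.argmin(np.abs(x - 0.1))])
```

With enough bumps of suitable width, the least-squares fit drives the error toward zero, which is exactly the intuition behind the theorem: the output layer just mixes localized pieces into the desired shape.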