I was looking at the statement of the "Universal Approximation Theorem":
Universal Approximation Theorem: Fix a continuous function $\sigma:\mathbb R\to \mathbb R$ (activation function) and positive integers $d,D$. The function $\sigma$ is not a polynomial if and only if, for every continuous function $f:\mathbb R^d\to \mathbb R^D$ (target function), every compact subset $K$ of $\mathbb{R}^d$, and every $\epsilon>0$, there exists a continuous function $f_\epsilon:\mathbb R^d \to \mathbb R^D$ (the layer output) with representation $$f_\epsilon =W_2\circ \sigma \circ W_1$$ where $W_2,W_1$ are composable affine maps and $\sigma$ is applied componentwise, such that the approximation bound $$\sup_{x\in K}\Vert f(x)-f_\epsilon (x)\Vert<\epsilon$$ holds (that is, $f_\epsilon$ can be made uniformly as close to $f$ on $K$ as desired).
For this theorem to work, i.e. for the function $f_\epsilon$ to approximate the function $f$ to an arbitrary level of precision, it seems to specifically require that the function $\sigma$ not be a polynomial. In a certain sense, being a polynomial seems to carry a negative connotation here: it might prevent the conclusion of the theorem from being true.
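To make the polynomial case concrete, here is a minimal numerical sketch (my own, not part of the theorem statement). It assumes $d=D=1$, random weights, and NumPy; the helper names `shallow_net` and `quadratic_fit_residual` are hypothetical. With $\sigma(x)=x^2$, every output $W_2\circ\sigma\circ W_1$ expands to a polynomial of degree at most 2, however wide the hidden layer is, so a degree-2 least-squares fit should reproduce it up to rounding error. With $\sigma=\tanh$ there is no such restriction.

```python
import numpy as np

def shallow_net(x, W1, b1, W2, b2, sigma):
    """Evaluate f = W2 ∘ sigma ∘ W1 on a 1-D grid x (d = D = 1)."""
    # First affine map: each hidden unit i computes W1[i] * x + b1[i].
    hidden = sigma(np.outer(x, W1) + b1)  # shape (n_points, width)
    # Second affine map: weighted sum of hidden units plus a bias.
    return hidden @ W2 + b2

# Random weights (an assumption for illustration; nothing is trained here).
rng = np.random.default_rng(0)
width = 50
W1 = rng.normal(size=width)
b1 = rng.normal(size=width)
W2 = rng.normal(size=width)
b2 = rng.normal()

x = np.linspace(-1.0, 1.0, 201)  # grid on the compact set K = [-1, 1]

def quadratic_fit_residual(y):
    """Sup-norm error on the grid of the best least-squares degree-2 fit to y."""
    coeffs = np.polynomial.polynomial.polyfit(x, y, deg=2)
    return np.max(np.abs(y - np.polynomial.polynomial.polyval(x, coeffs)))

# Polynomial activation: the output is itself a polynomial of degree <= 2.
res_square = quadratic_fit_residual(shallow_net(x, W1, b1, W2, b2, np.square))
# Non-polynomial activation: the output is generally not a quadratic.
res_tanh = quadratic_fit_residual(shallow_net(x, W1, b1, W2, b2, np.tanh))

print("sigma = x^2 :", res_square)
print("sigma = tanh:", res_tanh)
```

If this sketch is right, the first residual should sit near machine precision and the second should not, which suggests that with a polynomial $\sigma$ of degree $n$ the network can only ever produce polynomials of degree at most $n$, regardless of width.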
Regarding this, I had the following questions:
- Could this theorem have been written "Fix a continuous non-polynomial function $\sigma$ ..."?
- Why can't the $\sigma$ function be polynomial?
- What else could the $\sigma$ function be (e.g. linear)?
- What happens if the $\sigma$ function is a polynomial: does the Universal Approximation Theorem then fail to hold?
Can someone please explain this?