I am currently reading the article "Multilayer Feedforward Networks With a Nonpolynomial Activation Function Can Approximate Any Function". In the proof of theorem 1 the authors state that: The set of algebraic polynomials of degree at most $k$ cannot be dense in $C(\mathbb{R}^n)$. Why is this the case?
Why can't the set of algebraic polynomials of degree at most k be dense in $C(\mathbb{R}^n)$
399 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail AtThere are 3 best solutions below
On
Proof sketch:
Without loss of generality, assume $n = 1$.
It suffices to find a function within the Hilbert (sub)space $L_2(\mathbb{R})$, which can not be approximated by a polynomial with degree at most $k$, such that the "maximal pointwise error" is sufficiently small. For this, it suffices to work over $[0, 1]$: If the error there is "high enough", the fact that the error for all real numbers error is at least as high leads to the error over all real numbers being "high enough" itself.
Let $P^k([0, 1])$ be the vector space of polynomials of degree $k$, equipped with the supremum norm $||p||_\infty = \sup_\limits{x \in [0, 1]} p(x) $. Consider the basis $(1, x, \dots, x^k)$ and "make them at right angles" by applying the Gram Schmidt procedure to get a basis of $P_k$, namely $$(p_1(x), p_2(x), \dots, p_k(x)) = (f_1(x), f_2(x), \dots, f_k(x))$$ consisting of in $L_2([0, 1])$ orthonormal (!) polynomials out of $P^k$.
Now, complete this basis to an orthonormal basis of $L_2([0, 1])$, namely $(f_1(x), f_2(x), \dots, f_k(x), f_{k+1}(x), f_{k + 2}, \dots)$. This basis is infinite, otherwise, $L_2([0, 1])$ would have a finite dimension.
Note that $L_2([0, 1])$ is separable, since $[0, 1]$ is compact!
Therefore, the elements of $L_2$ can be written as something like a Fourier series. (linear combination of orthogonal basis elements of $L_2$): $$f = \sum_{n=0}^\infty |<f, f_i> f_i$$ Let $$p = \sum_{n=0}^\infty |<p, f_i> f_i$$ be an arbitrary polynomial in $P^k$.
Since orthonormality is fulfilled, the triangle inequality becomes an equality. Hence, $$\begin{align*} ||f - p||_\infty &= \sum_{n=0}^\infty ||... \cdot f_i||_\infty\\ &= \sum_{n=0}^\infty |...|\cdot ||f_i||_\infty\\ &= \sum_{n=0}^\infty |...|\cdot 1\\ &= \sum_{n=0}^\infty |...|\\ &> |coeff(f)_{k+1}| \end{align*}$$ , since the Fourier series of $p$ is uniquely determined and guaranteed to be in $P^k$.
Note that you have found a lower bound for the error, which does not depend on choice of the polynomial. This means that you will never be able to get closer to $f$ than $|coeff(f)_{k+1}|$.
Technical details: A crucial point: It remains to find a function $f \in L_2([0, 1])$ with coefficients which are nearly all not $0$. I propose something Sawtooth-like.
My first idea was to use something like the reversed triangle equality, then I reminded the triangle equality in orthonormal case. Going down to the subspace $L_2$ guarantees that we can work with a Hilbert space (complete, has a scalar product). Restricting to the interval $[0, 1]$ guarantees separability. While stating "uniqueness" of Fourier transform, I did not bother with $f$ being in reality an equivalence class.
EDIT: It is mainly an elaborated, independently developed version of the comment of @Danny Pak-Keung Chan.
In fact, using polynomials of degree $k$, you do not obtain a dense set even if you restrict to their values at $k+2$ distinct points $x_1,x_2,\ldots,x_{k+2}$.
Proof: The set $$H=\{(P(x_1),\ldots,P(x_{k+2}) \,: \, {\rm deg} P \le k\}$$ is a linear subspace of ${\mathbb R}^{k+2}$ spanned by the $k+1$ vectors obtained when you take $P(x)=x^j$ for $j=0,1,\ldots,k$. Fix a vector $v=(v_1,\ldots,v_{k+2})$ with $\|v\|^2=k+2$ that is orthogonal to $H$. You can find such a nonzero vector $v$ by solving $k+1$ linear equations in $k+2$ unknowns $$\sum_{i=1}^{k+2} x_i^j v_i =0 \; {\rm for} \; j=0,\ldots k \,, $$ then scale $v$ to have the desired length.
For $h \in H$ Pythagoras gives $ \|v-h\|^2=\|v\|^2+\|h\|^2 \ge k+2$, so there is some $i\le k+2$ such that $|v_i-h_i| \ge 1$.
Finally, consider any function $f \in C[0,1]$ that satisfies $f(x_i)=v_i$ for $i=1,\ldots, k+2$. Then for every polynomial $P$ of degree at most $k$ there is an $i \le k+2$ such that $|f(x_i)-P(x_i)| \ge 1$.