Understanding the Stone-Weierstrass Theorem in Rudin's Principles of Mathematical Analysis


Below is the statement and proof of the Stone-Weierstrass Theorem in Rudin's Principles of Mathematical Analysis (page 159, Chapter 7).

Question: Suppose Rudin uses $Q_n=c_n(1-x^4)^n$ instead of $Q_n=c_n(1-x^2)^n$; how would the proof of the theorem change?


My thoughts: I think the proof changes in (48), where we use the lower bound $1/\sqrt{n}$. If we use $Q_n=c_n(1-x^4)^n$, our estimate would have to be different; thus, in (50), $Q_n$ would be bounded by something other than $\sqrt{n}$.

However, I'm not sure if this is correct. I feel as though the proof would require bigger changes if we used $Q_n=c_n(1-x^4)^n$, since my professor gave me this question to contemplate.


[Images: Rudin's statement and proof of the theorem, with equations (47)-(51)]


There are 2 best solutions below


It wouldn't change much, since $$(1-x^4)^n \geq 1-nx^4$$ for the same reason (i.e., the derivative of $f(x)=(1-x^4)^n-1+nx^4$ is $f'(x)=-4nx^3(1-x^4)^{n-1}+4nx^3$, which is positive on $(0,1)$), and then you can integrate up to $1/\sqrt[4]{n}$ and get a similar estimate for $c_n$. After that, the only places where $(1-x^2)^n$ appears again are in justifying the uniform convergence of $Q_n$ outside any small interval around $0$, which holds by essentially the same argument, and in the last inequality, where it makes no difference.
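As a quick numerical sanity check (my own illustration, not part of the answer), one can verify the pointwise inequality $(1-x^4)^n \geq 1-nx^4$ on a grid of $[0,1]$ and the resulting lower bound $1/c_n = 2\int_0^1(1-x^4)^n\,dx > n^{-1/4}$ for a few values of $n$:

```python
# Sanity check (illustration only): verify (1 - x^4)^n >= 1 - n*x^4 on a
# grid of [0, 1], and that 2 * integral_0^1 (1 - x^4)^n dx, which equals
# 1/c_n, exceeds n^(-1/4).

def check_bounds(n, steps=10_000):
    # pointwise inequality on a uniform grid of [0, 1]
    pointwise = all(
        (1 - (i / steps) ** 4) ** n >= 1 - n * (i / steps) ** 4 - 1e-12
        for i in range(steps + 1)
    )
    # midpoint rule for 1/c_n = 2 * integral_0^1 (1 - x^4)^n dx
    inv_cn = 2 * sum(
        (1 - ((i + 0.5) / steps) ** 4) ** n for i in range(steps)
    ) / steps
    return pointwise, inv_cn

for n in (1, 5, 50, 500):
    pointwise, inv_cn = check_bounds(n)
    print(n, pointwise, inv_cn > n ** -0.25)  # expect: True True for each n
```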

However, there is more to it than "it wouldn't change anything". The idea of Rudin's proof is that the convolution of two functions should inherit the best properties of each. A good example is convolution with a smooth, compactly supported function, which is a standard way to prove that the smooth functions are uniformly dense in $C^0(I)$, for example. So he aims for a convolution with a polynomial.

The point is: if $f$ is a function (continuous, $L^1_{loc}$, or whatever, depending on context), we expect that after convolving with a function $\phi$ normalized so that $\int \phi = 1$, if the support of $\phi$ is small and concentrated near $0$, then the convolution $\widetilde{f}$ is close to $f$. (Think of a discrete averaging where $\phi$ is a "weight", and the convolution at a point is the average of the function according to that weight centered at the point. If we put more weight at the point itself, and less on nearby values, then the average shouldn't shake things up much.) This is easy to arrange for $C^{\infty}$ functions, since we have bump functions. But here we would like to prove density of polynomials, which are not so malleable. In fact, we can't have a polynomial with compact support that integrates to $1$ (we can't have non-trivial polynomials with compact support at all!).

So we try to emulate compact support by taking a polynomial which is highly concentrated near the origin. The polynomial $p(x)=1-x^2$ is perhaps the simplest example of something concentrated near the origin. An important observation is that, since the function is defined on $[0,1]$, what matters for our averaging polynomial is its behaviour on $[-1,1]$, so the fact that $1-x^2$ blows up outside $[-1,1]$ is irrelevant. We then raise it to the power $n$ since, on $(-1,1)$, this concentrates things even more near the origin (you can plot a few powers to see this). The estimates Rudin makes guarantee that everything goes well; they represent the technical cost of not having compact support (or support as small as desired).

The polynomial $1-x^4$ also has the property that raising it to the power $n$ concentrates things near the origin. The only difference is that it is less concentrated than $1-x^2$, so the error you make with the convolution will likely be bigger than if you used $1-x^2$.
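A quick numeric comparison (illustrative only, with an arbitrarily chosen point $x=0.5$) makes the "less concentrated" claim concrete: at every fixed $0<|x|<1$, $(1-x^4)^n$ stays larger than $(1-x^2)^n$, so the $x^4$ kernel keeps more mass away from the origin:

```python
# Illustration: at a fixed x away from 0, (1 - x^4)^n > (1 - x^2)^n,
# so the x^4 kernel is less concentrated near the origin than the
# x^2 kernel.
x = 0.5
for n in (5, 50, 200):
    v2 = (1 - x ** 2) ** n   # 0.75 ** n
    v4 = (1 - x ** 4) ** n   # 0.9375 ** n
    print(n, v2, v4, v4 > v2)  # v4 > v2 holds for each n
```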


There are three key facts in Rudin's proof (see the excellent textbook on real analysis by Terence Tao for a different presentation of the same proof):

  • polynomials can be approximations to the identity;1
  • convolution with polynomials produces another polynomial;2
  • if one convolves a uniformly continuous function with an approximation to the identity, one obtains a new function which is close to the original function (which explains the terminology “approximation to the identity”).

Changing the polynomial $1-x^2$ to $1-x^4$ in the construction of $Q_n$ does not change the proof, in the sense that the three facts above still apply to the newly defined $Q_n$. To be concrete, note that in general we have, for all $y\in[0,1]$ and every nonnegative integer $n$, $$ (1-y)^n\geq 1-ny\;. $$ So, as in Rudin's proof, one has the lower bound $$ \frac{1}{c_n}=2\int_0^1(1-x^4)^n\;dx\geq 2\int_0^{1/\sqrt[4]{n}}(1-nx^4)\;dx >\frac{1}{\sqrt[4]{n}}\;, $$ which gives the key estimate $$ |Q_n(x)|\leq \sqrt[4]{n}(1-\delta^4)^n\quad (\delta\leq|x|\le 1)\;.\tag{1} $$ But the exponential term $(1-\delta^4)^n$ decays very fast:

If $p>0$ and $\alpha$ is real, then $\displaystyle \lim_{n\to\infty}\frac{n^\alpha}{(1+p)^n}=0$. (See Theorem 3.20(d) in Rudin.)
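To see Theorem 3.20(d) in action on the bound in (1), write $(1-\delta^4)^n = (1+p)^{-n}$ with $p = \delta^4/(1-\delta^4) > 0$; then $\sqrt[4]{n}\,(1-\delta^4)^n = n^{1/4}/(1+p)^n \to 0$. A small numerical sketch (my own, with an arbitrary $\delta=0.3$):

```python
# The bound in (1), n^{1/4} (1 - delta^4)^n, equals n^{1/4} / (1 + p)^n
# with p = delta^4 / (1 - delta^4) > 0 -- exactly the form
# n^alpha / (1 + p)^n of Rudin's Theorem 3.20(d), so it tends to 0.
delta = 0.3
p = delta ** 4 / (1 - delta ** 4)   # so (1 - delta^4)^n == (1 + p)^(-n)
prev = float("inf")
for n in (10, 100, 1000, 10_000):
    bound = n ** 0.25 / (1 + p) ** n   # = n^{1/4} (1 - delta^4)^n
    print(n, bound)
    assert bound < prev  # decreasing along these sample values of n
    prev = bound
```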

Once one has the uniform estimate (1), the rest of the proof in Rudin can be repeated verbatim for the new $Q_n$: $$ |P_n(x)-f(x)|\leq 2M\int_{\delta\leq|t|\leq 1}Q_n(t)\;dt +\frac{\varepsilon}{2}\int_{|t|<\delta}Q_n(t)\;dt \le 4M\color{red}{\sqrt[4]{n}}(1-\delta^4)^n+\frac{\varepsilon}{2}\;. $$

One difference between the two choices of $Q_n$ is that $\sqrt[4]{n}(1-\delta^4)^n$ decays more slowly than $\sqrt{n}(1-\delta^2)^n$, because for $0<\delta<1$ the dominant factor satisfies $1-\delta^4>1-\delta^2$.
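The slower decay is easy to see numerically (an illustrative comparison of the two error bounds at the same, arbitrarily chosen $\delta$; for small $n$ the prefactors dominate, so the comparison is shown for moderately large $n$):

```python
# Compare the two error bounds at a fixed delta: the x^4 bound
# n^{1/4} (1 - delta^4)^n eventually dominates (decays more slowly
# than) the x^2 bound n^{1/2} (1 - delta^2)^n, though both tend to 0.
delta = 0.2
for n in (100, 1000, 10_000):
    b2 = n ** 0.5 * (1 - delta ** 2) ** n
    b4 = n ** 0.25 * (1 - delta ** 4) ** n
    print(n, b2, b4, b4 > b2)  # b4 > b2 for each of these n
```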

See also the two figures below for a comparison of $Q_n$ built from $1-x^2$ and from $1-x^4$, respectively, at $n=5,50,100,200$.

[Figures: plots of the two families of $Q_n$]


Notes:

1. The notion of approximations to the identity is of fundamental importance in analysis. Equations (47)--(50) show that for every $\varepsilon > 0$ and $0 < \delta < 1$ there exists an $(\varepsilon, \delta)$-approximation to the identity which is a polynomial $Q_n$ on $[-1, 1]$.

One version of the approximations to the identity can be phrased as follows. Let $\varepsilon > 0$ and $0 < \delta < 1$. A function $f : \mathbb{R} \to \mathbb{R}$ is said to be an $(\varepsilon,\delta)$-approximation to the identity if it obeys the following three properties:

  • (a) $f$ is supported on $[-1,1]$, and $f(x)\geq 0$ for all $-1\leq x\leq 1$.

  • (b) $f$ is continuous, and $\int_{-\infty}^\infty f = 1$.

  • (c) $|f(x)|\leq\varepsilon$ for all $\delta\leq|x|\leq 1$.
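A short numerical check (a sketch; the quadrature and the particular values of $n$, $\varepsilon$, $\delta$ are my own choices for illustration) that $Q_n(x)=c_n(1-x^4)^n$, viewed on $[-1,1]$, satisfies (a)-(c):

```python
# Sketch: check that Q_n(x) = c_n (1 - x^4)^n, restricted to [-1, 1],
# satisfies (a) nonnegativity, (b) total integral 1 (by the choice of
# c_n), and (c) |Q_n| <= eps on delta <= |x| <= 1 once n is large.
def make_Qn(n, steps=20_000):
    # midpoint rule for 1/c_n = 2 * integral_0^1 (1 - x^4)^n dx
    inv_cn = 2 * sum(
        (1 - ((i + 0.5) / steps) ** 4) ** n for i in range(steps)
    ) / steps
    cn = 1 / inv_cn
    return lambda x: cn * (1 - x ** 4) ** n

n, eps, delta = 200, 0.05, 0.5
Qn = make_Qn(n)
xs = [-1 + 2 * i / 1000 for i in range(1001)]
assert all(Qn(x) >= 0 for x in xs)                            # (a)
total = sum(Qn(-1 + 2 * (i + 0.5) / 20_000) * (2 / 20_000)
            for i in range(20_000))
assert abs(total - 1) < 1e-3                                  # (b)
assert all(abs(Qn(x)) <= eps for x in xs if abs(x) >= delta)  # (c)
print("Q_200 acts as a (0.05, 0.5)-approximation to the identity on [-1, 1]")
```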

2. Convolution is another fundamental concept in analysis that Rudin does not mention explicitly in the proof. The step of defining $P_n$ in (51) is a manifestation of this notion.