1) If we define a rapidly decaying function in the usual way, it says nothing about derivatives; rather, just that its decay beats any polynomial growth.
2) A Schwartz class function is not only smooth, but it and all of its derivatives are also rapidly decaying.
3) The Fourier transform is an isomorphism from the Schwartz class functions to themselves. This all feels very beautiful and deep, like a truth underlying something much bigger.
At first glance it seems like rapid decay is such a powerful condition, and that smoothness is something quite common and boring. Because of these assumptions, I wouldn't have been surprised if the rapid decay condition for Schwartz class functions was the "more important" of the two conditions (whatever that means... perhaps that there is a loosening of the condition that could still lead to some interesting analysis?)
But as soon as you take away the smoothness condition, things start to unravel:
The function $e^{-|x|}$ is rapidly decreasing, but it fails to be smooth at $x=0$. Its Fourier transform is $\frac{2}{1+\omega^2}$, which is certainly not rapidly decreasing any longer. Strange...
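For reference, here is the short computation behind that transform, assuming the unnormalized convention $\hat{f}(\omega) = \int_{\mathbb{R}} f(x) e^{-i\omega x}\,dx$ (other conventions only change constants and scaling, not the decay rate):

$$\int_{-\infty}^{\infty} e^{-|x|} e^{-i\omega x}\,dx = \int_0^{\infty} e^{-(1+i\omega)x}\,dx + \int_0^{\infty} e^{-(1-i\omega)x}\,dx = \frac{1}{1+i\omega} + \frac{1}{1-i\omega} = \frac{2}{1+\omega^2}.$$

So the corner at $x=0$ shows up on the Fourier side as decay of order exactly $\omega^{-2}$.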
But for all $\epsilon > 0$, each function in the family $f_{\epsilon}(x) = e^{-\sqrt{\epsilon + x^2}}$ is smooth and rapidly decreasing along with all of its derivatives (hence Schwartz class). Therefore $\widehat{f_\epsilon}(\omega)$ is also Schwartz class, even though $f_\epsilon \to f$ uniformly as $\epsilon \to 0$, where $f(x) = e^{-|x|}$. Again, quite strange...
We could have also created a compactly supported bump function, $\beta(x)$, that is identically $1$ on some neighborhood of $x=0$. Then you can use $1-\beta(x/\epsilon)$ as a family of smooth cutoffs to eliminate the point of non-differentiability: $g_{\epsilon}(x) = (1-\beta(x/\epsilon))e^{-|x|}$. This does the same thing as before, with $\widehat{g_{\epsilon}}(\omega)$ being Schwartz class for all $\epsilon > 0$, and with $g_{\epsilon}(x) \to e^{-|x|}$ for every $x \neq 0$ as $\epsilon \to 0$.
And even if we go one level deeper and consider a function which is $C^1$, just not smooth, things aren't any better. Consider the function $h(x) = x|x|e^{-x^2}$. The exponential term, $e^{-x^2}$, is a Gaussian which is pretty much as nice as it gets when it comes to Fourier transforms; and the other term, $x|x|$, has a derivative equal to $2|x|$ and hence it is $C^1$. But sure enough, $\widehat{h}(\omega)$ involves some polynomial terms and the Dawson function, and ends up being $O\left( \omega^{-3} \right)$. Similar computations can be done for any function of the form $h_k(x) = x^k |x| e^{-x^2}$, with $k\in \mathbb{N}$, where each $h_k \in C^{k}(\mathbb{R})$, and yet none of these has a Fourier transform that decays rapidly. So clearly being $C^k$ and rapidly decreasing is still not very much better than simply being $C^0$ and rapidly decreasing; and certainly nowhere close to being as good as being smooth and rapidly decreasing.
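The $\omega^{-3}$ rate can be checked numerically without the Dawson function. The following is only a rough sketch, assuming the convention $\hat{h}(\omega) = \int h(x) e^{-i\omega x}\,dx$; since $h$ is odd, $\hat{h}(\omega) = -2i\int_0^\infty x^2 e^{-x^2}\sin(\omega x)\,dx$. The helper names `simpson` and `scaled_h_hat`, the truncation at $x = 10$, and the step count are my own illustrative choices.

```python
import math


def simpson(func, a, b, n):
    """Composite Simpson rule for the integral of func over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Interior nodes alternate between weight 4 (odd index) and 2 (even index).
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3


def scaled_h_hat(omega, upper=10.0, n=20000):
    """Return omega**3 * Im(h_hat(omega)) for h(x) = x|x|exp(-x^2).

    The integral over [0, infinity) is truncated at `upper`, where the
    Gaussian factor makes the discarded tail negligible.
    """
    half_line = simpson(
        lambda x: x * x * math.exp(-x * x) * math.sin(omega * x), 0.0, upper, n
    )
    return omega**3 * (-2.0 * half_line)


if __name__ == "__main__":
    for omega in (10.0, 20.0, 40.0):
        print(omega, scaled_h_hat(omega))
```

If the decay is exactly of order $\omega^{-3}$, the printed values should level off at a nonzero constant as $\omega$ grows. The jump of size $4$ in $h''$ at $x=0$ suggests that constant is $4$ under this convention.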
Again, I'm not disputing any of these facts, and these kinds of phenomena where sequences of "nice" functions converge to "not nice" functions are abundant in analysis. I'm just looking for some deeper understanding or insight (dare I say, intuition) as to the role smoothness plays when it comes to Fourier transforms. This then also raises the question of what role rapid decay plays.
How do these two unrelated ideas come together so perfectly for the Fourier transform? And are there analogous concepts when it comes to the more general Fourier transform on locally compact abelian groups?
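I will mention a few elementary facts; there are many deeper theorems on this subject.
If $f$ is $L^2(\Bbb{R})$ (so that $\hat{f}$ is $L^2$), then $f^{(k)}\in L^2$ iff $\xi^k \hat{f}\in L^2$, and $x^k f\in L^2$ iff $\hat{f}^{(k)}\in L^2$, because $\widehat{f^{(k)}}(\xi) = (2i\pi \xi)^k \hat{f}(\xi)$ and $\hat{f}^{(k)}(\xi) = \widehat{(-2i\pi x)^k f}(\xi)$ (derivatives taken in the sense of distributions).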
In particular the space $\{ f\in L^2(\Bbb{R}),x^kf\in L^2,f^{(k)}\in L^2\}$ is stable under the Fourier transform. Same for $\{ f\in L^2, (1+|x|)^k f^{(k)}\in L^2\}$. The Schwartz space is the intersection of them all, so it is stable too.
The main difficulty is that there are many other indicators of smoothness and decay; the Fourier transform still swaps them, but in a more complicated way. For example, if $f$ is compactly supported and Hölder $\alpha$-continuous then $|2\hat{f}(\xi)|=\left|\int_a^b (f(x)-f(x+1/(2\xi)))e^{-2i\pi \xi x}dx\right|\le \int_a^b C |\xi|^{-\alpha}dx=O(|\xi|^{-\alpha})$, where $[a,b]$ contains the supports of $f$ and of its shift $f(\cdot+1/(2\xi))$. But $\hat{f}=O(|\xi|^{-\alpha})$ doesn't imply that $f$ is Hölder $\alpha$-continuous.
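The identity for $2\hat{f}(\xi)$ used here comes from shifting by half a period. With $\hat{f}(\xi)=\int f(x)e^{-2i\pi \xi x}dx$, substituting $x \mapsto x + 1/(2\xi)$ gives

$$\hat{f}(\xi) = \int f\left(x+\tfrac{1}{2\xi}\right) e^{-2i\pi \xi x}\, e^{-i\pi}\,dx = -\int f\left(x+\tfrac{1}{2\xi}\right) e^{-2i\pi \xi x}\,dx,$$

and adding this to the original expression for $\hat{f}(\xi)$ yields

$$2\hat{f}(\xi) = \int \left(f(x) - f\left(x+\tfrac{1}{2\xi}\right)\right) e^{-2i\pi \xi x}\,dx.$$

The Hölder condition then bounds the integrand by $C|2\xi|^{-\alpha}$, and the factor $2^{-\alpha}$ is absorbed into the constant.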
If $f$ is $L^2$ (more generally a tempered distribution) then $f_n=e^{-\pi x^2/n^2} (f\ast n e^{-\pi n^2 x^2})$ is Schwartz and it approximates $f$ in almost every semi-norm/topology you can think of. So does $\hat{f_n} = ( e^{-\pi \xi^2/n^2} \hat{f})\ast n e^{-\pi n^2 \xi^2}$. It is quite rare that we need other kinds of approximation, such as $ e^{-\sqrt{\epsilon + x^2}}\to e^{-|x|}$.