Wikipedia has different pages on distributions and on generalized functions; however, until recently I believed they were the same thing.
From the wiki page on distributions - "Standard functions act by integration against a test function, but many other linear functionals do not arise in this way, and these are the "generalized functions"."
Can I swap the term "generalized functions" for "distributions" without changing the meaning of the above quote in any way?
Some authors [1] reserve the term distribution specifically for Schwartz distributions and call everything else a generalized function.
[1] Andre Gsponer, "A concise introduction to Colombeau generalized functions and their applications in classical electrodynamics", 2008.
I think many writers (myself included) use "generalized function" and "distribution" interchangeably, at least in rough terms, if only to emphasize that we care about the dual space of test functions not because it is a dual space, but because the original space embeds in it as a dense subspace, so that distributions are some sort of extension/generalization of "function".
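To spell out the embedding, here is a minimal sketch in notation of my own choosing (the symbols $T_f$, $\varphi$ and $\eta$ are assumptions, not taken from the question). A locally integrable function $f$ on $\mathbb{R}$ acts on a test function $\varphi$ by integration:

$$T_f(\varphi) = \int_{\mathbb{R}} f(x)\,\varphi(x)\,dx .$$

The map $f \mapsto T_f$ is injective up to almost-everywhere equality. The Dirac delta, $\delta(\varphi) = \varphi(0)$, is not of the form $T_f$, but it is a limit of such functionals: if $\eta$ is a test function with $\int \eta = 1$ and $f_n(x) = n\,\eta(nx)$, then

$$T_{f_n}(\varphi) = \int_{\mathbb{R}} \eta(y)\,\varphi(y/n)\,dy \;\longrightarrow\; \varphi(0) = \delta(\varphi) \quad (n \to \infty).$$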
Yes, there are hyperfunctions and other sorts of "generalized functions", although by an abuse of language we might call these "distributions".
A simple example: the Fourier transform cannot possibly map all (ordinary, not necessarily tempered) distributions to (ordinary) distributions. We do know, however, that the Fourier transform maps test functions to the Paley-Wiener space, so Fourier transforms of (not necessarily tempered) distributions can be defined, but only as elements of the dual of the Paley-Wiener space. For example, functionals like "evaluate at a point $z_o$ off the real line" have no meaning for test functions or Schwartz functions, but certainly do make sense on the Paley-Wiener space.
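To make the last example concrete, here is a sketch under one choice of conventions. The normalization $\hat\psi(x) = \int_{\mathbb{R}} \psi(\xi)\,e^{-ix\xi}\,d\xi$ and the symbols $\mathcal{D}$ (test functions) and $PW$ (Paley-Wiener space) are my assumptions. Take $PW$ to consist of those $\psi$ with $\hat\psi \in \mathcal{D}$; each such $\psi$ extends to an entire function. For a distribution $u$, define $\hat u$ on $PW$ by $\langle \hat u, \psi \rangle := \langle u, \hat\psi \rangle$. With $u(x) = e^{i z_o x}$ this gives

$$\langle \hat u, \psi \rangle = \int_{\mathbb{R}} e^{i z_o x}\,\hat\psi(x)\,dx = 2\pi\,\psi(z_o),$$

using the inversion formula $\psi(z) = \frac{1}{2\pi} \int_{\mathbb{R}} \hat\psi(x)\,e^{ixz}\,dx$, which holds for every complex $z$ because $\hat\psi$ has compact support. So, with this normalization, $\hat u$ is $2\pi$ times "evaluate at $z_o$". When $\operatorname{Im} z_o \neq 0$, $|u(x)| = e^{-(\operatorname{Im} z_o)\,x}$ grows exponentially in one direction, so $u$ is not tempered, and its Fourier transform lives only in the dual of $PW$.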