We can generalize arbitrary kernel methods into a single form that encompasses the common transforms:
$$\vec{F}(\vec{p}) = \vec{\int} \vec{f}(\vec{t})\vec{\phi}(\vec{t}; \vec{p})d\vec{t}$$
Looking at a single dimension results in $$F(p) = \int f(t)\phi(t; p)dt$$
Now, for the transform to be "invertible" there must be a similar expression recovering $f(t)$ from $F(p)$:
$$f(t) = \int F(p)\psi(p; t)dp$$
and combining the two, we get
$$f(t) = \int\int f(\tau)\psi(p;t)\phi(\tau;p)dpd\tau$$
The inner integral is
$$\int \psi(p;t)\phi(\tau;p)dp$$
and for the combined equation to hold identically, this inner integral must equal $\delta(\tau - t)$, so that the sifting property recovers $f(t)$.
We can see that this clearly holds for the Fourier kernels.
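For concreteness, here is the check under one common Fourier convention (the placement of the $2\pi$ is a convention choice, not fixed by anything above). Take $\phi(t; p) = e^{-ipt}$ and $\psi(p; t) = \frac{1}{2\pi}e^{ipt}$; then the inner integral evaluates to
$$\int \psi(p;t)\phi(\tau;p)dp = \frac{1}{2\pi}\int e^{ip(t-\tau)}dp = \delta(t - \tau) = \delta(\tau - t),$$
which is the standard distributional identity for the delta function.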
For the general vector case we arrive at a similar result:
$$\vec{\delta}(\vec{\tau} - \vec{t}) = \vec{\int} \vec{\psi}(\vec{p};\vec{t})\vec{\phi}(\vec{\tau}; \vec{p})d\vec{p}$$
What is nice about this formulation is that it encompasses the well-known transforms. The wavelet transform is a 1-to-2 transform: it has two degrees of freedom (scale and translation) that take a single dimension of "time" and decompose it into a space whose interpretation depends on the specific kernels and on how the parameters $\vec{p}$ modify them.
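As a concrete illustration of the wavelet case, here is a minimal numerical sketch of $F(\vec{p}) = \int f(t)\phi(t;\vec{p})dt$ with $\vec{p} = (\text{scale}, \text{translation})$. The Morlet-style kernel, the signal, and all grid parameters are illustrative assumptions, not something fixed by the discussion above.

```python
import numpy as np

def morlet(t, scale, shift, w0=5.0):
    """Kernel phi(t; scale, shift): a scaled, translated Morlet-style wavelet."""
    u = (t - shift) / scale
    return np.cos(w0 * u) * np.exp(-u**2 / 2) / np.sqrt(scale)

t = np.linspace(-10, 10, 2000)
dt = t[1] - t[0]
f = np.sin(2 * np.pi * t) * (np.abs(t) < 3)  # a transient burst of oscillation

scales = np.array([0.5, 1.0, 2.0])
shifts = np.linspace(-8, 8, 33)

# F(scale, shift) = integral of f(t) * phi(t; scale, shift) dt,
# approximated here by a Riemann sum over the sampled grid
F = np.array([[np.sum(f * morlet(t, s, b)) * dt for b in shifts]
              for s in scales])
print(F.shape)  # one coefficient per (scale, translation) pair
```

The coefficients are largest where the wavelet's scale and translation line up with the burst, which is exactly the "two degrees of freedom" picture described above.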
What led me to this concept was the desire to generalize to higher dimensions, since in higher dimensions we can more easily decompose data (e.g., neural nets).
The properties of the kernels impose themselves on the transformed space: scaling properties of the kernel correspond to scaling factors of $f(t)$, frequency properties correspond to frequency content, and so on.
Knowing about the two kernels seems to be quite useful as they shape the interpretation of the transformed space.
The Fourier transform is the most basic transform and its kernel has infinite support, yet the kernel pair still reproduces a $\delta$ function because of the way the oscillations cancel. Similarly, wavelet kernels have finite support yet also combine into $\delta$'s.
We see that not all kernels are admissible, and I am curious about their general properties. Obviously a kernel pair must combine in such a way as to "represent" the $\delta$ function, but the above derivation only produces a very general result, too general to allow for "generating" kernels.
I imagine that, with a more involved theory, one could generate more exacting transforms than what Fourier and wavelets have to offer. The vector nature of the first transform may allow more complex kernels, e.g., Fourier and wavelet kernels, to be combined in a way that optimizes the "spectrum" for both. For example, in the least-squares sense, Fourier components would dominate where the signal contains large steady frequency components, and transient wavelet components would dominate where it does not. This would produce a "spectrum" in which the frequency components do not suffer the degradation they do with the ordinary Fourier transform (which has to account for transients even though it is infinitely ill-equipped to do so); the wavelets take over in those parts of the signal that are transient-like.

This new "spectrum", while more complex to interpret, decomposes the original function into more meaningful parts. We can extend this to a collection of kernels, each dealing with a different property our signals can have; being a larger-dimensional space, it is able to decompose better (into more separable parts). This, while more computationally expensive, would provide more information for analysis.
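A rough numerical sketch of this "combined kernel" idea: build one dictionary whose columns are sampled Fourier atoms plus sampled transient (wavelet-like) atoms, then solve for the joint spectrum in the least-squares sense. All atom shapes, frequencies, and widths here are illustrative assumptions, not a worked-out method.

```python
import numpy as np

n = 256
t = np.arange(n) / n

# Fourier atoms: cosines and sines at a few integer frequencies
freqs = np.arange(1, 9)
fourier = np.concatenate(
    [np.cos(2 * np.pi * k * t)[:, None] for k in freqs] +
    [np.sin(2 * np.pi * k * t)[:, None] for k in freqs], axis=1)

# Transient atoms: Gaussian bumps at several centers and widths
def bump(center, width):
    return np.exp(-((t - center) / width) ** 2)

transients = np.stack([bump(c, w) for w in (0.01, 0.03)
                       for c in np.linspace(0.1, 0.9, 9)], axis=1)

D = np.hstack([fourier, transients])  # the combined dictionary

# Signal = a steady oscillation plus a sharp transient
f = np.cos(2 * np.pi * 3 * t) + 2.0 * bump(0.5, 0.01)

# Least-squares joint "spectrum": each kernel family picks up its own part
coef, *_ = np.linalg.lstsq(D, f, rcond=None)
recon = D @ coef
print(np.max(np.abs(f - recon)))  # residual of the joint decomposition
```

Because the steady part matches a Fourier atom and the spike matches a transient atom, the least-squares fit assigns each to its natural family, which is the "Fourier dominates where the signal is steady, wavelets take over where it is transient" behavior described above.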
Of course, we can "window" the transforms to add timing correlations, although I do not know whether that is the most general approach or just a "hack". E.g., we could transform not into $p$ space but into joint $t$ and $p$ space in the most general way, rather than using a "window" to accomplish it.
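The "window" route can be sketched numerically as a bare-bones short-time Fourier transform, $F(t_0, p) = \int f(t)\,w(t - t_0)\,e^{-ipt}\,dt$. The Gaussian window, the two-tone test signal, and the evaluation points are all illustrative assumptions.

```python
import numpy as np

t = np.linspace(0, 1, 1024, endpoint=False)
dt = t[1] - t[0]
# First half of the signal: 10 Hz tone; second half: 40 Hz tone
f = np.where(t < 0.5, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 40 * t))

def stft_point(f, t, t0, p, width=0.05):
    """One coefficient of the windowed transform at (time t0, frequency p)."""
    window = np.exp(-((t - t0) / width) ** 2)
    return np.sum(f * window * np.exp(-1j * p * t)) * dt

# |F| at each tone, early vs late in the signal
early_10 = abs(stft_point(f, t, 0.25, 2 * np.pi * 10))
late_10  = abs(stft_point(f, t, 0.75, 2 * np.pi * 10))
early_40 = abs(stft_point(f, t, 0.25, 2 * np.pi * 40))
late_40  = abs(stft_point(f, t, 0.75, 2 * np.pi * 40))
print(early_10 > late_10, late_40 > early_40)  # → True True
```

The window turns a pure $p$-space representation into a joint $(t_0, p)$ representation; whether this windowing is the most general way to land in joint $t$ and $p$ space is exactly the open question raised above.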
It is obvious from the above that the kernels must have the sifting property, but are there any other general properties we can deduce? Is anyone working towards a more complete theory along these lines?