I started studying wavelets three weeks ago, with the intention of later applying them to some deep learning architectures. As a Ph.D. student in mathematics, my interest lies mainly in the theoretical construction of the objects in question: I like to understand the deep reasons behind things, not just the basics needed for applications. This is where my problem arises. None of the papers I have read explains, formally and in detail, the transition from continuous to discrete signals; they all skirt around it. I find this hugely frustrating, which is why I am turning to you. Before getting to the heart of the question, I would like to briefly summarize the work I have done so far, so that those who help me do not waste time on things I have already established.
For the general study of wavelets and multiresolution analysis I have used the following book by Ingrid Daubechies: https://epubs.siam.org/doi/10.1137/1.9781611970104
I have thoroughly read chapters 1 to 6 inclusive. While reading, I also rewrote (step by step) the most important proofs of each chapter, and all the proofs of chapter 5, in order to understand the subject better. The topics that are completely clear to me, and about which I have no doubts, are the following:
- Definition of a wavelet function in $L^2(\mathbb{R})$.
- Definition of the Continuous Wavelet Transform of a function $f\in L^2(\mathbb{R})$.
- Definition of the Discrete Wavelet Transform of a function $f\in L^2(\mathbb{R})$.
- Definition and properties of a multiresolution analysis (MRA) for $L^2(\mathbb{R})$.
- Construction of a multiresolution analysis from a scaling function by the process of orthonormalisation.
- Construction of a 2D multiresolution analysis from a scaling function.
Having said this, my doubt arises from the fact that in practice I will be dealing with discrete 1D and 2D signals, so the concepts above cannot be applied as they stand. It all stems from a proposition in Chapter 5, which introduces particular coefficients $h_n$ for an MRA with scaling function $\phi$:
Proposition. Since $\phi \in V_0 \subset V_{-1}$ and the $\phi_{-1,n}$ form an orthonormal basis of $V_{-1}$, we have \begin{equation*} \phi(x) = \sum_{n\in\mathbb{Z}}{h_n\phi_{-1,n}(x)} = \sqrt{2}\sum_{n\in\mathbb{Z}}{h_n\phi(2x-n)}, \end{equation*} where the sum converges in the $L^2$ sense, with \begin{equation*} h_n = \langle \phi, \phi_{-1,n} \rangle, \qquad \sum_{n\in\mathbb{Z}}{|h_n|^2} = 1 . \end{equation*}
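As a quick sanity check of the normalization in this proposition, here is a minimal sketch of my own (not taken from the book). It assumes the well-known closed-form four-tap Daubechies coefficients; the helper names `daubechies4_filter` and `shifted_inner_product` are hypothetical. Besides $\sum_n |h_n|^2 = 1$, it checks two consequences that I believe follow from the two-scale relation: $\sum_n h_n = \sqrt{2}$ (integrate both sides, assuming $\int \phi \neq 0$) and $\sum_n h_n h_{n+2} = 0$ (orthonormality of the integer translates of $\phi$, real coefficients).

```python
import math


def daubechies4_filter():
    """Closed-form four-tap Daubechies coefficients (assumed values, real-valued)."""
    s2, s3 = math.sqrt(2.0), math.sqrt(3.0)
    return [
        (1 + s3) / (4 * s2),
        (3 + s3) / (4 * s2),
        (3 - s3) / (4 * s2),
        (1 - s3) / (4 * s2),
    ]


def shifted_inner_product(h, k):
    """Compute sum_n h[n] * h[n + 2k], treating h as zero outside its support."""
    total = 0.0
    for n in range(len(h)):
        m = n + 2 * k
        if 0 <= m < len(h):
            total += h[n] * h[m]
    return total


h = daubechies4_filter()
energy = sum(c * c for c in h)           # should be 1 by the proposition
coefficient_sum = sum(h)                 # should be sqrt(2) if the integral of phi is nonzero
overlap = shifted_inner_product(h, 1)    # should be 0 by orthonormality of translates

print(energy, coefficient_sum, overlap)
```

The three printed numbers should agree with $1$, $\sqrt{2}$ and $0$ up to floating-point rounding.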
My doubt arises from the fact that, as I understand from the online literature, when only finitely many of these coefficients are nonzero they are used directly as a filter to compute the MRA (or DWT) of a discrete signal. This idea is introduced in Chapter 5, Section 6 of the cited book (Connection with subband filtering schemes). My problem is understanding the theoretical motivation for choosing the coefficients $h_n$ as the filter in the discrete case. In particular, I have the following questions:
- How is the MRA defined in the discrete case? (Is it built on $\ell^2(\mathbb{Z})$?)
- How do we logically and formally go from the continuous case to the discrete case?
- Why do the coefficients $h_n$ form the very filter needed for the discrete case?
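For concreteness, the discrete procedure these questions refer to, as I understand it from the subband-filtering description, can be sketched roughly as follows. This is only an illustration under my own assumptions: the samples are treated directly as the fine-scale coefficients $c^0_n$ (which is exactly the identification I do not understand), the filter is real, the signal is zero-padded at the boundary, and the Haar coefficients $h_0 = h_1 = 1/\sqrt{2}$ are used as the simplest example. The function name `analysis_step` is hypothetical.

```python
import math


def analysis_step(c, h):
    """One coarsening step: out[k] = sum_n h[n - 2k] * c[n].

    Assumptions: h is real and zero outside 0..len(h)-1,
    and c is extended by zeros beyond its last sample.
    """
    out = []
    for k in range(len(c) // 2):
        acc = 0.0
        for i, h_i in enumerate(h):  # i plays the role of n - 2k
            n = 2 * k + i
            if n < len(c):
                acc += h_i * c[n]
        out.append(acc)
    return out


haar = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # Haar low-pass coefficients
signal = [4.0, 2.0, 5.0, 5.0]                # samples, treated as c^0_n (assumption)
coarse = analysis_step(signal, haar)         # filter with h, keep every second output

print(coarse)
```

For the Haar filter each output is the sum of two neighbouring samples divided by $\sqrt{2}$, so the step halves the length of the sequence.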
The third question is my biggest doubt. At present, the passage to the discrete case looks like magic to me: I lack the deep intuition that I have in the continuous case, and everything seems completely arbitrary (which it obviously is not). I do not understand how one passes from scaling functions to filters. I believe that understanding in detail how the MRA is defined in the discrete setting, what intuition lies behind using the coefficients as a filter, and what mathematical process led to the current discrete MRA (or DWT) would help me resolve this doubt.
Thank you to anyone who would like to answer.