This is my current understanding of convolution after having read through this blog post:
The convolution operator can be thought of as an operation of linear superposition. If we know the response of a linear, time-invariant system to a unit impulse, the overall response to an arbitrary input signal may be constructed as a linear superposition of unit impulse responses, each translated and scaled according to the input. This is what the convolution integral computes.
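For concreteness, this is the integral I have in mind. The symbols $x$ (input signal), $h$ (unit impulse response) and $y$ (output) are my own labels, not notation taken from the blog post:

```latex
y(t) = (x * h)(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau
```

Here $h(t - \tau)$ is the impulse response translated to time $\tau$, and $x(\tau)$ is the factor it is scaled by before everything is summed.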
On the other hand, the convolution theorem allows us to perform the equivalent operation by first taking the pointwise product of the Fourier transforms of the input signal and the unit impulse response, then taking the inverse transform. In other words, the convolution operation is diagonalized in Fourier space, and acts on each Fourier component of the input signal by multiplying it by its eigenvalue, the corresponding Fourier coefficient of the unit impulse response.
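As a sanity check on both descriptions, here is a minimal NumPy sketch of my own. It assumes discrete, periodic signals of length 8 (so the convolution is circular and the DFT applies), and the function and variable names are made up for illustration. It builds the output by superposing shifted, scaled copies of the impulse response, compares that with the product of the spectra, and then checks that a single Fourier component is only rescaled by the matching coefficient of the impulse response.

```python
import numpy as np


def circular_convolution_direct(x, h):
    """Superpose copies of h, each shifted to position k and scaled by x[k]."""
    n = len(x)
    y = np.zeros(n, dtype=complex)
    for k in range(n):
        # np.roll(h, k)[i] equals h[(i - k) mod n]: the impulse response moved to k.
        y += x[k] * np.roll(h, k)
    return y


def circular_convolution_fft(x, h):
    """Multiply the two spectra pointwise, then transform back."""
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(h))


N = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(N)  # arbitrary input signal
h = rng.standard_normal(N)  # arbitrary impulse response

direct = circular_convolution_direct(x, h)
via_fft = circular_convolution_fft(x, h)
print(np.allclose(direct, via_fft))

# A single Fourier component passes through the system unchanged in shape:
# it is only multiplied by the m-th DFT coefficient of h (its eigenvalue).
m = 3
e_m = np.exp(2j * np.pi * m * np.arange(N) / N)
eigen_out = circular_convolution_direct(e_m, h)
print(np.allclose(eigen_out, np.fft.fft(h)[m] * e_m))
```

Both checks agree numerically, so I am fairly confident the two descriptions are the same operation; what I am missing is the intuition for why.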
While I follow the logic leading up to either approach, my difficulty is in finding the connection between the two: how does it intuitively make sense that superimposing unit impulse responses according to the input signal has the same effect as multiplying the Fourier spectra of the two functions?