I've been reading the chapter on distributions in Grafakos's book on Fourier analysis. Although it provides a lot of detail, I came away from it without understanding the essence of the topic. I want to understand how the notion of a distribution generalizes/extends the idea of a function. I know that some functions define distributions, but that only gives me a correspondence; it does not tell me in what sense the distribution behaves like the function.

For instance, the Dirac delta (distribution) can be imagined as a pulse at the origin, or as the limit of the functions $\frac{n}{2}\chi_{[-\frac{1}{n},\frac{1}{n}]}$ (explained below). The action of the delta on $f$ should produce $f(0)$, so we look for a possible "function" $\phi$ such that integrating $f$ against $\phi$ gives $f(0)$. Looking at the Riemann sums for the integral, we can ask that $h\phi(0)f(0)=f(0)$ for every small $h$. This suggests that the delta is an infinitely large pulse at the origin.

Alternatively, the delta is the derivative of the Heaviside function. The Heaviside function can be approximated by the sequence $$f_n(x)=\begin{cases} 0, & x<-\frac{1}{n},\\ \frac{n}{2}x+\frac{1}{2}, & |x|\leq\frac{1}{n},\\ 1, & x>\frac{1}{n}, \end{cases}$$ whose derivatives $$f_n'(x)=\begin{cases} 0, & x<-\frac{1}{n},\\ \frac{n}{2}, & |x|<\frac{1}{n},\\ 0, & x>\frac{1}{n} \end{cases}$$ are (almost everywhere) equal to $\frac{n}{2}\chi_{[-\frac{1}{n},\frac{1}{n}]}$.
Both the above arguments (however informal they may be) convince me that the Dirac delta distribution has some likeness to a traditional function.
Is there a book/lecture that would allow me to view all distributions as generalizations of functions? Thank you all in advance.
Every locally integrable function on $\Bbb{R}^n$ induces a distribution on $\Bbb{R}^n$ via integration: that is, we have an injective linear mapping $\iota:L^1_{\text{loc}}(\Bbb{R}^n)\to\mathcal{D}'(\Bbb{R}^n)$, given by $[\iota(f)](\phi):=\int_{\Bbb{R}^n}f(x)\phi(x)\,dx$. Note that this mapping $\iota$ is merely an injection, not a surjection (case in point: the Dirac delta $\delta$ is not in the image of $\iota$). What this says is that every locally integrable function gives rise to a unique distribution, but not every distribution arises from a locally integrable function. The fact that $\iota$ is injective but not surjective is why the concept of a distribution is a strict generalization of the concept of a locally integrable function.
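As a rough illustration of the map $\iota$ in one dimension, here is a small Python sketch. Everything in it is an assumption made for the example: the names `iota`, `bump` and `delta`, the choice $f(x)=x^2$, the restriction to test functions supported in $[-1,1]$, and the midpoint rule standing in for the integral. It only shows that $\iota(f)$ and $\delta$ are both linear functionals on test functions, where the first is built from an integral and the second is not.

```python
import math

def bump(x):
    """A standard test function: smooth, supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def iota(f, a=-1.0, b=1.0, steps=20000):
    """Return the functional phi -> integral of f*phi over [a, b].

    Assumption: phi vanishes outside [a, b]; the integral is
    approximated by the midpoint rule.
    """
    h = (b - a) / steps
    def functional(phi):
        total = 0.0
        for k in range(steps):
            x = a + (k + 0.5) * h
            total += f(x) * phi(x)
        return h * total
    return functional

def delta(phi):
    """The Dirac delta: evaluation at 0, defined without any integral."""
    return phi(0.0)

# The distribution induced by the locally integrable function f(x) = x^2.
T = iota(lambda x: x * x)

# Linearity: T(2*phi + 3*psi) should agree with 2*T(phi) + 3*T(psi).
psi = lambda x: x * bump(x)
lhs = T(lambda x: 2.0 * bump(x) + 3.0 * psi(x))
rhs = 2.0 * T(bump) + 3.0 * T(psi)
print(lhs, rhs, delta(bump))
```

Both `T` and `delta` take a test function and return a number; the difference is that no choice of `f` passed to `iota` reproduces `delta` exactly.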
I'm not sure what you mean by $\delta$ has some likeness to a traditional function. It can be expressed as a limit (with respect to the weak* topology on $\mathcal{D}'(\Bbb{R}^n)$, as opposed to pointwise convergence) of locally integrable functions, but why should this mean that $\delta$ ought to be/behave like a usual function?
Just because a bunch of quantities behave a certain way, why should we expect that a "limiting object" ought to behave in the same way? For example, not every limit of rational numbers is again a rational number.
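To make the weak* convergence mentioned above concrete, here is a small numerical sketch in Python. It pairs the functions $\frac{n}{2}\chi_{[-\frac{1}{n},\frac{1}{n}]}$ from the question with one fixed test function and compares the result with $\phi(0)$. The names `bump` and `pair_with_box`, the particular test function, and the midpoint rule are all assumptions made for illustration; checking a single test function numerically is of course not a proof.

```python
import math

def bump(x):
    """A standard test function: smooth, supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def pair_with_box(phi, n, steps=20000):
    """Approximate the integral of (n/2) * chi_[-1/n, 1/n] * phi.

    Uses the midpoint rule on [-1/n, 1/n], where the box function
    is the constant n/2; outside that interval the integrand is 0.
    """
    a, b = -1.0 / n, 1.0 / n
    h = (b - a) / steps
    total = sum(phi(a + (k + 0.5) * h) for k in range(steps))
    return (n / 2.0) * h * total

target = bump(0.0)  # what delta returns on this test function
values = {n: pair_with_box(bump, n) for n in (1, 10, 100, 1000)}

for n, v in values.items():
    print(n, v, abs(v - target))
```

The printed errors shrink as $n$ grows, which is what convergence in $\mathcal{D}'(\Bbb{R})$ means when tested against this one $\phi$. The individual functions become taller and narrower and have no pointwise limit that is a locally integrable function representing $\delta$.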
For motivating the definition of a distribution as a continuous linear functional on the space $\mathcal{D}(\Bbb{R}^n)$ of test functions, I found Amann and Escher's Analysis Volume III very helpful. Take a look at pages 177-179, and just read the text, and skip all the theorems if you wish on a first reading. The motivation for distributions as "measurement devices" was a very helpful analogy for me.