In Haberman's 'Applied Partial Differential Equations with Fourier Series and Boundary Value Problems', he says the following:
Dirac delta function. Our source $f(x)$ represents a forcing of our system at all points. ... In order to isolate the effect of each individual point, we decompose $f(x)$ into a linear combination of unit pulses of duration $\Delta x$ $$ f(x) \approx \sum_i f(x_i) (\textrm{unit pulse starting at } x = x_i). $$ This is somewhat reminiscent of the definition of an integral. Only $\Delta x$ is missing, which we introduce by multiplying and dividing by $\Delta x$: $$ f(x) = \lim_{\Delta x \to 0} \sum_i f(x_i) \frac{ \textrm{unit pulse}}{\Delta x}\Delta x. $$
This just seems like complete nonsense. In general a function cannot be approximated by a summation over neighboring values. Am I missing something?
More generally, how should I think of the Dirac Delta?
While the technical details of how the Dirac $\delta$ is usually defined are quite different, I don't see any issue with how this is presented, at least heuristically/computationally. As notation for the "unit pulse," let $K_{\Delta x}(x) = 1_{[0,\Delta x)}(x)$, the indicator function of the interval $[0,\Delta x)$; taking the interval half-open keeps neighboring pulses from overlapping at their endpoints. We could also take a smooth version of the unit pulse instead of the sharp cutoff, but for the sake of this intuitive presentation that won't matter.
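The first approximation $f(x)\approx\sum_i f(x_i)K_{\Delta x}(x-x_i)$, with grid points $x_i = i\,\Delta x$, is valid at least for functions $f$ that are, say, uniformly continuous. For any $x$ strictly between two grid points, exactly one pulse is nonzero at $x$, namely the one with $x_i < x < x_i+\Delta x$, so the sum is just $f(x_i)$. By uniform continuity, given $\epsilon>0$ we have $|f(x)-f(x_i)|<\epsilon$ whenever $|x-x_i|<\Delta x$ and $\Delta x$ is sufficiently small. So then $|f(x) - \sum_i f(x_i)K_{\Delta x}(x-x_i)| < \epsilon$ as long as $\Delta x$ is sufficiently small.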
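If it helps to see this numerically, here is a minimal sketch of the pulse approximation. It assumes grid points $x_i = i\,\Delta x$ and pulses supported on the half-open interval $[0,\Delta x)$; the helper names `pulse_sum` and `max_error` are made up for this illustration, and $f = \sin$ on $[0,1]$ is just a convenient test function.

```python
import math

def pulse_sum(f, x, dx):
    """Evaluate sum_i f(x_i) * K_dx(x - x_i), assuming x_i = i*dx and
    K_dx = indicator of [0, dx). Only one pulse is nonzero at x."""
    i = math.floor(x / dx)  # index of the pulse whose support contains x
    return f(i * dx)

def max_error(f, dx, a=0.0, b=1.0, samples=2001):
    """Largest |f(x) - pulse_sum(f, x, dx)| over a uniform sample of [a, b]."""
    xs = [a + (b - a) * k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - pulse_sum(f, x, dx)) for x in xs)

if __name__ == "__main__":
    # The sampled error should shrink roughly in proportion to dx.
    for dx in (0.1, 0.01, 0.001):
        print(dx, max_error(math.sin, dx))
```

Since $|\sin x - \sin x_i| \le |x - x_i|$, the printed error should stay below roughly $\Delta x$, which is the uniform-continuity estimate in concrete form.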
The approximation $\delta(x)\approx \frac{K_{\Delta x}(x)}{\Delta x}$ should make sense if you think of the graph of the function $\frac{K_{\Delta x}(x)}{\Delta x}$. It equals $1/\Delta x$ on an interval of length $\Delta x$ starting at $x=0$ and is zero elsewhere (or very small, for a smooth version of the pulse), so it is sharply peaked near $0$ and its integral is $1$ for every $\Delta x$. The two approximations combine to say that $f(x)$ is close to an integral, since the middle expression below is a Riemann sum in the variable $y = x_i$: $$ f(x) \approx \sum_{i}f(x_i)\frac{K_{\Delta x}(x-x_i)}{\Delta x}\Delta x\approx \int f(y)\delta(x-y)\,dy. $$ This isn't the usual way that the $\delta$ function is defined rigorously, but it is a flexible point of view, and it can be made into a rigorous definition of the Dirac $\delta$ as a "distribution." The family of functions $\{K_{\Delta x}(x)/\Delta x : \Delta x > 0\}$ is usually called a "nascent" Dirac $\delta$, or an approximate identity.
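As a rough numerical illustration of the approximate-identity idea, the sketch below estimates $\int f(y)\,\frac{K_{\Delta x}(x-y)}{\Delta x}\,dy$ with a midpoint rule and compares it to $f(x)$. It assumes the pulse is supported on an interval of length $\Delta x$ starting at $0$, so the integrand is nonzero only for $y$ between $x-\Delta x$ and $x$; the function name `smoothed` and the choice $f=\cos$, $x=0.5$ are purely illustrative.

```python
import math

def smoothed(f, x, dx, n=1000):
    """Midpoint-rule estimate of the integral of f(y) * K_dx(x - y) / dx dy.
    Assumes K_dx is the unit pulse on an interval of length dx starting at 0,
    so only y in the window from x - dx to x contributes."""
    h = dx / n                         # width of each quadrature cell
    total = 0.0
    for k in range(n):
        y = x - dx + (k + 0.5) * h     # midpoint of the k-th cell
        total += f(y) * (1.0 / dx) * h # pulse height is 1/dx inside the window
    return total

if __name__ == "__main__":
    x = 0.5
    # The gap between the smoothed value and f(x) should shrink with dx.
    for dx in (0.1, 0.01, 0.001):
        print(dx, abs(smoothed(math.cos, x, dx) - math.cos(x)))
```

In other words, integrating against $K_{\Delta x}/\Delta x$ just averages $f$ over a window of width $\Delta x$ next to $x$, and that average tends to $f(x)$ as the window shrinks when $f$ is continuous.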