This is equation 6.23 in Elements of Statistical Learning, page 209.
It concerns KDE (kernel density estimation). They let $\phi_h$ be the kernel function, the Gaussian distribution pdf with mean $0$ and standard deviation $h$.
Their claim is: $\hat{f}_X(x) = \frac{1}{N} \sum_{i = 1}^N \phi_h (x - x_i) = (\hat{F} * \phi_h)(x)$, where $\hat{F}$ is the empirical distribution function and $*$ is the convolution operator.
They state this without proof, and I tried to reproduce the result for a paper I'm writing. However, I get something different. In the derivation below, $1_{[x_i \leq t]}$ denotes an indicator function, and $\Phi_h$ is the normal cdf with mean $0$ and standard deviation $h$.
$$[\hat{F}*\phi_h](x) = \int \hat{F}(t) \phi_h(x - t) \, dt = \int \frac{\# \{ x_i \leq t \}}{N} \phi_h(x - t) \, dt = \frac{1}{N} \int \# \{x_i \leq t \} \phi_h(x - t) \, dt $$ $$= \frac{1}{N} \int \sum_{i = 1}^N {1}_{[x_i \leq t]} \phi_h(x - t) \, dt =\frac{1}{N} \sum_{i = 1}^N \int {1}_{[x_i \leq t]} \phi_h(x - t) \, dt = \frac{1}{N} \sum_{i = 1}^N \int_{x_i}^{\infty} \phi_h(x - t) \, dt $$ $$ = \frac{1}{N} \sum_{i = 1}^N \left[ -\Phi_h(x - t) \right]_{t = x_i}^{t = \infty} = \frac{1}{N} \sum_{i = 1}^N \Phi_h(x - x_i)$$
So essentially I get that $\hat{f}_X(x) = \frac{d}{dx} \left[(\hat{F} * \phi_h)(x) \right]$, which is still a pleasant result. Is my derivation correct? Is this an error in the book? It does not yet appear on the book's errata list.
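As a sanity check on my derivation, the identity $(\hat{F} * \phi_h)(x) = \frac{1}{N} \sum_{i=1}^N \Phi_h(x - x_i)$ can be verified numerically. A minimal sketch (made-up sample points, standard library only, midpoint Riemann sum for the convolution integral):

```python
import math

def phi_h(u, h):
    """Gaussian pdf with mean 0 and standard deviation h."""
    return math.exp(-u * u / (2 * h * h)) / (h * math.sqrt(2 * math.pi))

def Phi_h(u, h):
    """Gaussian cdf with mean 0 and standard deviation h."""
    return 0.5 * (1 + math.erf(u / (h * math.sqrt(2))))

xs = [-1.2, 0.3, 0.5, 2.0]   # made-up sample
h, x = 0.7, 0.9              # bandwidth and evaluation point

def F_hat(t):
    """Empirical distribution function."""
    return sum(xi <= t for xi in xs) / len(xs)

# Left-hand side: (F_hat * phi_h)(x) by a midpoint Riemann sum on a wide grid.
a, b, m = -12.0, 12.0, 120000
dt = (b - a) / m
conv = sum(F_hat(a + (j + 0.5) * dt) * phi_h(x - (a + (j + 0.5) * dt), h)
           for j in range(m)) * dt

# Right-hand side: the closed form from the derivation.
closed = sum(Phi_h(x - xi, h) for xi in xs) / len(xs)

print(conv, closed)  # the two agree to several decimal places
```

The truncation to $[-12, 12]$ is harmless here because $\hat{F}$ vanishes to the left of the sample and $\phi_h(x - t)$ is negligible that far from $x$.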
I think the authors glossed over some technical details in order to keep the book flowing. To justify this fully on a technical level I would use Lebesgue integration theory, but combining convolution with the Dirac delta also shows what is going on.

If you are comfortable extending the convolution $$ (f*g)(x) := \int_{\mathbb{R}} f(x-y)g(y)\,dy $$ from $L^1$ functions to also work with the Dirac delta $\delta_a$ (see: https://en.wikipedia.org/wiki/Dirac_delta_function), then you can directly write \begin{align} \widehat{f}_n(x) &= \frac{1}{n} \sum_{i=1}^n k_h(x-x_i) = \frac{1}{n} \sum_{i=1}^n (k_h*\delta_{x_i})(x) = \frac{1}{n} \sum_{i=1}^n \int_{\mathbb{R}} k_h(x-y)\delta_{x_i}(y) \, dy\\ &= \int_{\mathbb{R}} k_h(x-y) \frac{1}{n} \sum_{i=1}^n \delta_{x_i}(y) \, dy\\ &= \left( k_h * \frac{1}{n} \sum_{i=1}^n \delta_{x_i} \right)(x) \end{align} where I used $(f*\delta_a)(x) = \int_{\mathbb{R}} f(x-y)\delta_{a}(y) \, dy = f(x-a)$ and the linearity of the integral.

What this shows is that the kernel density estimator is not the convolution of the kernel with the empirical distribution function, but rather with something resembling an empirical density.
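You can also see this numerically by mollifying the deltas: replace each $\delta_{x_i}$ with a very narrow Gaussian of width $\varepsilon$, convolve the result with $k_h$ on a grid, and compare with the kernel density estimator evaluated directly. A throwaway sketch (invented data, standard library only; the width $\varepsilon$ and the grid are arbitrary choices):

```python
import math

def gauss(u, s):
    """Gaussian pdf with mean 0 and standard deviation s."""
    return math.exp(-u * u / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

xs = [-1.2, 0.3, 0.5, 2.0]      # made-up sample
h, x, eps = 0.7, 0.9, 0.01      # bandwidth, evaluation point, delta width

def mollified_empirical_density(y):
    """(1/n) * sum_i delta_{x_i}, each delta smoothed to a N(x_i, eps^2) pdf."""
    return sum(gauss(y - xi, eps) for xi in xs) / len(xs)

# (k_h * mollified empirical density)(x) by a midpoint Riemann sum.
a, b, m = -10.0, 10.0, 40000
dt = (b - a) / m
conv = sum(gauss(x - (a + (j + 0.5) * dt), h)
           * mollified_empirical_density(a + (j + 0.5) * dt)
           for j in range(m)) * dt

# The kernel density estimator evaluated directly.
kde = sum(gauss(x - xi, h) for xi in xs) / len(xs)

print(conv, kde)  # agreement up to the small smoothing error of order eps^2
```

Exactly: convolving $N(0, h^2)$ with $N(x_i, \varepsilon^2)$ gives $N(x_i, h^2 + \varepsilon^2)$, so the discrepancy vanishes as $\varepsilon \to 0$, which is the delta limit used above.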
In fact, in much the same way you can write $\widehat{f}_n$ as a Lebesgue integral of $k_h$ with respect to the empirical distribution function (more precisely, with respect to the measure associated with it), which places mass $1/n$ at each $x_i$, i.e., \begin{align} \widehat{f}_n(x) = \int_{\mathbb{R}} k_h(x-y) \, d\widehat{F}_n(y), \end{align} where $\widehat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{(-\infty,x]}(x_i)$ is the empirical distribution function.
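Since $\widehat{F}_n$ is a pure jump function, a Riemann–Stieltjes sum for $\int k_h(x-y) \, d\widehat{F}_n(y)$ picks up mass only in the grid cells containing a jump, each contributing $1/n$, and so collapses to the kernel density estimator. A quick sketch (invented sample, standard library only):

```python
import math

def k_h(u, h):
    """Gaussian kernel: pdf of N(0, h^2)."""
    return math.exp(-u * u / (2 * h * h)) / (h * math.sqrt(2 * math.pi))

xs = [-1.2, 0.3, 0.5, 2.0]   # made-up sample
h, x = 0.7, 0.9              # bandwidth and evaluation point

def F_n(t):
    """Empirical distribution function."""
    return sum(xi <= t for xi in xs) / len(xs)

# Riemann-Stieltjes sum: F_n(t_{j+1}) - F_n(t_j) is zero except in the cells
# where F_n jumps (those containing some x_i), where it equals 1/n.
a, b, m = -10.0, 10.0, 200000
dt = (b - a) / m
stieltjes = sum(k_h(x - (a + j * dt), h)
                * (F_n(a + (j + 1) * dt) - F_n(a + j * dt))
                for j in range(m))

# The kernel density estimator evaluated directly.
kde = sum(k_h(x - xi, h) for xi in xs) / len(xs)

print(stieltjes, kde)  # the Stieltjes sum reproduces the KDE
```

The only discretization error is evaluating $k_h$ at a cell endpoint rather than exactly at $x_i$, which is of order $\Delta t$ and disappears as the grid is refined.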