This is equation 6.23 in Elements of Statistical Learning, page 209.
It concerns KDE (kernel density estimation). They let $\phi_h$ be the kernel function, the Gaussian distribution pdf with mean $0$ and standard deviation $h$.
Their claim is: $\hat{f}_X(x) = \frac{1}{N} \sum_{i = 1}^N \phi_h (x - x_i) = (\hat{F} * \phi_h)(x)$, where $\hat{F}$ is the empirical distribution function and $*$ is the convolution operator.
They state this without proof, and I tried to reproduce the result for a paper I'm writing. However, I get something different. In the derivation below, $1_{[x_i \leq t]}$ denotes an indicator function, and $\Phi_h$ is the normal cdf with mean $0$ and standard deviation $h$.
$$[\hat{F}*\phi_h](x) = \int \hat{F}(t) \phi_h(x - t) \, dt = \int \frac{\# \{ x_i \leq t \}}{N} \phi_h(x - t) \, dt = \frac{1}{N} \int \# \{x_i \leq t \} \phi_h(x - t) \, dt $$ $$= \frac{1}{N} \int \sum_{i = 1}^N {1}_{[x_i \leq t]} \phi_h(x - t) \, dt =\frac{1}{N} \sum_{i = 1}^N \int {1}_{[x_i \leq t]} \phi_h(x - t) \, dt = \frac{1}{N} \sum_{i = 1}^N \int_{x_i}^{\infty} \phi_h(x - t) \, dt $$ $$ = \frac{1}{N} \sum_{i = 1}^N \left[ -\Phi_h(x - t) \right]_{t = x_i}^{t = \infty} = \frac{1}{N} \sum_{i = 1}^N \Phi_h(x - x_i)$$
So essentially I get that $\hat{f}_X(x) = \frac{d}{dx} \left[(\hat{F} * \phi_h)(x) \right]$, which is still a pleasant result. Is my derivation correct? Is this an error in the book? It does not yet appear on the book's errata list.
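As a sanity check on my derivation, the identity $(\hat{F} * \phi_h)(x) = \frac{1}{N} \sum_{i=1}^N \Phi_h(x - x_i)$ can be verified numerically. A minimal sketch (made-up sample points, standard library only, midpoint Riemann sum for the convolution integral):

```python
import math

def phi_h(u, h):
    """Gaussian pdf with mean 0 and standard deviation h."""
    return math.exp(-u * u / (2 * h * h)) / (h * math.sqrt(2 * math.pi))

def Phi_h(u, h):
    """Gaussian cdf with mean 0 and standard deviation h."""
    return 0.5 * (1 + math.erf(u / (h * math.sqrt(2))))

xs = [-1.2, 0.3, 0.5, 2.0]   # made-up sample
h, x = 0.7, 0.9              # bandwidth and evaluation point

def F_hat(t):
    """Empirical distribution function."""
    return sum(xi <= t for xi in xs) / len(xs)

# Left-hand side: (F_hat * phi_h)(x) by a midpoint Riemann sum on a wide grid.
a, b, m = -12.0, 12.0, 120000
dt = (b - a) / m
conv = sum(F_hat(a + (j + 0.5) * dt) * phi_h(x - (a + (j + 0.5) * dt), h)
           for j in range(m)) * dt

# Right-hand side: the closed form from the derivation.
closed = sum(Phi_h(x - xi, h) for xi in xs) / len(xs)

print(conv, closed)  # the two agree to several decimal places
```

The truncation to $[-12, 12]$ is harmless here because $\hat{F}$ vanishes to the left of the sample and $\phi_h(x - t)$ is negligible that far from $x$.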
I think the authors glossed over some technical details in order to keep the book flowing. To justify this fully on a technical level I would use Lebesgue integration theory, but combining convolution with the Dirac delta also shows what is going on.

If you are comfortable extending the convolution $$ (f*g)(x) := \int_{\mathbb{R}} f(x-y)g(y)\,dy $$ from $L^1$ functions to also work with the Dirac delta $\delta_a$ (see: https://en.wikipedia.org/wiki/Dirac_delta_function), then you can directly write \begin{align} \widehat{f}_n(x) &= \frac{1}{n} \sum_{i=1}^n k_h(x-x_i) = \frac{1}{n} \sum_{i=1}^n (k_h*\delta_{x_i})(x) = \frac{1}{n} \sum_{i=1}^n \int_{\mathbb{R}} k_h(x-y)\delta_{x_i}(y) \, dy\\ &= \int_{\mathbb{R}} k_h(x-y) \frac{1}{n} \sum_{i=1}^n \delta_{x_i}(y) \, dy\\ &= \left( k_h * \frac{1}{n} \sum_{i=1}^n \delta_{x_i} \right)(x) \end{align} where I used $(f*\delta_a)(x) = \int_{\mathbb{R}} f(x-y)\delta_{a}(y) \, dy = f(x-a)$ and the linearity of the integral.

What this shows is that the kernel density estimator is not the convolution of the kernel with the empirical distribution function, but rather with something resembling an empirical density.
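You can also see this numerically by mollifying the deltas: replace each $\delta_{x_i}$ with a very narrow Gaussian of width $\varepsilon$, convolve the result with $k_h$ on a grid, and compare with the kernel density estimator evaluated directly. A throwaway sketch (invented data, standard library only; the width $\varepsilon$ and the grid are arbitrary choices):

```python
import math

def gauss(u, s):
    """Gaussian pdf with mean 0 and standard deviation s."""
    return math.exp(-u * u / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

xs = [-1.2, 0.3, 0.5, 2.0]      # made-up sample
h, x, eps = 0.7, 0.9, 0.01      # bandwidth, evaluation point, delta width

def mollified_empirical_density(y):
    """(1/n) * sum_i delta_{x_i}, each delta smoothed to a N(x_i, eps^2) pdf."""
    return sum(gauss(y - xi, eps) for xi in xs) / len(xs)

# (k_h * mollified empirical density)(x) by a midpoint Riemann sum.
a, b, m = -10.0, 10.0, 40000
dt = (b - a) / m
conv = sum(gauss(x - (a + (j + 0.5) * dt), h)
           * mollified_empirical_density(a + (j + 0.5) * dt)
           for j in range(m)) * dt

# The kernel density estimator evaluated directly.
kde = sum(gauss(x - xi, h) for xi in xs) / len(xs)

print(conv, kde)  # agreement up to the small smoothing error of order eps^2
```

Exactly: convolving $N(0, h^2)$ with $N(x_i, \varepsilon^2)$ gives $N(x_i, h^2 + \varepsilon^2)$, so the discrepancy vanishes as $\varepsilon \to 0$, which is the delta limit used above.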
In fact, in much the same way you can write $\widehat{f}_n$ as a Lebesgue integral of $k_h$ with respect to the empirical distribution function (more precisely, with respect to the measure associated with it), which places mass $1/n$ at each $x_i$, i.e., \begin{align} \widehat{f}_n(x) = \int_{\mathbb{R}} k_h(x-y) \, d\widehat{F}_n(y), \end{align} where $\widehat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{(-\infty,x]}(x_i)$ is the empirical distribution function.
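Since $\widehat{F}_n$ is a pure jump function, a Riemann–Stieltjes sum for $\int k_h(x-y) \, d\widehat{F}_n(y)$ picks up mass only in the grid cells containing a jump, each contributing $1/n$, and so collapses to the kernel density estimator. A quick sketch (invented sample, standard library only):

```python
import math

def k_h(u, h):
    """Gaussian kernel: pdf of N(0, h^2)."""
    return math.exp(-u * u / (2 * h * h)) / (h * math.sqrt(2 * math.pi))

xs = [-1.2, 0.3, 0.5, 2.0]   # made-up sample
h, x = 0.7, 0.9              # bandwidth and evaluation point

def F_n(t):
    """Empirical distribution function."""
    return sum(xi <= t for xi in xs) / len(xs)

# Riemann-Stieltjes sum: F_n(t_{j+1}) - F_n(t_j) is zero except in the cells
# where F_n jumps (those containing some x_i), where it equals 1/n.
a, b, m = -10.0, 10.0, 200000
dt = (b - a) / m
stieltjes = sum(k_h(x - (a + j * dt), h)
                * (F_n(a + (j + 1) * dt) - F_n(a + j * dt))
                for j in range(m))

# The kernel density estimator evaluated directly.
kde = sum(k_h(x - xi, h) for xi in xs) / len(xs)

print(stieltjes, kde)  # the Stieltjes sum reproduces the KDE
```

The only discretization error is evaluating $k_h$ at a cell endpoint rather than exactly at $x_i$, which is of order $\Delta t$ and disappears as the grid is refined.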