Kernel density estimation - effect of bandwidth


I am trying to learn kernel density estimation, and I need help understanding how the bandwidth $h$ affects the kernel density estimator. Consider the Gaussian kernel $K(x) = \frac{1}{\sqrt{2 \pi}} e^{-x^2/2}$. The kernel density estimator is given by $\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i)$.

Clearly, $K(x)$ is independent of $h$, so where does $h$ come in? What would $\hat{f}_h(x)$ be? How does $h$ affect the kernel?

Thank you!


There are 3 answers below.

Answer 1

Density estimators are of the form:

$$ \hat{f}(x_0) = \frac{1}{nh} \sum_{i=1}^n K \left ( \frac{x_i - x_0}{h} \right ) $$

For any choice of kernel, the bandwidth $h$ is a smoothing parameter, and controls how smooth the fit is by controlling the size of the neighbourhood around the reference, $x_0$.

If $h$ is large, we consider a large neighbourhood, and vice versa.

In the Gaussian kernel case, varying $h$ has the same effect as varying the standard deviation of a Gaussian: small $h$ gives a thinner, more peaked kernel, whereas larger $h$ gives a fatter one, approaching a flat line in the extreme.
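A minimal sketch of this in Python (the sample and the evaluation points are hypothetical; the kernel and estimator are the standard ones from the formula above) shows how a small bandwidth produces sharp peaks at the data while a large one flattens the estimate:

```python
import math

def gaussian_kernel(u):
    # Standard Gaussian kernel: K(u) = exp(-u^2 / 2) / sqrt(2*pi)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x0, data, h):
    # f_hat(x0) = 1/(n*h) * sum_i K((x_i - x0) / h)
    n = len(data)
    return sum(gaussian_kernel((xi - x0) / h) for xi in data) / (n * h)

# Hypothetical sample: two tight clusters
data = [1.0, 1.2, 1.4, 4.8, 5.0, 5.2]

for h in (0.2, 2.0):
    print(f"h={h}: f(1.2)={kde(1.2, data, h):.3f}, f(3.0)={kde(3.0, data, h):.3f}")
# Small h: a sharp peak at the data, near zero between the clusters.
# Large h: a much flatter estimate everywhere.
```

With $h = 0.2$ the estimate is large at a data point (1.2) and essentially zero midway between the clusters (3.0); with $h = 2.0$ the two values are close to each other, i.e. the estimate is nearly flat.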

Answer 2

Most simply and intuitively: For a given sample size and population structure, a small bandwidth will do less smoothing than a large one.

Each panel in the plot below shows the histogram of a sample of size $n = 100$ from $\mathsf{Norm}(\mu=100,\,\sigma=15).$ Beneath the histogram, tick marks show the exact locations of the observations. The density estimates (green curves) use the default Gaussian kernel; from left to right, the bandwidths are the default multiplied by 0.5, 1, and 2, respectively.

[Figure: three histograms of the $n = 100$ sample with green density curves, bandwidths 0.5x, 1x, and 2x the default]

A similar figure, but with $n = 500.$

[Figure: the same three panels for a sample of $n = 500$]

You can read the particulars in the R help page for 'density' and its references. I especially recommend the book by Bernard Silverman, Density Estimation for Statistics and Data Analysis.


R code for the second figure is given below, in case you want it. Each run gives a different sample. With sample sizes as small as 100 and 500, results vary considerably from one run to another. However, the general principle that a larger bandwidth gives a smoother density estimator is evident in almost all runs.

par(mfrow=c(1,3))            # three panels side by side
x = rnorm(500, 100, 15)      # sample of n = 500 from Norm(100, 15)
hist(x, prob=TRUE, col="skyblue2", main="Small Bandwidth"); rug(x)
  lines(density(x, adjust=.5), lwd=2, col="darkgreen")   # half the default bandwidth
hist(x, prob=TRUE, col="skyblue2", main="Default Bandwidth"); rug(x)
  lines(density(x), lwd=2, col="darkgreen")              # default bandwidth
hist(x, prob=TRUE, col="skyblue2", main="Large Bandwidth"); rug(x)
  lines(density(x, adjust=2), lwd=2, col="darkgreen")    # twice the default bandwidth
par(mfrow=c(1,1))            # restore single-panel layout
Answer 3

One data-driven choice: the optimal bandwidth should maximize the leave-one-out pseudo-likelihood $\mathcal{L}(h)=\prod_{j=1}^{n}\hat{f}_{-j}(x_j \mid h)$, where $\hat{f}_{-j}(x_j \mid h)$ is the density estimate at $x_j$ with the $i = j$ term omitted from the sum. You then solve $\frac{\partial \mathcal{L}}{\partial h}=0$ at a point where $\frac{\partial^2 \mathcal{L}}{\partial h^2}<0$ (N.B. it is often more tractable to maximize the log-likelihood).
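A minimal sketch of that criterion in Python, assuming a Gaussian kernel and evaluating the leave-one-out log pseudo-likelihood on a small grid of candidate bandwidths (the sample and the grid are hypothetical; a real implementation would optimize over $h$ rather than use a coarse grid):

```python
import math

def gaussian_kernel(u):
    # Standard Gaussian kernel: K(u) = exp(-u^2 / 2) / sqrt(2*pi)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def loo_log_likelihood(data, h):
    # sum_j log f_{-j}(x_j | h), where the i = j term is left out of each sum
    n = len(data)
    total = 0.0
    for j, xj in enumerate(data):
        s = sum(gaussian_kernel((xi - xj) / h)
                for i, xi in enumerate(data) if i != j)
        total += math.log(s / ((n - 1) * h))
    return total

# Hypothetical sample: two tight clusters
data = [1.0, 1.2, 1.4, 4.8, 5.0, 5.2]
grid = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
best_h = max(grid, key=lambda h: loo_log_likelihood(data, h))
print("best h on grid:", best_h)
```

Very small $h$ is penalized because each left-out point sits in a near-zero region of the estimate built from the others, and very large $h$ is penalized because the estimate is spread too thin; the criterion picks something in between.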