I am trying to learn kernel density estimation, and I need help understanding how the bandwidth $h$ affects the kernel density estimator. Consider the Gaussian kernel $K(x) = \frac{1}{\sqrt{2 \pi}} e^{-x^2/2}$. The kernel density estimator is given by ${\hat{f}}_h (x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x-X_i)$.
Clearly, $K(x)$ does not depend on $h$, so where does $h$ come in? What would ${\hat{f}}_h (x)$ be? How does $h$ affect the kernel?
Thank you!


A kernel density estimator, evaluated at a point $x_0$ from a sample $x_1, \dots, x_n$, has the form:
$$ \hat{f}(x_0) = \frac{1}{nh} \sum_{i=1}^n K \left ( \frac{x_i - x_0}{h} \right ) $$
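To connect this with the $K_h$ notation in the question, one common convention (an assumption here, since the question does not define $K_h$) is the scaled kernel $K_h(u) = \frac{1}{h} K\left(\frac{u}{h}\right)$. For the standard Gaussian kernel $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$, this would give:
$$ K_h(u) = \frac{1}{h\sqrt{2\pi}} \exp\left( -\frac{u^2}{2h^2} \right), \qquad \hat{f}(x_0) = \frac{1}{n} \sum_{i=1}^n K_h(x_i - x_0) = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^n \exp\left( -\frac{(x_i - x_0)^2}{2h^2} \right) $$
So $K$ itself does not depend on $h$; the bandwidth enters by rescaling the kernel's argument, and the factor $\frac{1}{h}$ keeps the area under each scaled kernel equal to 1.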
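For any choice of kernel, the bandwidth $h$ is a smoothing parameter: it controls how smooth the fit is by controlling the size of the neighbourhood around the reference point $x_0$.
If $h$ is large, points far from $x_0$ still contribute to $\hat{f}(x_0)$ and the estimate is smooth; if $h$ is small, only points close to $x_0$ contribute and the estimate is more wiggly.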
In the Gaussian kernel case, $h$ plays the role of the standard deviation of the Gaussian centred at each data point, so varying $h$ has the same effect as varying the spread of that Gaussian. Small $h$ leads to a thinner, more peaked Gaussian, whereas larger $h$ leads to a fatter one that, in the extreme case, gets closer and closer to a flat line.
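
The effect of $h$ can also be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the function names `gaussian_kernel` and `kde` and the sample `data` are made up for illustration, and the kernel is assumed to be the standard Gaussian.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u); it integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x0, data, h):
    """Kernel density estimate at x0: (1 / (n h)) * sum_i K((x_i - x0) / h)."""
    n = len(data)
    return sum(gaussian_kernel((xi - x0) / h) for xi in data) / (n * h)

# Hypothetical sample with two clusters, one near 0 and one near 5.
data = [-0.3, 0.0, 0.2, 4.8, 5.0, 5.1]

for h in (0.2, 1.0, 5.0):
    at_cluster = kde(0.0, data, h)  # a point inside the left cluster
    in_gap = kde(2.5, data, h)      # a point between the two clusters
    print(f"h={h}: f(0.0)={at_cluster:.4f}  f(2.5)={in_gap:.4f}")
```

With the small bandwidth the estimate is high inside the cluster and essentially zero in the gap, so the two clusters show up as separate sharp peaks. With the large bandwidth the two values are of similar size, because each Gaussian is so wide that the clusters merge into one broad, nearly flat bump.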