Bandwidth in Gaussian kernel smoothing

150 Views Asked by At

I am trying to implement a Gaussian kernel smoother. The equation is

$$\begin{split}f(x)&=\sum_{i=1}^N w_i(x)y_i\\ w_i(x)&=\frac{\kappa_h(x-x_i)}{\sum_{i'=1}^N \kappa_h(x-x_{i'})}\end{split}$$

Additionally

$$\begin{split}h&=\left(\frac4{3N}\right)^{1/5}\hat\sigma\\ MAD&=median(|x-median(x)|)\\ \hat\sigma &= 1.4826 MAD\end{split}$$

Lastly the kernel is the Gaussian kernel with bandwidth

$$\begin{split}\kappa(x)&=\frac 1{(2\pi)^{\frac12}}e^{-x^2/2}\\ \kappa_h(x)&=\frac 1h\kappa(\frac xh)\end{split}$$

I produced the following incorrect plot. The reason is due to the incorrect calculation of the bandwidth $h$, which ends up being 368 using the formula involving $MAD$, $\hat \sigma$, and $N$.

enter image description here

However, using a bandwidth of 10, the kernel smoother captures the trend.

enter image description here

What is incorrect about my calculation of the bandwidth $h$, and how to fix it?

R code:

y <- c(((-100:100)^2)+rnorm(201,0,1000), 10000-(1:100)^2+rnorm(100, 0, 1000))
plot(y, pch=18)

#h <- 10 #a working h

#mad <- median(abs(1:301-median(1:301)))
#h <- (4/3*length(y))^.2*1.4826*mad #an incorrect calculation of h using the formula
  
k <- function(x) {
  return(1/h*1/(2*pi)^.5*exp(-(x/h)^2/2))
}

f <- function(x) {
  sum=0
  denom <- sum(k(x-1:301))
  for (i in 1:301) {
    sum=sum+k(x-i)/denom*y[i]
  }
  return(sum)
}

f <- Vectorize(f)

fx <- f(1:301)
lines(fx, col="red", lwd=2, lty=2)