I'm trying to solve an exercise in which I need to compute the local constant kernel estimator and choose the bandwidth by leave-one-out cross-validation. I need to implement this myself in Matlab rather than use a built-in function (which I haven't found anyway). The steps are as follows:
- First I need to calculate the leave-one-out estimator: $$\hat{g}_{-i}(x_i) = \frac{\sum_{j \neq i} y_{j}\, k\left(\frac{x_i - x_j}{h}\right)}{\sum_{j \neq i} k\left(\frac{x_i - x_j}{h}\right)}$$
- Then the second step is to calculate the residuals: $$e_{i,-i} = y_i - \hat{g}_{-i}(x_i)$$
- And finally choose the h that minimizes the sum of squared residuals: $$h_0 = \underset{h}{\arg\min}\sum_{i=1}^{n}e_{i,-i}^{2}$$
My plan for implementing this in Matlab was first to write a function that computes the sum of squared residuals for a single h, and then to pass that function to fminsearch to find the minimizing h.
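To make that plan concrete, here is a minimal sketch of how the criterion and the call to fminsearch might fit together. It rests on several assumptions that are not part of the exercise: a scalar regressor stored in an n-by-1 vector `x`, a Gaussian kernel, synthetic data generated only for illustration, and a rule-of-thumb starting value. The names `loo_weights`, `cv`, `t0` and `h_opt` are hypothetical. The search runs over log(h) so that the bandwidth stays positive.

```matlab
% Synthetic data, purely illustrative (NOT the data from the exercise).
rng(0);
n = 100;
x = sort(rand(n, 1));
y = sin(2*pi*x) + 0.3*randn(n, 1);

% Pairwise differences x_i - x_j; bsxfun also works on older Matlab releases.
D = bsxfun(@minus, x, x');
offdiag = ~eye(n);                         % false on the diagonal, i.e. j == i

% Gaussian kernel weights with the j == i terms set to zero.
% The 1/sqrt(2*pi) factor is omitted because it cancels in the ratio.
loo_weights = @(h) offdiag .* exp(-0.5 * (D / h).^2);

% Sum of squared leave-one-out residuals for a single bandwidth h.
cv = @(h) sum((y - (loo_weights(h) * y) ./ sum(loo_weights(h), 2)).^2);

% Minimize over t = log(h); the starting value is an assumption.
t0 = log(std(x) * n^(-1/5));
t_opt = fminsearch(@(t) cv(exp(t)), t0);
h_opt = exp(t_opt);
```

For very small h the kernel weights can underflow to zero and produce NaN, so a sensible starting value matters; comparing the result against a coarse grid of bandwidths is a reasonable sanity check.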
My problem is that I am stuck writing this cross-validation function. In particular, looping and summing over all values except i is giving me trouble.
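For reference, the "sum over all j except i" could be written with explicit loops roughly as follows. This is only a sketch under assumptions: `x` and `y` are taken to be n-by-1 vectors (a scalar regressor), the kernel is Gaussian, and the function name `loo_cv_loop` is hypothetical. The key point is that the numerator and denominator are scalars that are reset for every i, and the residual is formed once the inner loop over j has finished.

```matlab
function ss = loo_cv_loop(h, x, y)
% Sum of squared leave-one-out residuals for bandwidth h (Gaussian kernel).
% Assumes x and y are n-by-1 vectors; this is a hypothetical helper.
n = numel(y);
ss = 0;
for i = 1:n
    num = 0;   % sum over j ~= i of y_j * k((x_i - x_j)/h)
    den = 0;   % sum over j ~= i of k((x_i - x_j)/h)
    for j = 1:n
        if j ~= i
            u = (x(i) - x(j)) / h;
            k = exp(-u^2 / 2) / sqrt(2*pi);
            num = num + y(j) * k;
            den = den + k;
        end
    end
    e = y(i) - num / den;   % leave-one-out residual e_{i,-i}
    ss = ss + e^2;
end
end
```

A call such as `loo_cv_loop(0.5, x, y)` would then return a single number for one candidate bandwidth.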
This is my very incomplete code for the function:
    function SSresiduals = CV(h, N, data, grid, y)
    %sum(ghat_{i,-i} - y_i)^2
    h_mone = zeros(N,1);
    for i = 1:N
        for j = 1:N
            if j~=i
                u=(grid(:,i)-data(:,j))/h;
                k= ((1/sqrt(2*pi))).*exp(-u.^2/2);
                h_mone = h_mone + k;
            end;
        end;
    end;
Here N is a scalar equal to 205, data and grid are 801x205 matrices, and y is a 205x1 vector. The code is incomplete because I got stuck at the point where I need to loop over the values and sum them up; the dimensions of the result just didn't seem right to me. I would really appreciate any insights on this. Thanks!