Problem implementing leave-one-out cross-validation for optimal bandwidth in MATLAB

I'm trying to solve an exercise in which I need to compute the local constant kernel estimator and choose its bandwidth by leave-one-out cross-validation. I need to implement this myself in MATLAB rather than use a built-in function (I haven't found one anyway). The steps are:

  1. First I need to calculate the leave-one-out estimator: $$\hat{g}_{-i}(x_i) = \frac{\sum_{j \neq i} y_j \, k\left(\frac{x_i - x_j}{h}\right)}{\sum_{j \neq i} k\left(\frac{x_i - x_j}{h}\right)}$$
  2. Then the second step is to calculate the residuals: $$e_{i,-i} = y_i - \hat{g}_{-i}(x_i)$$
  3. And finally choose h that minimizes the sum of squared residuals: $$h_0 = \underset{h}{\arg\min}\sum_{i=1}^{n} e_{i,-i}^{2}$$
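For reference, the three steps can be written without explicit loops. This is only a minimal sketch under assumptions that are not in the exercise: each x_i is a scalar, the kernel is Gaussian, and the toy data (`x`, `y`, the value of `h`) are made up for illustration.

```matlab
% Hypothetical toy data: n scalar design points and "noisy" responses.
n = 50;
x = linspace(0, 1, n).';                      % n-by-1 column of x_i
y = sin(2*pi*x) + 0.3*sin(50*(1:n).'.^2);     % deterministic pseudo-noise
h = 0.1;                                      % one candidate bandwidth

% Step 1: kernel weights k((x_i - x_j)/h) for every pair (i, j).
% x - x.' relies on implicit expansion (R2016b or later);
% on older releases use bsxfun(@minus, x, x.') instead.
U = (x - x.') / h;                            % n-by-n, U(i,j) = (x_i - x_j)/h
K = exp(-U.^2 / 2) / sqrt(2*pi);              % Gaussian kernel
K(1:n+1:end) = 0;                             % zero the diagonal: drops the j = i terms
ghat = (K * y) ./ sum(K, 2);                  % leave-one-out fits, n-by-1

% Step 2: leave-one-out residuals.
e = y - ghat;

% Step 3: the objective for this h, i.e. the sum of squared residuals.
ss = sum(e.^2);
```

Zeroing the diagonal of the weight matrix is what implements "sum over all j except i": row i of `K` then holds the weights of every observation but the i-th.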

The way I thought of implementing this in MATLAB was to first write a function that computes the sum of squared residuals (steps 1 and 2) for a single h, and then pass it to the fminsearch function to find the minimizing h.
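A rough sketch of that plan follows, again assuming scalar x_i, a Gaussian kernel, and made-up toy data. The names `kern`, `W`, `loo`, `cv`, and `h0` are hypothetical. Because fminsearch is unconstrained, the search here runs over t = log(h), so that h = exp(t) stays positive; that reparameterization is a design choice of this sketch, not part of the exercise.

```matlab
% Hypothetical toy data (not from the exercise).
n = 50;
x = linspace(0, 1, n).';
y = sin(2*pi*x) + 0.3*sin(50*(1:n).'.^2);     % deterministic pseudo-noise

offdiag = ~eye(n);                            % logical mask that excludes j = i
kern = @(u) exp(-u.^2 / 2) / sqrt(2*pi);      % Gaussian kernel
W    = @(h) kern((x - x.') / h) .* offdiag;   % leave-one-out weight matrix
loo  = @(K) (K * y) ./ sum(K, 2);             % leave-one-out fits from weights
cv   = @(h) sum((y - loo(W(h))).^2);          % sum of squared LOO residuals

% Minimize over t = log(h); the starting bandwidth h0 is a guess.
h0    = 0.1;
t_opt = fminsearch(@(t) cv(exp(t)), log(h0));
h_opt = exp(t_opt);
```

fminsearch only finds a local minimum, so it may be worth plotting `cv` over a range of bandwidths to check that the starting value is sensible.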

My problem is that I am stuck writing this cross-validation function. In particular, looping and summing over all values except i gives me trouble.

This is my very incomplete code for the function:

    function SSresiduals = CV(h, N, data, grid, y)
    % sum(ghat_{i,-i} - y_i)^2
    h_mone = zeros(N,1);
    for i = 1:N
        for j = 1:N
            if j ~= i
                u = (grid(:,i) - data(:,j)) / h;
                k = (1/sqrt(2*pi)) .* exp(-u.^2/2);
                h_mone = h_mone + k;
            end
        end
    end

Here N is a scalar equal to 205, data and grid are 801x205 matrices, and y is a 205x1 vector. The code is incomplete because I got stuck at the part where I need to loop over the values and sum them up; the dimensions of the result just didn't seem right to me. I would really appreciate any insights. Thanks!
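One way the double loop might be completed is sketched below, under the assumption that each x_i is a scalar; it does not address how the 801 rows of `data` and `grid` should enter the kernel, since the question does not say. The numerator and denominator are scalar accumulators that are reset for every i, which sidesteps the size mismatch between an 801x1 `k` and a 205x1 accumulator. The toy data and the names `num`, `den`, and `ss` are hypothetical.

```matlab
% Hypothetical toy data (not from the exercise).
n = 50;
x = linspace(0, 1, n).';
y = sin(2*pi*x) + 0.3*sin(50*(1:n).'.^2);     % deterministic pseudo-noise
h = 0.1;                                      % one candidate bandwidth

ss = 0;                                       % running sum of squared residuals
for i = 1:n
    num = 0;                                  % numerator of ghat_{-i}(x_i)
    den = 0;                                  % denominator of ghat_{-i}(x_i)
    for j = 1:n
        if j ~= i                             % leave observation i out
            u = (x(i) - x(j)) / h;
            k = exp(-u^2 / 2) / sqrt(2*pi);   % Gaussian kernel weight
            num = num + y(j) * k;
            den = den + k;
        end
    end
    ss = ss + (y(i) - num / den)^2;           % squared leave-one-out residual
end
```

Wrapped in a function of h, the final value of `ss` is the quantity a minimizer would evaluate for each candidate bandwidth.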