Implementation of the LOWESS-algorithm (local regression data smoothing)

1.5k Views Asked by At

I need to implement the LOWESS-algorithm in a piece of software I am working on.

The LOWESS-algorithm is a type of filter, which applies a locally weighted regression on each data point. In this case, the input data are equispaced and a first degree polynomial (line) is to be fitted for each point. More general information can be found at Wikipedia (Local Regression).

I was unable to find an implementation on the web, so based on the sparse information I could find about the algorithm, I fired up MATLAB and made my own implementation. I have compared my version with the built-in MATLAB version (part of the Curve Fitting Toolbox-function "smooth") and it is clear, that my version is not doing exactly what it should.

Plot of the MATLAB-version of LOWESS (blue) and my implementation (red) for various bandwidths.

My implementation is in the file "lowess4.m" (the number 4 because it is my fourth try of implementing the algorithm). I have made a small test script that compares my version with the built-in version of the algorithm for four different bandwidths. As one can see, the first plot is an overfit -- and for some reason, only the built-in result is plotted (my version returns NaN-values in this case). In the three remaining plots, one can see that my version matches the built-in except for the outer values.

I think that further weighting is applied in the MATLAB-version, so that the outer values does not have that much influence on the result. I am not sure on that.

I would like some help to fix my implementation. Thanks in advance :-)

My test script "lowess4_test.m" is listed below:

% Make some test data (18 equispaced y-values)
x1 = 1:18;
y1 = [ 1 3 5 3 -3 9 -4 -2 -1 7 9 8 5 3 -2 0 3 5 ];

% Four test plots, comparing my algorithm (red)
% with the MATLAB built-in version (blue)
figure()
for i=1:4
    bandwidth = [1 4 8 12];
    sy1 = lowess4(y1, bandwidth(i));
    ty1 = smooth(y1,bandwidth(i),'lowess');
    subplot(2,2,i)
    plot(x1,y1,'b.',x1,ty1,'b-',x1,sy1,'r-')
    str = strcat('Bandwidth =', {' '}, num2str(bandwidth(i)));
    title(str);
end

My implementation of the LOWESS-algorithim "lowess4.m" is listed below:

% LOWESS-smoothing of equispaced data
% Input: List of y-values to be smoothed ("noisy") and bandwidth
% Output: List of smoothed y-values ("smoothy)
function [smoothy] = lowess4(noisy, bandwidth)
    n = length(noisy);          % Number of y-values
    smoothy = zeros([1 n]);     % Empty array of length n to store result

    % For all points in "noisy", do a local linear weighted regression
    for i=1:n
        % Initialize temporary variables
        A = 0;
        B = 0;
        C = 0;
        D = 0;
        E = 0;

        % Calculate span of values to be included in the regression
        jmin = floor(i-bandwidth/2);
        jmax = ceil(i+bandwidth/2);
        if jmin < 1
            jmin = 1;
        end
        if jmax > n
           jmax = n; 
        end

        % For all the values in the span, compute the weight and then
        % the linear fit
        for j=jmin:jmax
            w = weight(i,j, bandwidth/2);
            x = j;
            y = noisy(j);

            A = A + w*x;
            B = B + w*y;
            C = C + w*x^2;
            D = D + w*x*y;
            E = E + w;
        end

        % Calculate a (slope) and b (offset) for the linear fit
        a = (A*B-D*E)/(A^2 - C*E);
        b = (A*D-B*C)/(A^2 - C*E);

        % Calculate the smoothed value by the formula y=a*x+b (x <- i)
        smoothy(i) = a*i+b;
    end

    return
end

% Define the tricube weight function
function w = weight(i,j,d)
    w = ( 1-abs((j-i)/d)^3)^3;
    if w < 0
        w = 0;
    end
    return
end