I need to implement the LOWESS-algorithm in a piece of software I am working on.
The LOWESS-algorithm is a type of filter, which applies a locally weighted regression on each data point. In this case, the input data are equispaced and a first degree polynomial (line) is to be fitted for each point. More general information can be found at Wikipedia (Local Regression).
I was unable to find an implementation on the web, so based on the sparse information I could find about the algorithm, I fired up MATLAB and made my own implementation. I have compared my version with the built-in MATLAB version (part of the Curve Fitting Toolbox-function "smooth") and it is clear, that my version is not doing exactly what it should.
Plot of the MATLAB-version of LOWESS (blue) and my implementation (red) for various bandwidths.
My implementation is in the file "lowess4.m" (the number 4 because it is my fourth try of implementing the algorithm). I have made a small test script that compares my version with the built-in version of the algorithm for four different bandwidths. As one can see, the first plot is an overfit -- and for some reason, only the built-in result is plotted (my version returns NaN-values in this case). In the three remaining plots, one can see that my version matches the built-in except for the outer values.
I think that further weighting is applied in the MATLAB-version, so that the outer values does not have that much influence on the result. I am not sure on that.
I would like some help to fix my implementation. Thanks in advance :-)
My test script "lowess4_test.m" is listed below:
% Make some test data (18 equispaced y-values)
x1 = 1:18;
y1 = [ 1 3 5 3 -3 9 -4 -2 -1 7 9 8 5 3 -2 0 3 5 ];
% Four test plots, comparing my algorithm (red)
% with the MATLAB built-in version (blue)
figure()
for i=1:4
bandwidth = [1 4 8 12];
sy1 = lowess4(y1, bandwidth(i));
ty1 = smooth(y1,bandwidth(i),'lowess');
subplot(2,2,i)
plot(x1,y1,'b.',x1,ty1,'b-',x1,sy1,'r-')
str = strcat('Bandwidth =', {' '}, num2str(bandwidth(i)));
title(str);
end
My implementation of the LOWESS-algorithim "lowess4.m" is listed below:
% LOWESS-smoothing of equispaced data
% Input: List of y-values to be smoothed ("noisy") and bandwidth
% Output: List of smoothed y-values ("smoothy)
function [smoothy] = lowess4(noisy, bandwidth)
n = length(noisy); % Number of y-values
smoothy = zeros([1 n]); % Empty array of length n to store result
% For all points in "noisy", do a local linear weighted regression
for i=1:n
% Initialize temporary variables
A = 0;
B = 0;
C = 0;
D = 0;
E = 0;
% Calculate span of values to be included in the regression
jmin = floor(i-bandwidth/2);
jmax = ceil(i+bandwidth/2);
if jmin < 1
jmin = 1;
end
if jmax > n
jmax = n;
end
% For all the values in the span, compute the weight and then
% the linear fit
for j=jmin:jmax
w = weight(i,j, bandwidth/2);
x = j;
y = noisy(j);
A = A + w*x;
B = B + w*y;
C = C + w*x^2;
D = D + w*x*y;
E = E + w;
end
% Calculate a (slope) and b (offset) for the linear fit
a = (A*B-D*E)/(A^2 - C*E);
b = (A*D-B*C)/(A^2 - C*E);
% Calculate the smoothed value by the formula y=a*x+b (x <- i)
smoothy(i) = a*i+b;
end
return
end
% Define the tricube weight function
function w = weight(i,j,d)
w = ( 1-abs((j-i)/d)^3)^3;
if w < 0
w = 0;
end
return
end