I read that a Kalman filter (KF) can be used for continuous / online linear regression, and that by the end of the data its estimates should match those of ordinary least squares (OLS) regression. I tried it on sample time series data, using the model below for the KF (based on this document):
$$ \left[\begin{array}{r} \alpha_t \\ \beta_t \end{array}\right] = \left[\begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array}\right] \left[\begin{array}{r} \alpha_{t-1} \\ \beta_{t-1} \end{array}\right] + \left[\begin{array}{r} w_{1,t} \\ w_{2,t} \end{array}\right] $$
$$ z_t = \left[\begin{array}{rr} 1 & x_t \end{array}\right] \left[\begin{array}{r} \alpha_t \\ \beta_t \end{array}\right] + v_t $$
where $w_t$ is the process noise (covariance $V_w$) and $v_t$ is the measurement noise (variance $V_e$), i.e. the regression coefficients follow a random walk.
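To make the predict/update mechanics behind these equations concrete, here is a single Kalman step for the two-coefficient state in plain numpy (all numbers here are illustrative, not from my data):

```python
import numpy as np

# One predict/update cycle of the Kalman filter for the regression state
# [alpha_t, beta_t]. All values are illustrative, not from my data.
F = np.eye(2)                  # identity transition: coefficients follow a random walk
Vw = 1e-4 * np.eye(2)          # process-noise covariance
Ve = 1e-3                      # measurement-noise variance

state = np.array([0.0, 0.0])   # prior mean of [alpha, beta]
P = np.eye(2)                  # prior covariance

x_t, z_t = 2.0, 5.0            # one regressor value and one observation
H = np.array([1.0, x_t])       # measurement row vector [1, x_t]

# predict
state = F @ state
R = F @ P @ F.T + Vw           # predicted covariance

# update
yhat = H @ state               # predicted observation
Q = H @ R @ H + Ve             # innovation variance (a scalar)
K = R @ H / Q                  # Kalman gain (a 2-vector)
state = state + K * (z_t - yhat)
P = R - np.outer(K, H) @ R     # posterior covariance

print(state)                   # posterior mean now nearly fits the observation
```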
The code I have is
import numpy as np

def kalman_filter(x, y):
    delta = 0.0001                       # sets the process-noise covariance Vw
    Ve = 0.001                           # measurement-noise variance
    yhat = np.ones(len(y)) * np.nan      # predicted observations
    e = np.ones(len(y)) * np.nan         # innovations (prediction errors)
    Q = np.ones(len(y)) * np.nan         # innovation variances
    R = np.zeros((2, 2))                 # predicted (prior) state covariance
    P = np.zeros((2, 2))                 # updated (posterior) state covariance
    beta = np.matrix(np.zeros((2, len(y))) * np.nan)  # state estimates over time
    Vw = delta / (1 - delta) * np.eye(2) # process-noise covariance
    beta[:, 0] = 0.
    for t in range(len(y)):
        if t > 0:
            beta[:, t] = beta[:, t - 1]  # predict step: identity transition
            R = P + Vw                   # predicted state covariance
        xt = np.matrix(x[t, :])          # measurement row vector
        yhat[t] = np.dot(xt, beta[:, t])          # predicted observation
        Q[t] = np.dot(np.dot(xt, R), xt.T) + Ve   # innovation variance
        e[t] = y[t] - yhat[t]                     # innovation
        K = np.dot(R, xt.T) / Q[t]                # Kalman gain
        beta[:, t] = beta[:, t] + np.dot(K, np.matrix(e[t]))  # update state
        P = R - np.dot(np.dot(K, xt), R)          # update covariance
    return beta
Using the sample data here, I ran it with
import pandas as pd
df = pd.read_csv('mcd.csv',index_col=0)
y = df.MCD
x = np.ones((len(df),2))
x[:,0] = np.array(range(len(df.MCD)))
beta = kalman_filter(x,y)
The last values in the beta vector are 0.114 for the slope and 7.538 for the intercept. However, when I run OLS on the same data
import statsmodels.api as sm
f = sm.OLS(y,x).fit()
print(f.params)
I get
x1 0.049484
const 40.502940
This is completely different from the KF results. I wondered if the problem is the x vector: besides the intercept column, it contains the made-up time index 0, 1, ..., len(df)-1, but batch OLS handles this fine.
Then I thought my KF implementation might be wrong, so I tried it with the pykalman package and got the same results.
If I play around with the x vector (spacing the values evenly over 1, ..., np.max(df.MCD) and changing the scaling a little), I can get close to the OLS results. But from the explanations of KFs I have seen, this should not be necessary. So I am confused about why it is said that the KF and OLS eventually give the same result.
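For what it's worth, I did convince myself numerically that the equivalence holds in one special case: with zero process noise (Vw = 0) the Kalman recursion is exactly recursive least squares, and with a diffuse prior it reproduces batch OLS. A minimal sketch on synthetic data (names and values are made up):

```python
import numpy as np

# With zero process noise (Vw = 0) the Kalman recursion is exactly recursive
# least squares, so with a diffuse prior it reproduces batch OLS.
# Synthetic data; names and values are made up for illustration.
rng = np.random.default_rng(0)
n = 500
x = np.column_stack([np.arange(n, dtype=float), np.ones(n)])  # [t, 1], as in my setup
y = 0.05 * x[:, 0] + 40.0 + rng.normal(0.0, 1.0, n)

beta = np.zeros(2)
P = 1e6 * np.eye(2)            # diffuse prior so the initial guess barely matters
Ve = 1.0                       # measurement-noise variance

for t in range(n):
    H = x[t]                   # predict step is trivial: identity transition, Vw = 0
    Q = H @ P @ H + Ve         # innovation variance
    K = P @ H / Q              # Kalman gain
    beta = beta + K * (y[t] - H @ beta)
    P = P - np.outer(K, H) @ P

ols, *_ = np.linalg.lstsq(x, y, rcond=None)
print(beta, ols)               # the two estimates should agree to several decimals
```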
Thanks,
Note: I contacted the author of the original version of the code above; he mentioned that the "KF will reach a similar result as exponentially-weighted least squares, not ordinary least squares", so the latest data is favored over earlier data. More details here.
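That explanation matches a small experiment I ran: with nonzero process noise the filter down-weights old data, like exponentially weighted least squares, so it can track a coefficient that batch OLS only averages over. A sketch with made-up data (I use a random regressor rather than a time index so both coefficients are well identified at every step; all names and values are mine):

```python
import numpy as np

# With process noise the filter forgets old data (exponentially weighted
# behavior): the true slope switches halfway through, the filter tracks the
# new slope, while batch OLS averages the two regimes.
# Synthetic data; all names and values are made up for illustration.
rng = np.random.default_rng(1)
n = 600
x = rng.normal(0.0, 1.0, n)                # random regressor (not a time index)
slope = np.where(np.arange(n) < 300, 1.0, 2.0)
y = slope * x + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])       # rows [1, x_t]

delta = 0.0001
Vw = delta / (1 - delta) * np.eye(2)       # process noise, same delta as my code above
Ve = 1.0
beta = np.zeros(2)
P = 1e3 * np.eye(2)

for i in range(n):
    H = X[i]
    R = P + Vw                             # predict (identity transition)
    Q = H @ R @ H + Ve                     # innovation variance
    K = R @ H / Q                          # Kalman gain
    beta = beta + K * (y[i] - H @ beta)    # update state
    P = R - np.outer(K, H) @ R             # update covariance

ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("KF slope:", beta[1], "OLS slope:", ols[1])
```

So the filter's final slope sits near the most recent regime, while OLS lands between the two, which is consistent with the exponentially weighted interpretation.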