As part of my master's thesis I am 'Examining the Reliability of Markov Chains and the Kalman Filter as Stock Market Forecasters'. I will use the daily returns of the S&P 500 over a five-year period as a benchmark and compare each model's forecasts against it to establish which is more accurate. However, I am looking for a formal test that will allow me to establish which forecast series is more similar to the S&P 500.
I considered a simple correlation between the data sets, but I would prefer something more conclusive.
Any ideas?
I really would appreciate any help.
Thank You
I would keep this one simple, as follows: Models $A$ and $B$ each produce predictions $X_A^{(i)}$ and $X_B^{(i)}$ for each of $N$ historical data situations, where the actual results are $X^{(i)}$. I will discuss below how to choose these data situations given a time-sequence of states.
Then it is easy to form squared-error ensembles $\Delta_A^{(i)} = (X_A^{(i)}-X^{(i)})^2$ for $i < N$, and similarly $\Delta_B^{(i)}$. Now if those values were statistically uncorrelated, you could estimate (separately for ensemble $A$ and ensemble $B$) the mean and variance in the usual way, where the mean $\mu_A$ is the average of the $\Delta_A^{(i)}$, and the variance is $$ \hat{\sigma}_A^2 = \frac{1}{N-1} \left( \sum_i (\Delta_A^{(i)})^2 - N\mu_A^2 \right). $$ Now you have two (assumedly normal) distributions with means separated by $\mu_A - \mu_B$ and variances $\hat{\sigma}_A^2, \hat{\sigma}_B^2$. The difference of the two is a normally distributed random variable $\delta$ with mean $\mu_A - \mu_B$ and standard deviation $\hat{\sigma} = \sqrt{\hat{\sigma}_A^2 +\hat{\sigma}_B^2}$. You assume the standard deviation of $\delta$ is really your estimated value $\hat{\sigma}$, and apply the two-sided $z$-test of the hypothesis that the distribution mean is actually zero. For example, if your significance criterion is a 95% confidence level, the effect seen is deemed significant if $|\mu_A - \mu_B| > 1.96 \hat{\sigma}$.
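A minimal sketch of this recipe in Python (the function name and the 1.96 cutoff for ~95% confidence are illustrative; it assumes the per-point squared errors are statistically uncorrelated, which is what the thinning step below is for):

```python
import numpy as np

def compare_models(pred_a, pred_b, actual, z_crit=1.96):
    """z-test on the difference of mean squared errors of two models.

    Follows the recipe above: form the squared-error ensembles,
    estimate their means and sample variances, and test whether the
    mean difference is significantly non-zero.
    Returns the z statistic and whether |z| exceeds z_crit.
    """
    actual = np.asarray(actual, dtype=float)
    d_a = (np.asarray(pred_a, dtype=float) - actual) ** 2
    d_b = (np.asarray(pred_b, dtype=float) - actual) ** 2
    mu_a, mu_b = d_a.mean(), d_b.mean()
    # sigma-hat = sqrt(sigma_A^2 + sigma_B^2), using sample variances
    sigma = np.sqrt(d_a.var(ddof=1) + d_b.var(ddof=1))
    z = (mu_a - mu_b) / sigma
    return z, abs(z) > z_crit
```

A negative $z$ favours model $A$ (smaller mean squared error), a positive $z$ favours model $B$.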
You could instead compare Pearson's coefficient of determination for the two models; the trouble with that is that you still need to figure out how big a difference is needed to consider it significant.
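For reference, the coefficient of determination here is just the squared Pearson correlation between forecasts and realized values; a one-liner sketch (function name is illustrative):

```python
import numpy as np

def coefficient_of_determination(pred, actual):
    """Squared Pearson correlation between forecasts and realized values."""
    r = np.corrcoef(pred, actual)[0, 1]
    return r ** 2
```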
The remaining issue is how to form the ensemble of historical data situations such that the accuracy of method $A$ at point $i$ is not correlated with the accuracy at point $i+1$ (and similarly for method $B$). The simple technique is to first modify method $A$ by subtracting off its mean prediction error (if that happens to be non-zero; a decent model will already have done that). Then consider the time sequence $\Delta_A^{(t)}$ and create, for $k=1$, an ensemble of products $C_{m,k} = \Delta_A^{(mk)}\,\Delta_A^{(mk+k)}$ for all $m$ such that $mk+k$ is less than the total time available. The mean value of this ensemble is related to the mean squared $\Delta_A^{(i)}$; their ratio is closely related to the time correlation coefficient for $k$ steps. This will likely be close to $1$ for $k=1$. Try again for $k=2$, $k=4$ and so forth until you come to a step size that reduces the correlation to (say) 10% for both methods $A$ and $B$, and use that step size to form your ensembles for doing the $z$-test analysis.
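A sketch of this thinning procedure (names and the 10% threshold are illustrative; I demean the squared-error series before taking the lag products, so the ratio is the usual autocorrelation coefficient):

```python
import numpy as np

def lag_correlation(delta, k):
    """Ratio of the mean lag-k product to the mean square of the
    demeaned squared-error series, i.e. the lag-k autocorrelation."""
    d = np.asarray(delta, dtype=float)
    d = d - d.mean()
    return np.mean(d[:-k] * d[k:]) / np.mean(d ** 2)

def decorrelation_step(delta, threshold=0.10):
    """Double the step size k (1, 2, 4, ...) until the lag-k
    correlation drops below the threshold."""
    k = 1
    while 2 * k < len(delta) and abs(lag_correlation(delta, k)) > threshold:
        k *= 2
    return k
```

Run this on both models' $\Delta$ series, take the larger of the two step sizes, and subsample every $k$-th squared error to form the ensembles for the $z$-test.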