I have several questions. I have split this post into a high-level description of my goal, a detailed description of my proposed solution, and finally my actual questions. Please comment on my overall approach if you see a more efficient way of reaching my goal.
High-level: I want to predict a financial time series X, currently at a daily frequency. Specifically, I want to know whether X is more likely to go up or down tomorrow. I have access to data for X down to 1-minute ticks. X does not seem to contain any clear autoregressive component, but its squared returns do exhibit some persistence. I have therefore been able to predict the volatility of X, with limited success, using an ARMA+GARCH approach.
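One way to check for this kind of persistence is a Ljung-Box test on the returns and on the squared returns. The following is only a minimal sketch: the series `r` is simulated from an ARCH(1) process with made-up parameters as a stand-in for the returns of X, so the numbers say nothing about my actual data.

```r
# Simulated ARCH(1) returns as a placeholder for the returns of X.
# The parameters 0.1 and 0.5 are arbitrary assumptions for illustration.
set.seed(42)
n <- 2000
r <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- 0.2
r[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  sigma2[t] <- 0.1 + 0.5 * r[t - 1]^2   # conditional variance
  r[t] <- sqrt(sigma2[t]) * rnorm(1)
}

# Ljung-Box tests: autocorrelation in levels vs. in squared returns
lb_level   <- Box.test(r,   lag = 10, type = "Ljung-Box")
lb_squared <- Box.test(r^2, lag = 10, type = "Ljung-Box")
c(level = lb_level$p.value, squared = lb_squared$p.value)
```

A small p-value for the squared series alongside an unremarkable one for the levels is the pattern I am describing above.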
Detailed: There are multiple other assets Y that are very similar to X, and I believe I could use the cross-correlations between X and Y to predict X. More specifically, I want to investigate the cross-correlation between X and Y, where Y is lagged by up to Z lags. I have data for all Y at the same resolution as X.
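To be precise about what I mean by the lagged cross-correlation, here is the quantity I have in mind for a single asset y in Y. The lag index k is notation I am introducing here, and I am assuming both series are (approximately) stationary returns:

$$\rho_{Xy}(k) = \operatorname{Corr}\left(X_t,\; y_{t-k}\right), \qquad k = 1, \dots, Z$$

A large $|\rho_{Xy}(k)|$ for some k ≥ 1 would suggest that past values of y carry information about today's X.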
My analysis becomes cumbersome because the number of assets in Y is large, and I need an efficient way of determining which of them are potentially interesting. I want to build a function in R for this. I am thinking of having the function output the following, where lowercase y denotes a single financial asset contained in Y:
- ACF of X and of y
- PACF of X and of y
- Correlation matrix
- CCF of X and y
- Simple linear regressions of y ~ lag(X) and X ~ lag(y)
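As a rough illustration of the screening step, here is a minimal sketch of the kind of function I have in mind. The name `lagged_cor_screen` and its arguments are hypothetical, and it covers only the lagged cross-correlation part, computed directly with `cor()` so that the lag direction is explicit. It assumes X is a numeric vector of returns and Y is a matrix or data frame with one column per asset, with rows aligned in time with X.

```r
# Hypothetical helper: correlation between X[t] and y[t - k] for every
# asset y (column of Y) and every lag k = 1..Z.
# Returns a matrix with one row per asset and one column per lag.
lagged_cor_screen <- function(X, Y, Z) {
  Y <- as.matrix(Y)
  n <- length(X)
  stopifnot(nrow(Y) == n, Z >= 1, Z < n - 2)
  out <- matrix(NA_real_, nrow = ncol(Y), ncol = Z,
                dimnames = list(colnames(Y), paste0("lag", seq_len(Z))))
  for (k in seq_len(Z)) {
    x_now  <- X[(k + 1):n]                  # X at time t
    y_past <- Y[1:(n - k), , drop = FALSE]  # every y at time t - k
    out[, k] <- cor(x_now, y_past, use = "pairwise.complete.obs")
  }
  out
}

# Simulated example (placeholder data): X depends on y1 lagged by one step
set.seed(1)
n  <- 500
y1 <- rnorm(n)
y2 <- rnorm(n)
X  <- c(0, 0.5 * y1[-n]) + rnorm(n)
res <- lagged_cor_screen(X, cbind(y1 = y1, y2 = y2), Z = 5)
round(res, 2)
2 / sqrt(n)  # rough band for "no correlation", assuming white noise
```

The assets-by-lags matrix can then be sorted or filtered by absolute value to shortlist candidates before running the more detailed diagnostics from the list above on each one.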
Questions:
Do you agree with the approach or would you recommend another method that may provide better results?
Would you include anything else in the function?
Would you drop anything from the function?
What type of output would you recommend if I want the function to correlate X with multiple y, at lags of y up to Z, all at the same time?