Time-series regression with missing data


I have a regression as follows for time-series data (e.g. stock prices versus other variables):

$$ Y = b \cdot X + b_1 \cdot X_1 + e$$

where $X_1$ is missing on predetermined dates when it is not available, e.g. bond trading holidays on which the stock market is open.

I want to ignore $X_1$ for those days.

I read briefly about various methods of imputation, such as regression approaches, propensity scores, etc., but the examples and studies mostly deal with nicely behaved data (e.g. GDP, housing, income), whereas stock/bond/market data is far noisier. I certainly do not want to infer $X_1$; I would rather avoid it altogether.

Would it be scientifically correct to use:

$$Y = b \cdot X + e$$

for those days when $X_1$ is missing, or do I need to make an adjustment such as:

$$Y = b_2 \cdot X + e$$

where $b_2 \neq b$ ?

If so, what should the adjusted coefficient $b_2$ be?

Also, I need to force the intercept in my regression to be zero (as it cannot be traded).

Best answer

Generally, I know two ways to deal with a problem like yours.

  1. Replace the missing value with some "nominal" one. Zero works if it makes sense (it does not when typical values of $X_1$ are far from zero). Alternatively, use an average of $X_1$ over the past, say over the last week.

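  2. Introduce an indicator variable $Z(X_1)$ which is one if $X_1$ is not missing and zero otherwise. Then your regression can be rewritten in the following form: $$ Y= b_1 X Z(X_1)+b_2 X(1-Z(X_1))+b_3 X_1 Z(X_1)+e $$ This automatically estimates $b_2$ on the subset where $X_1$ is missing and $b_1, b_3$ on the subset where $X_1$ is not missing. Note that $b_1$ here is the coefficient on $X$ when $X_1$ is available, not the $b_1$ of your original equation, and that the term $X_1 Z(X_1)$ is taken to be zero on days when $X_1$ is missing.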
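The indicator approach can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the function name `fit_with_missing_indicator`, the synthetic data, and the choice to mark every tenth day as a "holiday" are all made up for the example, and missing values of $X_1$ are assumed to be encoded as NaN. The regression has no intercept column, matching the zero-intercept requirement in the question.

```python
import numpy as np


def fit_with_missing_indicator(y, x, x1):
    """Zero-intercept least squares for
    Y = b1*X*Z + b2*X*(1-Z) + b3*X1*Z + e,
    where Z = 1 if X1 is observed and 0 if it is missing (NaN)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    x1 = np.asarray(x1, dtype=float)

    z = (~np.isnan(x1)).astype(float)  # indicator Z(X1)
    # Placeholder for missing X1; it is multiplied by Z = 0, so the value is irrelevant.
    x1_filled = np.where(np.isnan(x1), 0.0, x1)

    # No column of ones: the intercept is forced to zero.
    design = np.column_stack([x * z, x * (1.0 - z), x1_filled * z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # array [b1, b2, b3]


# Synthetic data (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
x1_true = rng.normal(size=n)
y = 1.5 * x + 0.8 * x1_true + 0.1 * rng.normal(size=n)

x1_obs = x1_true.copy()
x1_obs[::10] = np.nan  # hypothetical days on which X1 is unavailable

b1, b2, b3 = fit_with_missing_indicator(y, x, x1_obs)
print("b1 (X, X1 available):", b1)
print("b2 (X, X1 missing):  ", b2)
print("b3 (X1):             ", b3)
```

Because the indicator splits the sample into two non-overlapping sets of rows, this single fit gives the same coefficients as two separate zero-intercept regressions: $Y$ on $X$ and $X_1$ for the days when $X_1$ is available, and $Y$ on $X$ alone for the days when it is missing.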