In a model of annual demand for automobiles as a function of price and average annual income, the following information is given.

In a model of annual demand for automobiles as a function of price and average annual income, the following information is given. For the regression over 1927-1941, RSS$_1 = 0.1151$, and for the regression over 1948-1962, RSS$_2 = 0.0544$. For a regression on the combined years, RSS$_3 = 0.2866$. Is it reasonable to assume that pooling the entire dataset is a better option than stratifying it into two groups? How can I test this? Will I need to find the gain due to stratification for this?


I take "RSS" to be the residual sum of squares, i.e. a measure of how poorly each regression fits the data (lower is better). Please correct me if it means something different.

I recommend having a look at Bayesian model comparison; I think it addresses your problem.

You have two hypotheses:

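  • $H_1$ - that each period's data came from a different distribution (fit a separate regression to each period, i.e. stratify)
  • $H_2$ - that both periods' data came from the same distribution (fit one regression to the combined years, i.e. pool)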

To determine which hypothesis is more plausible, you would calculate the model evidence for each case and take their ratio (the Bayes factor), then compare that ratio against a standard interpretation scale such as Jeffreys' scale. In short, this method tells you whether the better fit of $H_1$ is large enough to justify its larger number of fitted parameters compared with $H_2$. Note that $H_1$ has more parameters than $H_2$, because under $H_1$ you fit the parameters separately for each dataset.
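Computing the exact evidences requires priors on the regression coefficients, which the question doesn't give. As a rough stand-in, the evidence ratio can be approximated from BIC (the Schwarz approximation, $\ln B_{12} \approx (\mathrm{BIC}_{H_2} - \mathrm{BIC}_{H_1})/2$). The sketch below assumes Gaussian errors with a common variance in both periods, 15 annual observations per period (read off the year ranges), and 3 coefficients per regression (intercept, price, income). None of these are stated in the question, and the function names are hypothetical.

```python
import math

# Figures from the question.
RSS_1 = 0.1151  # 1927-1941 regression
RSS_2 = 0.0544  # 1948-1962 regression
RSS_3 = 0.2866  # pooled regression

# Assumptions (not given in the question): 15 annual observations per
# period, and 3 coefficients per regression (intercept, price, income).
N1, N2 = 15, 15
K_PER_FIT = 3


def bic(rss: float, n: int, k: int) -> float:
    """Gaussian-regression BIC up to a shared constant: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


def approx_bayes_factor() -> float:
    """Schwarz approximation to the Bayes factor of H1 (separate) over H2 (pooled).

    Assumes one common error variance, so the variance parameter is counted
    once in both models and cancels in the difference.
    """
    n = N1 + N2
    bic_h1 = bic(RSS_1 + RSS_2, n, 2 * K_PER_FIT)  # separate fits: 6 coefficients
    bic_h2 = bic(RSS_3, n, K_PER_FIT)              # pooled fit: 3 coefficients
    return math.exp((bic_h2 - bic_h1) / 2)


print(f"approximate Bayes factor H1:H2 = {approx_bayes_factor():.1f}")
```

Under these assumptions the approximate Bayes factor comes out at roughly 16 in favor of separate regressions; a different observation count or number of regressors would change the figure.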

Also note that Bayesian model comparison is one of the most robust ways to do this analysis, but it can also be harder to calculate. There are much simpler, if more hand-wavy, model-comparison criteria such as AIC and BIC.
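For a linear regression with Gaussian errors, both criteria can be computed directly from the RSS values, since the maximized log-likelihood depends on the data only through $n \ln(\mathrm{RSS}/n)$. A minimal sketch, assuming $n = 30$ (15 years per period), 3 coefficients per regression, and a common error variance; constants shared by both models are dropped, so only the differences are meaningful. The helper name `info_criteria` is hypothetical.

```python
import math

RSS_SEPARATE = 0.1151 + 0.0544  # RSS_1 + RSS_2 (stratified fits)
RSS_POOLED = 0.2866             # RSS_3 (one fit on all years)
N = 30                          # assumed: 15 + 15 annual observations
K = 3                           # assumed coefficients per regression


def info_criteria(rss, n, k):
    """Return (AIC, BIC) for a Gaussian linear regression, minus constants common to both models."""
    fit = n * math.log(rss / n)
    return fit + 2 * k, fit + k * math.log(n)


aic_sep, bic_sep = info_criteria(RSS_SEPARATE, N, 2 * K)  # 6 coefficients
aic_pool, bic_pool = info_criteria(RSS_POOLED, N, K)      # 3 coefficients

# Lower is better for both criteria.
for name, sep, pool in [("AIC", aic_sep, aic_pool), ("BIC", bic_sep, bic_pool)]:
    winner = "separate" if sep < pool else "pooled"
    print(f"{name}: separate={sep:.2f}, pooled={pool:.2f} -> prefer {winner}")
```

With these assumptions both criteria favor the separate regressions, though the BIC margin is smaller because its penalty of $\ln 30 \approx 3.4$ per extra parameter is larger than AIC's penalty of 2.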