What criteria do I use to determine which regression result is best?


This could be the wrong forum, but I'll ask anyway: I have run multiple regressions to find the best explanation of a dependent variable. I regressed Y on each independent variable individually, and also on every possible combination of the Xs. What criteria do I use to choose the best combination of explanatory variables? Do I use adjusted R², or something else, like the lowest F-test (Fisher) or t-test values? What's the best approach?

EDIT #1

The dataset I'm using contains the following drink related variables:

Cal (kcal) per 100ml, Carbs (g per 100ml), ABV (%), SRM, and IBU

The dependent variable is Carbs. The others are candidates for independent variables. The relations among these variables are widely known (a simple search would reveal this). I intend to predict the carb content of a drink outside the sample, given other known factors such as ABV or IBU.

I have already written software to get this far, and I would like to be able to better select which combination of explanatory variables to use, based on the values that the multiple regressions have generated.

EDIT #2

Adjusted R² (AdjustedRSquared) values for Calories, with the explanatory variables used in each model:

        //0.885289639377298: [ABV: 1], [IBU: 3], [SRM: 4]
        //0.883100773266796: [ABV: 1], [SRM: 4]
        //0.880762855952053: [ABV: 1], [IBU: 3]
        //0.873628037335544: [ABV: 1]
        //0.364565675835164: [IBU: 3]
        //0.361891884574654: [IBU: 3], [SRM: 4]
        //0.08342216096836: [SRM: 4]

Corrected AIC (AicCorrection) values for Calories, with the explanatory variables used in each model:

        //400.062: [ABV: 1], [IBU: 3], [SRM: 4]
        //401.684: [ABV: 1], [SRM: 4]
        //404.496: [ABV: 1], [IBU: 3]
        //411.708: [ABV: 1]
        //641.050: [IBU: 3]
        //642.686: [IBU: 3], [SRM: 4]
        //693.070: [SRM: 4]

Please note that the ordering by corrected AIC is pretty much the same as the ordering by adjusted R².

It looks to me as though corrected AIC measures pretty much the same thing as adjusted R². Is that right?
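For reference, here are the textbook forms of the two criteria for a least-squares fit. This is a sketch under assumptions: the software's own definitions are not shown here and may differ by constants or in how parameters are counted. $n$ is the number of observations, $p$ the number of explanatory variables, $k$ the number of estimated parameters (conventions vary on whether the intercept and error variance are counted), $\mathrm{RSS}$ the residual sum of squares, and $\mathrm{TSS}$ the total sum of squares:

$$\bar{R}^2 = 1 - \frac{\mathrm{RSS}/(n-p-1)}{\mathrm{TSS}/(n-1)}, \qquad \mathrm{AICc} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \frac{2k(k+1)}{n-k-1}$$

For a fixed number of explanatory variables, both depend on the model only through $\mathrm{RSS}$ (higher $\bar{R}^2$ and lower $\mathrm{AICc}$ both mean lower $\mathrm{RSS}$), so models of the same size are ranked identically. The two criteria can only disagree when comparing models of different sizes, where their penalties for extra parameters differ.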


There are 2 best solutions below


The first test is that the $X$s should be plausibly related to $Y$. The second test is that you are using the right sort of model. For example, you would not use a normal (Gaussian) linear regression model for goals scored in a football match, since these are counts that are better described by a Poisson model. $R^2$ is a good guide, along with t-tests on the individual $X$s when they are included in the whole model. If some of the $X$s are highly correlated with each other, their individual t-tests become unreliable, and this can influence the choice. There are very many tests of the appropriateness of simple linear regression, e.g. tests for heteroscedasticity and for normality of the errors. You can also do F-tests on whether or not to include groups of variables. There is no single right way, and there are a great many tests you could do, so it comes down to how much effort it is worth putting in.
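The F-test on a group of variables mentioned above can be sketched as follows. This is a minimal illustration in plain Python on synthetic data, not the asker's data set: the helper names `ols_rss` and `partial_f` and the three columns are made up for the example, and in practice a statistics library would do the fitting. The statistic compares the residual sum of squares of the full model with that of the reduced model, $F = \frac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}})/q}{\mathrm{RSS}_{\text{full}}/(n-p-1)}$, where $q$ is the number of dropped variables and $p$ the number of variables in the full model.

```python
import random

def ols_rss(X, y):
    """Fit y = b0 + b.x by least squares and return the residual sum of squares."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix M
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= f * M[col][k]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(rows, y))

def partial_f(X_full, keep, y):
    """F statistic for dropping every column of X_full not listed in `keep`."""
    n, p = len(y), len(X_full[0])
    q = p - len(keep)                            # number of dropped variables
    X_red = [[row[j] for j in keep] for row in X_full]
    rss_full, rss_red = ols_rss(X_full, y), ols_rss(X_red, y)
    return ((rss_red - rss_full) / q) / (rss_full / (n - p - 1))

# Synthetic data (an assumption for illustration): only column 0 affects y.
random.seed(0)
n = 60
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [2.0 + 3.0 * row[0] + random.gauss(0, 1) for row in X]

f_drop_x1 = partial_f(X, [1, 2], y)      # drop the variable that matters
f_drop_noise = partial_f(X, [0], y)      # drop the two irrelevant variables
print("F, dropping column 0:     ", round(f_drop_x1, 2))
print("F, dropping columns 1 & 2:", round(f_drop_noise, 2))
```

A large F suggests the dropped group carries real explanatory power. To turn it into a p-value, compare it against an F distribution with $(q,\ n-p-1)$ degrees of freedom, for instance with `scipy.stats.f.sf` if SciPy is available.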


If you'd like to automate your model selection, one way is to split your data set and compare prediction errors. Randomly split your data into a training set (60% of the data), a validation set (20%), and a test set (20%). You then build your models on the training set, calculate the MSE (mean squared error) of each on the validation set, and simply pick the model with the lowest validation error. The test set can then be used to estimate how the chosen model would perform on unseen data.
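The procedure above can be sketched in plain Python as follows. This is an illustration under assumptions rather than a definitive implementation: the data are synthetic (three made-up columns, of which only the first two affect `y`), the helper names `fit_ols`, `mse` and `select` are invented for the example, and a real project would likely use a regression library instead of the small hand-written solver.

```python
import itertools
import random

def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    p = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):                         # elimination with pivoting
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= f * M[col][k]
    b = [0.0] * p
    for i in reversed(range(p)):                 # back substitution
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

def mse(b, X, y):
    """Mean squared prediction error of coefficients b on (X, y)."""
    preds = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def select(data, cols):
    """Split (x, y) pairs into a design matrix restricted to `cols`, and y."""
    return [[x[c] for c in cols] for x, _ in data], [y for _, y in data]

# Synthetic data (an assumption for illustration): column 2 is pure noise.
random.seed(1)
data = []
for _ in range(200):
    x = [random.gauss(0, 1) for _ in range(3)]
    data.append((x, 1.0 + 2.0 * x[0] - 1.0 * x[1] + random.gauss(0, 0.5)))

# Random 60% / 20% / 20% split
random.shuffle(data)
n = len(data)
train = data[:int(0.6 * n)]
valid = data[int(0.6 * n):int(0.8 * n)]
test = data[int(0.8 * n):]

# Fit every non-empty subset of columns on the training set,
# and score each fitted model on the validation set.
results = {}
for size in range(1, 4):
    for cols in itertools.combinations(range(3), size):
        b = fit_ols(*select(train, cols))
        results[cols] = (mse(b, *select(valid, cols)), b)

best_cols = min(results, key=lambda c: results[c][0])
best_b = results[best_cols][1]
test_mse = mse(best_b, *select(test, best_cols))  # used once, after selection
print("chosen columns:", best_cols)
print("validation MSE:", round(results[best_cols][0], 3))
print("test MSE:      ", round(test_mse, 3))
```

The test set is touched only after the model has been chosen, so its MSE is not biased by the selection step. With a small data set, a single random split can be noisy, and k-fold cross-validation is a common alternative.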