I apologise for how vague this question may appear, but I am not finding any resources online to help with this issue.
I have a data frame loaded into R and split into two separate data frames: training and testing.
My data concerns diabetes and has 8 variables, including "Glucose", which is the variable I'm building the regression model for.
I have produced an lm of Glucose against all 7 other variables, but I am now struggling to select which variables need to be removed.
This is the current output of my model:

```
Call:
lm(formula = Glucose ~ Pregnancies + BloodPressure + SkinThickness +
Insulin + BMI + DiabetesPedigreeFunction + Age, data = training)
Residuals:
Min 1Q Median 3Q Max
-68.652 -16.047 -3.082 13.346 75.723
Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)               61.14240    9.67267   6.321 1.08e-09 ***
Pregnancies                0.04819    0.63083   0.076  0.93917    
BloodPressure              0.14300    0.12764   1.120  0.26356    
SkinThickness              0.10747    0.18138   0.592  0.55403    
Insulin                    0.12793    0.01291   9.911  < 2e-16 ***
BMI                        0.11406    0.28488   0.400  0.68921    
DiabetesPedigreeFunction   6.95952    4.16151   1.672  0.09562 .  
Age                        0.63202    0.20269   3.118  0.00202 ** 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 23.78 on 268 degrees of freedom
Multiple R-squared: 0.4036, Adjusted R-squared: 0.3881
F-statistic: 25.91 on 7 and 268 DF, p-value: < 2.2e-16
```
The question of model selection, i.e. picking which features to include in a model, doesn't have a single answer! There are a few families of techniques worth considering.
The first family consists of stepwise techniques. These include:

- **Backward elimination:** you run a regression with the full model (as you did) and then remove the variable with the largest $p$-value (or the one picked out by some other criterion, e.g. AIC, which we will talk about later). You continue until you are left only with features whose $p$-values lie below a pre-designated threshold. This is a very common method of model selection in economics.
- **Forward selection:** you start with a null model and add whichever variable does best in terms of a metric you choose. This could be $p$-value or AIC or something else. You stop adding once the next variable you would add falls short of some pre-determined threshold.
- **Bidirectional (stepwise) selection:** this procedure is best done by a computer; it goes back and forth through the model space, adding and removing variables based on a criterion to be specified. This can be implemented in many ways.
If you are using R, you can implement all of these easily with the command

```
step(model, direction = "")
```

where `direction` can be `"forward"`, `"backward"`, or `"both"`.
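For instance, here is a minimal backward-elimination sketch. Since your diabetes data frame isn't available here, R's built-in `mtcars` data stands in; the variable names are illustrative only:

```r
# Backward elimination by AIC, using built-in mtcars data as a stand-in
# for the diabetes training data.
full <- lm(mpg ~ ., data = mtcars)        # full model, as in the question

# step() repeatedly drops the variable whose removal lowers AIC the most,
# stopping when no removal improves AIC; trace = 0 suppresses the log.
reduced <- step(full, direction = "backward", trace = 0)

formula(reduced)                          # the predictors that survive
summary(reduced)$adj.r.squared            # fit of the pruned model
```

With your data, you would simply replace `full` with your fitted `lm(Glucose ~ ..., data = training)` object.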
There is another family of model selection techniques based solely on a particular criterion. Two of the most common criteria are AIC and BIC. Both essentially reward a small RSS, but add a penalty for the number of parameters in the model. A good way to use them in R is the leaps package, which has the command
```
regsubsets(formula, data = )
```

which produces the best subset for each number of parameters. Look at the summary of this object to get a sense of what it outputs; you can then compute the value of the AIC or BIC for each subset given in that output.
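As a sketch of that workflow (again using `mtcars` as a stand-in, and assuming the leaps package is installed via `install.packages("leaps")`):

```r
# Best-subset search with the leaps package; mtcars stands in for the
# diabetes training data.
library(leaps)

fits <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
s <- summary(fits)

s$which            # which variables enter the best model of each size
which.min(s$bic)   # the subset size that minimises BIC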
There are a couple of other methods of model selection as well (including methods that shrink the parameters), but these are among the most widely used.
Hope this helps!