Results of linear regression with just statistically significant features vs. whole dataset

14 Views Asked by Bumbble Comm At 29 Mar 2026 - 2:47

I have a dataset, in this case the diabetes toy dataset, and am fitting a linear regression model. Could someone explain what I should expect in terms of performance if I run the regression with just the 'statistically significant' features vs. using all of the features in the dataset?

My intuition tells me that using all of the features should be at least as good as using just the statistically significant ones, given the added information. This appears not to be the case, though: I see a ~7% reduction in the MSE for the statistically significant feature set.
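One way to probe this intuition is to compare the two models on the training data and on held-out data separately. Under ordinary least squares, the in-sample MSE of the full model cannot exceed that of a model nested inside it, but held-out MSE carries no such guarantee. The sketch below is only an illustration under stated assumptions: it uses synthetic data (not the diabetes dataset), a hypothetical set of four informative columns, and a simple 80/40 split, with helper names `fit_ols` and `mse` made up for this example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X] (intercept column prepended)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(X, y, coef):
    """Mean squared error of the fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((y - A @ coef) ** 2))

# Synthetic stand-in data (assumption): 10 features, only the first 4 matter.
rng = np.random.default_rng(0)
n, p = 120, 10
X = rng.normal(size=(n, p))
y = X[:, :4] @ np.array([2.0, -1.5, 1.0, 0.5]) + rng.normal(scale=2.0, size=n)

train, test = np.arange(0, 80), np.arange(80, n)
subset = [0, 1, 2, 3]  # hypothetical "significant" feature set

coef_full = fit_ols(X[train], y[train])
coef_sub = fit_ols(X[train][:, subset], y[train])

train_mse_full = mse(X[train], y[train], coef_full)
train_mse_sub = mse(X[train][:, subset], y[train], coef_sub)
test_mse_full = mse(X[test], y[test], coef_full)
test_mse_sub = mse(X[test][:, subset], y[test], coef_sub)

print(f"train MSE  full={train_mse_full:.3f}  subset={train_mse_sub:.3f}")
print(f"test  MSE  full={test_mse_full:.3f}  subset={test_mse_sub:.3f}")
```

The training comparison always favours (or ties with) the full model; the test comparison can go either way depending on the data and the split, so it matters which of the two MSEs is being reported.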

Just for completeness, I evaluated statistical significance by running a t-test and computing a p-value for each feature, and found that the patient's age, sex, BMI and the s5 feature were significant.
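For reference, a per-coefficient t-test of this kind can be sketched as follows. This is not necessarily the exact procedure used above; it is a minimal version assuming synthetic data, the classical OLS standard errors, and a normal approximation to the t distribution for the two-sided p-values (reasonable only when the residual degrees of freedom are large).

```python
import math
import numpy as np

# Synthetic stand-in data (assumption): only column 0 drives the target.
rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)

# Design matrix with an intercept column, then the OLS fit.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Classical OLS standard errors: sigma^2 * (A^T A)^{-1}.
resid = y - A @ coef
dof = n - A.shape[1]                 # residual degrees of freedom
sigma2 = resid @ resid / dof         # unbiased noise variance estimate
cov = sigma2 * np.linalg.inv(A.T @ A)
se = np.sqrt(np.diag(cov))

# t-statistic per coefficient; two-sided p-value via a normal approximation.
t_stats = coef / se
p_values = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats])

for name, t, pv in zip(["intercept"] + [f"x{j}" for j in range(p)], t_stats, p_values):
    print(f"{name:>9}  t={t:8.3f}  p~{pv:.4f}")
```

Note that if the p-values are computed on the same data later used to evaluate the reduced model, the feature selection has already seen that data, which can make the reduced model look better than it would on truly unseen data.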

Would be keen to hear the community's thoughts here.
