Multiple regression - low F-statistic and multiple R-squared. What should I do/conclude?


I have two independent variables and one dependent variable.

    Residuals:
         Min      1Q  Median      3Q     Max 
    -22.265  -9.563  -1.916   6.405  39.319

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  23.0107    18.2849   1.258  0.21407
    x_1          23.6386     6.8479   3.452  0.00114 **
    x_2          -0.7147     0.3014  -2.371  0.02163 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 14.84 on 50 degrees of freedom
    Multiple R-squared: 0.2018,    Adjusted R-squared: 0.1699
    F-statistic: 6.321 on 2 and 50 DF,  p-value: 0.00357

This is my summary result when I run lm(y ~ x_1 + x_2).

I got stuck because neither the F statistic nor the R-squared looks very impressive. However, the overall p-value is less than 0.05. Does this mean that y depends on both variables? What should I do next in my regression analysis?

Thank you!

1 Answer

I believe your output can be 'unscrambled' as shown below. (If you type 5 blank spaces at the start of each line, you get a 'typewriter type' format on our pages that is useful for output.)

    Residuals:
         Min      1Q  Median      3Q     Max 
    -22.265  -9.563  -1.916   6.405  39.319

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  23.0107    18.2849   1.258  0.21407
    x_1          23.6386     6.8479   3.452  0.00114 **
    x_2          -0.7147     0.3014  -2.371  0.02163 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

It seems you have weak correlations of the predicted variable with the predictor variables. That might lead to uselessly long prediction intervals.
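To see roughly how long, here is a back-of-the-envelope lower bound (a sketch, not the exact interval R's predict.lm would give, which also includes a leverage term; with 50 df the t quantile is close to the normal one, so the normal 0.975 quantile is used as an approximation):

```python
from statistics import NormalDist

# Rough lower bound on the half-width of a 95% prediction interval.
s = 14.84                        # residual standard error from the summary
z = NormalDist().inv_cdf(0.975)  # ~1.96; stand-in for the t(50) quantile
half_width = z * s               # ignores the leverage term, so a lower bound
print(round(half_width, 1))      # 29.1
```

An interval of roughly ±29 around each prediction spans most of the observed residual range (-22 to 39), which is what "uselessly long" means in practice.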

Roughly speaking, "Adjusted R-squared: 0.1699" means that only about 17% of the variability in the y-measurements is 'explained' by exploiting information in the variables x1 and x2 via multiple linear regression. That there are *any* connections at all may be a marvelous breakthrough. That they explain *so little* may be a huge disappointment.
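As a check, the reported numbers are internally consistent: both the adjusted R-squared and the overall F statistic can be recovered from the multiple R-squared, n = 53 observations (50 residual df plus 3 estimated coefficients), and p = 2 predictors:

```python
# Recover adjusted R-squared and the overall F statistic from the summary.
r2 = 0.2018   # Multiple R-squared
n = 53        # 50 residual df + 3 estimated coefficients
p = 2         # number of predictors

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))

print(round(adj_r2, 4))  # 0.1699, matching the summary
print(round(f_stat, 3))  # 6.32, matching 6.321 up to the rounding of r2
```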

The low p-values tell you that the slopes for both predictor variables (especially the first) are significantly different from 0.
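Those t statistics are just estimate divided by standard error, read straight off the coefficient table:

```python
# t statistics from the coefficient table: estimate / standard error.
coefs = {
    "x_1": (23.6386, 6.8479),   # (estimate, std. error)
    "x_2": (-0.7147, 0.3014),
}
for name, (est, se) in coefs.items():
    print(name, round(est / se, 3))  # x_1 3.452, x_2 -2.371
```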

However, 'statistical significance' is not the same as 'practical importance'. Your x1 and x2 pretty clearly have some ability to predict y, but possibly not strong enough to be useful.

Statistical inference is something for software to report and for statistical consultants to point out to their clients. Practical importance is for the clients to determine based on the real-life situation at hand.

An important issue left hanging here is the *purpose* of this study. Are you exploring possible *connections* (associations) between the measurements y, x1 and x2? Or are you hoping that measurements x1 and x2 can help you *predict* values of y (with short enough prediction intervals to be useful)?