Is it possible with Linear Regression that the p-value is low, but the coefficient is high when the explanatory variables are standardized?


Let's suppose that the explanatory variables are standardized, so they are on the same scale. In this case, if I understand correctly, the magnitude of each coefficient indicates how relevant the corresponding feature is. Is my conclusion correct? Or can it happen that a variable's coefficient is large, but based on its p-value the variable is not significant?


There is 1 answer below.

BEST ANSWER

It can get much more complicated than that in cases where you have near collinearity. There are probably better examples, but try this in R with some numbers I just invented:

> x1 <- c(-1, -0.51, -0.11, 0.11, 0.52, 1.0)
> x2 <- c(-1, -0.51, -0.12, 0.12, 0.51, 1.0)
> y <-  c(-1, -0.50, -0.10, 0.10, 0.50, 1.0)
> fit <- lm(y ~ x1+x2)
> summary(fit)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
        1         2         3         4         5         6 
-0.005201  0.008268  0.009765 -0.006011 -0.015776  0.008955 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.001877   0.005720  -0.328    0.764
x1           1.126269   0.812855   1.386    0.260
x2          -0.133347   0.813761  -0.164    0.880

Residual standard error: 0.01361 on 3 degrees of freedom
Multiple R-squared:  0.9998,    Adjusted R-squared:  0.9996 
F-statistic:  6798 on 2 and 3 DF,  p-value: 3.277e-06

Each of $x_1$ and $x_2$ is very close to $y$ and on virtually the same scale, so on its own would make a very good predictor of $y$ with a coefficient very near $1$.

But crudely looking at the $p$-values on the right might suggest that neither coefficient is significantly different from zero in the multiple regression, and the sign of the coefficient on $x_2$ is in fact negative even though its relationship with $y$ looks obviously positive.
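To see the same phenomenon outside R, here is a minimal Python sketch (assuming NumPy and SciPy are available) that refits the same invented numbers by ordinary least squares. It shows both sides of the contrast: $x_1$ alone is a highly significant predictor with a coefficient near $1$, yet in the joint fit with the near-collinear $x_2$ both coefficients get large p-values.

```python
import numpy as np
from scipy import stats

# Same invented numbers as the R example above
x1 = np.array([-1, -0.51, -0.11, 0.11, 0.52, 1.0])
x2 = np.array([-1, -0.51, -0.12, 0.12, 0.51, 1.0])
y  = np.array([-1, -0.50, -0.10, 0.10, 0.50, 1.0])

# Joint fit: intercept + x1 + x2
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
df = len(y) - X.shape[1]                   # 6 obs - 3 params = 3
sigma2 = resid @ resid / df                # residual variance
cov = sigma2 * np.linalg.inv(X.T @ X)      # coefficient covariance matrix
se = np.sqrt(np.diag(cov))
t = beta / se
p = 2 * stats.t.sf(np.abs(t), df)          # two-sided p-values

for name, b, pv in zip(["(Intercept)", "x1", "x2"], beta, p):
    print(f"{name:12s} estimate={b: .4f}  p={pv:.3f}")
# x1's estimate is about 1.13 with p ~ 0.26; x2's is about -0.13 with p ~ 0.88

# For contrast: x1 alone is an excellent, highly significant predictor
Xs = np.column_stack([np.ones_like(x1), x1])
bs, *_ = np.linalg.lstsq(Xs, y, rcond=None)
rs = y - Xs @ bs
dfs = len(y) - 2
covs = (rs @ rs / dfs) * np.linalg.inv(Xs.T @ Xs)
ps = 2 * stats.t.sf(np.abs(bs / np.sqrt(np.diag(covs))), dfs)
print(f"x1 alone:    estimate={bs[1]: .4f}  p={ps[1]:.2e}")
```

Because $x_1$ and $x_2$ are almost identical, the design matrix columns are nearly linearly dependent, which inflates the standard errors of the joint fit: the data cannot tell apart many combinations of the two coefficients, so each individual coefficient is estimated very imprecisely even though the model as a whole fits almost perfectly.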