Linear independence as a way to gauge predictor usefulness.


Background.

The multiple linear regression model is of the form

$$ Y = \beta_0 + \beta_1X_1 + \cdots + \beta_nX_n + \epsilon $$

where we assume $\epsilon$ is normally distributed with constant variance. One way to decide whether any of the predictors are useful in predicting the response $Y$ is to test the following null hypothesis against the alternative:

$$ H_0: \beta_1 = \cdots = \beta_n =0 $$

versus

$$ H_1: \beta_j \neq 0 \text{ for at least one } j = 1, \ldots, n $$

To test this, we can use the $F$ statistic.


Question.

Rather than using the $F$ statistic: since $X_1, \ldots, X_n$ are observed vectors (each containing one value per observation), would it make any sense to test $H_0$ by checking whether $\{X_1, \ldots, X_n\}$ is a linearly independent set? That is, checking whether the only solution to the equation

$$ \beta_1X_1 + \cdots + \beta_nX_n = 0 $$

is $\beta_1 = \cdots = \beta_n =0$.

If no, is it because we're not taking into account $\epsilon$?
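For concreteness, the proposed check amounts to asking whether the matrix whose columns are the observed predictor vectors has full column rank. A minimal sketch with NumPy, using made-up illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated predictors: 100 observations of 3 predictors (illustrative data).
X = rng.normal(size=(100, 3))

# {X_1, ..., X_n} is linearly independent iff the matrix with the X_j as
# columns has full column rank, i.e. beta_1 X_1 + ... + beta_n X_n = 0
# only for beta_1 = ... = beta_n = 0.
rank = np.linalg.matrix_rank(X)
print(rank == X.shape[1])  # True: full column rank, so linearly independent
```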


Best answer.

Sanity check: Your proposal does not involve $Y$ in any way, which suggests it doesn't really address the root question.

More detail: For most datasets, $X_1,\ldots, X_n$ are linearly independent; as long as the number of observations exceeds the number of predictors and the predictors are measured on a continuous scale, this holds almost surely, and in linear regression we actually prefer it, to avoid collinearity issues. In many of those cases some subset of the predictors is a good predictor of $Y$, and in many others none of them are. So linear independence of $X_1, \ldots, X_n$ does not really correspond to $H_0$ here.
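This can be illustrated directly: below, the response is generated as pure noise, independently of the predictors, so $H_0$ is true, yet the predictors still form a linearly independent set. A sketch with NumPy, computing the $F$ statistic from the residual and total sums of squares (the data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# 100 observations of 3 predictors, and a response that is pure noise,
# generated independently of the predictors, so H0 is true here.
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# The predictors are linearly independent...
print(np.linalg.matrix_rank(X))  # 3: full column rank

# ...but the F test tells a different story.
n_obs, p = X.shape
Xd = np.column_stack([np.ones(n_obs), X])      # design matrix with intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # OLS fit
resid = y - Xd @ beta
rss = resid @ resid                            # residual sum of squares
tss = ((y - y.mean()) ** 2).sum()              # total sum of squares
f_stat = ((tss - rss) / p) / (rss / (n_obs - p - 1))
print(f_stat)  # under H0 this follows an F(3, 96) distribution
```

The rank check "rejects" $H_0$ (it reports linear independence) even though no predictor has any relationship to $Y$, while the $F$ statistic, which does involve $Y$, stays in the range expected under the null.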