Multicollinearity analysis


What is multicollinearity? How do we detect multicollinearity? How can we mitigate or remedy multicollinearity? Is multicollinearity a statistical issue or a linear algebra issue?


Many linear models require inverting a square matrix $M.$ Theoretically, an inverse exists exactly when no column of $M$ is a linear combination of the others — that is, when $M$ has full rank.

However, computational problems arise when one column of $M$ is nearly a linear combination of the others. A simple one-dimensional version is trying to find $1/m$ when $m$ is very nearly $0$. If $m = 10^{-100^{100}},$ then for many practical purposes, $1/m$ 'does not exist'. (In the corresponding linear-algebra situation, the determinant $\Delta$ of $M,$ which appears in formulas for the inverse, may be very nearly $0.$)
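A short numerical sketch of this near-singularity, using a made-up $2\times 2$ matrix whose columns are almost linearly dependent:

```python
import numpy as np

# Hypothetical matrix: the second row is almost exactly twice the first,
# so the columns are nearly linearly dependent.
M = np.array([[1.0, 2.0],
              [2.0, 4.0 + 1e-9]])

print(np.linalg.det(M))    # determinant is nearly 0
print(np.linalg.cond(M))   # condition number is huge: inversion is fragile
```

The condition number measures how close $M$ is to singular: the larger it is, the more any rounding error is amplified when computing $M^{-1}$.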

In a regression problem, multicollinearity can mean that a very slight change in the data — in one of the independent (predictor) variables, or in the responses — might make a huge change in the estimates $\hat \beta_i$ of some of the coefficients.
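This instability is easy to demonstrate with made-up numbers. Below, two predictors differ only in the sixth decimal place; nudging the responses by $10^{-4}$ swings the individual coefficient estimates by roughly $\pm 100$, even though their sum barely moves:

```python
import numpy as np

# Hypothetical data: two predictors that differ only in the 6th decimal.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = x1 + np.array([1e-6, -1e-6, 1e-6, -1e-6, 1e-6])
y = x1 + x2                          # true coefficients are (1, 1)

X = np.column_stack([x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Nudge the responses by a tiny amount (1e-4) and refit.
y_pert = y + 1e-4 * np.sign(x2 - x1)
beta_pert, *_ = np.linalg.lstsq(X, y_pert, rcond=None)

print(beta)        # close to [1, 1]
print(beta_pert)   # individual coefficients change by roughly +/-100
```

Note that the *sum* $\hat\beta_1 + \hat\beta_2$ stays near $2$: the data pin down the well-conditioned direction, while the nearly collinear direction is essentially arbitrary.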

I'm not sure how to answer whether multicollinearity is a 'linear algebra' or a 'statistical' issue. I would prefer to say it is a computational issue that may make some or all of the statistical estimates useless. Competently written statistical software will warn of multicollinearity issues. In a multiple-regression analysis, one remedy is to drop one (or more) of the offending predictor variables. Often, carefully done experimental design can avoid multicollinearity altogether.
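One standard diagnostic such software uses is the variance inflation factor (VIF): regress each predictor on the others and compute $\mathrm{VIF}_j = 1/(1 - R_j^2)$; values much above 5–10 flag a collinear predictor. A minimal sketch with made-up data (the function name `vif` is my own, not a library routine):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column).
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns plus an intercept."""
    n, p = X.shape
    factors = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Made-up example: x2 nearly duplicates x1, while x3 is unrelated.
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
x3 = rng.normal(size=200)

v = vif(np.column_stack([x1, x2, x3]))
print(v)   # first two VIFs are enormous; the third is near 1
```

The huge VIFs on the first two columns are exactly the warning a regression package would issue, and they point to which predictor one might drop.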

One case that often produces multicollinearity is 'polynomial' regression with predictors $x_i, x_i^2, x_i^3,$ etc. If the $x$'s are too narrowly spaced, one might have something like $x_i^3 \approx c\,x_i$ across the observed range. In that case, two of the predictor columns are essentially proportional, and there will be computational difficulties.
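A sketch of this polynomial case, with made-up $x$'s clustered near $100$, together with the common remedy of centering and scaling the $x$'s before taking powers:

```python
import numpy as np

# Narrowly spaced x's: over [100, 101], the columns x, x^2, x^3 are
# nearly proportional to one another.
x = np.linspace(100.0, 101.0, 20)
X_raw = np.column_stack([x, x**2, x**3])
print(np.linalg.cond(X_raw))    # astronomically large

# Centering and scaling before taking powers largely removes the problem.
z = (x - x.mean()) / x.std()
X_std = np.column_stack([z, z**2, z**3])
print(np.linalg.cond(X_std))    # modest
```

Standardizing first works because over a narrow interval the raw powers are almost affine functions of one another, while the powers of the centered variable are much closer to orthogonal; using an orthogonal polynomial basis carries this idea further.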