Singular matrix while calculating p-value in Linear Regression


So I am calculating p-values for a linear regression. I am using this for reference. As you can see there, one step of the calculation is

beta_cov = np.linalg.inv(X1.T@X1)

which is nothing but the inverse of the product of the transpose of the independent-variable matrix X with the matrix itself. Example X in NumPy format:

import numpy as np

X = np.array([[1.000e+00, 0.000e+00, 3.840e+00, 5.260e+02, 5.970e+02, 8.100e+01,
        ....]])
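For context, here is a rough sketch of the rest of the p-value computation that beta_cov feeds into, using synthetic data in place of my real X (the variable names and the residual-variance scaling here are my own, not taken from the reference):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data; names like true_beta are illustrative only.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
X1 = np.column_stack([np.ones(n), X])            # prepend an intercept column
true_beta = np.array([1.0, 2.0, 0.0, -1.5])
y = X1 @ true_beta + rng.normal(scale=0.5, size=n)

beta = np.linalg.solve(X1.T @ X1, X1.T @ y)      # OLS coefficient estimates
resid = y - X1 @ beta
dof = n - X1.shape[1]                            # degrees of freedom
sigma2 = resid @ resid / dof                     # residual variance
beta_cov = sigma2 * np.linalg.inv(X1.T @ X1)     # covariance of the estimates
se = np.sqrt(np.diag(beta_cov))                  # standard errors
t_stat = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stat), dof)   # two-sided p-values
print(p_values)
```

The inv(X1.T @ X1) term is where my real data fails, because that matrix has to be invertible.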

Now if you try to calculate beta_cov, it raises a Singular matrix error, because the determinant of X.T @ X is zero.

np.linalg.det(arr.T @ arr)  # det is 0
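To make the failure mode concrete, here is a toy matrix (my own example, not my real X) where one column is an exact linear combination of two others, so X.T @ X is singular:

```python
import numpy as np

# Third column is exactly the sum of the first two,
# so the columns are linearly dependent.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 3.0],
              [4.0, 0.0, 4.0],
              [1.0, 1.0, 2.0]])

print(np.linalg.matrix_rank(A))   # 2: less than the number of columns
print(np.linalg.det(A.T @ A))     # zero, up to floating-point noise
```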

Now I am facing one problem. X is a feature matrix, so X looks like [independent_vector1, independent_vector2, independent_vector3, ..., independent_vectorn]. In linear regression the position of a vector does not matter, so it could equally be [independent_vector2, independent_vector1, independent_vector3, ..., independent_vectorn-4]. But interestingly, if you rearrange the columns of the X above, the determinant becomes non-zero:

new_arr = arr[:, [0,3,13,14,7,8,9,6,5,4,11,12,1,10,2]]  # changing the column positions
np.linalg.det(new_arr.T @ new_arr)  # det is 1.7583201199503196e+20
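For what it's worth, I understand that permuting columns is X ↦ XP for a permutation matrix P, and det(P.T @ X.T @ X @ P) = det(P)² · det(X.T @ X) = det(X.T @ X), so mathematically the determinant should not change at all; I suspect the difference I see is floating-point rounding in a nearly singular matrix. A sketch with synthetic data (my own, not my real X) reproducing this, and checking the condition number instead of the determinant:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X[:, 3] = X[:, 0] + X[:, 1] + 1e-8 * rng.normal(size=50)  # nearly dependent column

G = X.T @ X
perm = [3, 1, 0, 2]
Gp = X[:, perm].T @ X[:, perm]

# Both determinants are tiny; they differ only through rounding error.
print(np.linalg.det(G), np.linalg.det(Gp))
# A huge condition number is the more reliable symptom of near-singularity.
print(np.linalg.cond(G))
```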

Now I am confused: the position of the features should not affect the calculation. X[feature1, feature2, feature3] is effectively the same as X[feature2, feature1, feature3], and so on. Then why is it affecting the p-value calculation? Does this mean the features are highly correlated and not mutually orthogonal?