I am trying to calculate the Mahalanobis distance from a point to a cluster of points. The code below does that:
    import tensorflow.keras.backend as K
    import scipy as sp
    import scipy.linalg  # make sure the sp.linalg submodule is loaded
    import numpy as np

    def mahalanobis(x=None, data=None, cov=None):
        # Center x on the per-feature mean of the data
        x_minus_mu = x - np.mean(data, axis=0)
        # `if not cov` would raise for an array argument, so test for None
        if cov is None:
            cov = np.cov(data.T)
        inv_covmat = sp.linalg.inv(cov)
        left_term = np.dot(x_minus_mu, inv_covmat)
        mahal = np.dot(left_term, x_minus_mu.T)
        # Diagonal entries are the squared distances, one per row of x
        return mahal.diagonal()

    length = 100000
    data = K.random_normal((length, 1280))
    x = K.random_normal((8, 1280))
    mahalanobis(x=np.array(x), data=np.array(data))
When I set length = 100000, it produces reasonable values. However, when I set length = 1000, it produces both positive and negative values with magnitudes between $$10^{17}$$ and $$10^{19}$$.
Why does this happen, and what can I do if I need length to be a small number?
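When length is 1000, the sample covariance matrix is rank deficient: it is 1280 × 1280 but is estimated from only 1000 points (1000 < 1280), so its rank is at most 999 and the inverse does not exist. The huge values of mixed sign come from numerically inverting a singular matrix. Try sp.linalg.pinv to compute the pseudoinverse instead.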
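As a rough illustration of the suggestion above, here is a minimal sketch using NumPy and SciPy only. The function name `mahalanobis_sq_pinv` and the small sizes (50 samples, 80 features) are assumptions chosen to mimic the 1000 < 1280 situation quickly; they are not from the original code. Note that with a pseudoinverse the distance is measured only within the subspace spanned by the data, so directions with no observed variance are ignored.

```python
import numpy as np
import scipy.linalg


def mahalanobis_sq_pinv(x, data):
    """Squared Mahalanobis distance of each row of x to the point cloud
    `data`, using the pseudoinverse of the sample covariance.
    (Hypothetical helper name, for illustration only.)"""
    mu = data.mean(axis=0)                 # per-feature mean
    cov = np.cov(data, rowvar=False)       # (features x features) covariance
    inv_cov = scipy.linalg.pinv(cov)       # pseudoinverse handles singular cov
    diff = x - mu
    # Row-wise quadratic form: diff_i @ inv_cov @ diff_i for every row i
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


rng = np.random.default_rng(0)
n_samples, n_features = 50, 80             # fewer samples than features
data = rng.standard_normal((n_samples, n_features))
x = rng.standard_normal((8, n_features))

cov = np.cov(data, rowvar=False)
rank = np.linalg.matrix_rank(cov)          # at most n_samples - 1
print("covariance shape:", cov.shape, "rank:", rank)

d2 = mahalanobis_sq_pinv(x, data)
print("squared distances:", d2)
```

The printed rank should be below the number of features, which confirms that the covariance is singular, while the distances from the pseudoinverse stay finite and non-negative.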