Instability in Calculating Mahalanobis Distance


I am trying to calculate the Mahalanobis distance from a point to a cluster of points, using the code below.

import tensorflow.keras.backend as K
import numpy as np
import scipy as sp
import scipy.linalg  # make sure the sp.linalg submodule is loaded

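def mahalanobis(x=None, data=None, cov=None):
    # Center on the per-feature mean; np.mean(data) without axis collapses to one scalar.
    x_minus_mu = x - np.mean(data, axis=0)
    # `not cov` raises for a NumPy array, so test against None explicitly.
    if cov is None:
        cov = np.cov(data.T)
    inv_covmat = sp.linalg.inv(cov)
    left_term = np.dot(x_minus_mu, inv_covmat)
    mahal = np.dot(left_term, x_minus_mu.T)
    # The diagonal holds the squared distance for each row of x.
    return mahal.diagonal()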

length = 100000
data = K.random_normal((length,1280))
x = K.random_normal((8,1280))

mahalanobis(x=np.array(x), data=np.array(data))

When I set length = 100000, it produces reasonable values. However, when I set length = 1000, it produces both positive and negative values with magnitudes between $$10^{17}$$ and $$10^{19}$$.

Why does this happen, and what can I do if I need length to be a small number?

Best answer

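When length is 1000, the sample covariance matrix is rank deficient: it is 1280 × 1280 but is estimated from only 1000 points, so its rank is at most 999. Its inverse therefore does not exist, and whatever sp.linalg.inv returns is numerically meaningless, which is where the huge positive and negative values come from. Try sp.linalg.pinv to compute the pseudoinverse instead.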
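As a rough illustration of the suggestion, here is a minimal sketch that checks the rank of the covariance matrix and then uses the pseudoinverse. The function name mahalanobis_sq, the reduced sizes (100 points, 128 features) and the NumPy random generator standing in for K.random_normal are assumptions made to keep the example small and self-contained; they are not from the question.

```python
import numpy as np
from scipy import linalg


def mahalanobis_sq(x, data):
    """Squared Mahalanobis distance from each row of x to the point cloud `data`.

    Hypothetical helper for illustration; uses the pseudoinverse so it also
    works when the sample covariance is singular.
    """
    mu = data.mean(axis=0)                # per-feature mean, shape (d,)
    cov = np.cov(data, rowvar=False)      # (d, d) sample covariance
    inv_cov = linalg.pinv(cov)            # pseudoinverse instead of inverse
    diff = x - mu
    # Row-wise quadratic form diff_i @ inv_cov @ diff_i.
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


# Assumed toy sizes: fewer points than features, as in the length = 1000 case.
rng = np.random.default_rng(0)
n_samples, n_features = 100, 128
data = rng.standard_normal((n_samples, n_features))
x = rng.standard_normal((8, n_features))

# The covariance of n_samples points has rank at most n_samples - 1.
cov_rank = np.linalg.matrix_rank(np.cov(data, rowvar=False))
print("rank:", cov_rank, "of", n_features)

d2 = mahalanobis_sq(x, data)
print(d2)
```

Note that the pseudoinverse only measures distance within the subspace spanned by the data, so the component of x outside that subspace is ignored.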