Find outliers in a matrix X using standard deviation and Frobenius norm


My math professor said that if you have a matrix $X \in \mathbb{R}^{m \times n}$ and you want to detect outliers, you can simply compute the Frobenius norm

$$||X||_F = \sqrt{\sum^m_{i=1}\sum^n_{j=1}|x_{ij}|^2}$$

and divide $||X||_F$ by the standard deviation of $X$

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2}$$

where $N$ is the number of values and $$\bar x = \frac{1}{N}\sum_{i=1}^Nx_i$$

This gives:

$$A = \frac{||X||_F}{\sigma}$$
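To make the two ingredients concrete, here is a minimal sketch that evaluates both formulas by hand and compares them with the built-in functions. The small matrix is a made-up example, not data from my professor. One thing I noticed while writing it: `std` normalizes by $N-1$ by default, so it only matches the $\frac{1}{N}$ formula above when the weight flag `1` is passed.

```matlab
% Hypothetical 2x3 example matrix (an assumption, for illustration only)
X = [1 2 3;
     4 5 6];

% Frobenius norm from the definition: square root of the sum of squared entries
froManual  = sqrt(sum(abs(X(:)).^2));
froBuiltin = norm(X, 'fro');

% Standard deviation of all entries, using the 1/N formula
x = X(:);
N = numel(x);
sigmaN = sqrt(sum((x - mean(x)).^2) / N);

sigmaBuiltinN = std(x, 1);   % weight flag 1 selects the 1/N normalization
sigmaDefault  = std(x);      % default normalization is 1/(N-1)

% Ratio from the formula for A, here with a single sigma for the whole matrix
A = froManual / sigmaN;
```

Note that `std(X)` applied to a matrix returns one value per column, which is why the code further down produces vectors rather than a single number.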

Taking the transpose of $X$ gives:

$$B = \frac{||X^T||_F}{\sigma}$$
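As a side note on the formula for $B$: transposing only reorders the entries of $X$, so the Frobenius norm itself does not change. As far as I can tell, any difference between $A$ and $B$ therefore has to come from $\sigma$, assuming it is computed along columns in one case and along rows in the other (which is what `std(X)` and `std(X')` do in the code):

$$||X^T||_F = \sqrt{\sum^n_{j=1}\sum^m_{i=1}|x_{ij}|^2} = \sqrt{\sum^m_{i=1}\sum^n_{j=1}|x_{ij}|^2} = ||X||_F$$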

What $A$ and $B$ actually are: they act as coordinates of the outliers in the matrix $X$. For example:

% Create a matrix X
X = [1 2 3 4 3;
     2 3 4 5 6;
     2 4 5 4 4;
     4 5 6 7 5;
     6 6 7 3 9;
     3 7 8 9 3;
     8 8 9 1 8];

% Add two outliers
X(3, 3) = 300;
X(7, 5) = 200;

% Compute the coordinates (std(X) is taken per column, std(X') per row of X)
A = norm(X, 'fro')./std(X)
B = norm(X', 'fro')./std(X')

The smallest values indicate the indices of the outliers:

A =

   144.8944   167.5229     3.2580   137.7220     4.9082

B =

   317.3993   228.8799     2.7292   317.3993   166.9278   127.9477     4.1791

In this case, I can see that there are two outliers: the smallest values of $A$ are at indices 3 and 5, and the smallest values of $B$ are at indices 3 and 7.
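Reading the smallest values off by eye can also be done programmatically. The following is a minimal sketch that rebuilds the same matrix and picks the two smallest entries of each array with `sort`. The variable names and the choice of "two smallest" are my own assumptions for this example, since I know two outliers were planted.

```matlab
% Same matrix as above, with the two planted outliers
X = [1 2 3 4 3;
     2 3 4 5 6;
     2 4 5 4 4;
     4 5 6 7 5;
     6 6 7 3 9;
     3 7 8 9 3;
     8 8 9 1 8];
X(3, 3) = 300;
X(7, 5) = 200;

fro = norm(X, 'fro');          % scalar, identical for X and X'
A = fro ./ std(X, 0, 1);       % one value per column (same as std(X))
B = fro ./ std(X, 0, 2)';      % one value per row (same as std(X'))

% Indices of the two smallest values in each array
[~, colOrder] = sort(A);
[~, rowOrder] = sort(B);
suspectCols = sort(colOrder(1:2))   % 3 5
suspectRows = sort(rowOrder(1:2))   % 3 7
```

This only gives candidate rows and columns. Pairing them into (3, 3) and (7, 5) still relies on knowing where the outliers were placed.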

Removing the outliers

By using 2 times the standard deviation of the $B$ array as a threshold, the outliers can be detected.

% Create a matrix X
X = 20*randn(100, 10);

% Add four outliers
X(3, 3) = 300;
X(70, 5) = 200;
X(23, 8) = 300;
X(43, 1) = 300;

% Compute the row coordinates
row_outliers = norm(X', 'fro')./std(X');

% Find the threshold
threshold = 2*std(row_outliers)

% Remove the rows whose value falls below the threshold
X(row_outliers < threshold, :) = [];
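Because the data is random, it helps to look at which rows get flagged before deleting anything. Here is a minimal sketch of the same procedure that keeps the original matrix and reports the flagged row indices. The names `scores`, `flagged` and `Xclean` are my own, and the factor 2 is the same ad hoc choice as above, so I am not certain it always flags exactly the planted rows.

```matlab
% Random test matrix with four planted outliers (rows 3, 23, 43, 70)
X = 20*randn(100, 10);
X(3, 3)  = 300;
X(70, 5) = 200;
X(23, 8) = 300;
X(43, 1) = 300;

% One score per row: Frobenius norm divided by that row's standard deviation
scores = norm(X, 'fro') ./ std(X, 0, 2)';

% Same ad hoc threshold as above
threshold = 2*std(scores);

% Row indices that would be removed
flagged = find(scores < threshold)

% Remove them from a copy so X stays available for comparison
Xclean = X;
Xclean(flagged, :) = [];
fprintf('%d of %d rows removed\n', numel(flagged), size(X, 1));
```

Comparing `flagged` with the known rows 3, 23, 43 and 70 over several runs shows how often the threshold catches exactly the planted outliers.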

Question:

Is it common to use the standard deviation and the Frobenius norm together to find the outliers of a matrix?