Maronna method: Covariance

What is the algorithm of the Maronna method for calculating correlation or covariance? I am not able to understand the steps.

It would be great if someone could explain how to calculate it for three vectors, say v1 = [1,2,4,6,3], v2 = [4,9,3,7,1] and v3 = [1,2,3,4,5].

For reference see: https://www.cs.ubc.ca/~rng/psdepository/algo2006.pdf

The reference you mention is not very clear, to say the least. I think it's better to start from the original paper (http://projecteuclid.org/download/pdf_1/euclid.aos/1176343347) and also to consider the more recent paper http://amstat.tandfonline.com/doi/pdf/10.1198/004017002188618509. The method in the second one is more understandable (and seems to be what the likes of Matlab and Intel implement as the "Orthogonalized Gnanadesikan-Kettenring estimate" (OGK)).


The method explained in the first paper is also non-trivial and yields a family of estimators rather than a single one. The aim is to recover the location $t$ and the scatter matrix $V$, assuming you observe realisations from a distribution of the form $(\det V)^{-1/2} h[(x-t)^T V^{-1}(x-t)]$, where $h$ is a possibly unknown function.

The family of estimators Maronna considers are solutions of the following system of equations:

$$ {1\over n} \sum_i u_1[d(x_i,t;V^{-1})](x_i-t) = 0 \tag{1.1}$$ $$ {1\over n} \sum_i u_2 [d^2(x_i,t;V^{-1})](x_i-t)(x_i-t)^T = V \tag{1.2}$$

where $d^2(x,y;M) = (x-y)^T M (x-y)$ for $M$ positive definite. This is known as an $M$-norm or Mahalanobis distance; since $M$ is PD you can write $M=LL^T$, so that $d(x,y;M) = \|L^T(x-y)\|_2$.
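For concreteness, here are the two equivalent ways of computing the squared distance (a NumPy sketch; the function names are mine):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """d^2(x, y; M) = (x - y)^T M (x - y), for M positive definite."""
    diff = np.asarray(x) - np.asarray(y)
    return diff @ M @ diff

def mahalanobis_sq_chol(x, y, M):
    """Same quantity via the Cholesky factor M = L L^T: d = ||L^T (x - y)||_2."""
    L = np.linalg.cholesky(M)
    return np.linalg.norm(L.T @ (np.asarray(x) - np.asarray(y))) ** 2
```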

So it's a family of estimators, since you have the choice of $u_1,u_2$. A particular choice discussed in the paper is $u_1(s)=u_2(s)=u(s)$ with a specific form of $u$ (involving logs), for which you recover the maximum-likelihood estimates.

Ok so the basic idea is that you build estimators $(t,V)$ using weighted sums:

$$ \sum_i w_i^{(1)} (x_i-t) = 0$$ $$ \sum_i w_i^{(2)}(x_i-t)(x_i-t)^T = V$$

where $w_i^{(1)} = u_1[d(x_i,t;V^{-1})]/n$ etc. Note that if you take $w_i^{(1,2)}=1/n$ you get the MLE.
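As a sanity check, with uniform weights $w_i = 1/n$ these weighted sums reduce to the sample mean and the (biased) sample covariance, i.e. the Gaussian MLEs (a quick NumPy sketch; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))      # n observations in p dimensions
n = len(X)
w = np.full(n, 1.0 / n)            # uniform weights w_i^{(1,2)} = 1/n

t = (w[:, None] * X).sum(axis=0)   # solves sum_i w_i (x_i - t) = 0
diff = X - t
V = (w[:, None] * diff).T @ diff   # sum_i w_i (x_i - t)(x_i - t)^T

# With uniform weights these are exactly the classical estimates:
assert np.allclose(t, X.mean(axis=0))
assert np.allclose(V, np.cov(X, rowvar=False, bias=True))
```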

Now, assuming $u_1$ and $u_2$ satisfy some general properties, a few nice results can be proven (refer to the paper; this forms 80% of it).

A way to solve equations of the type (1.1), (1.2) is a fixed-point approach. Let's first write, as in the paper, the more general system:

$$ E_P \left\{u_1[d(x,t;V^{-1})](x-t)\right\} = 0 \tag{2.1} $$ $$E_P \left\{u_2[d^2(x,t;V^{-1})](x-t)(x-t)^T\right\} = V \tag{2.2} $$

under a distribution $P$ where $E_P$ denotes the expectation operator (so that (1.1) and (1.2) are just using the empirical distribution). Maronna then defines

$$ F(t,V) := E_P \left\{u_2[d^2(x,t;V^{-1})](x-t)(x-t)^T\right\} \tag{3.1}$$ $$g(t,V) := {E_P \left\{u_1[d(x,t;V^{-1})]x\right\}\over E_P\left\{ u_1[d(x,t;V^{-1})]\right\}} \tag{3.2}$$

A solution must satisfy $F(t,V)=V$ and $g(t,V)=t$, so you can run an iterative method aiming to find a fixed point of the system:

$$ \left\{\begin{array}{rcl} t_{n+1} &=& g(t_n,V_n)\\ V_{n+1} &=& F(t_n,V_n)\end{array}\right.\tag{A}$$

(use the empirical versions of $g$ and $F$ as in (1.1) and (1.2))
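Here is a minimal sketch of iteration (A) on the empirical versions of $g$ and $F$, using Huber-type weights for $u_1$ and $u_2$. The weight choice, the tuning constant $c$, and all names are my assumptions, not the paper's exact recipe:

```python
import numpy as np

def maronna_fixed_point(X, c=2.0, n_iter=200, tol=1e-9):
    """Fixed-point iteration (A), a sketch with Huber-type weights (my choice):
    u_1(d)   = min(1, c/d)       applied to the distance d,
    u_2(d^2) = min(1, c^2/d^2)   applied to the squared distance."""
    n, p = X.shape
    t = X.mean(axis=0)                       # initialise at the classical estimates
    V = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        diff = X - t
        Vinv = np.linalg.inv(V)
        d = np.sqrt(np.einsum('ij,jk,ik->i', diff, Vinv, diff))  # d(x_i, t; V^{-1})
        d = np.maximum(d, 1e-12)             # guard against division by zero
        w1 = np.minimum(1.0, c / d)          # u_1[d(x_i, t; V^{-1})]
        w2 = np.minimum(1.0, (c / d) ** 2)   # u_2[d^2(x_i, t; V^{-1})]
        t_new = (w1[:, None] * X).sum(axis=0) / w1.sum()  # empirical g(t, V)
        V_new = (w2[:, None] * diff).T @ diff / n         # empirical F(t, V)
        done = np.abs(t_new - t).max() < tol and np.abs(V_new - V).max() < tol
        t, V = t_new, V_new
        if done:
            break
    return t, V
```

Roughly, Maronna's existence conditions require $\sup_s s\,u_2(s) > p$; with this $u_2$ that means $c^2 > p$, so pick $c$ accordingly (and note his remark about the faster $G$-based update).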

The author notes that using $G(t,V)=F(t,V)V^{-1}F(t,V)$ in place of $F$ for the $V$-update is much faster.


Ok so now you should have a rough understanding of Maronna's estimators. Now, what about the paper you mention? Looking at its "Figure 4" gives some hints. (My understanding of that paper is superficial, to say the least, so bear with me here; I also believe there are a few typos in important places.)

What they do in https://www.cs.ubc.ca/~rng/psdepository/algo2006.pdf

I think they do the above method with $F$ (note Maronna's comment not to do that, and to use $G$ instead), with $u_1$ the weight function derived from the Huber score (see also Example (i) in Maronna's paper). So $u_1(s) = \text{weight}(s)$, where "weight" is defined in equation (1) of the paper. You should be able to recognise the different parts: the computation of the Mahalanobis distance, etc., and the update for $\mu^{k+1}$, which looks like (A).
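For reference, here is the standard Huber weight function. I believe this matches what the paper calls "weight" in its equation (1), but the exact form and the default cutoff are my assumptions, so check against the paper:

```python
import numpy as np

def huber_weight(d, c=1.345):
    """Standard Huber weight: 1 inside the cutoff c, c/|d| outside.
    The default c = 1.345 is a common choice, not taken from the paper."""
    d = np.abs(np.asarray(d, dtype=float))
    return np.where(d <= c, 1.0, c / np.maximum(d, 1e-12))
```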

Note that there's a typo in the update of $\sigma^{k+1}$: the transpose is on the wrong term. Also, I don't think it should be $(W[q])^2$ (it should be a function of the squared distance, not the square of a function of the distance). But this you should check for yourself.

Hopefully this clarifies the whole story a bit. (And hopefully you can now see how to compute these estimators on a toy example. I would refrain from trying it on the example you suggested, since it's likely to fail miserably: you have too few data points, just 5 observations in 3 dimensions.)

Other Notes

I found this stuff reasonably interesting, so here are a few notes (who knows, someone may read this some day too):

  • Maronna says in the original paper that he only tried the bivariate case, and that the method doesn't perform well in higher dimensions / is too expensive (!). But that was in 1974, so we can probably do better now...
  • The OGK is said to do well up to 100-200 dimensions, and the UBC paper indicates that their method does better than OGK because OGK is not affine equivariant.
  • The UBC paper actually discusses applying their method to a very large, high-dimensional dataset with both "QC" and Maronna, and seems to encourage using Maronna in quite a few scenarios (Figure 20).