What is the purpose of the contrast function in independent component analysis, specifically fast ICA?


What exactly is meant by a contrast function in this context? How does it differ from the objective function? Why is it that several contrast functions can work here?


I would say that in this context, the contrast function is a particular kind of objective function that measures how far the distribution of an estimated independent component (IC) is from a Gaussian distribution. And there are so many possible contrast functions because the only requirements on the function are that it has a local extremum exactly where an IC's distribution is maximally non-Gaussian and, when super- and sub-Gaussian ICs have to be distinguished, that its sign differs between the two cases.

But let's back this up with some theory:

As you probably know, the basic idea of ICA is to isolate source variables that are assumed to be statistically independent, starting from known observations (which are modelled as linear combinations of the unknown independent sources).

Mathematically, the observations can be written as $\mathbf{x}=(x_1, x_2, \dots, x_n)$ and the independent source variables $s_i$ as the vector $\mathbf{s}$. Further, $\mathbf{A}$ is a full-rank matrix that contains the mixing information. Then: \begin{equation} \mathbf{x} = \mathbf{A s} \end{equation}
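To make the model concrete, here is a minimal sketch of $\mathbf{x} = \mathbf{A s}$ (the sources, the mixing matrix, and all names are my own illustrative choices, not anything prescribed by FastICA): two independent, zero-mean, unit-variance, non-Gaussian sources are linearly mixed.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 5000
# Two independent non-Gaussian sources, each with zero mean and unit variance:
s = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), T),   # uniform: sub-Gaussian
               rng.laplace(0, 1 / np.sqrt(2), T)])        # Laplacian: super-Gaussian

A = np.array([[1.0, 0.5],
              [0.3, 2.0]])   # full-rank mixing matrix
x = A @ s                    # the observations ICA starts from

print(x.shape)   # (2, 5000)
```

ICA only ever sees `x`; the whole point is to recover `s` (up to sign, scale, and permutation) without knowing `A`.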

Please note that here the combination of $\mathbf{A}$ and $\mathbf{s}$ has to be estimated from the observed vector $\mathbf{x}$ alone. This problem can be solved with a deflationary (component-by-component) or a symmetric (parallel) approach, but both have in common that a measure (objective function) is needed that rates the independence of the estimated sources.

Exact measures are mutual information and negentropy, and the article Minimization of Mutual Information makes clear why non-Gaussianity is equally suitable (among all distributions with fixed variance, the Gaussian has maximum entropy).

This means that the independence of the sources can be measured by the difference (contrast) between each variable's distribution and a Gaussian. Negentropy itself is quite difficult to calculate, as it requires knowing the full density functions, so it is reasonable to take a higher-order moment such as the kurtosis as the measure.

Keep in mind that negentropy is positive for all distributions and zero only for a Gaussian, while the (excess) kurtosis is zero for Gaussian distributions, negative for sub-Gaussian and positive for super-Gaussian ones. Therefore, the square of the kurtosis is a suitable approximation (under the constraints of zero mean and unit variance):

\begin{equation} \operatorname{kurt}^2(\mathbf{w}^T \mathbf{x}) = \left(E\{(\mathbf{w}^T \mathbf{x})^4\} - 3\right)^2 \end{equation}
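The sign behaviour of the excess kurtosis is easy to check empirically. A small sketch (my own, for illustration): estimate $\operatorname{kurt}(y) = E\{y^4\} - 3$ for standardized samples from the three distribution families mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)

def kurt(y):
    """Empirical excess kurtosis after enforcing zero mean, unit variance."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3

T = 200_000
print(kurt(rng.normal(size=T)))           # ~0    (Gaussian)
print(kurt(rng.uniform(-1, 1, size=T)))   # ~-1.2 (sub-Gaussian)
print(kurt(rng.laplace(size=T)))          # ~+3   (super-Gaussian)
```

This is why the *square* of the kurtosis is taken above: it turns both the negative (sub-Gaussian) and positive (super-Gaussian) cases into something to maximize.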

Indeed, Hyvärinen showed in one of his papers that this objective function reaches a local maximum exactly when the linear combination equals one of the ICs: $\mathbf{w}^T \mathbf{x} = \pm s_i$
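This local-maximum property can be demonstrated numerically. The following sketch (my own illustration, not code from Hyvärinen's papers) runs a one-unit, kurtosis-based fixed-point iteration on whitened data; the found $\mathbf{w}$ yields $\mathbf{w}^T \mathbf{z} \approx \pm s_i$ for one of the sources, up to estimation error.

```python
import numpy as np

rng = np.random.default_rng(3)

T = 20_000
s = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), T),   # sub-Gaussian source
               rng.laplace(0, 1 / np.sqrt(2), T)])        # super-Gaussian source
x = np.array([[1.0, 0.5], [0.3, 2.0]]) @ s                # observed mixtures

# Whiten: z = V x with cov(z) = I, so that unit-norm w gives unit variance w^T z.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E @ np.diag(d ** -0.5) @ E.T) @ x

w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    w = (z * (w @ z) ** 3).mean(axis=1) - 3 * w   # fixed point for kurt(w^T z)
    w /= np.linalg.norm(w)

corr = np.corrcoef(w @ z, s)[0, 1:]   # |correlation| with one source is ~1
```

The recovered component correlates almost perfectly with exactly one source and is nearly uncorrelated with the other, which is the $\pm s_i$ statement in practice (the sign is inherently undetermined).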

Knowing these requirements, Hyvärinen looked at arbitrary functions and proved that the extrema of practically any well-behaved, non-quadratic even function $G$ coincide with the independent components, so $G$ can be used as the non-linearity in the contrast function (here $v$ is a standardized Gaussian variable): \begin{equation} J_G(\mathbf{w}) = [E_x\{G(\mathbf{w}^T \mathbf{x})\} - E_v\{G(v)\}]^2 \end{equation}
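A sketch of what $J_G$ measures for a single standardized variable $y$, using $G(u) = \log\cosh(u)$ as the non-linearity (the Gaussian reference term $E_v\{G(v)\}$ is estimated by Monte Carlo here; all names are my own):

```python
import numpy as np

rng = np.random.default_rng(2)

G = lambda u: np.log(np.cosh(u))
E_G_gauss = np.mean(G(rng.normal(size=1_000_000)))   # Monte Carlo E_v{G(v)}

def J(y):
    y = (y - y.mean()) / y.std()   # enforce zero mean, unit variance
    return (np.mean(G(y)) - E_G_gauss) ** 2

T = 200_000
print(J(rng.normal(size=T)))      # ~0: a Gaussian shows no contrast
print(J(rng.laplace(size=T)))     # > 0 for a super-Gaussian ...
print(J(rng.uniform(-1, 1, T)))   # ... and for a sub-Gaussian (squaring drops the sign)
```

Exactly as with the squared kurtosis, $J_G$ vanishes only for a Gaussian and is positive on both sides of it.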

Having this whole set of possible contrast functions, choosing the best one is another matter, but simply put, it has to be robust to outliers and have a low asymptotic variance. Practice has shown that $G(u) = \log\cosh(a_1 u)$ with $1 \le a_1 \le 2$ is good for most use cases, $G(u) = -\exp(-a_2 u^2 / 2)$ works well for super-Gaussian ICs, and $G(u) = u^4$ may be used for sub-Gaussian ICs when very few outliers are present (because $u^4$ is very sensitive to large $u$).
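For reference, here is a sketch of these three non-linearities together with their derivatives $g = G'$ (the fixed-point iteration itself works with $g$ rather than $G$); $a_1 = a_2 = 1$ are common defaults, and the scalings are my own convenience choices:

```python
import numpy as np

a1, a2 = 1.0, 1.0

G_logcosh = lambda u: np.log(np.cosh(a1 * u)) / a1
g_logcosh = lambda u: np.tanh(a1 * u)              # robust general-purpose choice

G_exp = lambda u: -np.exp(-a2 * u ** 2 / 2) / a2
g_exp = lambda u: u * np.exp(-a2 * u ** 2 / 2)     # suited to super-Gaussian ICs

# Scaled to u^4 / 4 so that g is exactly u^3; scaling does not move the extrema.
G_pow4 = lambda u: u ** 4 / 4
g_pow4 = lambda u: u ** 3                          # kurtosis-based, outlier-sensitive
```

The tanh non-linearity grows only linearly for large $|u|$, which is precisely why the log-cosh choice is so much more robust to outliers than the kurtosis-based $u^3$.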

And back to the definitions: Hyvärinen refers to the non-linear function $G$ as the contrast function, and sometimes he applies the same name to the whole objective function $J_G$.

For me, it is reasonable to call:

  • $G$ the non-linearity / mapping function, and

  • $J$ the objective function, which can also be called the contrast function because it measures the contrast to a Gaussian distribution.