Intuition about multiplicative gradient descent


Suppose we want to minimize a function $f(x)$ with respect to $x$, i.e., we want to solve

$$x^* = \arg \min_x f(x)$$

One method to solve such problems is gradient descent. In gradient descent, one uses the update rule,

$$x_{n+1} = x_n - \eta_n \nabla_x f(x) \big|_{x=x_n}$$

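For concreteness, here is the additive update applied to the toy objective $f(x) = (x-3)^2$ (my own illustrative choice, not from the referenced paper), with a fixed step size:

```python
# Additive gradient descent on f(x) = (x - 3)^2, whose minimizer is x* = 3.
def grad_f(x):
    return 2.0 * (x - 3.0)  # f'(x)

x = 0.0    # initial guess x_0
eta = 0.1  # fixed step size eta_n = 0.1
for _ in range(100):
    x = x - eta * grad_f(x)

print(x)  # approaches 3.0
```

Each iteration moves $x_n$ a distance $\eta_n \lVert \nabla f(x_n) \rVert$ against the gradient, which is why the scheme is called additive.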
This is the so-called additive gradient descent. There is another approach: split the gradient into its positive and negative parts, $\nabla_x f = [\nabla_x f]^+ - [\nabla_x f]^-$ (both elementwise nonnegative), and choose the step size elementwise as $\eta_n = x_n / [\nabla_x f(x_n)]^+$. Substituting into the additive rule, the additive terms cancel and the update becomes a multiplication of $x_n$ by the ratio of the negative and positive parts of the gradient,

$$x_{n+1} = x_n - \frac{x_n}{[\nabla_x f(x_n)]^+} \left( [\nabla_x f(x_n)]^+ - [\nabla_x f(x_n)]^- \right) = x_n \, \frac{[\nabla_x f(x_n)]^-}{[\nabla_x f(x_n)]^+}$$

This keeps the iterates nonnegative, and the procedure is convergent (I do not know why). You can see an example from nonnegative matrix factorisation here. The additive gradient update is given by Eq. (6) in that paper and the multiplicative update rule by Eq. (4).
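As a concrete instance of this construction, here is a minimal NumPy sketch (my own, not code from the paper) of the Lee–Seung multiplicative updates for $\min_{W,H \ge 0} \|V - WH\|_F^2$. For $H$, the gradient splits as $[\nabla]^+ = W^\top W H$ and $[\nabla]^- = W^\top V$, so the update multiplies $H$ by the ratio $(W^\top V) / (W^\top W H)$, and symmetrically for $W$:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((6, 5))   # nonnegative data matrix to factorize
k = 3                    # inner rank of the factorization
W = rng.random((6, k))   # nonnegative initial factors
H = rng.random((k, 5))

eps = 1e-12  # small constant to guard against division by zero
err = [np.linalg.norm(V - W @ H)]
for _ in range(200):
    # Multiplicative updates: factor times (negative part / positive part)
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
    err.append(np.linalg.norm(V - W @ H))

print(err[0], err[-1])  # reconstruction error shrinks over the iterations
```

Because the update only ever multiplies by nonnegative ratios, $W$ and $H$ stay nonnegative automatically, with no projection step needed; that is the practical appeal of the multiplicative form.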

I just want to gain physical intuition about this procedure.

Thanks!