How does one derive, in detail, the multiplicative update rules for Nonnegative Matrix Factorization?
Minimize $\left \| V - WH \right \|^2$ with respect to $W$ and $H$, subject to the constraints $W,H \geq 0.$ The multiplicative update rules are as follows:
\begin{equation} W_{ij} \leftarrow W_{ij} \frac{(VH^T)_{ij}}{(WHH^T)_{ij}} \end{equation}
\begin{equation} H_{ij} \leftarrow H_{ij} \frac{(W^TV)_{ij}}{(W^TWH)_{ij}} \end{equation}
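For concreteness, the two update rules can be sketched directly in NumPy (the function name, the random initialization, and the small `eps` guarding against division by zero are my additions, not part of the rules themselves):

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-10, seed=0):
    """Factor a nonnegative V (m x n) as W @ H, with W (m x r) and H (r x n),
    using the Lee-Seung multiplicative update rules."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # W <- W * (V H^T) / (W H H^T), elementwise
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because both updates multiply a nonnegative iterate by a nonnegative ratio, $W$ and $H$ stay nonnegative automatically, with no projection step needed.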
Introducing Lagrange multiplier matrices $\Psi$ and $\Phi$ for the constraints $W \geq 0$ and $H \geq 0$, the Lagrangian $\mathcal{L}$ is: $$\mathcal{L}(W,H) =\left \| V - WH \right \|^2-\operatorname{Tr}(\Psi W^T)-\operatorname{Tr}(\Phi H^T)$$
Taking the gradients with respect to $W$ and $H$:
$$\nabla_W \mathcal{L}(W,H) = -2VH^T + 2WHH^T-\Psi$$ $$\nabla_H \mathcal{L}(W,H) = -2W^TV + 2W^TWH- \Phi$$
Setting these gradients to zero gives $\Psi = 2WHH^T - 2VH^T$ and $\Phi = 2W^TWH - 2W^TV$. Substituting into the KKT complementary-slackness conditions $\Psi_{ij}W_{ij}=0$ and $\Phi_{ij}H_{ij}=0$ yields:
$$(-2VH^T + 2WHH^T)\circ W=0$$ $$(-2W^TV + 2W^TWH)\circ H=0$$
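(A step worth making explicit: the first of these conditions reads, elementwise, $(WHH^T)_{ij}W_{ij} = (VH^T)_{ij}W_{ij}$. Wherever $W_{ij}>0$ this is equivalent to
$$W_{ij} = W_{ij}\,\frac{(VH^T)_{ij}}{(WHH^T)_{ij}},$$
so the multiplicative rule can be read as a fixed-point iteration on the stationarity condition, and likewise for $H$.)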
The question is: why are the resulting update rules \begin{equation} W_{ij} \leftarrow W_{ij} \frac{(VH^T)_{ij}}{(WHH^T)_{ij}} \end{equation} \begin{equation} H_{ij} \leftarrow H_{ij} \frac{(W^TV)_{ij}}{(W^TWH)_{ij}} \end{equation}
and not the reciprocal versions \begin{equation} W_{ij} \leftarrow W_{ij} \frac{(WHH^T)_{ij} }{(VH^T)_{ij}} \end{equation} \begin{equation} H_{ij} \leftarrow H_{ij} \frac{(W^TWH)_{ij}}{(W^TV)_{ij}} \end{equation}
In addition, why should the learning rates be chosen as $\eta_W = \frac{W}{WHH^T}$ and $\eta_H = \frac{H}{W^TWH}$, as in the referenced paper?
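(For reference, this elementwise learning rate is precisely what collapses an additive gradient step into the multiplicative rule. With the factor of $2$ from the gradient $\nabla_H \left\|V-WH\right\|^2 = 2W^TWH - 2W^TV$ absorbed, i.e. $\eta_H = \frac{H}{2\,W^TWH}$ elementwise,
$$H \;\leftarrow\; H - \eta_H \circ \left(2W^TWH - 2W^TV\right) = H - H + H\circ\frac{W^TV}{W^TWH} = H \circ \frac{W^TV}{W^TWH},$$
and the same computation with $\eta_W = \frac{W}{2\,WHH^T}$ recovers the rule for $W$.)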
Note that near a solution point, the Hadamard (elementwise) fraction is approximately the all-ones matrix: $$\left(\frac{W^TV}{W^TWH}\right) \approx \mathbf{1}$$ If the elements of $H$ are too big, the elements of the Hadamard fraction fall below unity (since $H$ appears in the denominator); if they are too small, the fractional elements rise above unity.
This self-correcting behavior is exactly what a convergent iterative method requires: $$H_+ = H \circ\left(\frac{W^TV}{W^TWH}\right)$$ On the other hand, the reciprocal fraction $$\left(\frac{W^TWH}{W^TV}\right) \approx \mathbf{1}$$ has the opposite behavior: large elements in $H$ produce a Hadamard fraction with elements greater than unity, leading to even larger elements in $H$ at the next iteration, and eventually to divergence.
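This self-correcting versus runaway behavior is easy to check numerically. The sketch below (all names are mine; $W$ is held fixed so only the $H$-update is compared) runs both fractions from the same starting point:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((8, 3))       # held fixed; only H is updated
H_true = rng.random((3, 6))
V = W @ H_true               # an exact nonnegative factorization exists

H0 = rng.random((3, 6)) + 0.1  # shared positive starting point

def run(update, n_iter=50):
    """Apply an H-update rule repeatedly, recording the residual norms."""
    H = H0.copy()
    errs = [np.linalg.norm(V - W @ H)]
    for _ in range(n_iter):
        H = update(H)
        errs.append(np.linalg.norm(V - W @ H))
    return errs

# Correct rule: H <- H * (W^T V) / (W^T W H)
correct = run(lambda H: H * (W.T @ V) / (W.T @ W @ H))
# Reciprocal rule: H <- H * (W^T W H) / (W^T V)
flipped = run(lambda H: H * (W.T @ W @ H) / (W.T @ V))
```

In runs like this, `correct` shrinks the residual steadily while `flipped` grows it without bound, matching the stability argument above.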