Derivation of gradient for non negative matrix factorization

1.2k Views Asked by At

I am looking at a paper for non-negative matrix factorization and can't seem to figure out the derivation for the gradient. The function is as follows:

$f(W,H) = \frac{1}{2}||V-WH ||^2_F$

Where V is fixed.

The resulting gradients are:

$\nabla_W f(W,H) = (WH-V)H^T $

$\nabla_H f(W,H) = W^T(WH-V) $

I tried looking up rules in various reference manuals but never got anything solid. Even a hint / solid resource would be incredibly helpful.

2

There are 2 best solutions below

2
On BEST ANSWER

I find it's often easiest to compute matrix-derivatives in these terms, then translate to whatever the desired means of expressing the derivative is. That is, I prefer to think of $\nabla_W f$ as a map from $\Bbb R^{m \times n}$ to the set of linear functionals on $\Bbb R^{m \times n}$.

We find that $$ \DeclareMathOperator{\tr}{tr} f(W + \delta W, H) = \frac 12 \tr[(V - (W + \delta W)H)(V - (W + \delta W)H)^T] =\\ \frac 12 \tr[[(V - WH)- \delta WH][(V - WH) - \delta WH]^T] =\\ f(W,H) + \tr[(WH - V)(\delta W H)^T] + o(\|\delta W\|) =\\ f(W,H) + \tr[[(WH - V)H^T](\delta W)^T] + o(\|\delta W\|) $$ Thus, I would say that $$ [D_W f(W,H)](\delta W) = \tr[[(WH - V)H^T]^T(\delta W)] $$ Translating this to your matrix calculus in which $\nabla_W$ should be a matrix, we find that $$ \nabla _W f(W,H) = (WH - V)H^T $$ as desired.

0
On

For convenience, define the variable $$\eqalign{ M &= WH-V \cr }$$ Write the function in terms of the double-dot (aka Frobenius) product and this new variable. In this form it's easy to find the differential $$\eqalign{ f &= \frac{1}{2}\,M:M \cr\cr df &= M:dM \cr &= M:(dW\,H+W\,dH) \cr &= MH^T:dW + W^TM:dH \cr\cr }$$ Setting $dH=0$ yields the gradient wrt $W$ $$\eqalign{ \frac{\partial f}{\partial W} &= MH^T \cr }$$ while setting $dW=0$ yields the gradient wrt $H$ $$\eqalign{ \frac{\partial f}{\partial H} = W^TM \cr\cr }$$ Double-dot products can be rearranged in a variety of ways, e.g. $$\eqalign{ AB:C &= A:CB^T \cr &= B:A^TC \cr }$$ which follow from its equivalence to the trace $$A:B=\operatorname{tr}(A^TB)$$ and the cyclic properties of the trace.