Gradient of Cost Function To Find Matrix Factorization


The aim is to compute a matrix factorization $X \approx WZ^T$ by gradient descent optimization.

I have the following matrix definitions:

$X \in \mathbb{R}^{D \times N}$

$W \in \mathbb{R}^{D \times K}$

$Z \in \mathbb{R}^{N \times K}$

The (halved) mean squared error over the observed entries is used as the cost function:

$f(W,Z) = \frac{1}{|\Omega|} \sum_{(d,n) \in \Omega} \frac{1}{2} \big[ x_{dn} - (WZ^T)_{dn} \big]^2$.

Some entries of the matrix $X$ might be missing, so the sum is carried out only over $\Omega$, the set of observed entries of $X$.
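For concreteness, here is a minimal sketch of how this masked cost could be evaluated in NumPy. The variable names and the use of a boolean `mask` array to encode $\Omega$ are my own assumptions, not part of the question:

```python
import numpy as np

def cost(X, W, Z, mask):
    """Halved mean squared error over the observed entries.

    X    : (D, N) data matrix (missing entries can hold any value)
    W, Z : (D, K) and (N, K) factor matrices
    mask : (D, N) boolean array, True where x_dn is observed (the set Omega)
    """
    E = X - W @ Z.T                          # residuals e_dn for all entries
    return 0.5 * np.sum(E[mask] ** 2) / mask.sum()
```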

In order to use gradient descent, the gradient of the cost function with respect to $W$ and $Z$ has to be computed...

and I would like to ask: how would you calculate the gradient of the cost function?

This is how I started:

  1. I define the error: $e_{dn} = x_{dn} - \sum_{k=1}^K w_{dk} z_{nk}$
  2. I calculate the derivative with respect to $w_{ij}$: $\frac{\partial e_{dn}}{\partial w_{ij}} = -\sum_{k=1}^K \frac{\partial}{\partial w_{ij}} w_{dk} z_{nk} = -z_{nj}$ if $d = i$, and $0$ otherwise
  3. I calculate the derivative with respect to $z_{ij}$: $\frac{\partial e_{dn}}{\partial z_{ij}} = -\sum_{k=1}^K \frac{\partial}{\partial z_{ij}} w_{dk} z_{nk} = -w_{dj}$ if $n = i$, and $0$ otherwise

... but then I don't know how to continue?

1 Answer

I am certain that you must have initialized the matrices $W$ and $Z$ with some values. Combining your per-entry derivatives with the chain rule gives the gradient of the cost: $$\frac{\partial f}{\partial w_{ij}} = -\frac{1}{|\Omega|} \sum_{n:(i,n) \in \Omega} e_{in}\, z_{nj}, \qquad \frac{\partial f}{\partial z_{ij}} = -\frac{1}{|\Omega|} \sum_{d:(d,i) \in \Omega} e_{di}\, w_{dj}.$$ You choose a learning rate $\eta$ and then update the matrices by stepping against the gradient: $$W^{t+1} = W^t - \eta\, \nabla_W f(W^t, Z^t)$$ $$Z^{t+1} = Z^t - \eta\, \nabla_Z f(W^t, Z^t)$$ You keep doing this until convergence, i.e. until the error reaches a (local) minimum. I hope this is what you are looking for.
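A minimal NumPy sketch of this full-gradient update, under the same assumptions as the cost snippet above (a boolean `mask` encodes $\Omega$; the names `learning_rate`, `n_iters`, and the small random initialization are illustrative choices, not from the answer):

```python
import numpy as np

def factorize(X, mask, K, learning_rate=0.1, n_iters=1000, seed=0):
    """Gradient descent on f(W, Z) = (1/|Omega|) * sum over Omega of 0.5 * e_dn^2."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    W = 0.1 * rng.standard_normal((D, K))     # random initialization of W
    Z = 0.1 * rng.standard_normal((N, K))     # random initialization of Z
    n_obs = mask.sum()                        # |Omega|
    for _ in range(n_iters):
        E = np.where(mask, X - W @ Z.T, 0.0)  # e_dn on Omega, 0 elsewhere
        grad_W = -(E @ Z) / n_obs             # df/dW, shape (D, K)
        grad_Z = -(E.T @ W) / n_obs           # df/dZ, shape (N, K)
        W -= learning_rate * grad_W
        Z -= learning_rate * grad_Z
    return W, Z
```

Zeroing the residuals outside $\Omega$ lets the two sums over observed entries be written as plain matrix products, which is why `E @ Z` and `E.T @ W` reproduce the componentwise gradients above.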