The aim is to compute a matrix factorization $X \approx WZ^T$ by gradient descent.
I have the following matrix definitions:
$X\in \mathbb{R}^{D \times N}$
$W\in \mathbb{R}^{D \times K}$
$Z\in \mathbb{R}^{N \times K}$
The cost function is the mean squared error over the observed entries:
$f(W,Z) = \frac{1}{|\Omega|} \sum_{(d,n) \in \Omega} \frac{1}{2} \big[ x_{dn}-(WZ^T)_{dn}\big]^2$.
Some entries of $X$ may be missing, so the sum runs only over the set of observed entries of $X$, denoted $\Omega$.
In order to use gradient descent, the gradient of the cost function with respect to $W$ and $Z$ has to be computed,
and I would like to ask: how would you calculate this gradient?
This is how I started:
- I define the error: $e_{dn} = x_{dn} - \sum_{k=1}^K w_{dk} z_{nk}$
- I calculate the derivative with respect to $w_{ij}$: $\frac{\partial e_{dn}}{\partial w_{ij}}=-\sum_{k=1}^K \frac{\partial}{\partial w_{ij}} w_{dk} z_{nk} = -z_{nj}$ if $d=i$, and $0$ otherwise
- I calculate the derivative with respect to $z_{ij}$: $\frac{\partial e_{dn}}{\partial z_{ij}}=-\sum_{k=1}^K \frac{\partial}{\partial z_{ij}} w_{dk} z_{nk} = -w_{dj}$ if $n=i$, and $0$ otherwise
... but then I don't know how to continue from here.
You must have initialized the matrices $W$ and $Z$ with some values. From the per-entry derivatives you derived, the chain rule gives the gradient of the full cost function: $$\frac{\partial f}{\partial w_{dk}} = -\frac{1}{|\Omega|}\sum_{n:(d,n)\in\Omega} e_{dn}\, z_{nk}, \qquad \frac{\partial f}{\partial z_{nk}} = -\frac{1}{|\Omega|}\sum_{d:(d,n)\in\Omega} e_{dn}\, w_{dk}.$$ You then choose a learning rate $\eta$ and update the matrices by subtracting the gradient scaled by the learning rate: $$W^{t+1} = W^t -\eta\,\nabla_W f(W^t,Z^t)$$ $$Z^{t+1} = Z^t -\eta\,\nabla_Z f(W^t,Z^t)$$ You keep doing this until convergence, i.e. until the cost stops decreasing. I hope this is what you are looking for.
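To make the procedure concrete, here is a minimal NumPy sketch of the full-batch update loop described above (not code from the question; the function name `factorize`, the parameters `eta` and `n_iters`, and the use of `np.nan` to mark the missing entries of $X$ are illustrative assumptions):

```python
import numpy as np

def factorize(X, K=10, eta=0.01, n_iters=1000, seed=0):
    """Gradient descent for X ~ W @ Z.T over the observed entries of X."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    W = 0.1 * rng.standard_normal((D, K))
    Z = 0.1 * rng.standard_normal((N, K))

    mask = ~np.isnan(X)                  # Omega: observed entries
    n_obs = mask.sum()
    X_filled = np.where(mask, X, 0.0)    # missing entries contribute nothing

    for _ in range(n_iters):
        # e_dn on Omega, 0 elsewhere
        E = np.where(mask, X_filled - W @ Z.T, 0.0)
        grad_W = -(E @ Z) / n_obs        # df/dW
        grad_Z = -(E.T @ W) / n_obs      # df/dZ
        W -= eta * grad_W
        Z -= eta * grad_Z

    return W, Z
```

In practice one often updates on single observed entries (stochastic gradient descent) rather than forming the full gradient, but the full-batch version above matches the update rules as written.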