Is there a nuclear norm approximation for stochastic gradient descent optimization?

957 Views Asked by At

I want to minimize $E$ by using stochastic gradient descent. I know that there is a sub-differential for the nuclear norm, but i want to know if is there a approximation of nuclear norm in order to use stochastic gradient descent (s.g.d.) directly?. The cost function is defined as below:

$$E(\Omega) = f(\Omega) + \lambda \|\Omega\|_{*},$$

where $\lambda$ is the trade-off parameter and $\Omega \in \mathbf{R}^{m \times m}$ is a matrix which I want to optimize by using stochastic gradient descent I can minimize the first term of $E$ (loss function $f(\Omega)$) by using s.g.d. directly, but I can not minimize both due to the nuclear norm's not being differentiable.

I found an article where the nuclear norm (trace norm) was rewritten as a penalty over all possible factorizations of $X$, such as follows:

$$\|X\|_{*} = \inf\left\{ \frac12 \|L\|^{2}_{F} + \frac12\|R\|^{2}_{F} : X = L R^T \right\},$$

where $\|\cdot\|^2_{F}$ is the squared Frobenius norm. This minimum is achieved by using SVD (proof is in literature), if: $X=U\Sigma V^T$, then: $L=U\Sigma^{\frac{1}{2}}$ and $R=V\Sigma^{\frac{1}{2}}$. Resulting in the following expression for the nuclear norm:

$$\| X \|_{*} = \frac12 \|L\|^{2}_{F} + \frac12 \|R\|^{2}_{F}$$

With the previous assumption, can you help me to get $\frac{\partial \|X\|_{*}}{\partial X_{ij}}$ or $\frac{\partial \|\Omega\|_{*}}{\partial \Omega_{ij}}$? I want to use s.g.d. to minimize the cost function $E$.

Thanks for your help.

1

There are 1 best solutions below

0
On

I think the following papers correspond to your problem setup.

  • Avron et al., Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization, ICML 2012.
  • Zhang et al., Stochastic Proximal Gradient Descent for Nuclear Norm Regularization, arXiv 2015. https://arxiv.org/abs/1511.01664

Actually, the stochastic subgradient for nuclear norm could be related with low-rank approximation for nuclear norm minimization. The following paper is related.