Is there a nuclear norm approximation for stochastic gradient descent optimization?

992 Views Asked by dnovai At 06 Jun 2025 - 3:31

I want to minimize $E$ by using stochastic gradient descent. I know that there is a sub-differential for the nuclear norm, but i want to know if is there a approximation of nuclear norm in order to use stochastic gradient descent (s.g.d.) directly?. The cost function is defined as below:

$$E(\Omega) = f(\Omega) + \lambda \|\Omega\|_{*},$$

where $\lambda$ is the trade-off parameter and $\Omega \in \mathbf{R}^{m \times m}$ is a matrix which I want to optimize by using stochastic gradient descent I can minimize the first term of $E$ (loss function $f(\Omega)$) by using s.g.d. directly, but I can not minimize both due to the nuclear norm's not being differentiable.

I found an article where the nuclear norm (trace norm) was rewritten as a penalty over all possible factorizations of $X$, such as follows:

$$\|X\|_{*} = \inf\left\{ \frac12 \|L\|^{2}_{F} + \frac12\|R\|^{2}_{F} : X = L R^T \right\},$$

where $\|\cdot\|^2_{F}$ is the squared Frobenius norm. This minimum is achieved by using SVD (proof is in literature), if: $X=U\Sigma V^T$, then: $L=U\Sigma^{\frac{1}{2}}$ and $R=V\Sigma^{\frac{1}{2}}$. Resulting in the following expression for the nuclear norm:

$$\| X \|_{*} = \frac12 \|L\|^{2}_{F} + \frac12 \|R\|^{2}_{F}$$

With the previous assumption, can you help me to get $\frac{\partial \|X\|_{*}}{\partial X_{ij}}$ or $\frac{\partial \|\Omega\|_{*}}{\partial \Omega_{ij}}$? I want to use s.g.d. to minimize the cost function $E$.

Thanks for your help.

Original Q&A

There are 1 best solutions below

Tae-Hyun Oh On 09 Aug 2016 - 6:26

I think the following papers correspond to your problem setup.

Avron et al., Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization, ICML 2012.
Zhang et al., Stochastic Proximal Gradient Descent for Nuclear Norm Regularization, arXiv 2015. https://arxiv.org/abs/1511.01664

Actually, the stochastic subgradient for nuclear norm could be related with low-rank approximation for nuclear norm minimization. The following paper is related.

Oh et al., Fast Randomized Singular Value Thresholding for Nuclear Norm Minimization, CVPR 2015. http://arxiv.org/abs/1509.00296

Is there a nuclear norm approximation for stochastic gradient descent optimization?

There are 1 best solutions below

Related Questions in MATRICES

Related Questions in OPTIMIZATION

Related Questions in NORMED-SPACES

Related Questions in GRADIENT-DESCENT

Related Questions in NUCLEAR-NORM

Trending Questions

Popular # Hahtags

Popular Questions