Why normalize dictionary atoms instead of using L2 regularization in the objective?


I noticed that in dictionary learning, when we have a typical optimization problem like $$\min_D \|Y-DX\|_F^2,$$ where $Y$ is the matrix of data samples and $X$ is the matrix of encoded vectors for the data (the sparse codes),

they usually normalize the columns of the dictionary (the atoms) at the end of, or during, the optimization steps (e.g. projected gradient descent), so that each atom has unit $\ell_2$-norm, as if the constraint $$\text{s.t.}\quad \|d_i\|_2^2 = 1$$ were part of the problem. This is done, for example, in the paper Online Learning for Matrix Factorization and Sparse Coding. But I noticed the result after this manual normalization (a hard constraint) could be sub-optimal.
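For concreteness, one common way to enforce that constraint is projected gradient descent: take a gradient step on $D$, then rescale each column back to unit norm. A minimal numpy sketch (random data; the dimensions, step size, and iteration count are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 20, 5, 50                  # signal dim, number of atoms, number of samples
Y = rng.standard_normal((n, m))      # data matrix
X = rng.standard_normal((k, m))      # sparse codes (held fixed here)
D = rng.standard_normal((n, k))      # dictionary to be learned

lr = 1e-3
for _ in range(200):
    grad = -2.0 * (Y - D @ X) @ X.T  # gradient of ||Y - D X||_F^2 w.r.t. D
    D -= lr * grad
    D /= np.linalg.norm(D, axis=0)   # projection: renormalize each atom to unit l2-norm

# every atom now sits exactly on the unit sphere
print(np.linalg.norm(D, axis=0))
```

The projection after each step is exactly the "manual normalization" described above: it keeps the iterates feasible, but the final point is a constrained stationary point, not the unconstrained minimizer.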

My question is: why don't they instead add a regularization term to the objective function, like $$\min_D \|Y-DX\|_F^2 + \lambda \|D\|_F^2?$$ Wouldn't this keep the $\ell_2$-norms of the dictionary columns bounded and more or less in a similar range? I think the benefit would be that the optimum is found systematically, and we could assume or check the optimality conditions at that point.
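As a side note (my own sketch, not from the paper): with $X$ held fixed, this ridge-penalized dictionary update actually has a closed form. Setting the gradient $-2(Y-DX)X^\top + 2\lambda D$ to zero gives $D = YX^\top(XX^\top + \lambda I)^{-1}$. The resulting column norms are bounded but generally unequal, which is the "similar range but not exactly 1" behavior:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 20, 5, 50                  # arbitrary illustrative dimensions
Y = rng.standard_normal((n, m))
X = rng.standard_normal((k, m))
lam = 0.5

# Closed-form minimizer of ||Y - D X||_F^2 + lam * ||D||_F^2 over D (X fixed):
# D (X X^T + lam I) = Y X^T  =>  solve the symmetric system instead of inverting
D = np.linalg.solve(X @ X.T + lam * np.eye(k), X @ Y.T).T

# first-order optimality: the gradient vanishes at this D
grad = -2.0 * (Y - D @ X) @ X.T + 2.0 * lam * D
print(np.abs(grad).max())            # numerically ~0

# column norms are bounded but not forced to equal 1
print(np.linalg.norm(D, axis=0))
```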

Please let me know if I'm right or wrong about this.


From Algorithms 1 and 2 in the paper you referred to, they are solving a Lasso-type problem. If you don't normalize your dictionary, the $\ell_1$-penalty will perform really badly. For instance, consider the following Lasso problem $$ \underset{x}{\text{minimize}}\quad \frac{1}{2}\|y-Dx\|_2^2+\lambda\|x\|_1 $$ where the columns of $D$ are measured on different scales. If one column is measured in meters and another in micrometers, your $x$ variable will depend on the units of the columns of $D$. Since the sparsifying penalty does not take this imbalance into account, you will probably get a solution that sets to zero the element of $x$ corresponding to the column of $D$ measured in meters, even though that column might actually explain much of the signal.
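To make this concrete, here is a small numerical sketch (my own, not from the paper): a hand-rolled ISTA solver for the Lasso, run once with unit-norm atoms and once after shrinking one atom's scale by $10^3$, as if it were expressed in a much larger unit. The data, dictionary, and $\lambda$ are arbitrary illustrative choices.

```python
import numpy as np

def ista(D, y, lam, steps=2000):
    """Minimize 0.5 * ||y - D x||_2^2 + lam * ||x||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2                # Lipschitz constant of the smooth part
    x = np.zeros(D.shape[1])
    for _ in range(steps):
        z = x - D.T @ (D @ x - y) / L            # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(2)
D = rng.standard_normal((30, 8))
D /= np.linalg.norm(D, axis=0)                   # unit-norm atoms
y = 2.0 * D[:, 0] + 0.01 * rng.standard_normal(30)  # signal mostly explained by atom 0

x_unit = ista(D, y, lam=0.1)                     # atom 0 gets a large coefficient

D_bad = D.copy()
D_bad[:, 0] *= 1e-3                              # same direction, tiny numeric scale
x_bad = ista(D_bad, y, lam=0.1)                  # atom 0 is thresholded to zero
```

With the shrunken atom, explaining $y$ would require a coefficient around $2000$, which the $\ell_1$ penalty refuses to pay for, so the best-matching atom is dropped entirely even though its direction is unchanged.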

I don't know if I answered your question, or if I just stated the obvious. If so, sorry!