What does L1 regularization for multiclass discriminative classification look like?


In general, an L1-regularized objective takes the form $\mathcal{L}(w,X) + \lambda R(w)$, where $\mathcal{L}$ is a loss function, $w$ is a vector of weights, $X$ is the data matrix, and $\lambda R(w)$ is the regularization term with regularization parameter $\lambda$. For L1 regularization, I've always seen $R(w) = \lVert w \rVert_1$.

In the case of multiclass discriminative classification (say multiclass logistic regression for example) with $k$ classes, $W$ is now a weight matrix which has either $k$ columns or $k$ rows depending on convention. In this case, is L1 regularization still given by $R(W) = ||W||_1$, where $||W||_1$ is the matrix L1 norm (max column sum)? Or is it a linear combination of the L1 norms of each column (or row)? In other words, is the regularization term given by $$\lambda ||W||_1$$ or $$\lambda_1||w_1||_1 + \lambda_2||w_2||_1 + \lambda_3||w_3||_1,$$where $w_i$ is the $i$th column of $W$?
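To make the distinction concrete, here is a small numpy sketch (the matrix values are made up for illustration) showing that the induced matrix 1-norm (max absolute column sum) and the entrywise L1 norm (sum of all absolute entries) are genuinely different quantities:

```python
import numpy as np

# Hypothetical 2x3 weight matrix (2 features, 3 classes, columns = classes)
W = np.array([[1.0, -2.0,  0.5],
              [3.0,  1.0, -0.5]])

# Induced matrix 1-norm: maximum absolute column sum
induced_l1 = np.abs(W).sum(axis=0).max()   # columns sum to 4.0, 3.0, 1.0 -> 4.0

# Entrywise L1 norm: sum of absolute values of all entries
entrywise_l1 = np.abs(W).sum()             # 8.0

print(induced_l1, entrywise_l1)
```

Note that the entrywise version equals the sum of the per-column L1 norms, so with a single shared $\lambda$ the second candidate form in the question collapses to the first (entrywise) interpretation.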

For reference, I am trying to determine what the loss function at the bottom of page 15 on the paper linked below actually looks like.

https://www.cs.ubc.ca/cgi-bin/tr/2009/TR-2009-19.pdf

Accepted answer:

It is to be understood entrywise: $$ \lVert W \rVert_1 = \sum_{i,j} \lvert W_{ij} \rvert, $$ i.e. you collect all the weights into a single vector and take its $L_1$ norm, with one shared $\lambda$ (the same convention applies to $L_2$).
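As a minimal numpy sketch of how this penalty enters the multiclass logistic loss (function names and shapes are my own choices, assuming columns of $W$ index classes):

```python
import numpy as np

def l1_penalty(W, lam):
    # Entrywise L1: lambda * sum |W_ij| over all entries, per the answer above
    return lam * np.abs(W).sum()

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def regularized_loss(W, X, y, lam):
    """Multiclass logistic (cross-entropy) loss plus entrywise L1 penalty.

    W: (n_features, k) weight matrix
    X: (n_samples, n_features) data matrix
    y: integer class labels in 0..k-1
    """
    P = softmax(X @ W)
    nll = -np.log(P[np.arange(len(y)), y]).sum()
    return nll + l1_penalty(W, lam)
```

Sanity check: with $W = 0$ and $\lambda = 0$, every class gets probability $1/k$, so the loss is $n \log k$.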

For reference, see e.g. sections 7.1 and 7.1.2 of the book "Deep Learning" by Goodfellow et al.