For the loss function of logistic regression $$ \ell = \sum_{i=1}^n \left[ y_i \boldsymbol{\beta}^T \mathbf{x}_{i} - \log \left(1 + \exp\left( \boldsymbol{\beta}^T \mathbf{x}_{i} \right)\right) \right] $$ I understand that its first-order derivative is $$ \frac{\partial \ell}{\partial \boldsymbol{\beta}} = \boldsymbol{X}^T(\boldsymbol{y} - \boldsymbol{p}) $$ where $$ \boldsymbol{p} = \frac{\exp(\boldsymbol{X} \boldsymbol{\beta})}{1 + \exp(\boldsymbol{X} \boldsymbol{\beta})} $$ (applied elementwise) and its second-order derivative is
$$ \frac{\partial^2 \ell}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}^T} = -\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} $$ where $\boldsymbol{W}$ is an $n \times n$ diagonal matrix whose $i$-th diagonal element is $p_i(1-p_i)$. However, I am struggling with the first- and second-order derivatives of the loss function of logistic regression with L2 regularization
$$ \ell = \sum_{i=1}^n \left[ y_i \boldsymbol{\beta}^T \mathbf{x}_{i} - \log \left(1 + \exp\left( \boldsymbol{\beta}^T \mathbf{x}_{i} \right)\right) \right] + \lambda \sum_{j=1}^{p}\beta_j^2 $$
I tried to extrapolate $\boldsymbol{X}^T(\boldsymbol{y} - \boldsymbol{p})$ and $-\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}$ by simply adding one more term, according to my meager knowledge of calculus, making them $\boldsymbol{X}^T(\boldsymbol{y} - \boldsymbol{p}) + 2\lambda\boldsymbol{\beta}$ and $-\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} + 2\lambda$
But it appears to me that things do not work this way. So what are the correct first- and second-order derivatives of the loss function for logistic regression with L2 regularization?
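For concreteness, here is a minimal NumPy sketch of the unregularized quantities I do understand (the variable names and random data are mine, just for testing), with a finite-difference check of the gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)
beta = rng.normal(size=d)

def loglik(b):
    z = X @ b
    # sum_i [ y_i * b'x_i - log(1 + exp(b'x_i)) ]
    return y @ z - np.sum(np.log1p(np.exp(z)))

p = 1.0 / (1.0 + np.exp(-X @ beta))      # p = exp(X beta) / (1 + exp(X beta))
grad = X.T @ (y - p)                     # X'(y - p)
W = np.diag(p * (1 - p))
hess = -X.T @ W @ X                      # -X' W X

# central finite differences agree with the closed-form gradient
eps = 1e-6
fd = np.array([(loglik(beta + eps * e) - loglik(beta - eps * e)) / (2 * eps)
               for e in np.eye(d)])
print(np.allclose(grad, fd, atol=1e-5))  # True
```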
$\def\D{{\rm Diag}}\def\o{{\tt1}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$You have expressions for a loss function and its derivatives (gradient, Hessian) $$\eqalign{ \ell &= y:X\beta - \o:\log\left(e^{X\beta}+\o\right) \\ g_{\ell} &= \p{\ell}{\beta} = X^T(y-p) \qquad&{\rm where}\;\;p = \sigma(X\beta) \\ H_{\ell} &= \p{g_{\ell}}{\beta} = -X^T\left(P-P^2\right)X \qquad&{\rm where}\;\,P = \D(p) \\ }$$ and now you want to add regularization. So let's do that $$\eqalign{ \mu &= \ell + \lambda\big\|\beta\big\|_F^2 \\ &= \ell + \lambda\beta:\beta \\ d\mu &= d\ell + 2\lambda\beta:d\beta \\ &= (g_{\ell}:d\beta) + (2\lambda\beta:d\beta) \\ &= (g_{\ell} + 2\lambda\beta):d\beta \\ g_\mu &= \p{\mu}{\beta} = g_{\ell} + 2\lambda\beta \\\\ dg_\mu &= dg_{\ell} + 2\lambda\,d\beta \\ &= H_{\ell}\,d\beta + 2\lambda I\,d\beta \\ &= \left(H_{\ell} + 2\lambda I\right)d\beta \\ H_\mu &= \p{g_\mu}{\beta} = H_\ell + 2\lambda I \\\\ }$$
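As a quick sanity check, a throwaway NumPy sketch (the data and names are made up, not part of the derivation) shows that these closed forms for $g_\mu$ and $H_\mu$ match finite differences of the regularized objective $\mu$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 40, 4, 0.5
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)
beta = rng.normal(size=d)

def mu(b):                                 # regularized objective
    z = X @ b
    return y @ z - np.sum(np.log1p(np.exp(z))) + lam * (b @ b)

p = 1 / (1 + np.exp(-X @ beta))
g_ell = X.T @ (y - p)                      # X'(y - p)
H_ell = -X.T @ np.diag(p * (1 - p)) @ X    # -X'(P - P^2)X

g_mu = g_ell + 2 * lam * beta              # g_ell + 2*lambda*beta
H_mu = H_ell + 2 * lam * np.eye(d)         # H_ell + 2*lambda*I

eps, E = 1e-5, np.eye(d)
fd_g = np.array([(mu(beta + eps * e) - mu(beta - eps * e)) / (2 * eps) for e in E])
fd_H = np.array([[(mu(beta + eps * (ei + ej)) - mu(beta + eps * (ei - ej))
                   - mu(beta - eps * (ei - ej)) + mu(beta - eps * (ei + ej)))
                  / (4 * eps**2) for ej in E] for ei in E])
print(np.allclose(g_mu, fd_g, atol=1e-4))  # True
print(np.allclose(H_mu, fd_H, atol=1e-3))  # True
```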
In the above, a colon denotes the trace/Frobenius product, i.e. $$\eqalign{ A:B &= {\rm Tr}(A^TB) \\ A:A &= \big\|A\big\|_F^2 \\ }$$ When $(A,B)$ are vectors, this definition reduces to the standard dot product.
The Frobenius product inherits nice algebraic properties from the trace function, e.g. $$\eqalign{ A:B &= B:A = B^T:A^T \\ CA:B &= C:BA^T = A:C^TB \\ }$$ It also has nice behavior under differentiation $$\eqalign{ d(A:B) &= dA:B + A:dB \\ d(A:A) &= dA:A + A:dA \\ &= A:dA + A:dA \\ &= 2A:dA \\ }$$
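These identities are easy to confirm numerically; here is a throwaway NumPy check (random matrices, names purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(3, 3))

frob = lambda U, V: np.trace(U.T @ V)                       # A:B = Tr(A'B)

print(np.isclose(frob(A, B), frob(B, A)))                   # A:B = B:A
print(np.isclose(frob(A, A), np.linalg.norm(A, 'fro')**2))  # A:A = ||A||_F^2
print(np.isclose(frob(C @ A, B), frob(C, B @ A.T)))         # CA:B = C:BA'
print(np.isclose(frob(C @ A, B), frob(A, C.T @ B)))         # CA:B = A:C'B
```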