If the gradient of the logistic loss is never zero, does that mean the minimum can never be achieved?

728 Views Asked by Bumbble Comm At 22 Feb 2026 - 7:52

This question arose because I was trying to understand what happens if it is impossible to set the gradient to zero in an unconstrained logistic regression problem (even if we minimize the function iteratively, say with gradient descent).

I was studying logistic regression ($y \in \{ -1,+1\}$):

$$ J_{train}(w) = \frac{1}{N} \sum^N_{n=1} \log( 1 + e^{-y^{(n)} w^{\top}x^{(n)}} )$$

and noticed that its gradient is:

$$ \nabla _{w}J_{train}(w) = \frac{1}{N} \sum^N_{n=1} \frac{- y^{(n)}x^{(n)} }{1 + e^{y^{(n)}w^{\top} x^{(n)}}}$$
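As a sanity check on the formula above, here is a minimal sketch that compares the analytic gradient with central finite differences. The toy data, the function names, and the step size `eps` are illustrative assumptions, not part of the original problem.

```python
import math

def logistic_loss(w, xs, ys):
    """Average logistic loss J(w) for labels in {-1, +1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        total += math.log1p(math.exp(-margin))
    return total / len(xs)

def logistic_grad(w, xs, ys):
    """Analytic gradient: mean of -y * x / (1 + exp(y * w.x))."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coeff = -y / (1.0 + math.exp(margin))
        for j, xj in enumerate(x):
            grad[j] += coeff * xj
    return [g / len(xs) for g in grad]

def numeric_grad(w, xs, ys, eps=1e-6):
    """Central finite differences, used only as a cross-check."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        diff = logistic_loss(w_plus, xs, ys) - logistic_loss(w_minus, xs, ys)
        grad.append(diff / (2 * eps))
    return grad

# Hypothetical toy data: three points in R^2 with labels in {-1, +1}.
xs = [(1.0, 2.0), (-1.5, 0.5), (0.3, -0.7)]
ys = [1, -1, 1]
w = [0.2, -0.4]

analytic = logistic_grad(w, xs, ys)
numeric = numeric_grad(w, xs, ys)
print("analytic:", analytic)
print("numeric: ", numeric)
```

The two printed vectors should agree to several decimal places if the formula is right.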

where my intuition tells me this can't be set to zero because the logistic (sigmoid) function evaluated at the negative margin, $\sigma(-z) = \frac{1}{1 + e^{z}}$ with $z = y^{(n)}w^{\top} x^{(n)}$, is never zero.

Is this true for the special case I am considering? Is it true that the gradient can't be zero?

Also, if the gradient can never be zero, does that mean gradient methods will just wander off forever? What does it mean? Does it mean there is no unique minimizer? And if the function is convex, like the one I am considering, but its gradient can't be set to zero, what does that say about its optimization landscape?


Bumbble Comm On 31 Jan 2018 - 6:14

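That $\sigma(-z)$ is never zero is true: with $z = y^{(n)} w^{\top} x^{(n)}$ we have $\sigma(-z) = \frac{1}{1 + e^{z}}$, and the denominator $1 + e^{z}$ is always greater than 1, so every factor is well defined and strictly positive. This alone does not settle whether the gradient can vanish, because the gradient is a sum of vectors $-y^{(n)}x^{(n)}$ weighted by these positive factors, and such terms can cancel each other.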
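A one-dimensional worked example may make this concrete. The data here are an illustrative choice, not taken from the question. Take $N = 2$ with $x^{(1)} = x^{(2)} = 1$, $y^{(1)} = +1$ and $y^{(2)} = -1$. Plugging into the gradient formula from the question gives

$$ \frac{dJ_{train}}{dw} = \frac{1}{2}\left( \frac{-1}{1 + e^{w}} + \frac{1}{1 + e^{-w}} \right) = \frac{1}{2}\tanh\left(\frac{w}{2}\right), $$

which is zero at $w = 0$ even though neither summand is ever zero. If instead only the first point is kept ($N = 1$, $x^{(1)} = 1$, $y^{(1)} = +1$), then

$$ J_{train}(w) = \log(1 + e^{-w}), \qquad \frac{dJ_{train}}{dw} = \frac{-1}{1 + e^{w}} < 0 \quad \text{for all } w, $$

so the loss decreases towards its infimum $0$ as $w \to \infty$ but never attains it.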

Whether a function has a minimum or maximum does not depend solely on its derivatives. Remember that maxima and minima may occur not only in the interior of the domain of your function but also on its boundary. The derivative approach may not see this boundary case, so you have to study it separately. E.g. the function $y = x$ has no minimum or maximum value on $\mathbb{R}$, but if you consider it only on a closed interval $[x_1, x_2]$, then the minimum is at $x_1$ and the maximum at $x_2$, even though the derivative is never zero. So the search for minima/maxima depends on the optimization problem at hand and the domain on which you consider it.
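For the logistic loss the domain is unbounded, so the analogue of the boundary case is $\|w\| \to \infty$. The sketch below runs plain gradient descent on two hypothetical one-dimensional datasets (scalar $w$, no bias term) to illustrate the two behaviours. The data, the step size `lr`, and the function names are assumptions chosen for illustration only.

```python
import math

def grad_1d(w, xs, ys):
    """Derivative of the average logistic loss for a scalar weight w."""
    total = sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in zip(xs, ys))
    return total / len(xs)

def gradient_descent(xs, ys, steps, lr=1.0, w0=0.0):
    """Fixed-step gradient descent starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad_1d(w, xs, ys)
    return w

# Linearly separable: every point satisfies y * x > 0.
sep_xs, sep_ys = [1.0, 2.0, -1.0], [1, 1, -1]
# Not separable: the same x = 1.0 appears with both labels.
mix_xs, mix_ys = [1.0, 1.0, 2.0], [1, -1, 1]

results = {}
for steps in (100, 1000, 10000):
    w_sep = gradient_descent(sep_xs, sep_ys, steps)
    w_mix = gradient_descent(mix_xs, mix_ys, steps)
    results[steps] = (w_sep, w_mix)
    print(steps,
          "separable: w =", w_sep, "grad =", grad_1d(w_sep, sep_xs, sep_ys),
          "| mixed: w =", w_mix, "grad =", grad_1d(w_mix, mix_xs, mix_ys))
```

On the separable data the weight keeps growing with the number of steps and the gradient shrinks without reaching zero, so the infimum is approached but not attained. On the mixed data the iterates settle at a finite $w$ where the gradient vanishes to numerical precision.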
