If the gradient of the logistic loss is never zero, does that mean the minimum can never be achieved?


This question arose because I was trying to understand what happens if it's impossible to set the gradient to zero in an unconstrained logistic regression problem (even if we minimize the function iteratively, say with gradient descent).

I was studying logistic regression ($y \in \{ -1,+1\}$):

$$ J_{train}(w) = \frac{1}{N} \sum^N_{n=1} \log( 1 + e^{-y^{(n)} w^{\top}x^{(n)}} )$$

and noticed that its gradient is:

$$ \nabla _{w}J_{train}(w) = \frac{1}{N} \sum^N_{n=1} \frac{- y^{(n)}x^{(n)} }{1 + e^{y^{(n)}w^{\top} x^{(n)}}}$$
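As a sanity check on the gradient formula above, here is a minimal sketch in plain Python that compares it against central finite differences. The toy data, the point `w`, and the helper names (`loss`, `grad`, `numeric_grad`) are assumptions made up for illustration only.

```python
import math

def loss(w, xs, ys):
    """Average logistic loss J_train(w) for labels in {-1, +1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        total += math.log1p(math.exp(-margin))
    return total / len(xs)

def grad(w, xs, ys):
    """Analytic gradient: (1/N) * sum of -y x / (1 + exp(y w.x))."""
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coeff = -y / (1.0 + math.exp(margin))
        for j, xj in enumerate(x):
            g[j] += coeff * xj
    return [gj / len(xs) for gj in g]

def numeric_grad(w, xs, ys, h=1e-6):
    """Central finite differences, used only as a sanity check."""
    g = []
    for j in range(len(w)):
        wp = list(w)
        wp[j] += h
        wm = list(w)
        wm[j] -= h
        g.append((loss(wp, xs, ys) - loss(wm, xs, ys)) / (2 * h))
    return g

# Hypothetical toy data, chosen only for illustration.
xs = [(1.0, 2.0), (-1.5, 0.5), (0.3, -1.0)]
ys = [1, -1, 1]
w = [0.2, -0.4]

analytic = grad(w, xs, ys)
numeric = numeric_grad(w, xs, ys)
max_diff = max(abs(a - b) for a, b in zip(analytic, numeric))
print(analytic, numeric, max_diff)
```

If the two gradients agree to within the finite-difference error, the closed form is at least consistent with the loss as written.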

where my intuition tells me this can't be set to zero, because the negative sigmoid/logistic function $\sigma(-z) = \frac{1}{1 + e^{z}}$, with $z = y^{(n)}w^{\top} x^{(n)}$, can never be zero.

Is this true for the special case I am considering? Is it true that the gradient can't be zero?

Also, if the gradient cannot be set to zero, does that mean gradient methods will just wander off forever? What does it mean? Does it mean there is no unique minimizer? If the function is convex, like the one I am considering, but the gradient can't be set to zero, what does that say about its optimization landscape?


There is 1 best solution below


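That $\sigma(-z)$ is nonzero is something you have to verify in your expression, since it depends on the denominator $1 + e^{z}$. That denominator is finite and positive for every real $z$, so $\sigma(-z)$ is indeed never zero.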
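To see what a nonzero $\sigma(-z)$ does and does not imply, here is a minimal sketch in Python, assuming a hypothetical one-dimensional dataset in which the same input appears once with each label (so it is not linearly separable). The helper name `grad_1d` is made up for this illustration. Each summand's weight is strictly positive, yet the signed terms $-y^{(n)}x^{(n)}\sigma(-z)$ can still cancel in the sum.

```python
import math

def grad_1d(w, xs, ys):
    """Gradient of the average logistic loss for scalar w and scalar inputs."""
    return sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: the same input with both labels, so no w
# classifies both points correctly (not linearly separable).
xs = [1.0, 1.0]
ys = [1, -1]

# Each summand's weight sigma(-z) is strictly positive at w = 0 ...
weights = [1.0 / (1.0 + math.exp(y * 0.0 * x)) for x, y in zip(xs, ys)]
# ... yet the signed terms -y * x * sigma(-z) cancel in the sum.
g0 = grad_1d(0.0, xs, ys)
print(weights, g0)
```

In this toy case the gradient vanishes at $w = 0$, so positivity of the individual sigmoid factors alone does not rule out a stationary point.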

Whether a function has a minimum or maximum does not depend solely on its derivatives. Remember that maxima and minima may occur not only in the interior of the function's domain but also on its boundary. The derivative approach may not see this boundary case, so you have to study it separately. E.g., the function $y = x$ has no minimum or maximum value on $\mathbb{R}$, but if you consider it only on a closed interval $[x_1, x_2]$, then the minimum is at $x_1$ and the maximum at $x_2$. So the search for minima/maxima depends on the optimization problem at hand and the domain on which you consider it.
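The same kind of behaviour as $y = x$ on $\mathbb{R}$ can be observed for the logistic loss from the question. The following is a minimal sketch, assuming hypothetical one-dimensional data that is linearly separable and an arbitrary step size of 1; the helper names `loss_1d` and `grad_1d` are made up for this illustration. On such data the loss is strictly decreasing in $w$, so plain gradient descent keeps increasing $w$ while the loss approaches its infimum without attaining it.

```python
import math

def loss_1d(w, xs, ys):
    """Average logistic loss for scalar w and scalar inputs."""
    return sum(math.log1p(math.exp(-y * w * x)) for x, y in zip(xs, ys)) / len(xs)

def grad_1d(w, xs, ys):
    """Gradient of the average logistic loss with respect to scalar w."""
    return sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical separable data: every w > 0 classifies both points correctly.
xs = [1.0, 2.0]
ys = [1, 1]

w, step = 0.0, 1.0  # the step size is an arbitrary choice for this sketch
history = []
for it in range(1, 2001):
    w -= step * grad_1d(w, xs, ys)
    if it in (10, 100, 1000, 2000):
        history.append((it, w, loss_1d(w, xs, ys), grad_1d(w, xs, ys)))

# Columns: iteration, w, loss, gradient.
for it, w_it, j_it, g_it in history:
    print(it, w_it, j_it, g_it)
```

In this run the gradient stays strictly negative and $w$ keeps growing, which is what one would expect if the infimum is only approached as $w \to \infty$ rather than attained at a finite point.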