I don't quite get how logistic loss works for binary classification:
$$\log(1+\exp(-y\cdot \mathbf{w}^T\mathbf{x})), \quad y\in\{-1,+1\}$$
Minimizing this function over $\mathbf{w}$ seems to me to simply make $y\cdot\mathbf{w}^T\mathbf{x}$ as large as possible, i.e. setting each $w_i$ to infinity (negative or positive, depending on the signs of $y$ and $x_i$).
What do I misunderstand?
$f(w; x_i) = 1/(1+\exp(-y_i w^Tx_i))$ is the probability of observing $y_i$ given $x_i$. Given a set of observations, assuming independence, the likelihood is the product of these functions. Applying a logarithmic transformation does not affect the location of the maximum (merely its value), and negating the result turns the maximization into a minimization problem. You are therefore interested in the $w$ that minimizes $-\log \prod_i f(w; x_i)$, or, equivalently, that minimizes $$\sum_i \log\left(1+\exp(-y_i w^Tx_i)\right).$$

Now you cannot simply let $y_i w^Tx_i$ go to $\infty$ for all $i$: observations with opposite labels pull $w$ in opposite directions, so driving one term toward zero inflates another. (The exception is linearly separable data, where $\|w\|$ does diverge; that is one reason regularization is typically added in practice.)
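To make this concrete, here is a small sketch (with made-up 1-D data and NumPy) showing that on a dataset containing both labels the summed logistic loss has a finite minimizer, while for a single observation the loss keeps decreasing as $w$ grows:

```python
import numpy as np

# Hypothetical 1-D data with both labels present and not linearly
# separable: the point (x=0.5, y=-1) sits among the positives.
x = np.array([1.0, 2.0, -1.0, 0.5])
y = np.array([+1.0, +1.0, -1.0, -1.0])

def total_loss(w):
    # sum_i log(1 + exp(-y_i * w * x_i))
    return np.sum(np.log1p(np.exp(-y * w * x)))

# Scan w over a grid: the loss rises again for large |w|,
# so the minimizer is finite.
ws = np.linspace(-10, 10, 2001)
vals = np.array([total_loss(w) for w in ws])
w_star = ws[np.argmin(vals)]
print(f"finite minimizer w* ~ {w_star:.2f}, loss {vals.min():.3f}")
print(f"loss at w=10: {total_loss(10.0):.3f}")  # larger: the mislabeled-side term blows up

# By contrast, with a single observation (y=+1, x=1) the loss
# decreases monotonically in w, which is exactly the questioner's intuition.
single = lambda w: np.log1p(np.exp(-w))
print(single(1.0), ">", single(10.0))
```

The key point is the misclassified-side term $\log(1+\exp(0.5\,w))$, which grows linearly in $w$ and eventually outweighs the terms that shrink to zero.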