Understanding logistic loss for binary classification


I don't quite get how logistic loss works for binary classification:

$$\log(1+\exp(−y\cdot \mathbf{w}^T\mathbf{x})), \quad y\in\{−1,+1\}$$

Minimizing this function with respect to $\mathbf{w}$ seems to me to simply make $y\cdot\mathbf{w}^T\mathbf{x}$ as large as possible, meaning setting each $w_i$ to infinity (negative or positive, depending on the sign of $y\,x_i$).

What do I misunderstand?

Best answer:

$f(w; x_i) = 1/(1+\exp(-y_i w^Tx_i))$ is the probability of observing $y_i$ given $x_i$. Given a set of observations, assuming independence, the likelihood is the product of these functions. Applying a logarithmic transformation does not change the location of the maximum (merely its value), and negating turns the maximization into a minimization. You are therefore looking for the $w$ that minimizes $-\log \prod_i f(w; x_i)$, or, equivalently, that minimizes $$\sum_i \log\left(1+\exp(-y_i w^Tx_i)\right).$$

Now you cannot simply let $y_i\,w^Tx_i$ go to $\infty$ for all $i$ simultaneously: the labels $y_i$ take both signs, so scaling $w$ up drives some terms toward $0$ but blows up the terms of any misclassified point. Unless the data are perfectly linearly separable, the minimum is attained at a finite $w$.
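To see this concretely, here is a small numeric sketch (a made-up one-dimensional dataset, assuming NumPy). Because one point ($x=0.5$ with label $-1$) lies on the "wrong" side, scaling $w$ up eventually increases the summed loss instead of driving it to zero:

```python
import numpy as np

def logistic_loss(w, X, y):
    """Sum over i of log(1 + exp(-y_i * w^T x_i))."""
    margins = y * (X @ w)
    return np.sum(np.log1p(np.exp(-margins)))

# Hypothetical 1-D dataset that is NOT linearly separable:
# the point x = 0.5 carries label -1, conflicting with the positive class.
X = np.array([[1.0], [2.0], [-1.0], [0.5]])
y = np.array([+1, +1, -1, -1])

# The loss first decreases, then increases again as ||w|| grows:
# the misclassified point's term log(1 + exp(+0.5 * scale)) blows up.
for scale in [0.1, 1.0, 10.0, 100.0]:
    w = np.array([scale])
    print(f"scale = {scale:6.1f}   loss = {logistic_loss(w, X, y):.4f}")
```

With all labels of one sign (or separable data) the asker's intuition would be right and the infimum would be approached as $\|w\|\to\infty$; the conflicting labels are what pin the minimizer at a finite $w$.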