Cross-Entropy Loss for Logistic Regression

My book has the following short section on logistic regression:

What can be done with a single sigmoid unit? Logistic regression! For a binary classification problem, let us define the cross-entropy loss on an example $(\mathbf{x}, y)$ with $y \in \{0, 1\}$:

Def: Cross-Entropy-Loss

$$ l(\mathbf{x}, y, \mathbf{\theta}) = - \ln(\sigma(y\mathbf{x}\cdot\mathbf{\theta})) \tag{1} $$

Here we interpret the output of the logistic unit as a classification probability conditioned on $\mathbf{x}$, $\mathbb{P}(Y=y | \mathbf{x};\mathbf{\theta}) = \sigma(y\mathbf{x}\cdot\mathbf{\theta})$. The logistic loss is then simply the negative conditional log-likelihood.

Let me also give you our definition of $\sigma$:

Def: Logistic Unit

$$ \sigma(\mathbf{x}\cdot\mathbf{\theta}) = \frac{1}{1+e^{-\mathbf{x}\cdot\mathbf{\theta}}} \tag{2} $$
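To make the definitions concrete, here is a minimal sketch of $\sigma$ and the book's loss (1) in Python (the function names are my own, and I treat $\mathbf{x}\cdot\mathbf{\theta}$ as a precomputed scalar `z`):

```python
import math

def sigmoid(z):
    # Logistic unit, Eq. (2): sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

def book_loss(z, y):
    # Cross-entropy loss as written in the book, Eq. (1):
    # l = -ln(sigma(y * x.theta)), with z = x.theta
    return -math.log(sigmoid(y * z))
```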

What confuses me is how we end up with that kind of loss function. Usually, the cross-entropy for two classes is given by:

$$ l(y, \hat{y}) = - y \log{(\hat{y})} - (1-y) \log(1-\hat{y}) \tag{3} $$

whereas in our case the logistic unit's output plays the role of the prediction $\hat{y}$. Still, I don't see how we end up with (1).
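To make my confusion concrete, here is a quick numeric check (my own code, again with `z` standing for $\mathbf{x}\cdot\mathbf{\theta}$): with $y = 0$, the book's formula (1) reduces to $-\ln\sigma(0) = \ln 2$ no matter what $\mathbf{x}\cdot\mathbf{\theta}$ is, while the usual two-class cross-entropy (3), with $\hat{y} = \sigma(\mathbf{x}\cdot\mathbf{\theta})$, clearly depends on the prediction:

```python
import math

def sigmoid(z):
    # Eq. (2)
    return 1.0 / (1.0 + math.exp(-z))

def loss_book(z, y):
    # Eq. (1): -ln(sigma(y * z))
    return -math.log(sigmoid(y * z))

def loss_standard(z, y):
    # Eq. (3) with y_hat = sigma(z)
    y_hat = sigmoid(z)
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

for z in (-2.0, 0.0, 3.0):
    # For y = 0, Eq. (1) is the constant ln 2, while Eq. (3) varies with z
    print(z, loss_book(z, 0), loss_standard(z, 0))
```

So for $y \in \{0, 1\}$ the two formulas can't be the same thing, which is exactly why I suspect I'm misreading the book's convention.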