my book has the following short section about Logistic Regression:
What can be done with a single sigmoid unit? Logistic regression! For a binary classification problem, let us define the cross-entropy loss on an example $(\mathbf{x}, y)$ with $y \in \{0, 1\}$:
Def: Cross-Entropy-Loss
$$ l(\mathbf{x}, y, \mathbf{\theta}) = - \ln(\sigma(y\mathbf{x}\cdot\mathbf{\theta})) \tag{1} $$
Here we interpret the output of the logistic unit as a classification probability conditioned on $\mathbf{x}$, $\mathbb{P}(Y=y | \mathbf{x};\mathbf{\theta}) = \sigma(y\mathbf{x}\cdot\mathbf{\theta})$. The logistic loss is then simply the negative conditional log-likelihood.
Let me also give you our definition of $\sigma$:
Def: Logistic Unit
$$ \sigma(\mathbf{x}\cdot\mathbf{\theta}) = \frac{1}{1+e^{-\mathbf{x}\cdot\mathbf{\theta}}} \tag{2} $$
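To make the notation concrete, here is a direct transcription of (1) and (2) in plain Python (a sketch, not the book's code; `sigma` and `loss` are my names, and $\mathbf{x}$, $\mathbf{\theta}$ are plain lists):

```python
import math

def sigma(z):
    # logistic unit of (2): sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

def loss(x, y, theta):
    # cross-entropy loss exactly as written in (1): -ln(sigma(y * (x . theta)))
    dot = sum(xi * ti for xi, ti in zip(x, theta))
    return -math.log(sigma(y * dot))
```

For example, with $\mathbf{x}\cdot\mathbf{\theta} = 0$ the loss is $-\ln(\sigma(0)) = \ln 2$ regardless of $y$.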
My confusion is how we end up with that kind of loss function. Usually, the cross-entropy for two classes is given by:
$$ l(y, \hat{y}) = - y \log{(\hat{y})} - (1-y) \log(1-\hat{y}) \tag{3} $$
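For comparison, (3) also transcribes directly (again a sketch; `bce` is my name, and $\hat{y}$ is the predicted probability, e.g. $\hat{y} = \sigma(\mathbf{x}\cdot\mathbf{\theta})$):

```python
import math

def bce(y, y_hat):
    # two-class cross-entropy of (3): -y ln(y_hat) - (1-y) ln(1-y_hat)
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)
```

Note that for $y = 1$ this reduces to $-\ln(\hat{y})$, and for $y = 0$ to $-\ln(1 - \hat{y})$.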
whereas in our case the logistic unit provides the prediction $\hat{y}$, but still: I don't see how we end up with (1).