sigmoid and loss function


I am totally lost and trying to understand the following. In my class lecture, a is defined as following:

[figure: the lecturer's definition of $a$ in terms of the sigmoid]

Here $\sigma$ is the sigmoid function.

Following that, the lecturer kept saying that $\log$ is a good function to represent loss. But how is $\log$ a good loss function? $\log(0)$ is undefined, so how can he draw this graph? I don't understand how he is defining the value of $\log$ at $0$. When $x = 0$, $\log$ is undefined, right?

[figure: plot of the log loss]


2 Answers


Your question isn't entirely well specified, but I can infer your confusion and pose an answer.

The binary entropy function is defined as $L(p) = -p \ln(p) - (1 - p)\ln(1 - p)$, where by continuity we set $0 \ln(0) = 0$. A closely related formula, the binary cross-entropy, is often used as a loss function in statistics. Say we have a function $h(x_i) \in [0, 1]$ that makes a prediction about the label $y_i$ of the input $x_i$. If $h(x_i) = 1$, then it is certain that the label of $x_i$ is $1$, whereas if $h(x_i) = 0.99$, it is only very confident that the label is $1$. If we define the loss on the whole dataset $\mathcal{D} = \{(x_i, y_i)\}_{i = 1}^N$ as $\mathcal{L}(h; \mathcal{D}) = -\frac{1}{N}\sum_{i = 1}^N \bigl[y_i \ln(h(x_i)) + (1 - y_i)\ln(1 - h(x_i))\bigr]$, then this just says that if $h$ is certain about a particular label and gets it wrong, the loss is $\infty$. This isn't unreasonable -- we don't want our algorithm to be certain about anything, so it is heavily penalized when it is.
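To make this concrete, here is a minimal sketch of that averaged cross-entropy loss (the function name `binary_cross_entropy` and the tiny two-example dataset are my own, not from the lecture; the `eps` clamp is a common numerical guard for predictions of exactly $0$ or $1$):

```python
import math

def binary_cross_entropy(y, h, eps=1e-12):
    """Average binary cross-entropy over a dataset.

    y: true labels in {0, 1}
    h: predicted probabilities in [0, 1]
    eps: clamp to avoid log(0) when a prediction is exactly 0 or 1
    """
    total = 0.0
    for yi, hi in zip(y, h):
        hi = min(max(hi, eps), 1 - eps)  # keep hi strictly inside (0, 1)
        total += yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
    return -total / len(y)

# A confident, correct predictor incurs a small loss...
print(binary_cross_entropy([1, 0], [0.99, 0.01]))  # ≈ 0.01
# ...while a confident, wrong one is heavily penalized.
print(binary_cross_entropy([1, 0], [0.01, 0.99]))  # ≈ 4.6
```

Note how the loss blows up as a wrong prediction approaches certainty: that is exactly the "$\infty$ for confident mistakes" behavior described above.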


Let's say your predicted probability is $q$ and the ground truth probability is $p$. You hope that $q$ agrees closely with $p$. How do you measure how well one probability agrees with another? In machine learning we measure the agreement of $q$ with $p$ by computing the quantity $\ell(p,q) = -p \log(q) - (1-p) \log(1-q)$.

If you plot $\ell(p,q)$ as a function of $q$, for a fixed value of $p$, you'll see that this looks like a reasonable cost function to measure how well $q$ agrees with $p$. There's no danger of taking the log of $0$ because your predicted probability $q$ is strictly between $0$ and $1$. You can also show that, for a given fixed value of $p$ between $0$ and $1$, $\ell(p,q)$ is minimized when $q = p$.