In machine learning (regarding cross entropy, etc.) I have seen the logarithm applied to a probability value, $\log(p)$.
We know that a probability can be zero, and the logarithm is not defined at $0$.
How should this be understood?
In particular, for use in probability: since the entropy $H$ is
$$ H = -\sum_{\omega \in \operatorname{Im} X} p(\omega) \log(p(\omega)), $$
one can note, by L'Hôpital's rule for example, that
$$ \lim_{p(\omega) \to 0^+} p(\omega)\log(p(\omega)) = 0, $$
which is the key result.

This also makes sense if you understand entropy as information: for finite probability spaces at least, an impossible outcome should add no information to a random variable. Thus one may define
$$ p(\omega) \log(p(\omega)) := \begin{cases} p(\omega) \log(p(\omega)) & 0 < p(\omega) \leq 1 \\ 0 & p(\omega) = 0, \end{cases} $$
and note that this convention is independent of the base of the log, and therefore of the units of entropy. So, for all intents and purposes, if you only need to compute entropy, these terms contribute nothing, and one could arbitrarily define $\log(0) = 0$ (or any other real number) and the results would work out.
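This convention is easy to implement directly: skip the zero-probability terms rather than evaluating $\log(0)$. A minimal sketch (the function name `entropy` is my own, not from any particular library):

```python
import math

def entropy(p, base=2):
    """Shannon entropy with the convention 0 * log(0) = 0.

    p: iterable of probabilities (may contain zeros).
    Zero-probability outcomes are skipped, since their
    contribution p*log(p) tends to 0 as p -> 0+.
    """
    return -sum(q * math.log(q, base) for q in p if q > 0)

print(entropy([0.5, 0.5, 0.0]))  # the zero term adds nothing: 1.0 bit
print(entropy([1.0, 0.0]))       # a certain outcome carries no information
```

Note that the distribution $(0.5, 0.5, 0)$ has exactly the same entropy as $(0.5, 0.5)$, which is the "an impossible outcome adds no information" point made above.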
You could write something like this:
$f(p) = \begin{cases} \log(p), & \text{if $p > 0$} \\[2ex] \text{undefined}, & \text{otherwise} \end{cases}$