Is there a risk, for some special forms of the likelihood, that using the logarithm in the likelihood derivation leads to losing good solutions?


In statistical inference, one works with the likelihood $L$ and differentiates it with respect to a given parameter $\theta$ (or set of parameters) in order to maximize it. Thus we require that the derivative with respect to the parameter be zero. If several solutions are found, the chosen one is the one that maximizes the likelihood.

Let's consider the Poisson likelihood for a single observed count $n$: $L(\theta)=e^{-\theta}\frac{\theta^{n}}{n!}$. Requiring the derivative to vanish gives $\theta^{n-1}e^{-\theta}(n-\theta)=0$. This yields three candidate solutions: $\theta=0$ (when $n\ge 2$), $\theta=\infty$, and $\theta=n$. (I don't know, formally, whether $\theta=\infty$ counts as a solution.)
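Spelled out, the differentiation step is
$$L'(\theta)=\frac{d}{d\theta}\!\left(e^{-\theta}\frac{\theta^{n}}{n!}\right)=\frac{e^{-\theta}}{n!}\left(n\theta^{n-1}-\theta^{n}\right)=\frac{\theta^{n-1}e^{-\theta}}{n!}\,(n-\theta),$$
which vanishes exactly when $\theta^{n-1}e^{-\theta}(n-\theta)=0$.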

Among them, the one that maximizes the likelihood is $\theta=n$.
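As a minimal numeric check (a sketch assuming a hypothetical observed count $n=5$; nothing here beyond the formula above):

```python
import math

def poisson_likelihood(theta, n):
    """Poisson likelihood L(theta) = exp(-theta) * theta**n / n! for a single count n."""
    return math.exp(-theta) * theta ** n / math.factorial(n)

n = 5  # hypothetical observed count
# Evaluate L at the three stationary candidates from the derivation above.
print(poisson_likelihood(0.0, n))       # 0.0     -- a minimum, not a maximum
print(poisson_likelihood(float(n), n))  # ~0.1755 -- the maximizer theta = n
print(poisson_likelihood(1e3, n))       # 0.0     -- L vanishes as theta -> infinity
```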

For computational convenience, one most often works with the logarithm ($\ln$) of the likelihood instead. Since $\ln$ is only defined for positive arguments, this prevents $\theta=0$ from appearing as a possible solution. In addition, once the $\ln$ is introduced, the solution $\theta=\infty$ disappears.
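For the Poisson case, the log-likelihood computation is
$$\ln L(\theta) = -\theta + n\ln\theta - \ln n!, \qquad \frac{d}{d\theta}\ln L(\theta) = -1 + \frac{n}{\theta},$$
which vanishes only at $\theta=n$: the point $\theta=0$ lies outside the domain of $\ln$, and as $\theta\to\infty$ the derivative tends to $-1$, not $0$.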

Is there a risk, for some special forms of the likelihood (and if so, is there an example), that using the logarithm ($\ln$) in the likelihood derivation leads to losing the solutions that would have maximized the likelihood had we not used the logarithm?

2 Answers

Best Answer

No. Because the logarithm is strictly increasing, if a likelihood function has $n$ maxima, the log-likelihood function has $n$ maxima at the same parameter values.
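In symbols: since $\ln$ is strictly increasing on $(0,\infty)$, for any $\theta^*$ with $L(\theta^*)>0$,
$$L(\theta^*) \ge L(\theta)\ \text{for all }\theta \iff \ln L(\theta^*) \ge \ln L(\theta)\ \text{for all }\theta\text{ with }L(\theta)>0,$$
so $L$ and $\ln L$ attain their maxima at exactly the same parameter values.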

Answer

For a discrete model the likelihood is a probability, so $L(\theta)\in[0,1]$. Applying the log transformation gives $\log L(\theta)\in(-\infty,0]$, and since $\log$ is strictly increasing, it preserves the ordering of likelihood values. The only parameter values we lose are those where the likelihood is $0$, but those are exactly the values under which the observed sample is impossible, so they could never be maximizers. So no, we cannot lose any potential solution by taking the logarithm.

This doesn't mean you should always take the logarithm, though. For example, if we have samples $x_1,\dots,x_n$ from the uniform distribution on $[0,\theta]$, the likelihood is $\frac{1}{\theta^{n}}\,\mathbf{1}\{\max_i x_i\le\theta\}$ and the log-likelihood is $-n\log\theta+\log\mathbf{1}\{\max_i x_i\le\theta\}$. The first term is decreasing in $\theta$, so the maximum occurs at the smallest value the indicator allows, namely $\hat\theta=\max_i x_i$, and both expressions agree there. Do we even need to consider $\theta<\max_i x_i$? The log-likelihood is undefined there ($\log 0$), so how do we know such a $\theta$ is not a maximizer of the original expression? Because the only value that can make the log undefined is a likelihood of $0$, which is the worst possible likelihood, so we would never pick it.
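A minimal numeric sketch of this example (hypothetical sample and grid; the grid search stands in for the analytic argument):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0                      # hypothetical true parameter
x = rng.uniform(0.0, theta_true, 20)  # hypothetical sample from Uniform[0, theta_true]
n = len(x)

def likelihood(theta):
    """L(theta) = theta^{-n} if max x_i <= theta, else 0 (constraint violated)."""
    return theta ** -n if x.max() <= theta else 0.0

def log_likelihood(theta):
    """log L(theta) = -n log(theta) if max x_i <= theta, else -inf (log of 0)."""
    return -n * np.log(theta) if x.max() <= theta else -np.inf

# Grid search: the plain likelihood and the log-likelihood pick the same maximizer,
# the smallest grid point >= max x_i, i.e. theta_hat = max x_i.
grid = np.linspace(0.5, 4.0, 10_000)
lik = np.array([likelihood(t) for t in grid])
loglik = np.array([log_likelihood(t) for t in grid])
print(grid[lik.argmax()], grid[loglik.argmax()], x.max())
```

Both criteria select (up to grid resolution) $\hat\theta=\max_i x_i$, and the region $\theta<\max_i x_i$ is zero likelihood under one criterion and $-\infty$ log-likelihood under the other, so it is rejected either way.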