Does it make sense to have an extremely small maximum likelihood?


I am writing a script that computes the maximum likelihood given a model's predictions and the data. Here is a plot of the correct model, the incorrect model, and the data. The data have normally distributed random errors with standard deviation 10. I compute the likelihood using:

$$ \exp\left(-\frac{\sum_i (d_i - f_i)^2}{2\sigma^2}\right) $$

Calculating this gives a maximum likelihood of $3 \times 10^{-20}$ for the correct model and $8 \times 10^{-29}$ for the incorrect model. These values seem extremely small. When I compute the likelihood ratio (i.e., the likelihood of the correct model divided by that of the incorrect model), I get what I would expect: the ratio suggests that the correct model is more plausible. Does it matter that the individual likelihoods are extremely small, or is the ratio all that really matters? If the likelihoods are too small, the problem is likely in my code, in which case I wouldn't really expect help. Thank you!
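For context, this behavior is easy to reproduce with a sketch like the one below (the data and models here are hypothetical stand-ins, not the actual script): the raw likelihoods shrink toward zero as the number of points grows, which is why comparisons are usually done with log-likelihoods.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0

# Hypothetical stand-ins for the data and the two competing models.
x = np.linspace(0, 10, 50)
f_true = 3.0 * x + 2.0             # "correct" model predictions
f_wrong = -1.0 * x + 30.0          # "incorrect" model predictions
d = f_true + rng.normal(0.0, sigma, x.size)  # data = truth + N(0, sigma) noise

def likelihood(d, f, sigma):
    """Unnormalized Gaussian likelihood: exp(-sum_i (d_i - f_i)^2 / (2 sigma^2))."""
    return np.exp(-np.sum((d - f) ** 2) / (2.0 * sigma ** 2))

def log_likelihood(d, f, sigma):
    """Log of the same quantity; immune to floating-point underflow."""
    return -np.sum((d - f) ** 2) / (2.0 * sigma ** 2)

L_true, L_wrong = likelihood(d, f_true, sigma), likelihood(d, f_wrong, sigma)
print(L_true, L_wrong)  # both tiny, as in the question, yet L_true >> L_wrong
print(log_likelihood(d, f_true, sigma) - log_likelihood(d, f_wrong, sigma))
```

With enough data points the raw exponentials eventually underflow to exactly `0.0`, at which point the ratio becomes 0/0; taking the difference of log-likelihoods avoids that entirely.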


Do not confuse likelihoods with probabilities. A likelihood can be a probability, but in general it does not have to be. Consequently, a likelihood value carries little meaning on its own; it only makes sense relative to another likelihood computed from the same data and model.

If this idea is confusing, consider a simple example. For a sample of Bernoulli observations $(x_1, \ldots, x_n)$ where $X_i \sim \operatorname{Bernoulli}(p)$, we know that the likelihood of $p$ given the sample is $$\mathcal L(p \mid x_1, \ldots, x_n) \propto \prod_{i=1}^n p^{x_i} (1-p)^{1-x_i} = p^{\sum x_i} (1-p)^{n-\sum x_i},$$ where $t = \sum x_i$ is the sample total, or the number of observations that equal $1$. For a sample of size $n = 100$ and $t = \sum x_i = 10$, this gives a likelihood $$\mathcal L(p \mid t = 10, n = 100) \propto p^{10}(1-p)^{90}.$$ Our intuition suggests that a good estimate of $p$ would be $t/n = 0.1$. Evaluating the likelihood at this choice of parameter gives $$\mathcal L(0.1) = (0.1)^{10}(0.9)^{90} \approx 7.61773 \times 10^{-15}.$$ Of course this is not a probability, not even a posterior density for $p$ under a Bayesian framework.
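As a quick numerical sanity check (a sketch of my own, not part of the derivation above), one can evaluate this likelihood directly and confirm both the tiny value and that it peaks at $p = t/n = 0.1$:

```python
import numpy as np

t, n = 10, 100  # sample total and sample size from the example

def bernoulli_likelihood(p):
    """L(p | t, n), up to the constant binomial factor."""
    return p ** t * (1.0 - p) ** (n - t)

print(bernoulli_likelihood(0.1))  # ~7.61773e-15, matching the value above

# A grid search confirms the likelihood is maximized near p = t/n = 0.1.
grid = np.linspace(0.001, 0.999, 999)
p_hat = grid[np.argmax(bernoulli_likelihood(grid))]
print(p_hat)
```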

What we can do with this is show that, given the data we observed, our estimate $\hat p = 0.1$ is more likely to be true than, say, $p = 0.5$, since $$\mathcal L (0.5) = (0.5)^{10} (0.5)^{90} = (0.5)^{100} \approx 7.88861 \times 10^{-31}.$$ And it is only in this comparison of likelihoods that the value has any meaning.
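In practice, since the individual values shrink so quickly, this comparison is usually carried out on the log scale. A minimal sketch (the function name is mine, not standard):

```python
import math

def bernoulli_loglik(p, t=10, n=100):
    """Log-likelihood of p for t successes in n Bernoulli trials (up to a constant)."""
    return t * math.log(p) + (n - t) * math.log(1.0 - p)

# The difference of log-likelihoods is the log of the likelihood ratio.
log_ratio = bernoulli_loglik(0.1) - bernoulli_loglik(0.5)
print(log_ratio, math.exp(log_ratio))  # ratio ~ 9.66e15 in favor of p = 0.1
```

Working with `log_ratio` directly sidesteps any worry about how small the individual likelihoods are; only their difference matters.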