what is the difference between maximum likelihood estimation and usual probability inference?


Can somebody explain the difference between MLE and ordinary probability inference?


The idea of MLE is that you construct a model with certain parameters. However, you do not know the true parameters of the distribution. Hence, you use a sample from the population to estimate the parameters. MLE is an estimation method. Here is an example, using the Bernoulli distribution. This time, you are the main character of the story.

"Let's play a game! Flip this coin. If it lands on heads, I win the bet and if it lands on tails, you win the bet. Let's bet ten dollars a round." A game master approached you with this proposal.

"How do I know if the coin you possess is fair?", you asked.

"Test it!"

With that, you receive the coin and set out to test it. You know the only possible outcomes are heads and tails; call the probability of heads $p$. Denoting $1$ as heads and $0$ as tails, and letting $X$ be the random variable for the coin's outcome, you write

$$ X = \begin{cases}1 \text{ with probability } p\\0 \text{ with probability } 1-p\end{cases} $$

This is the formulation of the model.

Equivalently, you can write the probability mass function of the distribution as $f_X(x) = p^x(1-p)^{1-x}$ for $x \in \{0, 1\}$.
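As a quick sanity check, this pmf can be coded directly. A minimal sketch (the value of $p$ used here is hypothetical, since at this point in the story you do not know it):

```python
# Bernoulli pmf f_X(x) = p^x * (1-p)^(1-x), with 1 = heads, 0 = tails.
def bernoulli_pmf(x, p):
    return p**x * (1 - p)**(1 - x)

# With p = 0.5 (a fair coin), heads and tails are equally likely.
print(bernoulli_pmf(1, 0.5))  # probability of heads: 0.5
print(bernoulli_pmf(0, 0.5))  # probability of tails: 0.5
```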

Remember! At this point, you do not know the true value of $p$. However, you can attempt to estimate it. I will explain the idea here rather than walk through the full derivation.

The idea is that you estimate the value of $p$ from your sample because you do not know the true value. In this case, the MLE of $p$ turns out to be $\bar x$; however, it is not the sample mean merely by appeal to elementary statistics! $\bar x$ is the value of $p$ that maximizes the likelihood function. The likelihood function is the joint probability of the observed sample, viewed as a function of the parameter $p$; hence the notation $L(p) = L(p \mid x_1, x_2, \cdots, x_n)$.

Suppose you toss the coin 5 times and the sample values are $1,0,1,1,1$. The likelihood function is then $$L(p) = p^4(1-p).$$ Try to derive this yourself from the pmf of the distribution above.

What you do in MLE is find the value of $p$ that maximizes this function (that is why we say the value maximizes the likelihood): you want the value of $p$ under which the observed sample is most probable.
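You can find this maximizer numerically. A minimal sketch, using a simple grid search over $L(p) = p^4(1-p)$ for the sample $1,0,1,1,1$; calculus gives the maximizer $p = 4/5$, which is exactly the sample mean $\bar x$:

```python
# Likelihood of the sample 1,0,1,1,1 as a function of p.
def likelihood(p):
    return p**4 * (1 - p)

# Grid search over p in [0, 1]; the true maximizer is p = 0.8.
grid = [i / 1000 for i in range(1001)]
p_hat = max(grid, key=likelihood)
print(p_hat)  # 0.8, the sample mean of 1,0,1,1,1
```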

If you understand this, you can extend the idea to other distributions like the Poisson, or even continuous distributions like the uniform, Gamma, or Normal (in the Normal case you have to estimate two parameters, $\mu$ and $\sigma^2$, concurrently).
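For the Normal case, maximizing the log-likelihood in both parameters has a well-known closed form: the MLE of $\mu$ is the sample mean and the MLE of $\sigma^2$ is the divide-by-$n$ (not $n-1$) sample variance. A sketch with made-up data:

```python
# MLE for a Normal sample: closed-form maximizers of the log-likelihood.
# The data values below are hypothetical, for illustration only.
data = [2.1, 1.9, 2.4, 2.0, 2.6]
n = len(data)

mu_hat = sum(data) / n                                # MLE of the mean
var_hat = sum((x - mu_hat)**2 for x in data) / n      # MLE of the variance
                                                      # (note /n, not /(n-1))
print(mu_hat, var_hat)
```

Note that the MLE of $\sigma^2$ is biased; the familiar $n-1$ denominator comes from correcting that bias, not from maximizing the likelihood.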

Remember, in the real world you do not have the luxury of sampling the whole population, which can run to millions or billions of observations (the log returns of a stock might update several times a second, so imagine how many entries you have if you analyze ten years). Nor do you know the values of the parameters. Beyond this toy example, consider the following:

You believe that travel time from your home to the city follows a Poisson distribution with mean $\lambda$, but you do not know the true value of $\lambda$ because it depends on the traffic each day. You want to estimate $\lambda$ so you can plan your weekly fuel usage. In this case, do you know the true value of $\lambda$? If not, how would you estimate it?
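As a hint for the exercise above: maximizing the Poisson likelihood again yields the sample mean, $\hat\lambda = \bar x$. A minimal sketch, with hypothetical travel times:

```python
# MLE for a Poisson sample: maximizing the likelihood gives
# lambda_hat = sample mean. Travel times (minutes) are made up.
times = [32, 28, 35, 30, 31, 29, 33]
lam_hat = sum(times) / len(times)
print(lam_hat)
```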