Difference in Joint Probability vs. Likelihood?


Conceptually, I am trying to understand the difference between probability and likelihood.

For instance, suppose I am flipping a fair two-sided coin - the number of heads follows a Binomial Distribution. In this case, for two flips, the "likelihood" of getting the ordered sequence "HEADS then TAILS" is equal to the "probability" of getting that sequence: that is, the joint probability of both events is equal to the likelihood in this example. Both the likelihood and the probability of observing HEADS then TAILS in two flips are 0.5 * 0.5 = 0.25. Also in this example, it appears as though both the probability and the likelihood of any combination of events must lie between 0 and 1.
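To make that coin example concrete in code (this snippet, including the name `likelihood_fn`, is my own illustration and not from the original post):

```r
p <- 0.5  # probability of HEADS for a fair coin

# Probability of the specific ordered sequence (HEADS, TAILS), with p fixed:
prob_sequence <- p * (1 - p)

# Likelihood of the parameter p, given that we observed (HEADS, TAILS):
# the very same expression, but read as a function of p with the data fixed
likelihood_fn <- function(p) p * (1 - p)

prob_sequence       # 0.25
likelihood_fn(0.5)  # 0.25 -- identical in the discrete case
```

The two numbers coincide because, for discrete outcomes, the probability mass function evaluated at the data *is* the likelihood of the parameter.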

I am now interested in extending this idea over to continuous probability functions such as the Normal Distribution. Using the theory of Maximum Likelihood Estimation, we are interested in determining the "most likely" ("most probable"?) parameters (i.e. "mu" and "standard deviation") that would have generated the data we are observing - provided we assume that the data itself has come from a Normal Distribution. To illustrate (using R programming language), suppose we have the following set of 5 observations:
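As a minimal sketch of that estimation step (the helper name `neg_log_lik` and the use of `optim` are my own illustration, not something from the question):

```r
# The five observations from the question
observed_data <- c(-0.24928319, -0.19453417, -0.06760277, -0.07974553, 1.10136472)

# Negative log-likelihood of a Normal(mu, sigma) model for the data
neg_log_lik <- function(par, x) {
  -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
}

# Numerically minimize, starting from mu = 0, sigma = 1.
# For the Normal, the maximum likelihood estimates are also available in
# closed form: mean(x) and the n-denominator standard deviation.
fit <- optim(c(0, 1), neg_log_lik, x = observed_data)
fit$par  # close to c(mean(observed_data), sqrt(mean((observed_data - mean(observed_data))^2)))
```

The "most likely" parameters are the ones that maximize the likelihood function - equivalently, the ones that minimize the negative log-likelihood above.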

> observed_data
[1] -0.24928319 -0.19453417 -0.06760277 -0.07974553  1.10136472

I can find out the (approximate) joint probability for observing this sequence of observations from a Normal Distribution with mu = 0 and sigma = 1 (I know that in most cases we are actually supposed to estimate mu and sigma, but hear me out please):

epsilon = 0.01

observed_data = c(-0.24928319, -0.19453417, -0.06760277, -0.07974553, 1.10136472)

# P(x <= X <= x + epsilon) for each observation, then multiply:
interval_probs = pnorm(observed_data + epsilon, mean = 0, sd = 1) -
                 pnorm(observed_data, mean = 0, sd = 1)
joint_probability = prod(interval_probs)

# joint_probability is roughly 5.2e-13

(My original version of this code had two typos - a dropped minus sign and a missing endpoint in the last interval - which produced a negative "probability"; the corrected version above gives a small positive number.)

As we can see, the probability of observing these points from the specified Normal Distribution is very small.

Similarly, we can also calculate the "likelihood" of observing these 5 points from the same Normal Distribution:

observed_data = c(-0.24928319, -0.19453417, -0.06760277, -0.07974553, 1.10136472)

denominator = 1 / sqrt(2 * pi * 1)
likelihood = denominator^5 * exp(-0.5 * sum(observed_data^2))
# equivalently: prod(dnorm(observed_data, mean = 0, sd = 1))

# likelihood is roughly 0.00521

(Note the exponent in the standard Normal density is exp(-x^2 / 2), not exp(x)^2; with that fixed, the likelihood is just the product of the five density values.)

As we can see here (under this specific Normal Distribution with this specific choice of parameters), the probability of observing these 5 numbers within small intervals and the likelihood of these same 5 numbers are different: the interval probability is approximately the likelihood multiplied by epsilon^5.

All in all, this brings me to my question: it would appear as though in the case of Discrete Probability Distributions, the concepts of Probability and Likelihood are equivalent - but in the case of Continuous Probability Distributions (where the likelihood is built from a density, not a probability), they are not equivalent. If this is truly the case - why does this happen, and what is the "conceptual" difference between Likelihood and Probability?

Thank you!