Maximum Likelihood and Density Function


Likelihood function

$$\mathcal{L}(\theta;\vec{x})=\prod_i f(x_i;\theta)=\prod_i\frac{dP_\theta}{dm}(x_i)$$

where $\frac{dP_\theta}{dm}$ is the Radon–Nikodym derivative of $P_\theta$ with respect to a dominating measure $m$ (the Lebesgue measure in the continuous case).
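As a concrete illustration of the product above, assuming an i.i.d. $N(\mu, \sigma^2)$ sample (the data values below are made up for illustration):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density f(x; theta) of N(mu, sigma^2) at x, i.e. dP_theta/dm for Lebesgue m."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(theta, xs):
    """L(theta; x) = prod_i f(x_i; theta) for an i.i.d. sample xs."""
    mu, sigma = theta
    prod = 1.0
    for x in xs:
        prod *= normal_pdf(x, mu, sigma)
    return prod

xs = [0.2, -0.5, 1.1, 0.7]   # illustrative sample
mean = sum(xs) / len(xs)
# For fixed sigma, the sample mean maximizes the likelihood over mu:
print(likelihood((mean, 1.0), xs) > likelihood((0.0, 1.0), xs))  # → True
```

Note that the value of `likelihood` is a density evaluated at the data, not a probability; it can exceed $1$ for small $\sigma$.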

For a continuous random variable, the probability that it takes on any particular value is zero. But in a statistical setting, for example maximum likelihood or the EM algorithm, we plug the observed values into the density in order to maximize "the probability." Is there a mathematically rigorous definition of the likelihood function or of the maximum likelihood estimate? Is the likelihood function the same as the joint density function of independent samples? Do we read the semicolon ";" as a sign of conditional probability?


For a short answer: "in order to maximize the probability" is more an intuition for the motivation behind MLEs than a rigorous mathematical definition.

Now for the long answer:

  1. Definition of the likelihood function: The likelihood is nothing but (at least in the conventional sense) the joint density function of your observations (which are random variables, vectors, etc.). As you have said, the probability density function is defined as a Radon–Nikodym derivative. Note that the likelihood is often interpreted as the probability of observing certain observations, and just as you have argued, this interpretation is not in fact mathematically correct. It should also be mentioned that although the likelihood is often defined, at least in introductory examples in statistics textbooks, as a product of probability density functions of independent random variables, it does not have to be so.

  2. Definition of maximum likelihood estimation: I am going to present the rather broad definition of an M-estimator from van der Vaart, Asymptotic Statistics. The following is taken from page 45 of that book:

Given an arbitrary random function $\theta \mapsto M_n(\theta)$, the M-estimator $\hat{\theta}_n$ is defined as any value satisfying $M_n(\hat{\theta}_n) \geq \sup_{\theta} M_n(\theta) - o_p(1)$.

Here $o_p(1)$ denotes a remainder term that converges in probability to $0$; it is introduced to handle the fact that the supremum is not necessarily attained. For example, if the parameter space is $(0, 1)$ and the supremum is approached only as $\theta \to 1$, we can still define the M-estimator as a sequence that converges to $1$.

Now, the maximum likelihood estimator is nothing but an M-estimator that uses the joint density as the function $M_n(\cdot)$ in the previous display.
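A small numerical sketch of this near-maximization inequality, assuming i.i.d. Exponential($\theta$) data and a finite grid search (the data, grid, and tolerance below are illustrative stand-ins for the $o_p(1)$ term):

```python
import math

def log_lik(theta, xs):
    """M_n(theta): log-likelihood of an i.i.d. Exponential(theta) sample."""
    return len(xs) * math.log(theta) - theta * sum(xs)

def near_maximizer(xs, grid, tol=1e-4):
    """Return a theta_hat with M_n(theta_hat) >= sup_grid M_n - tol,
    mirroring the definition M_n(theta_hat) >= sup_theta M_n(theta) - o_p(1)."""
    sup = max(log_lik(t, xs) for t in grid)
    for t in grid:
        if log_lik(t, xs) >= sup - tol:
            return t

xs = [0.3, 1.2, 0.8, 2.0, 0.5]              # illustrative data
grid = [k / 1000 for k in range(1, 5001)]   # theta in (0, 5]
theta_hat = near_maximizer(xs, grid)
# The closed-form MLE for Exponential(theta) is n / sum(x_i):
print(abs(theta_hat - len(xs) / sum(xs)) < 0.05)  # → True
```

The returned `theta_hat` need not be the exact maximizer over the grid; any point within `tol` of the supremum qualifies, which is exactly the flexibility the $o_p(1)$ term provides.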

  3. Is the likelihood function the same as the joint density function of independent samples?
    Yes, except that the samples do not necessarily have to be independent.

  4. Do we see the semicolon ";" as a sign of conditional probability? Nope. The semicolon is just a way of saying that the value of the function $f$ (in your formulation) depends on $\theta$.
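The point above that the likelihood is the joint density but need not be a product over independent samples can be sketched with a Gaussian AR(1) model, where the joint density factorizes into conditional densities instead (a minimal illustration; the model choice and conditioning on the first observation are assumptions of this sketch):

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def ar1_loglik(phi, sigma, xs):
    """Conditional log-likelihood of the AR(1) model x_t = phi * x_{t-1} + eps_t,
    eps_t ~ N(0, sigma^2), conditioning on x_0. The joint density is a product
    of conditional densities f(x_t | x_{t-1}; theta), not of identical marginals."""
    return sum(normal_logpdf(xs[t], phi * xs[t - 1], sigma)
               for t in range(1, len(xs)))
```

For example, `ar1_loglik(0.5, 1.0, [0.1, 0.3, -0.2, 0.4])` evaluates the log of the joint conditional density of the last three observations given the first, which is the objective an MLE for $(\phi, \sigma)$ would maximize.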