Explanation of maximum likelihood

Question

218 Views Asked by Bumbble Comm At 28 Mar 2026 - 3:00

While revisiting my maximum likelihood notes, I got confused about the relationship between the likelihood and the probability density function.

The notes say:

We assume that the examples are independent, so the probability of the set is the product of the probabilities of the individual examples:

$$ f(x_1,\ldots,x_n;\theta)=\prod_{j=1}^{n} f(x_j;\theta) $$

The notation above makes us think of the distribution $\theta$ as fixed and the examples $ x_j $ as unknown, or varying. However, we can think of the training data as fixed and consider alternative parameter values. This is the point of view behind the definition of the likelihood function:

$$L(\theta;x_1,\ldots,x_n) = f(x_1,\ldots,x_n;\theta)$$

Note that

if $ f(x; \theta) $ is a probability mass function, then the likelihood is at most one, since it is a product of probabilities.
if $ f(x; \theta) $ is a probability density function, then the likelihood can be greater than one, since densities can be greater than one.

Can someone explain: if $ f(x; \theta)$ is a probability density function, how can the likelihood be greater than one?

Original Q&A

**Bumbble Comm** · Answer 1 · 2017-01-23 08:43:00

Note that $f(x;\theta)$ can be larger than one, but does not have to be larger than one. The explanation is simple: densities are basically real functions which integrate to one. But it is not necessary that a density function $f$ satisfies $f(x) \leq 1$ for all $x\in \Bbb R$. For example the density of a continuous uniformly distributed random variable on $[0, \frac 12]$ is given by $$ f(x) = \begin{cases}2, & x\in [0,\frac 12], \\ 0, &x\notin [0,\frac 12]. \end{cases} $$
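The example above can be checked numerically. The following is a minimal sketch in R; the three sample values in `x` are hypothetical and chosen only for illustration. It shows that the density equals 2 on the support, still integrates to one, and yields a likelihood of $2^3 = 8 > 1$ for three independent observations.

```r
# Density of Uniform(0, 1/2) at two points inside and one outside the support
dens <- dunif(c(0.1, 0.3, 0.7), min = 0, max = 0.5)
dens   # 2 2 0

# The density still integrates to one over its support
area <- integrate(dunif, lower = 0, upper = 0.5, min = 0, max = 0.5)$value

# Hypothetical sample of n = 3 independent observations (illustrative values)
x <- c(0.12, 0.31, 0.45)

# Likelihood = product of the individual density values = 2^3
lik <- prod(dunif(x, min = 0, max = 0.5))
lik    # 8
```

A density value is not a probability, so nothing prevents it (or a product of such values) from exceeding one; only the area under the density is constrained.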

**Bumbble Comm** · Answer 2 · 2017-01-23 10:01:17

This example illustrates the role of a discrete PDF (that is, a probability mass function) in finding probabilities, and (considered as a likelihood function) in estimating a parameter. Specific comments about the 'heights' of PDFs and likelihood functions are shown in italics.

Suppose that you observe $X \sim Binom(n, p).$ Then the PDF consists of individual probabilities $f(x;p) = P(X = x) = {n \choose x}p^x (1-p)^{n-x},$ for $x = 0, 1, \dots, n.$ In this case the probabilities all add to $1,$ so none of them can exceed 1.
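As a rough check of the claim that the binomial probabilities add to 1 and therefore cannot individually exceed 1, here is a short R sketch. The values $n = 10$ and $p = 0.4$ are assumed purely for illustration.

```r
n <- 10
p <- 0.4

# All n + 1 probabilities P(X = 0), ..., P(X = n)
probs <- dbinom(0:n, size = n, prob = p)

total   <- sum(probs)   # equals 1 up to floating-point rounding
largest <- max(probs)   # tallest bar of the PDF, necessarily at most 1

total
largest
```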

Finding probabilities. If $p$ is known, then $f(x;p)$ gives a way to evaluate $P(X = k).$ For example, if $n = 10$ and $p = .4,$ we can use this formula to find $P(X = 3).$ In R statistical software, this probability is found to be 0.2150 (height of the red line in the figure below), but a hand calculation would not be difficult. Note that dbinom evaluates the PDF, whereas pbinom would give the cumulative probability $P(X \le 3) = 0.3823.$

 dbinom(3, 10, .4)
 ## 0.2149908

Estimating $p$. However, if $X = x$ is observed and we want to estimate $p,$ then we can view $f(x;p)$ as a function of $p$, calling it a 'likelihood function'. One way to estimate $p$ is to find the value $\hat p$ at which $f(x;p)$ is a maximum. Suppose $n = 10$ and we observe $X = 6.$ We can use R to sketch a graph of the likelihood function as follows:

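 p = seq(0, 1, by=.001)                # grid of candidate values of p
 like = dbinom(6, 10, p)               # likelihood of each p, given X = 6 and n = 10
 p.hat = p[which.max(like)];  p.hat    # grid value at which the likelihood peaks
 ## 0.6
 plot(p, like, type="l", lwd=2, col="blue")
 abline(v=p.hat, col="red");  abline(h=0, col="green2")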

We say that $\hat p = 0.6$ is the maximum likelihood estimate (MLE) of $p.$ The code above searches for the maximizing value of $p,$ but in this case it is easy to find the maximizing value using calculus.
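For reference, the calculus argument can be sketched as follows, assuming $0 < x < n$ and writing $\ell(p)$ for the log-likelihood (a symbol introduced here only for this derivation). Taking logs turns the product into a sum, and setting the derivative to zero gives the maximizer:

$$
\begin{aligned}
\ell(p) &= \log {n \choose x} + x \log p + (n-x)\log(1-p), \\
\frac{d\ell}{dp} &= \frac{x}{p} - \frac{n-x}{1-p} = 0
\quad\Longrightarrow\quad \hat p = \frac{x}{n} = \frac{6}{10} = 0.6 .
\end{aligned}
$$

The second derivative, $-x/p^2 - (n-x)/(1-p)^2,$ is negative on $(0,1),$ so this critical point is indeed a maximum.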

In this process, we are using the likelihood curve only to find its maximum. For that, it is not necessary to include the constant factor ${10 \choose 6}$ as part of the likelihood function, and so the true height of the likelihood curve is not an issue. Many authors stipulate that a likelihood function is defined only up to a constant multiple, and use the proportionality symbol $\propto$ accordingly: $f(x;p) \propto p^x(1-p)^{n-x}.$ This is especially common in Bayesian applications.
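To illustrate that the constant factor does not affect the location of the maximum, here is a small R sketch for the same data ($n = 10,$ $X = 6$). The variable names `full` and `kernel` are chosen here for illustration only.

```r
p <- seq(0, 1, by = 0.001)

full   <- dbinom(6, 10, p)    # likelihood including the factor choose(10, 6)
kernel <- p^6 * (1 - p)^4     # likelihood with the constant factor dropped

p.hat.full   <- p[which.max(full)]
p.hat.kernel <- p[which.max(kernel)]

p.hat.full     # 0.6
p.hat.kernel   # 0.6

# The two curves differ only by the constant multiple choose(10, 6)
const <- choose(10, 6)
const          # 210
```

The heights of the two curves differ by a factor of 210 everywhere, yet both peak at the same value of $p.$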

If $X$ has a continuous distribution, then the density function can exceed $1,$ as shown by the uniform example in the Answer of @Cettt (+1).

Note: The estimate $\hat p = 0.6$ above is hardly a surprise. The method of moments gives the same value: $E(X) = np$ so the MME is $X/n = 6/10.$
