I am a bit confused about how the MLE (maximum likelihood estimation) formulation works.
Using Bayes theorem, we can see that
$$\mathbb P(\theta|X)=\mathbb P(\theta)\mathbb P(X|\theta)/\mathbb P(X)$$
$\mathbb P(X)$ does not depend on the parameters $\theta$, so we can simply maximize $\mathbb P(\theta)\mathbb P(X|\theta)$. Assuming a uniform prior, we are then simply maximizing $\mathbb P(X|\theta)$.
Now, here is where my confusion starts. For the simple case, let's assume that $X=\{x_1\}$ and that $\theta=(\mu,\sigma)$ parameterises a normal distribution. Then $\mathbb P(X|\theta)$ is just $\mathbb P(z\sigma+\mu=x_1)$, where $z$ is a standard normal variable. Given that the normal distribution is continuous, the probability of any single point is exactly zero.
Now, common sense tells us that (holding the variance equal) a normal distribution with parameter $\mu$ close to $x_1$ is a better fit than a normal distribution with a parameter $\mu$ far from $x_1$.
However, $\mathbb P(z\sigma+\mu=x_1)=0$ for all $\mu$ and $\sigma$, so how is this reconciled mathematically?
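To make the question concrete, here is a small numerical sketch (the values of $x_1$, $\sigma$, and the candidate $\mu$'s are arbitrary choices of mine): the exact-point probability is zero, yet the density clearly prefers a $\mu$ close to $x_1$ over one far from it.

```python
# The exact-point probability P(X = x1) is zero for every mu, yet the
# density at x1 discriminates between values of mu (values are made up).
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x1, sigma = 2.0, 1.0

# The density at x1 is positive, and larger when mu is near x1 ...
near = normal_pdf(x1, mu=1.9, sigma=sigma)
far = normal_pdf(x1, mu=-3.0, sigma=sigma)
print(near > far)  # True

# ... even though P(X = x1) itself is zero for every choice of mu.
```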
With a discrete prior probability mass function for $\theta$ of $\mathbb P_0(\theta)$ and a discrete conditional probability for $X$ given $\theta$, you have (much as you stated) a posterior $$\mathbb P(\theta \mid X=x) \propto \mathbb P_0(\theta) \mathbb P( X=x \mid \theta)$$ and you can divide by $\sum_{\theta'} \mathbb P_0(\theta') \mathbb P( X=x \mid \theta')$ to turn this into an actual probability.
With a prior density for $\theta$ of $\pi_0(\theta)$ and a discrete conditional probability for $X$ given $\theta$, you have a posterior density $$\pi(\theta \mid X=x) \propto \pi_0(\theta) \mathbb P( X=x \mid \theta)$$ and you can divide by $\int_{\theta'} \pi_0(\theta') \mathbb P( X=x \mid \theta') d\theta'$ to turn this into a probability density.
With a prior density for $\theta$ of $\pi_0(\theta)$ and a conditional probability density for $X$ given $\theta$, you have a posterior density $$\pi(\theta \mid X=x) \propto \pi_0(\theta) f(x \mid \theta)$$ and you can divide by $\int_{\theta'} \pi_0(\theta') f(x \mid \theta') d\theta'$ to turn this into a probability density.
In the third case, you can find the $\theta$ which maximises $f(x \mid \theta)$ to give the maximum likelihood estimate of $\theta$. A Bayesian could instead find the $\theta$ which maximises $\pi(\theta \mid X=x)$, the mode of the posterior distribution (the so-called maximum a posteriori, or MAP, estimate), though, if forced to give a point estimate, they might prefer to take a loss function into account and report the value which minimises the expected loss under the whole posterior distribution.
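As a sketch of the maximum likelihood idea in your one-observation normal example (with $\sigma$ held fixed and $x_1$ a made-up value), the quantity being maximised is the density $f(x_1 \mid \mu, \sigma)$, not a point probability, and it is maximised exactly at $\mu = x_1$:

```python
# Maximum likelihood for mu via the density f(x1 | mu, sigma), with
# sigma fixed; x1 and the search grid are illustrative values.
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x1, sigma = 2.0, 1.0

# Grid search over candidate mu values; the density is largest at mu = x1.
grid = [i / 100 for i in range(-500, 501)]
mu_hat = max(grid, key=lambda mu: normal_pdf(x1, mu, sigma))
print(mu_hat)  # 2.0
```

This matches the closed-form result: with a single observation, the likelihood $f(x_1 \mid \mu, \sigma)$ is maximised in $\mu$ at $\hat\mu = x_1$.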
Either way, shifting the analysis from probability mass functions for discrete distributions to probability density functions for continuous distributions should remove your issue about zero probabilities, both for the maximum likelihood estimator and for the mode of the posterior distribution.
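The third case can also be sketched numerically. Assuming (purely for illustration) a $N(0,1)$ prior density on $\mu$ with $\sigma$ fixed, the normalising integral can be approximated on a grid and the posterior mode read off:

```python
# Sketch of the third case: prior density times likelihood density,
# normalised on a grid to give a posterior density for theta = mu.
# The prior N(0, 1) and the value x1 = 2 are made-up assumptions.
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x1, sigma = 2.0, 1.0
prior_mean, prior_sd = 0.0, 1.0

grid = [i / 100 for i in range(-500, 501)]
step = 0.01
unnorm = [normal_pdf(mu, prior_mean, prior_sd) * normal_pdf(x1, mu, sigma)
          for mu in grid]
Z = sum(u * step for u in unnorm)          # approximates the normalising integral
posterior = [u / Z for u in unnorm]

# The normalised posterior density integrates to (approximately) one ...
print(round(sum(p * step for p in posterior), 6))  # 1.0

# ... and its mode (the MAP estimate) sits between the prior mean and x1,
# at mu = 1 for this conjugate normal-normal setup.
map_mu = grid[posterior.index(max(posterior))]
print(map_mu)  # 1.0
```

With this conjugate prior the exact posterior is $N(1, 1/2)$, so the grid mode $\hat\mu_{MAP}=1$ agrees with the closed form.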