I am having trouble understanding MAP and how it differs from MLE. If I understand correctly, MLE finds the parameters of some distribution such that the likelihood of our data set being generated by that distribution with those parameters is maximized.
But what then is MAP, and how can the parameters of a distribution be a random variable? If we are searching through parameters to find which of them makes the distribution most likely to have generated the data set we have (or the one most similar to our data), how can a parameter be a random variable?
I noticed that you tagged machine learning, so I'll give the common explanation seen in machine learning literature.
Recall that from Bayes Theorem we have: $$ f_{\Theta}(\theta \ | \ X,Y) = \frac{f_{Y}(y \ | \ X,\theta) \cdot f_{\Theta}(\theta)}{\int_{\Theta} f_{Y}(y \ | \ X,\theta) \cdot f_{\Theta}(\theta) \ \mathrm{d} \theta} $$
where:

- $\Theta$ is our space of parameters
- $X$ is our input space
- $Y$ is our target space
- $f_X(x)$ represents the density of a random variable $X$
This can be simplified as: $$ \text{posterior} \propto \text{likelihood} \ \cdot \ \text{prior} $$
Right away we notice that if we place no assumption on the distribution of our prior (or assume that our parameters are uniformly distributed), then maximizing the posterior (MAP) reduces to maximizing the likelihood (MLE), since the prior term is constant in $\theta$.
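A quick numerical check of that claim, using a Bernoulli model with a Beta prior on the success probability $p$ (the data and the specific Beta parameterization here are just illustrative assumptions): with a uniform Beta(1, 1) prior the MAP estimate collapses to the MLE $k/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
flips = rng.random(100) < 0.7  # simulated Bernoulli(0.7) data, purely illustrative
k, n = flips.sum(), flips.size

# MLE for a Bernoulli parameter: the sample proportion
mle = k / n

# MAP with a Beta(a, b) prior on p is (k + a - 1) / (n + a + b - 2).
# Beta(1, 1) is the uniform prior on [0, 1].
a, b = 1.0, 1.0
map_uniform = (k + a - 1) / (n + a + b - 2)

# With a flat prior, MAP and MLE coincide exactly
assert map_uniform == mle
```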
MAP comes in handy when it isn't reasonable to assume that our parameters are uniformly distributed. In machine learning literature, this is how we perform regularization. For example, what's commonly referred to as ridge regression results from assuming a Gaussian prior on our parameters.
$$ f_{\Theta}(\theta \ | \ X,Y) \propto f_{Y}(y \ | \ X,\theta) \cdot f_{\Theta}(\theta), \qquad \Theta \sim \mathcal{N}(0,\sigma^2 I) $$
After some straightforward calculation (take the log of the posterior and drop constants) we arrive at the common formula for MAP with $L_2$ regularization:

$$ \max_\theta \left \{ \log(f_{Y}(y \ | \ X,\theta)) - \lambda \| \theta \|^2 \right \} $$

Where $\lambda = \frac{1}{2\sigma^2}$ is a hyperparameter stemming from our assumption on the variance of $\theta$: the smaller the prior variance, the stronger the penalty pulling $\theta$ toward zero.
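A minimal sketch of the ridge-as-MAP connection for linear regression, assuming Gaussian noise with variance $\sigma_n^2$ and a $\mathcal{N}(0, \sigma_p^2 I)$ prior (all data and variance values below are made up for illustration). Maximizing the log-posterior is then equivalent to minimizing $\|y - X\theta\|^2 + \lambda\|\theta\|^2$ with $\lambda = \sigma_n^2/\sigma_p^2$, which has the closed-form ridge solution:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.normal(size=(n, d))
theta_true = np.array([2.0, -1.0, 0.5])
sigma_noise = 1.0
y = X @ theta_true + sigma_noise * rng.normal(size=n)

# MLE under Gaussian noise = ordinary least squares
theta_mle = np.linalg.solve(X.T @ X, X.T @ y)

# MAP with a N(0, sigma_prior^2 I) prior = ridge regression,
# with penalty lambda = sigma_noise^2 / sigma_prior^2
sigma_prior = 0.5
lam = sigma_noise**2 / sigma_prior**2
theta_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The Gaussian prior shrinks the MAP estimate toward zero
assert np.linalg.norm(theta_map) < np.linalg.norm(theta_mle)
```

Note how a tighter prior (smaller `sigma_prior`) means a larger `lam` and more shrinkage, matching the intuition that a confident prior at zero pulls the estimate harder toward zero.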