Which loss function does the maximum likelihood estimator minimize?


I'm trying to understand maximum likelihood estimators in the context of general estimation theory. I know the Bayesian (MMSE) estimator minimizes the mean squared loss, and the MAP estimator minimizes the all-or-nothing loss (the loss is zero if the estimator returns the correct parameter and 1 otherwise). Which loss function does the maximum likelihood estimator minimize?

My thought was that it is the negative of the log-likelihood function, but the definition of a loss function involves an estimator $T(X)$ and a parameter $s$. As I see it, the negative log-likelihood function does not contain any estimator.

Accepted answer:

The Kullback-Leibler divergence (between the empirical and theoretical probability distributions) is the loss function minimized by the MLE, at least according to this derivation, which looks legitimate at first glance.
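As a quick numerical sketch of this claim (my own example, not from the linked derivation): for a Bernoulli model, the parameter value that minimizes the KL divergence from the empirical distribution to the model coincides with the MLE, which is the sample mean.

```python
import numpy as np

# Illustrative check: for Bernoulli data, the theta minimizing
# KL(empirical || model) equals the MLE (the sample mean).
rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=1000)
p_emp = data.mean()  # empirical P(X = 1); also the MLE of theta

thetas = np.linspace(0.01, 0.99, 981)  # candidate model parameters
# KL divergence between two two-point (Bernoulli) distributions
kl = (p_emp * np.log(p_emp / thetas)
      + (1 - p_emp) * np.log((1 - p_emp) / (1 - thetas)))

theta_kl = thetas[np.argmin(kl)]
print(theta_kl, p_emp)  # the KL minimizer lands on the grid point nearest p_emp
```

Up to the grid resolution, the KL-minimizing parameter and the sample mean agree.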

Second answer:

I think you are a bit confused about risk functions and likelihood functions, so I'll do my best to clear this up.

For a given likelihood function $\mathcal L\left(\theta;x_1,\ldots,x_N\right)$, the maximum likelihood estimator yields the estimate $\hat\theta$ that best explains your set of observations $\{x_1,\ldots,x_N\}$ given the statistical model you assumed for each $x_n$ (with $n=1,\ldots,N$). For instance, if the $x_n$ are independent and $x_n \sim \mathcal{N} \left(\theta,1\right)$ for all $n$, then the MLE is

\begin{equation}\hat\theta=\frac{1}{N}\sum\limits_{n=1}^N x_n \ . \end{equation}
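This closed form can be checked numerically (my own sketch): minimizing the Gaussian negative log-likelihood over a grid of candidate $\theta$ values recovers the sample mean.

```python
import numpy as np

# Sketch: for i.i.d. x_n ~ N(theta, 1), the negative log-likelihood
# (up to additive constants) is 0.5 * sum_n (x_n - theta)^2, and its
# minimizer is the sample mean, matching the closed-form MLE.
rng = np.random.default_rng(1)
x = rng.normal(2.5, 1.0, size=500)

thetas = np.linspace(0.0, 5.0, 5001)  # candidate values of theta
nll = 0.5 * ((x[:, None] - thetas[None, :]) ** 2).sum(axis=0)
theta_hat = thetas[np.argmin(nll)]

print(theta_hat, x.mean())  # numerical minimizer agrees with the sample mean
```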

However, maybe you know a priori that you are only interested in solutions that lie within the range $\theta\in[a,b]$. And that's where the Bayesian philosophy comes into play. The risk function is a mechanism for adding prior knowledge about $\theta$ to the design of your estimator, and the particular risk function you pick determines your Bayesian estimator. For the "hit or miss" (all-or-nothing) risk function, for instance, the Bayesian estimator is simply the mode of your a posteriori PDF, i.e. the most likely value according to the a posteriori PDF.

To tie the two concepts together, notice that in the first scenario we are implicitly assuming that each $\theta\in\mathbf R$ is equally likely to occur. This means that, for a given statistical model, the Bayesian estimator coincides with the MLE under a uniform prior distribution.
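A small numerical sketch of this coincidence (my own example): with a flat prior, the posterior is proportional to the likelihood, so the MAP estimate and the MLE are the same point.

```python
import numpy as np

# Sketch: for Bernoulli data with a uniform prior on theta in [0, 1],
# the log-posterior equals the log-likelihood plus a constant, so the
# MAP estimate (posterior mode) coincides with the MLE k/n.
rng = np.random.default_rng(2)
data = rng.binomial(1, 0.7, size=200)
k, n = data.sum(), data.size

thetas = np.linspace(0.001, 0.999, 999)  # grid over the open unit interval
log_lik = k * np.log(thetas) + (n - k) * np.log(1 - thetas)
log_prior = np.zeros_like(thetas)  # uniform prior: constant log-density
log_post = log_lik + log_prior     # posterior is proportional to likelihood

theta_map = thetas[np.argmax(log_post)]
theta_mle = k / n
print(theta_map, theta_mle)  # the two estimates coincide on this grid
```

With a non-flat prior (e.g. concentrated on $[a,b]$), `log_prior` would shift the maximizer and the two estimators would generally differ.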

PS: For a more in-depth discussion, see Chapters 10 and 11 of the first volume ("Estimation Theory") of "Fundamentals of Statistical Signal Processing" by Steven M. Kay.