Why are MAP estimators not invariant under reparameterization, while MLEs are?


K.P. Murphy writes that this is because:

"The MLE does not suffer from this since the likelihood is a function, not a probability density."

I'm not sure I understand why or how that distinction matters, and it would be nice to get a more intuitive explanation.


There are 2 best solutions below

BEST ANSWER

The likelihood is a probability density in the data space, $p(x|\theta)$, but viewed as a function of the parameter, $L(\theta|x)$, it is just a function, not a probability density in the parameter space. This means that there is no constraint on its integral over the parameter values. In contrast, any probability density in the parameter space, such as the posterior $p(\theta|x)$, is subject to a specific constraint: the probability assigned to any set of parameter values must remain the same after reparameterization. Writing the new parameter as $\phi = g(\theta)$, this means that for every set $A$: $$\int_{g(A)} p_\phi(\phi|x)\,d\phi = \int_A p_\theta(\theta|x)\,d\theta$$ This constraint "forces" the probability density to change when the parameters are reparameterized: it picks up the Jacobian factor $|d\theta/d\phi|$, since just carrying over the same function values would not guarantee that the integral condition still holds. That factor generally varies with $\phi$, so it can shift the location of the maximum. In contrast, the MLE does not suffer from this problem: the function values simply remain the same, so the maximum remains the same as well.
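To make this concrete, here is a minimal numerical sketch. Everything in it is an assumption chosen for illustration and not part of the original answer: binomial data (`k` successes in `n` trials), a Beta(`a`, `b`) prior on $\theta$, and the reparameterization $\phi = \operatorname{logit}(\theta)$. It maximizes the likelihood and the posterior density on a grid in both parameterizations and maps the results back to $\theta$.

```python
import math

# Hypothetical data and prior, chosen only for illustration.
k, n = 3, 10      # 3 successes in 10 Bernoulli trials
a, b = 2.0, 2.0   # Beta(a, b) prior on theta

def log_likelihood(theta):
    # Binomial log-likelihood, dropping the constant binomial coefficient.
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def log_posterior_theta(theta):
    # Unnormalized log posterior density with respect to d(theta).
    return (log_likelihood(theta)
            + (a - 1) * math.log(theta)
            + (b - 1) * math.log(1.0 - theta))

def sigmoid(phi):
    # Inverse of phi = logit(theta).
    return 1.0 / (1.0 + math.exp(-phi))

def log_posterior_phi(phi):
    # Density with respect to d(phi): the same posterior evaluated at
    # theta = sigmoid(phi), plus the log-Jacobian
    # log|d(theta)/d(phi)| = log(theta * (1 - theta)).
    theta = sigmoid(phi)
    return log_posterior_theta(theta) + math.log(theta) + math.log(1.0 - theta)

def argmax_on_grid(f, lo, hi, steps=100_000):
    # Brute-force maximizer on an evenly spaced grid.
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=f)

# The likelihood has no Jacobian term, so it is simply composed with sigmoid.
mle_theta = argmax_on_grid(log_likelihood, 1e-4, 1.0 - 1e-4)
mle_phi_as_theta = sigmoid(
    argmax_on_grid(lambda p: log_likelihood(sigmoid(p)), -8.0, 8.0))

map_theta = argmax_on_grid(log_posterior_theta, 1e-4, 1.0 - 1e-4)
map_phi_as_theta = sigmoid(argmax_on_grid(log_posterior_phi, -8.0, 8.0))

print("MLE in theta:", mle_theta, " MLE via phi:", mle_phi_as_theta)
print("MAP in theta:", map_theta, " MAP via phi:", map_phi_as_theta)
```

Under these assumptions, both MLE values should agree with $k/n$ up to grid resolution, while the two MAP values differ: the mode in $\theta$ sits at $(k+a-1)/(n+a+b-2)$, whereas the mode in $\phi$, mapped back to $\theta$, sits at $(k+a)/(n+a+b)$, because the Jacobian factor $\theta(1-\theta)$ raises each exponent of the Beta posterior by one.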


Adding to the correct answer of nbubis:

Since posterior density functions are a special case of probability density functions, it is sufficient to understand why in general the maxima of probability density functions are not invariant under reparametrizations.
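As a sketch of why this holds in general, assume a smooth, strictly monotone reparametrization $y = g(x)$ with inverse $h = g^{-1}$ (the names $g$ and $h$ are introduced here only for illustration). The density of $y$ is then $$p_Y(y) = p_X(h(y))\,|h'(y)|,$$ and differentiating its logarithm gives $$\frac{d}{dy}\log p_Y(y) = \frac{p_X'(h(y))}{p_X(h(y))}\,h'(y) + \frac{h''(y)}{h'(y)}.$$ At the image $y^* = g(x^*)$ of a maximum $x^*$ of $p_X$, the first term vanishes because $p_X'(x^*) = 0$, but the second term $h''(y^*)/h'(y^*)$ is in general nonzero, so $y^*$ is typically not a maximum of $p_Y$. It vanishes, for example, when $g$ is affine. For a plain function such as the likelihood there is no Jacobian factor: $\frac{d}{dy} L(h(y)) = L'(h(y))\,h'(y)$, which is zero exactly at the image of a stationary point of $L$.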

For this, look at the example of the uniform probability density function on the parameter space $[0,\pi]$ and consider the reparametrization $[0,\pi] \ni x\mapsto y:= \cos x$. If maxima were preserved, you would expect $y$ to be uniformly distributed as well. However, the density of $y$ is given by a function on $[-1,1]$ that has peaks (singularities, actually) at $-1$ and $1$. This is because the value of this new density at a given point $y_0\in [-1,1]$, i.e. the weight of $y_0$, represents not only the weight of its (unique) preimage $x_0$, but also, in an infinitesimal sense, how densely the neighbors of $x_0$ are packed around $y_0$ under the reparametrization. Thus, the singularities of the density of $y$, which is given by $\frac{1}{\pi\sqrt{(1+y)(1-y)}}$ by the way, correspond to the fact that a value of $y$ close to $\pm 1$ represents (in an infinitesimal sense) many values of $x$, not that these values of $x$ have higher weights. (More precisely, a small interval around such a value of $y$ corresponds to a large interval of $x$-values.)
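The packing effect can be checked with a small simulation. This is only a sketch: the sample size, the seed, the chosen intervals, and the helper names `empirical_prob`, `exact_prob`, and `density_y` are all arbitrary choices made for illustration. It draws $x$ uniformly from $[0,\pi]$, sets $y = \cos x$, and compares how much probability lands in intervals of equal width near the middle and near the ends of $[-1,1]$.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible
N = 200_000
ys = [math.cos(random.uniform(0.0, math.pi)) for _ in range(N)]

def empirical_prob(lo, hi):
    # Fraction of simulated y values falling in [lo, hi).
    return sum(lo <= y < hi for y in ys) / N

def exact_prob(lo, hi):
    # P(lo <= cos X < hi) for X uniform on [0, pi]; cos is decreasing there,
    # so the preimage of [lo, hi) is an x-interval of length acos(lo) - acos(hi).
    return (math.acos(lo) - math.acos(hi)) / math.pi

def density_y(y):
    # Density of y = cos(x) for x uniform on [0, pi], valid for -1 < y < 1.
    return 1.0 / (math.pi * math.sqrt((1.0 + y) * (1.0 - y)))

# Three intervals of equal width 0.2 in y.
for lo, hi in [(-1.0, -0.8), (-0.1, 0.1), (0.8, 1.0)]:
    print(f"[{lo:+.1f}, {hi:+.1f}): "
          f"empirical={empirical_prob(lo, hi):.4f} exact={exact_prob(lo, hi):.4f}")

print("density at y=0:   ", density_y(0.0))
print("density at y=0.99:", density_y(0.99))
```

The intervals near $\pm 1$ should collect noticeably more samples than the one around $0$, even though every $x$ was equally likely: each of them is the image of a longer stretch of $x$-values.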