Bayes estimators intuition: why can't we simply maximize the posterior?


I'm learning about Bayes estimators, and I'm still confused about why we can't simply maximize the posterior for this purpose. More specifically, suppose I need to estimate a parameter $\theta$, which I assume to be random with a prior distribution $\pi(\theta)$. Now I observe some data $X$ and can use Bayes' rule to update my prior and get my posterior: $$\pi(\theta|X) = \frac{f(X|\theta) \pi(\theta)}{f(X)}$$ It then seems natural to me to define the Bayes estimator to be simply the argmax of $\pi(\theta|X)$. However, in the formal definition, the Bayes estimator is the minimizer of the Bayes risk. Can someone explain the intuition behind this? And what are the advantages of defining it that way?
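To make the contrast concrete, here is a rough numerical check (a Beta(2, 5) posterior is assumed purely for illustration): under squared-error loss, the estimate that minimizes the posterior expected loss is the posterior mean, which differs from the argmax of the posterior.

```python
# Illustration (assumed Beta(2, 5) posterior, chosen only for this sketch):
# minimize the posterior expected squared error over candidate estimates d,
# and compare the minimizer with the posterior argmax (the mode).
N = 2000
thetas = [(i + 0.5) / N for i in range(N)]   # midpoint grid on (0, 1)
a, b = 2.0, 5.0
dens = [t**(a - 1) * (1 - t)**(b - 1) for t in thetas]
Z = sum(dens) / N
dens = [p / Z for p in dens]                 # normalized posterior density

def expected_sq_loss(d):
    # posterior expected squared error of the estimate d (midpoint rule)
    return sum((t - d)**2 * p for t, p in zip(thetas, dens)) / N

cands = [i / 1000 for i in range(1001)]      # candidate estimates
best = min(cands, key=expected_sq_loss)
mode = max(zip(dens, thetas))[1]
print(best)   # ~ 2/7 = 0.2857: the posterior mean minimizes squared error
print(mode)   # ~ 0.2: the argmax (MAP) is a different point
```

So "minimize the Bayes risk" and "maximize the posterior" are genuinely different recipes; which one you get depends on the loss function, as the answer below explains.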




Note that you get to pick the loss function in the definition of the Bayes estimator, and I think you'll recover your argmax definition when the loss function is something like 0-1 loss: the penalty for any nonzero error is the same, no matter how small the error. (Not sure if that can be made rigorous for a continuous parameter or is just a heuristic; it depends on what is allowed as a loss function.)
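For a discrete parameter this can be checked directly: under 0-1 loss the posterior expected loss of guessing $d$ is $1 - \pi(d \mid X)$, so minimizing the risk picks the posterior mode. A minimal sketch with a made-up posterior pmf:

```python
# Sketch with a hypothetical discrete posterior pmf: under 0-1 loss
# (penalty 1 for any miss, 0 for a hit), the posterior expected loss
# of guessing d is 1 - P(theta = d | X), so minimizing the risk
# picks the posterior mode -- the argmax from the question.
posterior = {0: 0.2, 1: 0.5, 2: 0.3}               # made-up posterior pmf
risk = {d: 1.0 - p for d, p in posterior.items()}  # expected 0-1 loss per guess
bayes_estimate = min(risk, key=risk.get)
print(bayes_estimate)  # 1, the posterior mode
```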

Anyway, here's an example where the argmax gives an unintuitive answer.

Say that the prior on $\theta$ puts probability $.001$ on the point $-1$, with the remaining $.999$ probability distributed uniformly on $[0, 1]$. The variable $X$ depends on $\theta$ as follows. If $\theta = -1$, then $X$ is $17$ with probability $.001$ and is otherwise drawn uniformly from $[0, 1]$. If $\theta$ is between $0$ and $1$, then $X$ is drawn uniformly from $[0, 1]$.

Now for any $X$ whatsoever, the argmax you have defined will be $-1$, since any other $\theta$ has infinitesimal probability. But unless $X = 17$, in fact this is not a reasonable estimator for $\theta$ -- our belief that $\theta$ is $-1$ is very small and should go down when we see data other than $17$.
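The numbers above can be checked directly (a sketch; since the posterior mixes a point mass at $-1$ with a density on $[0, 1]$, the likelihood of $\theta = -1$ at an observation $x \in [0, 1]$ is the density value $0.999$):

```python
# Checking the example's numbers. The prior mixes a point mass at -1
# with a uniform density, so we compare likelihoods accordingly.
p_minus1 = 0.001   # prior P(theta = -1)
p_unif   = 0.999   # prior mass spread uniformly on [0, 1]

# Observation x in [0, 1]: likelihood of theta = -1 is the density
# 0.999 (X uniform on [0, 1] with probability .999 when theta = -1);
# likelihood of any theta in [0, 1] is density 1.
post_minus1_given_x = (p_minus1 * 0.999) / (p_minus1 * 0.999 + p_unif * 1.0)
print(post_minus1_given_x)   # about 0.000999: belief in theta = -1 shrinks

# Observation X = 17: only theta = -1 can produce it, so the posterior
# puts all of its mass there.
post_minus1_given_17 = (p_minus1 * 0.001) / (p_minus1 * 0.001 + p_unif * 0.0)
print(post_minus1_given_17)  # 1.0
```

So for $x \ne 17$ the posterior probability of $\theta = -1$ is tiny, even though the naive argmax over points still lands on $-1$ (a point mass always beats any single point of a density).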