If we want to minimize the expected squared deviation of the actual value of a parameter from our estimate, then the optimal estimate is the expected value of that parameter.
If we wish to minimize the absolute deviation, the optimal estimate is the median.
What loss function should we minimize in order for the maximum posterior to be optimal?
The loss function I can come up with is the loss function that is 0 if the estimate is correct, and 1 if it is incorrect, regardless of how far it is from the actual value. But this seems like a degenerate loss function. Is there a more reasonable loss function that results in the maximum posterior estimate being optimal?
As noted on the Wikipedia page, the 0-1 loss (and a continuous analogue of it) is the loss function for which the MAP estimator is optimal.
The discrete case is helpful in getting intuition about it. Suppose we use the following loss function, with $\theta$ the action and $\theta_0 \in \{1,2\}$ the true parameter (taking one of two possible values):
\begin{equation} L(\theta_0, \theta) = \begin{cases} 0 &\mbox{if } \theta = \theta_0 \\ 1 & \mbox{otherwise } \end{cases} \end{equation}
Suppose further that we have a posterior distribution on $\theta_0$, $P[\theta_0 | X]$.
Then the posterior expected risk is:
$R(\theta) = \int L(\theta_0,\theta)\,dP[\theta_0|X]$, integrating over $\theta_0$, which is the expectation of an indicator ($L(\theta_0,\theta)$) under the posterior. This gives you:
$R(\theta) = P[\theta_0 \neq \theta |X] = 1 - P[\theta_0 = \theta |X]$.
From the far right-hand side you can see that this risk is minimized when you pick the action $\theta$ for which the posterior has the highest probability.
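A minimal sketch of the discrete case (the posterior probabilities below are made up for illustration):

```python
# Hypothetical posterior over the two parameter values {1, 2}
posterior = {1: 0.3, 2: 0.7}

# Posterior expected 0-1 risk of choosing action theta:
# R(theta) = 1 - P[theta_0 = theta | X]
def risk(theta):
    return 1.0 - posterior[theta]

# The Bayes action minimizes the risk, which is the same as
# maximizing the posterior probability
bayes_action = min(posterior, key=risk)
print(bayes_action)  # 2, the posterior mode
```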
Doing this in the continuous case isn't as straightforward, as individual points of the parameter space receive zero probability mass under the posterior. The workaround people came up with is as follows, using a positive real constant $c$:
\begin{equation} L(\theta_0, \theta) = \begin{cases} 0 &\mbox{if } |\theta - \theta_0| < c \\ 1 & \mbox{otherwise } \end{cases} \end{equation}
Then as $c \rightarrow 0$ the Bayes estimator approaches the Maximum a Posteriori estimator. How good an estimate this is will depend heavily on the local and global properties of your posterior distribution (for example, its curvature near the mode).
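The shrinking-window behavior can be checked numerically. Below is a sketch on a grid, with an invented skewed posterior (a two-component Gaussian mixture; all constants are made up): for each candidate $\theta$ we compute $P[|\theta - \theta_0| < c \mid X]$ and maximize it, which is equivalent to minimizing the risk.

```python
import numpy as np

# A skewed "posterior" density on a grid (numbers chosen for illustration)
grid = np.linspace(-5, 5, 2001)
dx = grid[1] - grid[0]
dens = 0.7 * np.exp(-0.5 * ((grid - 0.0) / 0.5) ** 2) \
     + 0.3 * np.exp(-0.5 * ((grid - 2.0) / 1.0) ** 2)
dens /= dens.sum() * dx  # normalize to a density on the grid

mode = grid[np.argmax(dens)]  # the MAP estimate

def bayes_estimate(c):
    """Minimizer of 1 - P[|theta - theta_0| < c | X], found by maximizing
    the posterior mass inside a sliding window of half-width c."""
    half = int(round(c / dx))
    window_prob = np.array([
        dens[max(i - half, 0):i + half + 1].sum() * dx
        for i in range(len(grid))
    ])
    return grid[np.argmax(window_prob)]

for c in [1.0, 0.5, 0.1, 0.01]:
    print(c, bayes_estimate(c))
# As c shrinks, the estimate approaches the posterior mode
```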