The tail probability can be estimated by two methods:
- Bayesian method: $$P_B(X>a)=\int^{\infty}_{-\infty}\pi(\theta|x)[1-F(a|\theta)]d\theta$$
- Plug-in frequentist method: $$P_F(X>a)=1-F(a|\hat{\theta})$$ where $\hat{\theta}$ is the MLE of $\theta$.
My numerical results show that $P_B \geq P_F$ always holds, no matter what the distribution is.
Any ideas, or any resources related to this topic, that would explain why that is?
Many thanks~
Let's say we were just trying to estimate $\theta$ rather than $1-F(a\mid \theta).$ Then the relation between the two estimates already depends on a number of factors. First off, there is the prior, which could conceivably cause a significant deviation in either direction. Let's say you're using a flat prior, so that your maximum a posteriori estimate is the same as your maximum likelihood estimate. Now there's something else to worry about. The plug-in estimate is going to be the mode of your posterior distribution, whereas the Bayesian estimate is going to be its mean. How these relate depends on skewness and such (consider that the mode of an exponential distribution is always zero, regardless of its mean). So let's simplify further and say $\pi(\theta\mid x)$ is symmetric (maybe it's a large sample, so it's nearly Gaussian). Okay, now they line up.
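As an aside on the skewness point, here is a small worked case. The model is my own choice for illustration, not something taken from your question: suppose $x_1,\dots,x_n$ are exponential with rate $\theta$ and the prior on $\theta>0$ is flat. Writing $S=\sum_i x_i$, the posterior is $$\pi(\theta\mid x)\propto \theta^{n}e^{-\theta S},$$ which is a Gamma density with shape $n+1$ and rate $S$. Its mode and mean are $$\hat{\theta}_{\text{mode}}=\frac{n}{S}=\hat{\theta}_{\text{MLE}},\qquad E[\theta\mid x]=\frac{n+1}{S}.$$ So even with a flat prior the posterior mean sits above the plug-in value by $1/S$, and the gap only becomes negligible as $n$ grows.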
But now say we switch to estimating $1-F(a\mid \theta).$ We want to know how the mean of this function of $\theta$ relates to the function with the mean plugged in. This basically depends on the convexity of the function. If the function is convex, Jensen's inequality tells you that the mean of the function is at least as large as the function of the mean. If it's concave, it's the opposite. So it really depends on how $1-F(a\mid \theta)$ is shaped as a function of $\theta$. If, for instance, $\theta$ is a location parameter for a normal, then $1-F(a\mid\theta)$ will be convex for $\theta<a$ and concave for $\theta>a.$ Again, a mixed bag.
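To see the normal location case numerically, here is a minimal sketch under assumptions of my own: $X\mid\theta\sim N(\theta,\sigma^2)$ with $\sigma$ known, a flat prior, and so a posterior $\theta\mid x\sim N(\bar{x},\sigma^2/n)$ centered at the MLE. The values of `sigma`, `n`, `xbar` and the two thresholds `a` are made up, and the function names are hypothetical. It computes $P_F$ by plugging in and $P_B$ by integrating over the posterior on a grid.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def plug_in_tail(a, theta_hat, sigma):
    """P_F(X > a) = 1 - F(a | theta_hat) for a normal location model."""
    return 1.0 - norm_cdf((a - theta_hat) / sigma)


def bayes_tail(a, post_mean, post_sd, sigma, n_grid=20001, width=10.0):
    """P_B(X > a): trapezoid-rule integral of [1 - F(a | theta)] against
    an assumed N(post_mean, post_sd^2) posterior for theta."""
    lo = post_mean - width * post_sd
    hi = post_mean + width * post_sd
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        theta = lo + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        density = norm_pdf((theta - post_mean) / post_sd) / post_sd
        total += weight * density * (1.0 - norm_cdf((a - theta) / sigma))
    return total * h


# Illustrative (made-up) setup: known sigma, flat prior, n observations.
sigma, n, xbar = 1.0, 10, 0.0
post_sd = sigma / math.sqrt(n)

results = {}
for a in (2.0, -2.0):  # one threshold above the MLE, one below it
    p_f = plug_in_tail(a, xbar, sigma)
    p_b = bayes_tail(a, xbar, post_sd, sigma)
    results[a] = (p_b, p_f)
    print(f"a = {a:+.1f}: P_B = {p_b:.5f}, P_F = {p_f:.5f}, P_B - P_F = {p_b - p_f:+.5f}")
```

In this setup the Bayesian tail probability comes out larger when $a$ is above $\bar{x}$ and smaller when $a$ is below it, which matches the convex/concave split: the posterior mass sits where $1-F(a\mid\theta)$ is convex in the first case and concave in the second.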
Since all these factors can push the difference either way, I can't tell without more information about your specific situation why the Bayesian version is larger. My best guess is that you're taking $a$ large, and your typical values of $\theta$ (assuming it is a location parameter) are smaller than that, so $1-F(a\mid \theta)$ is convex around there. But I don't even know if $\theta$ is a location parameter for you, so I'm just guessing.