How can Bayesian and Frequentist approach be different?


Let's say I am trying to add the numbers from one to ten. I can either add them in order, or I can notice that they pair up into five groups of eleven, so I suppose which method to use depends on which is more convenient in practice. Lo and behold, the answer is the same either way: 55. Arithmetic is self-consistent, as every branch of mathematics should be.
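The check is easy to mechanise; a quick sketch in Python:

```python
# Sum 1 through 10 two ways: term by term, and via the pairing
# (1+10), (2+9), (3+8), (4+7), (5+6) -- five groups of eleven.
direct = sum(range(1, 11))
grouped = 5 * 11
print(direct, grouped)  # 55 55
```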

Another example of using two different methods to solve a problem is Bayesian vs. frequentist inference. According to this article, given that tanks with serial numbers 2, 6, 7, and 14 were observed, a frequentist would estimate that there are 16.5 tanks, while a Bayesian would estimate 19.5. Wait, what?
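For concreteness, here is a sketch of where those two numbers come from, assuming the standard German tank problem formulas (the frequentist UMVUE, and the Bayesian posterior mean under a uniform prior, which requires at least three observations):

```python
# Observed tank serial numbers from the article's example.
serials = [2, 6, 7, 14]
m = max(serials)   # largest serial number seen
k = len(serials)   # number of tanks observed

# Frequentist (UMVUE): m + m/k - 1
frequentist = m + m / k - 1

# Bayesian posterior mean (uniform prior, defined for k >= 3):
# (m - 1)(k - 1) / (k - 2)
bayesian = (m - 1) * (k - 1) / (k - 2)

print(frequentist, bayesian)  # 16.5 19.5
```

Same data, two well-defined procedures, two different numbers.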

How can they give different answers from the same givens? Namely, is the estimated number of tanks 16.5, or 19.5, or something else? It doesn't make sense for the answer to depend on what kind of person the analyst is. Is one of them an approximation, in the same way people say that $\pi = 3.14$ when it is not?

Another thing to note is that both are built on the same foundations. If I ask what $\infty + 1$ is, it is sensible to ask which number system one is using. That does not seem to apply here, though: there is a certain expected number of tanks that is well defined.

I do not understand how a mathematical question can have contradictory answers in such a way.


Best answer

Following up on OP's request to turn my loss-function comment into an answer. Note that in statistics there is rarely a universally agreed-upon approach to an estimation problem, much less a notion of a "correct" estimator. The rudimentary setup for parametric statistics is as follows.

We have a family of distributions $\{P_\theta\}_{\theta \in \Theta}$ indexed by a parameter $\theta \in \Theta$. For example, if we are trying to estimate the mean $\theta$ of a $N(\theta,1)$ normal random variable, we might take $\Theta = \mathbb{R}$. One could estimate the mean by $$\hat{\theta} = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i,$$ where $X_i$ are i.i.d. $N(\theta,1)$. Notice that $\hat{\theta}$ takes values in $ \Theta$. This is an estimator OP is familiar with. Now I propose the estimator $$\hat{\theta}_7 \equiv 7.$$ That's right: I estimate the completely unknown mean $\theta$ by 7. Is it a bad estimator? Maybe. But it is admissible in a sense that I will now endeavour to define.
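Both estimators are trivial to write down; a small sketch (the values theta = 3 and n = 100 are illustrative choices, not from the answer):

```python
import random

random.seed(1)
theta = 3.0  # the "unknown" true mean, fixed here for illustration
n = 100
xs = [random.gauss(theta, 1.0) for _ in range(n)]

theta_bar = sum(xs) / n  # the sample-mean estimator
theta_7 = 7              # the constant estimator: always answers 7

print(theta_bar, theta_7)
```

The sample mean lands near 3; the constant estimator says 7 regardless of the data. Both are functions from samples to $\Theta$, so both are estimators.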

Consider a nonnegative loss function $L(\theta,\hat{\theta})$ defined on $\Theta \times \Theta$. $L(\theta,\hat{\theta})$ is a measure of the discrepancy between our estimate $\hat{\theta}$ and the true value $\theta$. Maybe we want to penalise every time $\hat{\theta}$ misses $\theta$: $$L(\theta,\hat{\theta}) = 1_{\{\theta \neq \hat{\theta}\}}.$$ Because the normal distribution is continuous, $\mathbb{P}_\theta(\bar{X} = \theta) = 0$, so we almost always have $L(\theta,\hat{\theta}) = 1$. On the other hand, I could take $$L(\theta,\hat{\theta}) = (\theta-\hat{\theta})^2.$$ The average value of this $L$ has a special name. It is known as the mean squared error (MSE) and is defined as $$MSE(\theta,\hat{\theta}) = \mathbb{E}_\theta[(\theta-\hat{\theta})^2] = \mathbb{E}_\theta[L(\theta,\hat{\theta})].$$ The MSE has the following heuristic interpretation: given that $\theta$ is the actual value (hence the $\theta$ on $\mathbb{E}_\theta$), the MSE is the average squared distance of the estimator from $\theta$.
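Under squared-error loss, the sample mean of $n$ i.i.d. $N(\theta,1)$ draws has $MSE = \operatorname{Var}(\bar X) = 1/n$, which a Monte Carlo sketch can confirm (the choices of theta, n, and the trial count below are arbitrary):

```python
import random

random.seed(0)
theta, n, trials = 2.0, 25, 20000

total = 0.0
for _ in range(trials):
    xbar = sum(random.gauss(theta, 1.0) for _ in range(n)) / n
    total += (theta - xbar) ** 2  # squared-error loss for this sample

mse = total / trials
print(mse)  # close to 1/n = 0.04
```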

Given a loss function $L$, an estimator $\hat\theta_1$ is said to be admissible if no other estimator $\hat\theta_2$ dominates it. That is, there is no $\hat\theta_2$ such that, for all $\theta \in \Theta$, $$E_\theta[L(\theta,\hat\theta_2)] \leq E_\theta[L(\theta,\hat\theta_1)],$$ with strict inequality for at least one $\theta$. One can interpret this as saying there is no estimator which performs at least as well (in the sense of $L$) for every $\theta$, and strictly better for some $\theta$. Going back to my bold estimator $\hat\theta_7 = 7$, it's clear that for $\theta$ far from $7$, $$MSE(\theta,\bar X) < MSE(\theta,\hat\theta_7).$$ However, in the event that $\theta$ is $7$, my estimator is superb: $$0 = MSE(7,\hat\theta_7) < MSE(7,\bar X).$$ So $\bar X$ does not dominate $\hat\theta_7$; in this sense, $\hat\theta_7$ is admissible.
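The non-domination argument can be seen numerically: $MSE(\theta,\hat\theta_7) = (\theta-7)^2$ exactly, while $MSE(\theta,\bar X) = 1/n$ for every $\theta$. A small sketch, with the illustrative choice $n = 25$:

```python
n = 25
for theta in [5.0, 6.9, 7.0, 7.1, 9.0]:
    mse_const = (theta - 7.0) ** 2  # exact MSE of the constant estimator 7
    mse_xbar = 1.0 / n              # exact MSE of the sample mean
    better = "constant" if mse_const < mse_xbar else "sample mean"
    print(theta, mse_const, mse_xbar, better)
# Near theta = 7 the constant estimator wins; elsewhere the sample
# mean wins -- so neither estimator dominates the other.
```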

I have written all this out because OP said in comments that $3.5$ is the correct estimate for the roll of a die. I wish to convey that such a mindset should be immediately dropped.

Finally, to return to OP's question, let us consider the estimation problem being discussed. The frequentist approach described on the Wikipedia page is to use the UMVUE -- the uniformly minimum variance unbiased estimator, i.e. the estimator that minimises the MSE among unbiased estimators. The Bayesian approach, on the other hand, places a prior distribution on the number of tanks and summarises the posterior distribution given the observation $(X_1, X_2, X_3, X_4) = (2, 6, 7, 14)$, for example by its mean. These are two different criteria for a good estimate, so it should no longer be surprising that they produce different numbers.

Answer

You said: "I do not understand how a mathematical question can have contradictory answers in such a way."

The problem is that these are not "mathematical questions" as most people would like them to be.

If I am about to roll a six-sided die once and ask you to estimate the result, you might say "5" and someone else might say "4." Which answer is right?