What are the prior and sampling distributions in this situation?


In an assignment, I have to assume that I'm lost in a city with four areas; let's call them Area A, B, C, and D. All I have with me is a spreadsheet of the city's voting results from a random election. I believe that I am in Area A with probability $0.1$, and that the other three areas are equally likely.

I then interview 10 people in the area I am in to find out how they voted in the election (the options are Party 1, 2, and 3). It turns out that 5 of the 10 voted for Party 1. According to the spreadsheet, the share of votes for Party 1 in each area is as follows:

Area A: 20%

Area B: 34%

Area C: 40%

Area D: 15%.

Based on these and using Bayesian inference, what are my prior and sampling distributions and what is the area I am most probably in?

What I have gathered so far:

I think the prior distribution should be something like this: $$p(\theta)=\left(\frac{1}{10}\right)\left(\frac{\theta}{90}\right)^3.$$

Does this seem correct? I don't know how to proceed from here, since I don't know what the sampling distribution is. I know I can get it from the voting results, but how do I work out the distribution from them?

Best answer

First, you have to identify the parameter of interest on which you wish to make an inference. Clearly, in this case, it is the area of the city that you are in. This parameter takes on four possible values, $A$, $B$, $C$, and $D$. Since these are categorical, it may be more convenient to assign numbers to them instead, as follows. $$A = 0, \quad B = 1, \quad C = 2, \quad D = 3,$$ and let $$\theta \in \{0, 1, 2, 3\} \sim \operatorname{Categorical}(\pi_0, \pi_1, \pi_2, \pi_3)$$ be the Bayesian parameter of interest whose prior distribution is $$\pi_k = \Pr[\theta = k] = \begin{cases}0.1, & k = 0 \\ 0.3, & k \in \{1, 2, 3\}. \end{cases}$$ We want to update the prior with the observed data in order to compute a posterior.

To this end, the sampling distribution is clearly binomial. It would be multinomial if the question gave you the frequencies of votes for each of the three parties, but it does not: it only tells you how many people you asked voted for Party 1; therefore, $$X \mid (\theta = k) \sim \operatorname{Binomial}(n = 10, p = p_k).$$ That is to say, given you are in an area that is coded by $\theta$, the probability distribution for the number of Party 1 votes is binomial with sample size $n = 10$ and Bernoulli trial probability $p_k$, where $$p_k = \begin{cases}0.2, & k = 0 \\ 0.34, & k = 1 \\ 0.4, & k = 2 \\ 0.15, & k = 3. \end{cases}$$ Note the $p_k$ do not sum to $1$, nor do they need to.
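To make the sampling distribution concrete, here is a minimal Python sketch that evaluates the binomial likelihood $\Pr[X = 5 \mid \theta = k]$ for each area. The names `binomial_pmf`, `party1_share`, and `likelihood` are chosen here for illustration and are not part of the question; the sketch also assumes, as the binomial model does, that the 10 interviewees vote independently with the area's Party 1 share.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Party 1 vote share in each area, taken from the spreadsheet in the question.
party1_share = {"A": 0.20, "B": 0.34, "C": 0.40, "D": 0.15}

# Likelihood of observing 5 Party 1 voters among 10 interviewees, per area.
likelihood = {area: binomial_pmf(5, 10, p) for area, p in party1_share.items()}

for area, value in likelihood.items():
    print(f"Pr[X = 5 | area {area}] = {value:.4f}")
```

These four numbers are likelihoods, not probabilities over the areas, so they need not sum to $1$.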

Finally, we compute the posterior via Bayes' theorem. We have $$\Pr[\theta = k \mid X = 5] = \frac{\Pr[X = 5 \mid \theta = k]\Pr[\theta = k]}{\Pr[X = 5]}.$$ This is actually four separate equations, one for each permissible value of $k$. For example, in the case $k = 2$, the numerator on the right-hand side is $$\Pr[X = 5 \mid \theta = 2]\Pr[\theta = 2] = \binom{10}{5} p_2^5 (1-p_2)^{10-5} \pi_2 = \binom{10}{5}(0.4)^5 (0.6)^5 (0.3).$$ The denominator must be computed via the law of total probability: $$\Pr[X = 5] = \sum_{k=0}^3 \Pr[X = 5 \mid \theta = k]\Pr[\theta = k] = \sum_{k=0}^3 \binom{10}{5} p_k^5 (1-p_k)^5 \pi_k.$$ Dividing the numerator by this denominator gives you the posterior probability of being in Area $C$. You would then repeat this for the other three values of $k$; one shortcut is to note that the denominator is the same in each of the four cases, so in fact, all you really need to do is tabulate the four cases $$\begin{array}{cc|ccc} \text{Area} & k & p_k & \pi_k & p_k^5 (1-p_k)^5 \pi_k \\ \hline A & 0 & 0.2 & 0.1 & ? \\ B & 1 & 0.34 & 0.3 & ? \\ \vdots & \vdots & \vdots & \vdots & \\ \hline & \text{Total} & \text{NA} & 1 & ? \\ \end{array}$$ and in the last column, compute the sum. Then divide each entry in the last column by that column's total to get the posterior probabilities. The area with the largest posterior probability is the one you are most probably in. You can omit the $\binom{10}{5}$ coefficient because it is a common factor of the numerator and the denominator, so it cancels.
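The tabulation described above can be sketched in a few lines of Python. This is only an illustration under the stated prior and vote shares; the variable names (`unnormalized`, `posterior`, `most_likely`) are assumptions made here, and the $\binom{10}{5}$ factor is deliberately left out because it cancels in the normalization.

```python
areas = ["A", "B", "C", "D"]
p = [0.20, 0.34, 0.40, 0.15]      # Party 1 share p_k in each area
prior = [0.1, 0.3, 0.3, 0.3]      # prior probabilities pi_k

# Last column of the table: p_k^5 (1 - p_k)^5 pi_k, without the binomial coefficient.
unnormalized = [pk**5 * (1 - pk) ** 5 * pik for pk, pik in zip(p, prior)]

# Column total; dividing by it turns the column into posterior probabilities.
total = sum(unnormalized)
posterior = {area: u / total for area, u in zip(areas, unnormalized)}

# The most probable area is the one with the largest posterior probability.
most_likely = max(posterior, key=posterior.get)

for area in areas:
    print(f"Pr[area {area} | X = 5] = {posterior[area]:.4f}")
print("Most probable area:", most_likely)
```

With these inputs, the largest posterior probability falls on Area $C$ (roughly $0.56$), with Area $B$ next (roughly $0.40$); Areas $A$ and $D$ each end up below $0.03$.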