Averaging Categorical Estimators: How do I optimally average three or more categorical predictions?


Example: Three analysts predict the likelihood of winners between Superman, Thor, and Wonder Woman in a series of three-way mathematical trivia competitions.

  • Analyst A predicts that the probabilities of victory $(p_s, p_t, p_w)$ for Superman, Thor, and Wonder Woman in the next competition are $(p_{s_A}, p_{t_A}, p_{w_A})$.
  • Analyst B predicts the probability of victory in the next competition is $(p_{s_B}, p_{t_B}, p_{w_B})$.
  • Analyst C predicts the probability of victory in the next competition is $(p_{s_C}, p_{t_C}, p_{w_C})$.

I know one analyst is more accurate than the others, so I could simply create an ad-hoc weighting that emphasizes that person's prediction. In fact, I know each analyst's historical predictions and the actual outcome of each past competition, so I feel like I should be able to do much better than an ad-hoc weighting. However, I am not certain how to use that historical data to weight the predictions optimally.
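To make the setup concrete, here is a minimal sketch of how I imagine scoring a candidate weighting against the historical record. Everything in it is an assumption on my part: the weighted average of probability vectors (a linear opinion pool), the use of log loss as the score, the function names `pool` and `mean_log_loss`, and the made-up `history` values. It is only meant to show what data I have, not to claim this is the right method.

```python
import math

def pool(predictions, weights):
    """Weighted average of the analysts' probability vectors (linear opinion pool)."""
    total = sum(weights)
    n_categories = len(predictions[0])
    return [
        sum(w * p[k] for w, p in zip(weights, predictions)) / total
        for k in range(n_categories)
    ]

def mean_log_loss(history, weights):
    """Average negative log-probability that the pooled forecast gave the actual winner.

    Lower is better. Fails if the pooled probability of a winner is exactly 0.
    """
    losses = [-math.log(pool(predictions, weights)[winner])
              for predictions, winner in history]
    return sum(losses) / len(losses)

# Hypothetical history: (predictions by analysts A, B, C; index of the actual winner),
# with categories ordered as (Superman, Thor, Wonder Woman).
history = [
    ([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]], 0),
    ([[0.2, 0.7, 0.1], [0.4, 0.3, 0.3], [0.3, 0.3, 0.4]], 1),
    ([[0.1, 0.2, 0.7], [0.3, 0.3, 0.4], [0.5, 0.3, 0.2]], 2),
]

equal_loss = mean_log_loss(history, [1, 1, 1])   # equal weighting
ad_hoc_loss = mean_log_loss(history, [3, 1, 1])  # ad-hoc emphasis on analyst A
```

In this invented data analyst A is the most accurate, so the ad-hoc weighting scores better than equal weighting. What I do not know is how to choose the weights in a principled way rather than by guessing.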

Side Note: If it were a continuous prediction of a single parameter ($x$) and there were only two unbiased, independent estimators (A and B) with variances $\sigma_A^2$ and $\sigma_B^2$, I know I would simply do an inverse-variance-weighted average:

$$\mu(x_A, x_B) = \frac{\sigma_B^2 x_A + \sigma_A^2 x_B}{ \sigma_A^2 + \sigma_B^2 }$$
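For reference, the formula above can be sketched in a few lines. The function name `inverse_variance_average` and the numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def inverse_variance_average(x_a, var_a, x_b, var_b):
    """Combine two estimates of the same quantity, weighting each by 1 / variance."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    return (w_a * x_a + w_b * x_b) / (w_a + w_b)

# Hypothetical values: A estimates 10 with variance 1, B estimates 14 with variance 3.
# Same as (3 * 10 + 1 * 14) / (1 + 3) from the formula above.
combined = inverse_variance_average(10.0, 1.0, 14.0, 3.0)
```

The estimate with the smaller variance (A here) pulls the combined value toward itself, which is the behavior I would like to reproduce for categorical predictions.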

I am new to working with categorical data.