Calculating the integrity of the result of a weighted voting system

I have an ensemble model that votes over many regression systems. I give my observation to all the models and record their outputs. I know the models' accuracies in the following sense:

For each model $m$, I know the probability that its prediction accuracy lies within the $95\%$ band, given an observation: $p(acc_{m} \geq 0.95|O)$

Calling this probability the accuracy score of the model, I have two arrays, the model predictions $\vec{H} = [h_0, \cdots, h_n]$ and the corresponding accuracy scores $\vec{\eta} = [\eta_0, \cdots, \eta_n]$. One strategy is then to take the final predicted value to be $Y = (\vec{H}\cdot\vec{\eta})/\|\vec{\eta}\|_1$. My question is: how can I calculate the probability that my expected prediction $Y$ lies within the $95\%$ band?
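
A minimal sketch of this combination rule, assuming NumPy is available. The function name `weighted_prediction` and the three sample predictions and scores are hypothetical and only illustrate the formula for $Y$:

```python
import numpy as np

def weighted_prediction(h, eta):
    """Return Y = (H . eta) / ||eta||_1 for predictions h and accuracy scores eta."""
    h = np.asarray(h, dtype=float)
    eta = np.asarray(eta, dtype=float)
    # The L1 norm is the sum of absolute values; for probabilities it is just the sum.
    return float(np.dot(h, eta) / np.sum(np.abs(eta)))

# Hypothetical predictions and accuracy scores for three models.
h = [10.0, 14.0, 12.0]
eta = [0.8, 0.4, 0.6]
y = weighted_prediction(h, eta)
print(round(y, 4))
```

Since the scores are divided by their sum, only their relative sizes affect $Y$.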

I should again note that $\eta$ vector is not necessarily normal. For better illustration I have provided the following images:

Suppose this is $H$: [image]

and this is $\eta$: [image]

Then the histogram of output values would be something like: [image]

Example

(In the case of two models) Alice is $80\%$ sure that the price of an item is about $10\$$

$P_a(|price-10|/10 \leq 0.05|O)$

Bob is $40\%$ sure that the price of an item is about $14\$$

$P_b(|price-14|/14 \leq 0.05|O)$

Now what is the probability that the price is about $(0.8\times10+0.4\times14)/1.2 \approx 11.33\$$?

$P(|price_{actual}-11.33|/11.33 \leq 0.05|O, I)$

where $I$ is the prior information above.

There is 1 best solution below

You can think of this as a Bayesian modeling problem. Suppose you have $n$ models, each of which outputs a probability distribution $p(x|i)$ over the output range, and for each model $i$ you have a confidence $p(i)$ (your belief that the model's output is right, normalized so that $\sum_{i=1}^{n} p(i) = 1$). Then the final probability distribution over the output values is

\begin{equation} p(x) = \sum_{i=1}^{n} p(x|i)\,p(i) \end{equation}

If you don't have any preference among the models, you can use the uniform distribution $p(i) = 1/n$ for all $i = 1, \dots, n$. Based on your examples, you may also want to model the probabilities $p(x|i)$ as continuous uniform distributions with different supports.
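
As a rough sketch of how this mixture could be evaluated, the code below computes the probability that $x$ falls in an interval. It rests on assumptions that are mine, not given in the question: each $p(x|i)$ is uniform on the $\pm 5\%$ band around model $i$'s prediction, the predictions are positive, and the raw confidences are rescaled to sum to 1. The name `mixture_interval_prob` and its parameters are hypothetical.

```python
import numpy as np

def mixture_interval_prob(lo, hi, centers, weights, rel_width=0.05):
    """P(lo <= x <= hi) under p(x) = sum_i p(x|i) p(i).

    Assumption: p(x|i) is uniform on [c_i (1 - rel_width), c_i (1 + rel_width)],
    with every center c_i > 0.
    """
    centers = np.asarray(centers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # rescale so that sum_i p(i) = 1
    a = centers * (1.0 - rel_width)        # lower edge of each uniform
    b = centers * (1.0 + rel_width)        # upper edge of each uniform
    # Length of the overlap between [lo, hi] and [a_i, b_i], zero if disjoint.
    overlap = np.clip(np.minimum(b, hi) - np.maximum(a, lo), 0.0, None)
    return float(np.sum(weights * overlap / (b - a)))

# Two-model example from the question: Alice (10, 0.8) and Bob (14, 0.4).
centers = [10.0, 14.0]
weights = [0.8, 0.4]
y = float(np.dot(centers, weights) / np.sum(weights))    # weighted average Y
p_alice_band = mixture_interval_prob(9.5, 10.5, centers, weights)
p_y_band = mixture_interval_prob(0.95 * y, 1.05 * y, centers, weights)
print(p_alice_band, p_y_band)
```

Under the uniform assumption, any interval that overlaps none of the models' supports gets probability zero, so the result is sensitive to how wide you make each $p(x|i)$.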

Note that many of these suggestions depend on additional assumptions and modeling decisions.