Three independent algorithms are executed in parallel. The role of each algorithm is to answer a certain number of questions (say 100) with yes or no, together with a probability expressing how sure it is.
Example:
Question 3: Is this a car?
- Algo1: Yes (0.7 sure) => P1
- Algo2: Yes (0.65 sure) => P2
- Algo3: Yes (0.2 sure, which is also 0.8 No) => P3
And P = f(P1, P2, P3) where f() is a function.
I want to proceed with a voting process where the final probability P is driven by the majority: P should be high when most of the answers (3 answers in this question) are high, and low otherwise. What is the expression of the function f()?
PS:
- I have tried a simple mean formula (average), but I don't feel that's enough or reasonable since the mean formula is affected by the max and min values.
- I am not explicitly/necessarily trying to compute an average value. The important thing is that P should represent the majority votes (more precise and "correct")
There might be better ways from people who have actually studied such questions, but I'd do this as follows. I'm assuming that your answers are rescaled to the range from 0 to 100 (so 0.7 in your example becomes 70), with 100 meaning a definite "yes" to the question, 50 meaning "don't know", and 0 meaning a definite "no".
Then do the following
1) Order the results ($p_1,p_2,p_3$) in ascending order: $r_1 \le r_2 \le r_3$.
2) Take the function
$$w(r) = \begin{cases} 0, & \text{if $r < 40$} \\ 0.05r-2, & \text{if $40 \le r \le 60$} \\ 1, & \text{if $r > 60$} \end{cases} $$
3) Calculate the weighted average:
$$f(r_1,r_2,r_3) =\frac{1-w(r_2)}2r_1 + 0.5r_2 + \frac{w(r_2)}2r_3$$
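As a minimal sketch, the three steps above could be written in Python as follows. The function names `w` and `f` mirror the symbols in the formulas; the inputs are assumed to be on the 0 to 100 scale, and the sample call uses the three values from the question (70, 65, 20) rescaled to that range.

```python
def w(r):
    """Weight from step 2: 0 below 40, 1 above 60, linear in between."""
    if r < 40:
        return 0.0
    if r > 60:
        return 1.0
    return 0.05 * r - 2.0


def f(p1, p2, p3):
    """Weighted average from step 3, applied to the sorted answers."""
    # Step 1: sort so that r1 <= r2 <= r3; r2 is the median answer.
    r1, r2, r3 = sorted((p1, p2, p3))
    weight = w(r2)
    return (1 - weight) / 2 * r1 + 0.5 * r2 + weight / 2 * r3


# Values from the question, rescaled to 0..100 (assumption: 0.7 -> 70, etc.)
print(f(70, 65, 20))  # 67.5
```

Here the median is 65, which is above 60, so the result is simply the average of the two "yes" answers, 65 and 70.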
Reasoning: By ordering the results, $r_2$ (the median) becomes the deciding vote between leaning to "yes" or "no". If $r_2 > 50$, you are leaning to "yes"; if $r_2 < 50$, you are leaning to "no".
You may have a clear 'consensus of two', which is codified in $w(r)$ as $r_2 > 60$ (clear 'yes' by two algorithms) or $r_2 < 40$ (clear 'no' by two algorithms). Or you may have a gray area, where $r_2$ is near 50.
In the consensus case, the formula I gave comes out as the average of the two consensus opinions: If $r_2 < 40$, then $f(r_1,r_2,r_3)=\frac{r_1+r_2}2$. If, on the other hand, $r_2 > 60$, then $f(r_1,r_2,r_3)=\frac{r_2+r_3}2$.
The problem with applying this consensus average strictly for $r_2<50$ and $r_2>50$ is that it becomes discontinuous when $r_2$ crosses 50. For example, if $r_1=0, r_2=49, r_3=100$, applying the average of the consensus votes ("no") would result in $\frac{r_1+r_2}2=24.5$. If $r_2$ changes slightly to $r_2=51$, the consensus vote changes to "yes", so the average of the consensus votes would be $\frac{r_2+r_3}2=75.5$, which is a big change.
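To make the jump concrete, here is a small sketch of that strict rule. The name `naive_consensus` is made up for illustration, and sending the tie $r_2 = 50$ to the "yes" side is an arbitrary assumption, since the strict rule does not say what to do there.

```python
def naive_consensus(p1, p2, p3):
    """Average of the two agreeing votes, with a hard switch at r2 = 50."""
    r1, r2, r3 = sorted((p1, p2, p3))
    if r2 < 50:
        # Two votes lean "no": average the two lowest answers.
        return (r1 + r2) / 2
    # Two votes lean "yes" (r2 == 50 is sent here by assumption).
    return (r2 + r3) / 2


print(naive_consensus(0, 49, 100))  # 24.5
print(naive_consensus(0, 51, 100))  # 75.5
```

A change of 2 points in the median moves the result by 51 points.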
In the middle region ($40 \le r_2 \le 60$), $w$ gives weight to both extreme answers ($r_1$ and $r_3$), which reflects that we don't really have a consensus. As $r_2$ changes from 40 to 60, the weight shifts gradually from $r_1$ to $r_3$, making sure the function $f$ is continuous.
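As a check, the same example values ($r_1=0$, $r_3=100$, with $r_2$ moving from 49 to 51) can be worked through with $f$. With $w(49)=0.05\cdot 49-2=0.45$ and $w(51)=0.05\cdot 51-2=0.55$:

$$f(0,49,100)=\frac{1-0.45}2\cdot 0+0.5\cdot 49+\frac{0.45}2\cdot 100=24.5+22.5=47$$

$$f(0,51,100)=\frac{1-0.55}2\cdot 0+0.5\cdot 51+\frac{0.55}2\cdot 100=25.5+27.5=53$$

So the result moves from 47 to 53 rather than jumping from 24.5 to 75.5.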