Is there a formula to best determine what is most likely true?


TL;DR

Given some finite set of data in which each data point is a vote on what the individual casting it thinks is true, and given that collusion and manipulation are possible, is there an optimal formula for calculating what is most likely true from that set of data alone?


I know this is a really vague question, so I'll explain what I mean with an example, and you can tell me whether there is a proof somewhere that this kind of thing is impossible. Though this is a specific example, the applications are general and endless, so I'm sure people have thought about this; I just don't know what language they use to describe it.

Pretend you have a company of 5 people. They decide to set their pay in an unusual way.

Each person voices their opinion about how much they think they should get paid, and how much they think everyone else should get paid (as a percentage of the company's revenue, let's say), so each person's votes add up to 100%.

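So the results might look something like this (each row is one person's vote; each column shows how much everyone thinks the person named at the top should get, in %):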

       Alice  Bob  Carly  Dan  Elmer
Alice    100    0      0    0      0
Bob       20   20     20   20     20
Carly     10   10     60   10     10
Dan       10   10     10   40     30
Elmer     10   10     10   30     40

Alice, as you can see, is greedy: she says she should get all the profits and the others should get nothing. Bob wants to share the profits evenly. Carly wants most of the profit for herself but gives a little to everyone else, and Dan and Elmer may or may not have colluded, thinking, "if we vote each other up along with ourselves, we'll make more money."

My question is: if you were in charge of using these numbers, and only these numbers, to decide how much everyone should get paid, what would be the most balanced and even way of deciding, taking possible collusion into account?

You might first decide to simply take the average for each person:

       Alice  Bob  Carly  Dan  Elmer
Alice    100    0      0    0      0
Bob       20   20     20   20     20
Carly     10   10     60   10     10
Dan       10   10     10   40     30
Elmer     10   10     10   30     40
------------------------------------
total    150   50    100  100    100
avg       30   10     20   20     20
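A minimal Python sketch of this averaging step, assuming the votes are stored as a list of rows (one row per voter, one column per recipient). The names `VOTES` and `column_averages` are my own, not from any library:

```python
# Votes from the example: VOTES[voter][recipient], in % of revenue.
NAMES = ["Alice", "Bob", "Carly", "Dan", "Elmer"]
VOTES = [
    [100, 0, 0, 0, 0],     # Alice
    [20, 20, 20, 20, 20],  # Bob
    [10, 10, 60, 10, 10],  # Carly
    [10, 10, 10, 40, 30],  # Dan
    [10, 10, 10, 30, 40],  # Elmer
]


def column_averages(votes):
    """Average share each person receives over all voters, self-votes included."""
    n = len(votes)
    return [sum(row[j] for row in votes) / n for j in range(n)]


for name, avg in zip(NAMES, column_averages(VOTES)):
    print(f"{name:6} {avg:5.1f}")
```

Because every row sums to 100, the averages also sum to 100, so they can be paid out directly.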

This doesn't seem fair, because the obvious optimal strategy is simply to vote 100% for yourself, and if everyone did that, no new information would be learned.
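To make that incentive concrete, here is a small sketch (same hypothetical `VOTES` layout, rows = voters) comparing Bob's average when he votes as in the table with what he would get if he instead voted 100% for himself. Bob's deviation is a hypothetical for illustration, not part of the data:

```python
VOTES = [
    [100, 0, 0, 0, 0],     # Alice
    [20, 20, 20, 20, 20],  # Bob
    [10, 10, 60, 10, 10],  # Carly
    [10, 10, 10, 40, 30],  # Dan
    [10, 10, 10, 30, 40],  # Elmer
]


def column_averages(votes):
    """Average share each person receives over all voters, self-votes included."""
    n = len(votes)
    return [sum(row[j] for row in votes) / n for j in range(n)]


honest = column_averages(VOTES)

# Hypothetical deviation: Bob drops his even split and votes 100% for himself.
bob_selfish = [row[:] for row in VOTES]
bob_selfish[1] = [0, 100, 0, 0, 0]
deviated = column_averages(bob_selfish)
print(f"Bob as voted: {honest[1]:.1f}%, Bob selfish: {deviated[1]:.1f}%")

# If everyone votes 100% for themselves, the average is a flat equal split.
all_selfish = [[100 if i == j else 0 for j in range(5)] for i in range(5)]
print(column_averages(all_selfish))  # -> [20.0, 20.0, 20.0, 20.0, 20.0]
```

Bob goes from 10% to 26% just by being greedy, and when everyone is greedy the result is an equal split that says nothing about the votes.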

So perhaps a better way to balance the score would be to first ask: how far off was each person's estimate of their own pay from what the rest of the group thought they should make?

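       self  group avg  normalized   error
Alice   100       12.5       20.83  -79.17
Bob      20        7.5        12.5    -7.5
Carly    60         10       16.67  -43.33
Dan      40         15          25     -15
Elmer    40         15          25     -15

(self = what each person voted for themselves; group avg = the average of the other four people's votes for them; normalized = group avg rescaled so the column sums to 100; error = normalized minus self.)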
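The columns above can be reproduced with a short Python sketch. It assumes "group avg" is the plain average of the other four people's votes and "normalized" is that average rescaled to sum to 100, which matches the numbers in the table; `self_vs_group` is a name I made up:

```python
NAMES = ["Alice", "Bob", "Carly", "Dan", "Elmer"]
VOTES = [
    [100, 0, 0, 0, 0],     # Alice
    [20, 20, 20, 20, 20],  # Bob
    [10, 10, 60, 10, 10],  # Carly
    [10, 10, 10, 40, 30],  # Dan
    [10, 10, 10, 30, 40],  # Elmer
]


def self_vs_group(votes):
    """Return (self vote, others' average, normalized others' average, error) per person."""
    n = len(votes)
    self_vote = [votes[j][j] for j in range(n)]
    # Average of the other n-1 people's votes for person j (their own vote excluded).
    group_avg = [sum(votes[i][j] for i in range(n) if i != j) / (n - 1) for j in range(n)]
    total = sum(group_avg)
    # Rescale so the group's evaluations add up to 100%.
    normalized = [100 * g / total for g in group_avg]
    # Error = what the group thinks minus what you claimed for yourself.
    error = [norm - s for norm, s in zip(normalized, self_vote)]
    return self_vote, group_avg, normalized, error


for name, *cols in zip(NAMES, *self_vs_group(VOTES)):
    print(f"{name:6}" + "".join(f"{c:9.2f}" for c in cols))
```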

Notice that we haven't reached a final score yet, but so far it looks as though Dan and Elmer's alleged scheming may have earned them more money than anyone else: their normalized group evaluation is the highest, at 25%. If we keep going in this vein, they may have won through collusion. We don't want the group to act as a dictator either, because then whoever controls the group through collusion controls the allocation of value.
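One rough way to see how much the suspected collusion is worth is a hypothetical counterfactual: recompute the normalized group evaluation as if Dan and Elmer had split their votes evenly, like Bob. The replacement ballots are invented purely for comparison; nothing in the data says this is what they would have done:

```python
VOTES = [
    [100, 0, 0, 0, 0],     # Alice
    [20, 20, 20, 20, 20],  # Bob
    [10, 10, 60, 10, 10],  # Carly
    [10, 10, 10, 40, 30],  # Dan
    [10, 10, 10, 30, 40],  # Elmer
]


def normalized_group_eval(votes):
    """Others' average vote for each person, rescaled so the results sum to 100."""
    n = len(votes)
    group = [sum(votes[i][j] for i in range(n) if i != j) / (n - 1) for j in range(n)]
    total = sum(group)
    return [100 * g / total for g in group]


actual = normalized_group_eval(VOTES)

# Hypothetical counterfactual: Dan and Elmer split evenly, as Bob did.
even_split = [row[:] for row in VOTES]
even_split[3] = [20] * 5  # Dan
even_split[4] = [20] * 5  # Elmer
counterfactual = normalized_group_eval(even_split)

for idx, name in [(3, "Dan"), (4, "Elmer")]:
    print(f"{name}: {actual[idx]:.2f} with actual votes, "
          f"{counterfactual[idx]:.2f} with an even split")
```

Under that assumption, Dan's and Elmer's normalized evaluations drop from 25 to about 17.86. This only measures the effect of their mutual votes; it cannot tell genuine mutual respect apart from collusion.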

Here's where I start getting lost. Is what I'm attempting to do actually impossible? Is there no optimal way to evaluate these numbers and arrive at the fairest solution? Or has a formula for this already been discovered and become ubiquitous, just unknown to me?

Were I to continue, I'd probably repeat the above calculation for every pair, then every triplet, then every quartet, computing the error between what each group voted for itself and what the rest of the company thought it should make.
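One possible reading of the pairs/triplets/quartets idea, sketched in Python under my own assumptions: for each coalition, compare the total share its members give the coalition (averaged over members) with the total share the outsiders give it (averaged over outsiders). Unlike the self-versus-group table, this sketch skips normalization, so the single-person numbers won't match that table's error column:

```python
from itertools import combinations

NAMES = ["Alice", "Bob", "Carly", "Dan", "Elmer"]
VOTES = [
    [100, 0, 0, 0, 0],     # Alice
    [20, 20, 20, 20, 20],  # Bob
    [10, 10, 60, 10, 10],  # Carly
    [10, 10, 10, 40, 30],  # Dan
    [10, 10, 10, 30, 40],  # Elmer
]


def coalition_gaps(votes, names):
    """For every proper, non-empty coalition: outsiders' view minus insiders' view.

    ASSUMPTION: a voter's view of a coalition is the total % they give its members.
    A negative gap means the coalition claims more for itself than outsiders give it.
    No normalization is applied.
    """
    n = len(votes)
    gaps = {}
    for size in range(1, n):  # the full group has no outsiders, so stop at n-1
        for coalition in combinations(range(n), size):
            outsiders = [i for i in range(n) if i not in coalition]
            share = [sum(votes[i][j] for j in coalition) for i in range(n)]
            inside = sum(share[i] for i in coalition) / len(coalition)
            outside = sum(share[i] for i in outsiders) / len(outsiders)
            gaps[tuple(names[i] for i in coalition)] = outside - inside
    return gaps


gaps = coalition_gaps(VOTES, NAMES)
# Show the five coalitions whose self-assessment most exceeds the outsiders' view.
for members, gap in sorted(gaps.items(), key=lambda kv: kv[1])[:5]:
    print(f"{' + '.join(members):30} {gap:8.2f}")
```

For example, Dan and Elmer together give their pair 70% on average, while Alice, Bob, and Carly give it 20% on average, a gap of -50.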

Once I had all those differences, I'm not sure what they would do for me. I assume I would then combine them in some way, perhaps working from the largest group size down to the smallest, to arrive at a balanced figure for each individual. But that's about as far as my vision goes.

Now that I think about it, this question really revolves around extracting information from the data. We don't know who the boss is, and we don't know what anyone does in this hypothetical company. All we know is how they voted. Given that information, is there an optimal formula for balancing those votes? What is most likely true given that set of data?

It seems this would benefit from being an iterative process, like repeatedly updating with Bayes' theorem.
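For the iterative idea, here is one hypothetical scheme, sketched in Python. It is not Bayes' theorem and not any particular published algorithm, just an illustration: start by trusting every voter equally, compute the normalized evaluation each person gets from the others, then give more weight to voters whose whole ballot is close to that consensus, and repeat. The weighting rule `1 / (1 + distance)` is an arbitrary assumption. The first round, with equal weights, reproduces the "normalized" column of the self-versus-group table:

```python
# Votes from the example: VOTES[voter][recipient], in % of revenue.
VOTES = [
    [100, 0, 0, 0, 0],     # Alice
    [20, 20, 20, 20, 20],  # Bob
    [10, 10, 60, 10, 10],  # Carly
    [10, 10, 10, 40, 30],  # Dan
    [10, 10, 10, 30, 40],  # Elmer
]


def iterative_consensus(votes, rounds=100, tol=1e-9):
    """Hypothetical trust-reweighting loop; returns scores that sum to 100."""
    n = len(votes)
    weights = [1.0] * n  # start by trusting every voter equally
    scores = [100.0 / n] * n
    for _ in range(rounds):
        # Weighted average of what the *other* voters gave each person (self-votes ignored).
        raw = []
        for j in range(n):
            others = [i for i in range(n) if i != j]
            num = sum(weights[i] * votes[i][j] for i in others)
            den = sum(weights[i] for i in others)
            raw.append(num / den)
        total = sum(raw)
        if total == 0:
            # Everyone voted 100% for themselves: no cross-votes, nothing to learn.
            raise ValueError("no votes for other people; nothing to learn")
        new_scores = [100 * r / total for r in raw]
        # ASSUMPTION: credibility = 1 / (1 + L1 distance between a ballot and the consensus).
        weights = [1.0 / (1.0 + sum(abs(v - s) for v, s in zip(votes[i], new_scores)))
                   for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


for name, s in zip(["Alice", "Bob", "Carly", "Dan", "Elmer"], iterative_consensus(VOTES)):
    print(f"{name:6} {s:6.2f}")
```

Because the weighting rule is arbitrary, the output shouldn't be read as "the truth"; a different rule, or a coalition that anticipates the rule, could shift it. Convergence is also not guaranteed; the loop simply stops after `rounds` iterations.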

Thank you for your patience. Any thoughts, or literature I should read, would be greatly appreciated. As a layman, I often don't know the terminology academics use for the topics and questions I have, so any direction is appreciated.