I'm trying to better predict the top three finishers of the next 1000 men's 800m freestyle swimming races.
I've got a set of rules to rate the swimmers:
1) Add 5 points if the swimmer won his last race
2) Subtract 3 points if the swimmer is less than 6' tall
3) Add 10 points if the swimmer won his last two races
4) Add 3 points if the swimmer wears a full body swimming suit
5) Subtract 10 points if the swimmer's last race time was over 8 minutes
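In other words, each rating is a weighted sum of yes/no conditions. Here is a minimal sketch of how I think of it; the feature names (`won_last_race` and so on) and the `rating` function are names I made up for illustration, and the weights are the point values listed above:

```python
# Hypothetical feature names; each weight is the point value from the rules above.
WEIGHTS = {
    "won_last_race": 5,
    "under_six_feet": -3,
    "won_last_two_races": 10,
    "full_body_suit": 3,
    "last_time_over_8_min": -10,
}

def rating(features, weights=WEIGHTS):
    """Sum the weights of every rule whose condition is true for this swimmer."""
    return sum(weights[name] for name, fires in features.items() if fires)

# A swimmer who won his last race, is under 6' tall, and wears a full body suit.
example = {
    "won_last_race": True,
    "under_six_feet": True,
    "won_last_two_races": False,
    "full_body_suit": True,
    "last_time_over_8_min": False,
}
print(rating(example))  # 5 - 3 + 3 = 5
```

Keeping the weights in one dict means the optimization only has to search over those five numbers.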
I want to optimize the point values in my rules so that, as often as possible, the swimmer I rate highest finishes first, the swimmer with the second-highest rating finishes second, and the swimmer with the third-highest rating finishes third.
I can back-test over the last 5000 swimming races to optimize the rules.
For example, for a race on Jan 1, 2013, I have the following results:
swimmerID | rule1 | rule2 | rule3 | rule4 | rule5 | finalRating | finishPosition
Swimmer1 | 5 | -3 | 0 | 3 | 0 | 5 | 1
Swimmer2 | 5 | 0 | 10 | 0 | 0 | 15 | 4
Swimmer3 | 5 | 0 | 0 | 3 | 3 | 11 | 3
Swimmer4 | 5 | 0 | 0 | 0 | 3 | 8 | 2
Swimmer5 | 5 | -3 | 10 | 3 | 3 | 18 | 6
Swimmer6 | 5 | -3 | 0 | 0 | 0 | 2 | 5
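To make the objective concrete, here is a rough sketch of how I would score a single race: rank the swimmers by rating and count how many of the top three predicted positions match the actual finish. The function name `top3_hits` and the tuple layout are just my own illustration, the numbers are the finalRating and finishPosition columns from the table above, and ties in rating are not handled in any special way.

```python
# (swimmerID, finalRating, finishPosition) taken from the table above.
race = [
    ("Swimmer1", 5, 1),
    ("Swimmer2", 15, 4),
    ("Swimmer3", 11, 3),
    ("Swimmer4", 8, 2),
    ("Swimmer5", 18, 6),
    ("Swimmer6", 2, 5),
]

def top3_hits(race):
    """Count how many of the three highest-rated swimmers finished
    in exactly the position their rating rank predicted."""
    ranked = sorted(race, key=lambda row: row[1], reverse=True)
    return sum(
        1
        for predicted_pos, (_, _, finish) in enumerate(ranked[:3], start=1)
        if finish == predicted_pos
    )

print(top3_hits(race))  # 1: only Swimmer3 (rated third, finished third) matches
```

Summing this count over all back-test races would give one number to maximize as the point values change.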
I can't figure out how to model this problem. I originally thought it was a simple LP (linear programming) problem, but I couldn't get that to work.