That is a brain teaser but it is also a mathematical problem in probability field.
One day, a man have been trapped into a building which has 18 floors. Now, he want to leave the building but there is a guard.
Assumptions:
- if the man bribes the guard by 10 thousand dollors, he will down 1 stair (with probability 1).
- If he invites the guard to dinner spending 20 thousand dollors, he will down 3 stairs with 80% probability while he will in vain with 20% probablity.
- If he bribes the guard's wife by a 30 thousand cosmetic, he will down a half of stairs with 50% probability(e.g., 12 floor will down to 6 floor, 9 floor will down to 5 floor. Odd floor will down to $(n+1)/2$). Also, he will in vain with 50% probability.
The problem is: if the man has 150 thousand dollors, how does him spend money will has a highest probability to leave the building(Say, the floor number is less or equal to 0)?
This is similar to a programing problem but has probability. So I wonder if there some mathematical theory about it and how to solve the problem?
Essentially, you need to know the expected payoff for each possible strategy that the trapped man can utilize. In this case, an expected payoff greater than zero means that he has a probability greater than zero of making it out of the building before running out of money.
A possible strategy might be something like, first bribe the wife in the hope that he will get down half of the floors, i.e., from floor 18 to floor 9, then bribe the guard 3 times and if necessary, pay for any remaining floors.
A given strategy leads to a set of sequences of choices, depending upon the outcome of the random elements of the problem. So, if everything goes according to plan, the strategy I outlined would lead to the expenditure of $\$30k + 3 * \$20k = \$90k$. However, the probability of everything going as well as possible is only $0.5 * (0.8)^3 = 0.256$.
I see that you linked to dynamic programming. Dynamic programming is a way of keeping track of all the possible paths through the graph of possibilities. Instead of calculating a separate cost and probability for each possible sequence of events, the partial probabilities and partial costs of the various paths are preserved in a table so that they don't have to be recomputed. It also helps to reduce the number of computations because one can see that cells that are far from the optimal path can't be reached without going through suboptimal cells.
That's the best I can do at this point. I don't think there is a neat, closed form solution to your problem. Your best bet is to use a dynamic programming approach to reduce the amount of work required to compute an answer.