I apologize for the lack of math format and possible ambiguity. If I had more maths knowledge, this could be a two-sentence question, but I don't have that.
I want to find a local optimum for my preflop poker bot strategy.
The idea is that I have a deterministic strategy that needs to decide which "action" to take, given the information. There is a finite number of possible actions in ascending order, including check/fold, call, raise 1, raise 2, raise 3.
In Texas hold'em poker you are dealt two cards, and the "card-strength" of each such pair is given (and is indirectly related to the chance that you win). The card-strength is a single number.
The bet actions of our opponents indicate a certain "oppositional strength" we're facing when it's our turn. For now, it consists of three numbers: the number of raises we're facing, the largest raise we're facing and the number of opponents that haven't folded.
First, let's look at the case where the oppositional strength would be only one number, let's say the maximum raise. We don't know how to compare this with our card-strength and that's the subject of this question. From the oppositional strength we will derive a card-strength-fold (which is a threshold) and when card-strength < card-strength-fold we fold.
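To make the fold rule concrete, here is a minimal Python sketch. The table `fold_threshold` and its numbers are invented for illustration; only the comparison card-strength < card-strength-fold comes from the description above.

```python
# Hypothetical lookup table: index = oppositional strength (e.g. the maximum raise),
# value = card-strength-fold. The numbers are made up; the table is non-decreasing.
fold_threshold = [0, 10, 25, 40]

def should_fold(card_strength, opp_strength):
    """Fold when our card-strength is below the threshold for this opposition."""
    return card_strength < fold_threshold[opp_strength]

print(should_fold(20, 2))  # True: 20 < 25
print(should_fold(30, 2))  # False: 30 >= 25
```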
We start with any monotone function N->N, drawn in 2-d with the oppositional strength on the x-axis and the card-strength-fold on the y-axis. To evaluate the function, we let the resulting program play many tournaments against other AI players and see whether it loses or earns money. Then we adjust the function and let it play tournaments again. This way we can compare the function with the adjusted version. As you can see, the evaluation function is expensive. What I wanted to do at first was pick an x-value at random, increase or decrease the function value there at random (within certain bounds), and see if the resulting function is better. I think it's a better idea to adjust the function at the x-values in the neighborhood of the (randomly) chosen x as well, albeit to a lesser degree.
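A rough sketch of that adjust-and-compare step, in Python. The triangular weighting, the `radius` parameter and the `evaluate` callback (standing in for "play many tournaments and report the profit") are all assumptions made for illustration, and the sketch does not yet enforce monotonicity.

```python
import random

def perturb(thresholds, x, delta, radius=2):
    """Copy of thresholds with entry x moved by delta and its neighbours moved less."""
    new = list(thresholds)
    lo = max(0, x - radius)
    hi = min(len(new) - 1, x + radius)
    for i in range(lo, hi + 1):
        weight = 1.0 - abs(i - x) / (radius + 1)  # 1 at x, shrinking linearly with distance
        new[i] = thresholds[i] + delta * weight
    return new

def hill_climb_step(thresholds, evaluate, step, rng=random):
    """Try one random neighbourhood adjustment; keep it only if evaluate() improves."""
    x = rng.randrange(len(thresholds))
    delta = rng.choice([-step, step])
    candidate = perturb(thresholds, x, delta)
    # evaluate() is the expensive part; a real run would cache the current score.
    return candidate if evaluate(candidate) > evaluate(thresholds) else thresholds
```

Because the tournament result is noisy, a single comparison like this can accept a worse function by chance, so in practice the comparison would probably need many tournaments per candidate.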
Now, we may add a z-axis for the other actions. Along the z-axis we have the aforementioned actions. When looking at any single y-coordinate, the x/z function is also monotone. I see the function as a plane in 3d space that may be bent in places. When adjusting the function, we "push" the plane somewhere, like a finger pushing into a blanket, with the neighborhood going up as well, but respecting monotonicity.
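The "finger in a blanket" picture could be sketched as follows. This assumes the plane is stored as a table `grid[x][z]` that is non-decreasing along both indices, and it only handles an upward push (`delta > 0`); the function name, the distance measure and the linear fall-off are illustrative choices, not a recommendation.

```python
def push(grid, cx, cz, delta, radius=1):
    """Raise grid[cx][cz] by delta (delta > 0) and nearby cells by less, then
    restore monotonicity so every row and column is non-decreasing again."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(rows):
        for j in range(cols):
            dist = max(abs(i - cx), abs(j - cz))  # Chebyshev distance to the push point
            if dist <= radius:
                new[i][j] += delta * (1.0 - dist / (radius + 1))
    # Running maximum in row-major order: a cell is lifted whenever a neighbour
    # with a smaller index now exceeds it, so the push drags its surroundings up.
    for i in range(rows):
        for j in range(cols):
            if i > 0:
                new[i][j] = max(new[i][j], new[i - 1][j])
            if j > 0:
                new[i][j] = max(new[i][j], new[i][j - 1])
    return new

grid = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(push(grid, 1, 1, 4, radius=0))  # [[0, 1, 2], [1, 6.0, 6.0], [2, 6.0, 6.0]]
```

A downward push would need the mirror image: a running minimum started from the opposite corner.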
I can't visualize adding more axes, but you can imagine another axis with the number of players that haven't folded. Data-wise, I can pick such a coordinate and find its neighbors. Remember that monotonicity should always be respected.
If I just picked and adjusted a single number and not its neighborhood, there are easy algorithms that indicate how much to increase or decrease it (the increment/decrement is just a number that decreases over time as the simulation runs). But given that number, the question is: how much should I adjust the neighbors so that it makes sense mathematically? And how do I respect monotonicity?
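For the monotonicity part, the simplest thing I can think of in the 1-d case is to keep the adjusted value fixed and clamp everything else against it. The sketch below shows that; `restore_monotone` is a hypothetical helper, and clamping is only one option (isotonic regression would be another), not necessarily the mathematically sensible one.

```python
def restore_monotone(values, x):
    """Make values non-decreasing while keeping values[x] fixed."""
    fixed = list(values)
    for i in range(x + 1, len(fixed)):   # to the right: never drop below the left neighbour
        fixed[i] = max(fixed[i], fixed[i - 1])
    for i in range(x - 1, -1, -1):       # to the left: never rise above the right neighbour
        fixed[i] = min(fixed[i], fixed[i + 1])
    return fixed

print(restore_monotone([0, 25, 50, 45, 40], 2))  # [0, 25, 50, 50, 50]
print(restore_monotone([0, 30, 20, 35, 40], 2))  # [0, 20, 20, 35, 40]
```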
My experience with poker tells me that the optimal solution will have strong differences between some neighbors and smaller differences between other neighbors. How do I take that into account?
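One representation that might accommodate this, offered only as an assumption-laden sketch: store the non-negative gaps between neighboring thresholds instead of the thresholds themselves. Monotonicity then holds automatically, and a large jump is just a large gap. The helper names and numbers below are invented.

```python
from itertools import accumulate

def thresholds_from_gaps(base, gaps):
    """Rebuild a non-decreasing threshold list from a start value and non-negative gaps."""
    return list(accumulate(gaps, initial=base))

def adjust_gap(gaps, i, delta):
    """Change one gap, clamped at zero so the rebuilt thresholds stay monotone."""
    new = list(gaps)
    new[i] = max(0, new[i] + delta)
    return new

gaps = [2, 30, 1]  # made-up numbers: one large jump between two small ones
print(thresholds_from_gaps(5, gaps))                      # [5, 7, 37, 38]
print(thresholds_from_gaps(5, adjust_gap(gaps, 0, -10)))  # [5, 5, 35, 36]
```

Note that changing one gap shifts every later threshold by the same amount, which is a different notion of "neighborhood" than pushing a single point.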