I'm writing the value function for my MDP, and I want to find a formulation that results in a switching-curve policy. I have an arrival operator, as described in Koole's book: https://www.nowpublishers.com/article/DownloadSummary/STO-002
For example: $V(a,b)=\min\{V(a,b)+C,\ V(a+1,b)\}$
However, my environment's operator is nondeterministic. The value function operator is: $V(a,b)=\min\{V(a,b)+C,\ p_1 V(a+1,b)+(1-p_1) V(a,b+1)\}$
In Koole, all the operators are deterministic, meaning the min is always taken between two deterministic next states. Here I have an action that can lead to two different states: $(a+1,b)$ with probability $p_1$, and $(a,b+1)$ with probability $1-p_1$.
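For concreteness, the randomized operator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test function $V(a,b)=a^2+b^2$, the values $C=5$ and $p_1=0.5$, and the names `apply_arrival_operator` and `first_rejection_curve` are all hypothetical and not taken from Koole or from my actual model. It evaluates both branches of the min and records, for each $b$, the first $a$ at which keeping the state (paying $C$) is preferred.

```python
def apply_arrival_operator(V, C, p1, a, b):
    """Evaluate min{V(a,b)+C, p1*V(a+1,b) + (1-p1)*V(a,b+1)} at one state.

    Returns (value, admit), where admit is True when the randomized
    branch attains the minimum (ties are resolved in favour of admitting).
    """
    reject = V(a, b) + C
    admit = p1 * V(a + 1, b) + (1 - p1) * V(a, b + 1)
    if admit <= reject:
        return admit, True
    return reject, False


def first_rejection_curve(V, C, p1, n):
    """For each b in 0..n-1, return the smallest a in 0..n-1 at which the
    reject branch is chosen, or n if the admit branch is chosen for every a.

    This does not verify that the policy is a threshold policy; it only
    reports where the first rejection happens along each row.
    """
    curve = []
    for b in range(n):
        threshold = n
        for a in range(n):
            _, admit = apply_arrival_operator(V, C, p1, a, b)
            if not admit:
                threshold = a
                break
        curve.append(threshold)
    return curve


# Hypothetical convex test function, NOT the value function of the actual MDP.
def V_test(a, b):
    return a ** 2 + b ** 2


curve = first_rejection_curve(V_test, C=5.0, p1=0.5, n=6)
print(curve)
```

For this particular test function the admit branch is chosen exactly when $a+b\le 4$, so the boundary is a decreasing staircase. That is the kind of structure I would like to prove for the true value function, rather than observe for one hand-picked $V$.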
Is there any article or book that proves structural results (such as a switching curve) for this kind of nondeterministic arrival operator?