Rational Thief Problem, optimal stopping strategy

678 Views Asked by At

A thief goes out stealing every day and has a chance of $p_j$ of stealing a sum $j$ with $0\leq j \leq N$. But there's also a chance $p$ of getting caught, in which case he loses everything he got until that moment. When should he stop stealing? The answer is that he should stop at $$\frac{1-p}{p} \cdot \sum ^n _{j=0} p_j \cdot j$$ First I should make a model of this problem as the optimal stopping of a Markov chain, but I don't know how to do that. And then prove that the model is monotone. Finally I've got to show that the given answer is the moment that the expected extra profit is not greater than the expected loss when getting caught.

Thanks in advance!

Edit: The model should look like this:

$S=\{0,1,...,N\}, A(i)=\{0,1\}, i \in S, r_i=..., i\in S, s_i=..., i\in S, p_{ij}=...$

Where $S$ is the set of total profits the thief can have, $A(i)$ are the action he can perform in state $i$ (stop 0 or steal 1), $r_i$ is the profit he gets when he stops in $i$, $s_i$ is the additional profit if he continues and $p_{ij}$ is the cance of going from $i$ to $j$.

1

There are 1 best solutions below

1
On BEST ANSWER

Define the state $X_n$ at time $n \in \mathbb N$ to be the burglar's total loot at that time. Thus for $j=1,2,\ldots, N$ the possible transitions of $X_n$ and the respective probabilities are $$X_{n+1}=\begin{cases}X_n+j, & \text{ with probability } (1-p)p_j \\ 0, &\text{ with probability } p& \end{cases}$$ Now, the state is certainly monotone since it cannot decrease (conditionally on that he did not get caught). The expected extra profit by continuing for one more step, is equal to $$(1-p)\sum_{i=1}^{N}j\cdot p_j$$ and the expected loss is equal to $$p\cdot X_n$$ Thus, the expected extra loss is greater when $$pX_n \ge (1-p)\sum_{i=1}^{N}jp_j \iff X_n \ge \frac{(1-p)}{p}\sum_{i=1}^{N}j\cdot p_j$$


This problem is solved with dynamic programming techniques in Example 2.1c of Ross, Introduction to Stochastic Dynamic Programming, 1983. Indicatively, the optimality equation is $$V(i)=\max \left[\underbrace{\phantom{\sum_{j=0}^N}i\phantom{\sum_{j=0}^N}}_{\text{stop}}, \underbrace{p \sum_{j=0}^{N}V(i+j)P_j}_{\text{continue}}\right], i>0$$ (where $i$ denotes the total loot of the burglar) and the one-stage look-ahead policy (which can be proved - due to monotonicity of $i$ - to be optimal) is to stop when $i$ enters the set $$\left \lbrace i:i\ge (1-p) \sum_{j=1}^{N}(i+j)p_j\right \rbrace=\left \lbrace i:i\ge \frac{1-p}{p} \sum_{j=1}^{N}jp_j\right \rbrace$$