Drunkard walk with payments: when should the drunkard pay?

97 Views Asked by At

A drunkard stands at the origin. Each second he goes right with probability $p$, left with probability $q$, and remains still with probability $1-p-q$. When he arrives at $x=+D$ or $x=-D$, he pays $M$ dollars and is transported back to the origin.

Now, there is a new option: at each step, after he knows where he is going to go (left or right), the drunkard has an option to pay $m$ dollars and remain in its place. The drunkard's ultimate goal is to minimize his expeced payment over quadrillion seconds. How should the drunkard decide, in each second, whether to pay?

I think I can solve the special case in which $q=0$, i.e, the drunkard moves only rightwards. In this case, $D$ steps rightwards cost $M$ dollars, so a single step costs in average $M/D$ dollars. Therefore, the drunkard should never pay if $m>M/D$, and always pay if $m<M/D$ (there may be exceptions at the first few steps or at the quadrillionth step, but they are negligible).

However, I am not sure whether the same argument is true for arbitrary $q$.

So, when should the drunkard pay?

2

There are 2 best solutions below

1
On BEST ANSWER

This is only a very partial answer, looking at the simplest nontrivial case, $D=2$.

Presumably the quantity of interest can be described as the average expenditure per round, where the process is allowed to continue indefinitely (which I take to be the essential meaning of "quadrillion"). With this in mind, we may as well assume that $p+q=1$. Allowing the possibility of randomly remaining stationary only means you're slowing the average expenditure per round down by a factor of $p+q$, whatever stategy is employed.

In general, since the option to remain stationary isn't decided until after the next putative step has been randomly chosen, there is never any need to do so unless you are on the $M$-dollar "precipice" at one of the two ends, in this case at $+1$ or $-1$. At any other point you may as well take the step, incurring no cost, in hopes of reversing it on the next round.

Let's denote by $N$ the average expenditure per step if you Never pay to remain stationary. Under this strategy, you always wind up back at the origin every other round, so it's only on the even numbered rounds, when you're at $\pm1$, that you risk having to pay $M$. The average expenditure is

$$N={p^2+q^2\over2}M$$

Next, let's denote by $B$ the average expenditure per round if you pay to remain stationary at Both $+1$ and $-1$ (if the alternative would take you to $+2$ or $-2$). This gives a Markov process on the states $\{-1,0,1$} with transition matrix

$$\pmatrix{q&q&0\\p&0&q\\0&p&p}$$

and dominant eigenstate

$${1\over1-pq}\pmatrix{q^2\\pq\\p^2}$$

(Note: $q^2+pq+p^2=1-pq$.) It follows that the average expenditure per round (in the limit) is

$$B={q^3+p^3\over1-pq}m$$

For example, if $p=q=1/2$, then

$$N={1\over4}M\quad\text{and}\quad B={1\over3}m$$

so the crossover occurs at $m={3\over4}M$. (Note, if $pq=0$, then $N=M/2$ and $B=m$, with a crossover at $m=M/2=M/D$, as the OP observed.)

There are, of course, two more stategies: pay to remain stationary at $-1$ but not $+1$, and vice versa. Let's denote by $L$ and $R$ the average expenditure per round for these two strategies. If I've done the algebra correctly, the results are

$$L={p^3\over1+p^2}M+{q^2\over1+p^2}m$$ and $$R={q^3\over1+q^2}M+{p^2\over1+q^2}m$$

For $p=q=1/2$ we get

$$L=R={1\over10}M+{1\over5}m$$

It's not hard to check that, for $p=q=1/2$, we have $600(N-L)(L-B)=(3M-4m)^2$, which means that the $L$ (and $R$) strategy always lies between the $N$ and $B$ strategies, with crossover at $m={3\over4}M$, so for the classic, symmetric drunkard's walk, one should be willing to pay to remain stationary either Never or a Both ends (which makes sense, by symmetry). For general $p+q=1$, however, there can be cases when $L$ or $R$ is optimal.

It might be worth looking carefully at the case $D=3$, but hopefully this gives some indication of how things will go.

1
On

Even for the case $q=0$ your argument does not work. Suppose $N$ is the number of seconds the drunkard has to walk (a constant). Let us call each time the drunkard goes all the way to $D$ a tour.

Suppose $k$ tours are completed in $N$ seconds, with respectively $\{n_1, \cdots, n_k\}$ extra times the drunkards has payed. Also suppose the last tour remains incomplete with $\nu$ payups and $\mu$ non-payups. Then the total money paid will be $$ P=Mk+\nu m+m\sum_{j=1}^{k} n_j $$ Now the $j$th tour takes $D+n_j$ seconds. And the last incomplete tour is $\nu+\mu$ seconds, with $\mu<D$ of course. So the above is conditioned on $$ N=Dk +\mu +\nu + \sum_{j=1}^{k} n_j $$ This is now truly a conditional optimization problem. Now suppose $m>M/D$, you suggest never paying a single penny. Your payout then will be $M\lfloor N/D\rfloor$. But if instead, in each tour, we pay exactly $n$ times, and not paying anything in the last incomplete tour, $$ N=(D+n)k+\mu\Longrightarrow k=\lfloor N/(D+n)\rfloor $$ The payup is then $P_n=M\lfloor N/(D+n)\rfloor+mn$. The payup you propose is the $n=0$ case. For example suppose $N=2D\ell$ for some large $\ell$ and $n=D$. Then $$ P_0=\ell M+\ell M, \qquad P_D=M\ell +Dm $$ Clearly if $\ell M>Dm$ then $P_D$ case is much better. This happens when $\ell>Dm/M$, or $N>2D^2m/M$. This actually tells you $m>M/D$ or not, never paying up is not a good idea, if $N$ is large enough.

Your estimate for $m<M/D$ is much better. Suppose one goes with the strategy of always paying. Then the payup is $P'=mN$. In the same example as before this is worse that $P_D$ unless the condition $(2mD-M)\ell<Dm$ is satisfied. Considering $\ell\ggg 1$, the condition is more or less limited to cases when $m<M/2D$, which is similar to your estimate in this example.

However suppose instead $N=sD\ell$ with $s$ a small integer and $n=(s-1)D$. Then a similar analysis shows that $$ P=M\ell+(s-1)Dm $$ Comparing with $P'=sD\ell M$ (always paying case), the condition for $P'$ being better than $P$ is now $m<M/sD$. I guess my point is, this is a very complicated problem even in case of $q=0$. The behavior is very sensitive to the values of $m,M,D,N$.