Strategy for selling/buying a stock by average reward value iteration

138 Views Asked by Bumbble Comm At 27 Mar 2026 - 12:03

At the beginning of any day $t$, I own either $0$ or $1$ share. The price of the share follows the Markov chain in the table below. At the beginning of a day where I own a share, I may either sell it at today's price or keep it. At the beginning of a day where I don't own a share, I may either buy or not buy. Find the value iteration to maximize the expected discounted profit over an infinite horizon (use discount factor $\beta = 0.95$).

Transition probabilities (rows: today's price; columns: probability of tomorrow's price):

$$
\begin{array}{c|cccc}
\text{Today's price} & P(0) & P(1) & P(2) & P(3) \\ \hline
0 & 0.5 & 0.3 & 0.1 & 0.1 \\
1 & 0.1 & 0.5 & 0.2 & 0.2 \\
2 & 0.2 & 0.1 & 0.5 & 0.2 \\
3 & 0.1 & 0.1 & 0.3 & 0.5
\end{array}
$$

My attempt: I'm trying to construct the value iteration equation for the infinite horizon: $V_{i}(n+1) = \max_{k} \left(c_{ik} + \beta\sum_{j=1}^{4} P_{ij}(k)V_{j}(n)\right)$, where $n$ is the iteration index and $k$ is the action. I define state $i$ as whether I have a share ($i=1$) or not ($i=0$) at the beginning of day $t$, and $c_{ik}$ denotes the cost of taking action $k$ in state $i$ ($i=0, 1$). But I'm not sure whether $c_{ik}$ means anything here, because tomorrow's stock price is not affected by the action we take. Thus, I think there is no cost matrix in this problem. Is this correct?

Can someone please help me with this problem? I'd really appreciate your input.
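
For reference, here is a minimal sketch of discounted value iteration for this setup. It rests on modelling assumptions that are not part of the problem statement: the state is taken to be the pair (shares held, today's price) rather than the holding alone, selling earns a reward of +price, buying earns a reward of -price, and waiting or keeping earns 0. The function name `value_iteration`, the tolerance, and the array layout are all hypothetical choices made for illustration.

```python
import numpy as np

# Transition matrix from the table:
# P[p, q] = Pr(tomorrow's price = q | today's price = p)
P = np.array([
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.5, 0.2, 0.2],
    [0.2, 0.1, 0.5, 0.2],
    [0.1, 0.1, 0.3, 0.5],
])
prices = np.arange(4.0)  # price levels 0, 1, 2, 3
beta = 0.95              # discount factor from the problem statement


def value_iteration(P, prices, beta, tol=1e-10, max_iter=100_000):
    """Assumed model: state = (shares held, price index); reward = +/- price on a trade."""
    n = len(prices)
    V = np.zeros((2, n))  # V[h, p]: h = shares held (0 or 1), p = today's price index
    for _ in range(max_iter):
        # cont[p, h] = beta * E[V[h, tomorrow's price] | today's price = p]
        cont = beta * (P @ V.T)
        # Holding 0 shares: action 0 = do not buy, action 1 = buy at today's price
        q0 = np.stack([cont[:, 0], -prices + cont[:, 1]])
        # Holding 1 share: action 0 = keep, action 1 = sell at today's price
        q1 = np.stack([cont[:, 1], prices + cont[:, 0]])
        V_new = np.stack([q0.max(axis=0), q1.max(axis=0)])
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy actions with respect to the last iterate: policy[h, p] in {0, 1}
    policy = np.stack([q0.argmax(axis=0), q1.argmax(axis=0)])
    return V, policy


V, policy = value_iteration(P, prices, beta)
print("V (rows: 0 shares, 1 share):")
print(V)
print("policy (rows: 0 shares, 1 share; 1 = trade, 0 = wait/keep):")
print(policy)
```

In this formulation the one-step reward plays the role of $c_{ik}$: it depends on the action and on today's price, even though the price transitions themselves do not depend on the action.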
