A home supply store can place orders for fridges at the start of each month for immediate delivery. A cost of $\$ 100$ is incurred each time an order is placed. The cost of storage per fridge is $\$ 5.$ The penalty for running out of stock is estimated at $\$ 150$ per fridge per month. The monthly demand is given by:
$$\begin{array}{l|ccc} \text{demand } x & 0 & 1 & 2\\ \hline \text{probability } p(x) & .2 & .5 & .3 \end{array}$$
The policy of the store is that the maximum level of stock should not exceed $2$ fridges in any month.
a) Determine the transition probabilities of the different decision options of the problem.
b) Find the expected inventory cost per month as a function of the state of the system and the decision alternative.
c) Find the optimal ordering policy for the next $3$ months.
Attempt.
- a) I tried to follow Henry's comment:
Take for example having 1 in stock at the beginning of the month and ordering 0, which would be represented by a single entry in your array. But what you need to describe for this single case is (a) demand of 0 with probability of 0.2, so transitioning to 1 at a cost of $\$5$; (b) demand of 1 with probability of 0.5, so transitioning to 0 at a cost perhaps of $\$0$ or of $\$5$ (less profit); (c) demand of 2 with probability of 0.3, so transitioning to 0 at a cost perhaps of $\$150$ or of $\$155$ (less profit).
Take the columns to be the stock levels $0,1,2$ and the rows to be the number of fridges $0,1,2$ that can be ordered at the start of the month. With $1$ fridge in stock and an order of $0$, the matrix is
$$P=\begin{pmatrix} &&\\ .5\ or\ .3&.2&0\\ && \end{pmatrix}$$
And the matrix of costs
$$C=\begin{pmatrix} &&&\\ \$0\ or\ \$5\ OR\ \$150\ or\ \$155&\$5&\$0\\ &&& \end{pmatrix}$$
As you can see, in both matrices I don't know what to put in entry $(1,0)$, since two different probabilities (and costs) lead to state $0$.
How can I solve this problem?
Note also that I need the result in matrix form to solve b).
b) To find the expected monthly inventory cost as a function of the state of the system and the decision alternative, I think I should use $$c_i(k)=\sum_{j=0}^{M}r_{ij}(k)\,p_{ij}(k),$$ where $k$ is the decision considered, $r_{ij}(k)$ is the cost incurred when decision $k$ is chosen in state $i$ and the system moves to state $j$, $p_{ij}(k)$ is the corresponding transition probability, and the states are $0,1,\dots,M$.
c) I think I can find the solution using dynamic programming with $N=3$ and the recursion $$f_n(i)=\min_k\Big\{\sum_{j=0}^{M}p_{ij}(k)\big[r_{ij}(k)+f_{n+1}(j)\big]\Big\},\quad n=1,2,\dots,N,$$ where $f_{N+1}(j)=0$ for all $j$ (a minimum rather than a maximum, since the $r_{ij}(k)$ are costs).
Am I correct so far?
Help me please

a) The matrix of transitions should be
$P_1=\begin{pmatrix} .3&.5&.2\\ .3&.5&.2\\ 1&0&0 \end{pmatrix}$
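$P_2=\begin{pmatrix} 1&0&0\\ .8&.2&0\\ .3&.5&.2 \end{pmatrix}$
for $k=1$ (buy fridges) and $k=2$ (don't buy fridges), respectively. In $P_2$, the entry for starting with $1$ fridge and ending with $0$ is $.5+.3=.8$, because a demand of $1$ and a demand of $2$ both empty the stock; every row must sum to $1$.
The cost matrices are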
$R_1=\begin{pmatrix} 100+150&100+5&100+5+5\\ 100+150&100+5&100+5+5\\ 150&5&5+5 \end{pmatrix}$
$R_2=\begin{pmatrix} 150&0&0\\ 150&5&0\\ 150&5&5+5 \end{pmatrix}$
Rows and columns both correspond to the states $0,1,2$: the row is the number of fridges at the start of the month and the column is the number at the end of the month.
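To see why the entry asked about in the question (start with $1$, order nothing, end with $0$) holds a single number, it may help to build one row mechanically: each demand value sends its probability to the resulting end-of-month stock, and demands that lead to the same state simply add up. The following is only a minimal sketch; the names `DEMAND_PMF`, `MAX_STOCK` and `transition_row` are made up for illustration, and it assumes that unmet demand is lost, so the stock never drops below $0$.

```python
from fractions import Fraction

# Demand distribution from the problem statement: P(D=0), P(D=1), P(D=2).
DEMAND_PMF = {0: Fraction(1, 5), 1: Fraction(1, 2), 2: Fraction(3, 10)}
MAX_STOCK = 2  # store policy: never more than 2 fridges


def transition_row(stock_after_order):
    """Distribution of the end-of-month stock, given the stock available to sell.

    Assumption: unmet demand is lost, so the stock is floored at 0.
    """
    row = [Fraction(0)] * (MAX_STOCK + 1)
    for demand, prob in DEMAND_PMF.items():
        next_stock = max(stock_after_order - demand, 0)
        row[next_stock] += prob  # different demands may land in the same state
    return row


for stock in range(MAX_STOCK + 1):
    print(stock, [str(p) for p in transition_row(stock)])
```

With $1$ fridge available, the demands $1$ and $2$ both lead to state $0$, so that entry accumulates $.5+.3$ instead of being a choice between the two.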
b) It's correct, as you mentioned.
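As a quick illustration, write $c_i(k)$ for the expected one-month cost in state $i$ under decision $k$. Taking the last rows of $P_2$ and $R_2$ above at face value (start with $2$ fridges, don't buy), the sum works out as
$$c_2(2)=\sum_{j=0}^{2} r_{2j}(2)\,p_{2j}(2)=150(.3)+5(.5)+10(.2)=45+2.5+2=49.5.$$
The other entries follow in the same way, one state and one decision at a time.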
c) Correct as well.
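For what it's worth, here is a rough sketch of the backward recursion over $3$ months. It takes the matrices of part a) at face value, minimizes (the entries are costs), and uses a zero terminal cost. Two things are assumptions on my side: the function name `backward_induction` is invented for this example, and the entry for "start with $1$, don't buy, end with $0$" is taken as $.8$ so that the row sums to $1$. The resulting numbers are only as good as the cost matrices fed in.

```python
from fractions import Fraction as F

# Matrices copied from part a); index 0 = buy (k=1), index 1 = don't buy (k=2).
# Assumption: entry (1, 0) of the "don't buy" matrix is 8/10 so the row sums to 1.
P = [
    [[F(3, 10), F(5, 10), F(2, 10)], [F(3, 10), F(5, 10), F(2, 10)], [F(1), F(0), F(0)]],
    [[F(1), F(0), F(0)], [F(8, 10), F(2, 10), F(0)], [F(3, 10), F(5, 10), F(2, 10)]],
]
R = [
    [[250, 105, 110], [250, 105, 110], [150, 5, 10]],
    [[150, 0, 0], [150, 5, 0], [150, 5, 10]],
]


def backward_induction(P, R, horizon):
    """Return (f, policy) for stages n = 0..horizon-1.

    f[n][i] is the minimal expected cost from stage n onward in state i,
    policy[n][i] is the index of a minimizing decision.
    """
    n_states = len(P[0])
    # f[horizon] stays at zero: no cost after the last month.
    f = [[F(0)] * n_states for _ in range(horizon + 1)]
    policy = [[None] * n_states for _ in range(horizon)]
    for n in range(horizon - 1, -1, -1):
        for i in range(n_states):
            costs = [
                sum(P[k][i][j] * (R[k][i][j] + f[n + 1][j]) for j in range(n_states))
                for k in range(len(P))
            ]
            best = min(range(len(P)), key=costs.__getitem__)
            f[n][i], policy[n][i] = costs[best], best
    return f, policy


f, policy = backward_induction(P, R, horizon=3)
for n in range(3):
    print(f"month {n + 1}:", [float(v) for v in f[n]], [k + 1 for k in policy[n]])
```

Exact fractions are used instead of floats so that near-ties between the two decisions are not decided by rounding.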