I am working through the book "Reinforcement Learning: An Introduction" by Sutton and Barto
I tried to implement the jacks car rental problem from chapter 4 which states this. (summarized)
- Jack has a car rental franchise.
- He makes \$10 per car rented.
- He has two locations.
- He can move up to 5 cars between locations which costs him \$2 per car
- He can have a max of 20 cars on each lot, surplus goes to HQ.
- Rental requests are poisson distributed $\lambda_{req1} = 3, \lambda_{req2} = 4$ for location 1 and 2 respectively
- Returns are available the next day, they are also poisson distriubted at $\lambda_{ret1} = 3, \lambda_{ret2} = 2$ for locations one and two.
I wrote an implementation myself that was converging on the wrong solution.
I was thinking since the expectation of a poisson distribution is the $\lambda$ value, I could calculate the expected profit by...
- Assuming that there are $\lambda_x$ returns from yesterday available today and added to inventory...
$$ \begin{aligned} loc1 = min(inventory_{1}, \lambda_{req1}) * 10 \\ loc2 = min(inventory_2, \lambda_{req2}) * 10 \\ \end{aligned} $$
I was banging my head against the wall trying to figure out why my solution wasn't converging and I looked up some other solutions online which go through an entirely different process for dealing with the expectation...
To understand why you cannot compute expected values that way, consider this more simple example:
The demand on any given day is 1 rental with probability 40%, 2 rentals with probability 20%, or 3 rentals with probability 40%. The expected number of rentals is 2.
Lets say that you always keep an inventory of 2. Then your formula would say that your expected profit is $min(2, 2) * 10 = 20$.
But, since you only have two cars to rent, in reality you rent 1 car with probability 40%, and 2 cars with probability 60% (on days with a demand of 2 or 3). So the expected number of rentals is actually 40%(1) + 60%(2) = 1.6, which yields an expected profit of $16.