why can't I just use expected values of this poisson distribution?

84 Views Asked by At

I am working through the book "Reinforcement Learning: An Introduction" by Sutton and Barto

I tried to implement the jacks car rental problem from chapter 4 which states this. (summarized)

  1. Jack has a car rental franchise.
  2. He makes \$10 per car rented.
  3. He has two locations.
  4. He can move up to 5 cars between locations which costs him \$2 per car
  5. He can have a max of 20 cars on each lot, surplus goes to HQ.
  6. Rental requests are poisson distributed $\lambda_{req1} = 3, \lambda_{req2} = 4$ for location 1 and 2 respectively
  7. Returns are available the next day, they are also poisson distriubted at $\lambda_{ret1} = 3, \lambda_{ret2} = 2$ for locations one and two.

I wrote an implementation myself that was converging on the wrong solution.

I was thinking since the expectation of a poisson distribution is the $\lambda$ value, I could calculate the expected profit by...

  1. Assuming that there are $\lambda_x$ returns from yesterday available today and added to inventory...

$$ \begin{aligned} loc1 = min(inventory_{1}, \lambda_{req1}) * 10 \\ loc2 = min(inventory_2, \lambda_{req2}) * 10 \\ \end{aligned} $$

I was banging my head against the wall trying to figure out why my solution wasn't converging and I looked up some other solutions online which go through an entirely different process for dealing with the expectation...

https://github.com/ShangtongZhang/reinforcement-learning-an-introduction/blob/master/chapter04/car_rental_synchronous.py#L129

https://medium.com/@jaems33/this-is-start-of-my-exploration-into-learning-about-reinforcement-learning-d505a68a2d6

1

There are 1 best solutions below

2
On BEST ANSWER

To understand why you cannot compute expected values that way, consider this more simple example:

The demand on any given day is 1 rental with probability 40%, 2 rentals with probability 20%, or 3 rentals with probability 40%. The expected number of rentals is 2.

Lets say that you always keep an inventory of 2. Then your formula would say that your expected profit is $min(2, 2) * 10 = 20$.

But, since you only have two cars to rent, in reality you rent 1 car with probability 40%, and 2 cars with probability 60% (on days with a demand of 2 or 3). So the expected number of rentals is actually 40%(1) + 60%(2) = 1.6, which yields an expected profit of $16.