Invertibility of MDP policy evaluation matrix


For an infinite-horizon MDP, the value function of a policy $\pi$ can be computed from the matrix equation

$V = R + \gamma P V$, where $P$ is the transition probability matrix and $0<\gamma<1$. Rearranging gives $(I-\gamma P) V = R$.

How do we know that $I-\gamma P$ is invertible?

Best answer

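By a telescoping sum argument, for every integer $k \ge 0$ we have $$(I-\gamma P)\sum_{i=0}^k \gamma^i P^i = I - \gamma^{k+1} P^{k+1},$$ where $\gamma^0P^0$ is defined to be the identity matrix $I$. Since $P^{k+1}$ is again a transition probability matrix, all of its entries lie between 0 and 1, so every entry of $\gamma^{k+1}P^{k+1}$ is at most $\gamma^{k+1}$, which tends to 0 as $k\rightarrow\infty$ because $0<\gamma<1$. The same bound shows that the series $\sum_{i=0}^{\infty}\gamma^i P^i$ converges entrywise, by comparison with the geometric series $\sum_{i}\gamma^i$. Taking the limit as $k\rightarrow\infty$ therefore gives
$$(I-\gamma P)\sum_{i=0}^{\infty} \gamma^i P^i = I.$$ A square matrix with a right inverse is invertible, and so $$(I-\gamma P)^{-1} = \sum_{i=0}^{\infty} \gamma^i P^i.$$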
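
For a quick numerical sanity check of the series formula, here is a minimal NumPy sketch. The random row-stochastic matrix, the reward vector, the values `n = 5`, `gamma = 0.9`, `k = 500`, and the helper name `neumann_partial_sum` are all illustrative assumptions, not part of the question.

```python
import numpy as np

def neumann_partial_sum(P, gamma, k):
    """Return sum_{i=0}^{k} gamma^i P^i by accumulating successive terms."""
    n = P.shape[0]
    total = np.eye(n)   # the i = 0 term, gamma^0 P^0 = I
    term = np.eye(n)
    for _ in range(k):
        term = gamma * (term @ P)   # next term: gamma^i P^i
        total = total + term
    return total

# Hypothetical example data: a random row-stochastic matrix and reward vector.
rng = np.random.default_rng(0)
n, gamma = 5, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # normalise rows so each sums to 1
R = rng.random(n)

A = np.eye(n) - gamma * P
exact_inverse = np.linalg.inv(A)
approx_inverse = neumann_partial_sum(P, gamma, k=500)

# Largest entrywise gap between the true inverse and the truncated series.
max_error = np.max(np.abs(exact_inverse - approx_inverse))
print("max entrywise error:", max_error)

# Policy evaluation: solve (I - gamma P) V = R directly.
V = np.linalg.solve(A, R)
```

In practice one would normally call `np.linalg.solve` as in the last line rather than form the inverse or sum the series; the partial sum is included only to illustrate the identity above.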