Invertibility of MDP policy evaluation matrix

187 Views Asked by Bumbble Comm At 29 Mar 2026 - 10:48

For an infinite-horizon MDP, the value function of a policy $\pi$ can be computed from the matrix equation

$V = R + \gamma P V$, where $V$ is the vector of state values, $P$ is the transition probability matrix under $\pi$, and $0<\gamma<1$. Rearranging gives $(I-\gamma P)V = R$.

How do we know that $I-\gamma P$ is invertible?
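
For concreteness, here is a minimal sketch of what solving this system looks like numerically. The 3-state matrix `P`, the reward vector `R`, and the value of `gamma` are made-up illustrative values, not taken from any particular MDP.

```python
import numpy as np

# Hypothetical 3-state example: P, R and gamma are illustrative assumptions.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])   # each row sums to 1 (row-stochastic)
R = np.array([1.0, 0.0, 2.0])     # expected one-step reward in each state
gamma = 0.9

# Solve (I - gamma * P) V = R directly instead of forming the inverse.
A = np.eye(3) - gamma * P
V = np.linalg.solve(A, R)

# Check that V satisfies the original equation V = R + gamma * P V.
residual = np.max(np.abs(V - (R + gamma * P @ V)))
print(V, residual)
```

`np.linalg.solve` raises `LinAlgError` when the matrix is singular, so the question amounts to asking why that can never happen here.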


Bumbble Comm On 14 Apr 2020 - 1:49 BEST ANSWER

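By a telescoping sum argument, we have for all positive integers $k$ $$(I-\gamma P)\sum_{i=0}^k \gamma^i P^i = I - \gamma^{k+1} P^{k+1},$$ where we define $\gamma^0P^0$ to be the identity matrix $I$. Every power $P^{k+1}$ is itself a transition probability matrix, so all of its entries lie between 0 and 1. Since $0<\gamma<1$, each entry of $\gamma^{k+1}P^{k+1}$ is at most $\gamma^{k+1}$, which tends to 0 as $k\rightarrow\infty$; the same bound shows that the series $\sum_{i} \gamma^i P^i$ converges entrywise. Taking the limit gives
$$(I-\gamma P)\sum_{i=0}^{\infty} \gamma^i P^i = I.$$ Because $I-\gamma P$ is a square matrix, a right inverse is also a left inverse, and so $$(I-\gamma P)^{-1} = \sum_{i=0}^{\infty} \gamma^i P^i.$$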
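
The argument can also be checked numerically. The sketch below is only a sanity check, not a proof: it assumes a randomly generated 4-state row-stochastic matrix and $\gamma = 0.9$, and the helper name `neumann_partial_sum` is made up for this illustration.

```python
import numpy as np

def neumann_partial_sum(P, gamma, k):
    """Return sum_{i=0}^{k} gamma^i P^i (hypothetical helper for illustration)."""
    n = P.shape[0]
    total = np.zeros((n, n))
    term = np.eye(n)                 # gamma^0 P^0 = I
    for _ in range(k + 1):
        total += term
        term = gamma * (term @ P)    # next term: gamma^(i+1) P^(i+1)
    return total

# Assumed example data: a random 4x4 row-stochastic matrix and gamma = 0.9.
rng = np.random.default_rng(0)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)    # normalise rows so each sums to 1
gamma = 0.9
A = np.eye(4) - gamma * P

# Telescoping identity: A @ S_k = I - gamma^(k+1) P^(k+1).
k = 50
S_k = neumann_partial_sum(P, gamma, k)
rhs = np.eye(4) - gamma ** (k + 1) * np.linalg.matrix_power(P, k + 1)
telescoping_error = np.max(np.abs(A @ S_k - rhs))

# With many terms, the partial sum approaches the inverse of A.
S_big = neumann_partial_sum(P, gamma, 500)
inverse_error = np.max(np.abs(S_big - np.linalg.inv(A)))

print(telescoping_error, inverse_error)
```

Both errors should be at the level of floating-point round-off. The remainder after $k$ terms is bounded entrywise by $\gamma^{k+1}/(1-\gamma)$, so convergence slows as $\gamma$ approaches 1.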
