I started studying RL recently using Ashwin Rao's book "Foundations of RL with Applications in Finance". I'm studying Markov processes at the moment. At some point the author highlights that a process that is not time-homogeneous can be made time-homogeneous by adding the time to the state. I must say I don't really understand how this makes it independent of time. OK, the time is in the state, but the transitions still depend on time.
Note that the arguments to P in the above specification are devoid of the time index t (hence, the term Time-Homogeneous which means “time-invariant”). Moreover, note that a Markov Process that is not time-homogeneous can be converted to a Time-Homogeneous Markov Process by augmenting all states with the time index t. This means if the original state space of a Markov Process that is not time-homogeneous is S, then the state space of the corresponding Time-Homogeneous Markov Process is Z≥0 × S (where Z≥0 denotes the domain of the time index). This is because each time step has its own unique set of (augmented) states, which means the entire set of states in Z≥0 × S can be covered by time-invariant transition probabilities, thus qualifying as a Time-Homogeneous Markov Process.
Can someone please explain this to me? Thanks
In chapter 3 of "Foundations of Reinforcement Learning with Applications in Finance", the author discusses the time-homogeneous Markov Process. Here is the quote.
Markov Process that is not time-homogeneous can be converted to a Time-Homogeneous Markov Process by augmenting all states with the time index $t$. This means if the original state space of a Markov Process that is not time-homogeneous is $\mathcal{S}$, then the state space of the corresponding Time-Homogeneous Markov Process is $\mathbb{Z}_{\geq 0} \times \mathcal{S}$ (where $\mathbb{Z}_{\geq 0}$ denotes the domain of the time index). This is because each time step has its own unique set of (augmented) states, which means the entire set of states in $\mathbb{Z}_{\geq 0} \times \mathcal{S}$ can be covered by time-invariant transition probabilities, thus qualifying as a Time-Homogeneous Markov Process. Therefore, henceforth, any time we say Markov Process, assume we are referring to a Discrete-Time, Time-Homogeneous Markov Process with a Countable State Space (unless explicitly specified otherwise), which in turn will be characterized by the transition probability function $\mathcal{P}$.
This means that we pair each original state with each possible time $t$, forming a new state space. For example, if the original state space is $\{A, B, C\}$ and the possible times are $\{1, 2, 3\}$, then the augmented state space is $\{(A, 1), (A, 2), (A, 3), (B, 1), (B, 2), (B, 3), (C, 1), (C, 2), (C, 3)\}$. We then determine the new transition probability matrix from the original time-dependent transition probabilities. Specifically, for any two augmented states $(i, t)$ and $(j, t+1)$, the new transition probability equals the original time-$t$ probability of moving from $i$ to $j$, i.e. $\mathbb{P}[S_{t+1} = j \mid S_t = i]$; the transition probability from $(i, t)$ to any augmented state whose time index is not $t+1$ is zero. The key point is that the resulting transition probability function no longer takes $t$ as a separate argument: the time dependence has been absorbed into the state itself, so the same transition function is applied at every step.
Thus, we transform a time-inhomogeneous Markov process into a time-homogeneous Markov process.
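To make this concrete, here is a minimal sketch in Python. The per-step matrices `P_t` are made-up numbers for illustration (not from the book); the code builds a single time-invariant matrix `P_aug` over the augmented states `(t, s)`, where the only nonzero entries go from `(t, i)` to `(t+1, j)` with the original time-$t$ probability.

```python
import numpy as np

states = ["A", "B", "C"]
T = 3  # time indices t in {0, 1, 2}; t = 2 is terminal in this toy example

# Hypothetical time-dependent transition matrices (made-up numbers):
# P_t[t][i][j] = probability of moving from state i at time t to state j at t+1
P_t = {
    0: np.array([[0.5, 0.3, 0.2],
                 [0.1, 0.6, 0.3],
                 [0.2, 0.2, 0.6]]),
    1: np.array([[0.7, 0.2, 0.1],
                 [0.3, 0.4, 0.3],
                 [0.1, 0.1, 0.8]]),
}

# Augmented state space: all (t, s) pairs
aug_states = [(t, s) for t in range(T) for s in states]
n = len(aug_states)
index = {z: k for k, z in enumerate(aug_states)}

# One time-invariant transition matrix over the augmented states:
# (t, i) -> (t+1, j) gets probability P_t[t][i][j]; all other entries stay 0.
P_aug = np.zeros((n, n))
for t in range(T - 1):
    for i, si in enumerate(states):
        for j, sj in enumerate(states):
            P_aug[index[(t, si)], index[(t + 1, sj)]] = P_t[t][i][j]

# Every non-terminal row of P_aug sums to 1, and the SAME matrix P_aug is
# applied at every step -- the time dependence now lives inside the state.
print(P_aug[index[(0, "A")], index[(1, "B")]])  # original P_0 entry A -> B
```

Note that `P_aug` never needs to be consulted with an external clock: starting from any augmented state, repeatedly applying `P_aug` reproduces exactly the dynamics of the original time-inhomogeneous chain.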