value iteration update rule in MDP


Well, I'm new to MDPs and I have a basic question about the formulation of the transition matrix T. The way I would think about constructing T is as an S×S matrix, where S is the number of states and each T_ij is the probability of moving from state i to state j.

I've been tasked with an exercise that adds another layer to the T matrix: a probability of success for each move. I quote from the exercise: "An agent moves in a 1D grid of 5 blocks. At any given grid location the agent can choose to either stay at the location or move to an adjacent grid location, with a probability of success in each case of 0.5." It then continues: "If the agent chooses to move (either left or right) at any of the inner grid locations, such an action is successful with probability 1/3, and with probability 2/3 it fails to move." And it adds: "If the agent chooses to move left at the leftmost (or right at the rightmost) grid location, the last probabilities are 0.5."

In the model answer, the T matrix is given as a 5×3×5 array; it incorporates the probability of movement success, and I can't see the intuition behind this formulation. I've added an image of the suggested T matrix; highlighting the middle rows of each row block together would form what I had in mind when I tried to build it. Can somebody demystify the intuition behind it for me?

Solution of the transition matrix
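To make the 5×3×5 shape concrete, here is a sketch of how such a tensor could be built in NumPy: the middle dimension indexes the three actions (stay, left, right), so each action gets its own 5×5 stochastic matrix. Note the hedges: treating the stay action as deterministic, and treating an outward move at a wall as leaving the agent in place (so the 0.5/0.5 split at the edges collapses), are my own reading of the ambiguous wording, not something the exercise states outright.

```python
import numpy as np

S, A = 5, 3                  # 5 grid cells; actions: 0 = stay, 1 = left, 2 = right
T = np.zeros((S, A, S))      # T[s, a, s2] = P(next state s2 | state s, action a)

P_MOVE = 1 / 3               # move success probability at inner locations (from the exercise)

for s in range(S):
    # Action 0: stay. Assumed deterministic here (the exercise wording is ambiguous).
    T[s, 0, s] = 1.0

    for a, step in ((1, -1), (2, +1)):          # action 1 = left, action 2 = right
        target = s + step
        if 0 <= target < S:
            T[s, a, target] = P_MOVE            # move succeeds
            T[s, a, s] = 1 - P_MOVE             # move fails; agent stays put
        else:
            # Moving into a wall: under my reading, "success" and "failure"
            # both leave the agent in place, so the 0.5/0.5 split collapses.
            T[s, a, s] = 1.0

# Sanity check: every (state, action) slice must be a probability distribution.
assert np.allclose(T.sum(axis=2), 1.0)
```

With this layout, `T[:, a, :]` is the per-action S×S matrix you originally had in mind; stacking one such matrix per action is exactly what turns the familiar S×S view into the 5×3×5 block structure of the model answer.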