Hello, I have a question about reinforcement learning. I was watching David Silver's RL course, and in Lecture 6 (Value Function Approximation) he says that in reinforcement learning the data you get can be dependent, i.e. non-i.i.d. However, to my knowledge we use a Markov state to represent the agent's state in reinforcement learning, and a characteristic of a Markov state is that each state is independent: we don't need to know the earlier history of states to make a decision. So why is he saying that the data can be dependent? Can someone clarify this for me? Thanks
2026-04-06 20:39:16
Dependent data in reinforcement learning
65 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
Suppose a set of states $\mathcal{S}$, a set of actions $\mathcal{A}$ and a reward function $\mathcal{R}:\mathcal{S}\times\mathcal{A}\rightarrow\mathbb{R}$. We assume time starts at $t=0$ and progresses as $t=0,1,2,\ldots$. Furthermore, the dynamics of the environment are assumed to be governed by a transition function $P:\mathcal{S}\times\mathcal{A}\times\mathcal{S}\rightarrow[0,1]$, such that $\Pr(s_{t+1}=s''\mid s_t=s',a_t=a')=P(s''\mid s',a')$. Next, let us introduce a decision mapping $\pi:\mathcal{S}\times\mathcal{A}\rightarrow[0,1]$, where $\pi(s,a)$ is to be understood as the probability of choosing action $a$ when in state $s$. Obviously, we require $\sum\limits_{a\in\mathcal{A}}\pi(s,a)=1\ \forall s\in\mathcal{S}$. Now, let us define $\rho:\mathcal{S}\rightarrow[0,1]$ as the starting state distribution, that is $\Pr(s_0=s)=\rho(s)$. Crucially, let us consider policies $\pi$ for which there exists a distribution $\eta_{\pi}:\mathcal{S}\rightarrow[0,1]$ such that $$\lim_{t\rightarrow\infty}\Pr(s_t=s\mid s_0\sim\rho)=\eta_{\pi}(s).$$ A reasonable goal could be to find such a policy $\pi$ which maximizes the quantity $$\mathbb{E}_{s\sim\eta_{\pi},\,a\sim\pi(s,\cdot)}[\mathcal{R}(s,a)].$$

When training a reinforcement learning agent, we usually sample $s_0$ from $\rho$ and then obtain a rollout by applying $\pi$ to sample actions. This is also known as Monte Carlo simulation. Suppose now that we sample $n$-step trajectories using Monte Carlo simulation. This can be seen as a distribution on $\underbrace{\mathcal{S}\times\ldots\times\mathcal{S}}_{n\ \rm{times}}$. Most likely, however, this distribution would be very different from sampling $n$ times independently from $\eta_{\pi}$: within a rollout, $s_{t+1}$ is drawn from $P(\cdot\mid s_t,a_t)$, so each state depends on the one before it. The Markov property says that the next state depends only on the current state and action, not that consecutive states are independent of each other. Imagine, for example, a robot driving in a circle at constant velocity. When sampling from $\eta_{\pi}$, the robot could be anywhere on the circle. However, when considering a sample from a rollout, you have a pretty good idea about the state in the next sample (next timestep).
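To make the circle example concrete, here is a minimal Python sketch under assumptions of my own: the circle is discretized into a hypothetical `N_POSITIONS = 36` cells, $\rho$ is a point mass at cell 0, and the robot advances one cell with probability 0.9 and stays put otherwise. The small chance of staying keeps the chain aperiodic, so $\eta_{\pi}$ exists and is uniform over the circle. The script compares how far apart consecutive samples are in a single rollout versus in independent draws from $\eta_{\pi}$.

```python
import random

# Hypothetical discretization: the circle is split into 36 equally spaced cells.
N_POSITIONS = 36


def step(s, rng):
    """Advance one cell with probability 0.9, otherwise stay put.

    The small chance of staying makes the chain aperiodic, so the
    stationary distribution eta_pi exists (it is uniform over the cells).
    """
    return (s + 1) % N_POSITIONS if rng.random() < 0.9 else s


def rollout(n, rng, s0=0):
    """Monte Carlo rollout of n states starting from s0 (rho = point mass at s0)."""
    states = [s0]
    for _ in range(n - 1):
        states.append(step(states[-1], rng))
    return states


def iid_from_eta(n, rng):
    """n independent draws from eta_pi (uniform over the circle)."""
    return [rng.randrange(N_POSITIONS) for _ in range(n)]


def mean_abs_jump(states):
    """Average circular distance (in cells) between consecutive samples."""
    total = 0
    for a, b in zip(states, states[1:]):
        d = abs(b - a) % N_POSITIONS
        total += min(d, N_POSITIONS - d)
    return total / (len(states) - 1)


rng = random.Random(0)
traj = rollout(1000, rng)
iid = iid_from_eta(1000, rng)
print(mean_abs_jump(traj))  # about 0.9: the next state is nearly determined
print(mean_abs_jump(iid))   # about 9: the next sample tells you little
```

Under these assumptions consecutive rollout states are at most one cell apart, while two independent uniform cells on a 36-cell circle are 9 cells apart on average. Every state in the rollout is still Markov; it is just strongly correlated with its neighbours.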
I.i.d. (independent and identically distributed) is itself a somewhat loose term, since it's not clear which distribution is meant. However, consider for example a 2D game in which a humanoid character runs from left to right and has to overcome obstacles in order not to die. Assume that there is a set of possible obstacle types (spears coming out of the ground, stones falling from the sky, etc.). If we were to use a policy gradient method to train our agent, it is not hard to see that using single rollouts might pose a problem. While a given policy (one that never dies) has some probability of encountering each type of obstacle (this probability could be derived from $\eta_{\pi}$), a single $n$-step rollout could well contain only one type of obstacle. Because such data are not sampled i.i.d. from $\eta_{\pi}$, the gradient update computed from that rollout will be biased towards that single obstacle.
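The obstacle argument can be simulated the same way. The following Python sketch is a toy model, not a real game or an actual policy gradient method: the obstacle types, their per-obstacle gradient contributions `GRAD`, and the switching probability `P_SWITCH` are all hypothetical. The obstacle type persists for long stretches of a rollout, and because the first type is drawn uniformly, every timestep has the same uniform marginal $\eta_{\pi}$, so the samples are identically distributed but not independent. The script measures how far a single batch "gradient" (here just the mean contribution over the batch) lands from the expected gradient, 0.0, when the batch is one contiguous rollout versus $n$ independent draws from $\eta_{\pi}$.

```python
import random
from collections import Counter

# Hypothetical obstacle types and per-obstacle gradient contributions.
OBSTACLES = ["spears", "stones", "pit", "fire"]
GRAD = {"spears": 1.0, "stones": -1.0, "pit": 0.5, "fire": -0.5}
# Under eta_pi every type is equally likely, so the expected gradient is
# the plain mean of GRAD's values, which is 0.0.
EXPECTED_GRAD = sum(GRAD.values()) / len(GRAD)
P_SWITCH = 0.01  # hypothetical chance per step that a new obstacle type is drawn


def rollout_obstacles(n, rng):
    """Obstacle types seen along one n-step rollout (types persist for long stretches)."""
    kind = rng.choice(OBSTACLES)  # start from the uniform (stationary) distribution
    seq = []
    for _ in range(n):
        if rng.random() < P_SWITCH:
            kind = rng.choice(OBSTACLES)  # may redraw the same type
        seq.append(kind)
    return seq


def iid_obstacles(n, rng):
    """n independent draws from eta_pi."""
    return [rng.choice(OBSTACLES) for _ in range(n)]


def batch_gradient(seq):
    """Toy 'gradient' of a batch: the mean contribution of its samples."""
    return sum(GRAD[k] for k in seq) / len(seq)


def mean_error(sampler, n, trials, rng):
    """Average distance between a batch gradient and the expected gradient."""
    return sum(abs(batch_gradient(sampler(n, rng)) - EXPECTED_GRAD)
               for _ in range(trials)) / trials


rng = random.Random(1)
print(Counter(rollout_obstacles(50, rng)))  # often just one obstacle type
roll_err = mean_error(rollout_obstacles, 50, 2000, rng)
iid_err = mean_error(iid_obstacles, 50, 2000, rng)
print(roll_err, iid_err)  # rollout batches land much farther from 0.0
```

In this toy setup a 50-step rollout frequently contains a single obstacle type, so its batch gradient sits near that obstacle's contribution (±1.0 or ±0.5), whereas 50 independent draws average out much closer to 0.0.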