Why is the gradient of this expectation intractable?


I am learning policy gradients from the slides of Stanford CS231n's reinforcement learning lecture:

\begin{align} \tau &= (s_0, a_0, r_0, s_1, a_1, r_1, ...) \\ J(\theta)&=\mathbb{E}_\tau [r(\tau)] \\ &=\int_\tau r(\tau) p(\tau;\theta)d\tau \\ \nabla_\theta J(\theta) &= \int_\tau r(\tau)\nabla_\theta p(\tau;\theta)d\tau \end{align}

Can anyone tell me why the last integral is intractable?


There are 2 answers below.

Answer 1

It is not that the integral itself is mathematically unsolvable; you have to understand "intractable" with respect to the context, i.e., Monte Carlo simulation.

If I recall correctly, this is the mathematical basis of the REINFORCE algorithm, where you maximize the objective function through gradient updates estimated by Monte Carlo simulation. Here is the problem: in the last integral, the integrand is $r(\tau)\nabla_\theta p(\tau;\theta)$, which contains no standalone $p(\tau;\theta)$ factor, so it cannot be read directly as an expectation -- and expectations are exactly what Monte Carlo can estimate from samples. The missing piece is that $p$ term, but with a little math you can extract a $p(\tau;\theta)$ factor inside the integral and rewrite the whole thing as an expectation with respect to the policy-induced trajectory distribution.
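Concretely, the extraction step being alluded to is the log-derivative (likelihood-ratio) trick (the standard derivation, stated here explicitly since the answer leaves it implicit): since $\nabla_\theta p = p \, \nabla_\theta \log p$,

\begin{align}
\nabla_\theta J(\theta) &= \int_\tau r(\tau)\, \nabla_\theta p(\tau;\theta)\, d\tau \\
&= \int_\tau r(\tau)\, p(\tau;\theta)\, \nabla_\theta \log p(\tau;\theta)\, d\tau \\
&= \mathbb{E}_\tau\left[ r(\tau)\, \nabla_\theta \log p(\tau;\theta) \right]
\end{align}

Moreover, factoring $p(\tau;\theta) = p(s_0) \prod_t \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)$ makes the unknown dynamics terms drop out of the gradient, leaving $\nabla_\theta \log p(\tau;\theta) = \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)$, which is computable from the policy alone.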

Once you have the gradient in this expectation form, you can estimate it from sampled trajectories in the Monte Carlo setting, and you are good to go.
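To make the "expectation form" concrete, here is a minimal sketch of the resulting estimator on a toy one-step problem (the softmax bandit setup, reward values, and all names here are illustrative assumptions, not from the slides). It estimates $\mathbb{E}_a[r(a)\,\nabla_\theta \log \pi_\theta(a)]$ by sampling actions from the policy and checks it against the closed-form gradient, which is available in this toy case:

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.0, 0.5])  # hypothetical reward for each action

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    # Gradient of log softmax(theta)[a] w.r.t. theta: one_hot(a) - softmax(theta)
    g = -softmax(theta)
    g[a] += 1.0
    return g

def reinforce_grad(theta, n_samples=100_000):
    # Monte Carlo estimate of E_a[ r(a) * grad log pi(a) ],
    # sampling actions from the policy itself (not uniformly).
    pi = softmax(theta)
    actions = rng.choice(len(pi), size=n_samples, p=pi)
    return np.mean([rewards[a] * grad_log_pi(theta, a) for a in actions], axis=0)

theta = np.zeros(3)
g = reinforce_grad(theta)

# In this toy case J(theta) = sum_a pi(a) r(a) has a closed-form gradient,
# so we can verify the estimator: dJ/dtheta_k = pi_k * (r_k - E[r]).
pi = softmax(theta)
exact = pi * (rewards - pi @ rewards)
```

The key point is that the sampling distribution is $\pi_\theta$ itself, so no explicit $p$ term is needed at estimation time.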

Answer 2

Are you referring to the slide below from CS231n? (http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture14.pdf)

[slide image from the linked lecture]

Why is the gradient of this expectation intractable?

This is because, in most cases, the dimensionality of the trajectory space is too high.

If you tried to compute the integral by plain Monte Carlo, you would have to sample uniformly from that high-dimensional space, and in most cases that is intractable: almost all uniform samples land in regions where $p(\tau;\theta)$ is negligible, so the estimate is dominated by rare lucky hits.
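A small numerical sketch of this curse of dimensionality (the Gaussian stand-in for $p(\tau;\theta)$ and all constants here are illustrative assumptions): it measures what fraction of uniform samples over a box land within 3 standard deviations of the mode of a $d$-dimensional standard Gaussian in every coordinate, i.e., where the density has any appreciable mass.

```python
import numpy as np

rng = np.random.default_rng(1)

def hit_rate(d, n=50_000, half_width=10.0):
    # Fraction of uniform samples over [-half_width, half_width]^d that fall
    # within 3 sigma of a standard Gaussian's mode in all d coordinates.
    # Per coordinate this probability is 6/20 = 0.3, so it decays as 0.3**d.
    x = rng.uniform(-half_width, half_width, size=(n, d))
    return np.mean(np.all(np.abs(x) < 3.0, axis=1))

rates = {d: hit_rate(d) for d in (1, 2, 5, 10)}
```

By $d = 10$ essentially none of the 50,000 uniform samples hit the high-density region, which is why sampling trajectories from $p(\tau;\theta)$ itself (i.e., rolling out the policy), as the first answer describes, is the practical fix.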