Invariant measure weighted transition kernel


Let $\rho$ be a probability distribution on the state space and, for some $\alpha \in (0, 1)$, define $\tilde{p}(s^{\prime} | s, a) = \alpha \rho(s^{\prime}) + (1 - \alpha) p(s^{\prime} | s, a)$. Clearly, $\tilde{p}(\cdot | s, a)$ is a probability distribution for every $(s, a)$. For an arbitrary policy $\pi$, define the transition kernel $\tilde{p}_{\pi}(s^{\prime}, a^{\prime} | s, a) = \pi(a^{\prime} | s^{\prime}) \tilde{p}(s^{\prime}|s,a)$.

Denote by $\tilde{P}_{\pi}$ the transition matrix associated with the transition kernel $\tilde{p}_{\pi}$.
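A minimal numerical sketch of this construction, assuming a hypothetical toy MDP with 3 states and 2 actions whose $p$, $\rho$ and $\pi$ are drawn at random as exact fractions, and using the mixing convention $\alpha\rho(s^{\prime}) + (1-\alpha)p(s^{\prime}|s,a)$ that appears in the expanded sum of the question. The helper names (`random_dist`, `p_tilde`, `P_tilde_pi`) are illustrative only. It builds $\tilde{P}_{\pi}$ as a matrix indexed by state-action pairs and checks that every row sums to one.

```python
import random
from fractions import Fraction

# Hypothetical toy sizes and seed; exact fractions avoid rounding noise.
N_STATES, N_ACTIONS = 3, 2
rng = random.Random(0)


def random_dist(n):
    """Random probability vector of length n with positive rational entries."""
    w = [rng.randint(1, 9) for _ in range(n)]
    total = sum(w)
    return [Fraction(x, total) for x in w]


pairs = [(s, a) for s in range(N_STATES) for a in range(N_ACTIONS)]
p = {sa: random_dist(N_STATES) for sa in pairs}         # p[(s, a)][s'] = p(s' | s, a)
rho = random_dist(N_STATES)                             # rho[s'] = rho(s')
pi = [random_dist(N_ACTIONS) for _ in range(N_STATES)]  # pi[s'][a'] = pi(a' | s')
alpha = Fraction(1, 3)                                  # any alpha in (0, 1)


def p_tilde(s_next, s, a):
    """Mixed kernel alpha * rho(s') + (1 - alpha) * p(s' | s, a)."""
    return alpha * rho[s_next] + (1 - alpha) * p[(s, a)][s_next]


# Row (s, a), column (s', a'): pi(a' | s') * p_tilde(s' | s, a).
P_tilde_pi = [[pi[s2][a2] * p_tilde(s2, s, a) for (s2, a2) in pairs]
              for (s, a) in pairs]

# Each row of the matrix should be a probability distribution over (s', a').
row_sums = [sum(row) for row in P_tilde_pi]
print(row_sums)
```

Summing a row first over $a^{\prime}$ uses $\sum_{a^{\prime}}\pi(a^{\prime}|s^{\prime}) = 1$, and then over $s^{\prime}$ gives $\alpha + (1-\alpha) = 1$, which is why the check holds exactly for any $\alpha \in (0,1)$.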

Is it possible to find a value of $\alpha$ which ensures that $\mu_{\pi}$ is an invariant distribution of $\tilde{P}_{\pi}$?

I start from the invariance equation:

$\mu(s^{\prime},a^{\prime}) = \sum_{s, a} \mu(s,a) \tilde{p}_{\pi}(s^{\prime}, a^{\prime} | s, a)$
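On a finite state-action space, the invariance equation above is a linear system $\mu = \mu \tilde{P}_{\pi}$ together with $\sum_{s,a}\mu(s,a) = 1$. The sketch below solves it exactly by Gauss-Jordan elimination over fractions and then checks the equation entry by entry. The toy MDP, the seed, $\alpha = 1/4$ and the helper names (`left_mul`, `stationary`) are all hypothetical, and the mixing convention is $\alpha\rho + (1-\alpha)p$.

```python
import random
from fractions import Fraction

# Hypothetical toy MDP: sizes, seed and alpha are arbitrary choices.
N_STATES, N_ACTIONS = 3, 2
rng = random.Random(1)


def random_dist(n):
    """Random probability vector of length n with positive rational entries."""
    w = [rng.randint(1, 9) for _ in range(n)]
    total = sum(w)
    return [Fraction(x, total) for x in w]


pairs = [(s, a) for s in range(N_STATES) for a in range(N_ACTIONS)]
p = {sa: random_dist(N_STATES) for sa in pairs}
rho = random_dist(N_STATES)
pi = [random_dist(N_ACTIONS) for _ in range(N_STATES)]
alpha = Fraction(1, 4)

# Entry (s, a) -> (s', a'): pi(a' | s') * (alpha * rho(s') + (1 - alpha) * p(s' | s, a)).
P_tilde_pi = [[pi[s2][a2] * (alpha * rho[s2] + (1 - alpha) * p[sa][s2])
               for (s2, a2) in pairs] for sa in pairs]


def left_mul(mu, P):
    """Row vector times matrix: (mu P)(j) = sum_i mu(i) P(i, j)."""
    n = len(mu)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P):
    """Solve mu P = mu, sum(mu) = 1 exactly (assumes P is irreducible)."""
    n = len(P)
    # Equation j: sum_i mu[i] * (P[i][j] - [i == j]) = 0, stored as row j of A.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n)]
    b = [Fraction(0)] * n
    # One balance equation is redundant; replace it by sum(mu) = 1.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    # Gauss-Jordan elimination with exact Fraction arithmetic.
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


mu = stationary(P_tilde_pi)
print(mu)
print(left_mul(mu, P_tilde_pi) == mu)
```

Every entry of this toy $\tilde{P}_{\pi}$ is positive, so the chain is irreducible and the invariant distribution it returns is unique.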

Substituting the definition of $\tilde{p}_{\pi}$, I get:

$\mu(s^{\prime},a^{\prime}) = \sum_{s, a} \mu(s,a) \alpha \rho(s^{\prime}) \pi(a^{\prime}|s^{\prime}) + \sum_{s, a} \mu(s,a) (1 - \alpha) p(s^{\prime}|s,a) \pi(a^{\prime}|s^{\prime})$

Using $\sum_{s,a} \mu(s,a) = 1$, the first term on the right reduces to $\alpha \rho(s^{\prime}) \pi(a^{\prime}|s^{\prime})$, but I am stuck after that.
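The split into two sums and the collapse of the first one can be checked numerically. A sketch on a hypothetical random toy MDP, with an arbitrary distribution `mu` on state-action pairs (not assumed invariant): for every $(s^{\prime},a^{\prime})$ it compares $\sum_{s,a}\mu(s,a)\tilde{p}_{\pi}(s^{\prime},a^{\prime}|s,a)$ with the sum of the two terms, and the first term with $\alpha\rho(s^{\prime})\pi(a^{\prime}|s^{\prime})$. Names, sizes, seed and $\alpha = 2/5$ are illustrative assumptions.

```python
import random
from fractions import Fraction

# Hypothetical toy MDP with exact rational probabilities.
N_STATES, N_ACTIONS = 3, 2
rng = random.Random(2)


def random_dist(n):
    """Random probability vector of length n with positive rational entries."""
    w = [rng.randint(1, 9) for _ in range(n)]
    total = sum(w)
    return [Fraction(x, total) for x in w]


pairs = [(s, a) for s in range(N_STATES) for a in range(N_ACTIONS)]
p = {sa: random_dist(N_STATES) for sa in pairs}
rho = random_dist(N_STATES)
pi = [random_dist(N_ACTIONS) for _ in range(N_STATES)]
alpha = Fraction(2, 5)
mu = dict(zip(pairs, random_dist(len(pairs))))  # arbitrary mu(s, a), not necessarily invariant

lhs, first, second = {}, {}, {}
for (s2, a2) in pairs:
    # sum_{s,a} mu(s,a) * tilde p_pi(s', a' | s, a)
    lhs[(s2, a2)] = sum(mu[sa] * pi[s2][a2] * (alpha * rho[s2] + (1 - alpha) * p[sa][s2])
                        for sa in pairs)
    # First term: the rho part of the mixture.
    first[(s2, a2)] = sum(mu[sa] * alpha * rho[s2] * pi[s2][a2] for sa in pairs)
    # Second term: the p part of the mixture.
    second[(s2, a2)] = sum(mu[sa] * (1 - alpha) * p[sa][s2] * pi[s2][a2] for sa in pairs)

# The expansion is exact, and sum(mu) == 1 collapses the first term.
ok_split = all(lhs[k] == first[k] + second[k] for k in pairs)
ok_first = all(first[k] == alpha * rho[k[0]] * pi[k[0]][k[1]] for k in pairs)
print(ok_split, ok_first)
```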

I do not know whether I can write:

$\sum_{s, a} \mu(s,a) (1 - \alpha) p(s^{\prime}|s,a) \pi(a^{\prime}|s^{\prime}) = (1 - \alpha)P_{\mu}(s^{\prime},a^{\prime}) \pi(a^{\prime} | s^{\prime})$
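One way to make this rewriting precise: since $\pi(a^{\prime}|s^{\prime})$ does not depend on $(s,a)$, it factors out of the second sum, which then equals both $(1-\alpha)(\mu P_{\pi})(s^{\prime},a^{\prime})$, with $P_{\pi}$ the untilded kernel $\pi(a^{\prime}|s^{\prime})p(s^{\prime}|s,a)$, and $(1-\alpha)\pi(a^{\prime}|s^{\prime})\sum_{s,a}\mu(s,a)p(s^{\prime}|s,a)$. The sketch checks both forms on a hypothetical random toy MDP; reading $P_{\mu}$ as the next-state marginal $\sum_{s,a}\mu(s,a)p(s^{\prime}|s,a)$, a function of $s^{\prime}$ alone, is an assumption, as are all names and values.

```python
import random
from fractions import Fraction

# Hypothetical toy MDP with exact rational probabilities.
N_STATES, N_ACTIONS = 3, 2
rng = random.Random(3)


def random_dist(n):
    """Random probability vector of length n with positive rational entries."""
    w = [rng.randint(1, 9) for _ in range(n)]
    total = sum(w)
    return [Fraction(x, total) for x in w]


pairs = [(s, a) for s in range(N_STATES) for a in range(N_ACTIONS)]
p = {sa: random_dist(N_STATES) for sa in pairs}
pi = [random_dist(N_ACTIONS) for _ in range(N_STATES)]
alpha = Fraction(1, 5)
mu = random_dist(len(pairs))  # arbitrary distribution; mu[i] is mu(pairs[i])

# Untilded kernel P_pi: entry (s, a) -> (s', a') is pi(a' | s') * p(s' | s, a).
P_pi = [[pi[s2][a2] * p[sa][s2] for (s2, a2) in pairs] for sa in pairs]

# Hypothetical reading of P_mu: next-state marginal sum_{s,a} mu(s,a) p(s' | s, a).
P_mu = [sum(mu[i] * p[sa][s2] for i, sa in enumerate(pairs)) for s2 in range(N_STATES)]

second_sum, via_P_pi, via_P_mu = [], [], []
for j, (s2, a2) in enumerate(pairs):
    second_sum.append(sum(mu[i] * (1 - alpha) * p[sa][s2] * pi[s2][a2]
                          for i, sa in enumerate(pairs)))
    via_P_pi.append((1 - alpha) * sum(mu[i] * P_pi[i][j] for i in range(len(pairs))))
    via_P_mu.append((1 - alpha) * P_mu[s2] * pi[s2][a2])

print(second_sum == via_P_pi == via_P_mu)
```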

Edit: I think that if $\mu$ is an invariant measure of $P_{\pi}$, I obtain a simpler equation:

$\mu(s^{\prime},a^{\prime}) = \alpha \rho(s^{\prime}) \pi(a^{\prime}|s^{\prime}) + (1 - \alpha) \mu(s^{\prime},a^{\prime})$
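The right-hand side of this simplified equation is what $\sum_{s,a}\mu(s,a)\tilde{p}_{\pi}(s^{\prime},a^{\prime}|s,a)$ becomes when $\mu$ is invariant for the untilded $P_{\pi}$. A sketch that checks this identity, assuming a hypothetical random toy MDP: it computes an invariant distribution $\mu$ of $P_{\pi}$ exactly (Gauss-Jordan elimination over fractions) and compares $(\mu\tilde{P}_{\pi})(s^{\prime},a^{\prime})$ with $\alpha\rho(s^{\prime})\pi(a^{\prime}|s^{\prime}) + (1-\alpha)\mu(s^{\prime},a^{\prime})$ for every pair. Helper names, sizes, seed and $\alpha = 1/3$ are illustrative.

```python
import random
from fractions import Fraction

# Hypothetical toy MDP with exact rational probabilities.
N_STATES, N_ACTIONS = 3, 2
rng = random.Random(4)


def random_dist(n):
    """Random probability vector of length n with positive rational entries."""
    w = [rng.randint(1, 9) for _ in range(n)]
    total = sum(w)
    return [Fraction(x, total) for x in w]


pairs = [(s, a) for s in range(N_STATES) for a in range(N_ACTIONS)]
p = {sa: random_dist(N_STATES) for sa in pairs}
rho = random_dist(N_STATES)
pi = [random_dist(N_ACTIONS) for _ in range(N_STATES)]
alpha = Fraction(1, 3)

# Untilded and mixed kernels on state-action pairs.
P_pi = [[pi[s2][a2] * p[sa][s2] for (s2, a2) in pairs] for sa in pairs]
P_tilde_pi = [[pi[s2][a2] * (alpha * rho[s2] + (1 - alpha) * p[sa][s2])
               for (s2, a2) in pairs] for sa in pairs]


def left_mul(mu, P):
    """Row vector times matrix: (mu P)(j) = sum_i mu(i) P(i, j)."""
    n = len(mu)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P):
    """Solve mu P = mu, sum(mu) = 1 exactly (assumes P is irreducible)."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n)]
    b = [Fraction(0)] * n
    A[-1] = [Fraction(1)] * n  # replace one redundant balance equation by sum(mu) = 1
    b[-1] = Fraction(1)
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


mu = stationary(P_pi)  # invariant for the untilded P_pi
lhs = left_mul(mu, P_tilde_pi)
rhs = [alpha * rho[s2] * pi[s2][a2] + (1 - alpha) * mu[j]
       for j, (s2, a2) in enumerate(pairs)]
print(lhs == rhs)
```

Comparing both sides of this identity with $\mu(s^{\prime},a^{\prime})$ is then what decides whether $\mu$ is also invariant for $\tilde{P}_{\pi}$.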

Thanks.