I have a doubt: if two different policies $\pi_1, \pi_2$ are both optimal policies in a reinforcement learning task, is the convex combination $\alpha \pi_1 + \beta \pi_2$ with $\alpha + \beta = 1$, $\alpha, \beta \ge 0$ also an optimal policy?
Here is a simple demo.
In this task, there are three states $s_0, s_1, s_2$, and the action space contains two actions $a_1, a_2$; $s_1$ and $s_2$ are both terminal states. An agent starts from $s_0$. If it chooses $a_1$, it arrives at $s_1$ and receives a reward of +1; if it chooses $a_2$, it arrives at $s_2$ and receives a reward of +1.
In this simple demo task, we can first write down two different optimal policies $\pi_1$ and $\pi_2$, where $\pi_1(a_1|s_0) = 1$ and $\pi_2(a_2|s_0) = 1$. Their combination is the policy $\pi$ with $\pi(a_1|s_0) = \alpha$ and $\pi(a_2|s_0) = \beta$. This $\pi$ is also an optimal policy, because every policy in this task is optimal.
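To check this numerically, here is a minimal sketch of the demo MDP in plain Python (the function name `value_s0` and the variable `alpha` are my own illustrative choices, not from any library). It computes the expected return at $s_0$ under the mixed policy and shows it equals 1 for every mixing weight:

```python
def value_s0(alpha):
    """Expected return at s_0 under the mixed policy
    pi(a1|s0) = alpha, pi(a2|s0) = 1 - alpha."""
    reward_a1 = 1.0  # s0 --a1--> s1 (terminal), reward +1
    reward_a2 = 1.0  # s0 --a2--> s2 (terminal), reward +1
    # Both actions end the episode immediately, so the value of s0
    # is just the weighted average of the two one-step rewards.
    return alpha * reward_a1 + (1 - alpha) * reward_a2

for alpha in (0.0, 0.3, 1.0):
    print(alpha, value_s0(alpha))  # value is 1.0 for every alpha
```

Since the value of $s_0$ is independent of $\alpha$, every mixture attains the optimal value in this particular task, matching the claim above.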