Why not directly obtain the solution from the equality constraint in this convex optimization problem?


On page 3 of Generative Adversarial Imitation Learning, Ho & Ermon derived the following Lagrangian.

$$\arg\max_{c \in \mathbb R^{\mathcal S\times\mathcal A}}\min_{\rho \in \mathcal D}\bar{L}(\rho,c) = -\bar {H}(\rho)-\psi(c)+\sum_{s,a}(\rho(s,a)-\rho_{E}(s,a))c(s,a)$$

where

  • $s$ denotes state

  • $a$ denotes action

  • $c$ is the cost function

  • $\rho$ and $\rho_E$ are the state-action visitation distributions associated with the policies $\pi_\rho$ and $\pi_E$, respectively

  • $\psi$ is a regularization function (which we assume is constant, so it is omitted in the following discussion)

  • $\bar H$ is the entropy

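For context, this is (up to the constant $\psi$) the Lagrangian obtained by dualizing the occupancy-matching constraint in the entropy-regularized primal; a sketch of that standard correspondence:

$$\min_{\rho \in \mathcal D} \; -\bar H(\rho) \quad \text{subject to} \quad \rho(s,a) = \rho_E(s,a) \quad \forall (s,a) \in \mathcal S \times \mathcal A,$$

with one multiplier $c(s,a)$ per equality constraint, which is why $c$ plays the role of a cost function in the dual.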
Then, on page 4, they derive


*[screenshot of the derivation from page 4]*


I'm confused by the explanation in the second paragraph, which explains why $\rho(s,a) = \rho_E(s,a)$. Why can't we derive this directly from the constraint? After all, the dual attains its optimum when the constraint is satisfied, and the convexity of $-\bar H$ ensures that $\rho(s,a) = \rho_E(s,a)$ is optimal for the primal.