On page 3 of *Generative Adversarial Imitation Learning*, Ho & Ermon derive the following Lagrangian:
$$\arg\max_{c \in \mathbb R^{\mathcal S\times\mathcal A}}\min_{\rho \in \mathcal D}\bar{L}(\rho,c) = -\bar {H}(\rho)-\psi(c)+\sum_{s,a}(\rho(s,a)-\rho_{E}(s,a))c(s,a)$$
where
- $s$ denotes a state
- $a$ denotes an action
- $c$ is the cost function
- $\rho$ and $\rho_E$ are the state-action visitation distributions associated with the policies $\pi_\rho$ and $\pi_E$, respectively
- $\psi$ is a regularization function (which we assume is constant and omit in the following discussion)
- $\bar H$ is the (causal) entropy
Then, on page 4, they derive:
I'm confused by the explanation in the second paragraph, which explains why $\rho(s,a) = \rho_E(s,a)$. Why can't we derive this directly from the constraint? After all, the dual attains its optimum when the constraint is satisfied, and the convexity of $-\bar H$ ensures that $\rho(s,a) = \rho_E(s,a)$ is optimal for the primal.
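To spell out the reasoning I have in mind (with $\psi$ treated as constant): the supremum over $c$ already enforces the constraint, since

$$\sup_{c}\sum_{s,a}\big(\rho(s,a)-\rho_E(s,a)\big)\,c(s,a)=\begin{cases}0 & \text{if } \rho = \rho_E\\ +\infty & \text{otherwise,}\end{cases}$$

so the saddle-point problem appears to reduce to the primal

$$\min_{\rho \in \mathcal D} -\bar H(\rho)\quad\text{s.t.}\quad \rho(s,a)=\rho_E(s,a)\ \ \forall (s,a),$$

whose feasible set is the single point $\rho_E$. If this reduction is correct, $\rho = \rho_E$ seems to follow immediately, which is why I don't see what their second-paragraph argument adds.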
